Regularization: Logistic Regression
The problem of overfitting can occur in a logistic regression model when the hypothesis includes high-order polynomial terms, as in the following equation:
$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^2 x_2 + \theta_4 x_1^2 x_2^2 + \theta_5 x_1^2 x_2^3 + \dots)$
The cost function of a Logistic Regression model is given by:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$
As with regularized linear regression, we add a regularization term to the cost function, defined as $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$.
We do not include $\theta_0$ in the regularization term; regularization is applied only to $\theta_1, \theta_2, \theta_3, \dots, \theta_n$.
The Cost Function for Logistic regression becomes:
$J(\theta) = -\left[\frac{1}{m}\sum_{i=1}^{m}y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
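As a concrete illustration, this regularized cost can be computed in a few vectorized Octave lines. This is only a sketch under assumed names: X is the m x (n+1) design matrix whose first column is all ones, y is the m x 1 vector of 0/1 labels, and lambda is the regularization parameter; none of these identifiers come from the notes themselves.
% Sketch (assumed names): X is m x (n+1) with a leading column of ones,
% y is m x 1 with values 0/1, theta is (n+1) x 1, lambda is the regularization parameter.
function J = regularizedCost(theta, X, y, lambda)   % hypothetical helper name
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                   % sigmoid hypothesis h_theta(x)
  J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
      + (lambda / (2*m)) * sum(theta(2:end) .^ 2);  % theta(1) = theta_0 is not regularized
end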
Gradient Descent with Regularization:
$\theta_0 := \theta_0 - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_0^{(i)}\right]$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$ Simultaneously update all $\theta_j$.
$\theta_0$ is updated separately, without the regularization term. In the regularization term, $j$ ranges from $1 \dots n$.
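A minimal sketch of one such gradient-descent iteration in Octave, assuming the same hypothetical X, y, theta, lambda and a learning rate alpha:
% One iteration of regularized gradient descent (sketch, assumed names as above).
h    = 1 ./ (1 + exp(-X * theta));                      % hypothesis for all m examples
grad = (1/m) * X' * (h - y);                            % unregularized gradient, size (n+1) x 1
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);  % regularize theta_1 ... theta_n only
theta = theta - alpha * grad;                           % simultaneous update of all theta_j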
Regularization with Advanced Optimization:
Estimating $\theta$ using advanced optimization
Code:
function [jVal, gradient] = costFunction(theta)
jVal = [code to compute $J(\theta)$]
$J(\theta) = -\left[\frac{1}{m}\sum_{i=1}^{m}y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
gradient(1) = [code to compute $\frac \partial {\partial\theta_0} J(\theta)$]
$\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_0^{(i)}$ (no regularization term for $\theta_0$)
gradient(2) = [code to compute $\frac \partial {\partial\theta_1} J(\theta)$]
$\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_1^{(i)} + \frac{\lambda}{m}\theta_1$
gradient(3) = [code to compute $\frac \partial {\partial\theta_2} J(\theta)$]
$\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_2^{(i)} + \frac{\lambda}{m}\theta_2$
$\vdots$
gradient(n+1) = [code to compute $\frac \partial {\partial\theta_n} J(\theta)$]
$\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_n^{(i)} + \frac{\lambda}{m}\theta_n$
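A runnable version of the costFunction sketched above might look like the following; the extra arguments (X, y, lambda), the vectorized form, and the fminunc call are assumptions on my part rather than something given in the notes:
function [jVal, gradient] = costFunction(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                                % sigmoid hypothesis
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
         + (lambda/(2*m)) * sum(theta(2:end) .^ 2);              % regularized cost J(theta)
  gradient = (1/m) * X' * (h - y);                               % gradient(1) = dJ/dtheta_0, unregularized
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end); % add (lambda/m)*theta_j for j >= 1
end

% Example usage with an advanced optimizer (assuming X, y, lambda, initial_theta are defined):
options = optimset('GradObj', 'on', 'MaxIter', 400);
[optTheta, costVal] = fminunc(@(t) costFunction(t, X, y, lambda), initial_theta, options);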