Logistic Regression: Cost Function
The Logistic Regression hypothesis function is given by
$h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$
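As a quick sketch of what this hypothesis looks like in code (the function names below are just illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # The sigmoid / logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = 1 / (1 + e^(-theta^T x)); x is assumed to include the intercept term x_0 = 1
    return sigmoid(np.dot(theta, x))
```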
The question is: how do we choose the parameters $\theta$?
Recap: Linear Regression Cost Function
$J(\theta) = \frac 1 m \sum^m_{i=1} \frac 1 2 (h_\theta(x^{(i)}) - y^{(i)})^2$
Logistic Regression Cost Function:
In Logistic Regression, $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$ as opposed to Linear Regression, where $h_\theta(x) = \theta_0 + \theta_1x_1 + \dots$.
The problem with using $\frac 1 2 (h_\theta(x^{(i)}) - y^{(i)})^2$ as the cost for Logistic Regression is that the resulting $J(\theta)$ is non-convex, i.e. it has many local minima, so Gradient Descent is not guaranteed to reach the global minimum. For Gradient Descent to converge reliably, the cost function has to be convex, as is the case with Linear Regression.
Cost Function:
$\mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x))$ if $y = 1$
$\mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x))$ if $y = 0$
or
$\mathrm{Cost}(h_\theta(x),y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$
If $y=1$:
$\mathrm{Cost} = 0$ if $y=1$ and $h_\theta(x) = 1$ (i.e. if the actual value of $y$ is 1 and the predicted value is also 1)
But as $h_\theta(x) \rightarrow 0$, $\mathrm{Cost} \rightarrow \infty$
If $y=0$:
$\mathrm{Cost} = 0$ if $y=0$ and $h_\theta(x) = 0$ (i.e. if the actual value of $y$ is 0 and the predicted value is also 0)
But as $h_\theta(x) \rightarrow 1$, $\mathrm{Cost} \rightarrow \infty$
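A minimal sketch of this per-example cost in Python (here `h` stands for the predicted value $h_\theta(x)$, and the example numbers are arbitrary):

```python
import numpy as np

def cost_per_example(h, y):
    # h = h_theta(x), the prediction in (0, 1); y is the actual label, 0 or 1
    if y == 1:
        return -np.log(h)        # 0 when h = 1, grows to infinity as h -> 0
    else:
        return -np.log(1.0 - h)  # 0 when h = 0, grows to infinity as h -> 1

print(cost_per_example(0.99, 1))  # small cost: confident and correct
print(cost_per_example(0.01, 1))  # large cost: confident but wrong
```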
Simplified Cost Function:
$J(\theta) = \frac 1 m \sum^m_{i=1}\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$
$J(\theta) = -\frac 1 m \sum^m_{i=1}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$
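A vectorized sketch of this simplified cost, assuming `X` is the $m \times (n+1)$ design matrix with a leading column of ones and `y` is the vector of 0/1 labels:

```python
import numpy as np

def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # h_theta(x^(i)) for every example at once
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```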
Now, to fit parameters $\theta$, we need to minimize $J(\theta)$
To make a prediction given a new $x$:
Output $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$
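As a sketch, prediction with a learned `theta` is just evaluating the sigmoid on the new input; reading $h_\theta(x) \geq 0.5$ as predicting $y = 1$ is the usual convention, assumed here rather than stated above:

```python
import numpy as np

def predict(theta, x_new):
    # Output h_theta(x) for a new input (x_new includes the intercept term x_0 = 1)
    h = 1.0 / (1.0 + np.exp(-np.dot(theta, x_new)))
    return h, int(h >= 0.5)  # the value of h_theta(x) and a 0/1 label (0.5 threshold assumed)
```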
Gradient Descent for Logistic Regression:
Repeat until convergence $\{$
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$
$\}$
Plugging in the derivative of the cost function:
$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^{(i)})-y^{(i)}\right) x^{(i)}_j$ (simultaneously update all $\theta_j$)
The Gradient Descent algorithm for Logistic Regression is the same as the one for Linear Regression; the only difference is the hypothesis $h_\theta(x)$, which is now $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$ instead of $h_\theta(x) = \theta^Tx$.
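Putting the pieces together, a minimal sketch of the whole loop (the learning rate `alpha` and the iteration count are arbitrary illustrative choices, not tuned values):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # X: m x (n+1) design matrix with a leading column of ones; y: length-m vector of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # sigmoid hypothesis for all examples
        gradient = (1.0 / m) * X.T.dot(h - y)    # (1/m) * sum (h_theta(x) - y) * x_j for each j
        theta = theta - alpha * gradient         # simultaneous update of every theta_j
    return theta
```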