Monday, May 4, 2015

Logistic Regression: Cost Function & Gradient Descent

Logistic Regression: Cost Function

The Logistic Regression hypothesis function is given by
$h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$
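A minimal NumPy sketch of this hypothesis (the helper name `hypothesis` and the use of NumPy are my own choices for illustration):

```python
import numpy as np

def hypothesis(theta, x):
    """Logistic hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))
```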

The question is: how do we choose the parameters $\theta$?

Recap: Linear Regression Cost Function

$J(\theta) = \frac 1 m \sum^m_{i=1} \frac 1 2 (h_\theta(x^{(i)}) - y^{(i)})^2$
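For reference, a small NumPy sketch of this squared-error cost (the names `linear_cost`, `X`, `y` are illustrative; `X` is assumed to stack one training example per row):

```python
import numpy as np

def linear_cost(theta, X, y):
    """Squared-error cost J(theta) for linear regression.
    X is an (m, n) design matrix, y an (m,) vector of targets."""
    m = len(y)
    errors = X.dot(theta) - y              # h_theta(x^(i)) - y^(i) for every example
    return (1.0 / m) * np.sum(0.5 * errors ** 2)
```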

Logistic Regression Cost Function:

In Logistic Regression, $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$, as opposed to Linear Regression, where $h_\theta(x) = \theta_0 + \theta_1x_1 + \dots$

The problem with using $\frac 1 2 (h_\theta(x^{(i)}) - y^{(i)})^2$ as the cost for Logistic Regression is that the resulting $J(\theta)$ is non-convex: it has many local minima, so Gradient Descent is not guaranteed to reach the global minimum. For Gradient Descent to converge to the global minimum, the cost function has to be convex, as it is for Linear Regression.


Cost Function:

$Cost (h_\theta(x),y) = -log(h_\theta(x))$ if y = 1
$Cost (h_\theta(x),y) = -log(1- h_\theta(x))$ if y = 0

or 

$Cost (h_\theta(x),y) = -ylog(h_\theta(x)) -(1-y)log(1- h_\theta(x))$

If y=1:
Cost = 0 if y=1 and $h_\theta(x)$ = 1 (i.e. if the actual value of y = 1 and the predicted value of y is also 1)

But as $h_\theta(x) \rightarrow 0$, $Cost \rightarrow \infty$

If y=0:
Cost = 0 if y=0 and $h_\theta(x)$ = 0 (i.e. if the actual value of y = 0 and the predicted value of y is also 0)

But as $h_\theta(x) \rightarrow 1$, $Cost \rightarrow \infty$
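The two cases above can be checked numerically with a small sketch (the helper `cost` and the test values below are illustrative, not part of the lecture):

```python
import numpy as np

def cost(h, y):
    """Per-example logistic cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# y = 1: cost is ~0 when the prediction is ~1, and grows without bound as h -> 0
print(cost(1.0 - 1e-12, 1))   # ~0
print(cost(1e-12, 1))         # large

# y = 0: cost is ~0 when the prediction is ~0, and grows without bound as h -> 1
print(cost(1e-12, 0))         # ~0
print(cost(1.0 - 1e-12, 0))   # large
```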

Simplified Cost Function:

$J(\theta) = \frac 1 m \sum^m_{i=1}Cost(h_\theta(x^{(i)}), y^{(i)})$

$J(\theta) = -\frac 1 m \sum^m_{i=1}{y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)}))}$
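A vectorized NumPy sketch of this simplified cost (names are illustrative; it assumes `X` stacks one example per row and `y` holds 0/1 labels):

```python
import numpy as np

def logistic_cost(theta, X, y):
    """Logistic regression cost J(theta)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))   # h_theta(x^(i)) for every example
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```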

Now, to fit parameters $\theta$, we need to minimize $J(\theta)$

To make a prediction given a new $x$:
Output $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$
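For example, a small sketch of scoring a new example with fitted parameters (the numbers and the 0.5 threshold are illustrative; in practice $\theta$ comes from minimizing $J(\theta)$):

```python
import numpy as np

theta = np.array([-1.0, 0.5])          # illustrative fitted parameters
x_new = np.array([1.0, 3.0])           # first entry is the intercept term x_0 = 1
prob = 1.0 / (1.0 + np.exp(-theta.dot(x_new)))   # h_theta(x), read as P(y = 1 | x; theta)
prediction = 1 if prob >= 0.5 else 0   # a common convention: predict 1 when h >= 0.5
```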

Gradient Descent Function for Logistic Regression:

Pseudocode: repeat until convergence $\{$

$\theta_j := \theta_j - {\alpha}{\frac {\partial }{ \partial {\theta_j}}}{J(\theta)}$

$\}$

Substituting the partial derivative of the Cost Function:

$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)}) \cdot x^{(i)}_j$ (simultaneously update for all $\theta_j$)

The Gradient Descent algorithm for Logistic Regression is the same as that for Linear Regression; the only difference is the hypothesis $h_\theta(x)$, which in this case is $h_\theta(x) = \frac 1 {1+e^{-\theta^Tx}}$ instead of $h_\theta(x) = \theta^Tx$ for Linear Regression.
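Putting the pieces together, a minimal NumPy sketch of the whole update loop (the learning rate, iteration count, and zero initialization are illustrative defaults; a real run would also monitor $J(\theta)$ for convergence):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.
    X is an (m, n) design matrix (with a column of ones for the intercept),
    y an (m,) vector of 0/1 labels."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X.dot(theta)))   # logistic hypothesis for all examples
        gradient = (1.0 / m) * X.T.dot(h - y)     # partial derivatives for all theta_j
        theta = theta - alpha * gradient          # simultaneous update
    return theta
```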




