Friday, May 15, 2015

Neural Networks: Cost Function

Neural Network Classification:

[Figure: a neural network with 4 layers]

Training set:  $\{ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),(x^{(3)},y^{(3)}),\ldots,(x^{(m)},y^{(m)})\}$
$L$ = total number of layers in the network ($L = 4$ in the above case)
$s_l$ = number of units (not counting the bias unit) in layer $l$

A neural network can be used for two types of classification problems:

Binary Classification:
$y$ = 0 or 1 ; 1 Output Unit
$s_L=1$, K=1

Multiclass Classification:

Number of output units: K

$y\in\Re^{K}$

E.g. if K=4

The target $y$ will be one of the following K one-hot vectors: $\begin{bmatrix}1 \cr 0\cr0\cr0\cr\end{bmatrix}$, $\begin{bmatrix}0 \cr 1\cr0\cr0\cr\end{bmatrix}$,$\begin{bmatrix}0\cr 0\cr1\cr0\cr\end{bmatrix}$,$\begin{bmatrix}0\cr 0\cr0\cr1\cr\end{bmatrix}$
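As a quick aside (not from the original notes), such one-hot target vectors can be built from integer class labels with a few lines of NumPy; the helper name `one_hot` below is just a placeholder:

```python
import numpy as np

def one_hot(y, K):
    """Convert integer class labels 0..K-1 into K-dimensional one-hot vectors."""
    Y = np.zeros((len(y), K))
    Y[np.arange(len(y)), y] = 1.0
    return Y

# Example: K = 4 classes, labels for 5 training examples
y = np.array([0, 2, 1, 3, 2])
print(one_hot(y, 4))   # each row is one of the four vectors shown above
```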



Cost Function:

The cost function for a neural network is a generalization of the logistic regression cost function.

Logistic Regression Cost Function:

$J(\theta) = -\frac 1 m \left[\sum^m_{i=1}{y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))}\right] + \frac \lambda {2m} \sum_{j=1}^n\theta_j^2 $
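As an aside (not part of the original notes), this formula translates almost directly into NumPy; the function name `logistic_cost` is my own, and $\theta_0$ is excluded from the regularization sum, as in the usual convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix with a leading column of ones,
    y: (m,) labels in {0, 1}, lam: regularization parameter lambda.
    """
    m = len(y)
    h = sigmoid(X @ theta)                          # hypothesis h_theta(x)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # skip theta_0 (bias term)
    return cost + reg
```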

Neural Network Cost Function:

A neural network outputs vectors in $\Re^K$:

$h_\Theta(x)\in \Re^K$;
$(h_\Theta(x))_i = i^{\text{th}}$ output

Cost Function:
$J(\Theta) = -{\frac 1 m} \left[ {\sum_{i=1}^m}{\sum_{k=1}^K}\, y_k^{(i)}\log\!\left((h_\Theta(x^{(i)}))_k\right) + (1-y_k^{(i)})\log\!\left(1-(h_\Theta(x^{(i)}))_k\right) \right] + {\frac \lambda {2m}}{\sum_{l=1}^{L-1}}{\sum_{i=1}^{s_l}}{\sum_{j=1}^{s_{l+1}}}\left(\Theta_{ji}^{(l)}\right)^2$

The summation ${\sum_{k=1}^K}$ is over the K output units, i.e. the cost term is computed for each of the K output units and summed.

Regularization term: we don't sum over the weights corresponding to the bias units $a_0$, i.e. the terms $\Theta_{i0}^{(l)}$ that multiply $x_0$ (or $a_0^{(l)}$). Even if the bias weights are included, the result will be very similar; excluding them is simply a convention.
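To make the nested sums concrete, here is a minimal NumPy sketch (my own, not from the course material) that computes $J(\Theta)$ for sigmoid-activated layers, given a list of weight matrices $\Theta^{(l)}$; as described above, the bias columns are excluded from the regularization term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized neural network cost J(Theta).

    Thetas: list of weight matrices Theta^(1), ..., Theta^(L-1),
            each of shape (s_{l+1}, s_l + 1).
    X: (m, n) inputs, Y: (m, K) one-hot labels, lam: lambda.
    """
    m = X.shape[0]
    A = X
    # Forward propagation through the L-1 weight matrices
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])   # prepend bias unit a_0 = 1
        A = sigmoid(A @ Theta.T)
    H = A                                     # (m, K) outputs h_Theta(x)

    # Unregularized cost: double sum over examples i and output units k
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

    # Regularization: squared weights, excluding the bias columns Theta[:, 0]
    reg = (lam / (2 * m)) * sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + reg
```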

