Neural Network Classification:
[Figure: a neural network with 4 layers]
Training set: $\{ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\}$
$L$ = total number of layers in the network ($L=4$ in the figure above)
$s_l$ = number of units (not counting the bias unit) in layer $l$
A neural network classifier handles two types of problems:
Binary Classification:
$y \in \{0, 1\}$; one output unit
$s_L = 1$, $K = 1$
Multiclass Classification:
Number of output units: K
$y\in\Re^K$, $s_L = K$
E.g. if $K=4$, each label $y$ is one of the $K$ one-hot vectors: $\begin{bmatrix}1 \cr 0\cr0\cr0\cr\end{bmatrix}$, $\begin{bmatrix}0 \cr 1\cr0\cr0\cr\end{bmatrix}$, $\begin{bmatrix}0\cr 0\cr1\cr0\cr\end{bmatrix}$, $\begin{bmatrix}0\cr 0\cr0\cr1\cr\end{bmatrix}$
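As a concrete illustration, here is a minimal NumPy sketch of mapping integer class labels $1,\ldots,K$ to these one-hot vectors (the function name `to_one_hot` is just illustrative):

```python
import numpy as np

def to_one_hot(y, K):
    """Map integer class labels 1..K to K-dimensional one-hot column vectors."""
    y = np.asarray(y)
    Y = np.zeros((K, y.size))
    Y[y - 1, np.arange(y.size)] = 1.0
    return Y

# Example: K = 4 classes, labels for three training examples
print(to_one_hot([2, 4, 1], K=4))
# each column is the one-hot vector for the corresponding example
```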
Cost Function:
The cost function for a neural network is a generalization of the logistic regression cost function.
Logistic Regression Cost Function:
$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$
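A minimal NumPy sketch of this cost, assuming $X$ already carries a leading column of ones and that $\theta_0$ is excluded from the regularization term (the names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X:   (m, n+1) design matrix with a leading column of ones
    y:   (m,) labels in {0, 1}
    lam: regularization strength lambda
    """
    m = y.size
    h = sigmoid(X @ theta)                        # h_theta(x^(i)) for all i
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 is not regularized
    return cost + reg
```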
Neural Network Cost Function:
A neural network outputs a vector in $\Re^K$:
$h_\Theta(x)\in \Re^K$; $(h_\Theta(x))_k$ denotes the $k^{th}$ output
Cost Function:
$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^m\sum_{k=1}^K y_k^{(i)}\log\left((h_\Theta(x^{(i)}))_k\right) + (1-y_k^{(i)})\log\left(1-(h_\Theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$
The summation $\sum_{k=1}^K$ runs over the $K$ output units, i.e. the logistic cost is summed over each of the $K$ outputs.
Regularization term: we do not sum over the weights corresponding to the bias units $a_0$, i.e. the terms $\Theta_{j0}^{(l)}$. Including them would make little practical difference, but by convention the bias weights are not regularized. A sketch of the full computation follows.
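Putting the pieces together, here is a minimal NumPy sketch of $J(\Theta)$, assuming forward propagation has already produced the outputs $h_\Theta(x^{(i)})$ as columns of a matrix `H` and the one-hot labels as columns of `Y` (all names are illustrative):

```python
import numpy as np

def nn_cost(Thetas, H, Y, lam):
    """Regularized neural network cost J(Theta).

    Thetas: list of weight matrices Theta^(l), each of shape (s_{l+1}, s_l + 1)
    H:      (K, m) outputs h_Theta(x^(i)), one column per training example
    Y:      (K, m) one-hot labels, one column per training example
    lam:    regularization strength lambda
    """
    m = Y.shape[1]
    # double sum over examples i and output units k
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # triple sum over layers; the bias column Theta[:, 0] is skipped
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cost + reg
```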