Training a Neural Network:
Pick a network architecture (connectivity pattern between neurons):
- No. of input units: dimension of the features $x^{(i)}$
- No. of output units: number of classes
- Reasonable default: 1 hidden layer; if using more than 1 hidden layer, use the same number of hidden units in every layer (usually the more the better)
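As a concrete illustration of how this choice fixes the parameter shapes (a minimal sketch; the 400/25/10 sizes are hypothetical, e.g. 20x20 pixel inputs and 10 classes), each weight matrix $\Theta^{(l)}$ has one row per unit in layer $l+1$ and one column per unit in layer $l$, plus a bias column:

```python
import numpy as np

# Hypothetical architecture: 400 input features, one hidden layer of
# 25 units, 10 output classes.
layer_sizes = [400, 25, 10]

# Theta^(l) has shape (s_{l+1}, s_l + 1); the extra column multiplies
# the bias unit of layer l.
Theta = [np.zeros((layer_sizes[l + 1], layer_sizes[l] + 1))
         for l in range(len(layer_sizes) - 1)]

for l, T in enumerate(Theta, start=1):
    print(f"Theta^({l}) shape: {T.shape}")   # (25, 401), then (10, 26)
```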
Training:
1. Randomly initialize the weights
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
3. Implement code to compute the cost function $J(\Theta)$
4. Implement backpropagation to compute the partial derivatives $\frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta)$ (a runnable sketch of steps 1-4 follows this list):
    for i = 1 to m:
        Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
        (get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, 3, \dots, L$)
        $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$
        ...
    end;
    Compute $\frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta)$ from the accumulated $\Delta^{(l)}$ terms
5. Use gradient checking to compare $\frac{\partial}{\partial\Theta_{jk}^{(l)}}J(\Theta)$ computed using backpropagation against a numerical estimate of the gradient of $J(\Theta)$, then disable the gradient-checking code, since the numerical estimate is far too slow to run on every iteration (see the gradient-checking sketch after this list)
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$
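A minimal NumPy sketch of steps 1-4, plus the step-6 gradient-descent update, for a network with one hidden layer, a sigmoid activation, and the unregularized cross-entropy cost. All function and variable names here are illustrative, not from any particular library, and `Y` is assumed to hold one one-hot row per example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rand_init(s_out, s_in, epsilon=0.12):
    # Step 1: small random weights break symmetry between hidden units.
    return np.random.uniform(-epsilon, epsilon, (s_out, s_in + 1))

def train_step(Theta1, Theta2, X, Y, alpha=1.0):
    # One pass over the m examples: steps 2-4, then the step-6 update.
    m = X.shape[0]
    Delta1, Delta2 = np.zeros_like(Theta1), np.zeros_like(Theta2)
    J = 0.0
    for i in range(m):
        # Step 2: forward propagation (a 1 is prepended for the bias unit).
        a1 = np.append(1.0, X[i])
        a2 = np.append(1.0, sigmoid(Theta1 @ a1))
        a3 = sigmoid(Theta2 @ a2)                      # h_Theta(x^(i))
        # Step 3: cross-entropy cost J(Theta), averaged over examples.
        J += -(Y[i] @ np.log(a3) + (1 - Y[i]) @ np.log(1 - a3)) / m
        # Step 4: backpropagate delta terms and accumulate Delta^(l).
        d3 = a3 - Y[i]
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])   # drop bias row
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    grad1, grad2 = Delta1 / m, Delta2 / m   # dJ/dTheta, unregularized
    # Step 6: one gradient-descent update on the parameters.
    return Theta1 - alpha * grad1, Theta2 - alpha * grad2, J
```

For the hypothetical 400/25/10 architecture above, the weights would be initialized as `Theta1 = rand_init(25, 400)` and `Theta2 = rand_init(10, 25)`, and `train_step` would be called repeatedly until $J(\Theta)$ stops decreasing.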
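For the gradient checking in step 5, each partial derivative can be estimated with a two-sided finite difference and compared against the backpropagation value. A sketch under the assumption that `cost` is any function returning $J(\Theta)$ for a given parameter matrix (the name is illustrative):

```python
import numpy as np

def numerical_gradient(cost, Theta, eps=1e-4):
    # Estimate dJ/dTheta_jk as (J(Theta_jk + eps) - J(Theta_jk - eps)) / (2 eps),
    # perturbing one entry of Theta at a time.
    grad = np.zeros_like(Theta)
    for idx in np.ndindex(Theta.shape):
        plus, minus = Theta.copy(), Theta.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (cost(plus) - cost(minus)) / (2.0 * eps)
    return grad

# The backpropagation gradient should agree to several decimal places,
# e.g.: assert np.allclose(numerical_gradient(cost, Theta), backprop_grad,
# atol=1e-6). Disable this check before training proper; it is far too slow.
```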