Advanced Optimization for Logistic Regression: Finding the values of $\theta$
The Gradient Descent algorithm is one way to calculate the values of the parameters $\theta$. However, it requires selecting an appropriate learning rate $\alpha$, which may take several rounds of trial and error, and it can need many iterations to converge.
There are other optimization algorithms available for minimizing a cost function $J(\theta)$. They are somewhat more complex internally, but they do not require manually picking a value of $\alpha$ and are often much faster than Gradient Descent. Examples are Conjugate Gradient, BFGS and L-BFGS.
Coding the Advanced Optimization Algorithms in MATLAB/Octave:
Example:
Say we want to optimize over $\theta_1$ and $\theta_2$, where $J(\theta)$ is given by
$J(\theta) = {(\theta_1-5)}^2+{(\theta_2-5)}^2$
$\frac \partial {\partial\theta_1} J(\theta) = 2(\theta_1-5)$
$\frac \partial {\partial\theta_2} J(\theta) = 2(\theta_2-5)$
We write a MATLAB/Octave function that computes the value of the cost function $J(\theta)$ and its partial derivatives (gradient(1) for $\theta_1$ and gradient(2) for $\theta_2$).
Note: indexing in MATLAB/Octave starts at 1, so the parameters $\theta_0, \theta_1, \ldots, \theta_n$ in the equations correspond to theta(1), theta(2), ..., theta(n+1) in code.
Code:
function [jVal, gradient] = costFunction(theta)
  % value of the cost function J(theta)
  jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
  % gradient: partial derivatives with respect to theta(1) and theta(2)
  gradient = zeros(2,1);
  gradient(1) = 2*(theta(1)-5);
  gradient(2) = 2*(theta(2)-5);
end
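As a quick sanity check (this test call is my own addition, not part of the lecture), evaluating the function at $\theta = (0, 0)$ should give a cost of $50$ and a gradient of $(-10, -10)$, which follows directly from the formulas above:
[jVal, gradient] = costFunction([0; 0])
% jVal = 50
% gradient =
%   -10
%   -10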
Once the code for calculating jVal (the cost function) and gradient is written, the values of $\theta$ are optimized by the following code:
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = ...
    fminunc(@costFunction, initialTheta, options);
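For this example the minimum of $J(\theta)$ is at $\theta_1 = \theta_2 = 5$ with $J(\theta) = 0$, so fminunc should return values close to that. The exact printout depends on the Octave/MATLAB version, but it will look roughly like this:
optTheta =
   5.0000
   5.0000
functionVal = 0    % in practice a tiny value, e.g. on the order of 1e-30
exitFlag = 1       % a positive exit flag means fminunc converged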
Recap:
$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$
function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute $J(\theta)$];
  gradient(1) = [code to compute $\frac \partial {\partial\theta_0} J(\theta)$];
  gradient(2) = [code to compute $\frac \partial {\partial\theta_1} J(\theta)$];
  ...
  gradient(n+1) = [code to compute $\frac \partial {\partial\theta_n} J(\theta)$];
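For logistic regression itself the same template applies. Below is a minimal vectorized sketch (my own, not from the lecture), assuming the function is saved as logisticCost.m and that a design matrix X (size m x (n+1), with a leading column of ones) and a label vector y (size m x 1, entries 0 or 1) are available:
function [jVal, gradient] = logisticCost(theta, X, y)
  % Vectorized logistic regression cost and gradient
  m = length(y);                                   % number of training examples
  h = 1 ./ (1 + exp(-X*theta));                    % sigmoid hypothesis h_theta(x)
  jVal = -(1/m) * (y'*log(h) + (1-y)'*log(1-h));   % cross-entropy cost J(theta)
  gradient = (1/m) * (X' * (h - y));               % (n+1) x 1 vector of partial derivatives
end
Because this version takes extra arguments, it is passed to fminunc through an anonymous function:
options = optimset('GradObj', 'on', 'MaxIter', 400);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) logisticCost(t, X, y), initialTheta, options);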