Monday, May 4, 2015

Logistic Regression : Advanced Optimization

Advanced Optimization for Logistic Regression : Finding the values of $\theta$

The Gradient Descent algorithm is one way to compute the values of the parameters $\theta$. However, it requires selecting an appropriate value of $\alpha$, which may take several attempts and many iterations.

There are other optimization algorithms for minimizing a cost function $J(\theta)$. They are more complex internally, but there is no need to manually pick a value of $\alpha$, and they often converge much faster than Gradient Descent. Examples include Conjugate Gradient, BFGS, and L-BFGS.

Coding the Advanced Optimization Algorithms in MATLAB/Octave:

Example:

Say we want to minimize over two parameters, $\theta_1$ and $\theta_2$, where $J(\theta)$ is given by
$J(\theta) = (\theta_1-5)^2 + (\theta_2-5)^2$
$\frac{\partial}{\partial\theta_1} J(\theta) = 2(\theta_1-5)$
$\frac{\partial}{\partial\theta_2} J(\theta) = 2(\theta_2-5)$

We write a function in MATLAB/Octave that computes the value of the cost function $J(\theta)$ and its partial derivatives (gradient(1) for $\theta_1$ and gradient(2) for $\theta_2$).

Note: Indexing in Octave/MATLAB starts at 1, so the parameters $\theta_0, \theta_1, \ldots, \theta_n$ in the equations are stored as theta(1), theta(2), ..., theta(n+1) in code.

Code:
function [jVal, gradient] = costFunction(theta)
  % cost: J(theta) = (theta(1)-5)^2 + (theta(2)-5)^2
  jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
  % gradient: partial derivatives of J with respect to theta(1) and theta(2)
  gradient = zeros(2,1);
  gradient(1) = 2*(theta(1)-5);
  gradient(2) = 2*(theta(2)-5);
end

Once the code that computes jVal (the cost function) and gradient is written, the values of $\theta$ are optimized with the following code:

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] ...
     = fminunc(@costFunction, initialTheta, options);
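
For this example the minimum of $J(\theta)$ is at $\theta_1 = \theta_2 = 5$, so after the call we should see roughly the following (the values are approximate and the exact display varies between Octave and MATLAB versions):

optTheta       % approximately [5; 5]
functionVal    % approximately 0
exitFlag       % 1, indicating fminunc converged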



Recap:

theta = $\begin{pmatrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_n\end{pmatrix}$
function [jVal, gradient] = costFunction(theta)

  jVal = [code to compute $J(\theta)$]

  gradient(1) = [code to compute $\frac{\partial}{\partial\theta_0} J(\theta)$]
  gradient(2) = [code to compute $\frac{\partial}{\partial\theta_1} J(\theta)$]
  ...
  gradient(n+1) = [code to compute $\frac{\partial}{\partial\theta_n} J(\theta)$]
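
The bracketed pieces are placeholders. As a concrete sketch of how this template looks for logistic regression itself (assuming a design matrix X of size m x (n+1) with a leading column of ones and a vector y of 0/1 labels; the name logisticCostFunction and the variables X and y are illustrative, not from the lecture):

function [jVal, gradient] = logisticCostFunction(theta, X, y)
  % assumes X is m x (n+1) and y is m x 1 (illustrative names)
  m = length(y);                                        % number of training examples
  h = 1 ./ (1 + exp(-X*theta));                         % sigmoid hypothesis h_theta(x)
  jVal = (1/m) * sum(-y.*log(h) - (1-y).*log(1-h));     % logistic regression cost J(theta)
  gradient = (1/m) * (X' * (h - y));                    % (n+1) x 1 vector of partial derivatives
end

Since fminunc passes only theta to the cost function, the extra arguments X and y can be bound with an anonymous function:

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X,2), 1);
[optTheta, functionVal, exitFlag] ...
     = fminunc(@(t)(logisticCostFunction(t, X, y)), initialTheta, options);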
