Thursday, April 30, 2015

Linear Regression : Implementation using Normal Equation

Normal Equation: Intuition

The normal equation is a method to solve for $\theta$ analytically, in a single step, rather than iteratively as in gradient descent.

The first step is to represent the dataset in matrix and vector form.

Consider the dataset:

$\matrix{x_1 & x_2 & x_3 & x_4 & y \cr
2104 & 4 & 2 & 3 & 1465 \cr
1400 & 2 & 1 & 5 & 900 \cr
3500 & 5 & 2 & 3 & 1000 \cr
960 & 1 & 1 & 7 & 435 \cr}$
The variables here are the predictors ($x_1, x_2, x_3, x_4$) and the outcome $y$. The coefficients will be $\theta_0, \theta_1, \theta_2, \theta_3$, and $\theta_4$.

The hypothesis:
$h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+\theta_3x_3+\theta_4x_4$
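
With $x_0 = 1$, this can be written compactly in vectorized form as $h_\theta(x) = \theta^T x$.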

Another column, $x_0$, needs to be added and filled with 1's to account for the intercept term $\theta_0$. After this step, the matrix $X$ will be denoted as:

$X = \pmatrix{1 & 2104 & 4 & 2 & 3 \cr
1 & 1400 & 2 & 1 & 5 \cr
1 & 3500 & 5 & 2 & 3 \cr
1 & 960 & 1 & 1 & 7 \cr
}$
(an $m \times (n+1)$ matrix)

The vector $y$ will be denoted as:
$y = \pmatrix{1465 \cr 900 \cr 1000 \cr 435 \cr}$
(an $m$-dimensional vector)

Here $m$ is the number of training examples and $n$ is the number of features ($n+1$ columns after adding $x_0$). The value of $\theta$ that minimizes the squared error is then given in closed form by the normal equation:

$\theta = (X^TX)^{-1}X^Ty$
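
As a minimal illustration, here is one way to compute $\theta$ with NumPy, using the example values from the matrices above (NumPy itself is an assumption, since this post shows no code):

```python
import numpy as np

# Raw features (x1..x4) and targets y from the example dataset above.
data = np.array([[2104, 4, 2, 3],
                 [1400, 2, 1, 5],
                 [3500, 5, 2, 3],
                 [ 960, 1, 1, 7]], dtype=float)
y = np.array([1465, 900, 1000, 435], dtype=float)

# Prepend the column of ones for x0 (the intercept term).
X = np.hstack([np.ones((data.shape[0], 1)), data])

# theta = (X^T X)^{-1} X^T y.
# With only m = 4 examples and n + 1 = 5 parameters, X^T X is singular,
# so the pseudo-inverse (minimum-norm solution) is used here. With m > n
# and independent columns, np.linalg.solve(X.T @ X, X.T @ y) is preferable
# because it avoids forming an inverse explicitly.
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
print(theta)
```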

When to use the Normal Equation and when to use Gradient Descent:

The gradient descent algorithm needs an arbitrary learning-rate parameter $\alpha$, which is not needed with the normal equation. Also, there is no need to do feature normalization with the normal equation method. However, if the number of features is too large ($n > 10{,}000$), the normal equation method will be too slow, because computing the inverse of the $n \times n$ matrix $X^TX$ costs roughly $O(n^3)$. Gradient descent works well even if the number of features is on the order of $10^6$.
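
For comparison, a bare-bones batch gradient descent sketch for the same model (the learning rate and iteration count here are arbitrary illustrative choices; unlike with the normal equation, the features should be scaled first or the updates can diverge):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) target vector. Features should be normalized beforehand.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        # Gradient of the mean squared error cost: (1/m) * X^T (X theta - y)
        theta -= alpha * (X.T @ (X @ theta - y)) / m
    return theta
```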

