Monday, May 4, 2015

Regularization : Linear Regression

$J(\theta) = \frac 1 {2m} \left[\sum_{i=1}^m {(h_\theta(x^{(i)})-y^{(i)})^2} + \lambda\sum_{j=1}^n\theta_j^2 \right]$

Goal: minimize $J(\theta)$
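The cost function above can be sketched directly in NumPy. This is a minimal illustration, not code from the post; the function name and arguments are my own, and note that $\theta_0$ is excluded from the penalty term, matching the sum starting at $j=1$:

```python
import numpy as np

def compute_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X     : (m, n+1) design matrix (first column all ones)
    y     : (m,) target vector
    theta : (n+1,) parameter vector
    lam   : regularization strength lambda
    """
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)    # theta_0 is not regularized
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```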

Gradient Descent:

GD Algorithm without the regularization term:

repeat until convergence $\{$

$\theta_0 := \theta_0 - \alpha{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot x_0^{(i)}$

$\theta_j := \theta_j-\alpha {\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}}$

$\}$

Gradient Descent with Regularization term $\lambda$ added:

Since we are not regularizing $\theta_0$, it is updated separately, and the regularization term is not included in its update.

repeat until convergence $\{$

$\theta_0 := \theta_0 - \alpha{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot x_0^{(i)}$

$\theta_j := \theta_j-\alpha \left[{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}} + \frac \lambda m \theta_j \right]$
$\qquad (j = 1, 2, 3, \ldots, n)$
$\}$

Simplifying:

$\theta_j := \theta_j(1-\alpha\frac \lambda m) - \alpha\frac 1 m \sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}}$


The term $(1-\alpha\frac \lambda m)$ is a number slightly less than 1 (for typical values of $\alpha$, $\lambda$ and $m$, something like 0.95 or 0.99). Each update therefore shrinks $\theta_j$ by multiplying it with a number less than 1 before applying the usual gradient step, thus achieving the goal of regularization.
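The full update loop above can be sketched as follows. This is a minimal NumPy illustration under my own naming and default values (the post specifies no code); the key detail is zeroing the regularization term for $\theta_0$:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, lam=1.0, iters=1000):
    """Batch gradient descent for regularized linear regression.

    All parameters are updated simultaneously each iteration;
    theta_0 receives no regularization term.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m   # (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
        reg = (lam / m) * theta              # (lambda/m) * theta_j
        reg[0] = 0.0                         # do not regularize theta_0
        theta = theta - alpha * (grad + reg)
    return theta
```

With $\lambda = 0$ this reduces to the unregularized algorithm shown first.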


Normal Equation Method:

Matrices:

$X$: $m \times (n+1)$ design matrix
$y$: $m$-dimensional vector

Without Regularization:

$\theta = (X^TX)^{-1}X^Ty$
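As a quick illustration with made-up data (the numbers below are not from the post), the unregularized normal equation can be evaluated in one line. Solving the linear system is preferred over forming the inverse explicitly:

```python
import numpy as np

# Hypothetical data: m = 4 examples, one feature plus the intercept column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])   # y = 1 + 2x, so theta should be [1, 2]

# Solve (X^T X) theta = X^T y rather than computing the inverse directly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # [1. 2.]
```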

With Regularization

$\theta = \left(X^TX + \lambda\begin{bmatrix}0& 0& 0& \cdots& 0 \cr 0& 1& 0& \cdots& 0 \cr 0& 0& 1& \cdots& 0 \cr \vdots& \vdots& \vdots& \ddots& \vdots \cr 0& 0& 0& \cdots& 1\cr\end{bmatrix}\right)^{-1}X^Ty$

The matrix multiplied by $\lambda$ is an $(n+1) \times (n+1)$ matrix whose off-diagonal elements are all 0. Its main diagonal is all 1s except for the first element, which is 0, so that $\theta_0$ is not regularized.

The normal equation method with regularization also takes care of the non-invertibility of $X^TX$: when $m \le n$, $X^TX$ may be singular, but it can be shown that for $\lambda > 0$ the regularized matrix is invertible.
