Regularization: Linear Regression
$J(\theta) = \frac{1}{2m} \left[\sum_{i=1}^m {(h_\theta(x^{(i)})-y^{(i)})^2} + \lambda\sum_{j=1}^n\theta_j^2 \right]$
Goal: minimize $J(\theta)$
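As a quick illustration, here is a minimal NumPy sketch of this regularized cost (the function and variable names are my own, chosen for readability, not part of any standard API):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is the m x (n+1) design matrix (first column all ones),
    y is the m-vector of targets, lam is the regularization
    parameter lambda. theta[0] is excluded from the penalty.
    """
    m = len(y)
    errors = X @ theta - y                  # h_theta(x^(i)) - y^(i) for all examples
    penalty = lam * np.sum(theta[1:] ** 2)  # skip theta_0
    return (np.sum(errors ** 2) + penalty) / (2 * m)
```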
Gradient Descent:
GD Algorithm without the regularization term:
repeat until convergence $\{$
$\theta_0 := \theta_0 - \alpha{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot x_0^{(i)}$
$\theta_j := \theta_j-\alpha {\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}}$
$\}$
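A vectorized sketch of one such unregularized update step (again, the helper name is illustrative only):

```python
def gradient_step(theta, X, y, alpha):
    """One unregularized gradient-descent step; all theta_j are updated simultaneously."""
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m  # (1/m) * sum of (h_theta(x) - y) * x_j
    return theta - alpha * gradient
```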
Gradient Descent with Regularization term $\lambda$ added:
Since we are not regularizing $\theta_0$, it is updated separately and the regularization term does not appear in its update.
repeat until convergence $\{$
$\theta_0 := \theta_0 - \alpha{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot x_0^{(i)}$
$\theta_j := \theta_j-\alpha \left[{\frac{1}{m}}\sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}} + \frac \lambda m \theta_j \right]$
$(j = 1, 2, 3, \ldots, n)$
$\}$
Simplifying:
$\theta_j := \theta_j\left(1-\alpha\frac \lambda m\right) - \alpha\frac 1 m \sum_{i=1}^m{({h_\theta}(x^{(i)})-y^{(i)})}\cdot{x_j^{(i)}}$
The term $(1-\alpha\frac \lambda m)$ is slightly less than 1 (e.g. 0.95), so every update first shrinks $\theta_j$ by multiplying it with a number just below 1 before applying the usual gradient step. Repeating this shrinkage on every iteration keeps the parameters small, thus achieving the goal of regularization.
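Putting the simplified update into code, one regularized step might look like the following sketch, where $\theta_0$ is deliberately left out of the shrinkage (names are illustrative, not from the course):

```python
def regularized_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step: theta_1..theta_n are shrunk, theta_0 is not."""
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m             # same gradient term as the unregularized step
    shrink = np.full(theta.shape, 1.0 - alpha * lam / m)
    shrink[0] = 1.0                                  # do not regularize theta_0
    return shrink * theta - alpha * gradient
```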
Normal Equation Method:
Matrices:
X: $m \times (n+1)$ design matrix
y: $m$-dimensional vector of targets
Without Regularization:
$\theta = (X^TX)^{-1}X^Ty$
With Regularization:
$\theta = (X^TX + \lambda\begin{bmatrix}0& 0& 0& 0& 0& 0 \cr 0& 1& 0& 0& 0& 0 \cr 0& 0& 1& 0& 0& 0 \cr 0& 0& 0& .& .& . \cr 0& 0& 0& .& .& .\cr 0& 0& 0& .& .& 1\cr\end{bmatrix})^{-1}X^Ty$
The matrix multiplied by $\lambda$ is an $(n+1) \times (n+1)$ matrix: all off-diagonal elements are 0, the first diagonal entry (corresponding to $\theta_0$) is 0, and the remaining diagonal entries are 1.
The normal equation with regularization also takes care of non-invertibility: even when $X^TX$ is singular (for example when $m \le n$), the matrix $X^TX + \lambda L$ (with $L$ the matrix above) is invertible as long as $\lambda > 0$.
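A sketch of the regularized normal equation in NumPy follows; with `lam = 0` it reduces to the unregularized formula above (the function name and parameters are my own):

```python
def normal_equation(X, y, lam=0.0):
    """Closed-form theta, with optional regularization.

    L is the (n+1) x (n+1) matrix from the text: an identity matrix
    whose top-left entry is zeroed so that theta_0 is not penalized.
    """
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```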