Monday, April 27, 2015

Linear Regression : The Basics



Some Definitions of Linear Regression:
  • When a player plays well, the team usually wins. If the player plays well in the next match, it is highly probable that the team will win again.
  • Predicting the value of a variable by observing its relationship with another variable.
  • Wiki: Linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.
  • Example: The price of a house depends on its area and number of bedrooms. Given data about houses' areas and numbers of bedrooms (X: area (x1) & num_bedrooms (x2)) and their prices (Y), find a relationship between the price of a house and its area & number of bedrooms. Then, given a new house's area and number of bedrooms, predict the price of that house based on the historical data.
All the above statements and examples come from the supervised learning paradigm of machine learning: take some initial data, learn the pattern in it, and apply that pattern to new data to predict the outcome. A code sketch of the house-price example follows.
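
To make the house-price example concrete, here is a minimal sketch using scikit-learn's LinearRegression; the dataset is made up purely for illustration:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: X holds [area (x1), num_bedrooms (x2)], Y holds prices.
X = [[1200, 2], [1500, 3], [1700, 3], [2100, 4], [2500, 4]]
Y = [200000, 250000, 270000, 330000, 380000]

model = LinearRegression()
model.fit(X, Y)  # learn the relationship from the historical data

# Predict the price of a new 1800 sq. ft., 3-bedroom house.
print(model.predict([[1800, 3]]))
```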


The predictor variables (X) are known as predictors or independent variables, since their values can change without being affected by any other variable.

The outcome variable (Y) is known as the dependent variable since its value is 'dependent' on the values of the independent variables (X's); if any of the X's change, the value of Y changes with them.

A linear regression equation is typically written as

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$

where:

y: Outcome variable
X: vector of predictor variables (independent variables)
$\epsilon$: error term
$\beta$: coefficients of the terms
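
As a quick check of the algebra, here is the same equation in matrix form with NumPy; the coefficient and input values are hypothetical, chosen only to show the computation:

```python
import numpy as np

# Hypothetical values, just to illustrate y = X.beta + error.
beta = np.array([50000, 100, 25000])  # [intercept b0, per-sq-ft b1, per-bedroom b2]
x = np.array([1, 1800, 3])            # [1 (for the intercept), area x1, bedrooms x2]

y = x @ beta  # b0 + b1*x1 + b2*x2 = 50000 + 180000 + 75000
print(y)      # 305000 (before adding the error term)
```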


Another way of representing the model is:

The training dataset goes into a learning algorithm, which produces a hypothesis 'h'; 'h' takes the independent variables (X's) as input and produces a prediction of the outcome variable (dependent variable).
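
In code, the hypothesis 'h' is simply a function from the predictors to a prediction; a minimal sketch, with coefficients that are hypothetical stand-ins for what training would actually produce:

```python
# The learned hypothesis h maps inputs (X's) to a predicted outcome.
# These coefficient values are hypothetical placeholders.
def h(area, num_bedrooms, b0=50000, b1=100, b2=25000):
    return b0 + b1 * area + b2 * num_bedrooms

print(h(1800, 3))  # predicted price for a new house
```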

What if there is no relationship? How accurate is our model, and how do we measure it? What is the error term? Every model comes with a cost function, which measures the gap between the actual values of y in the training data and the values the model predicts. This gap between actual and predicted values of Y is known as the cost.

All optimization objectives aim to reduce the cost: the lower the cost, the better the model (though other factors matter too).
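
For linear regression the usual cost function is the mean squared error; a minimal sketch of how that cost is computed:

```python
import numpy as np

def mse_cost(y_actual, y_predicted):
    # Average squared gap between actual and predicted outcomes;
    # a smaller value means the model fits the training data more closely.
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y_actual - y_predicted) ** 2)

# Toy example with hypothetical actual vs. predicted prices.
print(mse_cost([200000, 250000], [210000, 240000]))  # 100000000.0
```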
