Monday, May 4, 2015

Classification using Logistic Regression


Classification differs from linear regression in that it assigns data points to two or more discrete categories instead of predicting a continuous value. It is still called "regression" because, like linear regression, it fits a model to the training data; the name comes from the logistic (sigmoid) function it uses to map the model's output to class probabilities, hence Logistic Regression.

Why can't we use Linear Regression for classification? Linear regression can separate two sets of data points (using a higher-order polynomial if the decision boundary is non-linear), but the least-squares fit is very sensitive to outliers: a single extreme point tilts the fitted line and shifts the decision threshold, misclassifying points that were previously classified correctly. Logistic Regression handles this problem easily, as the sketch below illustrates.
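A minimal sketch of the outlier problem, assuming NumPy is available; the tumor-size data and the 0.5 decision threshold below are made up purely for illustration:

```python
import numpy as np

# Tumor sizes with binary labels (0 = benign, 1 = malignant); values are illustrative.
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def decision_threshold(x, y):
    """Fit y = m*x + b by least squares; return the x where the line crosses 0.5."""
    m, b = np.polyfit(x, y, 1)
    return (0.5 - b) / m

print(decision_threshold(x, y))           # ~4.8: separates the two classes cleanly

# One extreme malignant example tilts the line and drags the threshold right,
# so the malignant tumors at x = 6 and x = 7 now fall on the "benign" side.
x_out = np.append(x, 30.0)
y_out = np.append(y, 1.0)
print(decision_threshold(x_out, y_out))   # ~7.6
```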

Logistic regression examples:

  • Emails: Spam/Not Spam
  • Online Transactions: Fraud/Not Fraud
  • Tumor: Malignant/Benign
Typically the target variable (outcome) takes one of two values,

$y \in \{0,1\}$

0: Negative Class (e.g. benign tumor)
1: Positive Class (e.g. malignant tumor)

Differences from Linear Regression:

Logistic: y = 0 or 1

Logistic: $0 \le h_\theta(x) \le 1$
Linear: $h_\theta(x)$ can be $> 1$ or $< 0$

Logistic Regression Model:

Want: $0\le h_\theta(x) \le 1$

$h_\theta(x) = g(\theta^Tx)$

$g(z) = \frac 1 {1+e^{-z}}$

where $z= \theta^Tx$

The function $g(z)$ is called the logistic function or the sigmoid function (the logit function is actually its inverse).
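A minimal sketch of this hypothesis in NumPy; the parameter vector `theta` and the input `x` below are hypothetical values, with `x[0] = 1` serving as the usual intercept term:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0])   # hypothetical parameters
x = np.array([1.0, 4.0])        # intercept term plus one feature
print(h(theta, x))              # ~0.73, always between 0 and 1
```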


Interpretation of the Hypothesis Output:

$h_\theta(x)$ = estimated probability that y = 1 for input x

Example: If $h_\theta(x) = 0.7$, the model estimates a 70% chance that y = 1 (e.g. a 70% chance the tumor is malignant).

$h_\theta(x) = P(y=1 | x ; \theta)$
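
Since y can take only the values 0 and 1, the probability of the negative class follows immediately:

$P(y=0 | x ; \theta) = 1 - P(y=1 | x ; \theta)$

So in the example above, a 70% estimated chance that the tumor is malignant implies a 30% chance that it is benign.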



