Logistic Regression
Logistic regression is an approach to classification rather than quantitative prediction. Somewhat counterintuitively given its name, logistic regression is NOT used to solve regression problems.
Typically, logistic regression is used to solve binary classification problems, where the output has only two levels: high or low, good or bad, to be or not to be, etc.
- Pros
- Easy to understand and to implement in code;
- Requires little computing power.
- Cons
- Prone to overfitting
- Low accuracy near the decision boundary, where the sigmoid changes rapidly
Model
Sigmoid Function
In a classification problem, we want the model to accept every input and output the probability that the input belongs to a certain class. The unit step function is one choice, but its abrupt jump at the step is hard to handle. The sigmoid function has a similar shape but a much gentler transition:

σ(z) = 1 / (1 + e^(−z))   (1)

Its S-shaped curve rises smoothly from 0 to 1. The sigmoid function also has a convenient derivative:

σ′(z) = σ(z)(1 − σ(z))

Note that σ(z) can be interpreted as the probability of one certain class.
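As a quick sketch of equation (1) and its derivative in NumPy (function names here are my own, for illustration only):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

The derivative peaks at z = 0 with value 0.25, which is one reason the steep middle part of the curve dominates learning.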
Decision Boundary
In most cases, we treat the two classes with equal significance and therefore use 0.5 as our decision boundary. That is, σ(z) ≥ 0.5 → ŷ = 1 and σ(z) < 0.5 → ŷ = 0.
The Logistic Model
The input of the sigmoid function is denoted as z and given as follows,

z = θ⃗Tx⃗ = θ0 + θ1x1 + θ2x2 + ... + θnxn   (2)

Combining equations (1) and (2) and taking the logarithm of both sides, we get

log(σ(z) / (1 − σ(z))) = θ⃗Tx⃗   (3)

where x⃗ = (1, x1, x2, ..., xn)T. The left-hand side is called the log-odds, or logit.

Therefore, logistic regression can be modelled as

P(y = 1 | x⃗) = σ(θ⃗Tx⃗) = 1 / (1 + e^(−θ⃗Tx⃗))
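A minimal sketch of the model and the 0.5 decision boundary in NumPy, assuming the feature matrix X already carries a leading column of ones for the intercept (the function names are illustrative, not from any particular library):

```python
import numpy as np

def predict_proba(theta, X):
    """P(y = 1 | x) = sigmoid(theta^T x) for each row of X.

    X: (m, n+1) matrix whose first column is all ones (intercept term).
    theta: (n+1,) coefficient vector.
    """
    return 1.0 / (1.0 + np.exp(-X @ theta))

def predict(theta, X, threshold=0.5):
    """Apply the decision boundary: probability >= threshold -> class 1."""
    return (predict_proba(theta, X) >= threshold).astype(int)
```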
Coefficient Estimation
In a classification problem, we expect the cost to be large when the classification is wrong: when y = 1 the cost is −log(σ(z)), and when y = 0 it is −log(1 − σ(z)), so a confident wrong prediction is punished heavily.
By using a small math trick, we can combine the two situations above. The cost function can be written as

J(θ) = −(1/m) Σi [ y(i) log σ(θ⃗Tx⃗(i)) + (1 − y(i)) log(1 − σ(θ⃗Tx⃗(i))) ]

where y(i) ∈ {0, 1} is the true label of the i-th sample and m is the number of samples.
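The cost function above can be sketched as follows; the small eps guarding against log(0) is an implementation detail of this sketch, not part of the formula:

```python
import numpy as np

def cost(theta, X, y):
    """Cross-entropy cost J(theta) for logistic regression.

    X: (m, n+1) design matrix with leading column of ones.
    y: (m,) vector of 0/1 labels.
    """
    eps = 1e-12  # avoid log(0) when a prediction saturates at 0 or 1
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```

With θ = 0 every prediction is 0.5, so the cost is exactly log 2 regardless of the labels — a handy sanity check.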
Gradient Descent
Overall, the gradient descent algorithm can be represented as repeating the update θ := θ − α∇J(θ) until J(θ) converges, where α is the learning rate.
If we use the gradient descent method to optimize the coefficients, the steps are as follows. In every iteration, we update θ by

θj := θj − α (1/m) Σi (σ(θ⃗Tx⃗(i)) − y(i)) xj(i)

simultaneously for every j = 0, 1, ..., n.
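The update steps above can be sketched as a batch gradient descent loop; the choices of alpha, iters, and the intercept-column convention are assumptions of this sketch:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Fit logistic regression coefficients by batch gradient descent.

    X: (m, n+1) design matrix with leading column of ones.
    y: (m,) vector of 0/1 labels.
    """
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))  # current predictions
        grad = X.T @ (h - y) / m              # (1/m) * sum of (h - y) * x
        theta -= alpha * grad                 # simultaneous update of all theta_j
    return theta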
Possible Extension
Logit
Recall equation (3): the left-hand side, log(σ(x⃗) / (1 − σ(x⃗))), is the so-called logit, or log-odds. For more information, see Wikipedia.
Linear Discriminant Analysis
Though there are several extensions of logistic regression that adapt it to multi-class classification problems, Linear Discriminant Analysis is more popular for solving such problems. For more information about LDA, click here. For more information about multiclass logistic regression, see Wikipedia.
Other Logistic Regressions
- Ordered Logistic Regression handles ordinal dependent variables.
- Conditional Logistic Regression handles matched or stratified data when the strata are small. It is mostly used in the analysis of observational studies.
- Conditional Random Fields apply the logistic model to sets of interdependent variables.
Potential Problems
Overfitting
Overfitting is a common problem when fitting a logistic regression model by minimizing the cost function.
The following image shows underfitting, fitting, and overfitting in sequence.
To overcome overfitting, there are two possible approaches.
Reduce the number of features.
This may cause information loss, so the features that remain should be carefully selected.
Regularization
Keep all the features and add a penalty term to the cost function J(θ). Usually, we use the L2 norm as the penalty. The cost function changes to

J(θ) = −(1/m) Σi [ y(i) log σ(θ⃗Tx⃗(i)) + (1 − y(i)) log(1 − σ(θ⃗Tx⃗(i))) ] + (λ/2m) Σj θj²

and the corresponding update of θ changes to

θj := θj(1 − αλ/m) − α (1/m) Σi (σ(θ⃗Tx⃗(i)) − y(i)) xj(i),   for j = 1, ..., n

where λ is the penalty coefficient (by convention the intercept θ0 is not penalized). A larger λ penalizes the model more heavily and may lead to underfitting, so a proper selection of λ is necessary.
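The L2-regularized update can be sketched by adding the penalty's gradient to a plain batch gradient descent step; leaving the intercept component unpenalized, as is conventional, is an assumption of this sketch:

```python
import numpy as np

def gradient_descent_l2(X, y, alpha=0.1, lam=1.0, iters=1000):
    """Logistic regression with L2 penalty, fit by batch gradient descent.

    X: (m, n+1) design matrix with leading column of ones.
    y: (m,) vector of 0/1 labels.
    lam: penalty coefficient (lambda in the text).
    """
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (h - y) / m
        grad[1:] += (lam / m) * theta[1:]  # penalize all theta_j except the intercept
        theta -= alpha * grad
    return theta
```

A small λ barely changes the fit; a large λ shrinks the coefficients toward zero and can underfit, matching the discussion above.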