Logistic Regression vs. Linear SVM

Given a binary classification problem, the goal is to find the “best” line that has the maximum probability of classifying unseen points correctly. How you define this notion of “best” gives you different models like SVM and logistic regression (LR).

In SVM, line \ell_1 is better than line \ell_2 if the “margin” of \ell_1 is larger, that is, if it is farther from both classes. In LR, a line \ell defines a probability distribution over the input space. Line \ell_1 is better than line \ell_2 if, on average, the distribution defined by \ell_1 is lower at class -1 points and higher at class +1 points than the distribution defined by \ell_2.

These different definitions of “best” result in different loss functions. If you look at the optimization problems of linear SVM and (regularized) LR, they are very similar:

\min_w\ \lambda \|w\|^2 + \sum_i \max\{0,\, 1 - y_i w^T x_i\}

\min_w\ \lambda \|w\|^2 + \sum_i \log\left(1 + \exp(-y_i w^T x_i)\right)

That is, they only differ in the loss function — SVM minimizes hinge loss while logistic regression minimizes logistic loss.
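To make the similarity concrete, here is a minimal NumPy sketch, purely illustrative, that minimizes either objective by (sub)gradient descent. The training loop is identical for both models; only the loss gradient changes. (The function name and hyperparameter defaults here are arbitrary, not from any library.)

```python
import numpy as np

def train(X, y, loss="hinge", lam=0.01, lr=0.1, epochs=500):
    """Minimize lam*||w||^2 + sum_i loss(y_i * w^T x_i), with y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        m = y * (X @ w)                        # margins y_i * w^T x_i
        if loss == "hinge":
            dm = np.where(m < 1, -1.0, 0.0)    # subgradient of max{0, 1 - m}
        else:
            dm = -1.0 / (1.0 + np.exp(m))      # derivative of log(1 + exp(-m))
        grad = 2 * lam * w + (dm * y) @ X      # chain rule: dm_i/dw = y_i * x_i
        w -= lr * grad / len(y)
    return w
```

Here `loss="hinge"` gives the linear SVM objective and any other value falls through to the logistic loss; that one branch is the entire difference in this sketch.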

Let’s take a look at the loss functions:
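A few lines of matplotlib (assumed available) reproduce the comparison, plotting both losses as a function of the margin y_i w^T x_i:

```python
import numpy as np
import matplotlib.pyplot as plt

m = np.linspace(-3, 3, 400)                        # margin y_i * w^T x_i
plt.plot(m, np.maximum(0, 1 - m), label="hinge loss (SVM)")
plt.plot(m, np.log(1 + np.exp(-m)), label="logistic loss (LR)")
plt.xlabel("margin")
plt.ylabel("loss")
plt.legend()
plt.show()
```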

There are two differences to note:

  • Logistic loss diverges faster than hinge loss. So, in general, it will be more sensitive to outliers.
  • Logistic loss does not go to zero even when a point is classified sufficiently confidently. This can lead to a minor degradation in accuracy.

So, you can typically expect SVM to perform marginally better than logistic regression.

Some other points of comparison:

  • Logistic regression has a probabilistic interpretation. So LR can be integrated into other probabilistic frameworks much more seamlessly than SVMs.
  • While both models can be “kernelized”, SVM leads to sparser solutions due to complementary slackness.
  • SVM has a very efficient SMO algorithm for optimizing the kernelized model. Further, there is LibSVM, an implementation of SMO, that makes training non-linear SVMs very easy (see the sketch below).
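As a quick illustration of the last two points, here is a hedged scikit-learn sketch (its SVC is built on LibSVM); the synthetic dataset and hyperparameters are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(X[:3]))        # per-class probabilities come for free

svm = SVC(kernel="rbf").fit(X, y)     # kernelized SVM, trained via SMO (LibSVM)
print(len(svm.support_), "of", len(X), "training points are support vectors")
```

The support-vector count is what “sparser solutions” means in practice: only those points are needed to evaluate the kernelized decision function.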

Both are classification methods with linear decision boundaries, but they're motivated completely differently. 

The basic idea of logistic regression is to adapt linear regression so that it estimates the probability that a new entry falls in a class. The linear decision boundary is simply a consequence of the structure of the regression function and the use of a threshold in that function to classify. Linear SVM, on the other hand, only generates a classification label.

The decision boundary is much more important for linear SVMs - the whole goal is to place a linear boundary in a smart way. There isn't a probabilistic interpretation of individual classifications, at least not in the original formulation.

As with any linear regression, every training point has some influence on the estimated logistic regression function. SVMs, by contrast, are formulated so that only points near the decision boundary really make a difference. Points that are "obvious" have no effect on the decision boundary. This can be very different from logistic regression, where "obvious" points can sometimes be very influential.
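One way to see this, again as a scikit-learn sketch under the same assumptions, is to refit a linear SVM on its support vectors alone. The boundary comes out the same (up to solver tolerance), which would generally not be true for logistic regression:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
sv = svm.support_                          # indices of the support vectors

# Refit on the support vectors only: the "obvious" points are discarded.
svm_sv = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])
print(np.allclose(svm.coef_, svm_sv.coef_, atol=1e-6))   # True, up to tolerance
```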

