What is SVM?


Support Vector Machine (SVM) is an approach to classification that uses the concept of a separating hyperplane. It was developed in the 1990s and is a generalization of an intuitive, simple classifier called the maximal margin classifier.


In order to study the Support Vector Machine (SVM), we first need to understand what the maximal margin classifier and the support vector classifier are.


In the maximal margin classifier, we use a hyperplane to separate the classes. But what is a hyperplane? In a p-dimensional space, a hyperplane is a flat affine subspace (it need not pass through the origin) of dimension p-1. For example, in a two-dimensional space, a hyperplane is a one-dimensional flat subspace, which is nothing but a line. Similarly, in a three-dimensional space, a hyperplane is a two-dimensional flat subspace, which is nothing but a plane. Figure 1 illustrates a hyperplane in two-dimensional space.

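Formally, a hyperplane in p-dimensional space is the set of points X = (X1, ..., Xp) satisfying a single linear equation (the beta notation below is the usual textbook convention; it is not introduced elsewhere in this article):

    \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0

With p = 2 this reduces to the familiar equation of a line, which is exactly the case illustrated in Figure 1.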

From Figure 1, we can also see this hyperplane as a line dividing the space into two halves. Therefore, it can act as a decision boundary for classification. For example, in the right-hand panel of Figure 2, the points above the line belong to the blue class and the points below the line belong to the purple class.

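In other words, writing f(X) for the left-hand side of the hyperplane equation above, a point is assigned to a class according to the sign of f(X) (a standard way of stating the rule, not spelled out in the original article):

    f(X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p
    f(X) > 0 \;\Rightarrow\; \text{one class}, \qquad f(X) < 0 \;\Rightarrow\; \text{the other class}

The magnitude of f(X) is also informative: points far from the boundary are classified with more confidence than points near it.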

[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

In general, if our data can be perfectly separated using a hyperplane, then there will in fact exist an infinite number of such hyperplanes. This is because a given separating hyperplane can usually be shifted a tiny bit up or down, or rotated, without coming into contact with any of the observations. Three possible separating hyperplanes are shown in the left-hand panel of Figure 2. In order to construct a classifier based upon a separating hyperplane, we must have a reasonable way to decide which of the infinitely many possible separating hyperplanes to use. A natural choice is the maximal margin hyperplane (also known as the optimal separating hyperplane), which is the separating hyperplane that is farthest from the training observations. That is, we can compute the (perpendicular) distance from each training observation to a given separating hyperplane; the smallest such distance is the minimal distance from the observations to the hyperplane, and is known as the margin. The maximal margin hyperplane is the separating hyperplane for which the margin is largest; that is, it is the hyperplane that has the farthest minimum distance to the training observations. We can then classify a test observation based on which side of the maximal margin hyperplane it lies on. This is known as the maximal margin classifier. Figure 3 shows the maximal margin hyperplane on the data used in Figure 2.

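Concretely, with the class labels coded as y_i = ±1, the maximal margin hyperplane is the solution to the following optimization problem (this is the standard textbook formulation; the article itself does not derive it):

    \max_{\beta_0, \beta_1, \ldots, \beta_p,\, M} \; M
    \text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1,
    \quad y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \ge M \quad \text{for all } i = 1, \ldots, n

The constraints guarantee that every training observation is on the correct side of the hyperplane and at least a distance M away from it, and M, the margin, is made as large as possible.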

[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

Comparing the right-hand panel of Figure 2 to Figure 3, we see that the maximal margin hyperplane shown in Figure 3 does indeed result in a greater minimal distance between the observations and the separating hyperplane, that is, a larger margin. In a sense, the maximal margin hyperplane represents the mid-line of the widest "slab" that we can insert between the two classes. Examining Figure 3, we see that three training observations are equidistant from the maximal margin hyperplane and lie along the dashed lines indicating the width of the margin. These three observations are known as support vectors, since they are vectors in p-dimensional space (in Figure 3, p = 2) and they "support" the maximal margin hyperplane in the sense that if these points were moved slightly, then the maximal margin hyperplane would move as well. Interestingly, the maximal margin hyperplane depends directly on the support vectors, but not on the other observations: a movement of any of the other observations would not affect the separating hyperplane, provided that the observation's movement does not cause it to cross the boundary set by the margin.

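As a quick illustration of the role of support vectors, here is a minimal sketch (not from the article), assuming scikit-learn is available. Synthetic, well-separated data stand in for Figure 3, and a very large C penalty is used to approximate a hard (maximal) margin, since scikit-learn's SVC is a soft-margin implementation:

    # Minimal sketch: fit a (nearly) maximal margin classifier on separable
    # toy data and inspect which observations act as support vectors.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)
    clf = SVC(kernel="linear", C=1e6)   # huge C ~ hard margin
    clf.fit(X, y)

    print(clf.coef_, clf.intercept_)    # estimated hyperplane coefficients
    print(clf.support_vectors_)         # only these few points pin down the hyperplane

Moving any point that is not in clf.support_vectors_ (without crossing the margin) leaves the fitted hyperplane unchanged, which is exactly the dependence described above.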

The maximal margin hyperplane requires the existence of a separating hyperplane, but in many cases no separating hyperplane exists. That is, we cannot exactly separate the two classes using a hyperplane. An example of such a case is given in Figure 4.


[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

In this case, we cannot exactly separate the two classes. However, we can extend the concept of a separating hyperplane in order to develop a hyperplane that almost separates the classes, using a so-called soft margin. The generalization of the maximal margin classifier to the non-separable case is known as the support vector classifier. In Figure 4, we see that observations belonging to two classes are not necessarily separable by a hyperplane. In fact, even if a separating hyperplane does exist, there are instances in which a classifier based on a separating hyperplane might not be desirable. A classifier based on a separating hyperplane will necessarily classify all of the training observations perfectly; this can lead to sensitivity to individual observations. An example is shown in Figure 5.


[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

The addition of a single observation in the right-hand panel of Figure 5 leads to a dramatic change in the maximal margin hyperplane. The resulting maximal margin hyperplane is not satisfactory; for one thing, it has only a tiny margin. This is problematic because, as discussed previously, the distance of an observation from the hyperplane can be seen as a measure of our confidence that the observation was correctly classified. Therefore, it could be worthwhile to misclassify a few training observations in order to do a better job of classifying the remaining observations. The support vector classifier, sometimes called a soft margin classifier, does exactly this. Rather than seeking the largest possible margin so that every observation is not only on the correct side of the hyperplane but also on the correct side of the margin, we instead allow some observations to be on the incorrect side of the margin, or even on the incorrect side of the hyperplane. (The margin is soft because it can be violated by some of the training observations.) An example is shown in the left-hand panel of Figure 6.

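The soft margin can be written down by adding slack variables to the earlier optimization problem. In the standard textbook formulation (again with y_i = ±1), epsilon_i measures how much the i-th observation violates the margin, and C is a nonnegative budget, chosen by the analyst, for the total amount of violation:

    \max_{\beta_0, \ldots, \beta_p,\, \epsilon_1, \ldots, \epsilon_n,\, M} \; M
    \text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1,
    \quad y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \ge M(1 - \epsilon_i),
    \quad \epsilon_i \ge 0, \quad \sum_{i=1}^{n} \epsilon_i \le C

If epsilon_i > 0, the i-th observation is on the wrong side of the margin; if epsilon_i > 1, it is on the wrong side of the hyperplane. With C = 0 no violations are allowed and the problem reduces to the maximal margin classifier (when it exists).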

[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

Most of the observations are on the correct side of the margin, but a small set of observations are on the wrong side of the margin (observations 1 and 8 in the figure). An observation can be not only on the wrong side of the margin but also on the wrong side of the hyperplane. If no separating hyperplane exists, such a scenario is inevitable. Observations on the wrong side of the hyperplane are the observations that are misclassified by the support vector classifier. Observations that lie strictly on the correct side of the margin do not affect the support vector classifier, but observations that lie directly on the margin, or on the wrong side of the margin for their class, are known as support vectors. These observations do affect the support vector classifier.

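In software, the amount of violation allowed is controlled by a tuning parameter. A minimal sketch, assuming scikit-learn: note that scikit-learn's SVC exposes a penalty parameter also named C, which works in the opposite direction to the budget described above (a larger scikit-learn C penalizes violations more heavily and therefore yields a harder margin with fewer support vectors):

    # Minimal sketch: the C penalty in scikit-learn controls margin softness.
    # Smaller C -> more violations tolerated -> more support vectors.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)
    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C}: {clf.support_.size} support vectors")

In practice this parameter is usually chosen by cross-validation.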

The support vector classifier is a natural approach for classification in the two-class setting if the boundary between the two classes is linear. However, in practice we are sometimes faced with non-linear class boundaries. For instance, consider the data shown in the left-hand panel of Figure 7.


[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

It is clear that a support vector classifier or any linear classifier will perform poorly here. Indeed, the support vector classifier shown in the right-hand panel of Figure 7 is useless here.


We can address this problem by enlarging the feature space using quadratic, cubic, or higher-order polynomial functions of the features. The support vector machine (SVM) is an extension of the support vector classifier that results from enlarging the feature space in a specific way, using kernels. For example, in Figure 7 we have two features, X1 and X2; rather than using these features as they are for classification, we can also include higher-degree terms of these features, such as X1² and X2², which gives us a quadratic polynomial whose decision boundary is nonlinear. Kernels let us do exactly this in an efficient manner, where we specify what kind of decision boundary to use: for example, linear, polynomial (of some degree), or radial. Figure 8 illustrates the use of polynomial and radial kernels on the data of Figure 7.

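A minimal sketch of the feature-enlargement idea (not from the article), assuming scikit-learn: PolynomialFeatures adds the squared and interaction terms explicitly, after which a linear support vector classifier fitted in the enlarged space yields a boundary that is nonlinear in the original X1 and X2:

    # Minimal sketch: enlarge the feature space with degree-2 terms, then fit a
    # *linear* support vector classifier there. The boundary is linear in the
    # enlarged space but nonlinear in the original two features.
    from sklearn.datasets import make_circles
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.svm import LinearSVC

    X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          LinearSVC(C=1.0, max_iter=10000))
    model.fit(X, y)
    print(model.score(X, y))   # training accuracy; far above chance on this data

A kernel achieves the same effect without ever constructing the enlarged features explicitly, which is what makes it efficient.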

[Figure] Source: An Introduction to Statistical Learning with Applications in R, by Robert Tibshirani, Gareth James, Trevor Hastie and Daniela Witten.

In the left-hand panel we used a polynomial kernel of degree 3, and in the right-hand panel we used a radial kernel. Both kernels resulted in a more appropriate decision rule. The mathematics of how the decision boundaries and kernels are obtained is too technical to discuss here.

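For completeness, this is roughly what fitting those two kernels looks like in scikit-learn (a hedged sketch on synthetic data, not the data from the figures; the degree and gamma values are illustrative):

    # Minimal sketch: SVMs with a degree-3 polynomial kernel and a radial (RBF)
    # kernel on toy data that is not linearly separable.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

    poly_svm = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)
    rbf_svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)

    print("poly accuracy:", poly_svm.score(X, y))
    print("rbf accuracy:",  rbf_svm.score(X, y))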

Translated from: https://medium.com/@sauravjadhav698/what-is-svm-f4e06eca79b3

