SVM (Support Vector Machine Derivation)
Background
1. Hyperplane Definition
A hyperplane can be defined by two vectors $\bold{w}=\begin{pmatrix} -b \\ -a \\ 1 \end{pmatrix},\; \bold{x}=\begin{pmatrix} 1 \\ x \\ y \end{pmatrix}$, because
- A hyperplane can be defined as $y=ax+b \implies y-ax-b=0$
- And we have $\bold{w}^T\bold{x}=-b\cdot 1+(-a)\cdot x+1\cdot y=y-ax-b$
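A quick numeric check of this identity (a minimal sketch using NumPy; the line $y=2x+3$ and the sample point are arbitrary made-up values):

```python
import numpy as np

# Hypothetical line y = 2x + 3, i.e. a = 2, b = 3.
a, b = 2.0, 3.0
w = np.array([-b, -a, 1.0])   # w = (-b, -a, 1)

# A sample point (x, y), written in homogeneous form x = (1, x, y).
x, y = 1.5, 2.0
xh = np.array([1.0, x, y])

# w^T x equals y - a*x - b, so its sign tells which side of the line we are on.
assert np.isclose(w @ xh, y - a * x - b)
print(w @ xh)   # -4.0 -> the point lies below the line
```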
2. Vector subtraction: $\bold{a}-\bold{b}$ is the vector pointing from the tip of $\bold{b}$ to the tip of $\bold{a}$ (used later for the vector $\bold{c}=\bold{x}^+-\bold{x}^-$ between the two support vectors).
3. Dot product to calculate projection length
- Vector dot product: $\bold{a} \cdot \bold{b}=|\bold{a}| \times |\bold{b}|\times\cos(\theta)$
- $\text{Length }(\bold{a}\text{'s projection on }\bold{b})=|\bold{a}| \times\cos(\theta)=\frac{\bold{a} \cdot \bold{b}}{|\bold{b}|}$
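As a sanity check, a short NumPy sketch (the vectors are arbitrary choices) computing the projection length both ways:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])   # project a onto b

# |a| * cos(theta), computed via the angle ...
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
length_via_angle = np.linalg.norm(a) * cos_theta

# ... and directly via (a . b) / |b|.
length_via_dot = (a @ b) / np.linalg.norm(b)

assert np.isclose(length_via_angle, length_via_dot)   # both give 3.0
```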
4. The distance from a point to a line
- $d=\frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}}$, for the point $(x_0, y_0)$ and the line $Ax+By+C=0$.
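A direct translation into Python (a minimal sketch; the line $2x-y+1=0$ and the point are made-up examples):

```python
import math

def point_line_distance(x0: float, y0: float, A: float, B: float, C: float) -> float:
    """Distance from the point (x0, y0) to the line Ax + By + C = 0."""
    return abs(A * x0 + B * y0 + C) / math.sqrt(A**2 + B**2)

# Distance from (1, 1) to 2x - y + 1 = 0: |2 - 1 + 1| / sqrt(5) ≈ 0.894
print(point_line_distance(1.0, 1.0, A=2.0, B=-1.0, C=1.0))
```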
What is Support Vector Machines
- Intuition: The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N - the number of features) that distinctly classifies the data points (the hyperplane with maximum margin).
- Decision boundaries: hyperplanes that help classify the data points.
- Support vectors
- Data points that are closest to the hyperplane; they influence the position and orientation of the hyperplane.
- Using these support vectors, we maximize the margin of the classifier.
- Deleting the support vectors would change the position of the hyperplane (deleting other points would not), as the sketch below demonstrates.
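To make this concrete, a short sketch using scikit-learn (assuming scikit-learn is available; the toy data is made up). It fits a linear SVM, drops one non-support point, refits, and checks that the hyperplane is unchanged:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (made-up example).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ≈ hard margin
print("support vector indices:", clf.support_)

# Remove one point that is NOT a support vector and refit.
non_sv = next(i for i in range(len(X)) if i not in clf.support_)
mask = np.arange(len(X)) != non_sv
clf2 = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])

# The hyperplane (w, b) is essentially unchanged.
print(np.allclose(clf.coef_, clf2.coef_), np.allclose(clf.intercept_, clf2.intercept_))
```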
Objective function
$$\min_{\bold{w},b}\frac{||\bold{w}||^2}{2}, \quad \text{s.t. } y_i(\bold{w}^T\bold{x_i}+b)\ge1,\quad i=1,2,\dots, m$$
- With the above function, we can find the hyperplane $(\bold{w}, b)$ with the largest margin, where $\bold{w}$ is the weight vector and $b$ is the bias of the hyperplane.
- Assume the hyperplane $(\bold{w}, b)$ can classify all the data correctly, i.e. for any $(\bold{x_i},y_i)\in D$, $\bold{w}^T\bold{x_i}+b\gt 0$ if $y_i=+1$ and $\bold{w}^T\bold{x_i}+b\lt 0$ if $y_i=-1$. Since rescaling $\bold{w}$ and $b$ does not change the hyperplane, we can let
$$\begin{cases} \bold{w}^T\bold{x_i}+b\ge 1, &y_i=+1 \\ \bold{w}^T\bold{x_i}+b\le -1, &y_i=-1 \end{cases}$$
- For the two support vectors (the closest data points, lying on the margin lines), we have $\bold{w}^T\bold{x^+}+b=+1,\; \bold{w}^T\bold{x^-}+b=-1$ (eq. 1)
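A quick numeric illustration of these constraints (the hyperplane $\bold{w}=(1,1)^T$, $b=-3$ and the points are made-up values):

```python
import numpy as np

# Hypothetical hyperplane w^T x + b = 0 with w = (1, 1), b = -3.
w, b = np.array([1.0, 1.0]), -3.0

X = np.array([[0.0, 1.0], [1.0, 1.0],    # labeled y = -1
              [2.0, 3.0], [3.0, 3.0]])   # labeled y = +1
y = np.array([-1, -1, 1, 1])

margins = y * (X @ w + b)    # y_i (w^T x_i + b)
print(margins)               # [2. 1. 2. 3.]
print(np.all(margins >= 1))  # True: every constraint is satisfied
# The point (1, 1) attains equality (margin exactly 1), so it lies on a
# margin line: it would be a support vector.
```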
1. Proof with dot product
- $\hat{\bold{w}}=\frac{\bold{w}}{||\bold{w}||}$ is the unit vector orthogonal to the hyperplane.
- Why is $\hat{\bold{w}}$ orthogonal to $\bold{w}^T\bold{x}+b=0$? For any two points $\bold{x_1}, \bold{x_2}$ on the hyperplane, $\bold{w}^T(\bold{x_1}-\bold{x_2})=(\bold{w}^T\bold{x_1}+b)-(\bold{w}^T\bold{x_2}+b)=0-0=0$, so $\bold{w}$ is orthogonal to every vector lying in the hyperplane. (For a 2D line $y=kx+b$ this can also be checked directly.)
- Let $\bold{c}=\bold{x}^+-\bold{x}^-$ be the vector starting at $\bold{x}^-$ and ending at $\bold{x}^+$. Then use the dot product to calculate the projected length of $\bold{c}$ along $\hat{\bold{w}}$:
$$\text{margin}=||\bold{c}||\times \cos\theta=\frac{\bold{c}\cdot \bold{w}}{||\bold{w}||}=\bold{c}\cdot \hat{\bold{w}}=(\bold{x}^+-\bold{x}^-)\cdot \hat{\bold{w}}=\bold{x}^+\cdot \frac{\bold{w}}{||\bold{w}||} -\bold{x}^-\cdot \frac{\bold{w}}{||\bold{w}||}$$
- With (eq. 1), we have $\text{margin}=\bold{x}^+\cdot \frac{\bold{w}}{||\bold{w}||} -\bold{x}^-\cdot \frac{\bold{w}}{||\bold{w}||}=\frac{1-b}{||\bold{w}||} -\frac{-1-b}{||\bold{w}||}=\frac{2}{||\bold{w}||}$
- All in one (figure of the full derivation; image omitted)
2. Proof with distance formula
- Using the formula for the distance from a point to a line:
$$\text{margin}=r^++r^-=\frac{|\bold{w}^T\bold{x}^++b|}{||\bold{w}||}+\frac{|\bold{w}^T\bold{x}^-+b|}{||\bold{w}||}$$
- With (eq. 1), we have $\text{margin}=\frac{|+1|}{||\bold{w}||}+\frac{|-1|}{||\bold{w}||}=\frac{2}{||\bold{w}||}$
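Both proofs give the same quantity; a quick NumPy check on a hypothetical hyperplane and support vectors (values made up to satisfy eq. 1):

```python
import numpy as np

# Hypothetical hyperplane and support vectors chosen to satisfy (eq. 1).
w, b = np.array([1.0, 1.0]), -3.0
x_pos = np.array([2.0, 2.0])   # w.x + b = +1
x_neg = np.array([1.0, 1.0])   # w.x + b = -1

norm_w = np.linalg.norm(w)

# Proof 1: project c = x+ - x- onto the unit normal w / ||w||.
margin_dot = (x_pos - x_neg) @ (w / norm_w)

# Proof 2: sum the two point-to-hyperplane distances.
margin_dist = abs(w @ x_pos + b) / norm_w + abs(w @ x_neg + b) / norm_w

assert np.isclose(margin_dot, margin_dist) and np.isclose(margin_dot, 2 / norm_w)
print(margin_dot)   # 2 / sqrt(2) ≈ 1.414
```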
Our objective function is then: $\max\frac{2}{||\bold{w}||}\implies\max\frac{1}{||\bold{w}||}\implies\min||\bold{w}||\implies\min\frac{||\bold{w}||^2}{2}$ (squaring removes the square root and the factor $\frac{1}{2}$ gives a cleaner gradient; neither changes the minimizer).
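Since the objective is a convex quadratic program, one way to solve it numerically is to hand it to a generic convex solver. A minimal sketch with CVXPY (assuming the cvxpy package and its default QP solver are available; the toy data is made up):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (made-up example).
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = cp.Variable(2)
b = cp.Variable()

# min ||w||^2 / 2   s.t.   y_i (w^T x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin =", 2 / np.linalg.norm(w.value))
```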
Soft Margin SVM
- Margin: the distance of the closest vectors from the hyperplane.
- Hard margin: all data points must be separated correctly.
- Soft margin: some margin violations are allowed.
- It’s not always feasible to classify all the data points correctly with a hyperplane, so we need to tolerate some misclassified points that do not satisfy the restriction $y_i(\bold{w}^T\bold{x_i}+b)\ge 1$ (eq. 2). We call this soft margin SVM.
Objective function of soft margin SVM
$$\min_{\bold{w},b}\frac{||\bold{w}||^2}{2}+C\sum_{i=1}^m\ell_{0/1}\big(y_i(\bold{w}^T\bold{x_i}+b)-1\big),\quad \ell_{0/1}(z)=\begin{cases} 1, &\text{if } z\lt 0; \\ 0, &\text{otherwise.} \end{cases}$$
- When $C=+\infty$, the objective function forces all the data points to satisfy the restriction (eq. 2); for finite $C$, the function tolerates some misclassifications.
- Because $\ell_{0/1}$ is neither convex nor continuous, we usually replace it with another function, called a "surrogate loss"; common choices are listed below (a hinge-loss training sketch follows the list).
- hinge loss: $\ell_{hinge}(z)=\max(0,1-z)$
- exponential loss: $\ell_{exp}(z)=\exp(-z)$
- logistic loss: $\ell_{\log}(z)=\log (1+\exp(-z))$
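Substituting the hinge loss gives $\min_{\bold{w},b}\frac{||\bold{w}||^2}{2}+C\sum_{i=1}^m\max(0, 1-y_i(\bold{w}^T\bold{x_i}+b))$, which can be minimized directly by subgradient descent. A minimal NumPy sketch (the learning rate, $C$, epoch count, and toy data are arbitrary choices):

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Subgradient descent on ||w||^2/2 + C * sum(max(0, 1 - y(w.x + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violated = margins < 1                 # points with nonzero hinge loss
        # Subgradient: -y_i x_i (and -y_i for b) for each violating point.
        grad_w = w - C * (y[violated, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up toy data: two noisy clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
w, b = train_soft_margin_svm(X, y)
print("train accuracy:", np.mean(np.sign(X @ w + b) == y))
```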
Kernel Tricks (TBD)
If data points are not separable in a low-dimensional space, use a kernel function to map them to a higher-dimensional space.
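Although this section is TBD, a quick illustration with scikit-learn (assuming scikit-learn is available): the classic concentric-circles dataset is not linearly separable in 2D, but an RBF kernel separates it:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # near chance (~0.5)
print("rbf kernel accuracy:   ", rbf.score(X, y))     # near 1.0
```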
Solving the objective function (TBD)
- Lagrange Multiplier
Reference: A Top Machine Learning Algorithm Explained: Support Vector Machines (SVMs)