Support Vector Machine (SVM)
Vapnik (from the former Soviet Union)
Well suited to prediction when the number of samples is small.
Compiled from a graduate course at Zhejiang University; link
0. No Free Lunch Theorem
(Just noting this here in passing.)
If we make no prior assumptions about the feature space, all algorithms perform the same on average.
We assume that samples whose features are close together are more likely to belong to the same class, so machine learning is not learned in vain!
A Brief Introduction to SVM
Image from Baidu Baike
It can be shown that a linearly separable sample space can be split by infinitely many lines, so which one counts as the best? The line that maximizes the margin $d$, with $d_1=d_2=\frac{d}{2}$, is unique and optimal. The sample points lying on the dashed lines in the figure are called support vectors.
Definitions:
① Training data and labels
$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$,
where $x_i$ is a vector holding the values of that sample's features, and $y_i$ is the label, which in SVM is usually taken to be $+1$ or $-1$ (why not 1 and 0? this will be explained later).
② Linear model: $(\boldsymbol\omega, b)$, representing the hyperplane $\boldsymbol\omega^T\boldsymbol{x}+b=0$.
What are we solving for?
We want to maximize the margin $d$, of course, but that is hard to optimize directly, so it is converted into the following (result first, explanation after):
$$\left\{\begin{array}{l} \min\frac{1}{2}\|\boldsymbol{\omega}\|^2\\ \text{s.t. (subject to)}\ y_i[\boldsymbol\omega^T\boldsymbol{x_i}+b]\geq1 \end{array}\right.$$
(This is a quadratic programming problem from convex optimization theory.)
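As a concrete illustration (not from the lecture), here is a minimal sketch that hands this hard-margin QP to a generic solver on a made-up, linearly separable 2-D data set; scipy's SLSQP is used only because it is convenient, a dedicated QP solver would be the normal choice.

```python
import numpy as np
from scipy.optimize import minimize

# toy, linearly separable data (made up for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

def objective(wb):
    w = wb[:2]
    return 0.5 * np.dot(w, w)                 # (1/2) * ||w||^2

def margin_constraints(wb):
    w, b = wb[:2], wb[2]
    return y * (X @ w + b) - 1.0              # y_i * (w^T x_i + b) - 1 >= 0

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
```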
Consider the following facts:
Fact 1:
$\boldsymbol\omega^T\boldsymbol{x}+b=0$ and $a\boldsymbol\omega^T\boldsymbol{x}+ab=0\ (a\in R^+)$ describe the same hyperplane, i.e. $(\boldsymbol\omega, b)$ and $(a\boldsymbol\omega, ab)$ define the same hyperplane.
Fact 2: the distance from a vector $\boldsymbol{x_0}$ to the hyperplane $(\boldsymbol\omega, b)$ (analogous to the point-to-line distance) is
$$d=\frac{|\boldsymbol\omega^T\boldsymbol{x_0}+b|}{\|\boldsymbol\omega\|}$$
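A quick numeric check of Fact 2 (the hyperplane and the point below are arbitrary example values):

```python
import numpy as np

w = np.array([3.0, 4.0])   # hyperplane 3x + 4y + 1 = 0 (example values)
b = 1.0
x0 = np.array([1.0, 2.0])

# d = |w^T x0 + b| / ||w||
d = abs(w @ x0 + b) / np.linalg.norm(w)
print(d)                   # |3 + 8 + 1| / 5 = 2.4
```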
Now the analysis:
Apply the scaling $(\boldsymbol\omega_0, b_0)\stackrel{a}{\longrightarrow}(\boldsymbol\omega, b)$ to the hyperplane. There always exists an $a$ such that, for the support vectors (the samples closest to the hyperplane), $|\boldsymbol\omega^T\boldsymbol{x}+b|=1$, and then $d=\frac{1}{\|\boldsymbol\omega\|}$. This is why we $\min\frac{1}{2}\|\boldsymbol{\omega}\|^2$; the $\frac{1}{2}$ is only there to make differentiation cleaner.
For a non-support vector,
$$\frac{|\boldsymbol\omega^T\boldsymbol{x}+b|}{\|\boldsymbol\omega\|}=d>\frac{1}{\|\boldsymbol\omega\|},$$
so $|\boldsymbol\omega^T\boldsymbol{x}+b|>1$. Now we need to drop the absolute value sign. Notice that $y_i$ has the same sign as $\boldsymbol\omega^T\boldsymbol{x}+b$, so $y_i[\boldsymbol\omega^T\boldsymbol{x}+b]>1$. (I don't know why the teacher uses [ ] instead of ( ).)
For a support vector,
$$y_i[\boldsymbol\omega^T\boldsymbol{x}+b]=1.$$
Finally, we get the constraint
$$y_i[\boldsymbol\omega^T\boldsymbol{x}+b]\geq1.$$
PS: When the data set is not linearly separable, the inequality $y_i[\boldsymbol\omega^T\boldsymbol{x}+b]\geq1$ has no solution.
PPS: In fact, we could require $y_i[\boldsymbol\omega^T\boldsymbol{x}+b]$ to be no less than any positive number, but 1 is the simplest choice.
How do we use SVM to deal with non-separable data?
$$\left\{\begin{array}{l} \min\frac{1}{2}\|\boldsymbol{\omega}\|^2+C\sum\limits_{i=1}^N\xi_i\\ \text{s.t. (subject to)}\ \left\{\begin{array}{l} y_i[\boldsymbol\omega^T\boldsymbol{x_i}+b]\geq1-\xi_i\\ \xi_i\geq0 \end{array}\right. \end{array}\right.$$
We introduce the slack variables $\xi_i$ to make the inequality satisfiable. At the same time, we have to make sure $\xi_i$ does not get too big, so we add a regularization term to the objective (regularization terms are very common in machine learning). $C$ is a parameter we set in advance. In practice, we usually pick an upper bound, a lower bound and a step, and try every candidate value to find the best $C$, as in the sketch below.
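For example, a minimal grid-search sketch using scikit-learn (the toy data and the candidate values of $C$ are my own placeholders, not part of the course):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy labels

# lower bound, upper bound and "step" expressed as an explicit candidate list
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```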
One of the most important differences between SVM and other algorithms is how they deal with data that is not linearly separable. Other algorithms try to use circles, rectangles, etc. to find the boundary; SVM tries to find a linear boundary in a higher-dimensional space.
$$\boldsymbol{X}\stackrel{\phi}{\longrightarrow}\phi(\boldsymbol{X})$$
Example:
$$\boldsymbol{X} = \begin{bmatrix} a \\ b \end{bmatrix}\to\phi(\boldsymbol{X}) = \begin{bmatrix} a^2 \\ b^2\\a\\b\\ab \end{bmatrix}$$
Warning: the dimension of $\boldsymbol{\omega}$ changes too.
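A tiny sketch of this particular feature map (the function name `phi` and the sample values are mine, for illustration only):

```python
import numpy as np

def phi(x):
    """Map [a, b] to [a^2, b^2, a, b, ab], as in the example above."""
    a, b = x
    return np.array([a**2, b**2, a, b, a * b])

x = np.array([2.0, 3.0])
print(phi(x))   # [4. 9. 2. 3. 6.] -- omega must now live in this 5-D space
```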
It can be shown that the probability of a data set being linearly separable grows as the dimension grows; in an infinite-dimensional space, all data sets are linearly separable.
Kernel Function
We do not need to know the explicit expression of $\phi(x)$ if we have a kernel function:
$$K(x_1, x_2) = \phi(x_1)^T\phi(x_2)$$
Here are some common kernel functions (a few sketches follow the list):
- Gaussian kernel: $K(x_1, x_2)=\exp\left(-\frac{\|x_1-x_2\|^2}{2\sigma^2}\right)$
- Polynomial kernel: $K(x_1, x_2)=(\gamma\,\phi(x_1)^T\phi(x_2)+c)^n$
- Linear kernel (i.e. no kernel): $K(x_1, x_2)=\phi(x_1)^T\phi(x_2)$
- Sigmoid kernel: $K(x_1, x_2)=\tanh(\gamma\,\phi(x_1)^T\phi(x_2)+c)$
- Laplace kernel: $K(x_1, x_2) = \exp\left(-\frac{\|x_1-x_2\|}{\sigma}\right)$
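Minimal sketches of these kernels, written directly on the raw input vectors (which is how they are usually implemented); $\sigma$, $\gamma$, $c$ and $n$ are hyperparameters, and the values below are arbitrary defaults:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

def polynomial_kernel(x1, x2, gamma=1.0, c=1.0, n=2):
    return (gamma * np.dot(x1, x2) + c) ** n

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def sigmoid_kernel(x1, x2, gamma=1.0, c=0.0):
    return np.tanh(gamma * np.dot(x1, x2) + c)

def laplace_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) / sigma)

x1, x2 = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(gaussian_kernel(x1, x2), polynomial_kernel(x1, x2), linear_kernel(x1, x2))
```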
A kernel function must satisfy the following rules:
① Symmetry: $K(x_1, x_2) = K(x_2, x_1)$
② Positive semi-definiteness: for any coefficients $C_i$ and any vectors $\boldsymbol{x_i}$,
$$\sum\limits_{i=1}^N\sum\limits_{j=1}^NC_iC_jK(\boldsymbol{x_i}, \boldsymbol{x_j})\geq 0$$
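One can check both rules empirically: build the Gram matrix on some sample points and inspect its symmetry and its eigenvalues (a sketch assuming a Gaussian kernel and random data):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                      # rule 1: K(x1, x2) = K(x2, x1)
print(np.linalg.eigvalsh(K).min() >= -1e-10)    # rule 2: all eigenvalues >= 0
```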
Using the primal problem and the dual problem to avoid $\phi(x)$
What are the primal problem and the dual problem? Check this.
Primal problem:
$$\left\{\begin{array}{l} \text{minimize}\ f(\boldsymbol\omega)\\ \text{s.t.}\left\{\begin{array}{l} g_i(\boldsymbol\omega)\leq 0\ (i=1\sim K)\\ h_i(\boldsymbol\omega) = 0\ (i=1\sim N) \end{array}\right. \end{array}\right.$$
Dual Problem:
$$\left\{\begin{array}{l} \Theta(\boldsymbol\alpha, \boldsymbol\beta) = \min\limits_{\text{all}\ \boldsymbol\omega}\{L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta)\}\\ \text{s.t.}\ \alpha_i \geq0 \end{array}\right.$$
Here $L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta)=f(\boldsymbol\omega)+\sum\limits_{i=1}^K\alpha_ig_i(\boldsymbol\omega)+\sum\limits_{i=1}^N\beta_ih_i(\boldsymbol\omega)$ is the Lagrangian, and the dual problem is to maximize $\Theta(\boldsymbol\alpha, \boldsymbol\beta)$ over $\boldsymbol\alpha$ and $\boldsymbol\beta$.
We need to find:
$$\left\{\begin{array}{l} \min\frac{1}{2}\|\boldsymbol{\omega}\|^2+C\sum\limits_{i=1}^N\xi_i\\ \text{s.t. (subject to)}\ \left\{\begin{array}{l} y_i[\boldsymbol\omega^T\boldsymbol{x_i}+b]\geq1-\xi_i\\ \xi_i\geq0 \end{array}\right. \end{array}\right.$$
Treating this as the primal problem (after substituting $\xi_i \to -\xi_i$, so that the constraints become $\xi_i\leq 0$ and $y_i[\boldsymbol\omega^T\phi(\boldsymbol{x_i})+b]\geq 1+\xi_i$ and fit the $g_i(\boldsymbol\omega)\leq 0$ template), we can get the dual problem:
$$\left\{\begin{array}{l} \Theta(\boldsymbol\alpha) = \min\limits_{\text{all}\ \boldsymbol\omega,\ \xi_i,\ b} \left\{ \frac{1}{2}\|\boldsymbol{\omega}\|^2 - C \sum\limits_{i=1}^N\xi_i + \sum\limits_{i=1}^N\beta_i\xi_i + \sum\limits_{i=1}^N\alpha_i[1 + \xi_i - y_i\boldsymbol\omega^T \phi(\boldsymbol{x_i}) - y_ib] \right\}\\ \text{s.t.}\ \alpha_i \geq0,\ \beta_i \geq 0\end{array}\right.$$
$$\boldsymbol\omega\to(\boldsymbol\omega,\xi_i,b);\quad \boldsymbol\alpha\to(\alpha_i,\beta_i);\quad \boldsymbol\beta\to\varnothing$$
($\alpha$ multiplies the inequality constraints and $\beta$ multiplies the equality constraints; this problem has no equality constraints.)
Let’s start!
$$L=\frac{1}{2}\|\boldsymbol{\omega}\|^2 - C \sum\limits_{i=1}^N\xi_i + \sum\limits_{i=1}^N\beta_i\xi_i + \sum\limits_{i=1}^N\alpha_i[1 + \xi_i - y_i\boldsymbol\omega^T \phi(\boldsymbol{x_i}) - y_ib]$$
$$\left\{\begin{array}{l} \frac{\partial L}{\partial\boldsymbol\omega}=0\\ \frac{\partial L}{\partial\xi_i}=0\\ \frac{\partial L}{\partial b}=0 \end{array}\right.$$
Solving these three equations, we get:
$$\left\{\begin{array}{l} \boldsymbol\omega=\sum\limits_{i=1}^N\alpha_iy_i\phi(\boldsymbol{x_i})\\ \alpha_i + \beta_i = C\\ \sum\limits_{i=1}^N\alpha_iy_i=0 \end{array}\right.$$
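A small numeric sketch of the first and third conditions in the linear-kernel case, where $\phi(x)=x$ (the $\alpha$ values here are made up, not the solution of any particular dual problem):

```python
import numpy as np

alpha = np.array([0.5, 0.0, 0.5, 0.0])              # hypothetical dual variables
y = np.array([1.0, 1.0, -1.0, -1.0])
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])

w = (alpha * y) @ X        # omega = sum_i alpha_i * y_i * x_i   (phi(x) = x)
print(w)                   # [1.5 1.5]
print(np.dot(alpha, y))    # sum_i alpha_i * y_i = 0
```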
Using these three equations, we can work out $\Theta(\boldsymbol\alpha)$. Remember what we want: we want to use $K(x_i,x_j)$ in place of $\phi(x_i)$.
Substituting them back (the $\xi_i$ terms cancel because $\alpha_i+\beta_i=C$, and the $b$ term vanishes because $\sum_i\alpha_iy_i=0$):
$$L_{min}=\frac{1}{2}\boldsymbol{\omega}^T\boldsymbol{\omega} + \sum\limits_{i=1}^N\alpha_i -\sum\limits_{i=1}^N\alpha_iy_i\boldsymbol\omega^T \phi(\boldsymbol{x_i})$$
$$L_{min}= \sum\limits_{i=1}^N\alpha_i + \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_i\alpha_jy_iy_j\phi(\boldsymbol{x_i})^T\phi(\boldsymbol{x_j}) - \sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_i\alpha_jy_iy_j\phi(\boldsymbol{x_j})^T\phi(\boldsymbol{x_i})$$
Attention! $K(x_i,x_j) = \phi(\boldsymbol{x_i})^T\phi(\boldsymbol{x_j})$, so
$$L_{min} = \sum\limits_{i=1}^N\alpha_i - \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)$$
Finally, we get our optimization target:
$$\left\{\begin{array}{l} \max\Theta(\boldsymbol\alpha) = \sum\limits_{i=1}^N\alpha_i - \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)\\ \text{s.t.}\ 0\leq\alpha_i \leq C,\ \sum\limits_{i=1}^N\alpha_i y_i = 0 \end{array}\right.$$
This is also a convex optimization problem, but now it is one we can actually solve (we know what $K$ is!). There are different algorithms for solving convex optimization problems; we will not go into them here. SMO is one of the most common, and a generic-solver sketch is given below.
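Purely to show the pieces fitting together, here is a sketch that solves this dual with scipy's generic SLSQP solver on a tiny made-up data set; a real implementation would use SMO, and all names and values below are my own:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

# toy data (made up)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N, C = len(X), 10.0

K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

def neg_theta(alpha):
    # minimize -Theta(alpha) = -(sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij)
    return -(alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K))

res = minimize(neg_theta, x0=np.zeros(N), method="SLSQP",
               bounds=[(0.0, C)] * N,
               constraints=[{"type": "eq", "fun": lambda a: np.dot(a, y)}])
alpha = res.x
print(alpha)
```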
Evaluating $\boldsymbol\omega$ and $b$
$\boldsymbol\omega$:
Remember how we predict $y$?
If $\boldsymbol\omega^T\phi(\boldsymbol{x})+b\geq0$, then $y = 1$.
If $\boldsymbol\omega^T\phi(\boldsymbol{x})+b<0$, then $y = -1$.
Actually, we do not need the exact value of $\boldsymbol\omega$ as long as we can compute $\boldsymbol\omega^T\phi(\boldsymbol{x})$ for a test point $\boldsymbol{x}$:
$$\boldsymbol\omega^T\phi(\boldsymbol{x}) = \sum\limits_{i=1}^N\alpha_iy_iK(\boldsymbol{x_i},\boldsymbol{x})$$
$b$:
We need to use the KKT conditions:
$$\forall i=1\sim K:\quad \alpha_i=0\ \text{or}\ g_i(\boldsymbol\omega) = 0$$
For our problem this means, for each $i$:
$$\beta_i =0\ \text{or}\ \xi_i = 0,\qquad \alpha_i = 0\ \text{or}\ 1 + \xi_i - y_i\boldsymbol\omega^T \phi(\boldsymbol{x_i}) - y_ib = 0$$
Pick any $j$ with $0<\alpha_j<C$ (so that $\beta_j = C-\alpha_j>0$ as well); then
$$\left\{\begin{array}{l} \xi_j = 0\\ 1+\xi_j - y_j\boldsymbol\omega^T \phi(\boldsymbol{x_j}) - y_jb = 0 \end{array}\right.$$
$$b = \frac{1 - y_j\sum\limits_{i=1}^N\alpha_iy_iK(\boldsymbol{x_i},\boldsymbol{x_j})}{y_j}$$
In practice, we often pick many such $j$ and take the mean of the resulting $b$ values as the final parameter, as in the sketch below.
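Continuing the dual-solver sketch above (so `alpha`, `y`, `K` and `C` are assumed to exist already), computing $b$ this way might look like:

```python
import numpy as np

def compute_b(alpha, y, K, C, tol=1e-6):
    """Average b over all j with 0 < alpha_j < C; K[i, j] = K(x_i, x_j)."""
    sv = np.where((alpha > tol) & (alpha < C - tol))[0]
    b_values = [(1 - y[j] * np.sum(alpha * y * K[:, j])) / y[j] for j in sv]
    return np.mean(b_values)

def decide(alpha, y, k_col, b):
    """k_col[i] = K(x_i, x) for a test point x; returns the predicted label."""
    return 1 if np.sum(alpha * y * k_col) + b >= 0 else -1
```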
Summary
① Train the model
- Input $\{(x_i,y_i)\}_{i=1\sim N}$
- Solve the optimization problem (with SMO, etc.):
$$\left\{\begin{array}{l} \max\Theta(\boldsymbol\alpha) = \sum\limits_{i=1}^N\alpha_i - \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)\\ \text{s.t.}\ 0\leq\alpha_i \leq C,\ \sum\limits_{i=1}^N\alpha_i y_i = 0 \end{array}\right.$$
- Evaluate $b$
② Test the model
- Input $\boldsymbol{x}$ and compute
$$\left\{\begin{array}{l} \text{if}\ \sum\limits_{i=1}^N\alpha_iy_iK(\boldsymbol{x_i},\boldsymbol{x})+b\geq0,\ \text{then}\ y = 1\\ \text{if}\ \sum\limits_{i=1}^N\alpha_iy_iK(\boldsymbol{x_i},\boldsymbol{x})+b<0,\ \text{then}\ y = -1 \end{array}\right.$$
- Output $y$
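Finally, the whole summary in a few lines of scikit-learn (a usage sketch only; `SVC` solves the same dual problem internally with an SMO-style algorithm, and the data here is synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = np.where(X_train[:, 0] ** 2 + X_train[:, 1] ** 2 > 1, 1, -1)  # non-linear labels

# RBF kernel with soft margin; gamma plays the role of 1/(2*sigma^2)
clf = SVC(kernel="rbf", C=10.0, gamma=0.5)
clf.fit(X_train, y_train)

x_test = np.array([[0.1, 0.2], [2.0, 2.0]])
print(clf.predict(x_test))   # expected [-1, 1] for this labelling
```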