1. Objective Function:
If the training data are not linearly separable, introduce a slack variable $\xi_{i} \geq 0$ for each sample, so that the functional margin plus the slack is at least 1. The constraint then becomes
$$y_{i}\left(w \cdot x_{i}+b\right) \geq 1-\xi_{i}$$
The objective function is:

$$\min _{w, b} \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}$$

$$\text { s.t. } \quad y_{i}\left(w \cdot x_{i}+b\right) \geq 1-\xi_{i}, \quad \xi_{i} \geq 0, \quad i=1,2, \cdots, n$$
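The constrained objective above can be evaluated directly. A minimal NumPy sketch (the toy data and candidate parameters below are made up for illustration, not an optimal solution) computes the smallest feasible slacks and the primal objective:

```python
import numpy as np

# Toy 2-D data: two points per class (illustrative values only)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-1.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([0.5, 0.5])   # candidate parameters (assumed, not optimal)
b = 0.0
C = 1.0

# Smallest slack satisfying y_i (w . x_i + b) >= 1 - xi_i with xi_i >= 0
margins = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - margins)

# Soft-margin primal objective: (1/2) ||w||^2 + C * sum_i xi_i
objective = 0.5 * np.dot(w, w) + C * xi.sum()

# Every relaxed constraint holds by construction
feasible = bool(np.all(margins >= 1.0 - xi - 1e-12))
```

Note that $\xi_i = \max(0,\, 1 - y_i(w \cdot x_i + b))$ is exactly the hinge loss of sample $i$.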
2. Lagrangian of the Soft-Margin SVM and Its Derivation:
$$L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w \cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n} \mu_{i} \xi_{i}$$
Take partial derivatives with respect to $w$, $b$, and $\xi$:
$$\begin{aligned} &\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} x_{i}\\ &\frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i}\\ &\frac{\partial L}{\partial \xi_{i}}=0 \Rightarrow C-\alpha_{i}-\mu_{i}=0 \end{aligned}$$
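These stationarity conditions can be sanity-checked numerically. The sketch below (random toy data; all multipliers are illustrative, not KKT-optimal) verifies by finite differences that the gradient of $L$ with respect to $w$ vanishes at $w=\sum_i \alpha_i y_i x_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                     # toy inputs
y = np.where(rng.normal(size=5) > 0, 1.0, -1.0)
alpha = rng.uniform(0.0, 1.0, size=5)           # illustrative multipliers
mu = rng.uniform(0.0, 1.0, size=5)
xi = rng.uniform(0.0, 1.0, size=5)
C, b = 1.0, 0.3

def lagrangian(w):
    # L(w, b, xi, alpha, mu) as defined above
    return (0.5 * w @ w + C * xi.sum()
            - np.sum(alpha * (y * (X @ w + b) - 1.0 + xi))
            - np.sum(mu * xi))

# Stationary point predicted by the derivation: w = sum_i alpha_i y_i x_i
w_star = (alpha * y) @ X

# Central finite differences: grad L at w_star should be ~0
eps = 1e-6
grad = np.array([(lagrangian(w_star + eps * e) - lagrangian(w_star - eps * e))
                 / (2 * eps) for e in np.eye(3)])
```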
Substituting these three identities back into $L$ gives
$$\min _{w, b, \xi} L(w, b, \xi, \alpha, \mu)=-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{n} \alpha_{i}$$
Maximizing the above with respect to $\alpha$ gives
$$\begin{aligned} &\max _{\alpha}\,-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{n} \alpha_{i}\\ &\text { s.t. } \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0\\ &\qquad C-\alpha_{i}-\mu_{i}=0, \quad \alpha_{i} \geq 0, \quad \mu_{i} \geq 0, \quad i=1,2, \ldots, n \end{aligned}$$

Eliminating $\mu_{i}$ from the last three constraints gives the equivalent condition $0 \leq \alpha_{i} \leq C$.
Rearranging yields the dual problem:
$$\begin{aligned} &\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}\\ &\text { s.t. } \quad \sum_{i=1}^{n} \alpha_{i} y_{i}=0\\ &\qquad 0 \leq \alpha_{i} \leq C, \quad i=1,2, \ldots, n \end{aligned}$$
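At small scale, this dual is just a box-constrained quadratic program and can be handed to a generic solver. A sketch using `scipy.optimize.minimize` (SLSQP) on a linearly separable toy set, with $w$ and $b$ recovered from the optimal $\alpha$ afterwards:

```python
import numpy as np
from scipy.optimize import minimize

# Linearly separable toy data (illustrative)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 10.0

# Q_ij = y_i y_j (x_i . x_j)
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def dual_objective(a):
    # (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j) - sum_i a_i
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(dual_objective, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, C)] * n,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

# Recover the primal solution from alpha
w = (alpha * y) @ X
j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))  # a free support vector
b = y[j] - w @ X[j]
predictions = np.sign(X @ w + b)
```

Dedicated SMO-style solvers (as in libsvm) scale much better; this generic QP route is only practical for tiny problems.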
Solving it yields the optimal solution $\alpha^{*}$.
Compute

$$w^{*}=\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} x_{i}$$
$$b^{*}=-\frac{\max _{i: y_{i}=-1} w^{*} \cdot x_{i}+\min _{i: y_{i}=1} w^{*} \cdot x_{i}}{2}$$

(This geometric formula assumes the separable case. In the soft-margin case, $b^{*}$ is computed from any support vector $x_{j}$ with $0<\alpha_{j}^{*}<C$: $b^{*}=y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i}\left(x_{i} \cdot x_{j}\right)$.)
This gives the separating hyperplane

$$w^{*} \cdot x+b^{*}=0$$
and the classification decision function

$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$
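The whole pipeline, solving for $\alpha^*$, forming $w^*$ and $b^*$, then classifying with $\operatorname{sign}(w^* \cdot x + b^*)$, can be cross-checked against a library implementation. A sketch assuming scikit-learn is available (its `SVC` with a linear kernel solves this same dual internally):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)
w_star = clf.coef_.ravel()        # w* = sum_i alpha_i* y_i x_i
b_star = clf.intercept_[0]        # b*

# Decision function f(x) = sign(w* . x + b*) applied by hand
manual = np.sign(X @ w_star + b_star).astype(int)
```

The manual sign computation and `clf.predict` should agree on the training points.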
3. Loss Function Analysis
[Figure: comparison of losses. Green: 0/1 loss; blue: SVM hinge loss; red: logistic loss.]
4. Kernel Functions
A kernel function maps the original input space into a new feature space, so that samples that are linearly inseparable in the input space can become separable in the feature space. Three kernels are commonly used:
Polynomial kernel:

$$\kappa\left(x_{1}, x_{2}\right)=\left(x_{1} \cdot x_{2}+c\right)^{d}$$
Gaussian (RBF) kernel:

$$\kappa\left(x_{1}, x_{2}\right)=\exp \left(-\gamma \cdot\left\|x_{1}-x_{2}\right\|^{2}\right)$$
Sigmoid kernel:

$$\kappa\left(x_{1}, x_{2}\right)=\tanh \left(x_{1} \cdot x_{2}+c\right)$$
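These three kernels are one-liners in NumPy; the sketch below implements them with illustrative default parameters:

```python
import numpy as np

def poly_kernel(x1, x2, c=1.0, d=2):
    """Polynomial kernel: (x1 . x2 + c)^d."""
    return (np.dot(x1, x2) + c) ** d

def rbf_kernel(x1, x2, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x1 - x2||^2)."""
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-gamma * np.dot(diff, diff))

def sigmoid_kernel(x1, x2, c=0.0):
    """Sigmoid kernel: tanh(x1 . x2 + c)."""
    return np.tanh(np.dot(x1, x2) + c)

x1 = np.array([1.0, 2.0])
x2 = np.array([0.5, -1.0])
```

Any of these can replace the inner product $x_i \cdot x_j$ in the dual problem above without changing the optimization procedure.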
(1) Polynomial kernel:
$$\begin{aligned} \kappa(\vec{x}, \vec{y})&=(\vec{x} \cdot \vec{y})^{2}\\ &=\left(\sum_{i=1}^{n} x_{i} y_{i}\right)^{2}\\ &=\sum_{i=1}^{n} \sum_{j=1}^{n} x_{i} x_{j} y_{i} y_{j}\\ &=\sum_{i=1}^{n} \sum_{j=1}^{n}\left(x_{i} x_{j}\right)\left(y_{i} y_{j}\right) \end{aligned}$$
In particular, for $n=3$ the corresponding feature map is:
$$\Phi(\vec{x})=\left(\begin{array}{l} x_{1} x_{1} \\ x_{1} x_{2} \\ x_{1} x_{3} \\ x_{2} x_{1} \\ x_{2} x_{2} \\ x_{2} x_{3} \\ x_{3} x_{1} \\ x_{3} x_{2} \\ x_{3} x_{3} \end{array}\right)$$
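The claim that $(\vec{x} \cdot \vec{y})^{2}$ equals $\Phi(\vec{x}) \cdot \Phi(\vec{y})$ for this 9-dimensional map can be checked numerically (the sample vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Feature map for the degree-2 homogeneous kernel with n = 3:
    all 9 ordered products x_i x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

kernel_val = np.dot(x, y) ** 2          # (x . y)^2
feature_val = np.dot(phi(x), phi(y))    # Phi(x) . Phi(y)
```

The kernel evaluates the 9-dimensional inner product at the cost of one 3-dimensional dot product, which is the point of the kernel trick.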
Furthermore,
$$\begin{aligned} \kappa(\vec{x}, \vec{y})&=(\vec{x} \cdot \vec{y}+c)^{2}\\ &=(\vec{x} \cdot \vec{y})^{2}+2 c\, \vec{x} \cdot \vec{y}+c^{2} \\ &=\sum_{i=1}^{n} \sum_{j=1}^{n}\left(x_{i} x_{j}\right)\left(y_{i} y_{j}\right)+\sum_{i=1}^{n}\left(\sqrt{2 c}\, x_{i}\right)\left(\sqrt{2 c}\, y_{i}\right)+c^{2} \end{aligned}$$
Again for $n=3$, the feature map is:
$$\Phi(\vec{x})=\left(\begin{array}{c} x_{1} x_{1} \\ x_{1} x_{2} \\ x_{1} x_{3} \\ x_{2} x_{1} \\ x_{2} x_{2} \\ x_{2} x_{3} \\ x_{3} x_{1} \\ x_{3} x_{2} \\ x_{3} x_{3} \\ \sqrt{2 c}\, x_{1} \\ \sqrt{2 c}\, x_{2} \\ \sqrt{2 c}\, x_{3} \\ c \end{array}\right)$$
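The same numerical check works for the 13-dimensional map of $(\vec{x} \cdot \vec{y}+c)^{2}$ (the values of $c$ and the sample vectors are arbitrary):

```python
import numpy as np

def phi_c(x, c):
    """13-dim feature map for (x . y + c)^2 with n = 3: the 9 ordered
    products x_i x_j, then sqrt(2c) * x_i, then the constant c."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.outer(x, x).ravel(), np.sqrt(2 * c) * x, [c]])

c = 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

kernel_val = (np.dot(x, y) + c) ** 2
feature_val = np.dot(phi_c(x, c), phi_c(y, c))
```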
[Figure: kernel function mapping]
(2) Gaussian kernel:
$$\begin{aligned} \kappa\left(x_{1}, x_{2}\right)&=e^{-\frac{\left(x_{1}-x_{2}\right)^{2}}{2 \sigma^{2}}}=e^{-\frac{x_{1}^{2}+x_{2}^{2}-2 x_{1} x_{2}}{2 \sigma^{2}}}=e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot e^{\frac{x_{1} x_{2}}{\sigma^{2}}}\\ &=e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot\left(1+\frac{1}{\sigma^{2}} \cdot \frac{x_{1} x_{2}}{1 !}+\left(\frac{1}{\sigma^{2}}\right)^{2} \cdot \frac{\left(x_{1} x_{2}\right)^{2}}{2 !}+\left(\frac{1}{\sigma^{2}}\right)^{3} \cdot \frac{\left(x_{1} x_{2}\right)^{3}}{3 !}+\cdots+\left(\frac{1}{\sigma^{2}}\right)^{n} \cdot \frac{\left(x_{1} x_{2}\right)^{n}}{n !}+\cdots\right) \\ &=e^{-\frac{x_{1}^{2}+x_{2}^{2}}{2 \sigma^{2}}} \cdot\left(1 \cdot 1+\frac{1}{1 !} \frac{x_{1}}{\sigma} \cdot \frac{x_{2}}{\sigma}+\frac{1}{2 !} \cdot \frac{x_{1}^{2}}{\sigma^{2}} \cdot \frac{x_{2}^{2}}{\sigma^{2}}+\frac{1}{3 !} \cdot \frac{x_{1}^{3}}{\sigma^{3}} \cdot \frac{x_{2}^{3}}{\sigma^{3}}+\cdots+\frac{1}{n !} \cdot \frac{x_{1}^{n}}{\sigma^{n}} \cdot \frac{x_{2}^{n}}{\sigma^{n}}+\cdots\right) \\ &=\Phi\left(x_{1}\right)^{T} \cdot \Phi\left(x_{2}\right) \end{aligned}$$
where

$$\Phi(x)=e^{-\frac{x^{2}}{2 \sigma^{2}}}\left(1, \sqrt{\frac{1}{1 !}} \frac{x}{\sigma}, \sqrt{\frac{1}{2 !}} \frac{x^{2}}{\sigma^{2}}, \sqrt{\frac{1}{3 !}} \frac{x^{3}}{\sigma^{3}}, \cdots, \sqrt{\frac{1}{n !}} \frac{x^{n}}{\sigma^{n}}, \cdots\right)$$

This shows that the Gaussian kernel corresponds to an infinite-dimensional feature map.
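Truncating this infinite feature map after a modest number of terms already reproduces the Gaussian kernel to high accuracy in one dimension, which makes the expansion easy to verify ($\sigma$ and the sample points are arbitrary):

```python
import numpy as np
from math import factorial

def phi_truncated(x, sigma, terms):
    """First `terms` coordinates of the infinite Gaussian feature map
    for a scalar input x."""
    return np.array([np.exp(-x**2 / (2 * sigma**2))
                     * np.sqrt(1.0 / factorial(n)) * (x / sigma)**n
                     for n in range(terms)])

sigma, x1, x2 = 1.0, 0.7, -0.3
exact = np.exp(-(x1 - x2)**2 / (2 * sigma**2))   # kernel value
approx = np.dot(phi_truncated(x1, sigma, 20),
                phi_truncated(x2, sigma, 20))    # truncated Phi . Phi
```

The tail of the series decays like $1/n!$, so 20 terms are already far beyond double precision here.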
5. Summary and Further Thoughts
SVM can be extended to multi-class classification via one-vs-one or one-vs-rest strategies.
SVM vs. logistic regression: SVM outputs the class label directly, while logistic regression outputs the posterior probability of each class.
SVM can also be applied to regression problems (SVR).
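These points can be illustrated with scikit-learn, assuming it is available: `SVC` trains one-vs-one pairs internally (`decision_function_shape` only controls how the scores are reported), and `SVR` handles regression:

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
# Three well-separated 2-D clusters -> a 3-class problem
X = np.vstack([rng.normal(m, 0.2, size=(20, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)

# Multi-class SVC: one-vs-one underneath, ovr-shaped decision scores
clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, y)
train_acc = (clf.predict(X) == y).mean()

# Support vector regression on a noiseless linear target y = 2x + 1
Xr = np.linspace(0.0, 1.0, 50)[:, None]
yr = 2.0 * Xr.ravel() + 1.0
reg = SVR(kernel="linear", C=100.0, epsilon=0.01).fit(Xr, yr)
```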