SMO序列最小最优化算法中关于解析方法的证明
在SMO这篇文档中,我们已经详细介绍了SMO从0实现的详细步骤,当时在学习到生成
α
2
n
e
w
,
u
n
c
\alpha_2^{new,unc}
α2new,unc时,我们只给出了定理内容,并没有介绍定理的详细证明。
即如下定理:
在如下最优化问题中
min α 1 , α 2 W ( α 1 , α 2 ) = 1 2 K 11 α 1 2 + 1 2 K 22 α 2 2 + y 1 y 2 K 12 α 1 α 2 − ( α 1 + α 2 ) + y 1 α 1 ∑ i = 3 N y i α i K i 1 + y 2 α 2 ∑ i = 3 N y i α i K i 2 s . t . α 1 y 1 + α 2 y 2 = − ∑ i = 3 N y i α i = k 0 ≤ α i ≤ C , i = 1 , 2 \min\limits_{\alpha_1,\alpha_2}\,\,\,\,\,\,\,W(\alpha_1,\alpha_2)=\frac{1}{2}K_{11}\alpha_1^2+\frac{1}{2}K_{22}\alpha_2^2+y_1y_2K_{12}\alpha_1\alpha_2-\\(\alpha_1+\alpha_2)+y_1\alpha_1\sum\limits_{i=3}^N{y_i\alpha_iK_{i1}}+y_2\alpha_2\sum\limits_{i=3}^Ny_i\alpha_iK_{i2}\\ s.t.\,\,\,\,\,\alpha_1y_1+\alpha_2y_2=-\sum\limits_{i=3}^N{y_i\alpha_i}=k\\ 0\le\alpha_i\le C,\,\,\,\,\,i=1,2 α1,α2minW(α1,α2)=21K11α12+21K22α22+y1y2K12α1α2−(α1+α2)+y1α1i=3∑NyiαiKi1+y2α2i=3∑NyiαiKi2s.t.α1y1+α2y2=−i=3∑Nyiαi=k0≤αi≤C,i=1,2
沿着未经剪辑时的解是:α 2 n e w , u n c = α 2 o l d + y 2 ( E 1 − E 2 ) η \alpha_2^{new,unc}=\alpha_2^{old}+\frac{y_2(E_1-E_2)}{\eta} α2new,unc=α2old+ηy2(E1−E2) 其中:
η = K 11 + K 22 − 2 K 12 = ∣ ∣ ϕ ( x 1 ) − ϕ ( x 2 ) ∣ ∣ 2 \eta=K_{11}+K_{22}-2K_{12}=||\phi(x_1)-\phi(x_2)||^2 η=K11+K22−2K12=∣∣ϕ(x1)−ϕ(x2)∣∣2
这里只给出了定理的结果,下面将针对这个定理给出详细的证明。
证明:
首先引进记号:
v
i
=
∑
j
=
3
N
α
j
y
j
K
(
x
i
,
x
j
)
=
g
(
x
i
)
−
∑
j
=
1
2
α
j
y
j
K
(
x
i
,
x
j
)
−
b
{v_i} = \sum\limits_{j = 3}^N {{\alpha _j}{y_j}K({x_i},{x_j}) = g({x_i}) - \sum\limits_{j = 1}^2 {{\alpha _j}{y_j}K({x_i},{x_j}) - b} }
vi=j=3∑NαjyjK(xi,xj)=g(xi)−j=1∑2αjyjK(xi,xj)−b
其中
g
(
x
i
)
=
∑
j
=
1
N
α
j
y
j
K
(
x
i
,
x
j
)
+
b
g(x_i)=\sum\limits_{j=1}^N\alpha_jy_jK(x_i,x_j)+b
g(xi)=j=1∑NαjyjK(xi,xj)+b,接着可以把
W
(
α
1
,
α
2
)
W(\alpha_1,\alpha_2)
W(α1,α2)进行代换得到:
min
α
1
,
α
2
W
(
α
1
,
α
2
)
=
1
2
K
11
α
1
2
+
1
2
K
22
α
2
2
+
y
1
y
2
K
12
α
1
α
2
−
(
α
1
+
α
2
)
+
y
1
α
1
v
1
+
y
2
α
2
v
2
\min\limits_{\alpha_1,\alpha_2}\,\,\,\,\,\,\,W(\alpha_1,\alpha_2)=\frac{1}{2}K_{11}\alpha_1^2+\frac{1}{2}K_{22}\alpha_2^2+y_1y_2K_{12}\alpha_1\alpha_2-\\(\alpha_1+\alpha_2)+y_1\alpha_1v_1+y_2\alpha_2v_2
α1,α2minW(α1,α2)=21K11α12+21K22α22+y1y2K12α1α2−(α1+α2)+y1α1v1+y2α2v2
既然我们的目的就是求
α
2
n
e
w
\alpha_2^{new}
α2new那么我们可以借助等式将目标函数化成只有
α
2
n
e
w
\alpha_2^{new}
α2new的函数。因此在约束条件中有:
α
1
y
1
=
k
−
α
2
y
2
\alpha_1y_1=k-\alpha_2y_2
α1y1=k−α2y2及
y
i
2
=
1
y_i^2=1
yi2=1,于是有
α
1
\alpha_1
α1:
α
1
=
(
k
−
α
2
y
2
)
y
1
\alpha_1=(k-\alpha_2y_2)y_1
α1=(k−α2y2)y1
然后将
α
1
\alpha_1
α1的式子带入
w
(
α
1
,
α
2
)
w(\alpha_1,\alpha_2)
w(α1,α2)中得到:
W
(
α
2
)
=
1
2
K
11
(
k
−
α
2
y
2
)
2
+
1
2
K
22
α
2
2
+
y
2
K
12
(
k
−
α
2
y
2
)
α
2
−
(
(
k
−
α
2
y
2
)
y
1
+
α
2
)
+
(
k
−
α
2
y
2
)
v
1
+
y
2
α
2
v
2
W(\alpha_2)=\frac{1}{2}K_{11}(k-\alpha_2y_2)^2+\frac{1}{2}K_{22}\alpha_2^2+y_2K_{12}(k-\alpha_2y_2)\alpha_2-\\((k-\alpha_2y_2)y_1+\alpha_2)+(k-\alpha_2y_2)v_1+y_2\alpha_2v_2
W(α2)=21K11(k−α2y2)2+21K22α22+y2K12(k−α2y2)α2−((k−α2y2)y1+α2)+(k−α2y2)v1+y2α2v2
上述式子中只有
α
2
\alpha_2
α2这一个变量,于是对其求导并令其等于0得到:
∂
W
∂
α
2
=
K
11
α
2
+
K
22
α
2
−
2
K
12
α
2
−
K
11
k
y
2
+
K
12
k
y
2
+
y
1
y
2
−
1
−
v
1
y
2
+
y
2
v
2
=
0
\frac{{\partial W}}{{\partial {\alpha _2}}} = {K_{11}}{\alpha _2} + {K_{22}}{\alpha _2} - 2{K_{12}}{\alpha _2} - {K_{11}}k{y_2} + {K_{12}}k{y_2} + {y_1}{y_2} - 1 - {v_1}{y_2} + {y_2}{v_2}=0
∂α2∂W=K11α2+K22α2−2K12α2−K11ky2+K12ky2+y1y2−1−v1y2+y2v2=0
(
K
11
+
K
22
−
2
K
12
)
α
2
=
y
2
(
y
2
−
y
1
+
k
K
11
−
k
K
12
+
v
1
−
v
2
)
=
y
2
[
y
2
−
y
1
+
k
K
11
−
k
K
12
+
(
g
(
x
1
)
−
∑
j
=
1
2
y
j
α
j
K
1
j
−
b
)
−
(
g
(
x
2
)
−
∑
j
=
1
2
y
j
α
j
K
2
j
−
b
)
]
\begin{aligned} ({K_{11}} + {K_{22}} - 2{K_{12}}){\alpha _2} &= {y_2}({y_2} - {y_1} + k{K_{11}} - k{K_{12}} + {v_1} - {v_2})\\ &={y_2}[{y_2} - {y_1} + k{K_{11}} - k{K_{12}} + (g({x_1}) - \sum\limits_{j = 1}^2 {{y_j}{\alpha _j}{K_{1j}} - b} ) - (g({x_2}) - \sum\limits_{j = 1}^2 {{y_j}{\alpha _j}{K_{2j}} - b} )] \end{aligned}
(K11+K22−2K12)α2=y2(y2−y1+kK11−kK12+v1−v2)=y2[y2−y1+kK11−kK12+(g(x1)−j=1∑2yjαjK1j−b)−(g(x2)−j=1∑2yjαjK2j−b)]
再将
k
=
α
1
o
l
d
y
1
+
α
2
o
l
d
y
2
k=\alpha_1^{old}y_1+\alpha_2^{old}y_2
k=α1oldy1+α2oldy2带入得到:
(
K
11
+
k
22
−
2
K
12
)
α
2
n
e
w
,
u
n
c
=
y
2
(
(
K
11
+
K
22
−
2
K
12
)
α
2
o
l
d
y
2
+
y
2
−
y
1
+
g
(
x
1
)
−
g
(
x
2
)
)
=
(
K
11
+
K
22
−
2
K
12
)
α
2
o
l
d
+
y
2
(
E
1
−
E
2
)
\begin{aligned} ({K_{11}} + {k_{22}} - 2{K_{12}})\alpha _2^{new,unc} &= {y_2}(({K_{11}} + {K_{22}} - 2{K_{12}})\alpha _2^{old}{y_2} + {y_2} - {y_1} + g({x_1}) - g({x_2}))\\ &=({K_{11}} + {K_{22}} - 2{K_{12}})\alpha _2^{old} + {y_2}({E_1} - {E_2}) \end{aligned}
(K11+k22−2K12)α2new,unc=y2((K11+K22−2K12)α2oldy2+y2−y1+g(x1)−g(x2))=(K11+K22−2K12)α2old+y2(E1−E2)
设
η
=
K
11
+
K
22
−
2
K
12
\eta = {K_{11}} + {K_{22}} - 2{K_{12}}
η=K11+K22−2K12带入,于是得到:
α
2
n
e
w
,
u
n
c
=
α
2
o
l
d
+
y
2
(
E
1
−
E
2
)
η
\alpha_2^{new,unc}=\alpha_2^{old}+\frac{y_2(E_1-E_2)}{\eta}
α2new,unc=α2old+ηy2(E1−E2)
到这里所有的证明过程已经结束,其实细看证明步骤并不难,就是一些代换和无约束求导求极值的方法,在得到解析方法的证明后,我们就可以放心使用定理了。