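This course is from 深度之眼. Some screenshots come from the course videos and from Li Hang's 《统计学习方法》 (Statistical Learning Methods), 2nd edition.
For formula input, see: online LaTeX formulas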
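Theorem 7.1 (Existence and uniqueness of the maximum-margin separating hyperplane). If the training data set $T$ is linearly separable, then the maximum-margin separating hyperplane that correctly separates all sample points in the training data set exists and is unique.
Existence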
$$
\begin{aligned}
&\underset{w,b}{\min}\ \cfrac{1}{2}||w||^2 \\
&\text{s.t.}\quad y_i\left(w\cdot x_i+b\right)-1\ge0,\quad i = 1,2,\cdots,N
\end{aligned}
$$
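To make the problem concrete, here is a minimal sketch that brute-forces it on a coarse grid. The three-point data set, the grid step, and the helper names `is_feasible` and `grid_search` are assumptions chosen for illustration; a real solver would use quadratic programming rather than enumeration.

```python
# Toy data (an assumption for illustration): two positive points, one negative.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]


def is_feasible(w, b):
    """Check the constraint y_i (w . x_i + b) - 1 >= 0 for every sample."""
    return all(
        yi * (w[0] * xi[0] + w[1] * xi[1] + b) - 1 >= 0
        for xi, yi in zip(X, y)
    )


def grid_search(step=0.25, w_max=2.0, b_max=5.0):
    """Brute-force the minimizer of 0.5 * ||w||^2 over a coarse grid."""
    n_w = int(w_max / step)
    n_b = int(b_max / step)
    best, best_obj = None, float("inf")
    for i in range(-n_w, n_w + 1):
        for j in range(-n_w, n_w + 1):
            w = (i * step, j * step)
            obj = 0.5 * (w[0] ** 2 + w[1] ** 2)
            if obj >= best_obj:
                continue  # cannot improve on the best feasible point so far
            for k in range(-n_b, n_b + 1):
                b = k * step
                if is_feasible(w, b):
                    best, best_obj = (w, b), obj
                    break
    return best, best_obj


solution, objective = grid_search()
print(solution, objective)
```

On this toy data the search returns $w=(0.5, 0.5)$, $b=-2$ with objective value $0.25$, and no other grid point attains that value, which is consistent with the theorem.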
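Because the training data set is linearly separable, the optimization problem has at least one feasible solution. Because the objective function is also bounded below, the optimization problem must have a solution. Since the training data contain both positive and negative points, $(w,b)=(0,b)$ is not a feasible solution of the problem: with $w=0$ the constraints would require $y_i b\ge 1$ for both $y_i=1$ and $y_i=-1$, which is impossible. The optimal solution must therefore satisfy $w\ne 0$, which establishes the existence of the separating hyperplane.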
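The infeasibility of $w=0$ can be checked numerically. The sketch below uses an assumed three-point toy data set and a hypothetical helper `is_feasible`; it only illustrates the argument and is not part of the proof.

```python
# Toy data (an assumption for illustration): both classes are present.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]


def slacks(w, b):
    """Return y_i (w . x_i + b) - 1 for every sample."""
    return [yi * (w[0] * xi[0] + w[1] * xi[1] + b) - 1 for xi, yi in zip(X, y)]


def is_feasible(w, b):
    """A pair (w, b) is feasible when every slack is nonnegative."""
    return min(slacks(w, b)) >= 0


# With w = 0 the constraints reduce to y_i * b >= 1, which cannot hold
# for y_i = +1 and y_i = -1 at the same time, whatever b is.
zero_w_feasible = [is_feasible((0.0, 0.0), b) for b in (-3.0, -1.0, 0.0, 1.0, 3.0)]

# A nonzero w that separates the toy data is feasible.
nonzero_w_feasible = is_feasible((0.5, 0.5), -2.0)

print(zero_w_feasible, nonzero_w_feasible)
```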
Uniqueness
Suppose the problem has two optimal solutions $(w_1^*,b_1^*)$ and $(w_2^*,b_2^*)$. Both attain the minimum of the objective $\cfrac{1}{2}||w||^2$, so:
$$
\cfrac{1}{2}||w_1^*||^2=\cfrac{1}{2}||w_2^*||^2=\cfrac{c^2}{2}
$$
That is, writing the minimum value as the constant $c^2/2$ with $c>0$, we have:
$$
||w_1^*||=||w_2^*||=c
$$
Let $w=\cfrac{w_1^*+w_2^*}{2},\ b=\cfrac{b_1^*+b_2^*}{2}$. The following derivation shows that $(w,b)$ is a feasible solution of the problem. For every $i$:
$$
\begin{aligned}
y_i\left(w\cdot x_i+b\right)-1
&=y_i\left(\cfrac{w_1^*+w_2^*}{2}\cdot x_i+\cfrac{b_1^*+b_2^*}{2}\right)-1\\
&=\cfrac{1}{2}\left[y_i\left((w_1^*+w_2^*)\cdot x_i+b_1^*+b_2^*\right)-2\right]\\
&=\cfrac{1}{2}\left[\left(y_i(w_1^*\cdot x_i+b_1^*)-1\right)+\left(y_i(w_2^*\cdot x_i+b_2^*)-1\right)\right]\ge 0
\end{aligned}
$$
Both bracketed terms are nonnegative because $(w_1^*,b_1^*)$ and $(w_2^*,b_2^*)$ are themselves feasible.
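The midpoint argument holds for any two feasible solutions, not only optimal ones. The sketch below checks it on an assumed toy data set with two hand-picked feasible pairs; the names `slacks` and `midpoint` are hypothetical helpers.

```python
# Toy data (an assumption for illustration).
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]


def slacks(w, b):
    """Return y_i (w . x_i + b) - 1 for every sample."""
    return [yi * (w[0] * xi[0] + w[1] * xi[1] + b) - 1 for xi, yi in zip(X, y)]


def midpoint(sol1, sol2):
    """Average two (w, b) pairs component by component."""
    (w1, b1), (w2, b2) = sol1, sol2
    w = tuple((a + c) / 2 for a, c in zip(w1, w2))
    return w, (b1 + b2) / 2


# Two feasible pairs, picked by hand for this data set.
sol1 = ((0.5, 0.5), -2.0)
sol2 = ((1.0, 1.0), -4.0)
mid = midpoint(sol1, sol2)

s1, s2, s_mid = slacks(*sol1), slacks(*sol2), slacks(*mid)

# Each midpoint slack is the average of the two original slacks,
# so it is nonnegative whenever both originals are.
print(s1, s2, s_mid)
```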
Since $(w,b)$ is feasible and $c$ is the smallest value of $||w||$ over all feasible solutions, we have $c\le||w||$. Combining this with the triangle inequality gives:
$$
c\le||w||=||\cfrac{w_1^*+w_2^*}{2}||=||\cfrac{1}{2}w_1^*+\cfrac{1}{2}w_2^*||\le||\cfrac{1}{2}w_1^*||+||\cfrac{1}{2}w_2^*||=c
$$
Hence every inequality in the chain above holds with equality, and in particular:
$$
||w||=||\cfrac{1}{2}w_1^*||+||\cfrac{1}{2}w_2^*||
$$
Equality in the triangle inequality forces $w_1^*$ and $w_2^*$ to be collinear, and since they have the same norm we can write:
$$
w_1^*=\lambda w_2^*,\quad|\lambda|=1
$$
If $\lambda=-1$, then $w=\cfrac{w_1^*+w_2^*}{2}=0$, so $(w,b)$ is not a feasible solution of the problem, which is a contradiction.
Therefore $\lambda=1$ must hold:
$$
w_1^*=w_2^*
$$
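A small numerical illustration of this step: for two vectors of the same norm $c$, the midpoint has norm $c$ only when the vectors coincide. The vectors below and the helper `midpoint_norm` are assumptions chosen for illustration.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a tuple."""
    return math.sqrt(sum(a * a for a in v))


def midpoint_norm(w1, w2):
    """Norm of (w1 + w2) / 2."""
    return norm(tuple((a + b) / 2 for a, b in zip(w1, w2)))


# Three pairs of vectors, each vector with norm c = 1.
different = midpoint_norm((1.0, 0.0), (0.0, 1.0))   # not collinear: norm drops below 1
opposite = midpoint_norm((1.0, 0.0), (-1.0, 0.0))   # lambda = -1: midpoint is the zero vector
same = midpoint_norm((1.0, 0.0), (1.0, 0.0))        # lambda = 1: norm stays at 1

print(different, opposite, same)
```

Only the last case keeps the norm at $c$, which matches the conclusion $w_1^*=w_2^*$.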
The two optimal solutions $(w_1^*,b_1^*),(w_2^*,b_2^*)$ can therefore be written as $(w^*,b_1^*),(w^*,b_2^*)$. It remains to prove that $b_1^*=b_2^*$.
Let $x_1'$ and $x_2'$ be points in the set $\{x_i|y_i=1\}$ at which the inequality constraint of the problem holds with equality for $(w^*,b_1^*)$ and $(w^*,b_2^*)$ respectively; let $x_1''$ and $x_2''$ be points in the set $\{x_i|y_i=-1\}$ at which the inequality constraint holds with equality for $(w^*,b_1^*)$ and $(w^*,b_2^*)$ respectively.
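In other words, the single-primed points lie on the margin boundary of the positive class ($y_i=1$), where the constraint holds with equality, and the double-primed points lie on the margin boundary of the negative class ($y_i=-1$).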
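As an illustration of which points qualify, the sketch below lists the samples whose constraint holds with equality for a given $(w,b)$. The toy data, the chosen $(w,b)$, and the helper `support_vectors` are assumptions for illustration.

```python
# Toy data (an assumption for illustration).
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [1, 1, -1]


def support_vectors(w, b, tol=1e-9):
    """Split the points with y_i (w . x_i + b) - 1 == 0 by class."""
    positive, negative = [], []
    for xi, yi in zip(X, y):
        slack = yi * (w[0] * xi[0] + w[1] * xi[1] + b) - 1
        if abs(slack) <= tol:
            (positive if yi == 1 else negative).append(xi)
    return positive, negative


# Single-primed candidates (y_i = 1) and double-primed candidates (y_i = -1).
pos_sv, neg_sv = support_vectors((0.5, 0.5), -2.0)
print(pos_sv, neg_sv)
```

Here $(3,3)$ plays the role of a single-primed point and $(1,1)$ the role of a double-primed point, while $(4,3)$ lies strictly beyond the positive margin boundary.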
For $(w^*,b_1^*)$, using $y_i=1$ at $x_1'$ and $y_i=-1$ at $x_1''$:
$$
\begin{aligned}
\left(w^*\cdot x_1'+b_1^*\right)-1&=-\left(w^*\cdot x_1''+b_1^*\right)-1=0\\
w^*\cdot x_1'+b_1^*&=-(w^*\cdot x_1''+b_1^*)\\
-2b_1^*&=w^*\cdot x_1'+w^*\cdot x_1''\\
b_1^*&=-\cfrac{1}{2}(w^*\cdot x_1'+w^*\cdot x_1'')
\end{aligned}
$$
Similarly:
$$
b_2^*=-\cfrac{1}{2}(w^*\cdot x_2'+w^*\cdot x_2'')
$$
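The formula for the bias can be evaluated directly once one support vector from each class is known. The values of $w^*$, $x'$ and $x''$ below are assumed toy values, and `bias_from_support_vectors` is a hypothetical helper.

```python
def dot(u, v):
    """Inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def bias_from_support_vectors(w, x_pos, x_neg):
    """b = -1/2 (w . x' + w . x''), with x' on the positive margin and x'' on the negative one."""
    return -0.5 * (dot(w, x_pos) + dot(w, x_neg))


# Assumed toy values: w*, a positive support vector x', a negative one x''.
w_star = (0.5, 0.5)
x_pos = (3.0, 3.0)
x_neg = (1.0, 1.0)

b_star = bias_from_support_vectors(w_star, x_pos, x_neg)

# Sanity check: both points sit exactly on their margin boundaries.
pos_value = dot(w_star, x_pos) + b_star   # should equal +1
neg_value = dot(w_star, x_neg) + b_star   # should equal -1
print(b_star, pos_value, neg_value)
```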
Subtracting the two equations:
$$
\begin{aligned}
b_1^*-b_2^*&=-\cfrac{1}{2}[(w^*\cdot x_1'+w^*\cdot x_1'')-(w^*\cdot x_2'+w^*\cdot x_2'')]\\
&=-\cfrac{1}{2}[w^*\cdot(x_1'-x_2')+w^*\cdot(x_1''-x_2'')]
\end{aligned}
\tag{1}
$$
Since $x_1'$ is a support vector for $(w^*,b_1^*)$, lying exactly on its margin boundary, the positive-class point $x_2'$ can be no closer to that hyperplane than $x_1'$. By the same reasoning, $x_1'$ can be no closer than the support vector $x_2'$ to the hyperplane given by $(w^*,b_2^*)$:
$$
\begin{aligned}
w^*\cdot x_2'+b_1^*&\ge1=w^*\cdot x_1'+b_1^*\\
w^*\cdot x_1'+b_2^*&\ge1=w^*\cdot x_2'+b_2^*
\end{aligned}
$$
Simplifying, by cancelling $b_1^*$ in the first line and $b_2^*$ in the second:
$$
\begin{aligned}
w^*\cdot x_2'&\ge w^*\cdot x_1'\\
w^*\cdot x_1'&\ge w^*\cdot x_2'
\end{aligned}
$$
Hence the two sides must be equal:
$$
\begin{aligned}
w^*\cdot x_2'&= w^*\cdot x_1'\\
w^*\cdot (x_1'-x_2')&=0
\end{aligned}
$$
Similarly:
$$
w^*\cdot (x_1''-x_2'')=0
$$
Substituting these results into (1) gives:
$$
b_1^*-b_2^*=0
$$
From $w_1^*=w_2^*$ and $b_1^*=b_2^*$ it follows that the two optimal solutions $(w_1^*,b_1^*),(w_2^*,b_2^*)$ are identical, which proves uniqueness.