Proof of Strong Duality
These notes are organized from the following reference:
https://www.bilibili.com/video/BV1dJ411B7gh?p=11
1. Preliminaries
Definition 1 (convex set): A point set $D$ is convex if, for any two points $x_1, x_2 \in D$ and any $0 \leq \lambda \leq 1$, we have:
$$x=\lambda x_{1}+(1-\lambda) x_{2} \in D \tag{1}$$
[Figure: examples of convex sets]
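As a quick numerical illustration of Definition 1, the sketch below samples pairs of points from the closed unit disk (a set chosen here purely as an example, it is not taken from the notes) and checks that every convex combination from equation (1) stays inside the disk. The helper names are hypothetical, and a random test like this only illustrates the definition; it does not prove convexity.

```python
import math
import random

def in_unit_disk(p, tol=0.0):
    """Membership test for the closed unit disk (example convex set)."""
    return math.hypot(p[0], p[1]) <= 1.0 + tol

def convex_combination(x1, x2, lam):
    """Return lam * x1 + (1 - lam) * x2, as in equation (1)."""
    return tuple(lam * a + (1.0 - lam) * b for a, b in zip(x1, x2))

def random_point_in_disk(rng):
    """Rejection-sample a point of the unit disk from the enclosing square."""
    while True:
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if in_unit_disk(p):
            return p

def check_convexity(trials=1000, seed=0):
    """Check equation (1) on random pairs of points and random lambda."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = random_point_in_disk(rng)
        x2 = random_point_in_disk(rng)
        lam = rng.random()  # lambda in [0, 1)
        # Small tolerance absorbs floating-point rounding at the boundary.
        if not in_unit_disk(convex_combination(x1, x2, lam), tol=1e-9):
            return False
    return True
```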
Theorem 1 (separating hyperplane theorem): Suppose $C$ and $D$ are two disjoint convex sets, i.e. $C \cap D=\emptyset$. Then there exist a vector $a \neq 0$ and a constant $b$ such that
$$\left\{\begin{array}{ll} \boldsymbol{a}^{\mathrm{T}} x \leq b & \forall x \in C \\ \boldsymbol{a}^{\mathrm{T}} x \geq b & \forall x \in D \end{array}\right. \tag{2}$$
Proof of Theorem 1:
Definition 2: The distance between the point sets $C$ and $D$ is defined as follows (note that these notes use the squared norm):
$$\operatorname{dist}(C, D)=\inf _{u \in C, v \in D}\|u-v\|^{2} \tag{3}$$
Suppose that $c \in C, d \in D$ attain this minimum distance, i.e. $\operatorname{dist}(C, D)=\|c-d\|^{2}$. Let $a=c-d$ and $b=\frac{\|c\|^{2}-\|d\|^{2}}{2}$ (in fact, $(c-d)^{\mathrm{T}} x-\frac{\|c\|^{2}-\|d\|^{2}}{2}=0$ is the perpendicular bisector hyperplane of the segment joining $c$ and $d$). Since $C \cap D=\emptyset$, we have $c \neq d$ and hence $a \neq 0$. We now show that: ① for every $u \in C$, $a^{\mathrm{T}} u-b \geq 0$; ② for every $v \in D$, $a^{\mathrm{T}} v-b \leq 0$. (With this choice of $a$ the inequalities are oriented opposite to (2); replacing $(a, b)$ by $(-a,-b)$ recovers the form in (2).)
Proof by contradiction: suppose there exists some $u \in C$ such that
$$\begin{array}{l} a^{\mathrm{T}} u-b<0 \\ (c-d)^{\mathrm{T}} u-\frac{\|c\|^{2}-\|d\|^{2}}{2}<0 \\ (c-d)^{\mathrm{T}}\left(u-\frac{1}{2}(c+d)\right)<0 \\ (c-d)^{\mathrm{T}}\left((u-c)+\frac{1}{2}(c-d)\right)<0 \\ (c-d)^{\mathrm{T}}(u-c)+\frac{1}{2}\|c-d\|^{2}<0 \end{array}$$
Since $\frac{1}{2}\|c-d\|^{2} \geq 0$, it follows that $(c-d)^{\mathrm{T}}(u-c)<0$. Now take a point $p$ on the segment joining $u$ and $c$, i.e. $p=\lambda u+(1-\lambda) c$ with $0 \leq \lambda \leq 1$. Since $C$ is convex, $p \in C$. We now compute $\|p-d\|^{2}$:
$$\begin{aligned}\|p-d\|^{2} &=\|\lambda u+(1-\lambda) c-d\|^{2} \\ &=\|(c-d)+\lambda(u-c)\|^{2} \\ &=\|c-d\|^{2}+2 \lambda(c-d)^{\mathrm{T}}(u-c)+\lambda^{2}\|u-c\|^{2} \\ &=\|c-d\|^{2}+\lambda\left[2(c-d)^{\mathrm{T}}(u-c)+\lambda\|u-c\|^{2}\right] \end{aligned}$$
Since $(c-d)^{\mathrm{T}}(u-c)<0$, when $\lambda$ is a sufficiently small positive number, namely when it satisfies
$$0<\lambda<-\frac{2(c-d)^{\mathrm{T}}(u-c)}{\|u-c\|^{2}} \tag{4}$$
the bracketed term is negative, so we must have $\|p-d\|^{2}<\|c-d\|^{2}$ with $p \in C$. This contradicts the assumption that $c$ and $d$ attain the infimum in Definition 2, so ① is proved. The proof of ② is analogous. $\blacksquare$
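The construction in the proof can be tried numerically. The sketch below assumes two disjoint disks in the plane (an example chosen here, not one from the notes), computes their closest points $c$ and $d$ in closed form, builds $a=c-d$ and $b=\frac{\|c\|^{2}-\|d\|^{2}}{2}$ as in the proof, and checks the signs of $a^{\mathrm{T}} x-b$ on random samples. All function names are hypothetical.

```python
import math
import random

def closest_points(center_c, r_c, center_d, r_d):
    """Closest points of two disjoint disks lie on the line joining the centers."""
    dx, dy = center_d[0] - center_c[0], center_d[1] - center_c[1]
    dist = math.hypot(dx, dy)
    if dist <= r_c + r_d:
        raise ValueError("the disks must be disjoint")
    ux, uy = dx / dist, dy / dist  # unit vector from C's center toward D's center
    c = (center_c[0] + r_c * ux, center_c[1] + r_c * uy)
    d = (center_d[0] - r_d * ux, center_d[1] - r_d * uy)
    return c, d

def hyperplane(c, d):
    """a = c - d and b = (||c||^2 - ||d||^2) / 2, as in the proof of Theorem 1."""
    a = (c[0] - d[0], c[1] - d[1])
    b = ((c[0] ** 2 + c[1] ** 2) - (d[0] ** 2 + d[1] ** 2)) / 2.0
    return a, b

def side(a, b, x):
    """Value of a^T x - b."""
    return a[0] * x[0] + a[1] * x[1] - b

def sample_disk(center, radius, rng):
    """Uniform sample from a disk using polar coordinates."""
    r = radius * math.sqrt(rng.random())
    t = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + r * math.cos(t), center[1] + r * math.sin(t))

def check_separation(trials=2000, seed=0):
    """Return (claim 1 holds on samples of C, claim 2 holds on samples of D)."""
    rng = random.Random(seed)
    center_c, r_c = (3.0, 1.0), 1.0
    center_d, r_d = (-1.0, -2.0), 1.5
    c, d = closest_points(center_c, r_c, center_d, r_d)
    a, b = hyperplane(c, d)
    ok_c = all(side(a, b, sample_disk(center_c, r_c, rng)) >= 0.0
               for _ in range(trials))
    ok_d = all(side(a, b, sample_disk(center_d, r_d, rng)) <= 0.0
               for _ in range(trials))
    return ok_c, ok_d
```

Disks are used because their closest points have a closed form; for general convex sets, finding $c$ and $d$ is itself an optimization problem.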
Theorem 2: If $c$ is a nonzero vector, i.e. $\|c\|^{2}>0$, then for any $\varepsilon>0$ there exists a vector $x$ satisfying: ① $\|x\|^{2} \leq \varepsilon$; ② $c^{\mathrm{T}} x>0$.
Proof of Theorem 2: Take $x=\frac{\sqrt{\varepsilon}}{\|c\|} c$. Then $\|x\|^{2}=\varepsilon$ and $c^{\mathrm{T}} x=\sqrt{\varepsilon}\,\|c\|>0$. Similarly, the vector $-x$ satisfies $\|-x\|^{2} \leq \varepsilon$ and $c^{\mathrm{T}}(-x)<0$, which is the form used later in the proof of Theorem 4. $\blacksquare$
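A minimal sketch of Theorem 2, assuming the scaling $x=\frac{\sqrt{\varepsilon}}{\|c\|} c$, which is one valid choice among many. The function name is hypothetical. It returns a vector with $\|x\|^{2}=\varepsilon$ and $c^{\mathrm{T}} x>0$; negating the result gives a vector with $c^{\mathrm{T}} x<0$.

```python
import math

def small_vector_with_positive_inner_product(c, eps):
    """Return x = (sqrt(eps) / ||c||) * c, so ||x||^2 = eps and c^T x > 0."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    if norm_c == 0.0:
        raise ValueError("c must be a nonzero vector")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    scale = math.sqrt(eps) / norm_c
    return [scale * ci for ci in c]

def squared_norm(x):
    return sum(xi * xi for xi in x)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))
```

For example, with `c = [3.0, -4.0]` and `eps = 0.01` the scale factor is $0.1 / 5 = 0.02$, so $c^{\mathrm{T}} x = 0.02 \cdot 25 = 0.5$.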
2. The Dual Problem
Primal problem:
$$\begin{aligned} &\min _{w} f(w)\\ \text { s.t. }& g_{i}(w) \leq 0, \quad i=1,2, \dots, K\\ &h_{j}(w)=0, \quad j=1,2, \dots, M \end{aligned} \tag{5}$$
Dual problem:
First define the Lagrangian
$$\mathcal{L}(w, \alpha, \beta)=f(w)+\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \tag{6}$$
The dual problem derived from the Lagrangian has the form:
$$\begin{aligned} \max _{\alpha, \beta}\ & \theta(\alpha, \beta)=\inf _{w} \mathcal{L}(w, \alpha, \beta) \\ \text { s.t. }\ & \alpha_{i} \geq 0, \quad i=1,2, \dots, K \end{aligned} \tag{7}$$
Theorem 3 (weak duality): If $w^{*}$ is a solution of the primal problem and $\left(\alpha^{*}, \beta^{*}\right)$ is a solution of the dual problem, then:
$$\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right) \tag{8}$$
Proof of Theorem 3:
$$\begin{aligned} \theta\left(\alpha^{*}, \beta^{*}\right) &=\inf _{w} \mathcal{L}\left(w, \alpha^{*}, \beta^{*}\right) \\ & \leq \mathcal{L}\left(w^{*}, \alpha^{*}, \beta^{*}\right) \\ &=f\left(w^{*}\right)+\sum_{i=1}^{K} \alpha_{i}^{*} g_{i}\left(w^{*}\right)+\sum_{j=1}^{M} \beta_{j}^{*} h_{j}\left(w^{*}\right) \\ & \leq f\left(w^{*}\right) \end{aligned}$$
The last inequality holds because $w^{*}$ is feasible, so $g_{i}\left(w^{*}\right) \leq 0$ and $h_{j}\left(w^{*}\right)=0$, while $\alpha_{i}^{*} \geq 0$. $\blacksquare$
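To make the inequality concrete, here is a small sketch on a toy problem chosen for illustration (it does not appear in the notes): minimize $f(w)=w^{2}$ subject to $g(w)=1-w \leq 0$, with no equality constraints. The primal optimum is $w^{*}=1$ with $f(w^{*})=1$. The Lagrangian $w^{2}+\alpha(1-w)$ is minimized at $w=\alpha / 2$, which gives $\theta(\alpha)=\alpha-\alpha^{2} / 4$.

```python
def f(w):
    """Objective of the toy primal problem."""
    return w * w

def g(w):
    """Inequality constraint g(w) <= 0, i.e. w >= 1."""
    return 1.0 - w

def lagrangian(w, alpha):
    """L(w, alpha) = f(w) + alpha * g(w); the toy problem has no h_j."""
    return f(w) + alpha * g(w)

def theta(alpha):
    """Dual function: the Lagrangian is minimized over w at w = alpha / 2."""
    return lagrangian(alpha / 2.0, alpha)

PRIMAL_OPT = f(1.0)  # w* = 1 is the feasible point closest to 0

def weak_duality_holds(alphas, tol=1e-12):
    """Check theta(alpha) <= f(w*) for each dual-feasible alpha >= 0."""
    return all(theta(a) <= PRIMAL_OPT + tol for a in alphas if a >= 0.0)
```

On this toy problem the bound is tight at $\alpha=2$, where $\theta(2)=1=f(w^{*})$, which is what the strong duality theorem predicts for a convex problem with a strictly feasible point.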
Definition 3 (convex function): $f(w)$ is a convex function if, for all $w_{1}, w_{2}$ and all $\lambda \in[0,1]$, we have:
$$f\left(\lambda w_{1}+(1-\lambda) w_{2}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \tag{9}$$
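Inequality (9) is easy to probe numerically for scalar functions. The sketch below uses hypothetical helper names and tests the inequality on random triples $(w_{1}, w_{2}, \lambda)$. Passing such a test is only evidence of convexity on the sampled interval, while a single failing triple is a genuine counterexample.

```python
import random

def satisfies_convexity_inequality(f, w1, w2, lam, tol=1e-9):
    """Check f(lam*w1 + (1-lam)*w2) <= lam*f(w1) + (1-lam)*f(w2), as in (9)."""
    lhs = f(lam * w1 + (1.0 - lam) * w2)
    rhs = lam * f(w1) + (1.0 - lam) * f(w2)
    return lhs <= rhs + tol

def looks_convex(f, lo, hi, trials=2000, seed=0):
    """Randomized check of (9) for w1, w2 in [lo, hi] and lam in [0, 1)."""
    rng = random.Random(seed)
    return all(
        satisfies_convexity_inequality(
            f, rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        )
        for _ in range(trials)
    )
```

For instance, `looks_convex(lambda w: w * w, -5.0, 5.0)` passes, while the concave function `lambda w: -w * w` already fails at $w_{1}=-1$, $w_{2}=1$, $\lambda=0.5$.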
3. Proof of Strong Duality
Theorem 4 (strong duality theorem): For $f(w), g_{i}(w), h_{j}(w)$, suppose that:
① $f(w)$ is a convex function;
② each $g_{i}(w)$ is a convex function;
③ each $h_{j}(w)$ is an affine function, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$;
④ Slater's condition: there exists a $w$ such that $g_i(w)<0$ for all $i$ and $h_j(w)=0$ for all $j$;
⑤ the domain $D$ of $w$ is an open set, i.e. if $w \in D$ then there exists a neighborhood $N(w, \varepsilon) \subseteq D$;
⑥ the domain $D$ of $w$ is a convex set.
Then: $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$.
Proof of the strong duality theorem:
Construct the point set:
$$A=\left\{(u, v, t) \mid \exists w \in D \text { such that } g_{i}(w) \leq u_{i},\ h_{j}(w)=v_{j},\ f(w) \leq t\right\} \tag{10}$$
Define:
$$g(w)=\left[\begin{array}{c} g_{1}(w) \\ g_{2}(w) \\ \vdots \\ g_{K}(w) \end{array}\right], \quad h(w)=\left[\begin{array}{c} h_{1}(w) \\ h_{2}(w) \\ \vdots \\ h_{M}(w) \end{array}\right] \tag{11}$$
Notes: ① if $w \in D$, then $(g(w), h(w), f(w)) \in A$ (proof: the conditions in the definition hold with equality); ② if $w \in D$, then informally $(+\infty, h(w),+\infty) \in A$ (proof: every number is smaller than positive infinity), that is, the $u$ and $t$ components of a point of $A$ can be made arbitrarily large.
Lemma 1: If $D$ is a convex set, each $g_{i}(w)$ is a convex function $(i=1,2, \dots, K)$, each $h_{j}(w)$ is an affine function, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$, and $f(w)$ is a convex function, then $A$ is a convex set.
Proof:
Let $\left(u_{1}, v_{1}, t_{1}\right),\left(u_{2}, v_{2}, t_{2}\right) \in A$. We need to show that for $0 \leq \lambda \leq 1$,
$$\left(\lambda u_{1}+(1-\lambda) u_{2}, \lambda v_{1}+(1-\lambda) v_{2}, \lambda t_{1}+(1-\lambda) t_{2}\right) \in A \tag{12}$$
① Since $\left(u_{1}, v_{1}, t_{1}\right) \in A$, there exists $w_{1} \in D$ such that $g_{i}\left(w_{1}\right) \leq u_{1, i}$, $h_{j}\left(w_{1}\right)=v_{1, j}$, $f\left(w_{1}\right) \leq t_{1}$; likewise, since $\left(u_{2}, v_{2}, t_{2}\right) \in A$, there exists $w_{2} \in D$ such that $g_{i}\left(w_{2}\right) \leq u_{2, i}$, $h_{j}\left(w_{2}\right)=v_{2, j}$, $f\left(w_{2}\right) \leq t_{2}$.
② Let $w^{\prime}=\lambda w_{1}+(1-\lambda) w_{2}$. Since $D$ is a convex set, $w^{\prime} \in D$. Since $g_i(w)$ is a convex function, $g_{i}\left(w^{\prime}\right) \leq \lambda g_{i}\left(w_{1}\right)+(1-\lambda) g_{i}\left(w_{2}\right) \leq \lambda u_{1, i}+(1-\lambda) u_{2, i}$; similarly, $f\left(w^{\prime}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \leq \lambda t_{1}+(1-\lambda) t_{2}$.
③ Since $h_{j}$ is affine,
$$\begin{aligned} h_{j}\left(w^{\prime}\right) &=c_{j}^{\mathrm{T}} w^{\prime}+d_{j} \\ &=\lambda\left(c_{j}^{\mathrm{T}} w_{1}+d_{j}\right)+(1-\lambda)\left(c_{j}^{\mathrm{T}} w_{2}+d_{j}\right) \\ &=\lambda h_{j}\left(w_{1}\right)+(1-\lambda) h_{j}\left(w_{2}\right) \\ &=\lambda v_{1, j}+(1-\lambda) v_{2, j} \end{aligned}$$
Combining ①②③, the point $w^{\prime}$ shows that the point in (12) belongs to $A$, so Lemma 1 is proved. $\blacksquare$
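Lemma 1 can be illustrated on a toy problem chosen here for illustration (it is not from the notes): $f(w)=w^{2}$, one constraint $g(w)=1-w$, no equality constraints, and $D=\mathbb{R}$. In that case $A=\left\{(u, t) \mid \exists w,\ 1-w \leq u,\ w^{2} \leq t\right\}$. The feasible $w$ with the smallest $w^{2}$ is $\max (0,1-u)$, so $(u, t) \in A$ exactly when $t \geq \max (0,1-u)^{2}$. The sketch below, with hypothetical function names, checks (12) on random pairs of members.

```python
import random

def in_A(u, t, tol=1e-9):
    """Membership in A for the toy problem f(w) = w^2, g(w) = 1 - w."""
    # Any witness w must satisfy w >= 1 - u; the one with smallest w^2
    # is w = max(0, 1 - u).
    w_best = max(0.0, 1.0 - u)
    return w_best * w_best <= t + tol

def random_member(rng):
    """Sample a point of A: pick u, then any t above the minimal value."""
    u = rng.uniform(-3.0, 3.0)
    t = max(0.0, 1.0 - u) ** 2 + rng.uniform(0.0, 2.0)
    return (u, t)

def check_A_is_convex(trials=2000, seed=0):
    """Check that convex combinations of members of A stay in A, as in (12)."""
    rng = random.Random(seed)
    for _ in range(trials):
        (u1, t1), (u2, t2) = random_member(rng), random_member(rng)
        lam = rng.random()
        u = lam * u1 + (1.0 - lam) * u2
        t = lam * t1 + (1.0 - lam) * t2
        if not in_A(u, t):
            return False
    return True
```

Here $A$ is the region on or above the graph of the convex function $u \mapsto \max (0,1-u)^{2}$, which is why the check succeeds.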
By the definition in (10), the optimal value of the primal problem is
$$f\left(w^{*}\right)=\min _{(0,0, t) \in A} t \tag{13}$$
Define another point set $B=\left\{(0,0, s) \mid s<f\left(w^{*}\right)\right\}$. One can show that $B$ is also a convex set and that $A \cap B=\emptyset$.
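The disjointness claim is not spelled out in the notes. A sketch of the standard argument, using only (10), (13), and the definition of $B$, is:

$$\begin{aligned} (0,0, s) \in A \cap B &\implies \exists w \in D:\ g_{i}(w) \leq 0,\ h_{j}(w)=0,\ f(w) \leq s \\ &\implies f(w) \leq s<f\left(w^{*}\right) \text { for a feasible } w, \end{aligned}$$

which contradicts the optimality of $w^{*}$. Convexity of $B$ is immediate, since $B$ is an open half-line along the $t$-axis.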
By Theorem 1 (the separating hyperplane theorem), there exist $(\alpha, \beta, \eta) \neq 0$ and a constant $b$ such that: ① if $(u, v, t) \in A$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$; ② if $(u, v, t) \in B$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \leq b$. Since $u=0$ and $v=0$ for points of $B$, ② reduces to $\eta t \leq b$ for all $t<f\left(w^{*}\right)$.
Lemma 2: If for all $(u, v, t) \in A$ we have $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$, then
$$\alpha=\left[\alpha_{1}, \alpha_{2}, \ldots, \alpha_{K}\right] \succcurlyeq 0, \quad \eta \geq 0 \tag{14}$$
Proof:
Suppose some $\alpha_{i}<0$. Then we may let the corresponding $u_{i} \to+\infty$; the point $(u, v, t)$ still belongs to $A$, but $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \to-\infty$, which contradicts $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$. The same argument with $t \to+\infty$ shows that $\eta \geq 0$.
By the definition of $A$ and ①, for all $w \in D$ we have $\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b$. By the definition of $B$ and ② (which for points of $B$ reads $\eta t \leq b$ for all $t<f\left(w^{*}\right)$), letting $t \to f\left(w^{*}\right)$ gives $\eta f\left(w^{*}\right) \leq b$. Therefore:
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b \geq \eta f\left(w^{*}\right) \tag{15}$$
We now consider two cases:
Case 1: $\eta \neq 0$, so $\eta>0$ by Lemma 2. Dividing (15) by $\eta$ gives, for all $w \in D$,
$$f\left(w^{*}\right) \leq \sum_{i=1}^{K} \frac{\alpha_{i}}{\eta} g_{i}(w)+\sum_{j=1}^{M} \frac{\beta_{j}}{\eta} h_{j}(w)+f(w)=\mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{16}$$
Since $w$ is arbitrary, we have
$$f\left(w^{*}\right) \leq \inf _{w} \mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)=\theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{17}$$
Since $\alpha \succcurlyeq 0$ and $\eta>0$, we have $\frac{\alpha}{\eta} \succcurlyeq 0$, which satisfies the constraint of the dual problem. Because $\left(\alpha^{*}, \beta^{*}\right)$ maximizes $\theta$ over all such points, $\theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \leq \theta\left(\alpha^{*}, \beta^{*}\right)$, and therefore:
$$f\left(w^{*}\right) \leq \theta\left(\alpha^{*}, \beta^{*}\right) \tag{18}$$
Moreover, by Theorem 3, $\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right)$, so $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$, which proves the claim in this case.
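Case 1 can be traced on a toy problem assumed here for illustration (it is not part of the notes): $f(w)=w^{2}$, $g(w)=1-w$, no equality constraints, so $f(w^{*})=1$ and $\theta(\alpha)=\alpha-\alpha^{2} / 4$. For this problem $(u, t) \in A$ exactly when $t \geq \max (0,1-u)^{2}$, and the hand-picked values $\alpha=2$, $\eta=1$, $b=1$ give a separating hyperplane. The sketch checks both sides of the separation on random samples and then the conclusion $\theta(\alpha / \eta)=f(w^{*})$. All names are hypothetical.

```python
import random

ALPHA, ETA, B_CONST = 2.0, 1.0, 1.0  # hand-picked hyperplane for the toy problem
PRIMAL_OPT = 1.0                     # f(w*) with w* = 1

def min_t_in_A(u):
    """Smallest t with (u, t) in A for f(w) = w^2, g(w) = 1 - w."""
    return max(0.0, 1.0 - u) ** 2

def theta(alpha):
    """Dual function of the toy problem: inf over w of w^2 + alpha * (1 - w)."""
    return alpha - alpha * alpha / 4.0

def separates(trials=2000, seed=0, tol=1e-9):
    """Check alpha*u + eta*t >= b on samples of A and eta*s <= b on samples of B."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = rng.uniform(-5.0, 5.0)
        t = min_t_in_A(u) + rng.uniform(0.0, 3.0)
        if ALPHA * u + ETA * t < B_CONST - tol:
            return False
        s = PRIMAL_OPT - rng.uniform(1e-6, 5.0)  # a point (0, s) of B
        if ETA * s > B_CONST + tol:
            return False
    return True
```

The separation can also be verified by hand: for $u<1$, $2 u+(1-u)^{2}=1+u^{2} \geq 1$, and for $u \geq 1$, $2 u \geq 2$.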
Case 2: $\eta=0$. Then $b \geq \eta f\left(w^{*}\right)=0$, so (15) gives, for all $w \in D$,
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0 \tag{19}$$
By condition ④ of Theorem 4 (Slater's condition), there exists a $w$ with $g_{i}(w)<0$ and $h_{j}(w)=0$. Substituting this $w$ into (19) gives $\sum_{i=1}^{K} \alpha_{i} g_{i}(w) \geq 0$; since every $\alpha_{i} \geq 0$ and every $g_{i}(w)<0$, this forces $\alpha_{i}=0$ for all $i$. Formula (19) therefore becomes
$$\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0, \text { or equivalently } \beta^{\mathrm{T}} h(w) \geq 0 \tag{20}$$
By condition ③ of Theorem 4, $h(w)=c w+d$ (here $c$ is the matrix whose rows are the $c_{j}^{\mathrm{T}}$ and $d$ is the vector of the $d_{j}$). Substituting gives:
$$\begin{array}{l} \beta^{\mathrm{T}} h(w) \geq 0 \\ \beta^{\mathrm{T}} c w+\beta^{\mathrm{T}} d \geq 0 \end{array} \tag{21}$$
Let $P=\beta^{\mathrm{T}} c$ and $q=\beta^{\mathrm{T}} d$; then (21) can be rewritten as:
$$P w+q \geq 0 \tag{22}$$
Note that (22) holds for all $w \in D$. By condition ④ (Slater's condition), there exists a $w$ such that $c w+d=0$, and hence $P w+q=0$.
We now show that there exists a point $w^{\prime}=w+\Delta w$, with $\Delta w$ in a neighborhood $N(0, \varepsilon)$ of the origin, such that $P w^{\prime}+q<0$.
Proof: By Theorem 1 we have $\beta \neq 0$; otherwise $(\alpha, \beta, \eta)$ would all be zero, contradicting the separating hyperplane theorem. Hence $P=\beta^{\mathrm{T}} c \neq 0$ (this step implicitly assumes that the rows of $c$ are linearly independent, an assumption not stated in the conditions above). By Theorem 2, applied to the vector $-P^{\mathrm{T}}$, there exists a $\Delta w$ with $\|\Delta w\|^{2} \leq \varepsilon$ and $P \Delta w<0$. Therefore $w^{\prime}=w+\Delta w \in N(w, \varepsilon)$.
By condition ⑤ of Theorem 4, $w^{\prime} \in D$ for sufficiently small $\varepsilon$. At the same time,
$$\begin{aligned} P w^{\prime}+q &=P(w+\Delta w)+q \\ &=(P w+q)+P \Delta w \\ &=P \Delta w<0 \end{aligned} \tag{23}$$
This contradicts (22), so Case 2 cannot occur.
This completes the proof of Theorem 4 (the strong duality theorem). $\blacksquare$