Previous article in this series: SVM-1
2.4 Convex Optimization Problems, the Lagrangian, Duality, and KKT Conditions
2.4.1 Convex Optimization Problems
A convex optimization problem is an optimization problem of the form:

$$\min_{\vec{x}}f(\vec{x}) \\ s.t. \quad g_{i}(\vec{x}) \leq 0, \quad i=1,2,3, \cdots, k \\ \qquad h_{j}(\vec{x})=0, \quad j=1,2,3, \cdots, l$$

where the objective function $f(\vec{x})$ and the constraint functions $g_{i}(\vec{x})$ are convex functions that are continuously differentiable on $R^{n}$, and the constraint functions $h_{j}(\vec{x})$ are affine functions on $R^{n}$.
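A small, hypothetical toy instance may make the definition concrete (it is chosen only for illustration and is not part of the SVM derivation): minimize $f(\vec{x}) = x_1^2 + x_2^2$ subject to $g_1(\vec{x}) = 1 - x_1 \le 0$ and $h_1(\vec{x}) = x_1 - x_2 = 0$. Here $f$ and $g_1$ are convex and differentiable and $h_1$ is affine, so it fits the form above with $n = 2$, $k = l = 1$. The helper names (`f`, `g`, `h`, `is_feasible`) and the tolerance are assumptions of this sketch.

```python
# Hypothetical toy instance of the convex problem above (n = 2, k = 1, l = 1):
#   minimize   f(x) = x1^2 + x2^2
#   subject to g1(x) = 1 - x1 <= 0   (convex inequality constraint)
#              h1(x) = x1 - x2 = 0   (affine equality constraint)

def f(x):
    """Objective: convex and continuously differentiable."""
    return x[0] ** 2 + x[1] ** 2

def g(x):
    """Inequality constraint values g_i(x), required to be <= 0."""
    return [1.0 - x[0]]

def h(x):
    """Equality constraint values h_j(x), required to be = 0."""
    return [x[0] - x[1]]

def is_feasible(x, tol=1e-9):
    """True if every g_i(x) <= 0 and every h_j(x) = 0, up to tol."""
    return all(gi <= tol for gi in g(x)) and all(abs(hj) <= tol for hj in h(x))

print(is_feasible((1.0, 1.0)))  # True:  g1 = 0, h1 = 0
print(is_feasible((0.0, 0.0)))  # False: g1 = 1 > 0
print(is_feasible((2.0, 1.0)))  # False: h1 = 1 != 0
```

The optimum of this toy problem is $\vec{x}^* = (1, 1)$ with $f(\vec{x}^*) = 2$: the feasible set is $x_1 = x_2 \ge 1$, on which $f = 2x_1^2$ is smallest at $x_1 = 1$.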
2.4.2 The Lagrangian
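The Lagrangian combines the optimization objective and the constraints into a single function by introducing Lagrange multipliers. For the convex optimization problem in 2.4.1, the corresponding Lagrangian is: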
$$L(\vec{x},\vec{\alpha},\vec{\beta}) = f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x})$$
Here $\vec{x} = (x_{1},x_{2},x_{3}, \cdots ,x_{n})^{T} \in R^{n}$ is the variable, $\vec{\alpha} = (\alpha_{1},\alpha_{2},\alpha_{3}, \cdots ,\alpha_{k})^{T} \in R^{k}$ and $\vec{\beta} = (\beta_{1},\beta_{2},\beta_{3}, \cdots ,\beta_{l})^{T} \in R^{l}$; the $\alpha_{i}$ and $\beta_{j}$ are the Lagrange multipliers, with $\alpha_{i} \geq 0$ and $\beta_{j}$ unrestricted in sign. Next, we argue why working with the Lagrangian is equivalent to solving the original problem together with its constraints.
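As a minimal sketch, assuming a hypothetical toy problem chosen only for illustration ($f(\vec{x}) = x_1^2 + x_2^2$, $g_1(\vec{x}) = 1 - x_1$, $h_1(\vec{x}) = x_1 - x_2$), the Lagrangian can be evaluated directly from its definition. The function name `lagrangian` and the list-based multiplier arguments are assumptions of this sketch.

```python
# Lagrangian of a hypothetical toy problem (illustration only):
#   f(x) = x1^2 + x2^2,  g1(x) = 1 - x1 <= 0,  h1(x) = x1 - x2 = 0

def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return [1.0 - x[0]]

def h(x):
    return [x[0] - x[1]]

def lagrangian(x, alpha, beta):
    """L(x, alpha, beta) = f(x) + sum_i alpha_i g_i(x) + sum_j beta_j h_j(x).

    alpha must be elementwise >= 0; beta may have any sign.
    """
    if any(a < 0 for a in alpha):
        raise ValueError("multipliers alpha_i must be non-negative")
    return (f(x)
            + sum(a * gi for a, gi in zip(alpha, g(x)))
            + sum(b * hj for b, hj in zip(beta, h(x))))

print(lagrangian((1.0, 1.0), [4.0], [2.0]))  # 2.0  (= 2 + 4*0 + 2*0)
print(lagrangian((0.0, 0.0), [1.0], [0.0]))  # 1.0  (= 0 + 1*1 + 0*0)
```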
Define:

$$\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta})$$
Here, $\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}$ means that, for each fixed value of $\vec{x}$, we take the maximum of $L$ over all admissible $\vec{\alpha},\vec{\beta}$ with $\alpha_{i} \geq 0$. Consider a given $\vec{x}$. If $\vec{x}$ violates a constraint of the original problem, then some $g_i(\vec{x}) > 0$ or some $h_j(\vec{x}) \neq 0$. We can then let $\alpha_{i} \rightarrow +\infty$, or choose $\beta_{j}$ with the same sign as $h_{j}(\vec{x})$ and let $|\beta_{j}| \rightarrow +\infty$ so that $\beta_{j}h_{j}(\vec{x}) \rightarrow +\infty$, while setting all other $\alpha_{i}, \beta_{j}$ to 0. This gives:
$$\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0} \Big [f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \Big] = +\infty$$
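Conversely, if $\vec{x}$ satisfies all the constraints, then every term $\alpha_{i}g_{i}(\vec{x})$ is non-positive (because $\alpha_{i} \geq 0$ and $g_{i}(\vec{x}) \leq 0$) and every term $\beta_{j}h_{j}(\vec{x})$ is zero, so the maximum is attained by setting $\alpha_{i} = 0$, with $\beta_{j}$ arbitrary: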
$$\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0} \Big [f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \Big] = f(\vec{x})$$
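The two cases above can be checked numerically on a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, chosen only for illustration). The sketch picks the multipliers exactly as in the argument (a large $\alpha_i$ on a violated inequality, a large $\beta_j$ with the sign of $h_j(\vec{x})$, zero elsewhere) and shows that $L$ stays at $f(\vec{x})$ for a feasible point but grows without bound for infeasible ones. It only evaluates these particular multiplier choices; it does not compute the maximum itself. `adversarial_multipliers` is a hypothetical helper name.

```python
import math

# Hypothetical toy problem: f = x1^2 + x2^2, g1 = 1 - x1 <= 0, h1 = x1 - x2 = 0
def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return [1.0 - x[0]]

def h(x):
    return [x[0] - x[1]]

def lagrangian(x, alpha, beta):
    return (f(x)
            + sum(a * gi for a, gi in zip(alpha, g(x)))
            + sum(b * hj for b, hj in zip(beta, h(x))))

def adversarial_multipliers(x, scale):
    """alpha_i = scale on violated inequalities (else 0);
    beta_j = scale with the sign of h_j(x) (0 if h_j(x) = 0)."""
    alpha = [scale if gi > 0 else 0.0 for gi in g(x)]
    beta = [math.copysign(scale, hj) if hj != 0 else 0.0 for hj in h(x)]
    return alpha, beta

for x in [(1.0, 1.0), (0.0, 0.0), (2.0, 1.0)]:
    values = [lagrangian(x, *adversarial_multipliers(x, s)) for s in (1.0, 10.0, 100.0)]
    print(x, values)
# (1.0, 1.0) [2.0, 2.0, 2.0]      feasible: stays at f(x)
# (0.0, 0.0) [1.0, 10.0, 100.0]   g1 > 0: grows without bound
# (2.0, 1.0) [6.0, 15.0, 105.0]   h1 != 0: grows without bound
```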
The Lagrangian can also be understood from a gradient-based perspective, which we do not pursue here.
In summary:
$$\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \begin{cases} f(\vec{x}), & \text{if } \vec{x} \text{ satisfies all constraints of the original problem} \\ +\infty, & \text{otherwise} \end{cases}$$
$$\min_{\vec{x}}\theta_{p}(\vec{x}) = \min_{\vec{x}:\, g_{i}(\vec{x}) \leq 0,\, h_{j}(\vec{x}) = 0}f(\vec{x}) = \min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta})$$

Therefore, solving the original problem in 2.4.1 is equivalent to solving the min-max problem of the Lagrangian. Maximizing the Lagrangian over the multipliers acts as a filter: any $\vec{x}$ that violates a constraint gets $\theta_{p}(\vec{x}) = +\infty$ and can never be the minimizer.
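A brute-force sketch of this equivalence on a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, for illustration only): for this problem the inner maximization has the closed form $\theta_p(\vec{x}) = f(\vec{x})$ if $\vec{x}$ is feasible and $+\infty$ otherwise, so minimizing $\theta_p$ over a coarse grid automatically discards the infeasible points. The grid and the helper name `theta_p` are assumptions.

```python
import math

# Hypothetical toy problem: f = x1^2 + x2^2, g1 = 1 - x1 <= 0, h1 = x1 - x2 = 0
def f(x):
    return x[0] ** 2 + x[1] ** 2

def theta_p(x):
    """Closed form of max over alpha >= 0, beta of L(x, alpha, beta) for this problem:
    f(x) if x is feasible, +inf otherwise."""
    feasible = (1.0 - x[0] <= 0.0) and (x[0] - x[1] == 0.0)
    return f(x) if feasible else math.inf

# Coarse grid with step 0.5; multiples of 0.5 are exact in binary floating point,
# so the equality test x1 == x2 is exact on this grid.
grid = [i / 2 for i in range(-6, 7)]
points = [(a, b) for a in grid for b in grid]

x_best = min(points, key=theta_p)
print(x_best, theta_p(x_best))  # (1.0, 1.0) 2.0
```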
2.4.3 Primal and Dual Problems; Weak and Strong Duality
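When the primal problem is a min-max problem, its dual problem is the max-min problem obtained by swapping the order of the min and the max.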
Suppose the primal problem is $\min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta})$ with optimal value $p^{*}$, and the dual problem is $\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})$ with optimal value $d^{*}$. Then:

$$p^{*}=\min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) \\ d^{*}=\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})$$
For any $\vec{x}$ and any admissible multipliers $\vec{\alpha},\vec{\beta}$ with $\alpha_{i} \geq 0$, the function $L(\vec{x},\vec{\alpha},\vec{\beta})$ satisfies the following, where $\theta_{d}(\vec{\alpha},\vec{\beta}) = \min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})$ is called the dual function:

$$\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta}) \leq L(\vec{x},\vec{\alpha},\vec{\beta}) \\ L(\vec{x},\vec{\alpha},\vec{\beta}) \leq \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) \\ \theta_{d}(\vec{\alpha},\vec{\beta}) = \min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta}) \leq L(\vec{x},\vec{\alpha},\vec{\beta}) \leq \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \theta_{p}(\vec{x})$$
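For a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, illustration only), $\theta_d$ can be worked out by hand: for fixed $\alpha \ge 0$ and $\beta$, $L$ is a convex quadratic in $\vec{x}$, and setting $\nabla_{\vec{x}}L = 0$ gives $x_1 = (\alpha - \beta)/2$, $x_2 = \beta/2$, hence $\theta_d(\alpha, \beta) = \alpha - (\alpha - \beta)^2/4 - \beta^2/4$. The sketch evaluates $\theta_d$ at that minimizer and checks the chain $\theta_d \le L \le \theta_p$ on a small grid of points and multipliers; the grid and function names are assumptions.

```python
import itertools

# Hypothetical toy problem (illustration only); k = l = 1, so alpha and beta are scalars:
#   L(x, alpha, beta) = x1^2 + x2^2 + alpha * (1 - x1) + beta * (x1 - x2)

def lagrangian(x, alpha, beta):
    return x[0] ** 2 + x[1] ** 2 + alpha * (1.0 - x[0]) + beta * (x[0] - x[1])

def theta_d(alpha, beta):
    """Dual function: min over x of L, attained where grad_x L = 0."""
    x_min = ((alpha - beta) / 2.0, beta / 2.0)
    return lagrangian(x_min, alpha, beta)

def theta_p(x):
    """max over alpha >= 0 and beta of L: f(x) if x is feasible, +inf otherwise."""
    feasible = (1.0 - x[0] <= 0.0) and (x[0] == x[1])
    return x[0] ** 2 + x[1] ** 2 if feasible else float("inf")

xs = [i / 2 for i in range(-4, 5)]                                          # -2.0 ... 2.0
multipliers = [(a / 2, b / 2) for a in range(0, 9) for b in range(-8, 9)]   # alpha >= 0

tol = 1e-12  # slack for floating-point rounding
ok = all(
    theta_d(a, b) <= lagrangian(x, a, b) + tol
    and lagrangian(x, a, b) <= theta_p(x) + tol
    for x in itertools.product(xs, xs)
    for a, b in multipliers
)
print(ok)                 # True
print(theta_d(4.0, 2.0))  # 2.0
```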
Since $\theta_{d}(\vec{\alpha},\vec{\beta}) \leq \theta_{p}(\vec{x})$ holds for every $\vec{x}$ and every admissible $(\vec{\alpha},\vec{\beta})$, the largest value of the left-hand side cannot exceed the smallest value of the right-hand side:

$$d^{*} = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}{\theta_{d}(\vec{\alpha},\vec{\beta})} \leq \min_{\vec{x}}{\theta_{p}(\vec{x})} = p^{*}$$
That is:

$$d^{*} \leq p^{*}$$
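On a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, illustration only), both optimal values can be estimated by brute force and compared. The sketch maximizes the hand-derived $\theta_d(\alpha, \beta) = \alpha - (\alpha - \beta)^2/4 - \beta^2/4$ over a grid with $\alpha \ge 0$ and minimizes $\theta_p$ over a grid of points. Grid search only yields estimates; here they happen to be exact because the optima lie on the grid. For this convex problem the two values coincide, which is the strong-duality case discussed next.

```python
# Hypothetical toy problem (illustration only):
#   f = x1^2 + x2^2, g1 = 1 - x1 <= 0, h1 = x1 - x2 = 0
# Dual function derived by hand: theta_d(alpha, beta) = alpha - (alpha - beta)^2/4 - beta^2/4

def theta_d(alpha, beta):
    return alpha - (alpha - beta) ** 2 / 4.0 - beta ** 2 / 4.0

def theta_p(x):
    """f(x) if x is feasible, +inf otherwise."""
    feasible = (1.0 - x[0] <= 0.0) and (x[0] == x[1])
    return x[0] ** 2 + x[1] ** 2 if feasible else float("inf")

grid = [i / 4 for i in range(-40, 41)]  # -10.0 ... 10.0 in steps of 0.25 (exact in binary)

d_star = max(theta_d(a, b) for a in grid if a >= 0 for b in grid)
p_star = min(theta_p((x1, x2)) for x1 in grid for x2 in grid)
print(d_star, p_star, d_star <= p_star)  # 2.0 2.0 True
```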
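The inequality $d^{*} \leq p^{*}$ is the weak duality relation between the primal and dual problems, and it always holds. When it is strict ($d^{*} < p^{*}$) there is a nonzero duality gap; when it holds with equality ($d^{*} = p^{*}$), the primal and dual problems are said to satisfy strong duality. For a convex optimization problem, a sufficient condition for equality is that there exists a feasible $\vec{x}$ that satisfies every inequality constraint strictly, i.e. $g_{i}(\vec{x}) < 0$ for all $i$ (while still satisfying $h_{j}(\vec{x}) = 0$ for all $j$). This is known as Slater's condition. Slater's condition also has a geometric interpretation in terms of the convex set associated with convex problems whose primal and dual optimal values coincide; we do not go into it here.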
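For a hypothetical toy problem ($g_1(\vec{x}) = 1 - x_1 \le 0$, $h_1(\vec{x}) = x_1 - x_2 = 0$, with convex objective $f = x_1^2 + x_2^2$; illustration only), Slater's condition is easy to verify: $\vec{x} = (2, 2)$ gives $g_1 = -1 < 0$ and $h_1 = 0$, so strong duality is guaranteed, consistent with $d^* = p^* = 2$ for this problem. A minimal checker follows; the name `is_slater_point` and the tolerance are assumptions.

```python
# Hypothetical toy problem constraints (illustration only):
#   g1(x) = 1 - x1 <= 0,  h1(x) = x1 - x2 = 0

def g(x):
    return [1.0 - x[0]]

def h(x):
    return [x[0] - x[1]]

def is_slater_point(x, tol=1e-12):
    """Slater: every inequality constraint holds strictly, every equality holds."""
    return all(gi < 0.0 for gi in g(x)) and all(abs(hj) <= tol for hj in h(x))

print(is_slater_point((2.0, 2.0)))  # True:  g1 = -1 < 0, h1 = 0
print(is_slater_point((1.0, 1.0)))  # False: g1 = 0 is not strictly negative
print(is_slater_point((3.0, 2.0)))  # False: h1 = 1 != 0
```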
2.4.4 Karush-Kuhn-Tucker (KKT) Conditions
From the discussion above, we have the original convex optimization problem:

$$\min_{\vec{x}}f(\vec{x}) \\ s.t. \quad g_{i}(\vec{x}) \leq 0, \quad i=1,2,3, \cdots, k \\ \qquad h_{j}(\vec{x})=0, \quad j=1,2,3, \cdots, l$$
Through the Lagrangian, we then obtained a pair of dual problems:

$$L(\vec{x},\vec{\alpha},\vec{\beta}) = f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \\ \text{Primal problem:} \quad \min_{\vec{x}}{\theta_{p}(\vec{x})} = \min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) \\ \text{Dual problem:} \quad \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}{\theta_{d}(\vec{\alpha},\vec{\beta})} = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})$$
When the pair of dual problems satisfies Slater's condition, strong duality holds, and we obtain:

$$p^{*} = \theta_{p}(\vec{x}^{*}) = \min_{\vec{x}}\theta_{p}(\vec{x}) = d^{*} = \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\theta_{d}(\vec{\alpha},\vec{\beta})$$
where $\vec{x}^{*}$ is an optimal solution of the primal problem and $\vec{\alpha}^{*}, \vec{\beta}^{*}$ are optimal solutions of the dual problem.
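Thus, when strong duality holds, we can convert the primal problem into the dual problem, which is often easier to solve. After obtaining the dual optimal value $d^{*}$ and the dual optimal solution $\vec{\alpha}^{*},\vec{\beta}^{*}$, we can use the Karush-Kuhn-Tucker (KKT) conditions to recover the primal optimal solution $\vec{x}^{*}$. In this setting (a convex problem for which strong duality holds), the KKT conditions are necessary and sufficient for $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ to be optimal solutions of the primal and dual problems, respectively.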
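A minimal sketch of this workflow on a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, illustration only): maximize the hand-derived dual function $\theta_d(\alpha, \beta) = \alpha - (\alpha - \beta)^2/4 - \beta^2/4$ numerically, then recover $\vec{x}^*$ from the stationarity condition $\nabla_{\vec{x}}L = 0$, the first of the KKT conditions. The solver here is plain projected gradient ascent with a fixed step size, chosen for illustration rather than prescribed by the derivation; the function names are hypothetical.

```python
# Hypothetical toy problem (illustration only):
#   f(x) = x1^2 + x2^2,  g1(x) = 1 - x1 <= 0,  h1(x) = x1 - x2 = 0
# Minimizing L over x gives x1 = (alpha - beta)/2, x2 = beta/2, and
#   theta_d(alpha, beta) = alpha - (alpha - beta)^2 / 4 - beta^2 / 4.

def grad_theta_d(alpha, beta):
    """Gradient of the dual function theta_d with respect to (alpha, beta)."""
    return 1.0 - (alpha - beta) / 2.0, (alpha - beta) / 2.0 - beta / 2.0

def solve_dual(step=1.0, iters=200):
    """Projected gradient ascent on theta_d; the projection keeps alpha >= 0."""
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        d_alpha, d_beta = grad_theta_d(alpha, beta)
        alpha = max(0.0, alpha + step * d_alpha)  # project onto alpha >= 0
        beta = beta + step * d_beta               # beta is unconstrained
    return alpha, beta

def recover_primal(alpha, beta):
    """Solve the stationarity condition grad_x L(x, alpha, beta) = 0 for x."""
    return (alpha - beta) / 2.0, beta / 2.0

alpha_star, beta_star = solve_dual()
x_star = recover_primal(alpha_star, beta_star)
print(round(alpha_star, 6), round(beta_star, 6))  # 4.0 2.0
print(tuple(round(v, 6) for v in x_star))         # (1.0, 1.0)
```

Because $\theta_d$ is a concave quadratic here, a fixed step of 1.0 converges; a general problem would need a step size chosen for its curvature, or a dedicated solver.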
KKT conditions:

$$\begin{cases} \nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 & &(1) \\ g_{i}(\vec{x}^{*}) \leq 0 ,& i = 1,2,3, \cdots ,k &(2) \\ h_{j}(\vec{x}^{*}) = 0 ,& j = 1,2,3, \cdots ,l &(3) \\ \alpha_{i}^{*} \geq 0 ,& i = 1,2,3, \cdots ,k &(4) \\ \alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 ,& i = 1,2,3, \cdots ,k &(5) \end{cases}$$

Condition (1) is usually called stationarity, (2) and (3) primal feasibility, (4) dual feasibility, and (5) complementary slackness. The equivalence we want to prove is:
$$p^{*} = \theta_{p}(\vec{x}^{*}) = d^{*} = \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*})\iff \begin{cases} \nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 & \\ g_{i}(\vec{x}^{*}) \leq 0 ,& i = 1,2,3, \cdots ,k \\ h_{j}(\vec{x}^{*}) = 0 ,& j = 1,2,3, \cdots ,l \\ \alpha_{i}^{*} \geq 0 ,& i = 1,2,3, \cdots ,k \\ \alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 ,& i = 1,2,3, \cdots ,k \end{cases}$$
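A small checker makes the five conditions concrete for a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, illustration only). The gradient of $L$ is written out by hand, and the tolerance `tol` and the function names are assumptions of this sketch. The last call shows a point that is stationary for its multipliers yet fails complementary slackness.

```python
# Hypothetical toy problem (illustration only):
#   L(x, alpha, beta) = x1^2 + x2^2 + alpha * (1 - x1) + beta * (x1 - x2)

def grad_L(x, alpha, beta):
    """Gradient of L with respect to x, derived by hand."""
    return (2.0 * x[0] - alpha + beta, 2.0 * x[1] - beta)

def kkt_satisfied(x, alpha, beta, tol=1e-9):
    g1 = 1.0 - x[0]
    h1 = x[0] - x[1]
    return (
        all(abs(c) <= tol for c in grad_L(x, alpha, beta))  # (1) stationarity
        and g1 <= tol                                        # (2) primal feasibility
        and abs(h1) <= tol                                   # (3) primal feasibility
        and alpha >= -tol                                    # (4) dual feasibility
        and abs(alpha * g1) <= tol                           # (5) complementary slackness
    )

print(kkt_satisfied((1.0, 1.0), 4.0, 2.0))  # True
print(kkt_satisfied((1.0, 1.0), 0.0, 0.0))  # False: grad_L = (2, 2) violates (1)
print(kkt_satisfied((2.0, 2.0), 8.0, 4.0))  # False: alpha * g1 = -8 violates (5)
```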
First, sufficiency. Suppose $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ satisfy the KKT conditions. By (4), $\alpha_{i}^{*} \geq 0$, so $L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*})$ is the sum of the convex function $f$, nonnegative multiples of the convex functions $g_{i}$, and the affine terms $\beta_{j}^{*}h_{j}$; it is therefore convex and differentiable in $\vec{x}$, and by (1) $\vec{x}^{*}$ is a global minimizer of it. Hence:

$$\begin{aligned} d^{*} & = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\theta_{d}(\vec{\alpha},\vec{\beta}) &(a) \\ & \geq \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) &(b) \\ & = \min_{\vec{x}}L(\vec{x}, \vec{\alpha}^{*},\vec{\beta}^{*}) &(c) \\ & = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) &(d) \\ & = f(\vec{x}^{*}) + \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) + \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) &(e) \\ & = f(\vec{x}^{*}) &(f) \\ & \geq p^{*} \end{aligned}$$

Step (d) uses (1) together with convexity, step (f) uses (3) and (5), and the last step holds because (2) and (3) make $\vec{x}^{*}$ feasible. Combined with weak duality $d^{*} \leq p^{*}$, every relation in this chain must hold with equality, so $f(\vec{x}^{*}) = p^{*}$ and $\theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) = d^{*}$: $\vec{x}^{*}$ solves the primal problem and $\vec{\alpha}^{*},\vec{\beta}^{*}$ solve the dual problem. This proves sufficiency.
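On a hypothetical toy problem ($f = x_1^2 + x_2^2$, $g_1 = 1 - x_1 \le 0$, $h_1 = x_1 - x_2 = 0$, illustration only), the KKT point $\vec{x}^* = (1, 1)$, $\alpha^* = 4$, $\beta^* = 2$ lets us evaluate the quantities in the sufficiency chain directly: the dual function at $(\alpha^*, \beta^*)$, the Lagrangian at $(\vec{x}^*, \alpha^*, \beta^*)$, and $f(\vec{x}^*)$ all agree. The names are illustrative, and $\theta_d$ is computed by plugging in the stationary point $x_1 = (\alpha - \beta)/2$, $x_2 = \beta/2$ of $L$.

```python
# Hypothetical toy problem (illustration only):
#   f(x) = x1^2 + x2^2,  g1(x) = 1 - x1 <= 0,  h1(x) = x1 - x2 = 0

def f(x):
    return x[0] ** 2 + x[1] ** 2

def lagrangian(x, alpha, beta):
    # k = l = 1, so alpha and beta are scalars here
    return f(x) + alpha * (1.0 - x[0]) + beta * (x[0] - x[1])

def theta_d(alpha, beta):
    # L is a convex quadratic in x; its minimizer solves grad_x L = 0
    return lagrangian(((alpha - beta) / 2.0, beta / 2.0), alpha, beta)

# A point satisfying KKT conditions (1)-(5) for this problem
x_star, alpha_star, beta_star = (1.0, 1.0), 4.0, 2.0

chain = [
    theta_d(alpha_star, beta_star),             # min over x of L(x, alpha*, beta*)
    lagrangian(x_star, alpha_star, beta_star),  # L(x*, alpha*, beta*)
    f(x_star),                                  # f(x*)
]
print(chain)  # [2.0, 2.0, 2.0]
```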
Next, necessity. Suppose $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ are optimal solutions of the primal and dual problems, respectively. Then:

$$\begin{aligned} d^{*} & = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}\theta_{d}(\vec{\alpha},\vec{\beta}) &(a) \\ & = \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) &(b) \\ & = \min_{\vec{x}}L(\vec{x}, \vec{\alpha}^{*},\vec{\beta}^{*}) &(c) \\ & \leq L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) &(d) \\ & = f(\vec{x}^{*}) + \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) + \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) &(e) \\ & \leq f(\vec{x}^{*}) &(f) \\ & = p^{*} \end{aligned}$$
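Since $\vec{x}^{*}$ is a primal optimal solution, it is feasible, so KKT conditions (2) and (3) hold automatically; since $\vec{\alpha}^{*}$ is dual feasible, condition (4) holds as well. These same facts justify step (f): each $\alpha_{i}^{*}g_{i}(\vec{x}^{*}) \leq 0$ and each $h_{j}(\vec{x}^{*}) = 0$. For strong duality, i.e. $d^{*} = p^{*}$, to hold, the inequalities at steps (d) and (f) must both hold with equality. We now use this to derive KKT conditions (1) and (5).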
Equality between (c) and (d) means that $\vec{x}^{*}$ minimizes $L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*})$ over all of $R^{n}$. Since this function is differentiable in $\vec{x}$ and the minimization is unconstrained, its gradient must vanish at the minimizer:

$$\begin{aligned} & \because \min_{\vec{x}}L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*}) = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) \\ & \therefore \nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 &(1) \end{aligned}$$
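Condition (1) can be sanity-checked numerically on a hypothetical toy problem ($L = x_1^2 + x_2^2 + \alpha(1 - x_1) + \beta(x_1 - x_2)$, illustration only) with multipliers $\alpha^* = 4$, $\beta^* = 2$: a central finite-difference gradient of $L(\cdot, \alpha^*, \beta^*)$ at $\vec{x}^* = (1, 1)$ is zero up to rounding, and nearby points give no smaller value of $L$. The step size `eps` and the helper names are assumptions.

```python
# Hypothetical toy problem (illustration only):
#   L(x, alpha, beta) = x1^2 + x2^2 + alpha * (1 - x1) + beta * (x1 - x2)

def lagrangian(x, alpha, beta):
    return x[0] ** 2 + x[1] ** 2 + alpha * (1.0 - x[0]) + beta * (x[0] - x[1])

def numerical_grad(fun, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((fun(up) - fun(down)) / (2.0 * eps))
    return grad

alpha_star, beta_star, x_star = 4.0, 2.0, (1.0, 1.0)

def L_fixed(x):
    """L as a function of x alone, with the multipliers fixed at alpha*, beta*."""
    return lagrangian(x, alpha_star, beta_star)

grad = numerical_grad(L_fixed, x_star)
print(all(abs(component) < 1e-6 for component in grad))  # True

# x* is also a minimizer: nearby points give a value of L that is no smaller.
print(all(L_fixed((1.0 + dx, 1.0 + dy)) >= L_fixed(x_star)
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)))  # True
```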
Equality between (e) and (f) gives:

$$\begin{aligned} & \because h_{j}(\vec{x}^{*}) = 0 \\ & \therefore \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) = 0 \\ & \therefore \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 \\ & \because \alpha_{i}^{*} \geq 0 , \quad g_{i}(\vec{x}^{*}) \leq 0 & i = 1,2,3, \cdots ,k \\ & \therefore \alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 & i = 1,2,3, \cdots ,k &&(5) \end{aligned}$$

The last step holds because each term $\alpha_{i}^{*}g_{i}(\vec{x}^{*})$ is non-positive, and a sum of non-positive terms equals zero only if every term is zero. Together with (2), (3), and (4), this establishes all five KKT conditions and completes the proof of necessity.
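The effect of (5) is easiest to see with an inactive constraint. This sketch uses a hypothetical variant of a toy problem ($f = x_1^2 + x_2^2$ with $g_1 = 1 - x_1 \le 0$), adding an extra constraint $g_2(\vec{x}) = x_1 - 5 \le 0$ that is inactive at $\vec{x}^* = (1, 1)$; the equality-constraint term is zero at $\vec{x}^*$ and is left out. A positive multiplier on the inactive constraint pushes $f(\vec{x}^*) + \sum_i \alpha_i g_i(\vec{x}^*)$ strictly below $f(\vec{x}^*)$, so the inequality at (f) could no longer hold with equality; hence $\alpha_2^* = 0$. The names and values are illustrative assumptions.

```python
# Hypothetical variant of a toy problem (illustration only):
#   f(x)  = x1^2 + x2^2
#   g1(x) = 1 - x1 <= 0    active at x* = (1, 1):   g1(x*) = 0
#   g2(x) = x1 - 5 <= 0    inactive at x* = (1, 1): g2(x*) = -4

x_star = (1.0, 1.0)
f_star = x_star[0] ** 2 + x_star[1] ** 2
g_star = [1.0 - x_star[0], x_star[0] - 5.0]

def multiplier_terms(alpha):
    """Return sum_i alpha_i * g_i(x*); each term is <= 0 when alpha_i >= 0."""
    return sum(a * gi for a, gi in zip(alpha, g_star))

# alpha_2 = 0 on the inactive constraint: the sum is 0, so (f) holds with equality.
print(f_star + multiplier_terms([4.0, 0.0]))  # 2.0
# alpha_2 > 0 on the inactive constraint: the sum is negative, so (f) becomes strict.
print(f_star + multiplier_terms([4.0, 1.0]))  # -2.0
```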