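The KKT conditions are first-order necessary conditions for a solution of a constrained optimization problem to be optimal. They can be proved from several angles; one of the easier proofs to follow starts from the linear independence of the constraint gradients, and it is worked through below.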
The general form of a constrained optimization problem is
$$
\min_{x\in\mathbb{R}^n} f(x)\quad \text{s.t.}\quad \begin{cases} c_i(x)=0, & i\in\mathcal{E}\\ c_i(x)\le 0, & i\in\mathcal{I}\end{cases}\tag{1}
$$
where $f$ and $c_i$ are smooth functions, and $\mathcal{E}$ and $\mathcal{I}$ are the index sets of the equality and inequality constraints, respectively. The feasible set is the set of points $x$ that satisfy all constraints $c_i$, i.e.,

$$\Omega=\left\{x \mid c_i(x)=0,\ i\in\mathcal{E};\ c_i(x)\le 0,\ i\in\mathcal{I}\right\}.$$

The following definitions are used below:
- Local solution: $x^\ast\in\Omega$, and there is a neighborhood $\mathcal{N}$ of $x^\ast$ such that $f(x^\ast)\le f(x)$ for all $x\in\mathcal{N}\cap\Omega$;
- Strict local solution: $x^\ast\in\Omega$, and there is a neighborhood $\mathcal{N}$ of $x^\ast$ such that $f(x^\ast)<f(x)$ for all $x\in\mathcal{N}\cap\Omega$ with $x\ne x^\ast$;
- Isolated local solution: $x^\ast\in\Omega$, and there is a neighborhood $\mathcal{N}$ of $x^\ast$ such that $x^\ast$ is the only local solution in $\mathcal{N}\cap\Omega$;
- Active set: $\mathcal{A}(x)=\mathcal{E}\cup\{i\in\mathcal{I}\mid c_i(x)=0\}$;
- Feasible sequence approaching a feasible point $x$: a sequence $\{z_k\}$ such that $z_k\in\Omega$ for all sufficiently large $k$ and $z_k\to x$;
- Tangent vector: $d$ is a tangent vector to $\Omega$ at $x$ if there exist a feasible sequence $\{z_k\}$ approaching $x$ and a sequence of positive numbers $\{t_k\}$ with $t_k\to 0$ such that $\displaystyle\lim_{k\to\infty}\frac{z_k-x}{t_k}=d$;
- Tangent cone: the set $T_{\Omega}(x^\ast)$ of all tangent vectors to $\Omega$ at $x^\ast$;
- Set of linearized feasible directions: $\mathcal{F}(x)=\left\{d \,\middle|\, \begin{array}{l} d^{\mathrm{T}}\nabla c_i(x)=0,\ i\in\mathcal{E}\\ d^{\mathrm{T}}\nabla c_i(x)\le 0,\ i\in\mathcal{A}(x)\cap\mathcal{I}\end{array}\right\}$;
- LICQ (linear independence constraint qualification): the active constraint gradients $\{\nabla c_i(x),\ i\in\mathcal{A}(x)\}$ are linearly independent.
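To make the active set and LICQ concrete, here is a minimal Python/NumPy sketch on a hypothetical two-variable problem. The constraints `e1`, `i1`, `i2` and the helper names `active_set` and `licq_holds` are illustrative assumptions, not part of the original text. The sketch assumes the point passed in is feasible and uses a tolerance to decide whether an inequality is active, which the exact definition does not need.

```python
import numpy as np

# Hypothetical problem data (illustration only): each entry is
# (kind, c_i, grad c_i), with c_i(x) = 0 for "eq" and c_i(x) <= 0 for "ineq".
constraints = {
    "e1": ("eq", lambda x: x[0] - x[1], lambda x: np.array([1.0, -1.0])),
    "i1": ("ineq", lambda x: x[0] ** 2 + x[1] ** 2 - 2.0, lambda x: np.array([2 * x[0], 2 * x[1]])),
    "i2": ("ineq", lambda x: -x[0] - 5.0, lambda x: np.array([-1.0, 0.0])),
}

def active_set(x, tol=1e-9):
    """A(x) = E ∪ {i in I : c_i(x) = 0}; x is assumed feasible, tol absorbs round-off."""
    return [name for name, (kind, c, _) in constraints.items()
            if kind == "eq" or abs(c(x)) <= tol]

def licq_holds(x, tol=1e-9):
    """LICQ: the gradients of the active constraints are linearly independent."""
    act = active_set(x, tol)
    if not act:
        return True  # an empty set of gradients is trivially independent
    G = np.column_stack([constraints[name][2](x) for name in act])  # n x |A(x)|
    return np.linalg.matrix_rank(G) == len(act)

x_star = np.array([-1.0, -1.0])
print(active_set(x_star))  # ['e1', 'i1']: i2 is inactive since c_i2(x*) = -4
print(licq_holds(x_star))  # True: (1, -1) and (-2, -2) are linearly independent
```

Checking the rank of the stacked gradients is just the matrix form of "linearly independent"; in floating point it depends on the tolerance used by `matrix_rank`.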
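KKT conditions. Suppose $x^\ast$ is a local solution of (1), the functions $f$ and $c_i$ are continuously differentiable, and LICQ holds at $x^\ast$. Then there exists a Lagrange multiplier vector $\lambda^\ast$, with components $\lambda_i^\ast$, $i\in\mathcal{E}\cup\mathcal{I}$, such that the following conditions hold at $(x^\ast,\lambda^\ast)$, where $\mathcal{L}(x,\lambda)=f(x)+\sum_{i\in\mathcal{E}\cup\mathcal{I}}\lambda_i c_i(x)$ is the Lagrangian: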
$$
\begin{aligned}
\nabla_x\mathcal{L}(x^\ast,\lambda^\ast)&=\nabla f(x^\ast)+\sum_{i\in\mathcal{E}\cup\mathcal{I}}\lambda_i^\ast\nabla c_i(x^\ast)=0\\
c_i(x^\ast)&=0,\quad i\in\mathcal{E}\\
c_i(x^\ast)&\le 0,\quad i\in\mathcal{I}\\
\lambda_i^\ast&\ge 0,\quad i\in\mathcal{I}\\
\lambda_i^\ast c_i(x^\ast)&=0,\quad i\in\mathcal{E}\cup\mathcal{I}
\end{aligned}\tag{2}
$$
The last condition in (2) is also called the complementary slackness condition.
- Strict complementarity: for each $i\in\mathcal{I}$, exactly one of $\lambda_i^\ast$ and $c_i(x^\ast)$ is zero; equivalently, $\lambda_i^\ast>0$ for every $i\in\mathcal{A}(x^\ast)\cap\mathcal{I}$.
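As a sanity check on (2), the following Python/NumPy sketch measures how far a candidate pair $(x,\lambda)$ is from satisfying each of the five conditions. The test problem, $\min x_1+x_2$ subject to $x_1^2+x_2^2-2\le 0$ and $-x_1-5\le 0$, and the function names are hypothetical choices for illustration. At $x^\ast=(-1,-1)$ the multipliers $\lambda^\ast=(0.5,0)$ satisfy all conditions (and strict complementarity, since the active constraint has $\lambda_1^\ast>0$), while the maximizer $(1,1)$ is stationary only with $\lambda_1=-0.5$, which violates the sign condition.

```python
import numpy as np

def kkt_residuals(grad_f, cons, x, lam):
    """Violation of each condition in (2) for a candidate (x, lambda).

    cons: list of (kind, c_i, grad_c_i) with kind "eq" (i in E) or "ineq" (i in I)
    lam:  multipliers lambda_i, in the same order as cons
    """
    grad_L = grad_f(x) + sum(l * g(x) for (_, _, g), l in zip(cons, lam))
    return {
        "stationarity": float(np.linalg.norm(grad_L)),  # grad_x L = 0
        "primal_eq": max((abs(c(x)) for k, c, _ in cons if k == "eq"), default=0.0),
        "primal_ineq": max((max(c(x), 0.0) for k, c, _ in cons if k == "ineq"), default=0.0),
        "dual": max((max(-l, 0.0) for (k, _, _), l in zip(cons, lam) if k == "ineq"), default=0.0),
        "complementarity": max((abs(l * c(x)) for (_, c, _), l in zip(cons, lam)), default=0.0),
    }

def is_kkt_point(grad_f, cons, x, lam, tol=1e-8):
    return all(v <= tol for v in kkt_residuals(grad_f, cons, x, lam).values())

# Hypothetical test problem: min x1 + x2  s.t.  x1^2 + x2^2 - 2 <= 0,  -x1 - 5 <= 0
grad_f = lambda x: np.array([1.0, 1.0])
cons = [
    ("ineq", lambda x: x[0] ** 2 + x[1] ** 2 - 2.0, lambda x: np.array([2 * x[0], 2 * x[1]])),
    ("ineq", lambda x: -x[0] - 5.0, lambda x: np.array([-1.0, 0.0])),
]
print(is_kkt_point(grad_f, cons, np.array([-1.0, -1.0]), [0.5, 0.0]))  # True
print(kkt_residuals(grad_f, cons, np.array([1.0, 1.0]), [-0.5, 0.0]))   # "dual" is 0.5
```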
Lemma 1. Let $x^\ast$ be a feasible point. Then
(1) $T_{\Omega}(x^\ast)\subset\mathcal{F}(x^\ast)$;
(2) if LICQ holds at $x^\ast$, then $\mathcal{F}(x^\ast)=T_{\Omega}(x^\ast)$.
Proof:
(1) Let $\{z_k\}$ and $\{t_k\}$ be the sequences in the definition of the tangent vector $d$, i.e., $\displaystyle\lim_{k\to\infty}\frac{z_k-x^\ast}{t_k}=d$ with $t_k>0$ and $t_k\to 0$. Then, for sufficiently large $k$,

$$z_k=x^\ast+t_kd+o(t_k)\tag{3}$$
Since $z_k\in\Omega$, for each equality constraint $i\in\mathcal{E}$, (3), Taylor's formula and $c_i(x^\ast)=0$ give

$$\begin{aligned} 0&=\frac{1}{t_k}c_i(z_k)\\ &=\frac{1}{t_k}\left[c_i(x^\ast)+t_k\nabla c_i^\mathrm{T}(x^\ast)d+o(t_k)\right]\\ &=\nabla c_i^\mathrm{T}(x^\ast)d+\frac{o(t_k)}{t_k}\end{aligned}\tag{4}$$
Letting $k\to\infty$ gives $\nabla c_i^\mathrm{T}(x^\ast)d=0$.
Similarly, for each active inequality constraint $i\in\mathcal{A}(x^\ast)\cap\mathcal{I}$, where $c_i(x^\ast)=0$ and $c_i(z_k)\le 0$,

$$\begin{aligned} 0&\ge\frac{1}{t_k}c_i(z_k)\\ &=\frac{1}{t_k}\left[c_i(x^\ast)+t_k\nabla c_i^\mathrm{T}(x^\ast)d+o(t_k)\right]\\ &=\nabla c_i^\mathrm{T}(x^\ast)d+\frac{o(t_k)}{t_k}\end{aligned}\tag{5}$$
Letting $k\to\infty$ gives $\nabla c_i^\mathrm{T}(x^\ast)d\le 0$. Hence $d\in\mathcal{F}(x^\ast)$, which proves (1).
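Part (1) can be checked numerically on a simple case. In this Python/NumPy sketch (a hypothetical example, not from the original) the only constraint is the equality $c(x)=x_1^2+x_2^2-1=0$, the point is $x^\ast=(1,0)$, and the feasible sequence is $z_k=(\cos t_k,\sin t_k)$. The difference quotients $(z_k-x^\ast)/t_k$ approach the tangent vector $d=(0,1)$, and $\nabla c(x^\ast)^\mathrm{T}d$ approaches 0, as (4) predicts.

```python
import numpy as np

# Hypothetical example: unit circle c(x) = x1^2 + x2^2 - 1 = 0 at x* = (1, 0)
c = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
grad_c = lambda x: np.array([2 * x[0], 2 * x[1]])
x_star = np.array([1.0, 0.0])

ts = [10.0 ** (-k) for k in range(1, 7)]  # positive t_k -> 0
quotients = []
for t in ts:
    z = np.array([np.cos(t), np.sin(t)])  # z_k lies on the circle, so z_k is feasible
    quotients.append((z - x_star) / t)    # (z_k - x*) / t_k

d = quotients[-1]           # approximation of the tangent vector; close to (0, 1)
print(d)
print(grad_c(x_star) @ d)   # close to 0, matching grad c_i(x*)^T d = 0
```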
(2) Suppose there are $m$ active constraints at $x^\ast$, with $m<n$, i.e., fewer active constraints than variables (otherwise the problem is of little interest; note that LICQ already implies $m\le n$). Collect them into $c(x)=[c_i(x)]_{i\in\mathcal{A}(x^\ast)}\in\mathbb{R}^m$; the gradients of its components form the rows of the $m\times n$ matrix $A(x^\ast)$, i.e., $A^\mathrm{T}(x^\ast)=[\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)}$. Since LICQ holds, $A(x^\ast)$ has full row rank. Let the columns of the $n\times(n-m)$ matrix $Z$ be a basis of the null space of $A(x^\ast)$, so that $A(x^\ast)Z=0$ and $Z$ has full column rank.
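Numerically, one common way to obtain such a $Z$ is from the singular value decomposition of $A(x^\ast)$: the right singular vectors belonging to zero singular values span its null space. The Python/NumPy sketch below uses a hypothetical $2\times 3$ matrix with full row rank (as LICQ guarantees), and the helper name `null_space_basis` is ours. It also checks that the stacked matrix $[A(x^\ast);\ Z^\mathrm{T}]$, which reappears as the Jacobian in the next step, has full rank.

```python
import numpy as np

def null_space_basis(A, rtol=1e-10):
    """Return Z (n x (n - rank A)) whose orthonormal columns span {v : A v = 0}."""
    _, s, Vh = np.linalg.svd(A)             # full_matrices=True, so Vh is n x n
    rank = int(np.sum(s > rtol * s.max()))  # numerical rank of A
    return Vh[rank:].T

# Hypothetical A(x*): rows are grad c_i(x*)^T for m = 2 active constraints, n = 3
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
Z = null_space_basis(A)
print(Z.shape)                                     # (3, 1): n - m = 1 column
print(np.allclose(A @ Z, 0.0))                     # True: A(x*) Z = 0
print(np.linalg.matrix_rank(np.vstack([A, Z.T])))  # 3: [A(x*); Z^T] is nonsingular
```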
Let $d\in\mathcal{F}(x^\ast)$, and let $\{t_k\}_{k=0}^\infty$ be an arbitrary sequence of positive numbers with $\displaystyle\lim_{k\to\infty}t_k=0$. Define the parametrized system of equations

$$R(z,t)=\begin{bmatrix} c(z)-tA(x^\ast)d\\ Z^\mathrm{T}(z-x^\ast-td)\end{bmatrix}=0\tag{6}$$
It suffices to show that, for sufficiently small $t=t_k>0$, the solutions $z=z_k$ of (6) form a feasible sequence approaching $x^\ast$ and satisfy $\displaystyle\lim_{k\to\infty}\frac{z_k-x^\ast}{t_k}=d$.
At $t=0$, $z=x^\ast$ is a solution of (6), and the Jacobian of $R$ with respect to $z$ there is

$$\nabla_zR(x^\ast,0)=\begin{bmatrix} A(x^\ast)\\ Z^\mathrm{T}\end{bmatrix}\tag{7}$$
This matrix is nonsingular: if $A(x^\ast)v=0$ and $Z^\mathrm{T}v=0$, then $v$ lies in the null space of $A(x^\ast)$, so $v=Zu$ for some $u$, and $Z^\mathrm{T}Zu=0$ forces $u=0$ because $Z$ has full column rank; hence $v=0$. By the implicit function theorem, for sufficiently small $t_k>0$, (6) has a unique solution $z_k$ near $x^\ast$. Since $d\in\mathcal{F}(x^\ast)$, the first block of (6) gives

$$\begin{aligned} i\in\mathcal{E}&\Rightarrow c_i(z_k)=t_k\nabla c_i^\mathrm{T}(x^\ast)d=0\\ i\in\mathcal{A}(x^\ast)\cap\mathcal{I}&\Rightarrow c_i(z_k)=t_k\nabla c_i^\mathrm{T}(x^\ast)d\le 0\end{aligned}\tag{8}$$
Hence $z_k$ is feasible: for sufficiently small $t_k>0$, $z_k$ is close enough to $x^\ast$ that the inactive inequality constraints, which hold strictly at $x^\ast$, also hold at $z_k$ by continuity.
In fact, near $(x^\ast,0)$ the solution $z$ of (6) can be regarded as an implicit function of $t$, i.e., $z=z(t)$ with $z_k=z(t_k)$. By the implicit function theorem, $z$ is continuously differentiable in $t$ and satisfies

$$z'(0)=-\nabla_zR(x^\ast,0)^{-1}\nabla_tR(x^\ast,0)\tag{9}$$
Differentiating (6) with respect to $t$ gives $\nabla_tR(x^\ast,0)=-\begin{bmatrix}A(x^\ast)d\\ Z^\mathrm{T}d\end{bmatrix}=-\nabla_zR(x^\ast,0)\,d$, so (7) and (9) yield $z'(0)=d$. Since $z(0)=x^\ast$,

$$\frac{z_k-x^\ast}{t_k}=\frac{z(0)+t_kz'(0)+o(t_k)-x^\ast}{t_k}=d+\frac{o(t_k)}{t_k}\tag{10}$$
Letting $k\to\infty$, so that $t_k\to 0$, gives $\displaystyle\lim_{k\to\infty}\frac{z_k-x^\ast}{t_k}=d$; hence $d\in T_{\Omega}(x^\ast)$, i.e., $\mathcal{F}(x^\ast)\subset T_{\Omega}(x^\ast)$. Combined with (1), this proves (2).
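The construction in part (2) can be reproduced numerically. The Python/NumPy sketch below is a hypothetical instance: the single active constraint is $c(x)=x_1^2+x_2^2-2\le 0$ at $x^\ast=(-1,-1)$, so $A(x^\ast)=[-2,\ -2]$, and the direction $d=(1,0.5)$ satisfies $A(x^\ast)d=-3\le 0$, i.e., $d\in\mathcal{F}(x^\ast)$. For several small $t$ it solves $R(z,t)=0$ from (6) with Newton's method (the solver is our choice; the proof only needs existence via the implicit function theorem). The printed $c(z)$ equals $-3t\le 0$, so $z$ is feasible, and $(z-x^\ast)/t$ approaches $d$.

```python
import numpy as np

# Hypothetical instance: one active constraint c(x) = x1^2 + x2^2 - 2 <= 0 at x* = (-1, -1)
x_star = np.array([-1.0, -1.0])
c = lambda z: np.array([z[0] ** 2 + z[1] ** 2 - 2.0])  # active constraints stacked (m = 1)
jac_c = lambda z: np.array([[2 * z[0], 2 * z[1]]])     # rows grad c_i(z)^T
A = jac_c(x_star)                                      # A(x*) = [[-2, -2]]
Z = np.array([[1.0], [-1.0]]) / np.sqrt(2.0)           # basis of null(A(x*))
d = np.array([1.0, 0.5])                               # A(x*) d = -3 <= 0, so d in F(x*)

def R(z, t):
    """Residual of the parametrized system (6)."""
    return np.concatenate([c(z) - t * (A @ d), Z.T @ (z - x_star - t * d)])

def solve_R(t, iters=20):
    """Newton's method on R(z, t) = 0 in z, started at x* + t d."""
    z = x_star + t * d
    for _ in range(iters):
        J = np.vstack([jac_c(z), Z.T])                 # grad_z R(z, t); equals (7) at (x*, 0)
        z = z - np.linalg.solve(J, R(z, t))
    return z

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    z = solve_R(t)
    # c(z) = -3 t <= 0 (feasible), and (z - x*) / t approaches d as t -> 0
    print(t, c(z)[0], (z - x_star) / t)
```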
Theorem 1. If $x^\ast$ is a local solution, then $\nabla f^\mathrm{T}(x^\ast)d\ge 0$ for every $d\in T_{\Omega}(x^\ast)$.
Proof: Suppose, for contradiction, that there is $d\in T_{\Omega}(x^\ast)$ with $\nabla f^\mathrm{T}(x^\ast)d<0$, and let $\{z_k\}$ and $\{t_k\}$ be the sequences associated with $d$. Then, by Taylor's formula and (3),

$$\begin{aligned} f(z_k)&=f(x^\ast)+(z_k-x^\ast)^\mathrm{T}\nabla f(x^\ast)+o(\Vert z_k-x^\ast\Vert)\\ &=f(x^\ast)+t_kd^\mathrm{T}\nabla f(x^\ast)+o(t_k)\end{aligned}\tag{11}$$
Since $\nabla f^\mathrm{T}(x^\ast)d<0$, the term $t_kd^\mathrm{T}\nabla f(x^\ast)$ dominates $o(t_k)$ for sufficiently large $k$, so $f(z_k)<f(x^\ast)$. Thus, for any open neighborhood of $x^\ast$, choosing $k$ large enough places the feasible point $z_k$ in that neighborhood with $f(z_k)<f(x^\ast)$, contradicting the assumption that $x^\ast$ is a local solution. This completes the proof.
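Theorem 1 can be illustrated by sampling directions. In the hypothetical problem $\min x_1+x_2$ subject to $c(x)=x_1^2+x_2^2-2\le 0$, LICQ holds on the boundary, so by Lemma 1 the tangent cone at a boundary point equals $\mathcal{F}(x)=\{d\mid\nabla c(x)^\mathrm{T}d\le 0\}$. The Python/NumPy sketch below samples unit directions in that set and reports the smallest $\nabla f^\mathrm{T}d$: it is nonnegative at the minimizer $(-1,-1)$ but negative at the boundary point $(0,-\sqrt{2})$, which therefore cannot be a local solution. Random sampling is only a heuristic check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
grad_f = lambda x: np.array([1.0, 1.0])            # f(x) = x1 + x2 (hypothetical)
grad_c = lambda x: np.array([2 * x[0], 2 * x[1]])  # c(x) = x1^2 + x2^2 - 2 <= 0

def min_directional_derivative(x, samples=10_000):
    """Smallest grad f(x)^T d over random unit d in F(x), for x on the boundary c(x) = 0."""
    D = rng.normal(size=(samples, 2))
    D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit directions
    in_F = D @ grad_c(x) <= 0                      # linearized feasible directions
    return (D[in_F] @ grad_f(x)).min()

print(min_directional_derivative(np.array([-1.0, -1.0])))          # >= 0 at the minimizer
print(min_directional_derivative(np.array([0.0, -np.sqrt(2.0)])))  # < 0: not a local solution
```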
Lemma 2 (Farkas' lemma). Consider the cone $K=\{By+Cw\mid y\ge 0\}$, where $B$ and $C$ are $n\times m$ and $n\times p$ matrices, respectively, and $y$ and $w$ are vectors of appropriate dimensions. For any vector $g\in\mathbb{R}^n$, exactly one of the following holds: either $g\in K$, or there exists $d\in\mathbb{R}^n$ such that

$$g^\mathrm{T}d<0,\quad B^\mathrm{T}d\ge 0,\quad C^\mathrm{T}d=0\tag{12}$$
Proof: First, the two alternatives cannot hold simultaneously. If $g\in K$, there exist vectors $y\ge 0$ and $w$ such that $g=By+Cw$. If there were also a $d$ satisfying (12), then, since $B^\mathrm{T}d\ge 0$, $y\ge 0$ and $C^\mathrm{T}d=0$,

$$0>d^\mathrm{T}g=d^\mathrm{T}By+d^\mathrm{T}Cw=(B^\mathrm{T}d)^\mathrm{T}y+(C^\mathrm{T}d)^\mathrm{T}w\ge 0,$$

a contradiction. Hence the two alternatives cannot both hold.
Next we show that (12) holds whenever $g\notin K$. Since $K$ is closed, let $\hat{s}$ be the vector in $K$ closest to $g$, i.e., the solution of

$$\min_s\ \Vert s-g\Vert_2^2\quad\text{subject to}\quad s\in K\tag{13}$$
Since $\hat{s}\in K$ and $K$ is a cone, $\alpha\hat{s}\in K$ for every $\alpha\ge 0$, and over $\alpha\ge 0$ the function $\Vert\alpha\hat{s}-g\Vert_2^2$ is minimized at the interior point $\alpha=1$. Therefore

$$\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\Vert\alpha\hat{s}-g\Vert_2^2\right|_{\alpha=1}=0\ \Rightarrow\ \left.\left(-2\hat{s}^\mathrm{T}g+2\alpha\hat{s}^\mathrm{T}\hat{s}\right)\right|_{\alpha=1}=0\ \Rightarrow\ \hat{s}^\mathrm{T}(\hat{s}-g)=0\tag{14}$$
Let $s$ be any other vector in $K$. Since $K$ is convex, for every $\theta\in[0,1]$,

$$\Vert\hat{s}+\theta(s-\hat{s})-g\Vert_2^2\ge\Vert\hat{s}-g\Vert_2^2,$$

that is,

$$2\theta(s-\hat{s})^\mathrm{T}(\hat{s}-g)+\theta^2\Vert s-\hat{s}\Vert_2^2\ge 0.$$
Dividing both sides by $\theta$ and letting $\theta\to 0^+$ gives $(s-\hat{s})^\mathrm{T}(\hat{s}-g)\ge 0$. Combined with (14), this shows that for every $s\in K$,

$$s^\mathrm{T}(\hat{s}-g)\ge 0\tag{15}$$
We now show that the vector $d=\hat{s}-g$ satisfies (12). Since $g\notin K$ and $\hat{s}\in K$, we have $d\ne 0$, so by (14)

$$d^\mathrm{T}g=d^\mathrm{T}(\hat{s}-d)=(\hat{s}-g)^\mathrm{T}\hat{s}-d^\mathrm{T}d=-\Vert d\Vert_2^2<0\tag{16}$$
By (15), $d^\mathrm{T}s\ge 0$ for every $s\in K$, so for all $y\ge 0$ and all $w$,

$$d^\mathrm{T}(By+Cw)\ge 0.$$
Taking $y=0$ gives $(C^\mathrm{T}d)^\mathrm{T}w\ge 0$ for every $w$; applying this to both $w$ and $-w$ yields $C^\mathrm{T}d=0$. Taking $w=0$ gives $(B^\mathrm{T}d)^\mathrm{T}y\ge 0$ for every $y\ge 0$, hence $B^\mathrm{T}d\ge 0$. Together with (16), $d$ satisfies (12), which completes the proof.
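The constructive proof above (project $g$ onto $K$, then take $d=\hat{s}-g$) can be followed directly on small examples. The Python/NumPy sketch below is a hypothetical implementation with names of our choosing: it solves (13) by brute force over the possible supports of $y$, which is exact when the columns of $[B,\ C]$ are linearly independent (the situation LICQ creates in the application below) but costs $2^m$ least-squares solves, so a proper nonnegative least-squares solver would be preferable in practice. It returns either $(y,w)$ with $g=By+Cw$, or a vector $d$ satisfying (12). The example data come from the hypothetical problem $\min x_1+x_2$ subject to $x_1^2+x_2^2-2\le 0$, with $B=-\nabla c(x)$, no equality constraints, and $g=\nabla f(x)$.

```python
import itertools
import numpy as np

def project_onto_cone(B, C, g):
    """Solve (13): nearest point s_hat of K = {B y + C w : y >= 0} to g.

    Brute force over the support S of y. Exact when the columns of [B, C]
    are linearly independent; intended only for small examples.
    """
    n, m = B.shape
    best = None
    for r in range(m + 1):
        for S in itertools.combinations(range(m), r):
            M = np.hstack([B[:, list(S)], C])  # columns usable on this support
            coef = (np.linalg.lstsq(M, g, rcond=None)[0]
                    if M.shape[1] else np.zeros(0))
            if np.any(coef[:r] < 0):           # y_S must be nonnegative
                continue
            s = M @ coef
            dist = np.linalg.norm(s - g)
            if best is None or dist < best[0]:
                y = np.zeros(m)
                y[list(S)] = coef[:r]
                best = (dist, s, y, coef[r:])
    return best[1], best[2], best[3]

def farkas_alternative(B, C, g, tol=1e-10):
    """Return ("in_K", y, w) with g = B y + C w, y >= 0, or ("d", d) satisfying (12)."""
    s_hat, y, w = project_onto_cone(B, C, g)
    d = s_hat - g
    if np.linalg.norm(d) <= tol:
        return ("in_K", y, w)
    return ("d", d)

g = np.array([1.0, 1.0])                         # grad f for f(x) = x1 + x2
C = np.zeros((2, 0))                             # no equality constraints here
B_opt = np.array([[2.0], [2.0]])                 # -grad c at x = (-1, -1)
B_bad = np.array([[0.0], [2.0 * np.sqrt(2.0)]])  # -grad c at x = (0, -sqrt(2))
print(farkas_alternative(B_opt, C, g))  # ('in_K', ...) with y close to 0.5
print(farkas_alternative(B_bad, C, g))  # ('d', ...) with d close to (-1, 0)
```

In the second case $d=(-1,0)$ is a linearized feasible direction along which $f$ decreases, which is exactly the alternative that the KKT proof rules out at a local solution.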
Apply Lemma 2 with $B=[-\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}$, $y=[\lambda_i]_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}$, $C=[-\nabla c_i(x^\ast)]_{i\in\mathcal{E}}$ and $w=[\lambda_i]_{i\in\mathcal{E}}$. This defines the cone

$$\begin{aligned} N&=\{By+Cw\mid y\ge 0\}\\ &=\left\{-\sum_{i\in\mathcal{A}(x^\ast)}\lambda_i\nabla c_i(x^\ast),\ \lambda_i\ge 0\ \text{for}\ i\in\mathcal{A}(x^\ast)\cap\mathcal{I}\right\}\end{aligned}$$
Let $g=\nabla f(x^\ast)$. Then (recalling that $A^\mathrm{T}(x^\ast)=[\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)}$) either

$$\nabla f(x^\ast)=-\sum_{i\in\mathcal{A}(x^\ast)}\lambda_i\nabla c_i(x^\ast)=-A^\mathrm{T}(x^\ast)\lambda,\quad \lambda_i\ge 0\ \text{for}\ i\in\mathcal{A}(x^\ast)\cap\mathcal{I}\tag{17}$$

with $\lambda=[\lambda_i]_{i\in\mathcal{A}(x^\ast)}$,
or there exists $d$ with $d^\mathrm{T}\nabla f(x^\ast)<0$, $B^\mathrm{T}d=[-\nabla c_i(x^\ast)]^\mathrm{T}_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}d\ge 0$ and $C^\mathrm{T}d=[-\nabla c_i(x^\ast)]^\mathrm{T}_{i\in\mathcal{E}}d=0$, i.e., some $d\in\mathcal{F}(x^\ast)$ with $d^\mathrm{T}\nabla f(x^\ast)<0$.
Proof of the KKT conditions. By Theorem 1, $d^\mathrm{T}\nabla f(x^\ast)\ge 0$ for every $d\in T_{\Omega}(x^\ast)$. Since LICQ holds, Lemma 1 gives $\mathcal{F}(x^\ast)=T_{\Omega}(x^\ast)$, so $d^\mathrm{T}\nabla f(x^\ast)\ge 0$ for every $d\in\mathcal{F}(x^\ast)$. The second alternative of Lemma 2 is therefore impossible, so there exists $\lambda$ such that (17) holds. Define $\lambda^\ast$ by

$$\lambda_i^\ast=\begin{cases}\lambda_i, & i\in\mathcal{A}(x^\ast)\\ 0, & i\in\mathcal{I}\setminus\mathcal{A}(x^\ast)\end{cases}\tag{18}$$
We now check the KKT conditions one by one:
- Since $\lambda_i^\ast=0$ for the inactive constraints, $\sum_{i\in\mathcal{E}\cup\mathcal{I}}\lambda_i^\ast\nabla c_i(x^\ast)=\sum_{i\in\mathcal{A}(x^\ast)}\lambda_i\nabla c_i(x^\ast)=-\nabla f(x^\ast)$ by (17), so the first condition in (2) holds;
- Since $x^\ast$ is feasible, the second and third conditions in (2) hold;
- Since $\lambda_i^\ast\ge 0$ for $i\in\mathcal{A}(x^\ast)\cap\mathcal{I}$ and $\lambda_i^\ast=0$ for $i\in\mathcal{I}\setminus\mathcal{A}(x^\ast)$, we have $\lambda_i^\ast\ge 0$ for all $i\in\mathcal{I}$, so the fourth condition in (2) holds;
- Since $c_i(x^\ast)=0$ for $i\in\mathcal{A}(x^\ast)$ and $\lambda_i^\ast=0$ for $i\in\mathcal{I}\setminus\mathcal{A}(x^\ast)$, we have $\lambda_i^\ast c_i(x^\ast)=0$ for all $i\in\mathcal{E}\cup\mathcal{I}$, so the fifth condition in (2) holds. This completes the proof.
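For a small problem the constructive step (17)-(18) can be carried out directly. Under LICQ the columns of $A^\mathrm{T}(x^\ast)$ are linearly independent, so the $\lambda$ in (17), if it exists, is unique and can be found by least squares. The Python/NumPy sketch below does this and pads the inactive multipliers with zeros; the problem data and the function name `kkt_multipliers` are hypothetical. For $\min 2x_1+x_2$ subject to $x_1-x_2=0$, $x_1^2+x_2^2-2\le 0$ and $-x_1-5\le 0$, it recovers $\lambda^\ast=(-0.5,0.75,0)$ at the minimizer $(-1,-1)$ (an equality multiplier may be negative), while at the maximizer $(1,1)$ the inequality multiplier would be $-0.75$, so (17) has no admissible solution there.

```python
import numpy as np

def kkt_multipliers(grad_f, grads, kinds, active, tol=1e-9):
    """Solve (17) on the active set and pad with zeros as in (18).

    grads[i] = grad c_i(x*), kinds[i] in {"eq", "ineq"}, active[i] = (i in A(x*)).
    Returns lambda* (one entry per constraint), or None if no lambda satisfies (17).
    """
    lam = np.zeros(len(grads))                           # (18): zero for inactive i
    idx = [i for i, a in enumerate(active) if a]
    if not idx:                                          # no active constraints
        return lam if np.linalg.norm(grad_f) <= tol else None
    At = np.column_stack([grads[i] for i in idx])        # A^T(x*) = [grad c_i(x*)]
    lam_A = np.linalg.lstsq(At, -grad_f, rcond=None)[0]  # unique under LICQ
    if np.linalg.norm(At @ lam_A + grad_f) > tol:        # grad f = -A^T lambda must hold
        return None
    if any(lam_A[j] < -tol for j, i in enumerate(idx) if kinds[i] == "ineq"):
        return None                                      # sign condition in (17) fails
    lam[idx] = lam_A
    return lam

# Hypothetical problem: min 2 x1 + x2  s.t.  x1 - x2 = 0,  x1^2 + x2^2 - 2 <= 0,  -x1 - 5 <= 0
kinds = ["eq", "ineq", "ineq"]
grad_f = np.array([2.0, 1.0])
grads_min = [np.array([1.0, -1.0]), np.array([-2.0, -2.0]), np.array([-1.0, 0.0])]  # at (-1, -1)
grads_max = [np.array([1.0, -1.0]), np.array([2.0, 2.0]), np.array([-1.0, 0.0])]    # at (1, 1)
print(kkt_multipliers(grad_f, grads_min, kinds, [True, True, False]))  # about [-0.5, 0.75, 0]
print(kkt_multipliers(grad_f, grads_max, kinds, [True, True, False]))  # None
```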
The key to the whole proof is to first establish the angular relation between $\nabla f(x^\ast)$ and $T_{\Omega}(x^\ast)$ (Theorem 1), then replace $T_{\Omega}(x^\ast)$ with $\mathcal{F}(x^\ast)$ (Lemma 1, which requires LICQ), and finally use Farkas' lemma to show that $\nabla f(x^\ast)$ lies in the constructed cone $N$.