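Convex Optimization Study Notes
The KKT conditions are the core of convex optimization. This section derives them and explains their properties.
Study notes
1. Deriving the KKT conditions
Consider a general optimization problem: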
$$
\begin{aligned}
\min\quad & f_0(x)\\
(\text{P})\qquad \text{s.t.}\quad & f_i(x)\le 0,\qquad i=1,\cdots,m\\
& h_i(x)=0,\qquad i=1,\cdots,p
\end{aligned}
$$
The Lagrangian function:
$$
l(x,\lambda,v)=f_0(x)+\sum_{i=1}^m\lambda_if_i(x)+\sum_{i=1}^pv_ih_i(x)
$$
The dual function constructed from the Lagrangian, where the infimum is taken over the domain $D$ of the problem:
$$
g(\lambda,v)=\inf_{x\in D}l(x,\lambda,v)
$$
The corresponding dual problem is:
$$
\begin{aligned}
\max\quad & g(\lambda,v)\\
(\text{D})\qquad \text{s.t.}\quad & \lambda\ge 0
\end{aligned}
$$
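As a quick sanity check on these definitions, the sketch below evaluates the dual function for a toy problem that is not part of the notes: minimize $x^2$ subject to $1-x\le0$. The closed-form minimizer $x=\lambda/2$ of the Lagrangian is worked out by hand for this toy case only, and the function names are illustrative.

```python
# Toy problem (an illustrative assumption, not from the notes):
#   minimize x^2  subject to  1 - x <= 0
# Primal optimum: x* = 1, p* = 1.

def lagrangian(x, lam):
    """l(x, lambda) = f_0(x) + lambda * f_1(x)."""
    return x**2 + lam * (1.0 - x)

def dual(lam):
    """g(lambda) = inf_x l(x, lambda).

    l is a convex quadratic in x, so the infimum is attained where
    dl/dx = 2x - lambda = 0, i.e. at x = lambda / 2.
    """
    return lagrangian(lam / 2.0, lam)

p_star = 1.0

# Scan lambda >= 0 on a grid to approximate the dual problem (D).
grid = [k / 100.0 for k in range(0, 501)]
best_lam = max(grid, key=dual)
d_star = dual(best_lam)

# Weak duality: g(lambda) <= p* for every lambda >= 0 on the grid.
weak_duality_holds = all(dual(lam) <= p_star + 1e-12 for lam in grid)

print(best_lam, d_star, weak_duality_holds)
```

Here $g(\lambda)=\lambda-\lambda^2/4$, which never exceeds $p^*=1$ and reaches it at $\lambda=2$, so this toy problem has $p^*=d^*$.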
We make two assumptions:
- $p^*=d^*$ (strong duality holds).
- All functions are differentiable.
Then, for optimal solutions $x^*,\lambda^*,v^*$, the following must hold:
$$
\begin{aligned}
f_i(x^*)&\le0,\qquad i=1,\cdots,m\\
h_i(x^*)&=0,\qquad i=1,\cdots,p\\
\lambda^*&\ge0
\end{aligned}
$$
From assumption 1, we derive:
$$
\begin{aligned}
&&p^*&=d^*\\
\Leftrightarrow&&f_0(x^*)&=g(\lambda^*,v^*)\\
&&&=\inf_x\Big\lbrace f_0(x)+\sum^m_{i=1}\lambda_i^*f_i(x)+\sum_{i=1}^pv_i^*h_i(x)\Big\rbrace\\
&&&\le f_0(x^*)+\sum^m_{i=1}\lambda_i^*f_i(x^*)+\sum_{i=1}^pv_i^*h_i(x^*)\\
&&&\le f_0(x^*)
\end{aligned}
$$
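The first inequality holds because the infimum of an expression over $x$ is no larger than its value at any particular point, here $x^*$. The second holds because $\lambda_i^*\ge0$, $f_i(x^*)\le0$, and $h_i(x^*)=0$.
The leftmost and rightmost sides of this chain are both $f_0(x^*)$, so every inequality in it must hold with equality.
In particular, since each term $\lambda_i^*f_i(x^*)$ is nonpositive, we obtain:
$$
\begin{aligned}
&&\sum^m_{i=1}\lambda_i^*f_i(x^*)&=0\\
\Leftrightarrow&&\lambda_i^*f_i(x^*)&=0,\qquad i=1,\cdots,m
\end{aligned}
$$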
This gives the complementary slackness condition of the KKT conditions:
- $\lambda_i^*>0\Rightarrow f_i(x^*)=0$
- $f_i(x^*)<0\Rightarrow\lambda_i^*=0$
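The two implications can be seen side by side on a small example. The sketch below uses a made-up problem with one active and one inactive inequality constraint; the optimum and multipliers are derived by hand for this toy case and are assumptions of the example, not results from the notes.

```python
# Toy problem (an illustrative assumption, not from the notes):
#   minimize x^2  s.t.  f_1(x) = 1 - x <= 0,  f_2(x) = x - 5 <= 0
# Hand-derived optimum: x* = 1 with multipliers lambda* = (2, 0).
constraints = [lambda x: 1.0 - x, lambda x: x - 5.0]
x_star = 1.0
lam_star = [2.0, 0.0]

report = []
for f_i, lam_i in zip(constraints, lam_star):
    value = f_i(x_star)
    report.append({
        "f_i(x*)": value,
        "lambda_i*": lam_i,
        "active": value == 0.0,
        # Complementary slackness: the product must vanish.
        "slack_ok": abs(lam_i * value) == 0.0,
    })

for row in report:
    print(row)
```

The first constraint is active and carries a positive multiplier, while the second is strictly satisfied and its multiplier is zero, so both products $\lambda_i^*f_i(x^*)$ vanish.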
At the same time, we also obtain:
$$
\begin{aligned}
&&\inf_x\Big\lbrace f_0(x)+\sum^m_{i=1}\lambda_i^*f_i(x)+\sum_{i=1}^pv_i^*h_i(x)\Big\rbrace&=f_0(x^*)\\
\Leftrightarrow&&\inf_xl(x,\lambda^*,v^*)&=f_0(x^*)\\
\Rightarrow&&\frac{\partial l(x,\lambda^*,v^*)}{\partial x}\bigg|_{x=x^*}&=0
\end{aligned}
$$
The last step is an implication rather than an equivalence: since $l(x^*,\lambda^*,v^*)=f_0(x^*)$, the point $x^*$ minimizes the differentiable function $l(x,\lambda^*,v^*)$ over $x$, so its gradient vanishes there.
Thus we have derived the stationarity condition of the KKT conditions:
- $\frac{\partial l(x,\lambda^*,v^*)}{\partial x}\bigg|_{x=x^*}=0$
Adding the feasibility constraints of the primal problem, we obtain the complete KKT conditions:
$$
\text{KKT}=
\begin{cases}
f_i(x^*)\le0,\quad i=1,\cdots,m&\text{Primal feasibility}\\
h_i(x^*)=0,\quad i=1,\cdots,p&\text{Primal feasibility}\\
\lambda^*\ge0&\text{Dual feasibility}\\
\lambda_i^*>0\Rightarrow f_i(x^*)=0&\text{Complementary slackness}\\
f_i(x^*)<0\Rightarrow\lambda_i^*=0&\text{Complementary slackness}\\
\frac{\partial l(x,\lambda^*,v^*)}{\partial x}\Big|_{x=x^*}=0&\text{Stationarity}
\end{cases}
$$
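These conditions can be checked mechanically at a candidate point. The following is a minimal sketch, assuming the caller supplies the constraint functions and their gradients as plain Python callables; `check_kkt` and the toy problem are hypothetical names and data chosen for illustration.

```python
def check_kkt(grad_f0, ineqs, grad_ineqs, eqs, grad_eqs, x, lam, nu, tol=1e-8):
    """Return a dict of booleans, one per KKT condition, at (x, lam, nu).

    All gradients are supplied by the caller as functions returning lists.
    """
    n = len(x)
    primal = (all(f(x) <= tol for f in ineqs)
              and all(abs(h(x)) <= tol for h in eqs))
    dual = all(l >= -tol for l in lam)
    slack = all(abs(l * f(x)) <= tol for l, f in zip(lam, ineqs))
    # Gradient of the Lagrangian with respect to x, evaluated at x.
    grad_l = list(grad_f0(x))
    for l, g in zip(lam, grad_ineqs):
        gi = g(x)
        for j in range(n):
            grad_l[j] += l * gi[j]
    for v, g in zip(nu, grad_eqs):
        gi = g(x)
        for j in range(n):
            grad_l[j] += v * gi[j]
    stationary = all(abs(c) <= tol for c in grad_l)
    return {"primal_feasibility": primal, "dual_feasibility": dual,
            "complementary_slackness": slack, "stationarity": stationary}

# Toy problem (illustrative): minimize x1^2 + x2^2
#   s.t.  -x1 <= 0,  x1 + x2 - 2 = 0.
grad_f0 = lambda x: [2 * x[0], 2 * x[1]]
ineqs = [lambda x: -x[0]]
grad_ineqs = [lambda x: [-1.0, 0.0]]
eqs = [lambda x: x[0] + x[1] - 2.0]
grad_eqs = [lambda x: [1.0, 1.0]]

good = check_kkt(grad_f0, ineqs, grad_ineqs, eqs, grad_eqs,
                 x=[1.0, 1.0], lam=[0.0], nu=[-2.0])
bad = check_kkt(grad_f0, ineqs, grad_ineqs, eqs, grad_eqs,
                x=[2.0, 0.0], lam=[0.0], nu=[-2.0])
print(good)
print(bad)
```

The point $(1,1)$ with $\lambda=0$, $v=-2$ passes all four checks, while the feasible point $(2,0)$ with the same multipliers fails stationarity.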
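However, the KKT conditions derived here are only necessary conditions for $p^*=d^*$: a point obtained by solving the KKT system is not necessarily a solution of the original problem. Next we examine when the KKT conditions become necessary and sufficient.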
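To see how a KKT point can fail to be optimal without convexity, consider a nonconvex toy problem that is not from the notes: minimize $-x^2$ subject to $x^2-1\le0$. The sketch below tests three hand-picked candidate pairs $(x,\lambda)$; the helper name is illustrative.

```python
# Nonconvex toy problem (an illustrative assumption, not from the notes):
#   minimize -x^2  subject to  x^2 - 1 <= 0
f0 = lambda x: -x**2
f1 = lambda x: x**2 - 1.0

def satisfies_kkt(x, lam, tol=1e-12):
    primal = f1(x) <= tol
    dual = lam >= -tol
    slack = abs(lam * f1(x)) <= tol
    # d/dx of the Lagrangian -x^2 + lam * (x^2 - 1)
    stationary = abs(-2.0 * x + lam * 2.0 * x) <= tol
    return primal and dual and slack and stationary

candidates = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
kkt_points = [(x, lam) for x, lam in candidates if satisfies_kkt(x, lam)]
values = {x: f0(x) for x, _ in kkt_points}
print(values)
```

All three candidates satisfy the KKT conditions, yet $x=0$ has objective value $0$ while $x=\pm1$ attain $-1$, so $x=0$ is a KKT point that does not solve the problem.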
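2. When are the KKT conditions necessary and sufficient for $p^*=d^*$?
If the original problem is convex and all functions are differentiable, then the KKT conditions are necessary and sufficient for $p^*=d^*$.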
Proof:
Only sufficiency needs to be shown. Suppose $(x^*,\lambda^*,v^*)$ satisfies the KKT conditions; we need to prove:
$$
\begin{aligned}
&&d^*&=p^*\\
\Leftrightarrow&&g(\lambda^*,v^*)&=f_0(x^*)
\end{aligned}
$$
The KKT conditions are:
$$
\text{KKT}=
\begin{cases}
f_i(x^*)\le0,\quad i=1,\cdots,m&\text{Primal feasibility}\\
h_i(x^*)=0,\quad i=1,\cdots,p&\text{Primal feasibility}\\
\lambda^*\ge0&\text{Dual feasibility}\\
\lambda_i^*>0\Rightarrow f_i(x^*)=0&\text{Complementary slackness}\\
f_i(x^*)<0\Rightarrow\lambda_i^*=0&\text{Complementary slackness}\\
\frac{\partial l(x,\lambda^*,v^*)}{\partial x}\Big|_{x=x^*}=0&\text{Stationarity}
\end{cases}
$$
For $g(\lambda^*,v^*)$ we have:
$$
\begin{aligned}
g(\lambda^*,v^*)&=\inf_xl(x,\lambda^*,v^*)\\
&=l(x^*,\lambda^*,v^*)&&\text{(by stationarity)}\\
&=f_0(x^*)+\sum^m_{i=1}\lambda_i^*f_i(x^*)+\sum_{i=1}^pv_i^*h_i(x^*)\\
&=f_0(x^*)&&\text{(by primal feasibility and complementary slackness)}
\end{aligned}
$$
The stationarity step is where convexity is used: since $\lambda^*\ge0$, each $f_i$ is convex, and each $h_i$ is affine, $l(x,\lambda^*,v^*)$ is convex in $x$, so a point where its gradient vanishes is a global minimizer.
Q.E.D.
A big-picture view of where the KKT conditions apply: [figure]
3. An example of solving the KKT conditions
Quadratic program:
$$
\begin{aligned}
\min\quad & \frac12x^T\mathbf Px+q^Tx+r,\qquad \mathbf P\in\mathbf S_+^n\\
\text{s.t.}\quad & \mathbf Ax-b=0
\end{aligned}
$$
Its KKT conditions (the problem as stated has only equality constraints, so there are no inequality, dual feasibility, or complementary slackness conditions):
$$
\text{KKT}=
\begin{cases}
\mathbf Ax^*-b=0&\text{Primal feasibility}\\
\mathbf Px^*+q+\mathbf A^Tv^*=0&\text{Stationarity}
\end{cases}
$$
These are linear equations in $(x^*,v^*)$, so all that remains is to solve them.
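Stacking the two KKT equations gives a single linear system with block matrix $\begin{bmatrix}\mathbf P&\mathbf A^T\\\mathbf A&0\end{bmatrix}$ and right-hand side $(-q,\,b)$. The sketch below assembles and solves it with a small hand-written Gaussian elimination so that it depends only on the standard library; the helper names and the example data are made up for illustration, and the sketch assumes the block matrix is nonsingular.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("KKT matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def solve_eq_qp(P, q, A, b):
    """Solve min 1/2 x^T P x + q^T x + r  s.t.  A x = b via its KKT system.

        [P  A^T] [x]   [-q]
        [A   0 ] [v] = [ b]

    The constant r does not affect the minimizer, so it is not an argument.
    """
    n, p = len(P), len(A)
    K = [[0.0] * (n + p) for _ in range(n + p)]
    for i in range(n):
        for j in range(n):
            K[i][j] = P[i][j]
    for i in range(p):
        for j in range(n):
            K[n + i][j] = A[i][j]      # A block
            K[j][n + i] = A[i][j]      # A^T block
    rhs = [-qi for qi in q] + list(b)
    z = solve_linear(K, rhs)
    return z[:n], z[n:]

# Example data (made up): minimize x1^2 + x2^2  s.t.  x1 + x2 = 2.
P = [[2.0, 0.0], [0.0, 2.0]]
q = [0.0, 0.0]
A = [[1.0, 1.0]]
b = [2.0]
x_star, v_star = solve_eq_qp(P, q, A, b)
print(x_star, v_star)
```

For this data the solution is $x^*=(1,1)$ with $v^*=-2$, which can be confirmed by substituting into $\mathbf Px^*+q+\mathbf A^Tv^*=0$. In practice a numerical library's linear solver would replace `solve_linear`.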
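4. The relationship between the KKT conditions and the first-order condition for convex functions
Consider a general optimization problem: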
$$
\begin{aligned}
\min\quad & f_0(x)\\
\text{s.t.}\quad & f_i(x)\le 0,\qquad i=1,\cdots,m\\
& h_i(x)=0,\qquad i=1,\cdots,p
\end{aligned}
$$
Its first-order condition:
$$
\begin{cases}
f_i(x)\le0,&i=1,\cdots,m\\
h_i(x)=0,&i=1,\cdots,p\\
\lambda\ge0\\
x_i\big(\nabla f_0(x)\big)_i=0,&i=1,\cdots,n
\end{cases}
$$
Its KKT conditions:
$$
\text{KKT}=
\begin{cases}
f_i(x^*)\le0,\quad i=1,\cdots,m&\text{Primal feasibility}\\
h_i(x^*)=0,\quad i=1,\cdots,p&\text{Primal feasibility}\\
\lambda^*\ge0&\text{Dual feasibility}\\
\lambda_i^*>0\Rightarrow f_i(x^*)=0&\text{Complementary slackness}\\
f_i(x^*)<0\Rightarrow\lambda_i^*=0&\text{Complementary slackness}\\
\frac{\partial l(x,\lambda^*,v^*)}{\partial x}\Big|_{x=x^*}=0&\text{Stationarity}
\end{cases}
$$
Eliminating the dual variable $\lambda$ from the KKT conditions yields a system equivalent to the first-order condition.
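One case where this elimination can be carried out by hand is minimization over the nonnegative orthant. The derivation below assumes the problem is $\min f_0(x)$ subject to $x\ge0$ (that is, $f_i(x)=-x_i$ and no equality constraints) with $f_0$ convex and differentiable; this specific setting is an assumption made for illustration, since the notes do not spell out which problem the first-order system refers to.

$$
\begin{aligned}
&\text{Lagrangian:}&& l(x,\lambda)=f_0(x)-\sum_{i=1}^n\lambda_ix_i\\
&\text{KKT:}&& x^*\ge0,\quad \lambda^*\ge0,\quad \lambda_i^*x_i^*=0,\quad \nabla f_0(x^*)-\lambda^*=0\\
&\text{Substitute }\lambda^*=\nabla f_0(x^*)\text{:}&& x^*\ge0,\quad \nabla f_0(x^*)\ge0,\quad x_i^*\big(\nabla f_0(x^*)\big)_i=0,\quad i=1,\cdots,n
\end{aligned}
$$

After the substitution the multiplier no longer appears, and what remains is a condition on $x^*$ and $\nabla f_0(x^*)$ alone.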
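Personal thoughts
The KKT conditions are the core of convex optimization. They make constrained convex problems much easier to solve and point to a general approach for solving convex problems: in essence, every algorithm is solving the KKT conditions. This is the most important part of studying convex optimization.
Handwritten notes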