1 Introduction
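1.1 The duality principle
Duality is central to convex optimization. The primal problem may be hard to solve directly because its constraints are complicated, or it may not even be convex; the Lagrange dual function, by contrast, is always concave. So what is the principle behind the dual construction? [1]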
$$\left\{ \begin{aligned} & \min \quad f_0(x), \quad x \in \mathbb{R}^n \\ & \text{s.t.} \quad f_1(x) \leq 0 \end{aligned} \right.$$
Describe the problem geometrically in the two-dimensional plane of the pairs $(u, t)$, where $u$ denotes $f_1(x)$ and $t$ denotes $f_0(x)$. Collect all attainable pairs in the set $G=\{ (u, t) \mid u=f_1(x),\ t=f_0(x),\ x \in \mathbb{R}^n \}$; the feasible points are those with $u\leq 0$.
The optimal value $p^*$ is then
$$p^*=\inf \{ t \mid (u,t)\in G,\ u \leq 0 \}$$
The Lagrange dual proceeds much like finding a supporting hyperplane of this set:
$$g(\lambda) = \inf_{(u,t)\in G}(\lambda u+t), \quad \lambda \geq 0$$
For a fixed $\lambda$, consider the lines $\lambda u+t=b$ that meet the set, i.e. $\exists\, (u,t)\in G$ such that $\lambda u+t-b=0$. In the figure, three cases can arise.
Among these, $b_{max}$ is the value closest to $p^*$. Finding $b_{max}$ is, in principle, very similar to computing the conjugate function studied earlier.
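When does $b_{max}$ equal $p^*$?
When the primal problem is convex and a constraint qualification such as Slater's condition holds, strong duality is satisfied and the two values coincide.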
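The geometric picture can be checked numerically on a one-dimensional convex problem. The sketch below is only an illustration: the problem $\min x^2+1$ subject to $(x-2)(x-4)\leq 0$, the grids, and the function names are assumptions chosen for this example, not part of the original text. It approximates $g(\lambda)=\inf_x (f_0(x)+\lambda f_1(x))$ on a grid and compares the best lower bound with $p^*$.

```python
# Illustrative 1-D convex problem (an assumption, not from the text):
#   minimize x^2 + 1  subject to (x - 2)(x - 4) <= 0
def f0(x):
    return x * x + 1.0

def f1(x):
    return (x - 2.0) * (x - 4.0)

# Grid on [-1, 5]; it contains the minimizer of the Lagrangian for every lambda >= 0.
xs = [k / 1000.0 for k in range(-1000, 5001)]

def dual(lam):
    """g(lambda) = inf_x f0(x) + lambda * f1(x), approximated on the grid."""
    return min(f0(x) + lam * f1(x) for x in xs)

# Primal optimum: smallest objective value over the feasible grid points.
p_star = min(f0(x) for x in xs if f1(x) <= 0.0)

# Best lower bound over a grid of multipliers lambda in [0, 5].
lams = [k / 10.0 for k in range(0, 51)]
d_star, lam_star = max((dual(lam), lam) for lam in lams)

print(p_star, d_star, lam_star)
```

On this example the best lower bound and the primal optimum coincide (both equal 5, with the maximizing multiplier at $\lambda=2$), which is the situation where the supporting line touches $G$ at the optimal point.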
With the geometric principle in hand, weak duality in the general case is easy to understand. Consider the general optimization problem
$$\left\{ \begin{aligned} & \min \quad & f_0(x), \quad x \in \mathbb{R}^n \\ & \text{s.t.} \quad & f_i(x) \leq 0, \quad i=1,\dots,m \\ & \quad & h_i(x) =0, \quad i=1,\dots,p \end{aligned} \right.$$
To define the dual, let
$$t=f_0(x), \quad \mu = [f_1(x),\dots,f_m(x)]^T, \quad \upsilon=[h_1(x),\dots,h_p(x)]^T$$
$$\begin{aligned} g(\lambda, \nu) &=\inf_{(\mu, \upsilon, t)\in G} \left( t+\lambda^T \mu+\nu^T \upsilon \right) \\ & = \inf_{(\mu, \upsilon, t)\in G} (\lambda, \nu, 1)^T(\mu, \upsilon, t) \end{aligned}$$
The vector $(\lambda, \nu, 1)$ defines a supporting hyperplane of the set $G$ of attainable points $(\mu, \upsilon, t)$, so every point of $G$ satisfies
$$(\lambda, \nu, 1)^T(\mu, \upsilon, t) \geq g(\lambda, \nu)$$
Take $\lambda \geq 0$ and consider a feasible point, for which $\mu\leq 0$ and $\upsilon=0$. Then $\lambda^T\mu \leq 0$ and $\nu^T\upsilon=0$, so
$$t\geq(\lambda, \nu, 1)^T(\mu, \upsilon, t)\geq g(\lambda, \nu)$$
Taking the infimum of $t$ over all feasible points gives $p^* \geq g(\lambda, \nu)$, which is weak duality.
1.2 Properties of the dual function
The dual function has two important properties [2]:
1) $g(\lambda, \nu)$ is concave, whether or not the primal problem is convex;
2) $g(\lambda,\nu) \leq p^*$ holds for every $\lambda \geq 0$ and every $\nu$.
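For property 1: at each fixed point of $G$, $t+\lambda^T \mu+\nu^T \upsilon$ is an affine function of $(\lambda,\nu)$, and $g(\lambda,\nu)$ is the pointwise infimum of this family of affine functions (in two dimensions, the lower envelope of a set of lines, as in the figure), so it is concave. Property 2 is weak duality.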
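Both properties can be observed numerically even when the primal problem is not convex. The following sketch uses a made-up nonconvex objective and a grid approximation of the infimum; the problem data and names are assumptions for illustration only.

```python
# Illustrative nonconvex problem (an assumption, not from the text):
#   minimize x^4 - 3x^2 + x  subject to x^2 - 1 <= 0
def f0(x):
    return x**4 - 3.0 * x**2 + x

def f1(x):
    return x * x - 1.0

xs = [k / 1000.0 for k in range(-3000, 3001)]   # grid on [-3, 3]

def dual(lam):
    """Pointwise infimum over x of the affine-in-lambda functions f0(x) + lam * f1(x)."""
    return min(f0(x) + lam * f1(x) for x in xs)

p_star = min(f0(x) for x in xs if f1(x) <= 0.0)
lams = [k / 4.0 for k in range(0, 21)]          # 0, 0.25, ..., 5
g = [dual(lam) for lam in lams]

# Property 1 (concavity): each value is at least the average of its two neighbours.
concave = all(g[i] >= 0.5 * (g[i - 1] + g[i + 1]) - 1e-12 for i in range(1, len(g) - 1))
# Property 2 (weak duality): every g(lambda) is a lower bound on p*.
lower_bound = all(val <= p_star + 1e-12 for val in g)
print(concave, lower_bound)
```

Both flags should come out true: a minimum of finitely many affine functions of $\lambda$ is concave, and adding $\lambda f_1(x)\leq 0$ at feasible points can only lower the value.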
For a complicated problem whose optimum is hard to find directly, one can instead solve the dual problem
$$\left\{ \begin{aligned} & \max \quad & g(\lambda, \nu) \\ & \text{s.t.} \quad & \lambda \geq 0 \end{aligned} \right.$$
2 The dual problem
2.1 Converting to the dual problem
- LS problem
$$\left\{ \begin{aligned} & \min \quad & \|x\|^2 \\ & \text{s.t.} \quad & Ax=b \end{aligned} \right.$$
Form the Lagrangian:
$$\begin{aligned} L(x,\nu) &=\|x\|^2+\nu^T(Ax-b) \\ & = x^Tx+ \nu^T(Ax-b) \end{aligned}$$
Differentiate to find the minimum of $L(x,\nu)$ over $x$:
$$\frac{dL}{dx}=2x+A^T\nu=0$$
Substituting $x=-\frac{1}{2}A^T\nu$ determines $g(\nu)$:
$$\begin{aligned} g(\nu) &=\frac{1}{4}\nu^TAA^T\nu-\frac{1}{2}\nu^TAA^T\nu-\nu^Tb \\ &=-\frac{1}{4}\nu^TAA^T\nu-\nu^Tb \end{aligned}$$
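As a sanity check on this closed form, the sketch below compares it with the Lagrangian evaluated at the stationary point and confirms that it never exceeds the objective at a feasible point. The random data, the seed, and the helper names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # hypothetical data: 3 equations, 5 unknowns
b = rng.standard_normal(3)

def lagrangian(x, nu):
    return x @ x + nu @ (A @ x - b)

def dual(nu):
    """Closed form g(nu) = -1/4 nu^T A A^T nu - nu^T b."""
    return -0.25 * nu @ (A @ A.T) @ nu - nu @ b

# A feasible point of Ax = b (lstsq returns the minimum-norm solution).
x_feas = np.linalg.lstsq(A, b, rcond=None)[0]

for _ in range(5):
    nu = rng.standard_normal(3)
    x_min = -0.5 * A.T @ nu                        # stationary point of L in x
    assert np.isclose(lagrangian(x_min, nu), dual(nu))
    # moving away from the stationary point can only increase the Lagrangian
    assert lagrangian(x_min + 0.1 * rng.standard_normal(5), nu) >= dual(nu)
    # weak duality: g(nu) is a lower bound on ||x||^2 for feasible x
    assert dual(nu) <= x_feas @ x_feas + 1e-9
```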
The remaining task is to maximize $g(\nu)$.
- LP (standard form)
$$\left\{ \begin{aligned} & \min \quad & c^Tx \\ & \text{s.t.} \quad & Ax=b \\ & \quad & x\geq 0 \end{aligned} \right.$$
Form the Lagrangian:
$$\begin{aligned} L(x,\lambda,\nu) &=c^Tx+\lambda^T(-x)+\nu^T(Ax-b)\\ &=(c-\lambda+A^T\nu)^Tx-\nu^Tb \end{aligned}$$
This determines $g(\lambda, \nu)$: the Lagrangian is affine in $x$, so its infimum is $-\nu^Tb$ when $c-\lambda+A^T\nu=0$ and $-\infty$ otherwise.
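A tiny instance makes the dual function concrete. The sketch below assumes a hypothetical two-variable LP and evaluates $g(\lambda,\nu)$ as $-\nu^Tb$ when $c-\lambda+A^T\nu=0$ and $-\infty$ otherwise; the data, tolerance, and function names are illustrative choices.

```python
import numpy as np

# Hypothetical standard-form LP: minimize x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def dual(lam, nu, tol=1e-12):
    """g(lam, nu) = -nu^T b if c - lam + A^T nu = 0, else -inf."""
    if np.all(np.abs(c - lam + A.T @ nu) <= tol):
        return float(-nu @ b)
    return -np.inf

# The feasible set is the segment between (1, 0) and (0, 1); a linear
# objective attains its minimum at one of the two endpoints.
p_star = min(c @ np.array([1.0, 0.0]), c @ np.array([0.0, 1.0]))

# Scan nu, set lam = c + A^T nu, and keep only the choices with lam >= 0.
best = -np.inf
for nu0 in np.arange(-30, 31) / 10.0:
    nu = np.array([nu0])
    lam = c + A.T @ nu
    if np.all(lam >= 0):
        best = max(best, dual(lam, nu))
print(p_star, best)
```

Every finite dual value stays below the primal optimum, and on this instance the best one matches it.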
- two way partitioning
Binary variables are common in computing and electrical systems.
$$\left\{ \begin{aligned} & \min \quad & x^TWx \\ & \text{s.t.} \quad & x_i^2=1, \quad i=1,\dots,n \end{aligned} \right.$$
The constraint $x_i^2=1$ is equivalent to $x_i\in \{-1,1 \}$.
By definition, the Lagrangian is
$$\begin{aligned} L(x,\nu) &=x^TWx+\sum_{i=1}^{n} \nu_i(x_i^2-1) \\ &=x^TWx+x^T \begin{bmatrix} \nu_1 & 0 & \cdots & 0 \\ 0 & \nu_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \nu_n \end{bmatrix} x-\mathbf{1}^T\nu \\ & = x^T(W+V)x-\mathbf{1}^T\nu \end{aligned}$$
where $V=\mathrm{diag}(\nu)$ and $\mathbf{1}$ is the all-ones vector.
By the properties of quadratic functions,
$$g(\nu)=\inf_{x}L(x,\nu)= \left\{ \begin{aligned} & -\mathbf{1}^T\nu, & W+V \succeq 0 \\ & -\infty, & \text{otherwise} \end{aligned} \right.$$
where $V=\mathrm{diag}(\nu)$, $\mathbf{1}$ is the all-ones vector, and $W+V \succeq 0$ means that $W+V$ is positive semidefinite.
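Although the primal problem is combinatorial, any dual-feasible $\nu$ gives a lower bound. The sketch below, on a small random instance, uses the particular choice $\nu=-\lambda_{min}(W)\mathbf{1}$, which makes $W+\mathrm{diag}(\nu)$ positive semidefinite and yields the bound $n\lambda_{min}(W)$; it then compares that bound with a brute-force optimum. The instance, seed, and names are assumptions for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
W = (M + M.T) / 2.0                     # hypothetical symmetric weight matrix

# Primal optimum by enumerating all 2^n sign vectors (only viable for small n).
p_star = min(
    np.array(s) @ W @ np.array(s)
    for s in itertools.product([-1.0, 1.0], repeat=n)
)

def dual(nu, tol=1e-9):
    """g(nu) = -1^T nu if W + diag(nu) is positive semidefinite, else -inf."""
    if np.linalg.eigvalsh(W + np.diag(nu)).min() >= -tol:
        return float(-nu.sum())
    return -np.inf

# One dual-feasible choice: nu = -lambda_min(W) * 1.
lam_min = np.linalg.eigvalsh(W).min()
bound = dual(-lam_min * np.ones(n))
print(bound, p_star)
```

The bound is generally not tight here, since the primal problem is not convex and a duality gap is possible.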
2.2 Relationship between the dual and the conjugate function
2.2.1 Theory
Consider a problem of the form
$$\left\{ \begin{aligned} & \min \quad & f_0(x) \\ & \text{s.t.} \quad & x=0 \end{aligned} \right.$$
Its Lagrangian and dual function are
$$\begin{aligned} L(x,\nu) &=f_0(x)+\nu^T x \\ g(\nu) &=\inf_{x}\{ f_0(x)+ \nu^T x \} \\ &=-\sup_{x} \{ -f_0(x)-\nu^T x \} \\ &=-f_0^*(-\nu) \end{aligned}$$
Recall the definition of the conjugate function:
$$f^*(y)=\sup_{x \in \mathrm{dom}\, f}(y^Tx-f(x))$$
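The identity $g(\nu)=-f_0^*(-\nu)$ is easy to confirm on a scalar example. The sketch below assumes $f_0(x)=x^2$, whose conjugate is $f_0^*(y)=y^2/4$, and approximates both the supremum and the infimum on a grid; the example function and the grid are illustrative assumptions.

```python
# Hypothetical scalar example: f0(x) = x^2, whose conjugate is f0*(y) = y^2 / 4.
def f0(x):
    return x * x

xs = [k / 1000.0 for k in range(-5000, 5001)]   # grid on [-5, 5]

def conjugate(y):
    """f0*(y) = sup_x (y*x - f0(x)), approximated on the grid."""
    return max(y * x - f0(x) for x in xs)

def dual(nu):
    """g(nu) = inf_x (f0(x) + nu*x) for the problem min f0(x) s.t. x = 0."""
    return min(f0(x) + nu * x for x in xs)

for nu in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(dual(nu) - (-conjugate(-nu))) < 1e-9     # g(nu) = -f0*(-nu)
    assert abs(conjugate(nu) - nu * nu / 4.0) < 1e-6    # known closed form
```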
For a more general problem with affine constraints,
$$\left\{ \begin{aligned} & \min \quad & f_0(x) \\ & \text{s.t.} \quad & Ax\leq b \\ & \quad & Cx=d \end{aligned} \right.$$
the dual function follows from the Lagrangian:
$$\begin{aligned} g(\lambda, \nu) &=\inf_{x} \{f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d) \}\\ &=\inf_{x} \{(A^T\lambda+C^T\nu)^Tx+f_0(x) \}-\lambda^Tb-\nu^Td \\ &=-\sup_{x} \{ (-A^T\lambda-C^T\nu)^Tx-f_0(x) \}-\lambda^Tb-\nu^Td \\ &=-f_0^*(-A^T\lambda-C^T\nu) -\lambda^Tb-\nu^Td \end{aligned}$$
The domain of $g$ is therefore
$$\mathrm{dom}\, g=\{ (\lambda, \nu) \mid -A^T\lambda-C^T\nu \in \mathrm{dom}\, f_0^* \}$$
2.2.2 applications
- Equality constrained norm minimization
The link between the Lagrange dual and the conjugate function makes it quick to find the domain of the dual function.
$$\left\{ \begin{aligned} & \min \quad & \|x\|\\ & \text{s.t.} \quad & Ax= b \end{aligned} \right.$$
For $f_0(x)=\|x\|$, the conjugate function is
$$\begin{aligned} f_0^*(y) &=\sup_{x}\{ y^Tx-\|x\| \} \\ & = \left\{ \begin{aligned} & 0, & \|y\|_*\leq 1 \\ & \infty, & \text{otherwise} \end{aligned} \right. \end{aligned}$$
where $\|\cdot\|_*$ is the dual norm (for the Euclidean norm, the dual norm is again the Euclidean norm).
Using the link between the dual and the conjugate function,
$$\begin{aligned} g(\nu) &=-f_0^*(-A^T\nu)-b^T\nu \\ &= \left\{ \begin{aligned} &-b^T\nu, & \|A^T\nu\|_* \leq 1 \\ &- \infty, & \text{otherwise} \end{aligned} \right. \end{aligned}$$
This gives the dual form.
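For the Euclidean norm the result can be checked in closed form. The sketch below assumes random full-row-rank data, takes the minimum-norm solution $x^*=A^T(AA^T)^{-1}b$ as the primal optimum, and builds a dual point by scaling $-(AA^T)^{-1}b$ onto the boundary $\|A^T\nu\|_2=1$; the data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))     # hypothetical full-row-rank data
b = rng.standard_normal(3)

# Primal: the minimum Euclidean-norm solution of Ax = b.
w = np.linalg.solve(A @ A.T, b)
x_star = A.T @ w
p_star = np.linalg.norm(x_star)

def dual(nu, tol=1e-9):
    """g(nu) = -b^T nu if ||A^T nu||_2 <= 1, else -inf."""
    return float(-b @ nu) if np.linalg.norm(A.T @ nu) <= 1.0 + tol else -np.inf

# A dual-feasible candidate scaled onto the boundary ||A^T nu||_2 = 1.
nu_star = -w / np.linalg.norm(A.T @ w)
d_star = dual(nu_star)
print(p_star, d_star)
```

The two printed values agree up to rounding, so this dual point certifies that $x^*$ is optimal.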
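2.3 The dual form
2.3.1 Definition
An optimization problem is converted into its dual form as follows:
$$\left\{ \begin{aligned} & \max \quad & g(\lambda, \nu) \\ & \text{s.t.} \quad & \lambda \geq 0 \end{aligned} \right.$$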
Now consider the domain and the dimension of the dual problem:
$$\begin{aligned} \mathrm{dom}\, g&=\{ (\lambda, \nu) \mid g(\lambda, \nu) > - \infty \} \\ \dim (\mathrm{dom}\, g) &\leq m+p \end{aligned}$$
2.3.2 applications
- standard LP
$$P) \quad \left\{ \begin{aligned} & \min \quad & c^Tx\\ & \text{s.t.} \quad & Ax= b \\ & \quad & x \geq 0 \end{aligned} \right.$$
The Lagrangian is
$$\begin{aligned} L(x,\lambda, \nu) &=c^Tx+\lambda^T(-x)+\nu^T(Ax-b) \\ & = (c-\lambda+A^T\nu)^Tx-\nu^Tb \end{aligned}$$
so the dual function $g(\lambda, \nu)$ is
$$g(\lambda, \nu) = \left\{ \begin{aligned} & -\nu^T b, & A^T\nu-\lambda+c=0 \\ & -\infty, & \text{otherwise} \end{aligned} \right.$$
The dual problem is
$$D) \quad \left\{ \begin{aligned} & \max \quad &-\nu^Tb\\ & \text{s.t.} \quad & A^T\nu-\lambda+c=0 \\ & \quad & \lambda \geq0 \end{aligned} \right.$$
Eliminating $\lambda=A^T\nu+c$, this simplifies to
$$D) \quad \left\{ \begin{aligned} & \max \quad &-\nu^Tb\\ & \text{s.t.} \quad & A^T\nu+c\geq 0 \end{aligned} \right.$$
- inequality form of LP
$$P) \quad \left\{ \begin{aligned} & \min \quad & c^Tx\\ & \text{s.t.} \quad & Ax\leq b \end{aligned} \right.$$
The Lagrangian is
$$\begin{aligned} L(x,\lambda) &=c^Tx+\lambda^T(Ax-b) \\ & = (c+A^T\lambda)^Tx-\lambda^Tb \end{aligned}$$
so the dual function $g(\lambda)$ is
$$g(\lambda) = \left\{ \begin{aligned} & -\lambda^T b, & A^T\lambda+c=0 \\ & -\infty, & \text{otherwise} \end{aligned} \right.$$
The dual problem is
$$D) \quad \left\{ \begin{aligned} & \max \quad &-\lambda^Tb\\ & \text{s.t.} \quad & A^T\lambda+c=0 \\ & \quad & \lambda \geq0 \end{aligned} \right.$$
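An interesting pattern shows up here: the dual of the standard-form LP is an LP in inequality form, and the dual of the inequality-form LP is an LP in standard form.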
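The inequality-form pair can be verified by hand on a small box-constrained LP. In the sketch below the problem data are assumptions: the primal optimum is found by checking the four vertices of the unit box, and a few dual-feasible multipliers are evaluated to show that each gives a lower bound.

```python
import itertools
import numpy as np

# Hypothetical inequality-form LP: minimize -x1 - 2*x2 over the unit box 0 <= x <= 1.
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

# Primal optimum: a bounded LP attains its minimum at a vertex of the box.
vertices = [np.array(v) for v in itertools.product([0.0, 1.0], repeat=2)]
p_star = min(float(c @ v) for v in vertices)

def dual_objective(lam):
    """-lam^T b, a valid lower bound only when A^T lam + c = 0 and lam >= 0."""
    assert np.allclose(A.T @ lam + c, 0.0) and np.all(lam >= 0.0)
    return float(-lam @ b)

# Dual-feasible points satisfy lam1 - lam3 = 1 and lam2 - lam4 = 2.
candidates = [np.array([1.0 + s, 2.0 + t, s, t]) for s in (0.0, 0.5) for t in (0.0, 1.0)]
d_values = [dual_objective(lam) for lam in candidates]
print(p_star, max(d_values))
```

The candidate with $s=t=0$ attains the primal optimum, and the others give strictly weaker bounds.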
2.4 Slater's condition
2.4.1 Definition
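Slater's condition is easy to understand geometrically. For a convex problem it asks for a strictly feasible point, i.e. an $x$ with $f_i(x) < 0$ for $i=1,\dots,m$ and $h_i(x)=0$ for $i=1,\dots,p$; when it holds, $p^*=d^*$. According to [1], Slater's condition is only a sufficient condition: there are problems that violate it and still satisfy strong duality.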
2.4.2 applications
- QCQP
$$P) \quad \left\{ \begin{aligned} & \min \quad & \frac{1}{2}x^Tp_0x+q_0^Tx+r_0\\ & \text{s.t.} \quad & \frac{1}{2}x^Tp_ix+q_i^Tx+r_i \leq 0, \quad i=1,\dots,m \end{aligned} \right.$$
To simplify the computation, define
$$\begin{aligned} p(\lambda) &=p_0+\sum_i \lambda_ip_i \\ q(\lambda) &=q_0+\sum_i \lambda_iq_i \\ r(\lambda) & = r_0+\sum_i \lambda_i r_i \end{aligned}$$
The Lagrangian is then
$$\begin{aligned} L(x,\lambda)&=\frac{1}{2}x^Tp_0x+q_0^Tx+r_0 + \sum_i \lambda_i\left( \frac{1}{2}x^Tp_ix+q_i^Tx+r_i \right) \\ &=\frac{1}{2}x^Tp(\lambda)x+q(\lambda)^Tx+r(\lambda) \end{aligned}$$
Minimize $L(x,\lambda)$ over $x$ by setting the gradient to zero: $p(\lambda)x+q(\lambda)=0$, which gives $x_0=-p(\lambda)^{-1}q(\lambda)$ (assuming $p(\lambda)$ is positive definite).
Substituting back,
$$g(\lambda) =-\frac{1}{2}q(\lambda)^Tp(\lambda)^{-1}q(\lambda)+r(\lambda)$$
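This closed form can be checked against a direct evaluation of the Lagrangian. The sketch below assumes random positive definite matrices so that $p(\lambda)$ is invertible; the dimensions, seed, and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2

def random_pd(k):
    B = rng.standard_normal((k, k))
    return B @ B.T + np.eye(k)           # symmetric positive definite

# Hypothetical problem data: index 0 is the objective, 1..m the constraints.
P = [random_pd(n) for _ in range(m + 1)]
q = [rng.standard_normal(n) for _ in range(m + 1)]
r = [float(rng.standard_normal()) for _ in range(m + 1)]

def lagrangian(x, lam):
    terms = [0.5 * x @ P[i] @ x + q[i] @ x + r[i] for i in range(m + 1)]
    return terms[0] + sum(lam[i - 1] * terms[i] for i in range(1, m + 1))

def dual(lam):
    """Return g(lam) = -1/2 q(lam)^T p(lam)^{-1} q(lam) + r(lam) and the minimizer x0."""
    P_l = P[0] + sum(lam[i - 1] * P[i] for i in range(1, m + 1))
    q_l = q[0] + sum(lam[i - 1] * q[i] for i in range(1, m + 1))
    r_l = r[0] + sum(lam[i - 1] * r[i] for i in range(1, m + 1))
    x0 = -np.linalg.solve(P_l, q_l)      # stationary point of the Lagrangian
    return -0.5 * q_l @ np.linalg.solve(P_l, q_l) + r_l, x0

lam = np.array([0.7, 1.3])
g_val, x0 = dual(lam)
print(np.isclose(g_val, lagrangian(x0, lam)))
```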
The dual problem is
$$D) \quad \left\{ \begin{aligned} & \max \quad & -\frac{1}{2}q(\lambda)^Tp(\lambda)^{-1}q(\lambda)+r(\lambda)\\ & \text{s.t.} \quad & \lambda \geq0 \end{aligned} \right.$$
- entropy maximization
$$P) \quad \left\{ \begin{aligned} & \min \quad & \sum_i x_i\log x_i\\ & \text{s.t.} \quad & Ax\leq b \\ & \quad & \mathbf{1}^Tx=1 \end{aligned} \right.$$
where $\mathbf{1}$ is the all-ones vector.
With $f_0(x)=\sum_i x_i\log x_i$, the conjugate function is
$$\begin{aligned} f_0^*(y) &=\sup_{x\in \mathrm{dom}\, f_0}\left(y^Tx-\sum_i x_i\log x_i\right) \\ & = \sup_{x\in \mathrm{dom}\, f_0}\sum_i(y_ix_i-x_i\log x_i) \end{aligned}$$
Differentiating $h_i(x_i)=y_ix_i-x_i\log x_i$ and setting the derivative $y_i-\log x_i-1$ to zero shows that the maximum is attained at $x_i=\exp(y_i-1)$, so the conjugate function simplifies to
$$\begin{aligned} f_0^*(y) &=\sum_i \left(y_i\exp(y_i-1)-(y_i-1)\exp(y_i-1)\right) \\ & = \sum_i \exp(y_i-1) \end{aligned}$$
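The scalar building block of this conjugate is simple to confirm numerically. The sketch below approximates $\sup_{x>0}(yx-x\log x)$ on a grid and compares it with $\exp(y-1)$; the grid and the sample values of $y$ are assumptions for illustration.

```python
import math

def h(x, y):
    """Scalar term y*x - x*log(x) of the conjugate of negative entropy."""
    return y * x - x * math.log(x)

xs = [k / 10000.0 for k in range(1, 100001)]     # grid on (0, 10]

for y in (-1.0, 0.0, 1.0, 2.5):
    numeric = max(h(x, y) for x in xs)           # supremum over the grid
    closed_form = math.exp(y - 1.0)              # value claimed by the derivation
    assert abs(numeric - closed_form) < 1e-6
```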
From the link between the Lagrange dual and the conjugate function (the multiplier $\nu$ of the single equality constraint is a scalar here),
$$\begin{aligned} g(\lambda, \nu) & = -f_0^*(-A^T\lambda -\nu\mathbf{1})-\lambda^Tb-\nu\\ & = -\lambda^Tb-\nu-\exp (-\nu-1) \sum_i \exp(-a_i^T\lambda) \end{aligned}$$
where $a_i$ is the $i$-th column of $A$ and $\mathbf{1}$ is the all-ones vector.
This is a function of two variables and is not easy to maximize directly, so it needs further simplification.
Differentiate with respect to $\nu$:
$$\frac{dg}{d\nu}=-1+\exp (-\nu-1) \sum_i \exp(-a_i^T\lambda)=0$$
which gives
$$\nu_0 =\log\sum_i \exp(-a_i^T\lambda)-1$$
Substituting $\nu_0$ into $g(\lambda, \nu)$ gives
$$\begin{aligned} g(\lambda)&= -\lambda^Tb-\log\sum_i \exp(-a_i^T\lambda) \\ &=-\log \frac{\sum_i \exp(-a_i^T\lambda)}{\exp(-b^T\lambda)} \\ &=-\log \sum_i \exp(-a_i^T\lambda+b^T\lambda) \end{aligned}$$
The dual problem, written as a minimization of $-g$, is
$$D) \quad \left\{ \begin{aligned} & \min \quad & \log \sum_i \exp(-a_i^T\lambda+b^T\lambda)\\ & \text{s.t.} \quad & \lambda \geq0 \end{aligned} \right.$$
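The elimination of $\nu$ can be double-checked numerically. The sketch below assumes random data, with $a_i$ taken as the $i$-th column of $A$; it evaluates the two-variable dual function at $\nu_0=\log\sum_i\exp(-a_i^T\lambda)-1$, compares it with the reduced expression, and confirms that moving $\nu$ away from $\nu_0$ does not increase the value.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))          # hypothetical data; a_i is column i of A
b = rng.standard_normal(3)

def g_full(lam, nu):
    """g(lam, nu) = -b^T lam - nu - exp(-nu - 1) * sum_i exp(-a_i^T lam)."""
    return -b @ lam - nu - np.exp(-nu - 1.0) * np.exp(-A.T @ lam).sum()

def g_reduced(lam):
    """g after maximizing over nu: -b^T lam - log sum_i exp(-a_i^T lam)."""
    return -b @ lam - np.log(np.exp(-A.T @ lam).sum())

lam = np.abs(rng.standard_normal(3))     # any lam >= 0
nu0 = np.log(np.exp(-A.T @ lam).sum()) - 1.0

print(np.isclose(g_full(lam, nu0), g_reduced(lam)))
print(all(g_full(lam, nu0 + d) <= g_full(lam, nu0) for d in (-0.5, -0.1, 0.1, 0.5)))
```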
- trust region problem
There are cases where the primal problem is not convex and strong duality still holds. Here $A$ is symmetric but not necessarily positive semidefinite.
$$P) \quad \left\{ \begin{aligned} & \min \quad & x^TAx+2b^Tx\\ & \text{s.t.} \quad & x^Tx \leq 1 \end{aligned} \right.$$
The Lagrangian is
$$\begin{aligned} L(x,\lambda) & = x^TAx+2b^Tx+\lambda (x^Tx-1) \\ & = x^T(A+\lambda I)x+2b^Tx-\lambda \end{aligned}$$
Setting the derivative to zero,
$$\frac{dL}{dx}=2(A+\lambda I)x+2b=0$$
gives $x_0=-(A+\lambda I)^{-1}b$.
Let $A=Q \varepsilon Q^T$ be the eigendecomposition of $A$, where the columns $q_i$ of $Q$ are the eigenvectors and $\varepsilon=\mathrm{diag}(\lambda_1,\dots,\lambda_n)$ holds the eigenvalues. Then
$$\begin{aligned} g(\lambda) &= -b^T(A+\lambda I)^{-1}b-\lambda \\ & =-b^TQ(\varepsilon +\lambda I)^{-1}Q^Tb -\lambda \\ &=-\sum_{i=1}^{n}\frac{(q_i^Tb)^2}{\lambda_i+\lambda}-\lambda \end{aligned}$$
The dual problem, written as a minimization of $-g$, is
$$D) \quad \left\{ \begin{aligned} & \min \quad & \sum_{i=1}^{n}\frac{(q_i^Tb)^2}{\lambda_i+\lambda}+\lambda \\ & \text{s.t.} \quad & \lambda\geq \max(0, -\lambda_{min}(A)) \end{aligned} \right.$$
where $q_i$ and $\lambda_i$ are the eigenvectors and eigenvalues of $A$, and $\lambda_{min}(A)$ is its smallest eigenvalue.
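Strong duality for this nonconvex problem can be observed on a two-dimensional instance. In the sketch below the matrix $A$ (with one negative eigenvalue), the vector $b$, and the grid resolutions are assumptions chosen for illustration. Because $A$ is indefinite, the objective has no interior local minimum, so the primal optimum is found by scanning the unit circle; the dual optimum is found by scanning $\lambda > -\lambda_{min}(A)$.

```python
import numpy as np

# Hypothetical nonconvex instance: A has a negative eigenvalue.
A = np.diag([1.0, -2.0])
b = np.array([0.5, 0.5])
eigvals, Q = np.linalg.eigh(A)           # A = Q diag(eigvals) Q^T

# Primal: the optimum lies on the boundary x^T x = 1; scan it densely.
theta = np.linspace(0.0, 2.0 * np.pi, 400001)
X = np.stack([np.cos(theta), np.sin(theta)])          # shape (2, N)
p_star = (np.einsum("ik,ij,jk->k", X, A, X) + 2.0 * b @ X).min()

# Dual: g(lam) = -sum_i (q_i^T b)^2 / (lam_i + lam) - lam for lam > -lambda_min(A).
lam = np.linspace(-eigvals.min() + 1e-6, 10.0, 800001)
coef = (Q.T @ b) ** 2
g = -(coef[:, None] / (eigvals[:, None] + lam[None, :])).sum(axis=0) - lam
d_star = g.max()
print(p_star, d_star)
```

The two values agree to within the grid resolution, even though the primal objective is not convex.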
References
[1] https://www.zhihu.com/question/58584814/answer/1119054535
[2] https://zhuanlan.zhihu.com/p/133457394
[3] https://www.youtube.com/watch?v=Qneah_lyQ0o&list=PL-DDW8QIRjNOVxrU2efygBw0xADVOgpmw&index=15
[4] https://www.youtube.com/watch?v=0WpYucMfaHM&list=PL-DDW8QIRjNOVxrU2efygBw0xADVOgpmw&index=16