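In constrained optimization, Lagrange duality is often used to convert the primal problem into a dual problem, so that solving the dual problem yields the solution of the primal problem. The technique appears in many statistical learning methods, for example the maximum entropy model and support vector machines.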
1. The Primal Problem
Suppose $f(x)$, $c_i(x)$, and $h_j(x)$ are continuously differentiable functions defined on $R^n$. Consider the constrained optimization problem
$$\min_{x\in R^n} f(x) \tag{C.1}$$
$$\text{s.t.}\quad c_{i}(x)\le 0,\quad i = 1,2,\ldots,k \tag{C.2}$$
$$h_{j}(x)=0,\quad j=1,2,\ldots,l \tag{C.3}$$
This constrained optimization problem is called the primal optimization problem, or simply the primal problem.
First, introduce the generalized Lagrange function
$$L(x,\alpha,\beta) = f(x)+\sum_{i=1}^{k}\alpha_{i}c_{i}(x) + \sum_{j=1}^{l} \beta_{j}h_{j}(x) \tag{C.4}$$
Here $x=(x^{(1)},x^{(2)},\ldots,x^{(n)}) \in R^{n}$, and $\alpha_{i},\beta_{j}$ are the Lagrange multipliers, with $\alpha_{i} \ge 0$. Consider the following function of $x$:
$$\theta_{P}(x) = \max_{\alpha,\beta:\,\alpha_{i}\ge 0} L(x,\alpha,\beta) \tag{C.5}$$
Here the subscript $P$ stands for the primal problem.
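The generalized Lagrange function (C.4) can be sketched as a small Python closure. This is only an illustration under assumptions: the helper name `generalized_lagrangian`, the scalar variable `x`, and the toy problem (minimize $x^2$ subject to $1-x\le 0$) are hypothetical and do not come from the text.

```python
def generalized_lagrangian(f, cs, hs):
    """Build L(x, alpha, beta) = f(x) + sum_i alpha_i c_i(x) + sum_j beta_j h_j(x)."""
    def L(x, alpha, beta):
        # The multipliers of the inequality constraints must be non-negative.
        if any(a < 0 for a in alpha):
            raise ValueError("alpha_i must be non-negative")
        ineq = sum(a * c(x) for a, c in zip(alpha, cs))
        eq = sum(b * h(x) for b, h in zip(beta, hs))
        return f(x) + ineq + eq
    return L

# Hypothetical toy problem: f(x) = x^2, c_1(x) = 1 - x <= 0, no equality constraints.
L = generalized_lagrangian(lambda x: x * x, [lambda x: 1.0 - x], [])
print(L(1.0, [2.0], []))  # the constraint is active at x = 1, so only f(1) remains
print(L(0.0, [2.0], []))  # x = 0 violates the constraint, so the penalty term is added
```

Rejecting negative `alpha` values inside the closure mirrors the requirement $\alpha_i \ge 0$; the equality multipliers `beta` are left unrestricted in sign.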
Fix some $x$. If $x$ violates the constraints of the primal problem, that is, if there is some $i$ with $c_{i}(x) \gt 0$ or some $j$ with $h_{j}(x) \ne 0$, then
$$\theta_{P}(x)=\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\left[f(x)+\sum_{i=1}^{k}\alpha_{i}c_{i}(x) + \sum_{j=1}^{l} \beta_{j}h_{j}(x)\right] = +\infty \tag{C.6}$$
The reason is as follows. If some $i$ has $c_{i}(x) \gt 0$, we can let $\alpha_{i} \rightarrow +\infty$. If some $j$ has $h_{j}(x) \ne 0$, we can choose $\beta_{j}$ with the same sign as $h_{j}(x)$ so that $\beta_{j}h_{j}(x)\rightarrow +\infty$. All remaining $\alpha_{i},\beta_{j}$ are set to 0.
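Conversely, if $x$ satisfies the constraints (C.2) and (C.3), then every term $\alpha_{i}c_{i}(x)$ is at most 0 and every term $\beta_{j}h_{j}(x)$ equals 0, so by (C.5) and (C.4) the maximum is attained when all $\alpha_{i}c_{i}(x)=0$, and $\theta_{P}(x)=f(x)$. Therefore,
$$\theta_{P}(x)=\begin{cases} f(x), & x \text{ satisfies the constraints of the primal problem}\\ +\infty, & \text{otherwise} \end{cases} \tag{C.7}$$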
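The two cases of (C.7) can be observed numerically by capping the multipliers at a finite bound and letting that bound grow. The sketch below assumes a scalar `x` and a hypothetical toy problem (minimize $x^2$ subject to $1-x\le 0$); the helper `theta_p_capped` is an illustrative name, not something defined in the text.

```python
def theta_p_capped(f, cs, hs, x, cap):
    """Maximise L(x, alpha, beta) over 0 <= alpha_i <= cap and |beta_j| <= cap.

    L is linear in each multiplier, so the maximum sits at an endpoint of the box.
    """
    value = f(x)
    for c in cs:
        value += cap * max(c(x), 0.0)  # alpha_i = cap if c_i(x) > 0, otherwise 0
    for h in hs:
        value += cap * abs(h(x))       # beta_j = cap * sign(h_j(x))
    return value

# Hypothetical toy problem: f(x) = x^2, feasible set x >= 1.
f = lambda x: x * x
cs = [lambda x: 1.0 - x]
hs = []

for cap in (1.0, 1e3, 1e6):
    feasible_value = theta_p_capped(f, cs, hs, 2.0, cap)    # stays at f(2) = 4
    infeasible_value = theta_p_capped(f, cs, hs, 0.0, cap)  # grows with the cap
    print(cap, feasible_value, infeasible_value)
```

At the feasible point the value does not depend on the cap, while at the infeasible point it grows without bound as the cap increases, which is the behaviour that (C.6) and (C.7) describe.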
Now consider the minimization problem
$$\min_{x}\theta_{P}(x)=\min_{x}\max_{\alpha,\beta:\,\alpha_{i}\ge 0}L(x,\alpha,\beta) \tag{C.8}$$
This problem is equivalent to the primal optimization problem (C.1)-(C.3), that is, they have the same solutions. The problem $\min \limits_{x}\max \limits_{\alpha,\beta:\,\alpha_{i}\ge 0}L(x,\alpha,\beta)$ is called the minimax problem of the generalized Lagrange function. The optimal value of the primal problem,
$$p^*=\min_{x}\theta_{P}(x) \tag{C.9}$$
is called the value of the primal problem.
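As a rough numerical illustration of (C.9), the primal value can be approximated by minimizing $\theta_P$ over a grid, using the case form (C.7). The toy problem (minimize $x^2$ subject to $1-x\le 0$), the grid, and the tolerance `tol` are assumptions made for this sketch.

```python
import math

def theta_p(f, cs, hs, x, tol=1e-9):
    """theta_P(x) in the case form (C.7): f(x) if x is feasible, +inf otherwise."""
    feasible = all(c(x) <= tol for c in cs) and all(abs(h(x)) <= tol for h in hs)
    return f(x) if feasible else math.inf

# Hypothetical toy problem: f(x) = x^2, c_1(x) = 1 - x <= 0, no equality constraints.
f = lambda x: x * x
cs = [lambda x: 1.0 - x]
hs = []

grid = [i / 100.0 for i in range(-300, 301)]  # -3.00, -2.99, ..., 3.00
x_star = min(grid, key=lambda x: theta_p(f, cs, hs, x))
p_star = theta_p(f, cs, hs, x_star)
print(x_star, p_star)
```

A grid search is used only because it makes the definition visible; it is not a practical solver, and the result is exact here only because the minimizer happens to lie on the grid.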
2. The Dual Problem
Define
$$\theta_{D}(\alpha,\beta) = \min_{x}L(x,\alpha,\beta) \tag{C.10}$$
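As a worked illustration of (C.10), consider a hypothetical toy problem that is not part of the original text: $n=1$, $f(x)=x^2$, a single inequality constraint $c_1(x)=1-x\le 0$, and no equality constraints. Since $L$ is convex in $x$ for every fixed $\alpha$, the inner minimum is found by setting the derivative with respect to $x$ to zero:

$$\begin{aligned} L(x,\alpha) &= x^2+\alpha(1-x),\\ \frac{\partial L}{\partial x} &= 2x-\alpha=0 \ \Rightarrow\ x=\frac{\alpha}{2},\\ \theta_{D}(\alpha) &= \frac{\alpha^2}{4}+\alpha\left(1-\frac{\alpha}{2}\right)=\alpha-\frac{\alpha^2}{4}. \end{aligned}$$

For this example $\theta_D$ is a concave function of $\alpha$ alone, with $x$ eliminated.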
Next, consider maximizing $\theta_{D}(\alpha,\beta) = \min_{x}L(x,\alpha,\beta)$, that is,
$$\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\theta_{D}(\alpha,\beta)=\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\min_{x}L(x,\alpha,\beta) \tag{C.11}$$
The problem $\max \limits_{\alpha,\beta:\,\alpha_{i}\ge 0}\min \limits_{x}L(x,\alpha,\beta)$ is called the maximin problem of the generalized Lagrange function.
The maximin problem of the generalized Lagrange function can be written as a constrained optimization problem:
$$\max_{\alpha,\beta}\theta_{D}(\alpha,\beta)=\max_{\alpha,\beta}\min_{x}L(x,\alpha,\beta) \tag{C.12}$$
$$\text{s.t.}\quad \alpha_{i}\ge 0,\quad i=1,2,\ldots,k \tag{C.13}$$
This problem is called the dual problem of the primal problem. The optimal value of the dual problem,
$$d^*=\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\theta_{D}(\alpha,\beta) \tag{C.14}$$
is called the value of the dual problem.
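The dual value (C.14) can be approximated in the same spirit: minimize over $x$ on a grid for each multiplier, then maximize over the multipliers. The toy problem (minimize $x^2$ subject to $1-x\le 0$) and both grids are assumptions chosen for this sketch.

```python
def lagrangian(x, alpha):
    # Hypothetical toy problem: f(x) = x^2, c_1(x) = 1 - x, no equality constraints.
    return x * x + alpha * (1.0 - x)

xs = [i / 100.0 for i in range(-300, 301)]  # grid for the inner minimisation over x
alphas = [0.5 * j for j in range(9)]        # 0.0, 0.5, ..., 4.0, all satisfying alpha >= 0

def theta_d(alpha):
    """Grid approximation of theta_D(alpha) = min_x L(x, alpha)."""
    return min(lagrangian(x, alpha) for x in xs)

alpha_star = max(alphas, key=theta_d)
d_star = theta_d(alpha_star)
print(alpha_star, d_star)
```

The order of the two loops is the whole point: the inner `min` runs over `x` with the multiplier fixed, and only then is the outer `max` taken over the non-negative multipliers.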
3. The Relationship Between the Primal and Dual Problems
***Theorem C.1*** If the primal problem and the dual problem both have optimal values, then
$$d^*=\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\min_{x}L(x,\alpha,\beta) \le \min_{x}\max_{\alpha,\beta:\,\alpha_{i}\ge 0}L(x,\alpha,\beta) = p^* \tag{C.15}$$
***Proof*** By the definitions (C.10) and (C.5), for any $\alpha,\beta$ with $\alpha_{i}\ge 0$ and any $x$, we have
$$\theta_{D}(\alpha,\beta) = \min_{x}L(x,\alpha,\beta) \le L(x,\alpha,\beta) \le \max_{\alpha,\beta:\,\alpha_{i}\ge 0}L(x,\alpha,\beta) = \theta_{P}(x) \tag{C.16}$$
that is,
$$\theta_{D}(\alpha,\beta) \le \theta_{P}(x) \tag{C.17}$$
Since the primal problem and the dual problem both have optimal values, it follows that
$$\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\theta_{D}(\alpha,\beta) \le \min_{x} \theta_{P}(x) \tag{C.18}$$
that is,
$$d^*=\max_{\alpha,\beta:\,\alpha_{i}\ge 0}\min_{x}L(x,\alpha,\beta) \le \min_{x}\max_{\alpha,\beta:\,\alpha_{i}\ge 0}L(x,\alpha,\beta) = p^* \tag{C.19}$$
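The inequality just proved does not depend on any structure of $L$: it holds for every finite table of values, which makes it easy to check empirically. In the sketch below the rows of a table play the role of $x$ and the columns play the role of the multipliers; the helper names and the random tables are illustrative assumptions.

```python
import random

def max_min(table):
    """max over columns of (min over rows): the analogue of the dual value d*."""
    columns = range(len(table[0]))
    return max(min(row[j] for row in table) for j in columns)

def min_max(table):
    """min over rows of (max over columns): the analogue of the primal value p*."""
    return min(max(row) for row in table)

# A small table with a strict gap between the two values.
example = [[1, 3],
           [2, 0]]
print(max_min(example), min_max(example))

# The inequality max-min <= min-max holds for every table, here checked on random ones.
random.seed(0)
for _ in range(1000):
    table = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
    assert max_min(table) <= min_max(table)
```

The small table shows that the two values can differ, so the inequality in Theorem C.1 may be strict when no further assumptions are made.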
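***Corollary C.1*** Let $x^*$ and $\alpha^*,\beta^*$ be feasible solutions of the primal problem (C.1)-(C.3) and the dual problem (C.12)-(C.13), respectively, and suppose $d^*=p^*$. Then $x^*$ and $\alpha^*,\beta^*$ are optimal solutions of the primal problem and the dual problem, respectively.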
***Theorem C.2*** Consider the primal problem (C.1)-(C.3) and the dual problem (C.12)-(C.13). Suppose that $f(x)$ and $c_{i}(x)$ are convex functions and $h_{j}(x)$ are affine functions, and suppose that the inequality constraints $c_{i}(x)$ are strictly feasible, that is, there exists an $x$ such that $c_{i}(x) \lt 0$ for all $i$. Then there exist $x^*,\alpha^*,\beta^*$ such that $x^*$ is a solution of the primal problem, $\alpha^*,\beta^*$ are a solution of the dual problem, and
$$p^*=d^*=L(x^*,\alpha^*,\beta^*) \tag{C.20}$$
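The strict feasibility assumption of Theorem C.2 asks for a single point at which every inequality constraint holds strictly. A minimal sketch of such a check over a candidate set is shown below; the helper `find_strictly_feasible`, the candidate grid, and both example constraints are hypothetical. Note that finding no such point among the candidates does not prove that none exists.

```python
def find_strictly_feasible(cs, candidates):
    """Return the first candidate x with c_i(x) < 0 for every i, or None if none is found."""
    for x in candidates:
        if all(c(x) < 0 for c in cs):
            return x
    return None

grid = [i / 10.0 for i in range(0, 31)]  # 0.0, 0.1, ..., 3.0

# Hypothetical constraint 1 - x <= 0: any x > 1 is strictly feasible.
print(find_strictly_feasible([lambda x: 1.0 - x], grid))

# Hypothetical constraint x^2 <= 0: feasible only at x = 0, never strictly.
print(find_strictly_feasible([lambda x: x * x], grid))
```

The second constraint is convex and feasible, yet it has no strictly feasible point, so Theorem C.2 gives no guarantee for problems constrained in that way.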
***Theorem C.3*** For the primal problem (C.1)-(C.3) and the dual problem (C.12)-(C.13), suppose that $f(x)$ and $c_{i}(x)$ are convex functions and $h_{j}(x)$ are affine functions, and suppose that the inequality constraints $c_{i}(x)$ are strictly feasible. Then $x^*$ and $\alpha^*,\beta^*$ are solutions of the primal problem and the dual problem, respectively, if and only if $x^*,\alpha^*,\beta^*$ satisfy the following Karush-Kuhn-Tucker (KKT) conditions:
$$\nabla_{x}L(x^*,\alpha^*,\beta^*)=0 \tag{C.21}$$
$$\alpha_{i}^*c_{i}(x^*)=0,\quad i=1,2,\ldots,k \tag{C.22}$$
$$c_{i}(x^*) \le 0,\quad i=1,2,\ldots,k \tag{C.23}$$
$$\alpha_{i}^* \ge 0,\quad i=1,2,\ldots,k \tag{C.24}$$
$$h_{j}(x^*) = 0,\quad j = 1,2,\ldots,l \tag{C.25}$$
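The five KKT conditions can be checked mechanically at a candidate point. The sketch below assumes a scalar `x`, user-supplied derivative functions, and a numerical tolerance `tol`; the function name `kkt_violations` and the toy problem (minimize $x^2$ subject to $1-x\le 0$) are hypothetical.

```python
def kkt_violations(grad_f, cs, grad_cs, hs, grad_hs, x, alpha, beta, tol=1e-8):
    """Return the labels of the KKT conditions (C.21)-(C.25) that fail at (x, alpha, beta)."""
    failed = []
    # (C.21): derivative of L with respect to x vanishes.
    grad_L = (grad_f(x)
              + sum(a * g(x) for a, g in zip(alpha, grad_cs))
              + sum(b * g(x) for b, g in zip(beta, grad_hs)))
    if abs(grad_L) > tol:
        failed.append("C.21 stationarity")
    # (C.22): each product alpha_i * c_i(x) is zero.
    if any(abs(a * c(x)) > tol for a, c in zip(alpha, cs)):
        failed.append("C.22 complementarity")
    # (C.23): inequality constraints hold.
    if any(c(x) > tol for c in cs):
        failed.append("C.23 primal feasibility")
    # (C.24): multipliers of inequality constraints are non-negative.
    if any(a < -tol for a in alpha):
        failed.append("C.24 dual feasibility")
    # (C.25): equality constraints hold.
    if any(abs(h(x)) > tol for h in hs):
        failed.append("C.25 equality constraints")
    return failed

# Hypothetical toy problem: f(x) = x^2, c_1(x) = 1 - x, no equality constraints.
grad_f = lambda x: 2.0 * x
cs, grad_cs = [lambda x: 1.0 - x], [lambda x: -1.0]
hs, grad_hs = [], []

print(kkt_violations(grad_f, cs, grad_cs, hs, grad_hs, 1.0, [2.0], []))  # candidate solution
print(kkt_violations(grad_f, cs, grad_cs, hs, grad_hs, 1.0, [1.0], []))  # wrong multiplier
print(kkt_violations(grad_f, cs, grad_cs, hs, grad_hs, 0.5, [1.0], []))  # infeasible point
```

Returning the list of failed conditions, rather than a single boolean, makes it easier to see which part of the system a candidate point breaks.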
In particular, (C.22) is called the dual complementarity condition of KKT. It implies that if $\alpha_{i}^* \gt 0$, then $c_{i}(x^*) = 0$.
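The contrapositive is equally useful: if a constraint is inactive at the solution, its multiplier must be zero. As a hypothetical one-dimensional illustration (not from the original text), take $f(x)=x^2$ with the single constraint $c_1(x)=x-2\le 0$ and no equality constraints. The KKT system reads

$$2x^*+\alpha^*=0,\qquad \alpha^*(x^*-2)=0,\qquad x^*-2\le 0,\qquad \alpha^*\ge 0 .$$

If $\alpha^*>0$, complementarity would force $x^*=2$, and stationarity would then give $\alpha^*=-4$, contradicting $\alpha^*\ge 0$. Hence $\alpha^*=0$ and $x^*=0$, where the constraint is inactive because $c_1(x^*)=-2<0$.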