Convex Optimization Reading Notes (4)

Chapter 5: Duality

5.1 The Lagrange dual function

5.1.1 The Lagrangian

Consider an optimization problem in the standard form
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$
We define the Lagrangian $L : \mathbf{R}^n \times \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$ associated with the problem as
$$
L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_if_i(x)+\sum_{i=1}^{p}\nu_ih_i(x)
$$
with $\mathbf{dom} \ L = \mathcal{D} \times \mathbf{R}^m \times \mathbf{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$-th inequality constraint $f_i(x) \leq 0$; similarly we refer to $\nu_i$ as the Lagrange multiplier associated with the $i$-th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors.
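For example (a toy one-dimensional instance of my own, not from the book's text here): for the problem of minimizing $x^2$ subject to $1-x\leq0$, the Lagrangian is
$$
L(x,\lambda)=x^2+\lambda(1-x)
$$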

5.1.2 The Lagrange dual function

We define the Lagrange dual function $g : \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbf{R}^m$, $\nu \in \mathbf{R}^p$,
$$
g(\lambda, \nu)=\inf_{x\in \mathcal{D}}L(x,\lambda,\nu)
$$

5.1.3 Lower bounds on optimal value

The dual function yields lower bounds on the optimal value $p^\star$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$
g(\lambda,\nu)\leq p^\star
$$
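As a quick numeric check of this bound, here is a minimal sketch on a toy problem of my own (the names `a`, `b`, and `g` are just for illustration):

```python
import numpy as np

# Toy instance: minimize ||x||^2 subject to a^T x <= b, with b < 0.
# Minimizing L(x, lam) = x^T x + lam*(a^T x - b) over x gives
# x(lam) = -lam*a/2, hence g(lam) = -lam^2*||a||^2/4 - lam*b.
a = np.array([1.0, 2.0])
b = -1.0

def g(lam):
    return -lam**2 * (a @ a) / 4 - lam * b

# The primal optimum is the projection of 0 onto {x : a^T x <= b}:
# x* = (b/||a||^2) a, so p* = b^2/||a||^2.
p_star = b**2 / (a @ a)

# Weak duality: every lam >= 0 certifies a lower bound g(lam) <= p*.
for lam in [0.0, 0.1, 0.4, 1.0, 5.0]:
    print(f"lam = {lam:3.1f}   g(lam) = {g(lam):+.4f}   <=   p* = {p_star:.4f}")
```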

5.1.4 Linear approximation interpretation

5.1.5 Examples

5.1.6 The Lagrange dual function and conjugate functions

Consider an optimization problem with linear inequality and equality constraints,
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & Ax \preceq b \\
& Cx=d
\end{aligned}
$$
Using the conjugate $f_0^*$ of $f_0$, we can write the dual function for the problem as
$$
\begin{aligned}
g(\lambda,\nu) &= \inf_x\left(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\right) \\
&=-b^T\lambda-d^T\nu-f^*_0(-A^T\lambda-C^T\nu)
\end{aligned}
$$
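As a worked special case (my own computation, following the pattern of the book's examples): for $f_0(x)=\|x\|_2^2$ the conjugate is $f_0^*(y)=\tfrac{1}{4}\|y\|_2^2$, so the dual function has the closed form
$$
g(\lambda,\nu)=-b^T\lambda-d^T\nu-\tfrac{1}{4}\left\|A^T\lambda+C^T\nu\right\|_2^2
$$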

5.2 The Lagrange dual problem

The optimization problem
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & g(\lambda,\nu)\\
{\rm subject \ to} \ \ \ \ & \lambda \succeq 0
\end{aligned}
$$
This problem is called the Lagrange dual problem. We refer to $(\lambda^\star, \nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem whether or not the original problem is convex.
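For example (this is the book's standard form LP example), the dual of the linear program
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & c^Tx\\
{\rm subject \ to} \ \ \ \ & Ax=b \\
& x \succeq 0
\end{aligned}
$$
is, after eliminating the multiplier for $x \succeq 0$,
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & -b^T\nu\\
{\rm subject \ to} \ \ \ \ & A^T\nu+c \succeq 0
\end{aligned}
$$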

5.2.1 Making dual constraints explicit

In many cases we can identify the affine hull of $\mathbf{dom} \ g$, and describe it as a set of linear equality constraints.

5.2.2 Weak duality

The optimal value of the Lagrange dual problem, which we denote $d^\star$, is, by definition, the best lower bound on $p^\star$ that can be obtained from the Lagrange dual function:
$$
d^\star \leq p^\star
$$
which holds even if the original problem is not convex. This property is called weak duality.

We refer to the difference $p^\star - d^\star$ as the optimal duality gap of the original problem.

5.2.3 Strong duality and Slater’s constraint qualification

If the equality
$$
d^\star = p^\star
$$
holds, then we say that strong duality holds.

One simple constraint qualification is Slater's condition: there exists an $x \in \mathbf{relint} \ \mathcal{D}$ such that
$$
f_i(x)<0, \ \ i=1,\dots,m, \ \ Ax=b
$$
Slater's theorem states that if Slater's condition holds and the problem is convex ($f_0,\dots,f_m$ convex, affine equality constraints), then strong duality holds.

5.2.4 Examples

5.2.5 Mixed strategies for matrix games

5.3 Geometric interpretation

5.3.1 Weak and strong duality via set of values

Suppose $\mathcal{G}=\{(f_1(x),\dots,f_m(x),h_1(x),\dots,h_p(x),f_0(x))\in \mathbf{R}^m\times\mathbf{R}^p\times\mathbf{R}\mid x\in \mathcal{D}\}$ is the set of constraint and objective values. Then $g(\lambda,\nu)=\inf\{(\lambda,\nu,1)^T(u,v,t)\mid (u,v,t)\in\mathcal{G}\}$, so the hyperplane $\{(u,v,t)\mid(\lambda,\nu,1)^T(u,v,t)=g(\lambda,\nu)\}$ is a (non-vertical) supporting hyperplane to $\mathcal{G}$:

(Figure: the supporting hyperplane interpretation of the dual function; the original image is missing.)

5.3.2 Proof of strong duality under constraint qualification

5.3.3 Multicriterion interpretation

Consider the scalarization method for the (unconstrained) multicriterion problem
$$
{\rm minimize \ (w.r.t.} \ \mathbf{R}_+^{m+1}) \ \ \ \ F(x)=(f_1(x),\dots,f_m(x),f_0(x))
$$
Scalarizing with a weight vector $\tilde{\lambda}=(\lambda,1)$ recovers exactly the Lagrangian of a problem without equality constraints:
$$
\tilde{\lambda}^TF(x)=f_0(x)+\sum_{i=1}^{m}\lambda_if_i(x)
$$

5.4 Saddle-point interpretation

5.4.1 Max-min characterization of weak and strong duality

For simplicity, suppose there are no equality constraints. We can express the optimal value of the primal problem as
$$
p^\star=\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$
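This holds because, for fixed $x$,
$$
\sup_{\lambda\succeq0}L(x,\lambda)=\sup_{\lambda\succeq0}\left(f_0(x)+\sum_{i=1}^m\lambda_if_i(x)\right)=\begin{cases} f_0(x), & f_i(x)\leq0,\ i=1,\dots,m \\ \infty, & {\rm otherwise,} \end{cases}
$$
so taking the infimum over $x$ of the left-hand side gives the primal optimal value.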
By the definition of the dual function, we also have
$$
d^\star=\sup_{\lambda\succeq0}\inf_xL(x,\lambda)
$$
Thus, weak duality can be expressed as the inequality
$$
\sup_{\lambda\succeq0}\inf_xL(x,\lambda)\leq\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$
and strong duality as the equality
$$
\sup_{\lambda\succeq0}\inf_xL(x,\lambda)=\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$

5.4.2 Saddle-point interpretation

Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are primal and dual optimal points for a problem in which strong duality obtains, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.

5.4.3 Game interpretation

The optimal duality gap for the problem is exactly equal to the advantage afforded the player who goes second, i.e., the player who has the advantage of knowing his or her opponent’s choice before choosing. If strong duality holds, then there is no advantage to the players of knowing their opponent’s choice.

5.4.4 Price or tax interpretation

5.5 Optimality conditions

5.5.1 Certificate of suboptimality and stopping criteria

A dual feasible point $(\lambda,\nu)$ provides a proof or certificate that $p^\star \geq g(\lambda,\nu)$.

The stopping criterion
$$
f_0(x^{(k)})-g(\lambda^{(k)},\nu^{(k)})<\epsilon_{\rm abs}
$$
guarantees that when the algorithm terminates, $x^{(k)}$ is $\epsilon_{\rm abs}$-suboptimal.
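Here is a minimal sketch of this criterion inside a dual-ascent loop, on the same toy problem as before (my own construction, not an algorithm from the book):

```python
import numpy as np

# Problem: minimize ||x||^2 subject to a^T x <= b, with b < 0 so the
# constraint is active at the optimum.
a = np.array([1.0, 2.0])
b = -1.0
eps_abs = 1e-8
lam, step = 0.0, 0.1

for k in range(1000):
    x = -lam * a / 2                              # argmin_x L(x, lam)
    g_lam = -lam**2 * (a @ a) / 4 - lam * b       # dual function value g(lam)
    # Project x onto the halfspace to get a primal feasible iterate.
    x_feas = x - max(a @ x - b, 0.0) / (a @ a) * a
    gap = x_feas @ x_feas - g_lam                 # f0(x_feas) - g(lam) >= 0
    if gap < eps_abs:                             # the stopping criterion above
        break
    lam = max(lam + step * (a @ x - b), 0.0)      # projected gradient ascent on g

print(f"k = {k}, certified gap = {gap:.2e}, f0(x_feas) = {x_feas @ x_feas:.6f}")
```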

5.5.2 Complementary slackness

Let $x^\star$ be a primal optimal and $(\lambda^\star, \nu^\star)$ a dual optimal point for a problem with zero duality gap. Then
$$
\lambda_i^\star f_i(x^\star)=0, \ \ i=1,\dots,m
$$
This condition is known as complementary slackness: the $i$-th optimal Lagrange multiplier is zero unless the $i$-th constraint is active at the optimum.
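This follows from the chain of (in)equalities
$$
\begin{aligned}
f_0(x^\star) &= g(\lambda^\star,\nu^\star) \\
&= \inf_x\left(f_0(x)+\sum_{i=1}^m\lambda_i^\star f_i(x)+\sum_{i=1}^p\nu_i^\star h_i(x)\right) \\
&\leq f_0(x^\star)+\sum_{i=1}^m\lambda_i^\star f_i(x^\star)+\sum_{i=1}^p\nu_i^\star h_i(x^\star) \\
&\leq f_0(x^\star)
\end{aligned}
$$
so $\sum_{i=1}^m\lambda_i^\star f_i(x^\star)=0$; since each term in the sum is nonpositive, each term must be zero.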

5.5.3 KKT optimality conditions

KKT conditions for nonconvex problems

Let $x^\star$ be a primal optimal and $(\lambda^\star, \nu^\star)$ a dual optimal point. Since $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$ over $x$, its gradient must vanish at $x^\star$:
$$
\nabla f_0(x^\star)+\sum_{i=1}^m \lambda_i^\star\nabla f_i(x^\star)+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)=0
$$
Thus the Karush-Kuhn-Tucker (KKT) conditions are
$$
\begin{aligned}
f_i(x^\star) & \leq 0, \ \ i=1,\dots,m \\
h_i(x^\star) &=0,\ \ i=1,\dots,p \\
\lambda_i^\star &\geq0,\ \ i=1,\dots,m \\
\lambda_i^\star f_i(x^\star)&=0,\ \ i=1,\dots,m \\
\nabla f_0(x^\star)+\sum_{i=1}^m \lambda_i^\star\nabla f_i(x^\star)+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)&=0
\end{aligned}
$$
To summarize, for any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.
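As a sketch (a standard equality-constrained QP, with an instance of my own choosing), the KKT conditions for minimizing $\tfrac12 x^TPx + q^Tx$ subject to $Ax=b$ reduce to a single linear system, since there are no inequality constraints:

```python
import numpy as np

# Stationarity: P x + q + A^T nu = 0; primal feasibility: A x = b.
# Stacking gives  [[P, A^T], [A, 0]] [x; nu] = [-q; b].
P = np.array([[2.0, 0.0], [0.0, 4.0]])   # positive definite => convex problem
q = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]
print("x* =", x_star, " nu* =", nu_star)
print("stationarity residual:", P @ x_star + q + A.T @ nu_star)  # ~0
print("A x* - b =", A @ x_star - b)                               # ~0
```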

KKT conditions for convex problems

If the $f_i$ are convex and the $h_i$ are affine, and $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions, then $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are primal and dual optimal, with zero duality gap.

5.5.4 Mechanics interpretation of KKT conditions

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x, \lambda^\star, \nu^\star)$.

5.6 Perturbation and sensitivity analysis

5.6.1 The perturbed problem

Consider the following perturbed version of the original optimization problem
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq u_i, \ \ i=1,\dots,m \\
& h_i(x)=v_i, \ \ i=1,\dots,p
\end{aligned}
$$

5.6.2 A global inequality

Let $(\lambda^\star,\nu^\star)$ be optimal for the dual of the unperturbed problem, and suppose strong duality holds. Then for all $u$ and $v$ we have
$$
p^\star(u,v)\geq p^\star(0,0)-\lambda^{\star T}u-\nu^{\star T}v
$$
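The proof is short: for any $x$ that is feasible for the perturbed problem,
$$
p^\star(0,0)=g(\lambda^\star,\nu^\star)\leq f_0(x)+\sum_{i=1}^m\lambda_i^\star f_i(x)+\sum_{i=1}^p\nu_i^\star h_i(x)\leq f_0(x)+\lambda^{\star T}u+\nu^{\star T}v
$$
and taking the infimum over all such $x$ gives the inequality above.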

5.6.3 Local sensitivity analysis

Provided strong duality holds and $p^\star(u,v)$ is differentiable at $(0,0)$, the optimal dual variables $\lambda^\star, \nu^\star$ are related to the gradient of $p^\star$ at $(0,0)$:
$$
\lambda^\star_i=-\frac{\partial p^\star(0,0)}{\partial u_i}, \ \ \ \ \nu^\star_i=-\frac{\partial p^\star(0,0)}{\partial v_i}
$$
If $\lambda^\star_i$ is small, the $i$-th inequality constraint can be loosened or tightened a little without much effect on the optimal value; if $\lambda^\star_i$ is large, loosening or tightening it a little has a large effect on the optimal value.
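A numeric check of this relation on the same toy problem as above (my own construction):

```python
import numpy as np

# Perturb the constraint to a^T x <= b + u. For b + u < 0 the optimal value
# is p*(u) = (b + u)^2 / ||a||^2, and the optimal multiplier at u = 0 is
# lam* = -2*b / ||a||^2.
a = np.array([1.0, 2.0])
b = -1.0

def p_star(u):
    return (b + u) ** 2 / (a @ a) if b + u < 0 else 0.0

lam_star = -2 * b / (a @ a)
h = 1e-6
dpdu = (p_star(h) - p_star(-h)) / (2 * h)     # central finite difference
print(f"-dp*/du at 0: {-dpdu:.6f}   lam*: {lam_star:.6f}")   # should agree
```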

5.7 Examples

5.7.1 Introducing new variables and equality constraints

5.7.2 Transforming the objective

5.7.3 Implicit constraints

5.8 Theorems of alternatives

5.8.1 Weak alternatives via the dual function

We can apply Lagrange duality theory to the problem of determining feasibility of a system of inequalities and equalities
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & 0\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$
This problem has optimal value
$$
p^\star=\left\{ \begin{array}{ll} 0, & {\rm the \ system \ is \ feasible} \\ \infty, & {\rm the \ system \ is \ infeasible} \end{array} \right.
$$
Two systems of inequalities (and equalities) are called weak alternatives if at most one of the two is feasible.

5.8.2 Strong alternatives

When the original inequality system is convex and some type of constraint qualification holds, then the pairs of weak alternatives described above are strong alternatives, which means that exactly one of the two alternatives holds.

5.8.3 Examples

5.9 Generalized inequalities

Lagrange duality extends to a problem with generalized inequality constraints
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\preceq_{K_i}0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$

5.9.1 The Lagrange dual

The Lagrange dual optimization problem is
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & g(\lambda,\nu)\\
{\rm subject \ to} \ \ \ \ & \lambda_i \succeq_{K_i^*} 0, \ \ i=1,\dots,m
\end{aligned}
$$
where $K_i^*$ denotes the dual cone of $K_i$.

5.9.2 Optimality conditions

Complementary slackness

Since strong duality implies $\lambda_i^{\star T} f_i(x^\star)=0$ for each $i$, we can conclude that
$$
\lambda_i^\star\succ_{K_i^*}0\Longrightarrow f_i(x^\star)=0, \ \ \ \ f_i(x^\star)\prec_{K_i}0\Longrightarrow \lambda_i^\star=0
$$
However, in contrast to problems with scalar inequalities, it is possible to have $\lambda_i^{\star T} f_i(x^\star)=0$ with $\lambda_i^\star \neq 0$ and $f_i(x^\star) \neq 0$.

KKT conditions

Now we add the assumption that the functions $f_i, h_i$ are differentiable, and generalize the KKT conditions to problems with generalized inequalities:
$$
\begin{aligned}
f_i(x^\star) & \preceq_{K_i} 0, \ \ i=1,\dots,m \\
h_i(x^\star) &=0,\ \ i=1,\dots,p \\
\lambda_i^\star & \succeq_{K_i^*} 0,\ \ i=1,\dots,m \\
\lambda_i^{\star T}f_i(x^\star)&=0,\ \ i=1,\dots,m \\
\nabla f_0(x^\star)+\sum_{i=1}^m Df_i(x^\star)^T\lambda_i^\star+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)&=0
\end{aligned}
$$
where $Df_i(x^\star)\in \mathbf{R}^{k_i\times n}$ is the derivative of $f_i$ evaluated at $x^\star$.

5.9.3 Perturbation and sensitivity analysis

We consider the associated perturbed version of the problem
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\preceq_{K_i} u_i, \ \ i=1,\dots,m \\
& h_i(x)=v_i, \ \ i=1,\dots,p
\end{aligned}
$$

5.9.4 Theorems of alternatives

We can derive theorems of alternatives for systems of generalized inequalities and equalities
$$
f_i(x)\preceq_{K_i}0, \ \ i=1,\dots,m, \qquad h_i(x)=0, \ \ i=1,\dots,p
$$
where $K_i \subseteq \mathbf{R}^{k_i}$ are proper cones. We will also consider systems with strict inequalities,
$$
f_i(x)\prec_{K_i}0, \ \ i=1,\dots,m, \qquad h_i(x)=0, \ \ i=1,\dots,p
$$
