Fundamentals of Convex Optimization: Duality



5.1 The Lagrange Dual Function

5.1.1 The Lagrangian

An optimization problem in the standard form:
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \le 0, \quad i=1,\dots,m \\ & h_i(x) = 0, \quad i =1,\dots,p \end{array}$$
with variable $x\in \mathbb{R}^n$. We assume its domain $\mathcal{D}=\bigcap_{i=0}^{m} \operatorname{dom} f_{i} \cap \bigcap_{i=1}^{p} \operatorname{dom} h_{i}$ is nonempty, and denote the optimal value of the problem by $p^*$. We do not assume the problem is convex.

The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions. We define the Lagrangian $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ associated with the problem as
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x),$$
with $\operatorname{dom} L=\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x)\le 0$; similarly, $\nu_i$ is the Lagrange multiplier associated with the $i$th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors associated with the problem.


5.1.2 The Lagrange Dual Function

We define the Lagrange dual function $g: \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbb{R}^m$, $\nu \in \mathbb{R}^p$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)=\inf _{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right).$$
When the Lagrangian is unbounded below in $x$, the dual function takes on the value $-\infty$. Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda,\nu)$, it is concave, even when the problem is not convex.


5.1.3 Lower Bounds on the Optimal Value

The dual function yields lower bounds on the optimal value $p^*$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le p^* .$$
This holds because for any feasible point $\tilde{x}$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le L(\tilde{x}, \lambda, \nu)\le f_0(\tilde{x}),$$
since $\lambda_i f_i(\tilde{x}) \le 0$ and $\nu_i h_i(\tilde{x}) = 0$; taking the infimum over all feasible $\tilde{x}$ gives the bound.
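To make the bound concrete, here is a minimal numeric check on a toy instance of our own: minimize $x^2$ subject to $1-x \le 0$, so $p^*=1$ at $x^\star=1$, and analytically $g(\lambda)=\inf_x \left(x^2+\lambda(1-x)\right)=\lambda-\lambda^2/4$.

```python
import numpy as np

# Toy problem (our own example): minimize x^2 subject to 1 - x <= 0.
# Optimal point x* = 1, p* = 1; analytically g(lam) = lam - lam^2/4.
def g(lam):
    x = np.linspace(-5.0, 5.0, 100001)   # dense grid stands in for the inf over x
    return np.min(x**2 + lam * (1.0 - x))

p_star = 1.0
for lam in [0.0, 0.5, 2.0, 4.0]:
    print(f"g({lam}) = {g(lam):.4f} <= {p_star}")   # the bound holds for every lam >= 0
```

The best bound here is $g(2) = 1 = p^*$, anticipating the Lagrange dual problem of Section 5.2.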


5.1.5 Examples

Least-squares solution of linear equations

We consider the problem
$$\begin{array}{ll} \min & x^Tx \\ \text{s.t.} & Ax=b, \end{array}$$
where $A\in \mathbb{R}^{p \times n}$.
The Lagrangian is
$$L(x,\nu) = x^Tx + \nu^T(Ax-b),$$
with domain $\mathbb{R}^n \times \mathbb{R}^p$.
Since $L(x,\nu)$ is a convex quadratic function of $x$, we can find the minimizing $x$ from the optimality condition
$$\nabla_x L(x,\nu) = 2x + A^T\nu = 0,$$
which yields $x = -\frac{1}{2}A^T\nu$. Therefore the dual function is
$$g(\nu)=L\left(-\tfrac{1}{2} A^{T} \nu, \nu \right)=-\tfrac{1}{4} \nu^{T} A A^{T} \nu-b^{T} \nu,$$
which is a concave quadratic function of $\nu$, with domain $\mathbb{R}^p$.
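We can check numerically that the maximum of this dual function equals $p^*$; the sketch below uses an assumed random instance, the closed-form minimum-norm solution $x^\star = A^T(AA^T)^{-1}b$, and the dual maximizer $\nu^\star = -2(AA^T)^{-1}b$ obtained from $\nabla g(\nu) = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))   # p x n with full row rank (w.h.p. for random A)
b = rng.standard_normal(3)

# Primal: minimum-norm solution of Ax = b and its value p*.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# Dual: g(nu) = -(1/4) nu^T A A^T nu - b^T nu, maximized at nu* = -2 (A A^T)^{-1} b.
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)
g_star = -0.25 * nu_star @ (A @ A.T) @ nu_star - b @ nu_star
print(p_star, g_star)   # the two values agree: the lower bound is tight here
```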


5.1.6 The Lagrange Dual Function and Conjugate Functions

The conjugate $f^*$ of a function $f: \mathbb{R}^n\rightarrow \mathbb{R}$ is given by
$$f^*(y) = \sup_{x\in \operatorname{dom} f} \left(y^Tx-f(x)\right).$$

Consider the problem
$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & x=0. \end{array}$$
The Lagrangian is $L(x,\nu)=f(x)+\nu^Tx$, and the dual function is
$$g(\nu)=\inf_x \left(f(x)+\nu^Tx\right)=-\sup_x\left((-\nu)^Tx-f(x)\right)=-f^*(-\nu).$$
More generally, consider an optimization problem with linear inequality and equality constraints,
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & Ax\preceq b\\ & Cx=d. \end{array}$$
Using the conjugate of $f_0$, we can write its dual function as
$$\begin{aligned} g(\lambda,\nu)&=\inf_x \left(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\right) \\ &=-b^T\lambda-d^T\nu+\inf_x \left(f_0(x)+(A^T\lambda+C^T\nu)^Tx\right) \\ &=-b^T\lambda-d^T\nu-f_0^*(-A^T\lambda-C^T\nu). \end{aligned}$$
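As a consistency check of our own, take $f_0(x)=x^Tx$ with no inequality constraints, $C=A$, and $d=b$: the conjugate is $f_0^*(y)=\sup_x (y^Tx - x^Tx) = \frac{1}{4}y^Ty$, so the formula gives
$$g(\nu) = -b^T\nu - f_0^*(-A^T\nu) = -b^T\nu - \tfrac{1}{4}\nu^T A A^T\nu,$$
which matches the dual function of the least-squares problem derived directly in Section 5.1.5.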


5.2 The Lagrange Dual Problem

Loosely speaking, the Lagrange dual of the Lagrange dual problem is the primal problem; we will verify this explicitly for LPs below.

For each pair $(\lambda,\nu)$ with $\lambda \succeq 0$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the optimization problem. The best such lower bound is obtained from the optimization problem
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
This problem is called the Lagrange dual problem. The term dual feasible, describing a pair $(\lambda,\nu)$ with $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, means, as the name implies, that $(\lambda,\nu)$ is feasible for the dual problem. We refer to $(\lambda^\star,\nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem regardless of whether the primal is convex, since the objective to be maximized is concave and the constraint set is convex.

5.2.1 Making Dual Constraints Explicit

The examples above show that it is not uncommon for the domain of the dual function, $\operatorname{dom} g = \{ (\lambda,\nu) \mid g(\lambda ,\nu)>-\infty \}$, to have dimension smaller than $m+p$, i.e., to be a proper subset of $\mathbb{R}^{m+p}$.

A. Lagrange Dual of Standard Form LP

The Lagrange dual function for the standard form LP
$$\begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax = b \\ & x \succeq 0 \end{array}$$
is given by
$$g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
Strictly speaking, the Lagrange dual problem of the standard form LP is to maximize this dual function $g$ subject to $\lambda \succeq 0$:
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Here $g$ is finite only when $A^T\nu - \lambda+c=0$. We can form an equivalent problem by making this equality constraint explicit:
$$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu - \lambda + c = 0 \\ & \lambda \succeq 0. \end{array}$$
This problem, in turn, can be expressed as
$$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu + c \succeq 0, \end{array}$$
which is an LP in inequality form.
Note that all three of these problems are equivalent; each is, with some abuse of terminology, called the Lagrange dual of the standard form LP.

B. Lagrange Dual of Inequality Form LP

In a similar way, we can find the Lagrange dual problem of a linear program in inequality form
$$P0: \quad \begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax \preceq b. \end{array}$$
The Lagrangian is
$$L(x,\lambda)=c^Tx+\lambda^T(Ax-b) = -b^T\lambda + (A^T\lambda+c)^Tx,$$
so the dual function is
$$g(\lambda)=\inf_x L(x,\lambda) = -b^T \lambda + \inf_x (A^T\lambda + c)^T x.$$
The infimum of a linear function is $-\infty$ unless it is identically zero, so
$$g(\lambda) = \begin{cases} -b^T\lambda, & A^T\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
The dual variable $\lambda$ is dual feasible if $\lambda \succeq 0$ and $A^T \lambda + c=0$.
The Lagrange dual of the LP is to maximize $g$ over all $\lambda \succeq 0$. Again we can reformulate this by explicitly including the dual feasibility conditions as constraints:
$$P1: \quad \begin{array}{ll} \max & -b^T\lambda \\ \text{s.t.} & A^T \lambda + c = 0 \\ & \lambda \succeq 0, \end{array}$$
which is an LP in standard form.
Note that the Lagrange dual of the problem $P1$ is (equivalent to) the primal problem $P0$.
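A quick numeric sanity check of this primal-dual pair, sketched with an assumed instance (the box $0 \le x \le 1$ written as $Ax \preceq b$) and scipy.optimize.linprog for both LPs:

```python
import numpy as np
from scipy.optimize import linprog

# Assumed instance: minimize c^T x over the box 0 <= x <= 1, encoded as Ax <= b.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])
c = np.array([1.0, -2.0])

# Primal LP P0 (inequality form); x is free, the box is carried by A, b.
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual LP P1 (standard form): maximize -b^T lam, i.e. minimize b^T lam,
# subject to A^T lam + c = 0 and lam >= 0.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 4)

print(primal.fun, -dual.fun)   # both print -2.0: p* = d* (strong duality for LPs)
```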


5.2.2 Weak Duality

The optimal value of the Lagrange dual problem, which we denote $d^*$, is by definition the best lower bound on $p^*$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality
$$d^* \le p^*,$$
called weak duality, which holds even if the original problem is not convex, and even when $d^*$ and $p^*$ are infinite.
We refer to the difference $p^*-d^*$ as the optimal duality gap of the original problem, since it gives the gap between the optimal value of the primal problem and the best (i.e., greatest) lower bound on it that can be obtained from the Lagrange dual function.


5.2.3 Strong Duality and Slater's Constraint Qualification

If the equality $d^* = p^*$ holds, i.e., the optimal duality gap is zero, then we say that strong duality holds.

Strong duality does not hold in general. But if the primal problem is convex, i.e., of the form
$$P0: \quad \begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \leq 0, \quad i =1,\dots,m\\ & Ax=b, \end{array}$$
with $f_0,\dots,f_m$ convex, we usually (but not always) have strong duality.
Conditions on the problem under which strong duality holds are called constraint qualifications. One simple constraint qualification is Slater's condition: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that
$$f_i(x)<0, \quad i=1,\dots,m, \qquad Ax = b.$$
Such a point is sometimes called strictly feasible, since the inequality constraints hold with strict inequalities. Slater's theorem states that strong duality holds if (1) Slater's condition holds and (2) the problem is convex.
Slater's condition can be refined when some of the inequality constraint functions $f_i$ are affine. If the first $k$ constraint functions $f_1,\dots,f_k$ are affine, then strong duality holds provided the following weaker condition holds: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that
$$f_i(x)\leq 0, \quad i=1,\dots,k, \qquad f_i(x)<0, \quad i=k+1,\dots,m, \qquad Ax = b.$$


5.2.4 Examples

A. Lagrange dual of QCQP

We consider the QCQP
$$P0: \quad \begin{array}{ll} \min & \frac{1}{2}x^TP_0x+q^T_0x +r_0 \\ \text{s.t.} & \frac{1}{2}x^TP_ix+q^T_ix +r_i \le 0, \quad i =1,\dots,m, \end{array}$$
with $P_0 \in \mathbf{S}_{++}^n$ and $P_i \in \mathbf{S}_{+}^n$, $i=1,\dots,m$.
The Lagrangian is
$$L(x,\lambda) = \frac{1}{2}x^TP_0x+q^T_0x +r_0 + \sum_{i=1}^{m} \lambda_i \left[ \frac{1}{2}x^TP_ix+q^T_ix +r_i\right] = \frac{1}{2}x^TP(\lambda)x+q(\lambda)^Tx +r(\lambda),$$
where $P(\lambda)=P_0 + \sum_{i=1}^m \lambda_i P_i$, $q(\lambda)=q_0 + \sum_{i=1}^m \lambda_i q_i$, and $r(\lambda)=r_0+\sum_{i=1}^m \lambda_i r_i$.
For $\lambda \succeq 0$ we have $P(\lambda) \succ 0$, so the Lagrangian is a strictly convex quadratic in $x$ and
$$g(\lambda) = \inf_x L(x,\lambda) = - \frac{1}{2}q(\lambda)^T P(\lambda)^{-1} q(\lambda) + r(\lambda).$$
We can therefore express the dual problem as
$$P1: \quad \begin{array}{ll} \max & g(\lambda) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Slater's condition says that strong duality between the primal problem $P0$ and the dual problem $P1$ holds if the quadratic inequality constraints are strictly feasible, i.e., there exists an $x$ with
$$\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0, \quad i=1,\dots,m.$$
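For a single constraint the dual is one-dimensional and easy to maximize numerically. A minimal sketch with an assumed instance (minimize $\|x\|^2 - 2\mathbf{1}^Tx$ over the ball $x^Tx \le 0.5$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed instance: minimize 1/2 x^T P0 x + q0^T x + r0
# subject to 1/2 x^T P1 x + q1^T x + r1 <= 0  (here: x^T x <= 0.5).
P0 = 2.0 * np.eye(2); q0 = np.array([-2.0, -2.0]); r0 = 0.0
P1 = 2.0 * np.eye(2); q1 = np.zeros(2);           r1 = -0.5

def g(lam):
    P, q, r = P0 + lam * P1, q0 + lam * q1, r0 + lam * r1
    return -0.5 * q @ np.linalg.solve(P, q) + r

res = minimize_scalar(lambda lam: -g(lam), bounds=(0.0, 100.0), method="bounded")
lam_star, d_star = res.x, -res.fun
x_star = -np.linalg.solve(P0 + lam_star * P1, q0 + lam_star * q1)  # argmin of L
p_val = 0.5 * x_star @ P0 @ x_star + q0 @ x_star + r0
print(lam_star, d_star, p_val)   # lam* = 1, d* = p* = -1.5 (strong duality)
```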

B. A nonconvex quadratic problem with strong duality

On rare occasions, strong duality holds for a nonconvex problem. As an important example, we consider the problem of minimizing a nonconvex quadratic function over the unit ball,
$$P0: \quad \begin{array}{ll} \min & x^TAx + 2b^Tx \\ \text{s.t.} & x^Tx \leq 1, \end{array}$$
where $A \in \mathbf{S}^n$ and $b\in\mathbb{R}^n$. When $A \nsucceq 0$, this is not a convex problem. It is called the trust region problem.
The Lagrangian is
$$L(x,\lambda) = x^TAx + 2b^Tx + \lambda(x^Tx-1)=x^T(A+\lambda I)x + 2b^Tx - \lambda,$$
so the dual function is given by
$$g(\lambda) = \begin{cases} -b^T(A+\lambda I)^{\dagger} b -\lambda, & \text{if } A+\lambda I \succeq 0,\ b \in \mathcal{R}(A+\lambda I) \\ -\infty, & \text{otherwise,} \end{cases}$$
where $(A+\lambda I)^\dagger$ is the pseudo-inverse of $A+\lambda I$. The Lagrange dual problem is thus
$$P1: \quad \begin{array}{ll} \max & -b^T(A+\lambda I)^{\dagger} b - \lambda \\ \text{s.t.} & A + \lambda I \succeq 0,\ b \in \mathcal{R}(A + \lambda I), \end{array}$$
with variable $\lambda \in \mathbb{R}$.
The Lagrange dual problem is a convex optimization problem. In fact, it is readily solved, since it can be expressed as
$$\begin{array}{ll} \max & -\sum_{i=1}^{n} \dfrac{(q_i^T b)^2}{\lambda_i+\lambda} - \lambda \\ \text{s.t.} & \lambda \geq - \lambda_{\min}(A), \end{array}$$
where $\lambda_i$ and $q_i$ are the eigenvalues and corresponding (orthonormal) eigenvectors of $A$, and we interpret $(q_i^Tb)^2 / 0$ as $0$ if $q_i^T b = 0$ and as $\infty$ otherwise.
Although the original problem $P0$ is not convex, strong duality still holds. In fact, a more general result holds: strong duality holds for any optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds.
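Since the dual is a one-dimensional concave maximization, strong duality can be verified numerically. A sketch with assumed random data: the recovered $x^\star = -(A+\lambda^\star I)^{-1}b$ lands on the unit sphere when the ball constraint is active, which holds almost surely for indefinite $A$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2   # symmetric, indefinite w.h.p.
b = rng.standard_normal(4)

lam_i, Q = np.linalg.eigh(A)      # eigenvalues lam_i, orthonormal eigenvectors q_i
c = Q.T @ b                       # c_i = q_i^T b

def neg_dual(lam):
    # -g(lam) for lam > max(0, -lam_min(A)): g(lam) = -sum c_i^2/(lam_i+lam) - lam
    return np.sum(c**2 / (lam_i + lam)) + lam

lo = max(0.0, -lam_i.min()) + 1e-9
res = minimize_scalar(neg_dual, bounds=(lo, lo + 50.0), method="bounded")
lam_star, d_star = res.x, -res.fun

x_star = -np.linalg.solve(A + lam_star * np.eye(4), b)  # minimizer of L(x, lam*)
p_val = x_star @ A @ x_star + 2 * b @ x_star
print(np.linalg.norm(x_star), d_star, p_val)            # ||x*|| ~ 1 and d* ~ p_val
```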


5.4 Saddle-Point Interpretation

5.4.1 Max-Min characterization of weak and strong duality

First note that (assuming, for simplicity, that there are no equality constraints)
$$\sup_{\lambda \succeq 0} L(x,\lambda) = \sup_{\lambda \succeq 0 } \left(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\right) = \begin{cases} f_0(x), & \text{if } f_i(x) \le 0,\ i =1,\dots,m \\ \infty, & \text{otherwise.} \end{cases}$$
Indeed, suppose $x$ is not feasible, so $f_i(x)>0$ for some $i$. Then $\sup_{\lambda \succeq 0} L(x,\lambda) = \infty$, as can be seen by choosing $\lambda_j = 0$ for $j \neq i$ and letting $\lambda_i \rightarrow \infty$. On the other hand, if $f_i(x)\le 0$ for $i=1,\dots,m$, then the optimal choice of $\lambda$ is $\lambda = 0$ and $\sup_{\lambda \succeq 0} L(x,\lambda) = f_0(x)$. This means that we can express the optimal value of the primal problem as
$$p^* = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
By the definition of the dual function, the optimal value of the dual problem is
$$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda).$$
Thus, weak duality can be expressed as the inequality
$$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda) \leq \inf_x \sup_{\lambda \succeq 0} L(x,\lambda) = p^* ,$$
and strong duality as the equality
$$\sup_{\lambda \succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
Strong duality means that the order of the minimization over $x$ and the maximization over $\lambda \succeq 0$ can be switched without affecting the result.
In fact, the weak duality inequality does not depend on any properties of $L$: we have
$$\sup_{z \in Z} \inf_{w \in W} f(w,z) \leq \inf_{w \in W} \sup_{z \in Z} f(w,z)$$
for any $f: \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}$ (and any $W \subseteq \mathbb{R}^n$ and $Z \subseteq \mathbb{R}^m$). This general inequality is called the max-min inequality. When equality holds, i.e.,
$$\sup_{z \in Z} \inf_{w \in W} f(w,z) = \inf_{w \in W} \sup_{z \in Z} f(w,z),$$
we say that $f$ (and $W$ and $Z$) satisfy the strong max-min property or saddle-point property.
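A two-point example of our own shows that the max-min inequality can be strict:

```python
import numpy as np

# f(w, z) = w*z with w, z restricted to {-1, +1}. There is no saddle point:
# sup_z inf_w f = -1  <  inf_w sup_z f = +1.
W = Z = np.array([-1.0, 1.0])
F = np.outer(W, Z)                  # F[i, j] = w_i * z_j
print(np.max(np.min(F, axis=0)))    # sup over z of (inf over w): -1.0
print(np.min(np.max(F, axis=1)))    # inf over w of (sup over z): +1.0
```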


5.4.2 Saddle-Point Interpretation

We refer to a pair $\tilde{w} \in W,\ \tilde{z} \in Z$ as a saddle-point for $f$ (and $W$ and $Z$) if
$$f(\tilde{w},z) \leq f(\tilde{w},\tilde{z}) \leq f(w,\tilde{z})$$
for all $w \in W,\ z \in Z$. In other words, $\tilde{w}$ minimizes $f(w,\tilde{z})$ over $w \in W$ and $\tilde{z}$ maximizes $f(\tilde{w},z)$ over $z \in Z$:
$$f(\tilde{w},\tilde{z}) = \inf_{w \in W} f(w,\tilde{z}),\qquad f(\tilde{w},\tilde{z}) = \sup_{z \in Z} f(\tilde{w},z).$$
This implies that the strong max-min property holds, and that the common value is $f(\tilde{w},\tilde{z})$.
Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are primal and dual optimal points for a problem in which strong duality holds, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.


5.5 Optimality Conditions

5.5.1 Certificate of suboptimality and stopping criteria

If we can find a dual feasible point $(\lambda,\nu)$, we establish a lower bound on the optimal value of the primal problem: $p^* \geq g(\lambda, \nu)$. Thus a dual feasible point provides a proof or certificate that $p^* \geq g(\lambda, \nu)$. In particular, if $x$ is primal feasible, then $f_0(x) - g(\lambda,\nu)$ bounds the suboptimality of $x$, since $f_0(x) - p^* \leq f_0(x) - g(\lambda,\nu)$; this duality gap can serve as a stopping criterion in iterative algorithms.


5.5.2 Complementary slackness

Suppose strong duality holds. Let $x^\star$ be a primal optimal and $(\lambda^\star,\nu^\star)$ a dual optimal point. This means that
$$\begin{aligned} f_0(x^\star) & = g(\lambda^\star,\nu^\star) \\ &= \inf_x \left(f_0(x) + \sum_{i=1}^{m} \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i (x)\right) \\ & \leq f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i (x^\star) \\ & \leq f_0(x^\star). \end{aligned}$$

  • The first line states that the optimal duality gap is zero.
  • The second line is the definition of the dual function.
  • The third line follows since the infimum of the Lagrangian over $x$ is less than or equal to its value at $x = x^\star$.
  • The last inequality follows from $\lambda_i^\star \geq 0$ and $f_i(x^\star)\leq 0$ for $i=1,\dots,m$, and $h_i(x^\star)=0$ for $i=1,\dots,p$.

We conclude that the two inequalities in this chain (lines 3 and 4) hold with equality.
The first conclusion: since the inequality in the third line is an equality, $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$.
The second conclusion (complementary slackness):
$$\sum_{i=1}^m \lambda_i^\star f_i(x^\star) = 0.$$
Since each term in this sum is nonpositive, we conclude that
$$\lambda_i^\star f_i(x^\star) = 0, \quad i =1,\dots,m.$$
This holds for any primal optimal $x^\star$ and any dual optimal $(\lambda^\star,\nu^\star)$ (when strong duality holds).
We can express the complementary slackness condition as
$$\lambda_i^\star >0 \ \Rightarrow\ f_i(x^\star)=0,$$
or, equivalently,
$$f_i(x^\star)<0 \ \Rightarrow\ \lambda_i^\star = 0.$$
Roughly speaking, this means the $i$th optimal Lagrange multiplier is zero unless the $i$th constraint is active at the optimum.
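A micro-example of our own, reusing the toy problem from the sketch in Section 5.1.3 ($\min x^2$ s.t. $1-x \le 0$): stationarity of the Lagrangian at the optimum $x^\star = 1$ gives
$$\frac{\partial}{\partial x}\left(x^2+\lambda(1-x)\right)\Big|_{x=x^\star} = 2x^\star-\lambda^\star = 0 \ \Rightarrow\ \lambda^\star = 2 > 0,$$
and the constraint is indeed active: $f_1(x^\star)=1-x^\star=0$, so $\lambda_1^\star f_1(x^\star)=0$ holds via an active constraint rather than a zero multiplier.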


5.5.3 KKT optimality conditions

We now assume that the functions $f_0,\dots,f_m,\ h_1,\dots,h_p$ are differentiable (and therefore have open domains), but we make no assumptions yet about convexity.

A. KKT conditions for nonconvex problems

As above, let $x^\star$ and $(\lambda^\star,\nu^\star)$ be any primal and dual optimal points with zero duality gap. Since $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$, its gradient must vanish at $x^\star$, i.e.,
$$\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(x^\star) = 0.$$
Thus, we have
$$\begin{aligned} f_{i}(x^{\star}) & \leq 0, \quad i=1, \ldots, m \\ h_{i}(x^{\star}) &=0, \quad i=1, \ldots, p \\ \lambda_{i}^{\star} & \geq 0, \quad i=1, \ldots, m \\ \lambda_{i}^{\star} f_{i}(x^{\star}) &=0, \quad i=1, \ldots, m \\ \nabla f_{0}(x^{\star})+\sum_{i=1}^{m} \lambda_{i}^{\star} \nabla f_{i}(x^{\star})+\sum_{i=1}^{p} \nu_{i}^{\star} \nabla h_{i}(x^{\star}) &=0, \end{aligned}$$
which are called the Karush-Kuhn-Tucker (KKT) conditions.
To summarize, for any optimization problem with differentiable objective and differentiable constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.

B. KKT conditions for convex problems

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if the $f_i$ are convex, the $h_i$ are affine, and $\tilde{x},\tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions
$$\begin{aligned} f_{i}(\tilde{x}) & \leq 0, \quad i=1, \ldots, m \\ h_{i}(\tilde{x}) &=0, \quad i=1, \ldots, p \\ \tilde{\lambda}_{i} & \geq 0, \quad i=1, \ldots, m \\ \tilde{\lambda}_{i} f_{i}(\tilde{x}) &=0, \quad i=1, \ldots, m \\ \nabla f_{0}(\tilde{x})+\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} \nabla h_{i}(\tilde{x}) &=0, \end{aligned}$$
then $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal, with zero duality gap.

To see this, note that the first two conditions state that $\tilde{x}$ is primal feasible. Since $\tilde{\lambda}_i \geq 0$, $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$; the last KKT condition states that its gradient with respect to $x$ vanishes at $x = \tilde{x}$, so $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$ over $x$. From this we conclude that
$$\begin{aligned} g(\tilde{\lambda},\tilde{\nu}) & = L(\tilde{x},\tilde{\lambda},\tilde{\nu}) \\ &= f_0( \tilde{x} ) +\sum_{i=1}^{m} \tilde{\lambda}_{i} f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} h_{i}(\tilde{x}) \\ &= f_0( \tilde{x} ), \end{aligned}$$
where in the last line we use $h_i (\tilde{x}) = 0$ and $\tilde{\lambda}_i f_i (\tilde{x}) = 0$. This shows that $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ have zero duality gap, and therefore are primal and dual optimal.
In summary, for any convex optimization problem with differentiable objective and differentiable constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap.
If a convex optimization problem with differentiable objective and constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality: Slater's condition implies that the optimal duality gap is zero and the dual optimum is attained, so $x$ is optimal if and only if there exist $(\lambda,\nu)$ that, together with $x$, satisfy the KKT conditions.
The KKT conditions play an important role in optimization. In a few special cases, it is possible to solve the KKT conditions analytically. More generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions.

Example 5.1

Equality constrained convex quadratic minimization. We consider the problem
$$P0: \quad \begin{array}{ll} \min & \frac{1}{2}x^TPx + q^Tx + r \\ \text{s.t.} & Ax = b, \end{array}$$
where $P \in \mathbf{S}_{+}^n$.
The KKT conditions for this problem are
$$Ax^\star = b, \qquad Px^\star + q + A^T \nu^\star = 0,$$
which we can write as
$$\left[\begin{array}{cc} P & A^{T} \\ A & 0 \end{array}\right]\left[\begin{array}{l} x^{\star} \\ \nu^{\star} \end{array}\right]=\left[\begin{array}{c} -q \\ b \end{array}\right].$$
Solving this set of $m + n$ equations in the $m + n$ variables $x^\star, \nu^\star$ gives the optimal primal and dual variables for $P0$.
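Since the KKT conditions here reduce to one linear system, solving the problem is a single linear solve; a minimal sketch with an assumed instance:

```python
import numpy as np

# Assumed instance of min (1/2) x^T P x + q^T x + r  subject to  Ax = b.
P = np.array([[3.0, 1.0], [1.0, 2.0]])   # P in S^n_+
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])               # a single equality constraint
b = np.array([1.0])

n, m = P.shape[0], A.shape[0]
# Assemble and solve the KKT system  [P A^T; A 0] [x; nu] = [-q; b].
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

print(x_star, nu_star)
print(A @ x_star - b)                  # primal feasibility: ~0
print(P @ x_star + q + A.T @ nu_star)  # stationarity: ~0
```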

Example 5.2 Water-filling.

We consider the convex optimization problem
$$P0: \quad \begin{array}{ll} \min & -\sum_{i=1}^n \log (\alpha_i + x_i ) \\ \text{s.t.} & x \succeq 0, \quad \mathbf{1}^T x = 1, \end{array}$$
where $\alpha_i > 0$. This problem arises in information theory, in allocating power to a set of $n$ communication channels. The variable $x_i$ represents the transmitter power allocated to the $i$th channel, and $\log(\alpha_i + x_i )$ gives the capacity or communication rate of the channel, so the problem is to allocate a total power of one to the channels, in order to maximize the total communication rate.
Introducing Lagrange multipliers $\lambda^\star \in \mathbb{R}^n$ for the inequality constraints $x \succeq 0$, and a multiplier $\nu^\star \in \mathbb{R}$ for the equality constraint $\mathbf{1}^T x = 1$, we obtain the KKT conditions
$$x^\star \succeq 0, \qquad \mathbf{1}^T x^\star = 1, \qquad \lambda^\star \succeq 0, \qquad \lambda_i^\star x_i^\star = 0, \quad i=1,\dots,n,$$
$$-\frac{1}{\alpha_i+x_i^\star} - \lambda_i^\star + \nu^\star = 0, \quad i=1,\dots,n .$$
We can directly solve these equations to find $x^\star$, $\lambda^\star$, and $\nu^\star$. We start by noting that $\lambda^\star$ acts as a slack variable in the last equation, so it can be eliminated, leaving
$$x^\star \succeq 0, \qquad \mathbf{1}^T x^\star = 1, \qquad x_i^\star\left(\nu^\star - \frac{1}{\alpha_i+x_i^\star}\right) = 0, \qquad \nu^\star \geq \frac{1}{\alpha_i+x_i^\star}, \quad i=1,\dots,n .$$

  • If $\nu^\star < 1/\alpha_i$, the last condition can only hold if $x^\star_i > 0$, which by the third condition implies $\nu^\star = \frac{1}{\alpha_i + x^\star_i}$; solving for $x^\star_i$, we conclude that $x^\star_i= 1/\nu^\star -\alpha_i$.
  • If $\nu^\star \geq 1/\alpha_i$, then $x^\star_i > 0$ is impossible, because it would imply $\nu^\star \geq 1/\alpha_i > \frac{1}{\alpha_i + x^\star_i}$, which violates the complementary slackness condition; therefore $x^\star_i = 0$.

Thus we have
$$x_i^\star = \begin{cases} 1/\nu^\star - \alpha_i, & \text{if } \nu^\star < 1/\alpha_i\\ 0, & \text{if } \nu^\star \geq 1/\alpha_i, \end{cases}$$
or, more simply, $x_i^\star =\max \{0,\ 1/\nu^\star - \alpha_i \}$.
Substituting this expression for $x^\star_i$ into the condition $\mathbf{1}^T x^\star = 1$, we obtain
$$\sum_{i=1}^n \max \left\{0,\ \frac{1}{\nu^\star} - \alpha_i \right\} = 1.$$
The left-hand side is a piecewise-linear increasing function of $1/\nu^\star$, with breakpoints at $\alpha_i$, so the equation has a unique solution, which is readily determined.
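This is the classic water-filling solution: think of $1/\nu^\star$ as a water level that rises until the total "water" above the channel floors $\alpha_i$ equals one. A minimal bisection sketch with assumed channel parameters:

```python
import numpy as np

def water_filling(alpha, total=1.0, tol=1e-10):
    """Solve sum_i max(0, w - alpha_i) = total for the water level w = 1/nu*
    by bisection, then return x_i = max(0, w - alpha_i)."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = alpha.min(), alpha.max() + total   # levels bracketing the solution
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.sum(np.maximum(0.0, w - alpha)) < total:
            lo = w
        else:
            hi = w
    return np.maximum(0.0, 0.5 * (lo + hi) - alpha)

alpha = np.array([0.3, 1.0, 2.0])   # assumed channel noise levels
x = water_filling(alpha)
print(x, x.sum())                   # allocations x* = (0.85, 0.15, 0), summing to 1
```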

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x,\lambda^\star,\nu^\star)$. This fact sometimes allows us to compute a primal optimal solution from a dual optimal solution. More precisely, suppose strong duality holds and an optimal $(\lambda^\star,\nu^\star)$ is known. Suppose that the minimizer of $L(x,\lambda^\star,\nu^\star)$, i.e., the solution of
$$\min \quad f_{0}(x)+\sum_{i=1}^{m} \lambda_{i}^{\star} f_{i}(x)+\sum_{i=1}^{p} \nu_{i}^{\star} h_{i}(x),$$
is unique. Then, if that minimizer is primal feasible, it must be primal optimal; if it is not primal feasible, we can conclude that no primal optimal point exists.

Example 5.3 Entropy maximization.

We consider the entropy maximization problem
$$\begin{array}{ll} \min & f_{0}(x)=\sum_{i=1}^{n} x_{i} \log x_{i} \\ \text{s.t.} & A x \preceq b \\ & \mathbf{1}^{T} x=1, \end{array}$$
with domain $\mathbb{R}_{++}^n$, and its Lagrange dual problem
$$\begin{array}{ll} \max & -b^{T} \lambda-\nu-e^{-\nu-1} \sum_{i=1}^{n} e^{-a_{i}^{T} \lambda} \\ \text{s.t.} & \lambda \succeq 0, \end{array}$$
where $a_i$ are the columns of $A$. We assume that the weak form of Slater's condition holds, i.e., there exists an $x \succ 0$ with $Ax \preceq b$ and $\mathbf{1}^T x = 1$, so strong duality holds and an optimal solution $(\lambda^\star,\nu^\star)$ exists.
Suppose we have solved the dual problem. The Lagrangian at $(\lambda^\star,\nu^\star)$ is
$$L\left(x, \lambda^{\star}, \nu^{\star}\right)=\sum_{i=1}^{n} x_{i} \log x_{i}+\lambda^{\star T}(A x-b)+\nu^{\star}\left(\mathbf{1}^{T} x-1\right),$$
which is strictly convex on $\mathcal{D}$ and bounded below, so it has a unique minimizer $x^\star$, given by
$$x^\star_i = 1/ \exp (a_i^T \lambda^\star+\nu^\star + 1), \quad i=1,\dots,n.$$
If $x^\star$ is primal feasible, it must be the optimal solution of the primal problem. If $x^\star$ is not primal feasible, then we can conclude that the primal optimum is not attained.
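A sketch of this recipe on an assumed tiny instance: solve the two-variable dual numerically ($\lambda \ge 0$, $\nu$ free), recover $x^\star$ from the formula above, and check primal feasibility.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed instance with a single inequality row; weak Slater holds here since
# x = (1/3) 1 is strictly positive, sums to one, and satisfies Ax < b.
A = np.array([[1.0, -1.0, 0.5]])
b = np.array([1.0])

def neg_dual(z):
    lam, nu = z
    g = -b[0] * lam - nu - np.exp(-nu - 1.0) * np.sum(np.exp(-A[0] * lam))
    return -g

res = minimize(neg_dual, x0=np.array([0.5, 0.0]),
               bounds=[(0.0, None), (None, None)])   # L-BFGS-B handles the bound
lam_s, nu_s = res.x

# Recover the unique minimizer of the Lagrangian at (lam*, nu*).
x_star = np.exp(-A[0] * lam_s - nu_s - 1.0)
print(x_star, x_star.sum(), A @ x_star <= b + 1e-8)  # feasible => primal optimal
```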


5.7 Examples (reformulations)

In this section, we show by example that simple equivalent reformulations of a problem can lead to very different dual problems. We consider the following types of reformulations:

  • Introducing new variables and associated equality constraints.
  • Replacing the objective with an increasing function of the original objective.
  • Making explicit constraints implicit, i.e., incorporating them into the domain of the objective.

5.7.1 Introducing new variables and equality constraints

Consider an unconstrained problem of the form
$$P0: \quad \min\ f_0(Ax + b).$$
Since there are no constraints, the Lagrange dual function is the constant $p^\star$. So while we do have strong duality, i.e., $p^\star= d^\star$, the Lagrangian dual is neither useful nor interesting.

Now let us reformulate the problem as
$$P1: \quad \begin{array}{ll} \min & f_0(y) \\ \text{s.t.} & Ax +b = y. \end{array}$$
Here we have introduced a new variable $y$, as well as a new equality constraint $Ax+b = y$. The problems $P0$ and $P1$ are clearly equivalent.
The Lagrangian of the reformulated problem is
$$L(x,y,\nu) = f_0(y) + \nu^T(Ax+b-y).$$
To find the dual function we minimize $L$ over $x$ and $y$. Minimizing over $x$, we find that $g(\nu) = -\infty$ unless $A^T\nu = 0$, in which case we are left with
$$g(\nu) = b^T \nu + \inf_y (f_0(y) - \nu^T y ) = b^T \nu - f_0^*(\nu),$$
where $f_0^*$ is the conjugate of $f_0$. The dual problem of $P1$ can therefore be expressed as
$$\begin{array}{ll} \max & b^T\nu-f_0^*(\nu) \\ \text{s.t.} & A^T \nu= 0. \end{array}$$
Thus, the dual of the reformulated problem $P1$ is considerably more useful than the dual of the original problem $P0$.

Example 5.5 Unconstrained geometric program.

Consider the unconstrained geometric program
$$\min\ \log \left(\sum_{i=1}^m \exp (a_i^T x + b_i)\right).$$
We first reformulate it by introducing new variables and equality constraints:
$$P1: \quad \begin{array}{ll} \min & f_0(y) = \log \left(\sum_{i=1}^m \exp y_i \right) \\ \text{s.t.} & Ax + b = y, \end{array}$$
where $a_i^T$ are the rows of $A$. The conjugate of the log-sum-exp function is
$$f_0^*(\nu) = \begin{cases} \sum_{i=1}^m \nu_i \log \nu_i, & \text{if } \nu \succeq 0,\ \mathbf{1}^T\nu =1 \\ \infty, & \text{otherwise,} \end{cases}$$
so the dual of the reformulated problem can be expressed as
$$\begin{array}{ll} \max & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \text{s.t.} & \mathbf{1}^T\nu =1\\ & A^T \nu = 0 \\ & \nu \succeq 0, \end{array}$$
which is an entropy maximization problem.

Example 5.6 Norm approximation problem.

We consider the unconstrained norm approximation problem
$$P0: \quad \min\ \| Ax-b \|,$$
where $\|\cdot\|$ is any norm. Here too the Lagrange dual function is constant, equal to the optimal value of $P0$, and therefore not useful.
Once again we reformulate the problem as
$$\begin{array}{ll} \min & \| y \| \\ \text{s.t.} & Ax -b = y. \end{array}$$
The Lagrange dual problem is
$$\begin{array}{ll} \max & b^T \nu \\ \text{s.t.} & \| \nu \|_* \leq 1 \\ & A^T \nu = 0, \end{array}$$
where we use the fact that the conjugate of a norm is the indicator function of the dual norm unit ball.
The idea of introducing new equality constraints can be applied to the constraint functions as well. Consider, for example, the problem
$$\begin{array}{ll} \min & f_0(A_0x+b_0) \\ \text{s.t.} & f_i(A_ix+b_i) \le 0, \quad i =1,\dots,m, \end{array}$$
where $A_i \in \mathbb{R}^{k_i \times n}$ and $f_i: \mathbb{R}^{k_i} \rightarrow \mathbb{R}$ are convex. We introduce a new variable $y_i \in \mathbb{R}^{k_i}$, for $i = 0,\dots,m$, and reformulate the problem as
$$\begin{array}{ll} \min & f_0(y_0) \\ \text{s.t.} & f_i(y_i) \le 0, \quad i =1,\dots,m \\ & A_i x + b_i = y_i , \quad i =0,\dots,m. \end{array}$$
The Lagrangian for this problem is
$$L(x,y_0,\dots,y_m,\lambda,\nu_0,\dots,\nu_m) = f_0(y_0) + \sum_{i=1}^m \lambda_i f_i(y_i) + \sum_{i=0}^m \nu_i^T (A_i x + b_i - y_i).$$
To find the dual function, we minimize over $x$ and the $y_i$. The minimum over $x$ is $-\infty$ unless
$$\sum_{i=0}^m A_i^T \nu_i = 0,$$
in which case we have, for $\lambda \succ 0$,
$$\begin{aligned} &g\left(\lambda, \nu_{0}, \ldots, \nu_{m}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}, \ldots, y_{m}}\left(f_{0}\left(y_{0}\right)+\sum_{i=1}^{m} \lambda_{i} f_{i}\left(y_{i}\right)-\sum_{i=0}^{m} \nu_{i}^{T} y_{i}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}}\left(f_{0}\left(y_{0}\right)-\nu_{0}^{T} y_{0}\right)+\sum_{i=1}^{m} \lambda_{i} \inf _{y_{i}}\left(f_{i}\left(y_{i}\right)-\left(\nu_{i} / \lambda_{i}\right)^{T} y_{i}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}\left(\nu_{0}\right)-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}\left(\nu_{i} / \lambda_{i}\right). \end{aligned}$$
The last expression involves the perspective of the conjugate function, and is therefore concave in the dual variables. Finally, we address the question of what happens when $\lambda \succeq 0$ but some $\lambda_i$ are zero. If $\lambda_i = 0$ and $\nu_i \neq 0$, then the dual function is $-\infty$. If $\lambda_i = 0$ and $\nu_i = 0$, however, the terms involving $y_i$, $\nu_i$, and $\lambda_i$ are all zero. Thus, the expression above for $g$ is valid for all $\lambda \succeq 0$, if we take $\lambda_i f^*_i (\nu_i /\lambda_i ) = 0$ when $\lambda_i = 0$ and $\nu_i = 0$, and $\lambda_i f^*_i (\nu_i /\lambda_i ) = \infty$ when $\lambda_i = 0$ and $\nu_i \neq 0$.
Therefore we can express the dual of the problem as
$$\begin{array}{ll} \max & \sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}\left(\nu_{0}\right)-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}\left(\nu_{i} / \lambda_{i}\right)\\ \text{s.t.} & \lambda \succeq 0 \\ & \sum_{i=0}^m A_i^T \nu_i =0. \end{array}$$

5.7.2 Transforming the objective

If we replace the objective $f_0$ by an increasing function of $f_0$, the resulting problem is clearly equivalent. The dual of this equivalent problem, however, can be very different from the dual of the original problem.

Example 5.8

We consider again the minimum norm problem
$$\min\ \| Ax - b \|,$$
where $\| \cdot \|$ is some norm. We reformulate this problem as
$$\begin{array}{ll} \min & \frac{1}{2} \| y \|^2 \\ \text{s.t.} & Ax -b = y. \end{array}$$
Here we have introduced a new variable, and replaced the objective by half its square. Evidently this is equivalent to the original problem.
The dual of the reformulated problem is
$$\begin{array}{ll} \max & -\frac{1}{2} \| \nu \|^2_* + b^T \nu \\ \text{s.t.} & A^T \nu = 0, \end{array}$$
where we use the fact that the conjugate of $\frac{1}{2}\|\cdot\|^2$ is $\frac{1}{2}\|\cdot\|^2_*$.
Note that this dual problem is not the same as the dual problem derived earlier for the same primal problem (Example 5.6).
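For the Euclidean norm both duals can be evaluated in closed form from the least-squares residual, which makes the difference concrete. A sketch with an assumed random instance; recall that $A^Tr=0$ for the residual $r = b - A\hat{x}$.

```python
import numpy as np

# With ||.|| = ||.||_2: nu = r/||r|| is feasible for the Example 5.6 dual and
# attains p* = ||r||, while nu = r attains p*^2/2 in the squared-objective dual.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)); b = rng.standard_normal(6)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_ls                       # residual; A^T r = 0

p_star = np.linalg.norm(r)
print(p_star, b @ (r / p_star))                  # Example 5.6 dual value = p*
print(0.5 * p_star**2, b @ r - 0.5 * r @ r)      # squared dual value = p*^2 / 2
```

Attaining the primal optimal value certifies dual optimality in each case, by weak duality; the two duals are different problems with different optimal values, even though both certify the same primal.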

5.7.3 Implicit constraints

The next simple reformulation we study is to include some of the constraints in the objective function, by modifying the objective function to be infinite when the constraint is violated.

