Fundamentals of Convex Optimization: Duality



5.1 The Lagrange Dual Function

5.1.1 The Lagrangian

An optimization problem in the standard form:
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \le 0, \quad i=1,\dots,m \\ & h_i(x) = 0, \quad i =1,\dots,p \end{array}$$
with variable $x\in \mathbb{R}^n$. We assume its domain $\mathcal{D}=\bigcap_{i=0}^{m} \operatorname{dom} f_{i} \cap \bigcap_{i=1}^{p} \operatorname{dom} h_{i}$ is nonempty, and denote the optimal value of the problem by $p^*$. We do not assume the problem is convex.

The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions. We define the Lagrangian $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ associated with the problem as
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x),$$
with $\operatorname{dom} L=\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x)\le 0$; similarly, $\nu_i$ is the Lagrange multiplier associated with the $i$th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors associated with the problem.


5.1.2 The Lagrange Dual Function

We define the Lagrange dual function $g: \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbb{R}^m$, $\nu \in \mathbb{R}^p$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)=\inf _{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right).$$
When the Lagrangian is unbounded below in $x$, the dual function takes on the value $-\infty$. Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda,\nu)$, it is concave, even when the problem is not convex.


5.1.3 Lower Bounds on the Optimal Value

The dual function yields lower bounds on the optimal value $p^*$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le p^* .$$
This holds because for any feasible point $\tilde{x}$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le L(\tilde{x}, \lambda, \nu)\le f_0(\tilde{x}),$$
since $\lambda_i f_i(\tilde{x}) \le 0$ and $\nu_i h_i(\tilde{x}) = 0$; taking the infimum over all feasible $\tilde{x}$ gives the bound.
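To make the bound concrete, here is a minimal numeric check on a toy instance of our own: minimize $x^2$ subject to $1-x \le 0$, so $p^*=1$ at $x^\star=1$, and analytically $g(\lambda)=\inf_x \left(x^2+\lambda(1-x)\right)=\lambda-\lambda^2/4$.

```python
import numpy as np

# Toy problem (our own example): minimize x^2 subject to 1 - x <= 0.
# Optimal point x* = 1, p* = 1; analytically g(lam) = lam - lam^2/4.
def g(lam):
    x = np.linspace(-5.0, 5.0, 100001)   # dense grid stands in for the inf over x
    return np.min(x**2 + lam * (1.0 - x))

p_star = 1.0
for lam in [0.0, 0.5, 2.0, 4.0]:
    print(f"g({lam}) = {g(lam):.4f} <= {p_star}")   # the bound holds for every lam >= 0
```

The best bound here is $g(2) = 1 = p^*$, anticipating the Lagrange dual problem of Section 5.2.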


5.1.5 Examples

Least-squares solution of linear equations

We consider the problem
$$\begin{array}{ll} \min & x^Tx \\ \text{s.t.} & Ax=b, \end{array}$$
where $A\in \mathbb{R}^{p \times n}$.
The Lagrangian is
$$L(x,\nu) = x^Tx + \nu^T(Ax-b),$$
with domain $\mathbb{R}^n \times \mathbb{R}^p$.
Since $L(x,\nu)$ is a convex quadratic function of $x$, we can find the minimizing $x$ from the optimality condition
$$\nabla_x L(x,\nu) = 2x + A^T\nu = 0,$$
which yields $x = -\frac{1}{2}A^T\nu$. Therefore the dual function is
$$g(\nu)=L\left(-\tfrac{1}{2} A^{T} \nu, \nu \right)=-\tfrac{1}{4} \nu^{T} A A^{T} \nu-b^{T} \nu,$$
which is a concave quadratic function of $\nu$, with domain $\mathbb{R}^p$.
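We can check numerically that the maximum of this dual function equals $p^*$; the sketch below uses an assumed random instance, the closed-form minimum-norm solution $x^\star = A^T(AA^T)^{-1}b$, and the dual maximizer $\nu^\star = -2(AA^T)^{-1}b$ obtained from $\nabla g(\nu) = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))   # p x n with full row rank (w.h.p. for random A)
b = rng.standard_normal(3)

# Primal: minimum-norm solution of Ax = b and its value p*.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# Dual: g(nu) = -(1/4) nu^T A A^T nu - b^T nu, maximized at nu* = -2 (A A^T)^{-1} b.
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)
g_star = -0.25 * nu_star @ (A @ A.T) @ nu_star - b @ nu_star
print(p_star, g_star)   # the two values agree: the lower bound is tight here
```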


5.1.6 The Lagrange Dual Function and Conjugate Functions

The conjugate $f^*$ of a function $f: \mathbb{R}^n\rightarrow \mathbb{R}$ is given by
$$f^*(y) = \sup_{x\in \operatorname{dom} f} \left(y^Tx-f(x)\right).$$

Consider the problem
$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & x=0. \end{array}$$
The Lagrangian is $L(x,\nu)=f(x)+\nu^Tx$, and the dual function is
$$g(\nu)=\inf_x \left(f(x)+\nu^Tx\right)=-\sup_x\left((-\nu)^Tx-f(x)\right)=-f^*(-\nu).$$
More generally, consider an optimization problem with linear inequality and equality constraints,
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & Ax\preceq b\\ & Cx=d. \end{array}$$
Using the conjugate of $f_0$, we can write its dual function as
$$\begin{aligned} g(\lambda,\nu)&=\inf_x \left(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\right) \\ &=-b^T\lambda-d^T\nu+\inf_x \left(f_0(x)+(A^T\lambda+C^T\nu)^Tx\right) \\ &=-b^T\lambda-d^T\nu-f_0^*(-A^T\lambda-C^T\nu). \end{aligned}$$
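As a consistency check of our own, take $f_0(x)=x^Tx$ with no inequality constraints, $C=A$, and $d=b$: the conjugate is $f_0^*(y)=\sup_x (y^Tx - x^Tx) = \frac{1}{4}y^Ty$, so the formula gives
$$g(\nu) = -b^T\nu - f_0^*(-A^T\nu) = -b^T\nu - \tfrac{1}{4}\nu^T A A^T\nu,$$
which matches the dual function of the least-squares problem derived directly in Section 5.1.5.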


5.2 The Lagrange Dual Problem

Loosely speaking, the Lagrange dual of the Lagrange dual problem is the primal problem; we will verify this explicitly for LPs below.

For each pair $(\lambda,\nu)$ with $\lambda \succeq 0$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the optimization problem. The best such lower bound is obtained from the optimization problem
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
This problem is called the Lagrange dual problem. The term dual feasible, describing a pair $(\lambda,\nu)$ with $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, means, as the name implies, that $(\lambda,\nu)$ is feasible for the dual problem. We refer to $(\lambda^\star,\nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem regardless of whether the primal is convex, since the objective to be maximized is concave and the constraint set is convex.

5.2.1 Making Dual Constraints Explicit

The examples above show that it is not uncommon for the domain of the dual function, $\operatorname{dom} g = \{ (\lambda,\nu) \mid g(\lambda ,\nu)>-\infty \}$, to have dimension smaller than $m+p$, i.e., to be a proper subset of $\mathbb{R}^{m+p}$.

A. Lagrange Dual of Standard Form LP

The Lagrange dual function for the standard form LP
$$\begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax = b \\ & x \succeq 0 \end{array}$$
is given by
$$g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
Strictly speaking, the Lagrange dual problem of the standard form LP is to maximize this dual function $g$ subject to $\lambda \succeq 0$:
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Here $g$ is finite only when $A^T\nu - \lambda+c=0$. We can form an equivalent problem by making this equality constraint explicit:
$$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu - \lambda + c = 0 \\ & \lambda \succeq 0. \end{array}$$
This problem, in turn, can be expressed as
$$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu + c \succeq 0, \end{array}$$
which is an LP in inequality form.
Note that all three of these problems are equivalent; each is, with some abuse of terminology, called the Lagrange dual of the standard form LP.

B. Lagrange Dual of Inequality Form LP

In a similar way, we can find the Lagrange dual problem of a linear program in inequality form
$$P0: \quad \begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax \preceq b. \end{array}$$
The Lagrangian is
$$L(x,\lambda)=c^Tx+\lambda^T(Ax-b) = -b^T\lambda + (A^T\lambda+c)^Tx,$$
so the dual function is
$$g(\lambda)=\inf_x L(x,\lambda) = -b^T \lambda + \inf_x (A^T\lambda + c)^T x.$$
The infimum of a linear function is $-\infty$ unless it is identically zero, so
$$g(\lambda) = \begin{cases} -b^T\lambda, & A^T\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
The dual variable $\lambda$ is dual feasible if $\lambda \succeq 0$ and $A^T \lambda + c=0$.
The Lagrange dual of the LP is to maximize $g$ over all $\lambda \succeq 0$. Again we can reformulate this by explicitly including the dual feasibility conditions as constraints:
$$P1: \quad \begin{array}{ll} \max & -b^T\lambda \\ \text{s.t.} & A^T \lambda + c = 0 \\ & \lambda \succeq 0, \end{array}$$
which is an LP in standard form.
Note that the Lagrange dual of the problem $P1$ is (equivalent to) the primal problem $P0$.
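A quick numeric sanity check of this primal-dual pair, sketched with an assumed instance (the box $0 \le x \le 1$ written as $Ax \preceq b$) and scipy.optimize.linprog for both LPs:

```python
import numpy as np
from scipy.optimize import linprog

# Assumed instance: minimize c^T x over the box 0 <= x <= 1, encoded as Ax <= b.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])
c = np.array([1.0, -2.0])

# Primal LP P0 (inequality form); x is free, the box is carried by A, b.
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual LP P1 (standard form): maximize -b^T lam, i.e. minimize b^T lam,
# subject to A^T lam + c = 0 and lam >= 0.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 4)

print(primal.fun, -dual.fun)   # both print -2.0: p* = d* (strong duality for LPs)
```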


5.2.2 Weak Duality

The optimal value of the Lagrange dual problem, which we denote $d^*$, is by definition the best lower bound on $p^*$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality
$$d^* \le p^*,$$
called weak duality, which holds even if the original problem is not convex, and even when $d^*$ and $p^*$ are infinite.
We refer to the difference $p^*-d^*$ as the optimal duality gap of the original problem, since it gives the gap between the optimal value of the primal problem and the best (i.e., greatest) lower bound on it that can be obtained from the Lagrange dual function.


5.2.3 Strong Duality and Slater's Constraint Qualification

If the equality $d^* = p^*$ holds, i.e., the optimal duality gap is zero, then we say that strong duality holds.

Strong duality does not hold in general. But if the primal problem is convex, i.e., of the form
$$P0: \quad \begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \leq 0, \quad i =1,\dots,m\\ & Ax=b, \end{array}$$
with $f_0,\dots,f_m$ convex, we usually (but not always) have strong duality.
Conditions on the problem under which strong duality holds are called constraint qualifications. One simple constraint qualification is Slater's condition: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that
$$f_i(x)<0, \quad i=1,\dots,m, \qquad Ax = b.$$
Such a point is sometimes called strictly feasible, since the inequality constraints hold with strict inequalities. Slater's theorem states that strong duality holds if (1) Slater's condition holds and (2) the problem is convex.
Slater's condition can be refined when some of the inequality constraint functions $f_i$ are affine. If the first $k$ constraint functions $f_1,\dots,f_k$ are affine, then strong duality holds provided the following weaker condition holds: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that
$$f_i(x)\leq 0, \quad i=1,\dots,k, \qquad f_i(x)<0, \quad i=k+1,\dots,m, \qquad Ax = b.$$


5.2.4 Examples

A. Lagrange dual of QCQP

We consider the QCQP
$$P0: \quad \begin{array}{ll} \min & \frac{1}{2}x^TP_0x+q^T_0x +r_0 \\ \text{s.t.} & \frac{1}{2}x^TP_ix+q^T_ix +r_i \le 0, \quad i =1,\dots,m, \end{array}$$
with $P_0 \in \mathbf{S}_{++}^n$ and $P_i \in \mathbf{S}_{+}^n$, $i=1,\dots,m$.
The Lagrangian is
$$L(x,\lambda) = \frac{1}{2}x^TP_0x+q^T_0x +r_0 + \sum_{i=1}^{m} \lambda_i \left[ \frac{1}{2}x^TP_ix+q^T_ix +r_i\right] = \frac{1}{2}x^TP(\lambda)x+q(\lambda)^Tx +r(\lambda),$$
where $P(\lambda)=P_0 + \sum_{i=1}^m \lambda_i P_i$, $q(\lambda)=q_0 + \sum_{i=1}^m \lambda_i q_i$, and $r(\lambda)=r_0+\sum_{i=1}^m \lambda_i r_i$.
For $\lambda \succeq 0$ we have $P(\lambda) \succ 0$, so the Lagrangian is a strictly convex quadratic in $x$ and
$$g(\lambda) = \inf_x L(x,\lambda) = - \frac{1}{2}q(\lambda)^T P(\lambda)^{-1} q(\lambda) + r(\lambda).$$
We can therefore express the dual problem as
$$P1: \quad \begin{array}{ll} \max & g(\lambda) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Slater's condition says that strong duality between the primal problem $P0$ and the dual problem $P1$ holds if the quadratic inequality constraints are strictly feasible, i.e., there exists an $x$ with
$$\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0, \quad i=1,\dots,m.$$
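For a single constraint the dual is one-dimensional and easy to maximize numerically. A minimal sketch with an assumed instance (minimize $\|x\|^2 - 2\mathbf{1}^Tx$ over the ball $x^Tx \le 0.5$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed instance: minimize 1/2 x^T P0 x + q0^T x + r0
# subject to 1/2 x^T P1 x + q1^T x + r1 <= 0  (here: x^T x <= 0.5).
P0 = 2.0 * np.eye(2); q0 = np.array([-2.0, -2.0]); r0 = 0.0
P1 = 2.0 * np.eye(2); q1 = np.zeros(2);           r1 = -0.5

def g(lam):
    P, q, r = P0 + lam * P1, q0 + lam * q1, r0 + lam * r1
    return -0.5 * q @ np.linalg.solve(P, q) + r

res = minimize_scalar(lambda lam: -g(lam), bounds=(0.0, 100.0), method="bounded")
lam_star, d_star = res.x, -res.fun
x_star = -np.linalg.solve(P0 + lam_star * P1, q0 + lam_star * q1)  # argmin of L
p_val = 0.5 * x_star @ P0 @ x_star + q0 @ x_star + r0
print(lam_star, d_star, p_val)   # lam* = 1, d* = p* = -1.5 (strong duality)
```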

B. A nonconvex quadratic problem with strong duality

On rare occasions, strong duality holds for a nonconvex problem. As an important example, we consider the problem of minimizing a nonconvex quadratic function over the unit ball,
$$P0: \quad \begin{array}{ll} \min & x^TAx + 2b^Tx \\ \text{s.t.} & x^Tx \leq 1, \end{array}$$
where $A \in \mathbf{S}^n$ and $b\in\mathbb{R}^n$. When $A \nsucceq 0$, this is not a convex problem. It is called the trust region problem.
The Lagrangian is
$$L(x,\lambda) = x^TAx + 2b^Tx + \lambda(x^Tx-1)=x^T(A+\lambda I)x + 2b^Tx - \lambda,$$
so the dual function is given by
$$g(\lambda) = \begin{cases} -b^T(A+\lambda I)^{\dagger} b -\lambda, & \text{if } A+\lambda I \succeq 0,\ b \in \mathcal{R}(A+\lambda I) \\ -\infty, & \text{otherwise,} \end{cases}$$
where $(A+\lambda I)^\dagger$ is the pseudo-inverse of $A+\lambda I$. The Lagrange dual problem is thus
$$P1: \quad \begin{array}{ll} \max & -b^T(A+\lambda I)^{\dagger} b - \lambda \\ \text{s.t.} & A + \lambda I \succeq 0,\ b \in \mathcal{R}(A + \lambda I), \end{array}$$
with variable $\lambda \in \mathbb{R}$.
The Lagrange dual problem is a convex optimization problem. In fact, it is readily solved, since it can be expressed as
$$\begin{array}{ll} \max & -\sum_{i=1}^{n} \dfrac{(q_i^T b)^2}{\lambda_i+\lambda} - \lambda \\ \text{s.t.} & \lambda \geq - \lambda_{\min}(A), \end{array}$$
where $\lambda_i$ and $q_i$ are the eigenvalues and corresponding (orthonormal) eigenvectors of $A$, and we interpret $(q_i^Tb)^2 / 0$ as $0$ if $q_i^T b = 0$ and as $\infty$ otherwise.
Although the original problem $P0$ is not convex, strong duality still holds. In fact, a more general result holds: strong duality holds for any optimization problem with quadratic objective and one quadratic inequality constraint, provided Slater's condition holds.
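Since the dual is a one-dimensional concave maximization, strong duality can be verified numerically. A sketch with assumed random data: the recovered $x^\star = -(A+\lambda^\star I)^{-1}b$ lands on the unit sphere when the ball constraint is active, which holds almost surely for indefinite $A$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2   # symmetric, indefinite w.h.p.
b = rng.standard_normal(4)

lam_i, Q = np.linalg.eigh(A)      # eigenvalues lam_i, orthonormal eigenvectors q_i
c = Q.T @ b                       # c_i = q_i^T b

def neg_dual(lam):
    # -g(lam) for lam > max(0, -lam_min(A)): g(lam) = -sum c_i^2/(lam_i+lam) - lam
    return np.sum(c**2 / (lam_i + lam)) + lam

lo = max(0.0, -lam_i.min()) + 1e-9
res = minimize_scalar(neg_dual, bounds=(lo, lo + 50.0), method="bounded")
lam_star, d_star = res.x, -res.fun

x_star = -np.linalg.solve(A + lam_star * np.eye(4), b)  # minimizer of L(x, lam*)
p_val = x_star @ A @ x_star + 2 * b @ x_star
print(np.linalg.norm(x_star), d_star, p_val)            # ||x*|| ~ 1 and d* ~ p_val
```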


5.4 Saddle-Point Interpretation

5.4.1 Max-Min characterization of weak and strong duality

First note that (assuming, for simplicity, that there are no equality constraints)
$$\sup_{\lambda \succeq 0} L(x,\lambda) = \sup_{\lambda \succeq 0 } \left(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\right) = \begin{cases} f_0(x), & \text{if } f_i(x) \le 0,\ i =1,\dots,m \\ \infty, & \text{otherwise.} \end{cases}$$
Indeed, suppose $x$ is not feasible, so $f_i(x)>0$ for some $i$. Then $\sup_{\lambda \succeq 0} L(x,\lambda) = \infty$, as can be seen by choosing $\lambda_j = 0$ for $j \neq i$ and letting $\lambda_i \rightarrow \infty$. On the other hand, if $f_i(x)\le 0$ for $i=1,\dots,m$, then the optimal choice of $\lambda$ is $\lambda = 0$ and $\sup_{\lambda \succeq 0} L(x,\lambda) = f_0(x)$. This means that we can express the optimal value of the primal problem as
$$p^* = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
By the definition of the dual function, the optimal value of the dual problem is
$$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda).$$
Thus, weak duality can be expressed as the inequality
$$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda) \leq \inf_x \sup_{\lambda \succeq 0} L(x,\lambda) = p^* ,$$
and strong duality as the equality
$$\sup_{\lambda \succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
Strong duality means that the order of the minimization over $x$ and the maximization over $\lambda \succeq 0$ can be switched without affecting the result.
In fact, the weak duality inequality does not depend on any properties of $L$: we have
$$\sup_{z \in Z} \inf_{w \in W} f(w,z) \leq \inf_{w \in W} \sup_{z \in Z} f(w,z)$$
for any $f: \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}$ (and any $W \subseteq \mathbb{R}^n$ and $Z \subseteq \mathbb{R}^m$). This general inequality is called the max-min inequality. When equality holds, i.e.,
$$\sup_{z \in Z} \inf_{w \in W} f(w,z) = \inf_{w \in W} \sup_{z \in Z} f(w,z),$$
we say that $f$ (and $W$ and $Z$) satisfy the strong max-min property or saddle-point property.
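A two-point example of our own shows that the max-min inequality can be strict:

```python
import numpy as np

# f(w, z) = w*z with w, z restricted to {-1, +1}. There is no saddle point:
# sup_z inf_w f = -1  <  inf_w sup_z f = +1.
W = Z = np.array([-1.0, 1.0])
F = np.outer(W, Z)                  # F[i, j] = w_i * z_j
print(np.max(np.min(F, axis=0)))    # sup over z of (inf over w): -1.0
print(np.min(np.max(F, axis=1)))    # inf over w of (sup over z): +1.0
```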


5.4.2 Saddle-Point Interpretation

We refer to a pair $\tilde{w} \in W,\ \tilde{z} \in Z$ as a saddle-point for $f$ (and $W$ and $Z$) if
$$f(\tilde{w},z) \leq f(\tilde{w},\tilde{z}) \leq f(w,\tilde{z})$$
for all $w \in W,\ z \in Z$. In other words, $\tilde{w}$ minimizes $f(w,\tilde{z})$ over $w \in W$ and $\tilde{z}$ maximizes $f(\tilde{w},z)$ over $z \in Z$:
$$f(\tilde{w},\tilde{z}) = \inf_{w \in W} f(w,\tilde{z}),\qquad f(\tilde{w},\tilde{z}) = \sup_{z \in Z} f(\tilde{w},z).$$
This implies that the strong max-min property holds, and that the common value is $f(\tilde{w},\tilde{z})$.
Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are primal and dual optimal points for a problem in which strong duality holds, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.


5.5 Optimality Conditions

5.5.1 Certificate of suboptimality and stopping criteria

If we can find a dual feasible point $(\lambda,\nu)$, we establish a lower bound on the optimal value of the primal problem: $p^* \geq g(\lambda, \nu)$. Thus a dual feasible point provides a proof or certificate that $p^* \geq g(\lambda, \nu)$. In particular, if $x$ is primal feasible, then $f_0(x) - g(\lambda,\nu)$ bounds the suboptimality of $x$, since $f_0(x) - p^* \leq f_0(x) - g(\lambda,\nu)$; this duality gap can serve as a stopping criterion in iterative algorithms.


5.5.2 Complementary slackness

Suppose strong duality holds. Let $x^\star$ be a primal optimal and $(\lambda^\star,\nu^\star)$ a dual optimal point. This means that
$$\begin{aligned} f_0(x^\star) & = g(\lambda^\star,\nu^\star) \\ &= \inf_x \left(f_0(x) + \sum_{i=1}^{m} \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i (x)\right) \\ & \leq f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i (x^\star) \\ & \leq f_0(x^\star). \end{aligned}$$

  • The first line states that the optimal duality gap is zero.
  • The second line is the definition of the dual function.
  • The third line follows since the infimum of the Lagrangian over $x$ is less than or equal to its value at $x = x^\star$.
  • The last inequality follows from $\lambda_i^\star \geq 0$ and $f_i(x^\star)\leq 0$ for $i=1,\dots,m$, and $h_i(x^\star)=0$ for $i=1,\dots,p$.

We conclude that the two inequalities in this chain (lines 3 and 4) hold with equality.
The first conclusion: since the inequality in the third line is an equality, $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$.
The second conclusion (complementary slackness):
$$\sum_{i=1}^m \lambda_i^\star f_i(x^\star) = 0.$$
Since each term in this sum is nonpositive, we conclude that
$$\lambda_i^\star f_i(x^\star) = 0, \quad i =1,\dots,m.$$
This holds for any primal optimal $x^\star$ and any dual optimal $(\lambda^\star,\nu^\star)$ (when strong duality holds).
We can express the complementary slackness condition as
$$\lambda_i^\star >0 \ \Rightarrow\ f_i(x^\star)=0,$$
or, equivalently,
$$f_i(x^\star)<0 \ \Rightarrow\ \lambda_i^\star = 0.$$
Roughly speaking, this means the $i$th optimal Lagrange multiplier is zero unless the $i$th constraint is active at the optimum.
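A micro-example of our own, reusing the toy problem from the sketch in Section 5.1.3 ($\min x^2$ s.t. $1-x \le 0$): stationarity of the Lagrangian at the optimum $x^\star = 1$ gives
$$\frac{\partial}{\partial x}\left(x^2+\lambda(1-x)\right)\Big|_{x=x^\star} = 2x^\star-\lambda^\star = 0 \ \Rightarrow\ \lambda^\star = 2 > 0,$$
and the constraint is indeed active: $f_1(x^\star)=1-x^\star=0$, so $\lambda_1^\star f_1(x^\star)=0$ holds via an active constraint rather than a zero multiplier.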


5.5.3 KKT optimality conditions

We now assume that the functions $f_0,\dots,f_m,\ h_1,\dots,h_p$ are differentiable (and therefore have open domains), but we make no assumptions yet about convexity.

A. KKT conditions for nonconvex problems

As above, let $x^\star$ and $(\lambda^\star,\nu^\star)$ be any primal and dual optimal points with zero duality gap. Since $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$, its gradient must vanish at $x^\star$, i.e.,
$$\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(x^\star) = 0.$$
Thus, we have
$$\begin{aligned} f_{i}(x^{\star}) & \leq 0, \quad i=1, \ldots, m \\ h_{i}(x^{\star}) &=0, \quad i=1, \ldots, p \\ \lambda_{i}^{\star} & \geq 0, \quad i=1, \ldots, m \\ \lambda_{i}^{\star} f_{i}(x^{\star}) &=0, \quad i=1, \ldots, m \\ \nabla f_{0}(x^{\star})+\sum_{i=1}^{m} \lambda_{i}^{\star} \nabla f_{i}(x^{\star})+\sum_{i=1}^{p} \nu_{i}^{\star} \nabla h_{i}(x^{\star}) &=0, \end{aligned}$$
which are called the Karush-Kuhn-Tucker (KKT) conditions.
To summarize, for any optimization problem with differentiable objective and differentiable constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.

B. KKT conditions for convex problems

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if the $f_i$ are convex, the $h_i$ are affine, and $\tilde{x},\tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions
$$\begin{aligned} f_{i}(\tilde{x}) & \leq 0, \quad i=1, \ldots, m \\ h_{i}(\tilde{x}) &=0, \quad i=1, \ldots, p \\ \tilde{\lambda}_{i} & \geq 0, \quad i=1, \ldots, m \\ \tilde{\lambda}_{i} f_{i}(\tilde{x}) &=0, \quad i=1, \ldots, m \\ \nabla f_{0}(\tilde{x})+\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} \nabla h_{i}(\tilde{x}) &=0, \end{aligned}$$
then $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal, with zero duality gap.

To see this, note that the first two conditions state that $\tilde{x}$ is primal feasible. Since $\tilde{\lambda}_i \geq 0$, $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$; the last KKT condition states that its gradient with respect to $x$ vanishes at $x = \tilde{x}$, so $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$ over $x$. From this we conclude that
$$\begin{aligned} g(\tilde{\lambda},\tilde{\nu}) & = L(\tilde{x},\tilde{\lambda},\tilde{\nu}) \\ &= f_0( \tilde{x} ) +\sum_{i=1}^{m} \tilde{\lambda}_{i} f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} h_{i}(\tilde{x}) \\ &= f_0( \tilde{x} ), \end{aligned}$$
where in the last line we use $h_i (\tilde{x}) = 0$ and $\tilde{\lambda}_i f_i (\tilde{x}) = 0$. This shows that $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ have zero duality gap, and therefore are primal and dual optimal.
In summary, for any convex optimization problem with differentiable objective and differentiable constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap.
If a convex optimization problem with differentiable objective and constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality: Slater's condition implies that the optimal duality gap is zero and the dual optimum is attained, so $x$ is optimal if and only if there exist $(\lambda,\nu)$ that, together with $x$, satisfy the KKT conditions.
The KKT conditions play an important role in optimization. In a few special cases, it is possible to solve the KKT conditions analytically. More generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions.

Example 5.1

Equality constrained convex quadratic minimization. We consider the problem
$$P0: \quad \begin{array}{ll} \min & \frac{1}{2}x^TPx + q^Tx + r \\ \text{s.t.} & Ax = b, \end{array}$$
where $P \in \mathbf{S}_{+}^n$.
The KKT conditions for this problem are
$$Ax^\star = b, \qquad Px^\star + q + A^T \nu^\star = 0,$$
which we can write as
$$\left[\begin{array}{cc} P & A^{T} \\ A & 0 \end{array}\right]\left[\begin{array}{l} x^{\star} \\ \nu^{\star} \end{array}\right]=\left[\begin{array}{c} -q \\ b \end{array}\right].$$
Solving this set of $m + n$ equations in the $m + n$ variables $x^\star, \nu^\star$ gives the optimal primal and dual variables for $P0$.
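Since the KKT conditions here reduce to one linear system, solving the problem is a single linear solve; a minimal sketch with an assumed instance:

```python
import numpy as np

# Assumed instance of min (1/2) x^T P x + q^T x + r  subject to  Ax = b.
P = np.array([[3.0, 1.0], [1.0, 2.0]])   # P in S^n_+
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])               # a single equality constraint
b = np.array([1.0])

n, m = P.shape[0], A.shape[0]
# Assemble and solve the KKT system  [P A^T; A 0] [x; nu] = [-q; b].
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

print(x_star, nu_star)
print(A @ x_star - b)                  # primal feasibility: ~0
print(P @ x_star + q + A.T @ nu_star)  # stationarity: ~0
```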

Example 5.2 Water-filling.

We consider the convex optimization problem
$$P0: \quad \begin{array}{ll} \min & -\sum_{i=1}^n \log (\alpha_i + x_i ) \\ \text{s.t.} & x \succeq 0, \quad \mathbf{1}^T x = 1, \end{array}$$
where $\alpha_i > 0$. This problem arises in information theory, in allocating power to a set of $n$ communication channels. The variable $x_i$ represents the transmitter power allocated to the $i$th channel, and $\log(\alpha_i + x_i )$ gives the capacity or communication rate of the channel, so the problem is to allocate a total power of one to the channels, in order to maximize the total communication rate.
Introducing Lagrange multipliers $\lambda^\star \in \mathbb{R}^n$ for the inequality constraints $x \succeq 0$, and a multiplier $\nu^\star \in \mathbb{R}$ for the equality constraint $\mathbf{1}^T x = 1$, we obtain the KKT conditions
$$x^\star \succeq 0, \qquad \mathbf{1}^T x^\star = 1, \qquad \lambda^\star \succeq 0, \qquad \lambda_i^\star x_i^\star = 0, \quad i=1,\dots,n,$$
$$-\frac{1}{\alpha_i+x_i^\star} - \lambda_i^\star + \nu^\star = 0, \quad i=1,\dots,n .$$
We can directly solve these equations to find $x^\star$, $\lambda^\star$, and $\nu^\star$. We start by noting that $\lambda^\star$ acts as a slack variable in the last equation, so it can be eliminated, leaving
$$x^\star \succeq 0, \qquad \mathbf{1}^T x^\star = 1, \qquad x_i^\star\left(\nu^\star - \frac{1}{\alpha_i+x_i^\star}\right) = 0, \qquad \nu^\star \geq \frac{1}{\alpha_i+x_i^\star}, \quad i=1,\dots,n .$$

  • If $\nu^\star < 1/\alpha_i$, the last condition can only hold if $x^\star_i > 0$, which by the third condition implies $\nu^\star = \frac{1}{\alpha_i + x^\star_i}$; solving for $x^\star_i$, we conclude that $x^\star_i= 1/\nu^\star -\alpha_i$.
  • If $\nu^\star \geq 1/\alpha_i$, then $x^\star_i > 0$ is impossible, because it would imply $\nu^\star \geq 1/\alpha_i > \frac{1}{\alpha_i + x^\star_i}$, which violates the complementary slackness condition; therefore $x^\star_i = 0$.

Thus we have
$$x_i^\star = \begin{cases} 1/\nu^\star - \alpha_i, & \text{if } \nu^\star < 1/\alpha_i\\ 0, & \text{if } \nu^\star \geq 1/\alpha_i, \end{cases}$$
or, more simply, $x_i^\star =\max \{0,\ 1/\nu^\star - \alpha_i \}$.
Substituting this expression for $x^\star_i$ into the condition $\mathbf{1}^T x^\star = 1$, we obtain
$$\sum_{i=1}^n \max \left\{0,\ \frac{1}{\nu^\star} - \alpha_i \right\} = 1.$$
The left-hand side is a piecewise-linear increasing function of $1/\nu^\star$, with breakpoints at $\alpha_i$, so the equation has a unique solution, which is readily determined.
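This is the classic water-filling solution: think of $1/\nu^\star$ as a water level that rises until the total "water" above the channel floors $\alpha_i$ equals one. A minimal bisection sketch with assumed channel parameters:

```python
import numpy as np

def water_filling(alpha, total=1.0, tol=1e-10):
    """Solve sum_i max(0, w - alpha_i) = total for the water level w = 1/nu*
    by bisection, then return x_i = max(0, w - alpha_i)."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = alpha.min(), alpha.max() + total   # levels bracketing the solution
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.sum(np.maximum(0.0, w - alpha)) < total:
            lo = w
        else:
            hi = w
    return np.maximum(0.0, 0.5 * (lo + hi) - alpha)

alpha = np.array([0.3, 1.0, 2.0])   # assumed channel noise levels
x = water_filling(alpha)
print(x, x.sum())                   # allocations x* = (0.85, 0.15, 0), summing to 1
```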

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x,\lambda^\star,\nu^\star)$. This fact sometimes allows us to compute a primal optimal solution from a dual optimal solution. More precisely, suppose strong duality holds and an optimal $(\lambda^\star,\nu^\star)$ is known. Suppose that the minimizer of $L(x,\lambda^\star,\nu^\star)$, i.e., the solution of
$$\min \quad f_{0}(x)+\sum_{i=1}^{m} \lambda_{i}^{\star} f_{i}(x)+\sum_{i=1}^{p} \nu_{i}^{\star} h_{i}(x),$$
is unique. Then, if that minimizer is primal feasible, it must be primal optimal; if it is not primal feasible, we can conclude that no primal optimal point exists.

Example 5.3 Entropy maximization.

We consider the entropy maximization problem
$$\begin{array}{ll} \min & f_{0}(x)=\sum_{i=1}^{n} x_{i} \log x_{i} \\ \text{s.t.} & A x \preceq b \\ & \mathbf{1}^{T} x=1, \end{array}$$
with domain $\mathbb{R}_{++}^n$, and its Lagrange dual problem
$$\begin{array}{ll} \max & -b^{T} \lambda-\nu-e^{-\nu-1} \sum_{i=1}^{n} e^{-a_{i}^{T} \lambda} \\ \text{s.t.} & \lambda \succeq 0, \end{array}$$
where $a_i$ are the columns of $A$. We assume that the weak form of Slater's condition holds, i.e., there exists an $x \succ 0$ with $Ax \preceq b$ and $\mathbf{1}^T x = 1$, so strong duality holds and an optimal solution $(\lambda^\star,\nu^\star)$ exists.
Suppose we have solved the dual problem. The Lagrangian at $(\lambda^\star,\nu^\star)$ is
$$L\left(x, \lambda^{\star}, \nu^{\star}\right)=\sum_{i=1}^{n} x_{i} \log x_{i}+\lambda^{\star T}(A x-b)+\nu^{\star}\left(\mathbf{1}^{T} x-1\right),$$
which is strictly convex on $\mathcal{D}$ and bounded below, so it has a unique minimizer $x^\star$, given by
$$x^\star_i = 1/ \exp (a_i^T \lambda^\star+\nu^\star + 1), \quad i=1,\dots,n.$$
If $x^\star$ is primal feasible, it must be the optimal solution of the primal problem. If $x^\star$ is not primal feasible, then we can conclude that the primal optimum is not attained.
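A sketch of this recipe on an assumed tiny instance: solve the two-variable dual numerically ($\lambda \ge 0$, $\nu$ free), recover $x^\star$ from the formula above, and check primal feasibility.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed instance with a single inequality row; weak Slater holds here since
# x = (1/3) 1 is strictly positive, sums to one, and satisfies Ax < b.
A = np.array([[1.0, -1.0, 0.5]])
b = np.array([1.0])

def neg_dual(z):
    lam, nu = z
    g = -b[0] * lam - nu - np.exp(-nu - 1.0) * np.sum(np.exp(-A[0] * lam))
    return -g

res = minimize(neg_dual, x0=np.array([0.5, 0.0]),
               bounds=[(0.0, None), (None, None)])   # L-BFGS-B handles the bound
lam_s, nu_s = res.x

# Recover the unique minimizer of the Lagrangian at (lam*, nu*).
x_star = np.exp(-A[0] * lam_s - nu_s - 1.0)
print(x_star, x_star.sum(), A @ x_star <= b + 1e-8)  # feasible => primal optimal
```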


5.7 Examples (reformulations)

In this section, we show by example that simple equivalent reformulations of a problem can lead to very different dual problems. We consider the following types of reformulations:

  • Introducing new variables and associated equality constraints.
  • Replacing the objective with an increasing function of the original objective.
  • Making explicit constraints implicit, i.e., incorporating them into the domain of the objective.

5.7.1 Introducing new variables and equality constraints

Consider an unconstrained problem of the form
$$P0: \quad \min\ f_0(Ax + b).$$
Since there are no constraints, the Lagrange dual function is the constant $p^\star$. So while we do have strong duality, i.e., $p^\star= d^\star$, the Lagrangian dual is neither useful nor interesting.

Now let us reformulate the problem as
$$P1: \quad \begin{array}{ll} \min & f_0(y) \\ \text{s.t.} & Ax +b = y. \end{array}$$
Here we have introduced a new variable $y$, as well as a new equality constraint $Ax+b = y$. The problems $P0$ and $P1$ are clearly equivalent.
The Lagrangian of the reformulated problem is
$$L(x,y,\nu) = f_0(y) + \nu^T(Ax+b-y).$$
To find the dual function we minimize $L$ over $x$ and $y$. Minimizing over $x$, we find that $g(\nu) = -\infty$ unless $A^T\nu = 0$, in which case we are left with
$$g(\nu) = b^T \nu + \inf_y (f_0(y) - \nu^T y ) = b^T \nu - f_0^*(\nu),$$
where $f_0^*$ is the conjugate of $f_0$. The dual problem of $P1$ can therefore be expressed as
$$\begin{array}{ll} \max & b^T\nu-f_0^*(\nu) \\ \text{s.t.} & A^T \nu= 0. \end{array}$$
Thus, the dual of the reformulated problem $P1$ is considerably more useful than the dual of the original problem $P0$.

Example 5.5 Unconstrained geometric program.

Consider the unconstrained geometric program
$$\min\ \log \left(\sum_{i=1}^m \exp (a_i^T x + b_i)\right).$$
We first reformulate it by introducing new variables and equality constraints:
$$P1: \quad \begin{array}{ll} \min & f_0(y) = \log \left(\sum_{i=1}^m \exp y_i \right) \\ \text{s.t.} & Ax + b = y, \end{array}$$
where $a_i^T$ are the rows of $A$. The conjugate of the log-sum-exp function is
$$f_0^*(\nu) = \begin{cases} \sum_{i=1}^m \nu_i \log \nu_i, & \text{if } \nu \succeq 0,\ \mathbf{1}^T\nu =1 \\ \infty, & \text{otherwise,} \end{cases}$$
so the dual of the reformulated problem can be expressed as
$$\begin{array}{ll} \max & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \text{s.t.} & \mathbf{1}^T\nu =1\\ & A^T \nu = 0 \\ & \nu \succeq 0, \end{array}$$
which is an entropy maximization problem.

Example 5.6 Norm approximation problem.

We consider the unconstrained norm approximation problem
$$P0: \quad \min\ \| Ax-b \|,$$
where $\|\cdot\|$ is any norm. Here too the Lagrange dual function is constant, equal to the optimal value of $P0$, and therefore not useful.
Once again we reformulate the problem as
$$\begin{array}{ll} \min & \| y \| \\ \text{s.t.} & Ax -b = y. \end{array}$$
The Lagrange dual problem is
$$\begin{array}{ll} \max & b^T \nu \\ \text{s.t.} & \| \nu \|_* \leq 1 \\ & A^T \nu = 0, \end{array}$$
where we use the fact that the conjugate of a norm is the indicator function of the dual norm unit ball.
The idea of introducing new equality constraints can be applied to the constraint functions as well. Consider, for example, the problem
$$\begin{array}{ll} \min & f_0(A_0x+b_0) \\ \text{s.t.} & f_i(A_ix+b_i) \le 0, \quad i =1,\dots,m, \end{array}$$
where $A_i \in \mathbb{R}^{k_i \times n}$ and $f_i: \mathbb{R}^{k_i} \rightarrow \mathbb{R}$ are convex. We introduce a new variable $y_i \in \mathbb{R}^{k_i}$, for $i = 0,\dots,m$, and reformulate the problem as
$$\begin{array}{ll} \min & f_0(y_0) \\ \text{s.t.} & f_i(y_i) \le 0, \quad i =1,\dots,m \\ & A_i x + b_i = y_i , \quad i =0,\dots,m. \end{array}$$
The Lagrangian for this problem is
$$L(x,y_0,\dots,y_m,\lambda,\nu_0,\dots,\nu_m) = f_0(y_0) + \sum_{i=1}^m \lambda_i f_i(y_i) + \sum_{i=0}^m \nu_i^T (A_i x + b_i - y_i).$$
To find the dual function, we minimize over $x$ and the $y_i$. The minimum over $x$ is $-\infty$ unless
$$\sum_{i=0}^m A_i^T \nu_i = 0,$$
in which case we have, for $\lambda \succ 0$,
$$\begin{aligned} &g\left(\lambda, \nu_{0}, \ldots, \nu_{m}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}, \ldots, y_{m}}\left(f_{0}\left(y_{0}\right)+\sum_{i=1}^{m} \lambda_{i} f_{i}\left(y_{i}\right)-\sum_{i=0}^{m} \nu_{i}^{T} y_{i}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}}\left(f_{0}\left(y_{0}\right)-\nu_{0}^{T} y_{0}\right)+\sum_{i=1}^{m} \lambda_{i} \inf _{y_{i}}\left(f_{i}\left(y_{i}\right)-\left(\nu_{i} / \lambda_{i}\right)^{T} y_{i}\right) \\ &\quad=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}\left(\nu_{0}\right)-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}\left(\nu_{i} / \lambda_{i}\right). \end{aligned}$$
The last expression involves the perspective of the conjugate function, and is therefore concave in the dual variables. Finally, we address the question of what happens when $\lambda \succeq 0$ but some $\lambda_i$ are zero. If $\lambda_i = 0$ and $\nu_i \neq 0$, then the dual function is $-\infty$. If $\lambda_i = 0$ and $\nu_i = 0$, however, the terms involving $y_i$, $\nu_i$, and $\lambda_i$ are all zero. Thus, the expression above for $g$ is valid for all $\lambda \succeq 0$, if we take $\lambda_i f^*_i (\nu_i /\lambda_i ) = 0$ when $\lambda_i = 0$ and $\nu_i = 0$, and $\lambda_i f^*_i (\nu_i /\lambda_i ) = \infty$ when $\lambda_i = 0$ and $\nu_i \neq 0$.
Therefore we can express the dual of the problem as
$$\begin{array}{ll} \max & \sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}\left(\nu_{0}\right)-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}\left(\nu_{i} / \lambda_{i}\right)\\ \text{s.t.} & \lambda \succeq 0 \\ & \sum_{i=0}^m A_i^T \nu_i =0. \end{array}$$

5.7.2 Transforming the objective

If we replace the objective $f_0$ by an increasing function of $f_0$, the resulting problem is clearly equivalent. The dual of this equivalent problem, however, can be very different from the dual of the original problem.

Example 5.8

We consider again the minimum norm problem
$$\min\ \| Ax - b \|,$$
where $\| \cdot \|$ is some norm. We reformulate this problem as
$$\begin{array}{ll} \min & \frac{1}{2} \| y \|^2 \\ \text{s.t.} & Ax -b = y. \end{array}$$
Here we have introduced a new variable, and replaced the objective by half its square. Evidently this is equivalent to the original problem.
The dual of the reformulated problem is
$$\begin{array}{ll} \max & -\frac{1}{2} \| \nu \|^2_* + b^T \nu \\ \text{s.t.} & A^T \nu = 0, \end{array}$$
where we use the fact that the conjugate of $\frac{1}{2}\|\cdot\|^2$ is $\frac{1}{2}\|\cdot\|^2_*$.
Note that this dual problem is not the same as the dual problem derived earlier for the same primal problem (Example 5.6).
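For the Euclidean norm both duals can be evaluated in closed form from the least-squares residual, which makes the difference concrete. A sketch with an assumed random instance; recall that $A^Tr=0$ for the residual $r = b - A\hat{x}$.

```python
import numpy as np

# With ||.|| = ||.||_2: nu = r/||r|| is feasible for the Example 5.6 dual and
# attains p* = ||r||, while nu = r attains p*^2/2 in the squared-objective dual.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)); b = rng.standard_normal(6)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_ls                       # residual; A^T r = 0

p_star = np.linalg.norm(r)
print(p_star, b @ (r / p_star))                  # Example 5.6 dual value = p*
print(0.5 * p_star**2, b @ r - 0.5 * r @ r)      # squared dual value = p*^2 / 2
```

Attaining the primal optimal value certifies dual optimality in each case, by weak duality; the two duals are different problems with different optimal values, even though both certify the same primal.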

5.7.3 Implicit constraints

The next simple reformulation we study is to include some of the constraints in the objective function, by modifying the objective function to be infinite when the constraint is violated.

