Convex Optimization Reading Notes (4)

Chapter 5: Duality

5.1 The Lagrange dual function

5.1.1 The Lagrangian

Consider an optimization problem in the standard form
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$
We define the Lagrangian $L : \mathbf{R}^n \times \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$ associated with the problem as
$$
L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_if_i(x)+\sum_{i=1}^{p}\nu_ih_i(x)
$$
with $\mathbf{dom} \ L = \mathcal{D} \times \mathbf{R}^m \times \mathbf{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$-th inequality constraint $f_i(x) \leq 0$; similarly we refer to $\nu_i$ as the Lagrange multiplier associated with the $i$-th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors.
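For example (a toy one-dimensional instance of my own, not from the book's text here): for the problem of minimizing $x^2$ subject to $1-x\leq0$, the Lagrangian is
$$
L(x,\lambda)=x^2+\lambda(1-x)
$$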

5.1.2 The Lagrange dual function

We define the Lagrange dual function $g : \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbf{R}^m$, $\nu \in \mathbf{R}^p$,
$$
g(\lambda, \nu)=\inf_{x\in \mathcal{D}}L(x,\lambda,\nu)
$$

5.1.3 Lower bounds on optimal value

The dual function yields lower bounds on the optimal value $p^\star$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$
g(\lambda,\nu)\leq p^\star
$$
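As a quick numeric check of this bound, here is a minimal sketch on a toy problem of my own (the names `a`, `b`, and `g` are just for illustration):

```python
import numpy as np

# Toy instance: minimize ||x||^2 subject to a^T x <= b, with b < 0.
# Minimizing L(x, lam) = x^T x + lam*(a^T x - b) over x gives
# x(lam) = -lam*a/2, hence g(lam) = -lam^2*||a||^2/4 - lam*b.
a = np.array([1.0, 2.0])
b = -1.0

def g(lam):
    return -lam**2 * (a @ a) / 4 - lam * b

# The primal optimum is the projection of 0 onto {x : a^T x <= b}:
# x* = (b/||a||^2) a, so p* = b^2/||a||^2.
p_star = b**2 / (a @ a)

# Weak duality: every lam >= 0 certifies a lower bound g(lam) <= p*.
for lam in [0.0, 0.1, 0.4, 1.0, 5.0]:
    print(f"lam = {lam:3.1f}   g(lam) = {g(lam):+.4f}   <=   p* = {p_star:.4f}")
```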

5.1.4 Linear approximation interpretation

5.1.5 Examples

5.1.6 The Lagrange dual function and conjugate functions

Consider an optimization problem with linear inequality and equality constraints,
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & Ax \preceq b \\
& Cx=d
\end{aligned}
$$
Using the conjugate $f_0^*$ of $f_0$, we can write the dual function for the problem as
$$
\begin{aligned}
g(\lambda,\nu) &= \inf_x\left(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\right) \\
&=-b^T\lambda-d^T\nu-f^*_0(-A^T\lambda-C^T\nu)
\end{aligned}
$$
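As a worked special case (my own computation, following the pattern of the book's examples): for $f_0(x)=\|x\|_2^2$ the conjugate is $f_0^*(y)=\tfrac{1}{4}\|y\|_2^2$, so the dual function has the closed form
$$
g(\lambda,\nu)=-b^T\lambda-d^T\nu-\tfrac{1}{4}\left\|A^T\lambda+C^T\nu\right\|_2^2
$$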

5.2 The Lagrange dual problem

The optimization problem
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & g(\lambda,\nu)\\
{\rm subject \ to} \ \ \ \ & \lambda \succeq 0
\end{aligned}
$$
This problem is called the Lagrange dual problem. We refer to $(\lambda^\star, \nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem whether or not the original problem is convex.
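For example (this is the book's standard form LP example), the dual of the linear program
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & c^Tx\\
{\rm subject \ to} \ \ \ \ & Ax=b \\
& x \succeq 0
\end{aligned}
$$
is, after eliminating the multiplier for $x \succeq 0$,
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & -b^T\nu\\
{\rm subject \ to} \ \ \ \ & A^T\nu+c \succeq 0
\end{aligned}
$$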

5.2.1 Making dual constraints explicit

In many cases we can identify the affine hull of $\mathbf{dom} \ g$, and describe it as a set of linear equality constraints.

5.2.2 Weak duality

The optimal value of the Lagrange dual problem, which we denote $d^\star$, is, by definition, the best lower bound on $p^\star$ that can be obtained from the Lagrange dual function:
$$
d^\star \leq p^\star
$$
which holds even if the original problem is not convex. This property is called weak duality.

We refer to the difference $p^\star - d^\star$ as the optimal duality gap of the original problem.

5.2.3 Strong duality and Slater’s constraint qualification

If the equality
$$
d^\star = p^\star
$$
holds, then we say that strong duality holds.

One simple constraint qualification is Slater's condition: there exists an $x \in \mathbf{relint} \ \mathcal{D}$ such that
$$
f_i(x)<0, \ \ i=1,\dots,m, \ \ Ax=b
$$
Slater's theorem states that if Slater's condition holds and the problem is convex ($f_0,\dots,f_m$ convex, affine equality constraints), then strong duality holds.

5.2.4 Examples

5.2.5 Mixed strategies for matrix games

5.3 Geometric interpretation

5.3.1 Weak and strong duality via set of values

Suppose $\mathcal{G}=\{(f_1(x),\dots,f_m(x),h_1(x),\dots,h_p(x),f_0(x))\in \mathbf{R}^m\times\mathbf{R}^p\times\mathbf{R}\mid x\in \mathcal{D}\}$ is the set of constraint and objective values. Then $g(\lambda,\nu)=\inf\{(\lambda,\nu,1)^T(u,v,t)\mid (u,v,t)\in\mathcal{G}\}$, so the hyperplane $\{(u,v,t)\mid(\lambda,\nu,1)^T(u,v,t)=g(\lambda,\nu)\}$ is a (non-vertical) supporting hyperplane to $\mathcal{G}$:

(Figure: the supporting hyperplane interpretation of the dual function; the original image is missing.)

5.3.2 Proof of strong duality under constraint qualification

5.3.3 Multicriterion interpretation

Consider the scalarization method for the (unconstrained) multicriterion problem
$$
{\rm minimize \ (w.r.t.} \ \mathbf{R}_+^{m+1}) \ \ \ \ F(x)=(f_1(x),\dots,f_m(x),f_0(x))
$$
Scalarizing with a weight vector $\tilde{\lambda}=(\lambda,1)$ recovers exactly the Lagrangian of a problem without equality constraints:
$$
\tilde{\lambda}^TF(x)=f_0(x)+\sum_{i=1}^{m}\lambda_if_i(x)
$$

5.4 Saddle-point interpretation

5.4.1 Max-min characterization of weak and strong duality

For simplicity, suppose there are no equality constraints. We can express the optimal value of the primal problem as
$$
p^\star=\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$
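This holds because, for fixed $x$,
$$
\sup_{\lambda\succeq0}L(x,\lambda)=\sup_{\lambda\succeq0}\left(f_0(x)+\sum_{i=1}^m\lambda_if_i(x)\right)=\begin{cases} f_0(x), & f_i(x)\leq0,\ i=1,\dots,m \\ \infty, & {\rm otherwise,} \end{cases}
$$
so taking the infimum over $x$ of the left-hand side gives the primal optimal value.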
By the definition of the dual function, we also have
$$
d^\star=\sup_{\lambda\succeq0}\inf_xL(x,\lambda)
$$
Thus, weak duality can be expressed as the inequality
$$
\sup_{\lambda\succeq0}\inf_xL(x,\lambda)\leq\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$
and strong duality as the equality
$$
\sup_{\lambda\succeq0}\inf_xL(x,\lambda)=\inf_x\sup_{\lambda\succeq0}L(x,\lambda)
$$

5.4.2 Saddle-point interpretation

Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are primal and dual optimal points for a problem in which strong duality obtains, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.

5.4.3 Game interpretation

The optimal duality gap for the problem is exactly equal to the advantage afforded the player who goes second, i.e., the player who has the advantage of knowing his or her opponent’s choice before choosing. If strong duality holds, then there is no advantage to the players of knowing their opponent’s choice.

5.4.4 Price or tax interpretation

5.5 Optimality conditions

5.5.1 Certificate of suboptimality and stopping criteria

A dual feasible point $(\lambda,\nu)$ provides a proof or certificate that $p^\star \geq g(\lambda,\nu)$.

The stopping criterion
$$
f_0(x^{(k)})-g(\lambda^{(k)},\nu^{(k)})<\epsilon_{\rm abs}
$$
guarantees that when the algorithm terminates, $x^{(k)}$ is $\epsilon_{\rm abs}$-suboptimal.
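Here is a minimal sketch of this criterion inside a dual-ascent loop, on the same toy problem as before (my own construction, not an algorithm from the book):

```python
import numpy as np

# Problem: minimize ||x||^2 subject to a^T x <= b, with b < 0 so the
# constraint is active at the optimum.
a = np.array([1.0, 2.0])
b = -1.0
eps_abs = 1e-8
lam, step = 0.0, 0.1

for k in range(1000):
    x = -lam * a / 2                              # argmin_x L(x, lam)
    g_lam = -lam**2 * (a @ a) / 4 - lam * b       # dual function value g(lam)
    # Project x onto the halfspace to get a primal feasible iterate.
    x_feas = x - max(a @ x - b, 0.0) / (a @ a) * a
    gap = x_feas @ x_feas - g_lam                 # f0(x_feas) - g(lam) >= 0
    if gap < eps_abs:                             # the stopping criterion above
        break
    lam = max(lam + step * (a @ x - b), 0.0)      # projected gradient ascent on g

print(f"k = {k}, certified gap = {gap:.2e}, f0(x_feas) = {x_feas @ x_feas:.6f}")
```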

5.5.2 Complementary slackness

Let $x^\star$ be a primal optimal and $(\lambda^\star, \nu^\star)$ a dual optimal point for a problem with zero duality gap. Then
$$
\lambda_i^\star f_i(x^\star)=0, \ \ i=1,\dots,m
$$
This condition is known as complementary slackness: the $i$-th optimal Lagrange multiplier is zero unless the $i$-th constraint is active at the optimum.
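This follows from the chain of (in)equalities
$$
\begin{aligned}
f_0(x^\star) &= g(\lambda^\star,\nu^\star) \\
&= \inf_x\left(f_0(x)+\sum_{i=1}^m\lambda_i^\star f_i(x)+\sum_{i=1}^p\nu_i^\star h_i(x)\right) \\
&\leq f_0(x^\star)+\sum_{i=1}^m\lambda_i^\star f_i(x^\star)+\sum_{i=1}^p\nu_i^\star h_i(x^\star) \\
&\leq f_0(x^\star)
\end{aligned}
$$
so $\sum_{i=1}^m\lambda_i^\star f_i(x^\star)=0$; since each term in the sum is nonpositive, each term must be zero.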

5.5.3 KKT optimality conditions

KKT conditions for nonconvex problems

Let $x^\star$ be a primal optimal and $(\lambda^\star, \nu^\star)$ a dual optimal point. Since $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$ over $x$, its gradient must vanish at $x^\star$:
$$
\nabla f_0(x^\star)+\sum_{i=1}^m \lambda_i^\star\nabla f_i(x^\star)+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)=0
$$
Thus the Karush-Kuhn-Tucker (KKT) conditions are
$$
\begin{aligned}
f_i(x^\star) & \leq 0, \ \ i=1,\dots,m \\
h_i(x^\star) &=0,\ \ i=1,\dots,p \\
\lambda_i^\star &\geq0,\ \ i=1,\dots,m \\
\lambda_i^\star f_i(x^\star)&=0,\ \ i=1,\dots,m \\
\nabla f_0(x^\star)+\sum_{i=1}^m \lambda_i^\star\nabla f_i(x^\star)+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)&=0
\end{aligned}
$$
To summarize, for any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.
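As a sketch (a standard equality-constrained QP, with an instance of my own choosing), the KKT conditions for minimizing $\tfrac12 x^TPx + q^Tx$ subject to $Ax=b$ reduce to a single linear system, since there are no inequality constraints:

```python
import numpy as np

# Stationarity: P x + q + A^T nu = 0; primal feasibility: A x = b.
# Stacking gives  [[P, A^T], [A, 0]] [x; nu] = [-q; b].
P = np.array([[2.0, 0.0], [0.0, 4.0]])   # positive definite => convex problem
q = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]
print("x* =", x_star, " nu* =", nu_star)
print("stationarity residual:", P @ x_star + q + A.T @ nu_star)  # ~0
print("A x* - b =", A @ x_star - b)                               # ~0
```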

KKT conditions for convex problems

If the $f_i$ are convex and the $h_i$ are affine, and $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions, then $\tilde{x}$ and $(\tilde{\lambda},\tilde{\nu})$ are primal and dual optimal, with zero duality gap.

5.5.4 Mechanics interpretation of KKT conditions

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x, \lambda^\star, \nu^\star)$.

5.6 Perturbation and sensitivity analysis

5.6.1 The perturbed problem

Consider the following perturbed version of the original optimization problem
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq u_i, \ \ i=1,\dots,m \\
& h_i(x)=v_i, \ \ i=1,\dots,p
\end{aligned}
$$

5.6.2 A global inequality

Let $(\lambda^\star,\nu^\star)$ be optimal for the dual of the unperturbed problem, and suppose strong duality holds. Then for all $u$ and $v$ we have
$$
p^\star(u,v)\geq p^\star(0,0)-\lambda^{\star T}u-\nu^{\star T}v
$$
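The proof is short: for any $x$ that is feasible for the perturbed problem,
$$
p^\star(0,0)=g(\lambda^\star,\nu^\star)\leq f_0(x)+\sum_{i=1}^m\lambda_i^\star f_i(x)+\sum_{i=1}^p\nu_i^\star h_i(x)\leq f_0(x)+\lambda^{\star T}u+\nu^{\star T}v
$$
and taking the infimum over all such $x$ gives the inequality above.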

5.6.3 Local sensitivity analysis

Provided strong duality holds and $p^\star(u,v)$ is differentiable at $(0,0)$, the optimal dual variables $\lambda^\star, \nu^\star$ are related to the gradient of $p^\star$ at $(0,0)$:
$$
\lambda^\star_i=-\frac{\partial p^\star(0,0)}{\partial u_i}, \ \ \ \ \nu^\star_i=-\frac{\partial p^\star(0,0)}{\partial v_i}
$$
If $\lambda^\star_i$ is small, the $i$-th inequality constraint can be loosened or tightened a little without much effect on the optimal value; if $\lambda^\star_i$ is large, loosening or tightening it a little has a large effect on the optimal value.
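A numeric check of this relation on the same toy problem as above (my own construction):

```python
import numpy as np

# Perturb the constraint to a^T x <= b + u. For b + u < 0 the optimal value
# is p*(u) = (b + u)^2 / ||a||^2, and the optimal multiplier at u = 0 is
# lam* = -2*b / ||a||^2.
a = np.array([1.0, 2.0])
b = -1.0

def p_star(u):
    return (b + u) ** 2 / (a @ a) if b + u < 0 else 0.0

lam_star = -2 * b / (a @ a)
h = 1e-6
dpdu = (p_star(h) - p_star(-h)) / (2 * h)     # central finite difference
print(f"-dp*/du at 0: {-dpdu:.6f}   lam*: {lam_star:.6f}")   # should agree
```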

5.7 Examples

5.7.1 Introducing new variables and equality constraints

5.7.2 Transforming the objective

5.7.3 Implicit constraints

5.8 Theorems of alternatives

5.8.1 Weak alternatives via the dual function

We can apply Lagrange duality theory to the problem of determining feasibility of a system of inequalities and equalities
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & 0\\
{\rm subject \ to} \ \ \ \ & f_i(x)\leq0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$
This problem has optimal value
$$
p^\star=\left\{ \begin{array}{ll} 0, & {\rm the \ system \ is \ feasible} \\ \infty, & {\rm the \ system \ is \ infeasible} \end{array} \right.
$$
Two systems of inequalities (and equalities) are called weak alternatives if at most one of the two is feasible.

5.8.2 Strong alternatives

When the original inequality system is convex and some type of constraint qualification holds, then the pairs of weak alternatives described above are strong alternatives, which means that exactly one of the two alternatives holds.

5.8.3 Examples

5.9 Generalized inequalities

Lagrange duality extends to a problem with generalized inequality constraints
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\preceq_{K_i}0, \ \ i=1,\dots,m \\
& h_i(x)=0, \ \ i=1,\dots,p
\end{aligned}
$$

5.9.1 The Lagrange dual

The Lagrange dual optimization problem is
$$
\begin{aligned}
{\rm maximize} \ \ \ \ & g(\lambda,\nu)\\
{\rm subject \ to} \ \ \ \ & \lambda_i \succeq_{K_i^*} 0, \ \ i=1,\dots,m
\end{aligned}
$$
where $K_i^*$ denotes the dual cone of $K_i$.

5.9.2 Optimality conditions

Complementary slackness

Since strong duality implies $\lambda_i^{\star T} f_i(x^\star)=0$ for each $i$, we can conclude that
$$
\lambda_i^\star\succ_{K_i^*}0\Longrightarrow f_i(x^\star)=0, \ \ \ \ f_i(x^\star)\prec_{K_i}0\Longrightarrow \lambda_i^\star=0
$$
However, in contrast to problems with scalar inequalities, it is possible to have $\lambda_i^{\star T} f_i(x^\star)=0$ with $\lambda_i^\star \neq 0$ and $f_i(x^\star) \neq 0$.

KKT conditions

Now we add the assumption that the functions $f_i, h_i$ are differentiable, and generalize the KKT conditions to problems with generalized inequalities:
$$
\begin{aligned}
f_i(x^\star) & \preceq_{K_i} 0, \ \ i=1,\dots,m \\
h_i(x^\star) &=0,\ \ i=1,\dots,p \\
\lambda_i^\star & \succeq_{K_i^*} 0,\ \ i=1,\dots,m \\
\lambda_i^{\star T}f_i(x^\star)&=0,\ \ i=1,\dots,m \\
\nabla f_0(x^\star)+\sum_{i=1}^m Df_i(x^\star)^T\lambda_i^\star+\sum_{i=1}^p\nu_i^\star\nabla h_i(x^\star)&=0
\end{aligned}
$$
where $Df_i(x^\star)\in \mathbf{R}^{k_i\times n}$ is the derivative of $f_i$ evaluated at $x^\star$.

5.9.3 Perturbation and sensitivity analysis

We consider the associated perturbed version of the problem
$$
\begin{aligned}
{\rm minimize} \ \ \ \ & f_0(x)\\
{\rm subject \ to} \ \ \ \ & f_i(x)\preceq_{K_i} u_i, \ \ i=1,\dots,m \\
& h_i(x)=v_i, \ \ i=1,\dots,p
\end{aligned}
$$

5.9.4 Theorems of alternatives

We can derive theorems of alternatives for systems of generalized inequalities and equalities
$$
f_i(x)\preceq_{K_i}0, \ \ i=1,\dots,m, \qquad h_i(x)=0, \ \ i=1,\dots,p
$$
where $K_i \subseteq \mathbf{R}^{k_i}$ are proper cones. We will also consider systems with strict inequalities,
$$
f_i(x)\prec_{K_i}0, \ \ i=1,\dots,m, \qquad h_i(x)=0, \ \ i=1,\dots,p
$$
