Convex Optimization Basics: Strong Convexity & Upper and Lower Bounds on the Hessian Matrix

MadJieJie

Article link: https://blog.csdn.net/madjiejie/article/details/118974222

9.1 Unconstrained minimization problems

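In this chapter, we discuss methods for solving the unconstrained optimization problem

$$\text{minimize} \quad f(x) \tag{9.1}$$

where $f: \mathbf{R}^n \to \mathbf{R}$ is convex and twice continuously differentiable (which implies that $\mathbf{dom}\, f$ is open). We assume the problem is solvable, i.e., there exists an optimal point $x^\star$, and we denote the optimal value $\inf_x f(x) = f(x^\star)$ by $p^\star$.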

Since $f$ is differentiable and convex, a necessary and sufficient condition for a point $x^\star$ to be optimal is

$$\nabla f(x^\star) = 0. \tag{9.2}$$

Thus, solving the unconstrained minimization problem (9.1) is the same as finding a solution of (9.2), which is a set of $n$ equations in the $n$ variables $x_1, \ldots, x_n$. In a few special cases, we can find a solution to problem (9.1) by analytically solving the optimality equation (9.2), but usually the problem must be solved by an iterative algorithm. By this we mean an algorithm that computes a sequence of points $x^{(0)}, x^{(1)}, \ldots \in \mathbf{dom}\, f$ with $f(x^{(k)}) \to p^\star$ as $k \to \infty$.
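Here is a concrete iterative algorithm of this kind: a minimal Python sketch of gradient descent with a fixed step size. This is one common choice, not something this section prescribes. The problem data are assumptions made up for the example: a two-variable quadratic $f(x) = \frac{1}{2}x^TAx - b^Tx$ with `A = [[3, 1], [1, 2]]`, `b = [1, 1]`, step `t = 0.2` and starting point $x^{(0)} = (2, -1)$. A quadratic is one of the special cases that can be solved analytically. Setting $\nabla f(x) = Ax - b = 0$ gives $x^\star = (0.2, 0.4)$ and $p^\star = -0.3$, so the printed gaps $f(x^{(k)}) - p^\star$ can be watched going to zero.

```python
# Minimal sketch: fixed-step gradient descent on a hypothetical quadratic
# f(x) = 1/2 x^T A x - b^T x in two variables (A, b, t, x0 are illustrative).
A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite Hessian
b = [1.0, 1.0]


def f(x):
    """Objective value 1/2 x^T A x - b^T x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (b[0] * x[0] + b[1] * x[1])


def grad(x):
    """Gradient A x - b."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def gradient_descent(x0, t=0.2, iters=100):
    """Iterates x^(0), ..., x^(iters) of x^(k+1) = x^(k) - t * grad f(x^(k))."""
    xs = [list(x0)]
    for _ in range(iters):
        x, g = xs[-1], grad(xs[-1])
        xs.append([x[0] - t * g[0], x[1] - t * g[1]])
    return xs


p_star = -0.3  # f(x*) with x* = A^{-1} b = (0.2, 0.4), solved by hand
xs = gradient_descent([2.0, -1.0])
for k in (0, 10, 50, 100):
    print(f"k = {k:3d}   f(x^(k)) - p* = {f(xs[k]) - p_star:.3e}")
```

The fixed step works here because $t = 0.2$ lies below $2/\lambda_{\max}(A) \approx 0.55$. For a quadratic, a fixed step outside $(0, 2/\lambda_{\max}(A))$ does not make this iteration converge in general.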

9.1.2 Strong convexity and implications

Lower bound on $\nabla^2 f(x)$ (Hessian Matrix)

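We assume that the objective function is strongly convex on $\mathcal{S}$, the sublevel set $\mathcal{S} = \{x \in \mathbf{dom}\, f \mid f(x) \le f(x^{(0)})\}$ of the starting point $x^{(0)}$. This means there exists an $m > 0$ such that

$$\nabla^2 f(x) \succeq m \mathbf{I} \quad \text{for all } x \in \mathcal{S}. \tag{9.7}$$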
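The condition $\nabla^2 f(x) \succeq m\mathbf{I}$ says that every eigenvalue of the Hessian is at least $m$. The largest constant that works is therefore the smallest Hessian eigenvalue over $\mathcal{S}$. For a quadratic $f(x) = \frac{1}{2}x^TAx - b^Tx$, the Hessian is $A$ at every point, so $m = \lambda_{\min}(A)$.

The Python sketch below computes this for a hypothetical $2\times 2$ matrix `A`, which is an illustrative assumption not taken from the text. It uses the closed-form eigenvalues of a symmetric $2\times 2$ matrix. The helper name `eig_sym_2x2` is likewise made up.

```python
import math


def eig_sym_2x2(H):
    """Closed-form eigenvalues (smallest, largest) of a symmetric 2x2 matrix H."""
    mean = 0.5 * (H[0][0] + H[1][1])
    radius = math.hypot(0.5 * (H[0][0] - H[1][1]), H[0][1])
    return mean - radius, mean + radius


# Hypothetical quadratic f(x) = 1/2 x^T A x - b^T x: its Hessian is A everywhere.
A = [[3.0, 1.0], [1.0, 2.0]]
m, largest = eig_sym_2x2(A)

# Hessian >= m I holds with this m at every x, so f is strongly convex iff m > 0.
print(f"m = {m:.4f}, strongly convex: {m > 0}")  # m = 1.3820, strongly convex: True
```

For a non-quadratic $f$, the Hessian changes with $x$. In that case $m$ has to bound $\lambda_{\min}(\nabla^2 f(x))$ over the whole of $\mathcal{S}$, not just at one point.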

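Strong convexity has some interesting consequences. For $x, y \in \mathcal{S}$, Taylor's theorem gives

$$f(y) = f(x) + \nabla f(x)^T (y-x) + \frac{1}{2}(y-x)^T \nabla^2 f(z) (y-x)$$

for some $z$ on the line segment $[x, y]$. Since $\mathcal{S}$ is a sublevel set of a convex function, it is convex, so $z \in \mathcal{S}$.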

By the strong convexity assumption (9.7), the last term on the right-hand side is at least $\frac{m}{2}\|y-x\|_2^2$, so we have the inequality

$$f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{m}{2}\|y-x\|_2^2 \tag{9.8}$$

for all $x$ and $y$ in $\mathcal{S}$. When $m = 0$, we recover the basic inequality characterizing convexity. For $m > 0$, we obtain a better lower bound on $f(y)$ than follows from convexity alone.
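Inequality (9.8) can be sanity-checked numerically on a hypothetical quadratic with `A = [[3, 1], [1, 2]]` and `b = [1, 1]`. Here $m = \lambda_{\min}(A) = (5 - \sqrt{5})/2$, and strong convexity holds on all of $\mathbf{R}^2$. The Python sketch below draws random pairs $x, y$ and records the slack between $f(y)$ and the right-hand side of (9.8). The data, seed and sampling box are assumptions for illustration. Random sampling can only fail to find a counterexample; it does not prove the inequality.

```python
import math
import random

# Hypothetical quadratic f(x) = 1/2 x^T A x - b^T x (data chosen for illustration).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
m = (5 - math.sqrt(5)) / 2  # lambda_min(A), the strong convexity constant


def f(x):
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (b[0] * x[0] + b[1] * x[1])


def grad(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def rhs_98(x, y):
    """Right-hand side of (9.8): f(x) + grad f(x)^T (y - x) + m/2 ||y - x||_2^2."""
    g = grad(x)
    d = [y[0] - x[0], y[1] - x[1]]
    return f(x) + g[0] * d[0] + g[1] * d[1] + 0.5 * m * (d[0] ** 2 + d[1] ** 2)


random.seed(0)
slacks = []
for _ in range(10_000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    slacks.append(f(y) - rhs_98(x, y))  # (9.8) says this is >= 0

print(f"smallest slack: {min(slacks):.3e}")
```

For a quadratic, the slack is exactly $\frac{1}{2}(y-x)^T(A - m\mathbf{I})(y-x) \ge 0$. Replacing `m` with anything larger than $\lambda_{\min}(A)$ could therefore make the slack negative for directions near the eigenvector of $\lambda_{\min}(A)$.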

Next, we show that inequality (9.8) can be used to bound $f(x) - p^\star$, the suboptimality of the point $x$, in terms of $\|\nabla f(x)\|_2$. The right-hand side of (9.8) is a convex quadratic function of $y$ (for fixed $x$). Its gradient with respect to $y$ is $\nabla f(x) + m(y - x)$. Setting this equal to zero, we find that $\tilde{y} = x - \frac{1}{m}\nabla f(x)$ minimizes the right-hand side. Therefore

$$\begin{aligned} f(y) & \geq f(x) + \nabla f(x)^T(y-x) + \frac{m}{2} \|y-x\|_2^2 \\ & \geq f(x) + \nabla f(x)^T(\tilde{y}-x) + \frac{m}{2} \|\tilde{y}-x\|_2^2 \\ &= f(x) - \frac{1}{2m}\|\nabla f(x) \|_2^2. \end{aligned}$$

Since this holds for any $y \in \mathcal{S}$, we have

$$p^\star \geq f(x) - \frac{1}{2m} \|\nabla f(x) \|_2^2, \tag{9.9}$$

which can be rewritten as

$$f(x) - p^\star \leq \frac{1}{2m} \|\nabla f(x) \|_2^2.$$
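The same hypothetical quadratic can be used to check both steps of this argument. It has `A = [[3, 1], [1, 2]]`, `b = [1, 1]`, $m = \lambda_{\min}(A)$ and $x^\star = (0.2, 0.4)$. The Python sketch below checks two things:

1. The right-hand side of (9.8), evaluated at $\tilde{y} = x - \frac{1}{m}\nabla f(x)$, equals $f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2$.
2. The ratio $(f(x) - p^\star) / \big(\frac{1}{2m}\|\nabla f(x)\|_2^2\big)$ never exceeds 1.

The data and sampling are illustrative assumptions, and the check is not a proof.

```python
import math
import random

# Hypothetical quadratic f(x) = 1/2 x^T A x - b^T x (data chosen for illustration).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
m = (5 - math.sqrt(5)) / 2  # lambda_min(A): strong convexity constant on all of R^2


def f(x):
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (b[0] * x[0] + b[1] * x[1])


def grad(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def rhs_98(x, y):
    """Right-hand side of (9.8) as a function of y, for fixed x."""
    g = grad(x)
    d = [y[0] - x[0], y[1] - x[1]]
    return f(x) + g[0] * d[0] + g[1] * d[1] + 0.5 * m * (d[0] ** 2 + d[1] ** 2)


p_star = f([0.2, 0.4])  # x* = A^{-1} b = (0.2, 0.4) for this A and b

random.seed(1)
max_min_err = 0.0  # largest |rhs_98(x, y_tilde) - (f(x) - ||g||^2 / (2m))|
max_ratio = 0.0    # largest (f(x) - p*) / (||g||^2 / (2m)); (9.9) says <= 1
for _ in range(10_000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    g = grad(x)
    g2 = g[0] ** 2 + g[1] ** 2
    if g2 == 0.0:
        continue  # x is optimal; nothing to compare
    y_tilde = [x[0] - g[0] / m, x[1] - g[1] / m]  # minimizer of the right-hand side
    max_min_err = max(max_min_err, abs(rhs_98(x, y_tilde) - (f(x) - g2 / (2 * m))))
    max_ratio = max(max_ratio, (f(x) - p_star) / (g2 / (2 * m)))

print(f"max identity error: {max_min_err:.2e}, max ratio: {max_ratio:.4f}")
```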
We can also derive a bound on $\|x - x^\star\|_2$, the distance between $x$ and any optimal point $x^\star$, in terms of $\|\nabla f(x)\|_2$:

$$\|x - x^\star\|_2 \leq \frac{2}{m}\|\nabla f(x)\|_2. \tag{9.11}$$

To see this, we apply (9.8) with $y = x^\star$ to obtain

$$\begin{aligned} p^\star = f(x^\star) & \geq f(x) + \nabla f(x)^T (x^\star - x) + \frac{m}{2} \| x^\star - x \|_2^2 \\ & \geq f(x) - \| \nabla f(x) \|_2 \| x^\star - x\|_2 + \frac{m}{2} \| x^\star - x \|_2^2, \end{aligned}$$

where the second inequality uses the Cauchy-Schwarz inequality, $\nabla f(x)^T (x^\star - x) \geq -\|\nabla f(x)\|_2 \|x^\star - x\|_2$. Since $p^\star \leq f(x)$, we must have

$$-\|\nabla f(x) \|_2 \| x^\star -x \|_2 + \frac{m}{2}\| x^\star - x \|_2^2 \leq 0.$$

If $x \neq x^\star$, dividing by $\|x^\star - x\|_2 > 0$ gives $\|x^\star - x\|_2 \le \frac{2}{m}\|\nabla f(x)\|_2$, which is (9.11). If $x = x^\star$, (9.11) holds trivially.

Upper bound on $\nabla^2 f(x)$ (Hessian Matrix)

The inequality (9.8) implies that the sublevel sets contained in $\mathcal{S}$ are bounded. For fixed $x$, the right-hand side of (9.8) grows quadratically in $\|y - x\|_2$, so $f(y) \le f(x^{(0)})$ confines $y$ to a bounded set. In particular, $\mathcal{S}$ itself is bounded. Since $\mathcal{S}$ is also closed (a standing assumption on the starting point in this chapter), it is compact. Therefore the maximum eigenvalue of $\nabla^2 f(x)$, which is a continuous function of $x$ on $\mathcal{S}$, is bounded above on $\mathcal{S}$. That is, there exists a constant $M$ such that

$$\nabla^2 f(x) \preceq M \mathbf{I} \quad \text{for all } x \in \mathcal{S}.$$

This upper bound on the Hessian implies that for any $x, y \in \mathcal{S}$,

$$f(y) \leq f(x) + \nabla f(x)^T (y-x) + \frac{M}{2}\|y-x\|_2^2,$$

which is analogous to (9.8). Minimizing each side over $y$ yields

$$p^\star \leq f(x) - \frac{1}{2M} \| \nabla f(x) \|_2^2,$$

which can be rewritten as

$$f(x) - p^\star \geq \frac{1}{2M}\| \nabla f(x) \|_2^2,$$

the counterpart to (9.9).
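The argument above only needs $M$ on $\mathcal{S}$, and for some functions no global $M$ exists. A hypothetical example, not from the text, is $f(x) = e^{x_1} + e^{-x_1} + x_2^2 = 2\cosh x_1 + x_2^2$. Its Hessian $\mathrm{diag}(2\cosh x_1, 2)$ satisfies $\nabla^2 f(x) \succeq 2\mathbf{I}$ everywhere but is unbounded above on $\mathbf{R}^2$. On the sublevel set $\mathcal{S}$ of a starting point $x^{(0)}$, however, $2\cosh x_1 \le f(x^{(0)})$, so $M = f(x^{(0)})$ works.

The Python sketch below does the following, with arbitrary choices for the grid spacing and $x^{(0)}$:

1. Samples a grid of points in $\mathcal{S}$.
2. Estimates $m$ and $M$ from the sampled Hessian eigenvalues.
3. Checks the two-sided bound $\frac{1}{2M}\|\nabla f(x)\|_2^2 \le f(x) - p^\star \le \frac{1}{2m}\|\nabla f(x)\|_2^2$ at the sampled points, with $x^\star = 0$ and $p^\star = 2$.

Sampling gives estimates, not guaranteed bounds.

```python
import math

# Hypothetical example: f(x) = 2 cosh(x1) + x2^2, minimized at x* = (0, 0) with p* = 2.
# Hessian = diag(2 cosh(x1), 2): >= 2 I everywhere, unbounded above on R^2.


def f(x):
    return 2.0 * math.cosh(x[0]) + x[1] ** 2


def grad(x):
    return [2.0 * math.sinh(x[0]), 2.0 * x[1]]


def hessian_eigs(x):
    """Eigenvalues (smallest, largest) of the diagonal Hessian diag(2 cosh(x1), 2)."""
    return 2.0, 2.0 * math.cosh(x[0])


def sublevel_grid(x0, h=0.05):
    """Grid points of S = {x : f(x) <= f(x0)}.

    f(x) >= 2 cosh(x1) and f(x) >= 2 + x2^2, so S fits in the box
    |x1| <= acosh(f(x0) / 2), |x2| <= sqrt(f(x0) - 2).
    """
    level = f(x0)
    n1 = math.ceil(math.acosh(level / 2.0) / h)
    n2 = math.ceil(math.sqrt(level - 2.0) / h)
    points = ([i * h, j * h] for i in range(-n1, n1 + 1) for j in range(-n2, n2 + 1))
    return [x for x in points if f(x) <= level]


x0 = [1.5, 1.0]  # arbitrary starting point defining S
S = sublevel_grid(x0)
m = min(hessian_eigs(x)[0] for x in S)
M = max(hessian_eigs(x)[1] for x in S)  # sampled estimate; the exact sup over S is f(x0)
p_star = f([0.0, 0.0])

bounds_hold = True
for x in S:
    g2 = sum(gi ** 2 for gi in grad(x))
    gap = f(x) - p_star
    # ||g||^2 / (2M) <= f(x) - p* <= ||g||^2 / (2m), with a small rounding tolerance
    if not (g2 / (2 * M) <= gap + 1e-12 and gap <= g2 / (2 * m) + 1e-12):
        bounds_hold = False

print(f"m = {m}, sampled M = {M:.4f}, f(x0) = {f(x0):.4f}, bounds hold: {bounds_hold}")
```

Shrinking the grid spacing `h` pushes the sampled `M` toward the exact value $f(x^{(0)})$. A starting point with larger $f(x^{(0)})$ gives a larger $\mathcal{S}$ and therefore a larger $M$; this is the sense in which $M$ depends on $\mathcal{S}$.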