Convex Optimization Basics: Strong Convexity and Upper and Lower Bounds on the Hessian Matrix


9.1 Unconstrained minimization problems

In this chapter, we discuss methods for solving the unconstrained optimization problem
$$\text{minimize} \quad f(x) \tag{9.1}$$
where $f : \mathbf{R}^n \to \mathbf{R}$ is convex and twice continuously differentiable (which implies that $\mathbf{dom} f$ is open). We denote the optimal value, $\inf_x f(x) = f(x^\star)$, as $p^\star$.

Since $f$ is differentiable and convex, a necessary and sufficient condition for a point $x^\star$ to be optimal is
$$\nabla f(x^\star) = 0. \tag{9.2}$$
Thus, solving the unconstrained minimization problem (9.1) is the same as finding a solution of (9.2), which is a set of $n$ equations in the $n$ variables $x_1, \ldots, x_n$. In a few special cases, we can find a solution to the problem (9.1) by analytically solving the optimality equation (9.2), but usually the problem must be solved by an iterative algorithm. By this, we mean an algorithm that computes a sequence of points $x^{(0)}, x^{(1)}, \ldots \in \mathbf{dom} f$ with $f(x^{(k)}) \to p^\star$ as $k \to \infty$.
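To make the idea of an iterative algorithm concrete, here is a minimal sketch using fixed-step gradient descent. The quadratic objective, the starting point, the step size 0.1, and the iteration count are all assumptions chosen for illustration; the text above does not prescribe any particular method.

```python
# Hypothetical objective: f(x) = 0.5 * x^T A x - b^T x with A = [[3, 1], [1, 2]], b = [1, 1].
def f(x):
    return 0.5 * (3 * x[0] ** 2 + 2 * x[0] * x[1] + 2 * x[1] ** 2) - x[0] - x[1]

def grad_f(x):
    # Gradient A x - b; it vanishes at x* = (0.2, 0.4), the solution of (9.2).
    return [3 * x[0] + x[1] - 1, x[0] + 2 * x[1] - 1]

def gradient_descent(x0, step=0.1, iters=200):
    """Return the final iterate and the list f(x^(0)), f(x^(1)), ..."""
    x = list(x0)
    values = [f(x)]
    for _ in range(iters):
        g = grad_f(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values

x_final, values = gradient_descent([2.0, -1.0])
p_star = f([0.2, 0.4])  # optimal value, evaluated at the analytic solution
print("initial suboptimality:", values[0] - p_star)
print("final suboptimality:  ", values[-1] - p_star)
```

For this particular quadratic the optimality equation can also be solved analytically, which is what makes it possible to compare the iterates against $p^\star$ directly.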


9.1.2 Strong Convexity and implications

Lower bound on $\nabla^2 f(x)$ (Hessian matrix)

We assume that the objective function is strongly convex on $\mathcal{S}$, which means that there exists an $m > 0$ such that
$$\nabla^2 f(x) \succeq m \mathbf{I} \tag{9.7}$$
for all $x \in \mathcal{S}$.

Strong convexity has some interesting consequences. For $x, y \in \mathcal{S}$, we have
$$f(y) = f(x) + \nabla f(x)^T (y-x) + \frac{1}{2}(y-x)^T \nabla^2 f(z) (y-x),$$
for some $z$ on the line segment $[x, y]$.

By the strong convexity assumption (9.7), the last term on the right-hand side is at least $\frac{m}{2}\|y-x\|_2^2$, so we have the inequality
$$f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{m}{2}\|y-x\|_2^2 \tag{9.8}$$
for all $x$ and $y$ in $\mathcal{S}$. When $m = 0$, we recover the basic inequality characterizing convexity; for $m > 0$, we obtain a better lower bound on $f(y)$ than follows from convexity alone.
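As a numerical sanity check of (9.8), the following sketch samples random pairs $x, y$ and confirms that the quadratic lower bound holds. The test function (log-sum-exp plus $\frac{m}{2}\|x\|_2^2$), the value $m = 0.5$, and the sampling box are assumptions made for this example; the Hessian of this function is at least $m\mathbf{I}$ because log-sum-exp is convex.

```python
import math
import random

m = 0.5  # strong convexity constant, chosen for this example

def f(x):
    # log-sum-exp (convex) plus (m/2)||x||^2, so the Hessian is at least m*I
    return math.log(sum(math.exp(xi) for xi in x)) + 0.5 * m * sum(xi * xi for xi in x)

def grad_f(x):
    z = sum(math.exp(xi) for xi in x)
    return [math.exp(xi) / z + m * xi for xi in x]

def lower_bound(x, y):
    # Right-hand side of (9.8): f(x) + grad f(x)^T (y - x) + (m/2)||y - x||^2
    g = grad_f(x)
    d = [yi - xi for xi, yi in zip(x, y)]
    return f(x) + sum(gi * di for gi, di in zip(g, d)) + 0.5 * m * sum(di * di for di in d)

rng = random.Random(0)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    y = [rng.uniform(-3, 3) for _ in range(3)]
    gaps.append(f(y) - lower_bound(x, y))

min_gap = min(gaps)
print("smallest value of f(y) minus the bound:", min_gap)
```

A nonnegative smallest gap (up to rounding) is what (9.8) predicts; a random check like this illustrates the inequality but does not prove it.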

We now show that the inequality (9.8) can be used to bound $f(x) - p^\star$, the suboptimality of the point $x$, in terms of $\|\nabla f(x)\|_2$. The right-hand side of (9.8) is a convex quadratic function of $y$ (for fixed $x$). Setting the gradient with respect to $y$ equal to zero, we find that $\tilde{y} = x - \frac{1}{m}\nabla f(x)$ minimizes the right-hand side. Therefore we have
$$\begin{aligned} f(y) & \geq f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}\|y-x\|_2^2 \\ & \geq f(x) + \nabla f(x)^T(\tilde{y}-x) + \frac{m}{2}\|\tilde{y}-x\|_2^2 \\ & = f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2. \end{aligned}$$
Since this holds for any $y \in \mathcal{S}$, we have
$$p^\star \geq f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2, \tag{9.9}$$
which can be rewritten as
$$f(x) - p^\star \leq \frac{1}{2m}\|\nabla f(x)\|_2^2.$$
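The last equality in the chain above can be checked by substituting $\tilde{y}$ term by term. The following worked steps are a sketch of that substitution, using only the quantities already defined in (9.8):

$$\begin{aligned} \nabla_y \Big( f(x) + \nabla f(x)^T (y-x) + \frac{m}{2}\|y-x\|_2^2 \Big) &= \nabla f(x) + m (y-x) = 0 \;\Longrightarrow\; \tilde{y} - x = -\frac{1}{m}\nabla f(x), \\ \nabla f(x)^T(\tilde{y}-x) &= -\frac{1}{m}\|\nabla f(x)\|_2^2, \\ \frac{m}{2}\|\tilde{y}-x\|_2^2 &= \frac{m}{2} \cdot \frac{1}{m^2}\|\nabla f(x)\|_2^2 = \frac{1}{2m}\|\nabla f(x)\|_2^2, \\ f(x) - \frac{1}{m}\|\nabla f(x)\|_2^2 + \frac{1}{2m}\|\nabla f(x)\|_2^2 &= f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2. \end{aligned}$$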
We can also derive a bound on $\|x - x^\star\|_2$, the distance between $x$ and any optimal point $x^\star$, in terms of $\|\nabla f(x)\|_2$:
$$\|x - x^\star\|_2 \leq \frac{2}{m}\|\nabla f(x)\|_2. \tag{9.11}$$

To see this, we apply (9.8) with $y = x^\star$ to obtain
$$\begin{aligned} p^\star = f(x^\star) & \geq f(x) + \nabla f(x)^T (x^\star - x) + \frac{m}{2}\|x^\star - x\|_2^2 \\ & \geq f(x) - \|\nabla f(x)\|_2 \|x^\star - x\|_2 + \frac{m}{2}\|x^\star - x\|_2^2, \end{aligned}$$
where we use the Cauchy-Schwarz inequality in the second inequality: $\nabla f(x)^T (x^\star - x) \geq -\|\nabla f(x)\|_2 \|x^\star - x\|_2$. Since $p^\star \leq f(x)$, we must have
$$-\|\nabla f(x)\|_2 \|x^\star - x\|_2 + \frac{m}{2}\|x^\star - x\|_2^2 \leq 0,$$
from which (9.11) follows after dividing by $\|x^\star - x\|_2$ (the case $x = x^\star$ is trivial).
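The distance bound (9.11) can also be checked numerically. The sketch below assumes a small quadratic objective, for which the Hessian is the constant matrix `A` and $m$ is its smallest eigenvalue; the matrix, the vector `b`, and the sampling box are illustrative choices, not part of the text above.

```python
import math
import random

# Hypothetical example: f(x) = 0.5 * x^T A x - b^T x, whose Hessian is A everywhere.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_star = [0.2, 0.4]  # solves A x = b, so the gradient vanishes here

# Smallest eigenvalue of the symmetric 2x2 matrix A (closed form).
m = 0.5 * (A[0][0] + A[1][1]) - math.sqrt((0.5 * (A[0][0] - A[1][1])) ** 2 + A[0][1] ** 2)

def grad_f(x):
    return [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    dist = norm2([x[0] - x_star[0], x[1] - x_star[1]])
    bound = (2.0 / m) * norm2(grad_f(x))  # right-hand side of (9.11)
    worst_ratio = max(worst_ratio, dist / bound)

print("largest observed ratio distance / bound:", worst_ratio)
```

A ratio that never exceeds 1 is consistent with (9.11). The bound depends only on the gradient norm and $m$, so it can be evaluated without knowing $x^\star$; the sketch uses $x^\star$ only to verify it.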

Upper bound on $\nabla^2 f(x)$ (Hessian matrix)

The inequality (9.8) implies that the sublevel sets contained in $\mathcal{S}$ are bounded, so in particular, $\mathcal{S}$ is bounded. Therefore the maximum eigenvalue of $\nabla^2 f(x)$, which is a continuous function of $x$ on $\mathcal{S}$, is bounded above on $\mathcal{S}$, i.e., there exists a constant $M$ such that $\nabla^2 f(x) \preceq M \mathbf{I}$ for all $x \in \mathcal{S}$. This upper bound on the Hessian implies that for any $x, y \in \mathcal{S}$,
$$f(y) \leq f(x) + \nabla f(x)^T (y-x) + \frac{M}{2}\|y-x\|_2^2,$$
which is analogous to (9.8). Minimizing each side over $y$ yields
$$p^\star \leq f(x) - \frac{1}{2M}\|\nabla f(x)\|_2^2,$$
which can be rewritten as
$$f(x) - p^\star \geq \frac{1}{2M}\|\nabla f(x)\|_2^2,$$
the counterpart to (9.9).
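Taken together, the lower and upper Hessian bounds sandwich the suboptimality $f(x) - p^\star$ between $\frac{1}{2M}\|\nabla f(x)\|_2^2$ and $\frac{1}{2m}\|\nabla f(x)\|_2^2$. The sketch below checks both sides on an assumed quadratic objective, where $m$ and $M$ are the extreme eigenvalues of the constant Hessian `A`; the matrix, the vector `b`, and the tolerance are illustrative choices.

```python
import math
import random

# Hypothetical example: f(x) = 0.5 * x^T A x - b^T x, whose Hessian is A everywhere.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_star = [0.2, 0.4]  # solves A x = b

def f(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

def grad_f(x):
    return [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(2)]

# Extreme eigenvalues of the symmetric 2x2 matrix A (closed form).
center = 0.5 * (A[0][0] + A[1][1])
radius = math.sqrt((0.5 * (A[0][0] - A[1][1])) ** 2 + A[0][1] ** 2)
m, M = center - radius, center + radius

p_star = f(x_star)
rng = random.Random(1)
violations = 0
for _ in range(1000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    gap = f(x) - p_star                      # suboptimality of x
    g2 = sum(gi * gi for gi in grad_f(x))    # squared gradient norm
    lower, upper = g2 / (2 * M), g2 / (2 * m)
    if not (lower - 1e-9 <= gap <= upper + 1e-9):
        violations += 1

print("points violating the two-sided bound:", violations)
```

The two bounds coincide only when $m = M$; the further apart the extreme eigenvalues are, the looser the estimate of suboptimality obtained from the gradient norm alone.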

