First Order Methods in Optimization Ch5. Smoothness and Strong Convexity

Learner Hu

Original link: https://download.csdn.net/download/m0_37854871/11562555

Chapter 5: Smoothness and Strong Convexity

1. $L$-Smoothness

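Definition 1 ($L$-smoothness). Let $L\ge0$. A function $f:\mathbb{E}\to(-\infty,\infty]$ is said to be $L$-smooth over a set $D\subset\mathbb{E}$ if it is differentiable over $D$ and satisfies $\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*\le L\Vert\mathbf{x-y}\Vert,\quad\forall\mathbf{x,y}\in D.$ The constant $L$ is called the smoothness parameter. The definition shows that $L$ depends on the chosen norm, so we sometimes speak explicitly of the smoothness parameter with respect to the norm $\Vert\cdot\Vert$.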

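By the definition of differentiability, if $f$ is $L$-smooth over a set $D\subset\mathbb{E}$, then necessarily $D\subset\mathrm{int}(\mathrm{dom}(f))$. If $f$ is $L$-smooth over the whole space $\mathbb{E}$, we simply call it an $L$-smooth function. In much of the literature, a function that is $L$-smooth over $D$ is also described as a function whose gradient is Lipschitz continuous with Lipschitz constant $L$. We denote the class of functions that are $L$-smooth over $D$ by $C_L^{1,1}(D)$, abbreviated to $C_L^{1,1}$ when $D=\mathbb{E}$. Further, let $C^{1,1}=\left\{f:\exists L\ge0,\,\text{s.t. }f\in C_L^{1,1}\right\}.$ It follows directly from the definition that $C_{L_1}^{1,1}\subset C_{L_2}^{1,1}$ whenever $L_2\ge L_1$. Hence the smoothness parameter of a given function is not unique. The set of all valid $L$ is bounded below by $0$ and therefore has an infimum, but determining the smallest smoothness parameter of a given function is nontrivial, and also interesting.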

Example 1 (smoothness of quadratic functions). Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}\in\mathbb{S}^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$. Assume that $\mathbb{R}^n$ is endowed with the $\ell_p$-norm ($1\le p\le\infty$). Since $\nabla f(\mathbf{x})=\mathbf{Ax}+\mathbf{b}$, for any $\mathbf{x,y}\in\mathbb{R}^n$, $\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_q=\Vert\mathbf{Ax-Ay}\Vert_q\le\Vert\mathbf{A}\Vert_{p,q}\Vert\mathbf{x-y}\Vert_p,$ ¹ where $q\in[1,\infty]$ satisfies $\frac{1}{p}+\frac{1}{q}=1$. Hence $f$ is $\Vert\mathbf{A}\Vert_{p,q}$-smooth. We now show that $\Vert\mathbf{A}\Vert_{p,q}$ is the smallest smoothness parameter of $f$. It suffices to show that $\Vert\mathbf{A}\Vert_{p,q}\le L$ for every $L$ such that $f$ is $L$-smooth. Take a vector $\tilde{\mathbf{x}}$ with $\Vert\tilde{\mathbf{x}}\Vert_p=1$ and $\Vert\mathbf{A}\tilde{\mathbf{x}}\Vert_q=\Vert\mathbf{A}\Vert_{p,q}$ ². Then $\Vert\mathbf{A}\Vert_{p,q}=\Vert\mathbf{A}\tilde{\mathbf{x}}\Vert_q=\Vert\nabla f(\tilde{\mathbf{x}})-\nabla f(\mathbf{0})\Vert_q\le L\Vert\tilde{\mathbf{x}}-\mathbf{0}\Vert_p=L.$
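
As a rough numerical illustration of Example 1, the sketch below takes the special case $p=1$ (so $q=\infty$), where the induced norm has the closed form $\Vert\mathbf{A}\Vert_{1,\infty}=\max_{i,j}|A_{ij}|$. The matrix `A`, the helper names and the random sampling are assumptions made for this illustration only; they do not come from the text.

```python
import random

def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_1(x):
    return sum(abs(xi) for xi in x)

def norm_inf(x):
    return max(abs(xi) for xi in x)

def induced_norm_1_inf(A):
    # For p = 1 and q = infinity: ||A||_{1,inf} = max_{i,j} |A_ij|.
    return max(abs(a) for row in A for a in row)

# Hypothetical symmetric matrix chosen for the illustration.
A = [[2.0, -1.0, 0.5],
     [-1.0, 3.0, -4.0],
     [0.5, -4.0, 1.0]]
L = induced_norm_1_inf(A)

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(3)]
    y = [rng.uniform(-5, 5) for _ in range(3)]
    d = [xi - yi for xi, yi in zip(x, y)]
    # grad f(x) - grad f(y) = A (x - y); the vector b cancels.
    worst_ratio = max(worst_ratio, norm_inf(mat_vec(A, d)) / norm_1(d))

# The bound is attained at the unit vector selecting the column with the largest entry.
e3 = [0.0, 0.0, 1.0]
attained = norm_inf(mat_vec(A, e3))

print("L =", L, "worst sampled ratio =", worst_ratio, "attained =", attained)
```

Random sampling only probes the bound from below, so the attained value at `e3` is what shows that no smaller constant works for this particular matrix.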

Example 2 ($0$-smoothness of linear functions). Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=\langle\mathbf{b,x}\rangle+c$, where $\mathbf{b}\in\mathbb{E}^*,\,c\in\mathbb{R}$. For any $\mathbf{x,y}\in\mathbb{E}$, $\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*=\Vert\mathbf{b-b}\Vert_*=0\le0\Vert\mathbf{x-y}\Vert.$ Hence every linear function is $0$-smooth, and $0$ is clearly its smallest smoothness parameter. Note that this conclusion holds for any norm.

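Examples 3 and 4 below use the firm nonexpansiveness and the nonexpansiveness of the orthogonal projection operator. We only state these properties here; a more general result is proved in the next chapter.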

Theorem 1. Let $\mathbb{E}$ be a Euclidean space and let $C\subset\mathbb{E}$ be a nonempty closed convex set. Then
(i) (firm nonexpansiveness) for any $\mathbf{v,w}\in\mathbb{E}$, $\langle P_C(\mathbf{v})-P_C(\mathbf{w}),\mathbf{v-w}\rangle\ge\Vert P_C(\mathbf{v})-P_C(\mathbf{w})\Vert^2.$
(ii) (nonexpansiveness) for any $\mathbf{v,w}\in\mathbb{E}$, $\Vert P_C(\mathbf{v})-P_C(\mathbf{w})\Vert\le\Vert\mathbf{v-w}\Vert.$

Example 3 ($1$-smoothness of $\frac{1}{2}d_C^2$). Let $\mathbb{E}$ be a Euclidean space and let $C\subset\mathbb{E}$ be a nonempty closed convex set. Consider the function $\varphi_C(\mathbf{x})=\frac{1}{2}d_C^2(\mathbf{x})$. By Example 9 of Chapter 3, $\varphi_C$ is differentiable over the whole space and $\nabla\varphi_C(\mathbf{x})=\mathbf{x}-P_C(\mathbf{x})$. We show that $\varphi_C$ is $1$-smooth. For any $\mathbf{x,y}\in\mathbb{E}$, $\begin{aligned}\Vert\nabla\varphi_C(\mathbf{x})-\nabla\varphi_C(\mathbf{y})\Vert^2&=\Vert\mathbf{x-y}-P_C(\mathbf{x})+P_C(\mathbf{y})\Vert^2\\&=\Vert\mathbf{x-y}\Vert^2-2\langle P_C(\mathbf{x})-P_C(\mathbf{y}),\mathbf{x-y}\rangle+\Vert P_C(\mathbf{x})-P_C(\mathbf{y})\Vert^2\\&\le\Vert\mathbf{x-y}\Vert^2-2\Vert P_C(\mathbf{x})-P_C(\mathbf{y})\Vert^2+\Vert P_C(\mathbf{x})-P_C(\mathbf{y})\Vert^2\quad(\text{firm nonexpansiveness})\\&=\Vert\mathbf{x-y}\Vert^2-\Vert P_C(\mathbf{x})-P_C(\mathbf{y})\Vert^2\\&\le\Vert\mathbf{x-y}\Vert^2.\end{aligned}$ Taking square roots gives $\Vert\nabla\varphi_C(\mathbf{x})-\nabla\varphi_C(\mathbf{y})\Vert\le\Vert\mathbf{x-y}\Vert$.

Example 4 ($1$-smoothness of $\frac{1}{2}\Vert\cdot\Vert^2-\frac{1}{2}d_C^2$). Let $\mathbb{E}$ be a Euclidean space and let $C\subset\mathbb{E}$ be a nonempty closed convex set. Consider the function $\psi_C(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}\Vert^2-\frac{1}{2}d_C^2(\mathbf{x})$. By Example 5 of Chapter 2, $\psi_C$ is convex³. By the previous example, $\frac{1}{2}d_C^2(\mathbf{x})$ is differentiable with gradient $\mathbf{x}-P_C(\mathbf{x})$. Therefore $\nabla\psi_C(\mathbf{x})=\mathbf{x}-(\mathbf{x}-P_C(\mathbf{x}))=P_C(\mathbf{x}).$ By the nonexpansiveness of the projection operator, for any $\mathbf{x,y}\in\mathbb{E}$, $\Vert\nabla\psi_C(\mathbf{x})-\nabla\psi_C(\mathbf{y})\Vert=\Vert P_C(\mathbf{x})-P_C(\mathbf{y})\Vert\le\Vert\mathbf{x-y}\Vert.$ Hence $\psi_C$ is $1$-smooth.
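
The two gradient formulas in Examples 3 and 4 can be checked numerically on a simple set. The sketch below assumes $C=[-1,1]^4$ (a box, whose projection is a componentwise clip) with the Euclidean norm; the set, the sample sizes and the function names are illustrative assumptions, not part of the original text.

```python
import math
import random

def project_box(x, lo=-1.0, hi=1.0):
    """Orthogonal projection onto the box C = [lo, hi]^n, a closed convex set."""
    return [min(max(xi, lo), hi) for xi in x]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def grad_phi(x):
    # Gradient of phi_C = 0.5 * d_C^2, namely x - P_C(x).
    return sub(x, project_box(x))

def grad_psi(x):
    # Gradient of psi_C = 0.5 * ||x||^2 - 0.5 * d_C^2, namely P_C(x).
    return project_box(x)

rng = random.Random(1)
max_ratio_phi = 0.0
max_ratio_psi = 0.0
for _ in range(2000):
    x = [rng.uniform(-3, 3) for _ in range(4)]
    y = [rng.uniform(-3, 3) for _ in range(4)]
    dist = norm2(sub(x, y))
    max_ratio_phi = max(max_ratio_phi, norm2(sub(grad_phi(x), grad_phi(y))) / dist)
    max_ratio_psi = max(max_ratio_psi, norm2(sub(grad_psi(x), grad_psi(y))) / dist)

print("largest sampled Lipschitz ratios:", max_ratio_phi, max_ratio_psi)
```

Both sampled ratios should stay at or below $1$, in line with the $1$-smoothness of $\varphi_C$ and $\psi_C$.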

1.1 The Descent Lemma

The descent lemma below states that an $L$-smooth function is bounded above by a quadratic function.

Lemma 1 (descent lemma). Let $f:\mathbb{E}\to(-\infty,\infty]$ be an $L$-smooth function $(L\ge0)$ over a given convex set $D$. Then for any $\mathbf{x,y}\in D$, $f(\mathbf{y})\le f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\frac{L}{2}\Vert\mathbf{x-y}\Vert^2.$ ⁴
Proof. By the fundamental theorem of calculus, $f(\mathbf{y})-f(\mathbf{x})=\int_0^1\langle\nabla f(\mathbf{x}+t(\mathbf{y-x})),\mathbf{y-x}\rangle\,\mathrm{d}t.$ Therefore, $f(\mathbf{y})-f(\mathbf{x})=\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\int_0^1\langle\nabla f(\mathbf{x}+t(\mathbf{y-x}))-\nabla f(\mathbf{x}),\mathbf{y-x}\rangle\,\mathrm{d}t.$ Taking absolute values gives $\begin{aligned}|f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle|&=\left|\int_0^1\langle\nabla f(\mathbf{x}+t(\mathbf{y-x}))-\nabla f(\mathbf{x}),\mathbf{y-x}\rangle\,\mathrm{d}t\right|\\&\le\int_0^1|\langle\nabla f(\mathbf{x}+t(\mathbf{y-x}))-\nabla f(\mathbf{x}),\mathbf{y-x}\rangle|\,\mathrm{d}t\\&\le\int_0^1\Vert\nabla f(\mathbf{x}+t(\mathbf{y-x}))-\nabla f(\mathbf{x})\Vert_*\cdot\Vert\mathbf{y-x}\Vert\,\mathrm{d}t\\&\le\int_0^1tL\Vert\mathbf{y-x}\Vert^2\,\mathrm{d}t\\&=\frac{L}{2}\Vert\mathbf{y-x}\Vert^2,\end{aligned}$ which in particular yields the claimed upper bound.
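
A well-known consequence of the descent lemma in the Euclidean setting is that a gradient step of length $1/L$ decreases $f$ by at least $\frac{1}{2L}\Vert\nabla f(\mathbf{x})\Vert^2$. The sketch below checks both the quadratic upper bound and this decrease on a hypothetical $2\times2$ convex quadratic, for which $L=\lambda_{\max}(\mathbf{A})$ under the $\ell_2$-norm; the data `A`, `b` and all helper names are assumptions made for illustration.

```python
import math
import random

# Hypothetical positive definite matrix and linear term.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -2.0]

def grad(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1]]

def f(x):
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) + b[0] * x[0] + b[1] * x[1]

# Largest eigenvalue of a symmetric 2x2 matrix in closed form.
a, off, c = A[0][0], A[0][1], A[1][1]
L = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + off ** 2)

def quadratic_upper_bound(x, y):
    """Right-hand side of the descent lemma at the pair (x, y)."""
    g = grad(x)
    d = [y[0] - x[0], y[1] - x[1]]
    return f(x) + g[0] * d[0] + g[1] * d[1] + 0.5 * L * (d[0] ** 2 + d[1] ** 2)

rng = random.Random(2)
min_bound_slack = float("inf")
min_decrease_slack = float("inf")
for _ in range(2000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    y = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    min_bound_slack = min(min_bound_slack, quadratic_upper_bound(x, y) - f(y))
    g = grad(x)
    x_next = [x[0] - g[0] / L, x[1] - g[1] / L]  # gradient step with step size 1/L
    guaranteed = (g[0] ** 2 + g[1] ** 2) / (2 * L)
    min_decrease_slack = min(min_decrease_slack, f(x) - f(x_next) - guaranteed)

print("L =", L)
print("smallest slack in the upper bound:", min_bound_slack)
print("smallest slack in the guaranteed decrease:", min_decrease_slack)
```

Both slacks should be nonnegative up to rounding error.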

1.2 First-Order Characterizations of $L$-Smooth Functions

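When $f$ is convex⁵, Theorem 2 below gives several equivalent first-order characterizations of $L$-smoothness over the whole space⁶. Notably, in this setting the descent lemma of Section 1.1 is also a sufficient condition for $f$ to be $L$-smooth.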

Theorem 2 (first-order characterizations of $L$-smoothness). Let $f:\mathbb{E}\to\mathbb{R}$ be a differentiable convex function and let $L > 0$. Then the following five claims are equivalent:
(i) $f$ is $L$-smooth;
(ii) $f(\mathbf{y})\le f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\frac{L}{2}\Vert\mathbf{x-y}\Vert^2,\,\forall\mathbf{x,y}\in\mathbb{E}$; (this is the descent lemma)
(iii) $f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\frac{1}{2L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*^2,\,\forall\mathbf{x,y}\in\mathbb{E}$;
(iv) $\langle\nabla f(\mathbf{x})-\nabla f(\mathbf{y}),\mathbf{x-y}\rangle\ge\frac{1}{L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*^2,\,\forall\mathbf{x,y}\in\mathbb{E}$;
(v) $f(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\ge\lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})-\frac{L}{2}\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2,\,\forall\mathbf{x,y}\in\mathbb{E},\,\lambda\in[0,1]$.

Proof. (i) $\Rightarrow$ (ii): This is the descent lemma.
(ii) $\Rightarrow$ (iii): Assume that (ii) holds. If $\nabla f(\mathbf{x})=\nabla f(\mathbf{y})$, then (iii) is just the gradient inequality for the convex function $f$, so we assume below that $\nabla f(\mathbf{x})\ne\nabla f(\mathbf{y})$. Fix $\mathbf{x}\in\mathbb{E}$ and consider $g_{\mathbf{x}}(\mathbf{y})=f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle,\quad\mathbf{y}\in\mathbb{E}.$ ⁷ One can verify that $g_{\mathbf{x}}$ also satisfies (ii). Indeed, for any $\mathbf{y,z}\in\mathbb{E}$, $\begin{aligned}g_{\mathbf{x}}(\mathbf{z})&=f(\mathbf{z})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{z-x}\rangle\\&\le f(\mathbf{y})+\langle\nabla f(\mathbf{y}),\mathbf{z-y}\rangle+\frac{L}{2}\Vert\mathbf{z-y}\Vert^2-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{z-x}\rangle\\&=f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\langle\nabla f(\mathbf{y})-\nabla f(\mathbf{x}),\mathbf{z-y}\rangle+\frac{L}{2}\Vert\mathbf{z-y}\Vert^2\\&=g_{\mathbf{x}}(\mathbf{y})+\langle\nabla g_{\mathbf{x}}(\mathbf{y}),\mathbf{z-y}\rangle+\frac{L}{2}\Vert\mathbf{z-y}\Vert^2,\end{aligned}$ where the last equality uses $\nabla g_{\mathbf{x}}(\mathbf{y})=\nabla f(\mathbf{y})-\nabla f(\mathbf{x})$. Since $\nabla g_{\mathbf{x}}(\mathbf{x})=\mathbf{0}$ and $g_{\mathbf{x}}$ is convex, $\mathbf{x}$ is a minimizer of $g_{\mathbf{x}}$: $g_{\mathbf{x}}(\mathbf{x})\le g_{\mathbf{x}}(\mathbf{z}),\quad\forall\mathbf{z}\in\mathbb{E}.$ For $\mathbf{y}\in\mathbb{E}$, let $\mathbf{v}\in\mathbb{E}$ be a vector satisfying $\Vert\mathbf{v}\Vert=1,\,\langle\nabla g_{\mathbf{x}}(\mathbf{y}),\mathbf{v}\rangle=\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert_*$.
Setting $\mathbf{z}=\mathbf{y}-\frac{\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert_*}{L}\mathbf{v}$ gives $0=g_{\mathbf{x}}(\mathbf{x})\le g_{\mathbf{x}}\left(\mathbf{y}-\frac{\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert_*}{L}\mathbf{v}\right).$ Applying (ii) for $g_{\mathbf{x}}$, we obtain $\begin{aligned}0&=g_{\mathbf{x}}(\mathbf{x})\\&\le g_{\mathbf{x}}(\mathbf{y})-\frac{\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert_*}{L}\langle\nabla g_{\mathbf{x}}(\mathbf{y}),\mathbf{v}\rangle+\frac{1}{2L}\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert_*^2\cdot\Vert\mathbf{v}\Vert^2\\&=g_{\mathbf{x}}(\mathbf{y})-\frac{1}{2L}\Vert\nabla g_{\mathbf{x}}(\mathbf{y})\Vert^2_*\\&=f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle-\frac{1}{2L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert^2_*,\end{aligned}$ which proves (iii).
(iii) $\Rightarrow$ (iv): Assume that (iii) holds. Applying it to the pair $(\mathbf{x,y})$ and then to $(\mathbf{y,x})$ gives $\begin{aligned}f(\mathbf{y})&\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle+\frac{1}{2L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert^2_*,\\f(\mathbf{x})&\ge f(\mathbf{y})+\langle\nabla f(\mathbf{y}),\mathbf{x-y}\rangle+\frac{1}{2L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert^2_*.\end{aligned}$ Adding the two inequalities yields (iv).
(iv) $\Rightarrow$ (i): Assume that (iv) holds. We may assume that $\nabla f(\mathbf{x})\ne\nabla f(\mathbf{y})$, since otherwise (i) is trivial. By the generalized Cauchy-Schwarz inequality, for any $\mathbf{x,y}\in\mathbb{E}$, $\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*\cdot\Vert\mathbf{x-y}\Vert\ge\langle\nabla f(\mathbf{x})-\nabla f(\mathbf{y}),\mathbf{x-y}\rangle\ge\frac{1}{L}\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert^2_*.$ Dividing both sides by $\Vert\nabla f(\mathbf{x})-\nabla f(\mathbf{y})\Vert_*$ and multiplying by $L$ gives (i).
This establishes the equivalence of (i), (ii), (iii) and (iv). To show that (v) is equivalent to them, we prove (ii) $\Leftrightarrow$ (v).
(ii) $\Rightarrow$ (v): Let $\mathbf{x,y}\in\mathbb{E},\,\lambda\in[0,1]$ and denote $\mathbf{x}_{\lambda}=\lambda\mathbf{x}+(1-\lambda)\mathbf{y}$. By (ii), $\begin{aligned}f(\mathbf{x})&\le f(\mathbf{x}_{\lambda})+\langle\nabla f(\mathbf{x}_{\lambda}),\mathbf{x-x}_{\lambda}\rangle+\frac{L}{2}\Vert\mathbf{x-x}_{\lambda}\Vert^2,\\f(\mathbf{y})&\le f(\mathbf{x}_{\lambda})+\langle\nabla f(\mathbf{x}_{\lambda}),\mathbf{y-x}_{\lambda}\rangle+\frac{L}{2}\Vert\mathbf{y-x}_{\lambda}\Vert^2,\end{aligned}$ which, since $\mathbf{x-x}_{\lambda}=(1-\lambda)(\mathbf{x-y})$ and $\mathbf{y-x}_{\lambda}=\lambda(\mathbf{y-x})$, is the same as $\begin{aligned}f(\mathbf{x})&\le f(\mathbf{x}_{\lambda})+(1-\lambda)\langle\nabla f(\mathbf{x}_{\lambda}),\mathbf{x-y}\rangle+\frac{L(1-\lambda)^2}{2}\Vert\mathbf{x-y}\Vert^2,\\f(\mathbf{y})&\le f(\mathbf{x}_{\lambda})+\lambda\langle\nabla f(\mathbf{x}_{\lambda}),\mathbf{y-x}\rangle+\frac{L\lambda^2}{2}\Vert\mathbf{x-y}\Vert^2.\end{aligned}$ Multiplying the first inequality by $\lambda$ and the second by $1-\lambda$ and adding them yields (v).
(v) $\Rightarrow$ (ii): Rearranging (v) for $\lambda\in[0,1)$ gives $f(\mathbf{y})\le f(\mathbf{x})+\frac{f(\mathbf{x}+(1-\lambda)(\mathbf{y-x}))-f(\mathbf{x})}{1-\lambda}+\frac{L}{2}\lambda\Vert\mathbf{x-y}\Vert^2.$ Letting $\lambda\to1^{-}$, we obtain $f(\mathbf{y})\le f(\mathbf{x})+f'(\mathbf{x;y-x})+\frac{L}{2}\Vert\mathbf{x-y}\Vert^2.$ By Theorem 11 of Chapter 3, $f'(\mathbf{x;y-x})=\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle$, which gives (ii).

The next example uses the following mean value theorem for multivariate functions.

Theorem 3 (mean value theorem for multivariate functions). Let $f:U\to\mathbb{R}$ be a twice continuously differentiable function over an open set $U\subset\mathbb{R}^n$⁸. Let $\mathbf{x}\in U,\,r>0$ satisfy $B(\mathbf{x},r)\subset U$. Then for any $\mathbf{y}\in B(\mathbf{x},r)$ there exists $\bm{\xi}\in[\mathbf{x,y}]$ ⁹ such that $f(\mathbf{y})=f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y-x})+\frac{1}{2}(\mathbf{y-x})^T\nabla^2f(\bm{\xi})(\mathbf{y-x}).$

Example 5 ($(p-1)$-smoothness of half the squared $\ell_p$-norm). Consider the convex function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}\Vert_p^2=\frac{1}{2}\left(\sum_{i=1}^n|x_i|^p\right)^{\frac{2}{p}},$ where $p\in[2,\infty)$. We show that $f$ is $(p-1)$-smooth with respect to the $\ell_p$-norm. For $p = 2$ the claim holds (see Example 1), so we assume below that $p > 2$. Since $f$ is convex, we would like to use Theorem 2. To this end, we first compute the first and second partial derivatives of $f$: $\frac{\partial f}{\partial x_i}(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{sgn}(x_i)\frac{|x_i|^{p-1}}{\Vert\mathbf{x}\Vert_p^{p-2}}, & \mathbf{x\ne0},\\0, & \mathbf{x=0}.\end{array}\right.$ The partial derivatives of $f$ are continuous over $\mathbb{R}^n$, so $f$ is differentiable over $\mathbb{R}^n$¹⁰. At points $\mathbf{x\ne0}$, $f$ has second partial derivatives: $\frac{\partial^2f}{\partial x_i\partial x_j}(\mathbf{x})=\left\{\begin{array}{ll}(2-p)\,\mathrm{sgn}(x_i)\mathrm{sgn}(x_j)\frac{|x_i|^{p-1}|x_j|^{p-1}}{\Vert\mathbf{x}\Vert_p^{2p-2}}, & i\ne j,\\(p-1)\frac{|x_i|^{p-2}}{\Vert\mathbf{x}\Vert_p^{p-2}}+(2-p)\frac{|x_i|^{2p-2}}{\Vert\mathbf{x}\Vert_p^{2p-2}}, & i=j.\end{array}\right.$ The second partial derivatives of $f$ are continuous at every $\mathbf{x\ne0}$. We now show that $f$ satisfies (ii) of Theorem 2 with $L = p - 1$. Let $\mathbf{x,y}\in\mathbb{R}^n$ be such that $\mathbf{0}\notin[\mathbf{x,y}]$. By the mean value theorem, taking $U$ to be an open set that contains $[\mathbf{x,y}]$ but not $\mathbf{0}$, there exists $\bm{\xi}\in[\mathbf{x,y}]$ such that $f(\mathbf{y})=f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y-x})+\frac{1}{2}(\mathbf{y-x})^T\nabla^2f(\bm{\xi})(\mathbf{y-x}).$ It suffices to show that $\mathbf{d}^T\nabla^2f(\bm{\xi})\mathbf{d}\le(p-1)\Vert\mathbf{d}\Vert_p^2,\,\forall\mathbf{d}\in\mathbb{R}^n$. Since $\nabla^2f(t\bm{\xi})=\nabla^2f(\bm{\xi}),\,\forall t\in\mathbb{R}\setminus\{0\}$, we may assume that $\Vert\bm{\xi}\Vert_p=1$. For any $\mathbf{d}\in\mathbb{R}^n$, $\begin{aligned}\mathbf{d}^T\nabla^2f(\bm{\xi})\mathbf{d}&=(2-p)\Vert\bm{\xi}\Vert_p^{2-2p}\left(\sum_{i=1}^n|\xi_i|^{p-1}\mathrm{sgn}(\xi_i)d_i\right)^2+(p-1)\Vert\bm{\xi}\Vert_p^{2-p}\sum_{i=1}^n|\xi_i|^{p-2}d_i^2\\&\le(p-1)\Vert\bm{\xi}\Vert_p^{2-p}\sum_{i=1}^n|\xi_i|^{p-2}d_i^2,\end{aligned}$ where the inequality holds because $p > 2$ makes the first term nonpositive.
By Hölder's inequality with the conjugate exponents $\frac{p}{p-2}$ and $\frac{p}{2}$, $\sum_{i=1}^n|\xi_i|^{p-2}d_i^2\le\left(\sum_{i=1}^n\left(|\xi_i|^{p-2}\right)^{\frac{p}{p-2}}\right)^{\frac{p-2}{p}}\left(\sum_{i=1}^n\left(d_i^2\right)^{\frac{p}{2}}\right)^{\frac{2}{p}}=\left(\sum_{i=1}^n|\xi_i|^p\right)^{\frac{p-2}{p}}\left(\sum_{i=1}^n|d_i|^p\right)^{\frac{2}{p}}=\Vert\mathbf{d}\Vert_p^2,$ where the last equality uses $\Vert\bm{\xi}\Vert_p=1$. Hence, for any $\mathbf{d}\in\mathbb{R}^n$, $\mathbf{d}^T\nabla^2f(\bm{\xi})\mathbf{d}\le(p-1)\Vert\mathbf{d}\Vert_p^2.$ If $\mathbf{0}\in[\mathbf{x,y}]$, take a sequence $\{\mathbf{y}_k\}_{k\ge0}$ converging to $\mathbf{y}$ such that $\mathbf{0}\notin[\mathbf{x,y}_k]$. By what has already been proved, for any $k\ge0$, $f(\mathbf{y}_k)\le f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y}_k-\mathbf{x})+\frac{p-1}{2}\Vert\mathbf{x-y}_k\Vert_p^2.$ Letting $k\to\infty$ on both sides and using the continuity of $f$ completes the proof.
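
The conclusion of Example 5 can be probed numerically through inequality (ii) of Theorem 2. The sketch below assumes $p=4$ (so $L=3$) and $n=3$, and evaluates the gradient with the first-derivative formula from the example; the choice of $p$, the dimension and the sampling range are illustrative assumptions.

```python
import math
import random

P = 4.0  # illustrative choice of p > 2, so the claimed parameter is L = p - 1 = 3

def norm_p(x, p=P):
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def f(x):
    return 0.5 * norm_p(x) ** 2

def grad_f(x):
    """Gradient of 0.5 * ||x||_p^2 using the partial-derivative formula of Example 5."""
    n = norm_p(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.copysign(abs(xi) ** (P - 1), xi) / n ** (P - 2) for xi in x]

def descent_gap(x, y):
    # f(x) + <grad f(x), y - x> + (p - 1)/2 * ||y - x||_p^2 - f(y), expected to be >= 0
    g = grad_f(x)
    d = [yi - xi for xi, yi in zip(x, y)]
    inner = sum(gi * di for gi, di in zip(g, d))
    return f(x) + inner + 0.5 * (P - 1) * norm_p(d) ** 2 - f(y)

rng = random.Random(3)
min_gap = float("inf")
for _ in range(5000):
    x = [rng.uniform(-2, 2) for _ in range(3)]
    y = [rng.uniform(-2, 2) for _ in range(3)]
    min_gap = min(min_gap, descent_gap(x, y))

print("smallest sampled gap in the descent inequality:", min_gap)
```

A nonnegative smallest gap is consistent with $(p-1)$-smoothness, although sampling of course proves nothing by itself.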

1.3 Second-Order Characterization of $L$-Smooth Functions

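We now take $\mathbb{E}=\mathbb{R}^n$ endowed with the $\ell_p$-norm ($p\ge1$). For twice continuously differentiable functions over $\mathbb{R}^n$, $L$-smoothness can be characterized through the norm of the Hessian matrix.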

Theorem 4 ($L$-smoothness and boundedness of the Hessian norm). Let $f:\mathbb{R}^n\to\mathbb{R}$ be twice continuously differentiable over $\mathbb{R}^n$. For a given $L\ge0$, the following two claims are equivalent:
(i) $f$ is $L$-smooth with respect to the $\ell_p$-norm ($p\in[1,\infty]$);
(ii) $\Vert\nabla^2f(\mathbf{x})\Vert_{p,q}\le L,\,\forall\mathbf{x}\in\mathbb{R}^n$, where $q\in[1,\infty]$ satisfies $\frac{1}{p}+\frac{1}{q}=1$.

Proof. (ii) $\Rightarrow$ (i): Assume that $\Vert\nabla^2f(\mathbf{x})\Vert_{p,q}\le L,\,\forall\mathbf{x}\in\mathbb{R}^n$. By the fundamental theorem of calculus, for any $\mathbf{x,y}\in\mathbb{R}^n$, $\begin{aligned}\nabla f(\mathbf{y})&=\nabla f(\mathbf{x})+\int_0^1\nabla^2f(\mathbf{x}+t(\mathbf{y-x}))(\mathbf{y-x})\,\mathrm{d}t\\&=\nabla f(\mathbf{x})+\left(\int_0^1\nabla^2f(\mathbf{x}+t(\mathbf{y-x}))\,\mathrm{d}t\right)\cdot(\mathbf{y-x}).\end{aligned}$ Therefore $\begin{aligned}\Vert\nabla f(\mathbf{y})-\nabla f(\mathbf{x})\Vert_q&=\left\Vert\left(\int_0^1\nabla^2f(\mathbf{x}+t(\mathbf{y-x}))\,\mathrm{d}t\right)\cdot(\mathbf{y-x})\right\Vert_q\\ &\le\left\Vert\int_0^1\nabla^2f(\mathbf{x}+t(\mathbf{y-x}))\,\mathrm{d}t\right\Vert_{p,q}\cdot\Vert\mathbf{y-x}\Vert_p\\ &\le\left(\int_0^1\left\Vert\nabla^2f(\mathbf{x}+t(\mathbf{y-x}))\right\Vert_{p,q}\,\mathrm{d}t\right)\cdot\Vert\mathbf{y-x}\Vert_p\\&\le L\Vert\mathbf{y-x}\Vert_p,\end{aligned}$ which proves (i).
(i) $\Rightarrow$ (ii): Assume that $f$ is $L$-smooth with respect to the $\ell_p$-norm. Again by the fundamental theorem of calculus, for any $\mathbf{x},\mathbf{d}\in\mathbb{R}^n$ and $\alpha>0$, $\nabla f(\mathbf{x}+\alpha\mathbf{d})-\nabla f(\mathbf{x})=\int_0^{\alpha}\nabla^2f(\mathbf{x}+t\mathbf{d})\mathbf{d}\,\mathrm{d}t.$ Hence, $\left\Vert\left(\int_0^{\alpha}\nabla^2f(\mathbf{x}+t\mathbf{d})\,\mathrm{d}t\right)\mathbf{d}\right\Vert_q\le\alpha L\Vert\mathbf{d}\Vert_p.$ Dividing by $\alpha$ and letting $\alpha\to0^+$ gives $\Vert\nabla^2f(\mathbf{x})\mathbf{d}\Vert_q\le L\Vert\mathbf{d}\Vert_p,\quad\forall\mathbf{d}\in\mathbb{R}^n,$ which shows that $\Vert\nabla^2f(\mathbf{x})\Vert_{p,q}\le L,\,\forall\mathbf{x}\in\mathbb{R}^n$.

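A direct consequence of Theorem 4 is that, for twice continuously differentiable convex functions, $L$-smoothness with respect to the $\ell_2$-norm is equivalent to the largest eigenvalue of the Hessian being at most $L$.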

Corollary 1. Let $f:\mathbb{R}^n\to\mathbb{R}$ be a twice continuously differentiable convex function over $\mathbb{R}^n$. Then $f$ is $L$-smooth with respect to the $\ell_2$-norm if and only if $\lambda_{\max}\left(\nabla^2f(\mathbf{x})\right)\le L,\,\forall\mathbf{x}\in\mathbb{R}^n$.

Proof. Since $f$ is convex, $\nabla^2f(\mathbf{x})\succeq\mathbf{0},\,\forall\mathbf{x}\in\mathbb{R}^n$. In this case $\Vert\nabla^2f(\mathbf{x})\Vert_{2,2}=\sqrt{\lambda_{\max}\left((\nabla^2f(\mathbf{x}))^2\right)}=\lambda_{\max}\left(\nabla^2f(\mathbf{x})\right).$ Combining this with Theorem 4 gives the result.

Example 6 ($1$-smoothness of $\sqrt{1+\Vert\cdot\Vert_2^2}$ with respect to the $\ell_2$-norm). Let $f:\mathbb{R}^n\to\mathbb{R}$ be the convex function $f(\mathbf{x})=\sqrt{1+\Vert\mathbf{x}\Vert_2^2}.$ For any $\mathbf{x}\in\mathbb{R}^n$, $\nabla f(\mathbf{x})=\frac{\mathbf{x}}{\sqrt{\Vert\mathbf{x}\Vert_2^2+1}},\,\nabla^2f(\mathbf{x})=\frac{1}{\sqrt{\Vert\mathbf{x}\Vert_2^2+1}}\mathbf{I}-\frac{\mathbf{xx}^T}{\left(\Vert\mathbf{x}\Vert_2^2+1\right)^{3/2}}\preceq\mathbf{I}.$ Hence $\lambda_{\max}\left(\nabla^2f(\mathbf{x})\right)\le1,\,\forall\mathbf{x}\in\mathbb{R}^n$, and by Corollary 1, $f$ is $1$-smooth with respect to the $\ell_2$-norm.

Example 7 ($1$-smoothness of the log-sum-exp function with respect to the $\ell_2$- and $\ell_{\infty}$-norms). Consider the log-sum-exp function $f:\mathbb{R}^n\to\mathbb{R}$: $f(\mathbf{x})=\log(e^{x_1}+e^{x_2}+\cdots+e^{x_n}).$ We first consider the $\ell_2$-norm. The first partial derivatives of $f$ are $\frac{\partial f}{\partial x_i}(\mathbf{x})=\frac{e^{x_i}}{\sum_{k=1}^ne^{x_k}},\quad i=1,2,\ldots,n,$ and the second partial derivatives are $\frac{\partial^2f}{\partial x_i\partial x_j}(\mathbf{x})=\left\{\begin{array}{ll}-\frac{e^{x_i}e^{x_j}}{\left(\sum_{k=1}^ne^{x_k}\right)^2}, & i\ne j,\\-\frac{e^{2x_i}}{\left(\sum_{k=1}^ne^{x_k}\right)^2}+\frac{e^{x_i}}{\sum_{k=1}^ne^{x_k}}, & i=j.\end{array}\right.$ The Hessian can therefore be written as $\nabla^2f(\mathbf{x})=\mathrm{diag}(\mathbf{w})-\mathbf{ww}^T\succeq\mathbf{0},$ where $w_i=\frac{e^{x_i}}{\sum_{k=1}^ne^{x_k}}$. (The Hessian is positive semidefinite but not positive definite: since $\sum_{k}w_k=1$, it maps the all-ones vector to $\mathbf{0}$.) For any $\mathbf{x}\in\mathbb{R}^n$, $\nabla^2f(\mathbf{x})=\mathrm{diag}(\mathbf{w})-\mathbf{ww}^T\preceq\mathrm{diag}(\mathbf{w})\preceq\mathbf{I},$ so $\lambda_{\max}\left(\nabla^2f(\mathbf{x})\right)\le1,\,\forall\mathbf{x}\in\mathbb{R}^n$. Since the Hessian of $f$ is positive semidefinite, $f$ is convex, and Corollary 1 shows that $f$ is $1$-smooth with respect to the $\ell_2$-norm.

We now turn to the $\ell_{\infty}$-norm. We first show that for any $\mathbf{d}\in\mathbb{R}^n$, $\mathbf{d}^T\nabla^2f(\mathbf{x})\mathbf{d}\le\Vert\mathbf{d}\Vert_{\infty}^2.$ Indeed, $\begin{aligned}\mathbf{d}^T\nabla^2f(\mathbf{x})\mathbf{d}&=\mathbf{d}^T\left(\mathrm{diag}(\mathbf{w})-\mathbf{ww}^T\right)\mathbf{d}=\mathbf{d}^T\mathrm{diag}(\mathbf{w})\mathbf{d}-\left(\mathbf{w}^T\mathbf{d}\right)^2\\&\le\mathbf{d}^T\mathrm{diag}(\mathbf{w})\mathbf{d}=\sum_{i=1}^nw_id_i^2\\&\le\Vert\mathbf{d}\Vert_{\infty}^2\sum_{i=1}^nw_i=\Vert\mathbf{d}\Vert_{\infty}^2.\end{aligned}$ Since $f$ is twice continuously differentiable over $\mathbb{R}^n$, the mean value theorem shows that for any $\mathbf{x,y}\in\mathbb{R}^n$ there exists $\bm{\xi}\in[\mathbf{x,y}]$ such that $f(\mathbf{y})=f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y-x})+\frac{1}{2}(\mathbf{y-x})^T\nabla^2f(\bm{\xi})(\mathbf{y-x}).$ Combining this with the inequality above gives $f(\mathbf{y})\le f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{y-x})+\frac{1}{2}\Vert\mathbf{y-x}\Vert_{\infty}^2,$ and since $f$ is convex, the equivalence of (i) and (ii) in Theorem 2 yields the $1$-smoothness of $f$ with respect to the $\ell_{\infty}$-norm.
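
As a sanity check on Example 7, the sketch below samples pairs of points and measures the Lipschitz ratio of the gradient (the softmax map) in both settings: $\ell_2$ against $\ell_2$, and, for the $\ell_\infty$-norm, the dual $\ell_1$-norm of the gradient difference against $\Vert\mathbf{x-y}\Vert_\infty$. It also evaluates the Hessian on the all-ones vector to illustrate that the Hessian is singular. Dimension, sampling range and helper names are assumptions for this illustration.

```python
import math
import random

def softmax(x):
    """Gradient of log-sum-exp, computed with the usual max-shift for stability."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

def norm1(x):
    return sum(abs(xi) for xi in x)

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    return max(abs(xi) for xi in x)

rng = random.Random(4)
max_ratio_l2 = 0.0    # ||grad f(x) - grad f(y)||_2 / ||x - y||_2
max_ratio_linf = 0.0  # ||grad f(x) - grad f(y)||_1 / ||x - y||_inf (dual norm of l_inf is l_1)
for _ in range(5000):
    x = [rng.uniform(-4, 4) for _ in range(5)]
    y = [rng.uniform(-4, 4) for _ in range(5)]
    dg = [a - b for a, b in zip(softmax(x), softmax(y))]
    d = [a - b for a, b in zip(x, y)]
    max_ratio_l2 = max(max_ratio_l2, norm2(dg) / norm2(d))
    max_ratio_linf = max(max_ratio_linf, norm1(dg) / norm_inf(d))

# (diag(w) - w w^T) applied to the all-ones vector gives w - w * sum(w), which vanishes.
w = softmax([0.3, -1.2, 2.0, 0.0, 1.1])
hess_times_ones = [wi - wi * sum(w) for wi in w]

print("largest sampled ratios:", max_ratio_l2, max_ratio_linf)
print("Hessian applied to the all-ones vector:", hess_times_ones)
```

Both ratios should stay at or below $1$, and the last vector should vanish up to rounding.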

1.4 Summary of Smoothness Parameter Computations

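The table below summarizes the smoothness parameters, under various norms, of the functions discussed in this section. The last function is discussed in the next chapter.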

| $f(\mathbf{x})$ | $\mathrm{dom}(f)$ | Smoothness parameter | Norm | Example |
| --- | --- | --- | --- | --- |
| $\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c\,(\mathbf{A}\in\mathbb{S}^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R})$ | $\mathbb{R}^n$ | $\Vert\mathbf{A}\Vert_{p,q}$ | $\ell_p$ | 1 |
| $\langle\mathbf{b,x}\rangle+c\,(\mathbf{b}\in\mathbb{E}^*,\,c\in\mathbb{R})$ | $\mathbb{E}$ | $0$ | any norm | 2 |
| $\frac{1}{2}\Vert\mathbf{x}\Vert_p^2,\,p\in[2,\infty)$ | $\mathbb{R}^n$ | $p - 1$ | $\ell_p$ | 5 |
| $\sqrt{1+\Vert\mathbf{x}\Vert_2^2}$ | $\mathbb{R}^n$ | $1$ | $\ell_2$ | 6 |
| $\log(\sum_{i=1}^ne^{x_i})$ | $\mathbb{R}^n$ | $1$ | $\ell_2,\ell_{\infty}$ | 7 |
| $\frac{1}{2}d_C^2(\mathbf{x})$ ($\emptyset\ne C\subset\mathbb{E}$ closed and convex) | $\mathbb{E}$ | $1$ | Euclidean | 3 |
| $\frac{1}{2}\Vert\mathbf{x}\Vert^2-\frac{1}{2}d_C^2(\mathbf{x})$ ($\emptyset\ne C\subset\mathbb{E}$ closed and convex) | $\mathbb{E}$ | $1$ | Euclidean | 4 |
| $H_{\mu}(\mathbf{x})\,(\mu>0)$ | $\mathbb{E}$ | $\frac{1}{\mu}$ | Euclidean | Chapter 6, Example 28 |

2. $\sigma$-Strong Convexity

Definition 2 (strong convexity). For a given $\sigma>0$, a function $f:\mathbb{E}\to(-\infty,\infty]$ is called $\sigma$-strongly convex if $\mathrm{dom}(f)$ is convex and for any $\mathbf{x,y}\in\mathrm{dom}(f),\,\lambda\in[0,1]$, $f(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})-\frac{\sigma}{2}\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2.$ The constant $\sigma$ is called the strong convexity parameter. We also say that $f$ is strongly convex with parameter $\sigma$.

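Note that the strong convexity parameter $\sigma$ also depends on the norm used in the definition, so we sometimes speak explicitly of the strong convexity parameter with respect to the norm $\Vert\cdot\Vert$. Since the domain of a strongly convex function is convex and the defining inequality implies Jensen's inequality, every strongly convex function is convex.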

When $\mathbb{E}$ is a Euclidean space, strong convexity is equivalent to the following simple property.

Theorem 5. Let $\mathbb{E}$ be a Euclidean space¹¹. Then $f:\mathbb{E}\to(-\infty,\infty]$ is $\sigma$-strongly convex $(\sigma>0)$ if and only if $f(\cdot)-\frac{\sigma}{2}\Vert\cdot\Vert^2$ is convex.

Proof. The function $g(\mathbf{x})\equiv f(\mathbf{x})-\frac{\sigma}{2}\Vert\mathbf{x}\Vert^2$ is convex if and only if its domain $\mathrm{dom}(g)=\mathrm{dom}(f)$ is convex and for any $\mathbf{x,y}\in\mathrm{dom}(f),\,\lambda\in[0,1]$, $g(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda g(\mathbf{x})+(1-\lambda)g(\mathbf{y}).$ This is equivalent to $f(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})+\frac{\sigma}{2}[\Vert\lambda\mathbf{x}+(1-\lambda)\mathbf{y}\Vert^2-\lambda\Vert\mathbf{x}\Vert^2-(1-\lambda)\Vert\mathbf{y}\Vert^2].$ Since $\mathbb{E}$ is a Euclidean space, $\Vert\lambda\mathbf{x}+(1-\lambda)\mathbf{y}\Vert^2-\lambda\Vert\mathbf{x}\Vert^2-(1-\lambda)\Vert\mathbf{y}\Vert^2=-\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2,$ and substituting this identity into the inequality above gives exactly the definition of $\sigma$-strong convexity.

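In addition, $\sigma$-strong convexity has a monotonicity property analogous to that of $L$-smoothness: if $f$ is $\sigma_1$-strongly convex ($\sigma_1>0$), then it is $\sigma_2$-strongly convex for every $\sigma_2\in(0,\sigma_1)$. Correspondingly, determining the largest strong convexity parameter of a given function is nontrivial, and also interesting.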

Example 8 (strong convexity of quadratic functions). Let $\mathbb{E}=\mathbb{R}^n$ be endowed with the $\ell_2$-norm, and consider the quadratic function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c,$ where $\mathbf{A}\in\mathbb{S}^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$. By Theorem 5, $f$ is $\sigma$-strongly convex if and only if the function $\frac{1}{2}\mathbf{x}^T(\mathbf{A}-\sigma\mathbf{I})\mathbf{x}+\mathbf{b}^T\mathbf{x}+c$ is convex, which is equivalent to $\mathbf{A}-\sigma\mathbf{I}\succeq\mathbf{0}$, that is, $\lambda_{\min}(\mathbf{A})\ge\sigma$. Hence $f$ is strongly convex if and only if $\mathbf{A}$ is positive definite, in which case $\lambda_{\min}(\mathbf{A})$ is the largest strong convexity parameter of $f$.
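
To make Example 8 concrete, the sketch below takes a hypothetical positive definite $2\times2$ matrix, computes $\lambda_{\min}(\mathbf{A})$ in closed form, checks the defining inequality of Definition 2 with $\sigma=\lambda_{\min}(\mathbf{A})$ on random samples, and then shows that a slightly larger parameter is violated along the corresponding eigenvector. All data and names are illustrative assumptions.

```python
import math
import random

# Hypothetical positive definite matrix, linear term and constant.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -2.0]
c0 = 0.5

def f(x):
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) + b[0] * x[0] + b[1] * x[1] + c0

# Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
a, off, c = A[0][0], A[0][1], A[1][1]
sigma = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + off ** 2)
v = [off, sigma - a]  # eigenvector for sigma (valid because off != 0)

def gap(x, y, lam, s):
    """lam*f(x) + (1-lam)*f(y) - s/2*lam*(1-lam)*||x-y||^2 - f(lam*x + (1-lam)*y)."""
    z = [lam * x[i] + (1 - lam) * y[i] for i in range(2)]
    d2 = sum((x[i] - y[i]) ** 2 for i in range(2))
    return lam * f(x) + (1 - lam) * f(y) - 0.5 * s * lam * (1 - lam) * d2 - f(z)

rng = random.Random(5)
min_gap = float("inf")
for _ in range(5000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    y = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    min_gap = min(min_gap, gap(x, y, rng.random(), sigma))

# With a larger parameter the inequality fails along the eigenvector direction.
violated_gap = gap(v, [0.0, 0.0], 0.5, sigma + 0.1)

print("lambda_min(A) =", sigma)
print("smallest sampled gap with sigma = lambda_min:", min_gap)
print("gap with sigma + 0.1 along the eigenvector:", violated_gap)
```

The first gap should be nonnegative up to rounding, while the second is strictly negative, matching the claim that $\lambda_{\min}(\mathbf{A})$ cannot be improved.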

The sum of a strongly convex function and a convex function is again strongly convex, with the same strong convexity parameter.

Lemma 1. Let $f:\mathbb{E}\to(-\infty,\infty]$ be a $\sigma$-strongly convex function ($\sigma>0$) and let $g:\mathbb{E}\to(-\infty,\infty]$ be convex. Then $f + g$ is $\sigma$-strongly convex.

Proof. The proof is direct. Since $f$ and $g$ are convex, $\mathrm{dom}(f)$ and $\mathrm{dom}(g)$ are convex sets, and hence so is $\mathrm{dom}(f+g)=\mathrm{dom}(f)\cap\mathrm{dom}(g)$. Let $\mathbf{x,y}\in\mathrm{dom}(f)\cap\mathrm{dom}(g),\,\lambda\in[0,1]$. By the $\sigma$-strong convexity of $f$, $f(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})-\frac{\sigma}{2}\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2.$ By the convexity of $g$, $g(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda g(\mathbf{x})+(1-\lambda)g(\mathbf{y}).$ Adding the two inequalities gives $(f+g)(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})\le\lambda(f+g)(\mathbf{x})+(1-\lambda)(f+g)(\mathbf{y})-\frac{\sigma}{2}\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2,$ which proves the claim.

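Example 9 (strong convexity of $\frac{1}{2}\Vert\cdot\Vert^2+\delta_C$). Let $\mathbb{E}$ be a Euclidean space and let $C\subset\mathbb{E}$ be a nonempty convex set. By Theorem 5 (see also Example 8), $\frac{1}{2}\Vert\mathbf{x}\Vert^2$ is $1$-strongly convex, and since $C$ is convex, $\delta_C$ is a convex function. By the preceding lemma (Lemma 1), the function $\frac{1}{2}\Vert\mathbf{x}\Vert^2+\delta_C(\mathbf{x})$ is therefore $1$-strongly convex.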

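Earlier we gave first- and second-order characterizations of $L$-smoothness. We now give two equivalent first-order characterizations of $\sigma$-strong convexity. Their proofs use the one-dimensional mean value theorem (Lemma 2¹²) and the line segment principle (Lemma 3¹³) stated below.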

Lemma 2 (mean value theorem). Let $f:\mathbb{R}\to(-\infty,\infty]$ be a closed convex function and let $[a,b]\subset\mathrm{dom}(f)$ with $a<b$. Then $f(b)-f(a)=\int_a^bh(t)\,\mathrm{d}t,$ where $h:(a,b)\to\mathbb{R}$ satisfies $h(t)\in\partial f(t),\,\forall t\in(a,b)$.

Lemma 3 (line segment principle). Let $C$ be a convex set, and suppose that $\mathbf{x}\in\mathrm{ri}(C),\,\mathbf{y}\in\mathrm{cl}(C),\,\lambda\in(0,1]$. Then $\lambda\mathbf{x}+(1-\lambda)\mathbf{y}\in\mathrm{ri}(C)$.

Theorem 6 (first-order characterizations of strong convexity). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then for a given $\sigma>0$, the following three claims are equivalent:
(i) $f$ is $\sigma$-strongly convex;
(ii) $f(\mathbf{y})\ge f(\mathbf{x})+\langle\mathbf{g,y-x}\rangle+\frac{\sigma}{2}\Vert\mathbf{y-x}\Vert^2,\,\forall\mathbf{x}\in\mathrm{dom}(\partial f),\,\mathbf{y}\in\mathrm{dom}(f),\,\mathbf{g}\in\partial f(\mathbf{x})$ ¹⁴;
(iii) $\langle\mathbf{g_x-g_y,x-y}\rangle\ge\sigma\Vert\mathbf{x-y}\Vert^2,\,\forall\mathbf{x,y}\in\mathrm{dom}(\partial f),\,\mathbf{g_x}\in\partial f(\mathbf{x}),\,\mathbf{g_y}\in\partial f(\mathbf{y})$ ¹⁵.

Proof. (ii) $\Rightarrow$ (i): Assume that (ii) holds. Take any $\mathbf{x,y}\in\mathrm{dom}(f),\,\lambda\in(0,1)$ and $\mathbf{z}\in\mathrm{ri}(\mathrm{dom}(f))$. For any $\alpha\in(0,1]$, the line segment principle gives $\tilde{\mathbf{x}}=(1-\alpha)\mathbf{x}+\alpha\mathbf{z}\in\mathrm{ri}(\mathrm{dom}(f))$. Fix $\alpha$ and denote $\mathbf{x_{\lambda}}=\lambda\tilde{\mathbf{x}}+(1-\lambda)\mathbf{y}$. By the line segment principle again, $\mathbf{x_{\lambda}}\in\mathrm{ri}(\mathrm{dom}(f)),\,\forall\lambda\in(0,1)$. Hence, by Theorem 6 of Chapter 3, $\partial f(\mathbf{x_{\lambda}})\ne\emptyset$, that is, $\mathbf{x_{\lambda}}\in\mathrm{dom}(\partial f)$. Take $\mathbf{g}\in\partial f(\mathbf{x_{\lambda}})$. By (ii), $f(\tilde{\mathbf{x}})\ge f(\mathbf{x_{\lambda}})+\langle\mathbf{g},\tilde{\mathbf{x}}-\mathbf{x_{\lambda}}\rangle+\frac{\sigma}{2}\Vert\tilde{\mathbf{x}}-\mathbf{x_{\lambda}}\Vert^2,$ and substituting the definition of $\mathbf{x_{\lambda}}$ gives $f(\tilde{\mathbf{x}})\ge f(\mathbf{x_{\lambda}})+(1-\lambda)\langle\mathbf{g},\tilde{\mathbf{x}}-\mathbf{y}\rangle+\frac{\sigma(1-\lambda)^2}{2}\Vert\mathbf{y}-\tilde{\mathbf{x}}\Vert^2.$ Similarly, $f(\mathbf{y})\ge f(\mathbf{x_{\lambda}})+\lambda\langle\mathbf{g},\mathbf{y}-\tilde{\mathbf{x}}\rangle+\frac{\sigma\lambda^2}{2}\Vert\mathbf{y}-\tilde{\mathbf{x}}\Vert^2.$ Multiplying the first inequality by $\lambda$ and the second by $1-\lambda$ and adding them gives $f(\lambda\tilde{\mathbf{x}}+(1-\lambda)\mathbf{y})\le\lambda f(\tilde{\mathbf{x}})+(1-\lambda)f(\mathbf{y})-\frac{\sigma\lambda(1-\lambda)}{2}\Vert\tilde{\mathbf{x}}-\mathbf{y}\Vert^2.$ Substituting the definition of $\tilde{\mathbf{x}}$, we obtain $g_1(\alpha)\le\lambda g_2(\alpha)+(1-\lambda)f(\mathbf{y})-\frac{\sigma\lambda(1-\lambda)}{2}\Vert(1-\alpha)\mathbf{x}+\alpha\mathbf{z}-\mathbf{y}\Vert^2,$ where $g_1(\alpha)\equiv f(\lambda(1-\alpha)\mathbf{x}+(1-\lambda)\mathbf{y}+\lambda\alpha\mathbf{z})$ and $g_2(\alpha)\equiv f((1-\alpha)\mathbf{x}+\alpha\mathbf{z})$. The functions $g_1,g_2$ are one-dimensional proper closed convex functions, so by Theorem 10 of Chapter 2 they are continuous over their domains.
Letting $\alpha\to0^+$ gives $g_1(0)\le\lambda g_2(0)+(1-\lambda)f(\mathbf{y})-\frac{\sigma\lambda(1-\lambda)}{2}\Vert\mathbf{x-y}\Vert^2.$ Since $g_1(0)=f(\lambda\mathbf{x}+(1-\lambda)\mathbf{y})$ and $g_2(0)=f(\mathbf{x})$, this is the $\sigma$-strong convexity of $f$.

(i) $\Rightarrow$ (iii): Assume that (i) holds. Let $\mathbf{x,y}\in\mathrm{dom}(\partial f),\,\mathbf{g_x}\in\partial f(\mathbf{x}),\,\mathbf{g_y}\in\partial f(\mathbf{y})$. Take any $\lambda\in[0,1)$ and denote $\mathbf{x_{\lambda}}=\lambda\mathbf{x}+(1-\lambda)\mathbf{y}$. By (i), $f(\mathbf{x_{\lambda}})\le\lambda f(\mathbf{x})+(1-\lambda)f(\mathbf{y})-\frac{\sigma}{2}\lambda(1-\lambda)\Vert\mathbf{x-y}\Vert^2,$ and therefore $\frac{f(\mathbf{x_{\lambda}})-f(\mathbf{x})}{1-\lambda}\le f(\mathbf{y})-f(\mathbf{x})-\frac{\sigma}{2}\lambda\Vert\mathbf{x-y}\Vert^2.$ Since $\mathbf{g_x}\in\partial f(\mathbf{x})$, $\frac{f(\mathbf{x_{\lambda}})-f(\mathbf{x})}{1-\lambda}\ge\frac{\langle\mathbf{g_x,x_{\lambda}-x}\rangle}{1-\lambda}=\langle\mathbf{g_x,y-x}\rangle,$ so $\langle\mathbf{g_x,y-x}\rangle\le f(\mathbf{y})-f(\mathbf{x})-\frac{\sigma\lambda}{2}\Vert\mathbf{x-y}\Vert^2.$ Letting $\lambda\to1^{-}$ gives $\langle\mathbf{g_x,y-x}\rangle\le f(\mathbf{y})-f(\mathbf{x})-\frac{\sigma}{2}\Vert\mathbf{x-y}\Vert^2.$ Exchanging the roles of $\mathbf{x}$ and $\mathbf{y}$ gives $\langle\mathbf{g_y,x-y}\rangle\le f(\mathbf{x})-f(\mathbf{y})-\frac{\sigma}{2}\Vert\mathbf{x-y}\Vert^2.$ Adding the two inequalities yields (iii).

(iii) $\Rightarrow$ (ii): Assume that (iii) holds. Let $\mathbf{x}\in\mathrm{dom}(\partial f),\,\mathbf{y}\in\mathrm{dom}(f),\,\mathbf{g}\in\partial f(\mathbf{x})$. Let $\mathbf{z}\in\mathrm{ri}(\mathrm{dom}(f))$ and define $\tilde{\mathbf{y}}=(1-\alpha)\mathbf{y}+\alpha\mathbf{z},\,\alpha\in(0,1)$. Fix $\alpha$. By the line segment principle, $\tilde{\mathbf{y}}\in\mathrm{ri}(\mathrm{dom}(f))$. Consider the one-dimensional function $\varphi(\lambda)=f(\mathbf{x_{\lambda}}),\quad\lambda\in[0,1],$ where $\mathbf{x_{\lambda}}=(1-\lambda)\mathbf{x}+\lambda\tilde{\mathbf{y}}$. For any $\lambda\in(0,1)$, let $\mathbf{g_{\lambda}}\in\partial f(\mathbf{x_{\lambda}})$ ¹⁶. Then $\langle\mathbf{g_{\lambda}},\tilde{\mathbf{y}}-\mathbf{x}\rangle\in\partial\varphi(\lambda)$, and by the mean value theorem (Lemma 2), $f(\tilde{\mathbf{y}})-f(\mathbf{x})=\varphi(1)-\varphi(0)=\int_0^1\langle\mathbf{g_{\lambda}},\tilde{\mathbf{y}}-\mathbf{x}\rangle\,\mathrm{d}\lambda.$ Since $\mathbf{g}\in\partial f(\mathbf{x}),\,\mathbf{g_{\lambda}}\in\partial f(\mathbf{x_{\lambda}})$, (iii) gives $\langle\mathbf{g_{\lambda}-g,x_{\lambda}-x}\rangle\ge\sigma\Vert\mathbf{x_{\lambda}-x}\Vert^2.$ Substituting $\mathbf{x_{\lambda}}-\mathbf{x}=\lambda(\tilde{\mathbf{y}}-\mathbf{x})$ and dividing by $\lambda$, $\langle\mathbf{g_{\lambda}},\tilde{\mathbf{y}}-\mathbf{x}\rangle\ge\langle\mathbf{g},\tilde{\mathbf{y}}-\mathbf{x}\rangle+\sigma\lambda\Vert\tilde{\mathbf{y}}-\mathbf{x}\Vert^2.$ Plugging this into the integral identity above gives $f(\tilde{\mathbf{y}})-f(\mathbf{x})\ge\int_0^1\left[\langle\mathbf{g},\tilde{\mathbf{y}}-\mathbf{x}\rangle+\sigma\lambda\Vert\tilde{\mathbf{y}}-\mathbf{x}\Vert^2\right]\,\mathrm{d}\lambda=\langle\mathbf{g},\tilde{\mathbf{y}}-\mathbf{x}\rangle+\frac{\sigma}{2}\Vert\tilde{\mathbf{y}}-\mathbf{x}\Vert^2.$ Substituting the definition of $\tilde{\mathbf{y}}$, we have for any $\alpha\in(0,1)$, $f((1-\alpha)\mathbf{y}+\alpha\mathbf{z})\ge f(\mathbf{x})+\langle\mathbf{g},(1-\alpha)\mathbf{y}+\alpha\mathbf{z}-\mathbf{x}\rangle+\frac{\sigma}{2}\Vert(1-\alpha)\mathbf{y}+\alpha\mathbf{z}-\mathbf{x}\Vert^2.$ Letting $\alpha\to0^+$ and using the continuity of the one-dimensional function $\alpha\mapsto f((1-\alpha)\mathbf{y}+\alpha\mathbf{z})$ over $[0, 1]$ ¹⁷ completes the proof.
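
Theorem 6 is stated with subgradients, so it applies to nonsmooth functions too. As a small illustration, the sketch below uses the one-dimensional function $f(x)=\frac{1}{2}x^2+|x|$, which is $1$-strongly convex as the sum of a $1$-strongly convex and a convex function, and checks (ii) and (iii) on a grid that includes the kink at $0$. The function, the grid and the three representative subgradients at $0$ are assumptions chosen for this example.

```python
SIGMA = 1.0

def f(x):
    return 0.5 * x * x + abs(x)

def subgradients(x):
    """Representative elements of the subdifferential of f at x.

    For x != 0 the subdifferential is the single point x + sign(x);
    at x = 0 it is the interval [-1, 1], sampled here at -1, 0 and 1.
    """
    if x > 0:
        return [x + 1.0]
    if x < 0:
        return [x - 1.0]
    return [-1.0, 0.0, 1.0]

points = [i / 4.0 for i in range(-12, 13)]  # grid on [-3, 3], includes 0.0

# Claim (ii): f(y) >= f(x) + g*(y - x) + sigma/2 * (y - x)^2.
min_slack_ii = min(
    f(y) - f(x) - g * (y - x) - 0.5 * SIGMA * (y - x) ** 2
    for x in points for y in points for g in subgradients(x)
)

# Claim (iii): (g_x - g_y)*(x - y) >= sigma * (x - y)^2.
min_slack_iii = min(
    (gx - gy) * (x - y) - SIGMA * (x - y) ** 2
    for x in points for y in points
    for gx in subgradients(x) for gy in subgradients(y)
)

print("smallest slack in (ii):", min_slack_ii)
print("smallest slack in (iii):", min_slack_iii)
```

Both slacks should be nonnegative; they reach zero, for instance when $x=y$.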

The next theorem shows that a proper closed strongly convex function has a unique minimizer and satisfies a growth property around it.

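Theorem 7 (existence and uniqueness of the minimizer of a closed strongly convex function). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed $\sigma$-strongly convex function ($\sigma>0$). Then
(i) $f$ has a unique global minimizer;
(ii) $f(\mathbf{x})-f(\mathbf{x}^*)\ge\frac{\sigma}{2}\Vert\mathbf{x-x}^*\Vert^2,\,\forall\mathbf{x}\in\mathrm{dom}(f)$, where $\mathbf{x}^*$ is the unique minimizer of $f$ from (i).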

Proof. (i) Since $\mathrm{dom}(f)$ is a nonempty convex set, Theorem 5 of Chapter 3 guarantees the existence of $\mathbf{x}_0\in\mathrm{ri}(\mathrm{dom}(f))$, and then Theorem 6 of Chapter 3 gives $\partial f(\mathbf{x}_0)\ne\emptyset$. Let $\mathbf{g}\in\partial f(\mathbf{x}_0)$. By (ii) of Theorem 6, $f(\mathbf{x})\ge f(\mathbf{x}_0)+\langle\mathbf{g,x-x}_0\rangle+\frac{\sigma}{2}\Vert\mathbf{x-x}_0\Vert^2,\quad\forall\mathbf{x}\in\mathbb{E}.$ Since all norms on a finite-dimensional space are equivalent, there exists a constant $C > 0$ such that $\Vert\mathbf{y}\Vert\ge\sqrt{C}\Vert\mathbf{y}\Vert_a$ for all $\mathbf{y}\in\mathbb{E}$, where $\Vert\cdot\Vert_a$ is the Euclidean norm associated with the inner product of the space. Therefore $f(\mathbf{x})\ge f(\mathbf{x}_0)+\langle\mathbf{g,x-x}_0\rangle+\frac{C\sigma}{2}\Vert\mathbf{x-x}_0\Vert_a^2,\quad\forall\mathbf{x}\in\mathbb{E},$ and completing the square gives $f(\mathbf{x})\ge f(\mathbf{x}_0)-\frac{1}{2C\sigma}\Vert\mathbf{g}\Vert_a^2+\frac{C\sigma}{2}\left\Vert\mathbf{x}-\left(\mathbf{x}_0-\frac{1}{C\sigma}\mathbf{g}\right)\right\Vert_a^2,\quad\forall\mathbf{x}\in\mathbb{E}.$ In particular, $\mathrm{Lev}(f,f(\mathbf{x}_0))\subset B_{\Vert\cdot\Vert_a}\left[\mathbf{x}_0-\frac{1}{C\sigma}\mathbf{g},\frac{1}{C\sigma}\Vert\mathbf{g}\Vert_a\right].$ Since $f$ is closed, Theorem 1 of Chapter 2 shows that this level set is closed; it is also bounded, so $\mathrm{Lev}(f,f(\mathbf{x}_0))$ is compact. Note also that $\mathbf{x}_0\in\mathrm{Lev}(f,f(\mathbf{x}_0))$. Hence the set of minimizers of $f$ over $\mathrm{dom}(f)$ coincides with the set of minimizers of $f$ over the nonempty compact set $\mathrm{Lev}(f,f(\mathbf{x}_0))$. By Theorem 4 of Chapter 2 (the Weierstrass theorem for closed functions), a global minimizer exists.
We now prove uniqueness. Suppose that $\tilde{\mathbf{x}}$ and $\hat{\mathbf{x}}$ are both global minimizers of $f$. Then $f(\tilde{\mathbf{x}})=f(\hat{\mathbf{x}})=f_{\mathrm{opt}}$, where $f_{\mathrm{opt}}$ is the minimal value of $f$. By the $\sigma$-strong convexity of $f$, $f_{\mathrm{opt}}\le f\left(\frac{1}{2}\tilde{\mathbf{x}}+\frac{1}{2}\hat{\mathbf{x}}\right)\le\frac{1}{2}f(\tilde{\mathbf{x}})+\frac{1}{2}f(\hat{\mathbf{x}})-\frac{\sigma}{8}\Vert\tilde{\mathbf{x}}-\hat{\mathbf{x}}\Vert^2=f_{\mathrm{opt}}-\frac{\sigma}{8}\Vert\tilde{\mathbf{x}}-\hat{\mathbf{x}}\Vert^2,$ which implies $\tilde{\mathbf{x}}=\hat{\mathbf{x}}$.

(ii) Let $\mathbf{x}^*$ be the unique global minimizer of $f$ from (i). By Fermat's optimality condition, $\mathbf{0}\in\partial f(\mathbf{x}^*)$. Then, by (ii) of Theorem 6, $f(\mathbf{x})-f(\mathbf{x}^*)\ge\langle\mathbf{0},\mathbf{x-x}^*\rangle+\frac{\sigma}{2}\Vert\mathbf{x-x}^*\Vert^2=\frac{\sigma}{2}\Vert\mathbf{x-x}^*\Vert^2,\quad\forall\mathbf{x}\in\mathrm{dom}(f).$ This completes the proof.
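
The growth property in Theorem 7(ii) can be illustrated on a constrained problem. The sketch below assumes $f(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}-\mathbf{a}\Vert_2^2+\delta_C(\mathbf{x})$ with $C=[-1,1]^3$, which is $1$-strongly convex by Theorem 5 and the sum lemma, and whose unique minimizer is the projection $P_C(\mathbf{a})$. The point `a_point`, the box and the sampling are hypothetical choices for this illustration.

```python
import random

def project_box(x, lo=-1.0, hi=1.0):
    """Projection onto C = [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]

def sq_dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

a_point = [2.5, -0.3, -1.7]     # hypothetical point outside the box
x_star = project_box(a_point)   # unique minimizer of f over C

def f_on_C(x):
    # Values of f at points of C, where the indicator delta_C equals 0.
    return 0.5 * sq_dist(x, a_point)

rng = random.Random(6)
min_growth_slack = float("inf")
for _ in range(5000):
    x = [rng.uniform(-1, 1) for _ in range(3)]  # feasible sample in C
    slack = f_on_C(x) - f_on_C(x_star) - 0.5 * sq_dist(x, x_star)
    min_growth_slack = min(min_growth_slack, slack)

print("minimizer:", x_star)
print("smallest slack in the growth inequality (sigma = 1):", min_growth_slack)
```

A nonnegative slack on every feasible sample matches $f(\mathbf{x})-f(\mathbf{x}^*)\ge\frac{1}{2}\Vert\mathbf{x}-\mathbf{x}^*\Vert^2$.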

3. The Relationship Between Smoothness and Strong Convexity

3.1 The Conjugate Correspondence Theorem

Smoothness and strong convexity are linked through conjugation. Roughly speaking, $f$ is $\sigma$-strongly convex if and only if $f^*$ is $\frac{1}{\sigma}$-smooth.

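Theorem 8 (Conjugate correspondence theorem) Let $\sigma>0$ . Then
(i) if $f:\mathbb{E}\to\mathbb{R}$ is a $\frac{1}{\sigma}$-smooth convex function, then $f^*$ is $\sigma$-strongly convex with respect to the dual norm $\Vert\cdot\Vert_*$ ¹⁸;
(ii) if $f:\mathbb{E}\to(-\infty,\infty]$ is a proper closed $\sigma$-strongly convex function, then $f^*:\mathbb{E}^*\to\mathbb{R}$ is $\frac{1}{\sigma}$-smooth with respect to the dual norm.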

Proof: (i) Suppose that $f:\mathbb{E}\to\mathbb{R}$ is a $\frac{1}{\sigma}$-smooth convex function. Take any $\mathbf{y}_1,\mathbf{y}_2\in\mathrm{dom}(\partial f^*),\,\mathbf{v}_1\in\partial f^*(\mathbf{y}_1),\,\mathbf{v}_2\in\partial f^*(\mathbf{y}_2)$ . By the conjugate subgradient theorem of Chapter 4 and the fact that $f$ is proper, closed and convex, we have $\mathbf{y}_1\in\partial f(\mathbf{v}_1),\,\mathbf{y}_2\in\partial f(\mathbf{v}_2)$ ; since $f$ is differentiable, this means $\mathbf{y}_1=\nabla f(\mathbf{v}_1),\,\mathbf{y}_2=\nabla f(\mathbf{v}_2)$ . By the equivalence of (i) and (iv) in Theorem 2, applied with $L=\frac{1}{\sigma}$ , $\langle\mathbf{y}_1-\mathbf{y}_2,\mathbf{v}_1-\mathbf{v}_2\rangle\ge\sigma\Vert\mathbf{y}_1-\mathbf{y}_2\Vert_*^2.$ Since this inequality holds for all $\mathbf{y}_1,\mathbf{y}_2\in\mathrm{dom}(\partial f^*),\,\mathbf{v}_1\in\partial f^*(\mathbf{y}_1),\,\mathbf{v}_2\in\partial f^*(\mathbf{y}_2)$ , the equivalence of (i) and (iii) in Theorem 6 implies that $f^*$ is $\sigma$-strongly convex with respect to the dual norm.

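(ii) Let $f$ be a proper closed $\sigma$-strongly convex function. By the conjugate subgradient theorem (or its corollary), $\partial f^*(\mathbf{y})=\arg\max_{\mathbf{x}\in\mathbb{E}}\{\langle\mathbf{x,y}\rangle-f(\mathbf{x})\},\quad\forall\mathbf{y}\in\mathbb{E}^*.$ For each fixed $\mathbf{y}$ , the function $\mathbf{x}\mapsto f(\mathbf{x})-\langle\mathbf{x,y}\rangle$ is again proper, closed and $\sigma$-strongly convex, so Theorem 7(i) shows that $\partial f^*(\mathbf{y})$ is a singleton for every $\mathbf{y}\in\mathbb{E}^*$ . Hence, by Theorem 12 of Chapter 3, $f^*$ is differentiable on the whole dual space $\mathbb{E}^*$ . Now take any $\mathbf{y}_1,\mathbf{y}_2\in\mathbb{E}^*$ and write $\mathbf{v}_1=\nabla f^*(\mathbf{y}_1),\,\mathbf{v}_2=\nabla f^*(\mathbf{y}_2)$ . Applying the conjugate subgradient theorem again, these equalities are equivalent to $\mathbf{y}_1\in\partial f(\mathbf{v}_1),\,\mathbf{y}_2\in\partial f(\mathbf{v}_2)$ . By the equivalence of (i) and (iii) in Theorem 6 together with the generalized Cauchy-Schwarz inequality, $\Vert\mathbf{y}_1-\mathbf{y}_2\Vert_*\cdot\Vert\nabla f^*(\mathbf{y}_1)-\nabla f^*(\mathbf{y}_2)\Vert\ge\langle\mathbf{y}_1-\mathbf{y}_2,\nabla f^*(\mathbf{y}_1)-\nabla f^*(\mathbf{y}_2)\rangle\ge\sigma\Vert\nabla f^*(\mathbf{y}_1)-\nabla f^*(\mathbf{y}_2)\Vert^2,$ and therefore $\Vert\nabla f^*(\mathbf{y}_1)-\nabla f^*(\mathbf{y}_2)\Vert\le\frac{1}{\sigma}\Vert\mathbf{y}_1-\mathbf{y}_2\Vert_*,$ which is the claimed $\frac{1}{\sigma}$-smoothness of $f^*$ .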
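Part (ii) can be made concrete for a quadratic. The following sketch assumes $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}$ with $\mathbf{A}$ positive definite and the $\ell_2$-norm, in which case the maximizer of $\langle\mathbf{x,y}\rangle-f(\mathbf{x})$ is $\mathbf{A}^{-1}(\mathbf{y-b})$ and $\sigma=\lambda_{\min}(\mathbf{A})$ . The function name `grad_conjugate` and the random data are illustrative assumptions.

```python
import numpy as np

def grad_conjugate(A, b, y):
    """Maximizer of <x, y> - f(x) for f(x) = 0.5 x^T A x + b^T x, i.e. grad f*(y)."""
    return np.linalg.solve(A, y - b)

# Assumed test data: a random symmetric positive definite matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.5 * np.eye(4)
b = rng.standard_normal(4)
sigma = np.linalg.eigvalsh(A)[0]          # strong convexity parameter of f (l2-norm)

# Largest observed ratio ||grad f*(y1) - grad f*(y2)|| / ||y1 - y2||.
worst = 0.0
for _ in range(1000):
    y1, y2 = rng.standard_normal(4), rng.standard_normal(4)
    diff = grad_conjugate(A, b, y1) - grad_conjugate(A, b, y2)
    worst = max(worst, np.linalg.norm(diff) / np.linalg.norm(y1 - y2))

print(worst <= 1.0 / sigma + 1e-9)        # consistent with (1/sigma)-smoothness of f*
```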

3.2 Examples of Strongly Convex Functions

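Just as in Chapter 4 we used conjugation to establish the convexity of certain functions, here we can use the conjugate correspondence theorem to establish the strong convexity of many important functions.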

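Example 10 (negative entropy over the unit simplex) Consider the function $f:\mathbb{R}^n\to(-\infty,\infty]$ defined by $f(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^nx_i\log x_i, & \mathbf{x}\in\Delta_n,\\\infty, & \text{otherwise}.\end{array}\right.$ By Section 4.10 of Chapter 4, the conjugate of this function is the log-sum-exp function $f^*(\mathbf{y})=\log\left(\sum_{i=1}^ne^{y_i}\right)$ (which is therefore convex), and Example 7 showed that it is $1$-smooth with respect to both the $\ell_{\infty}$- and the $\ell_2$-norm. By the conjugate correspondence theorem, $f$ is $1$-strongly convex with respect to both the $\ell_1$- and the $\ell_2$-norm.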
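As a sanity check of the $\ell_1$ claim, one can test the monotonicity inequality $\langle\nabla f(\mathbf{x})-\nabla f(\mathbf{y}),\mathbf{x-y}\rangle\ge\Vert\mathbf{x-y}\Vert_1^2$ at points in the relative interior of the simplex, where the formula $\nabla f(\mathbf{x})=\log\mathbf{x}+\mathbf{1}$ can be used. This is a minimal sketch: the sampling scheme (Dirichlet draws) and the helper name `entropy_monotonicity_gap` are assumptions made for illustration.

```python
import numpy as np

def entropy_monotonicity_gap(x, y):
    """<grad f(x) - grad f(y), x - y> - ||x - y||_1^2 for f(x) = sum_i x_i log x_i, x, y > 0."""
    grad_diff = np.log(x) - np.log(y)     # the constant +1 in the gradient cancels
    return grad_diff @ (x - y) - np.linalg.norm(x - y, 1) ** 2

# Assumed sampling: Dirichlet draws lie in the relative interior of the simplex.
rng = np.random.default_rng(2)
gaps = []
for _ in range(2000):
    x = rng.dirichlet(np.ones(6))
    y = rng.dirichlet(np.ones(6))
    gaps.append(entropy_monotonicity_gap(x, y))

print(min(gaps) >= -1e-9)                 # no violation found among the samples
```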

Example 11 (squared $\ell_p$-norm, $p\in(1,2]$ ) Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ defined by $f(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}\Vert_p^2\,(p\in(1,2])$ . By Section 4.15 of Chapter 4, $f^*(\mathbf{y})=\frac{1}{2}\Vert\mathbf{y}\Vert_q^2$ , where $q\ge2$ satisfies $\frac{1}{p}+\frac{1}{q}=1$ . By Example 5, $f^*$ is $(q - 1)$-smooth with respect to the $\ell_q$-norm, so the conjugate correspondence theorem shows that $f$ is $\frac{1}{q-1}=(p-1)$-strongly convex with respect to the $\ell_p$-norm.
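The parameter $p-1$ can be probed numerically through the monotonicity inequality $\langle\nabla f(\mathbf{x})-\nabla f(\mathbf{y}),\mathbf{x-y}\rangle\ge(p-1)\Vert\mathbf{x-y}\Vert_p^2$ . The sketch below uses the gradient formula $\nabla f(\mathbf{x})_i=\Vert\mathbf{x}\Vert_p^{2-p}\,\mathrm{sgn}(x_i)|x_i|^{p-1}$ for $\mathbf{x}\ne\mathbf{0}$ , obtained by differentiating $\frac{1}{2}\Vert\mathbf{x}\Vert_p^2$ directly. The function names and the tested values of $p$ are illustrative assumptions.

```python
import numpy as np

def grad_half_sq_pnorm(x, p):
    """Gradient of f(x) = 0.5 * ||x||_p^2 at x != 0, for 1 < p <= 2."""
    norm = np.linalg.norm(x, p)
    return norm ** (2 - p) * np.sign(x) * np.abs(x) ** (p - 1)

def pnorm_monotonicity_gap(x, y, p):
    """<grad f(x) - grad f(y), x - y> - (p - 1) * ||x - y||_p^2."""
    g = grad_half_sq_pnorm(x, p) - grad_half_sq_pnorm(y, p)
    return g @ (x - y) - (p - 1) * np.linalg.norm(x - y, p) ** 2

# Assumed test setup: Gaussian sample points and a few values of p in (1, 2].
rng = np.random.default_rng(3)
min_gap = np.inf
for p in (1.2, 1.5, 2.0):
    for _ in range(1000):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        min_gap = min(min_gap, pnorm_monotonicity_gap(x, y, p))

print(min_gap >= -1e-9)                   # no violation found among the samples
```

For $p=2$ the gap is zero up to rounding, which matches the fact that $\frac{1}{2}\Vert\mathbf{x}\Vert_2^2$ is exactly $1$-strongly convex.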

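Example 12 (lower hemisphere function in the $\ell_2$-norm) Consider the lower hemisphere function $f:\mathbb{R}^n\to(-\infty,\infty]$ , $f(\mathbf{x})=\left\{\begin{array}{ll}-\sqrt{1-\Vert\mathbf{x}\Vert_2^2}, & \Vert\mathbf{x}\Vert_2\le1,\\\infty, & \text{otherwise}.\end{array}\right.$ By Section 4.13 of Chapter 4, the conjugate of $f$ is $f^*(\mathbf{y})=\sqrt{\Vert\mathbf{y}\Vert_2^2+1},$ and Example 6 shows that $f^*$ is $1$-smooth with respect to the $\ell_2$-norm. Hence, by the conjugate correspondence theorem, $f$ is $1$-strongly convex with respect to the $\ell_2$-norm.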
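The conjugate pair in this example also illustrates the conjugate subgradient theorem used in the proof of Theorem 8: on the open unit ball, $\mathbf{y}=\nabla f(\mathbf{x})$ should satisfy $f(\mathbf{x})+f^*(\mathbf{y})=\langle\mathbf{x,y}\rangle$ and $\nabla f^*(\mathbf{y})=\mathbf{x}$ . The sketch below checks both identities numerically. The gradient formulas are derived by direct differentiation, and the function names and sampling radius are assumptions made for illustration.

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = -sqrt(1 - ||x||_2^2) on the open unit ball."""
    return x / np.sqrt(1.0 - x @ x)

def grad_f_conj(y):
    """Gradient of f*(y) = sqrt(||y||_2^2 + 1)."""
    return y / np.sqrt(y @ y + 1.0)

rng = np.random.default_rng(4)
max_fenchel_error = 0.0
max_inverse_error = 0.0
for _ in range(1000):
    d = rng.standard_normal(4)
    x = 0.9 * rng.uniform() * d / np.linalg.norm(d)   # assumed sampling: ||x||_2 < 0.9
    y = grad_f(x)
    f_x = -np.sqrt(1.0 - x @ x)
    f_conj_y = np.sqrt(y @ y + 1.0)
    # Fenchel equality f(x) + f*(y) = <x, y> when y = grad f(x).
    max_fenchel_error = max(max_fenchel_error, abs(f_x + f_conj_y - x @ y))
    # grad f* inverts grad f.
    max_inverse_error = max(max_inverse_error, np.max(np.abs(grad_f_conj(y) - x)))

print(max_fenchel_error < 1e-10, max_inverse_error < 1e-10)
```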

3.3 Summary of Strong Convexity Parameters

The following table summarizes all the strongly convex functions encountered in this chapter.

$f(\mathbf{x})$	$\mathrm{dom}(f)$	Strong convexity parameter	Norm	Example
$\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+2\mathbf{b}^T\mathbf{x}+c\,(\mathbf{A}\in\mathbb{S}_{++}^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R})$	$\mathbb{R}^n$	$\lambda_{\min}(\mathbf{A})$	$\ell_2$	8
$\frac{1}{2}\Vert\mathbf{x}\Vert^2+\delta_C(\mathbf{x})\,(\emptyset\ne C\subset\mathbb{E}\text{ convex})$	$C$	$1$	Euclidean	9
$-\sqrt{1-\Vert\mathbf{x}\Vert^2_2}$	$B_{\Vert\cdot\Vert_2}[\mathbf{0},1]$	$1$	$\ell_2$	12
$\frac{1}{2}\Vert\mathbf{x}\Vert_p^2\,(p\in(1,2])$	$\mathbb{R}^n$	$p - 1$	$\ell_p$	11
$\sum_{i=1}^nx_i\log x_i$	$\Delta_n$	$1$	$\ell_2$ or $\ell_1$	10

3.4 Smoothness and Differentiability of the Infimal Convolution

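In this section we use the conjugate correspondence theorem to show that, under suitable conditions, the infimal convolution of a convex function with an $L$-smooth convex function is again $L$-smooth. We also derive an expression for its gradient.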

Theorem 9 (Smoothness of the infimal convolution) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\omega:\mathbb{E}\to\mathbb{R}$ be an $L$-smooth convex function. Assume that $f\square\omega$ is real-valued. Then the following hold:
(i) $f\square\omega$ is $L$-smooth;
(ii) let $\mathbf{x}\in\mathbb{E}$ , and assume that $\mathbf{u(x)}$ is a global minimizer of $\min_{\mathbf{u}}\{f(\mathbf{u})+\omega(\mathbf{x-u})\}$ . Then $\nabla(f\square\omega)(\mathbf{x})=\nabla\omega(\mathbf{x-u(x)})$ .

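Proof: (i) By Theorem 11 of Chapter 4, $f\square\omega=(f^*+\omega^*)^*.$ Moreover, since $f$ and $\omega$ are proper closed convex functions, Theorems 1 and 2 of Chapter 4 imply that $f^*$ and $\omega^*$ are also proper closed convex functions. By the conjugate correspondence theorem, $\omega^*$ is $\frac{1}{L}$-strongly convex. Therefore, by Lemma 1, $f^*+\omega^*$ is $\frac{1}{L}$-strongly convex. As the sum of two closed functions, it is also closed. To apply the conjugate correspondence theorem we still need to show that it is proper. Indeed, by Theorem 9 of Chapter 4, $(f\square\omega)^*=f^*+\omega^*.$ Since the infimal convolution $f\square\omega$ is a proper convex function (it is real-valued by assumption), Theorem 2 of Chapter 4 shows that $f^*+\omega^*$ is proper. Thus $f^*+\omega^*$ is a proper closed $\frac{1}{L}$-strongly convex function, and by the conjugate correspondence theorem $f\square\omega=(f^*+\omega^*)^*$ is $L$-smooth.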

(ii) Let $\mathbf{x}\in\mathbb{E}$ . Then $(f\square\omega)(\mathbf{x})=f(\mathbf{u(x)})+\omega(\mathbf{x}-\mathbf{u(x)}).$ Write $\mathbf{z}\equiv\nabla\omega(\mathbf{x}-\mathbf{u(x)})$ . We show that $\nabla(f\square\omega)(\mathbf{x})=\mathbf{z}$ . To this end, we need to prove that $\lim_{\Vert\bm{\xi}\Vert\to0}|\phi(\bm{\xi})|/\Vert\bm{\xi}\Vert=0$ , where $\phi(\bm{\xi})\equiv(f\square\omega)(\mathbf{x+\bm{\xi}})-(f\square\omega)(\mathbf{x})-\langle\bm{\xi},\mathbf{z}\rangle$ . By the definition of the infimal convolution, $(f\square\omega)(\mathbf{x+\bm{\xi}})\le f(\mathbf{u(x)})+\omega(\mathbf{x}+\bm{\xi}-\mathbf{u(x)}).$ Hence, $\begin{aligned}\phi(\bm{\xi})&=(f\square\omega)(\mathbf{x+\bm{\xi}})-(f\square\omega)(\mathbf{x})-\langle\bm{\xi},\mathbf{z}\rangle\\&\le\omega(\mathbf{x}+\bm{\xi}-\mathbf{u(x)})-\omega(\mathbf{x}-\mathbf{u(x)})-\langle\bm{\xi},\mathbf{z}\rangle\\&\le\langle\bm{\xi},\nabla\omega(\mathbf{x}+\bm{\xi}-\mathbf{u(x)})\rangle-\langle\bm{\xi},\mathbf{z}\rangle\:(\text{gradient inequality for }\omega)\\&=\langle\bm{\xi},\nabla\omega(\mathbf{x}+\bm{\xi}-\mathbf{u(x)})-\nabla\omega(\mathbf{x}-\mathbf{u(x)})\rangle\\&\le\Vert\bm{\xi}\Vert\cdot\Vert\nabla\omega(\mathbf{x}+\bm{\xi}-\mathbf{u(x)})-\nabla\omega(\mathbf{x}-\mathbf{u(x)})\Vert_*\\&\le L\Vert\bm{\xi}\Vert^2.\:(L\text{-smoothness of }\omega)\end{aligned}$ It remains to prove the reverse bound $\phi(\bm{\xi})\ge -L\Vert\bm{\xi}\Vert^2$ . Since $f\square\omega$ is convex, so is $\phi$ . Because $\phi(\mathbf{0})=0$ , convexity gives $0=\phi(\mathbf{0})\le\frac{1}{2}\phi(\bm{\xi})+\frac{1}{2}\phi(-\bm{\xi}),\,\forall\bm{\xi}$ . Hence $\phi(\bm{\xi})\ge-\phi(-\bm{\xi})\ge-L\Vert\bm{\xi}\Vert^2$ , where the last step applies the upper bound above to $-\bm{\xi}$ . Combining the two bounds gives $|\phi(\bm{\xi})|\le L\Vert\bm{\xi}\Vert^2$ , so $|\phi(\bm{\xi})|/\Vert\bm{\xi}\Vert\to0$ as $\Vert\bm{\xi}\Vert\to0$ .

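Example 13 ( $1$-smoothness of $\frac{1}{2}d_C^2$ ) Suppose that $\mathbb{E}$ is a Euclidean space and that $C\subset\mathbb{E}$ is a nonempty closed convex set. Consider the function $\varphi_C(\mathbf{x})=\frac{1}{2}d_C^2(\mathbf{x})$ . We already showed in Example 3 that it is $1$-smooth. Here we give a second proof based on Theorem 9. We have $\varphi_C=\delta_C\square h$ , where $h(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}\Vert^2$ is a real-valued $1$-smooth convex function and $\delta_C$ is a proper closed convex function; moreover, $\varphi_C$ is real-valued. Hence, by Theorem 9, $\varphi_C$ is $1$-smooth.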
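For this example, the minimizer in Theorem 9(ii) is $\mathbf{u(x)}=P_C(\mathbf{x})$ , so the theorem predicts $\nabla\varphi_C(\mathbf{x})=\nabla h(\mathbf{x}-P_C(\mathbf{x}))=\mathbf{x}-P_C(\mathbf{x})$ . The sketch below compares this formula with a finite-difference gradient and checks the $1$-Lipschitz bound on one pair of points. The choice of $C$ as the box $[-1,1]^n$ , the step size, and all function names are assumptions made for illustration.

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box C = [lo, hi]^n (an assumed example set)."""
    return np.clip(x, lo, hi)

def phi(x):
    """phi_C(x) = 0.5 * d_C(x)^2 for the box C."""
    return 0.5 * np.sum((x - project_box(x)) ** 2)

def grad_phi(x):
    """Gradient predicted by Theorem 9(ii): grad h(x - u(x)) = x - P_C(x)."""
    return x - project_box(x)

def numeric_grad(fun, x, h=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(5)
x = 3.0 * rng.standard_normal(5)
y = 3.0 * rng.standard_normal(5)

grad_error = np.max(np.abs(numeric_grad(phi, x) - grad_phi(x)))
lipschitz_ratio = np.linalg.norm(grad_phi(x) - grad_phi(y)) / np.linalg.norm(x - y)
print(grad_error < 1e-5, lipschitz_ratio <= 1.0 + 1e-12)
```

Any other closed convex set with a computable projection could replace the box by swapping out `project_box`.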

Here $\Vert\mathbf{A}\Vert_{p,q}=\max\{\Vert\mathbf{Ax}\Vert_q:\Vert\mathbf{x}\Vert_p\le1\}$ ; see also Chapter 1. ↩︎
Such an $\tilde{\mathbf{x}}$ exists by the definition of the induced norm. ↩︎
In fact, the convexity of $\psi_C$ does not require $C$ to be convex; the nonexpansiveness of the projection operator, however, does. ↩︎
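This inequality shows that the descent lemma says more: if $\nabla f(\mathbf{x})$ makes an obtuse angle with $\mathbf{y-x}$ and $\Vert\mathbf{x-y}\Vert$ is sufficiently small, then moving from $\mathbf{x}$ to $\mathbf{y}$ decreases the value of $f$ by at least $\langle\nabla f(\mathbf{x}),\mathbf{x-y}\rangle-\frac{L}{2}\Vert\mathbf{x-y}\Vert^2$ . This is why the lemma is called the descent lemma. ↩︎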
The convexity assumption in Theorem 2 is essential. Consider $f:\mathbb{R}^n\to\mathbb{R}$ defined by $f(\mathbf{x})=-\frac{1}{2}\Vert\mathbf{x}\Vert_2^2$ . It is $1$-smooth with respect to the $\ell_2$-norm, but it is not $L$-smooth for any $L < 1$ (see Example 1). However, since $f$ is concave, $f(\mathbf{y})\le f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y-x}\rangle$ , which shows that Theorem 2(ii) holds with $L = 0$ . Yet $f$ is clearly not $0$-smooth. ↩︎
The "whole space" assumption here is made for convenience. ↩︎
In particular, $\nabla g_{\mathbf{x}}(\mathbf{x})=\mathbf{0}$ ; combined with the convexity of $g_{\mathbf{x}}$ , this implies that $\mathbf{x}$ is a global minimizer of $g_{\mathbf{x}}$ : $g_{\mathbf{x}}(\mathbf{x})\le g_{\mathbf{x}}(\mathbf{z}),\quad\forall\mathbf{z}\in\mathbb{E}.$ ↩︎
Here, twice continuously differentiable on $U$ means that all second-order partial derivatives of $f$ are continuous on $U$ . ↩︎
Here $[\mathbf{x,y}]$ denotes the closed line segment defined in Section 5 of Chapter 1, not a rectangular box. ↩︎
Here "differentiable" is meant in the sense of Definition 4 of Chapter 3, and the inner product is the dot product. ↩︎
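The Euclidean-space assumption here is essential. For example, consider the negative entropy over the unit simplex, $f(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^nx_i\log x_i, & \mathbf{x}\in\Delta_n,\\\infty, & \text{otherwise}.\end{array}\right.$ (We will show in Example 10 that $f$ is $1$-strongly convex with respect to the $\ell_1$-norm.) Note that the $\ell_1$-norm is not induced by the dot product on the space. In this case the function $g(\mathbf{x})=f(\mathbf{x})-\alpha\Vert\mathbf{x}\Vert_1^2$ is convex for every $\alpha>0$ , because $\Vert\mathbf{x}\Vert_1=1$ everywhere on the effective domain of $f$ . Applying the conclusion of Theorem 5 directly would then imply that $f$ is $\alpha$-strongly convex for every $\alpha>0$ , which is impossible. ↩︎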
For a proof, see Theorem 4.2.4 on page 26 of the monograph "Convex Analysis and Minimization Algorithms I" by Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. ↩︎
For a proof, see Theorem 6.1 on page 45 of the monograph "Convex Analysis" by R. Tyrrell Rockafellar. ↩︎
This shows that $f$ is bounded below by a strictly convex quadratic function. ↩︎
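(iii) closely resembles (iv) of Theorem 2. This resemblance is the key to linking smooth functions with strongly convex functions, and the bridge between them is conjugation; see the conjugate subgradient theorem in Chapter 4. The detailed proof is given in Theorem 8. ↩︎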
Existence follows from the line segment principle. ↩︎
By Theorem 10 of Chapter 2 and the fact that this univariate function is closed and convex. ↩︎
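The effective domain of $f$ is taken to be the whole space in order to avoid unnecessary difficulties when working with the subdifferential of $f^*$ . Requiring $f$ to be real-valued in fact guarantees that $f$ is closed. ↩︎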