First Order Methods in Optimization Ch6. The Proximal Operator

Learner Hu

Original link: https://download.csdn.net/download/m0_37854871/11562555

Chapter 6: The Proximal Operator

Throughout this chapter, the underlying space $\mathbb{E}$ is assumed to be Euclidean.

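This chapter introduces the proximal mapping, which underlies many of the algorithms in the second half of the book. Because Moreau was the first to study the proximal operator and its properties, the mapping is also called the "Moreau proximal mapping".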

1. Definition, Existence, and Uniqueness

Definition 1 (proximal mapping) Given a function $f:\mathbb{E}\to(-\infty,\infty]$, the proximal mapping¹ of $f$, denoted $\mathrm{prox}_f$, is defined by $\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\},\quad\forall\mathbf{x}\in\mathbb{E}.$ That is, $\mathrm{prox}_f$ maps each $\mathbf{x}\in\mathbb{E}$ to a subset of $\mathbb{E}$. This subset may be empty, a singleton, or a set containing several vectors. The following example illustrates all three cases.

Example 1 Consider the following three functions from $\mathbb{R}$ to $\mathbb{R}$: $\begin{aligned}g_1(x)&\equiv0,\\g_2(x)&=\left\{\begin{array}{ll}0, & x\ne0,\\-\lambda, & x=0,\end{array}\right.\\g_3(x)&=\left\{\begin{array}{ll}0, & x\ne0,\\\lambda, & x=0,\end{array}\right.\end{aligned}$ where $\lambda>0$ is a given constant. Note that $g_2$ and $g_3$ are discontinuous.

Prox of $g_1$: $\mathrm{prox}_{g_1}(x)=\arg\min_{u\in\mathbb{R}}\left\{g_1(u)+\frac{1}{2}(u-x)^2\right\}=\arg\min_{u\in\mathbb{R}}\left\{\frac{1}{2}(u-x)^2\right\}=\{x\},$ so the prox of $g_1$ is always a singleton;
Prox of $g_2$: write $\mathrm{prox}_{g_2}(x)=\arg\min_{u\in\mathbb{R}}\tilde g_2(u,x)$, where $\tilde g_2(u,x)\equiv g_2(u)+\frac{1}{2}(u-x)^2=\left\{\begin{array}{ll}-\lambda+\frac{x^2}{2}, & u=0,\\\frac{1}{2}(u-x)^2, & u\ne0.\end{array}\right.$ If $x\ne0$, the minimum of $\frac{1}{2}(u-x)^2$ over $\mathbb{R}\setminus\{0\}$ is attained at $u=x(\ne0)$, with value $0$. In this case, if $0>-\lambda+\frac{x^2}{2}$, the unique global minimizer of $\tilde g_2(\cdot,x)$ over $\mathbb{R}$ is $u=0$; if $0<-\lambda+\frac{x^2}{2}$, the unique global minimizer of $\tilde g_2(\cdot,x)$ over $\mathbb{R}$ is $u=x$; and if $0=-\lambda+\frac{x^2}{2}$, both $0$ and $x$ are global minimizers of $\tilde g_2(\cdot,x)$ over $\mathbb{R}$. Finally, if $x=0$, then clearly $\mathrm{prox}_{g_2}(x)=\{0\}$. In summary, $\mathrm{prox}_{g_2}(x)=\left\{\begin{array}{ll}\{0\}, & |x|<\sqrt{2\lambda},\\\{x\}, & |x|>\sqrt{2\lambda},\\\{0,x\}, & |x|=\sqrt{2\lambda};\end{array}\right.$
Prox of $g_3$: the computation of $\mathrm{prox}_{g_3}$ is similar to that of $g_2$, so we state the result directly: $\mathrm{prox}_{g_3}(x)=\left\{\begin{array}{ll}\{x\}, & x\ne0,\\\emptyset, & x=0.\end{array}\right.$
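
The set-valued results of Example 1 can be sketched in code as follows. This is only an illustration: the function names `prox_example1_g2` and `prox_example1_g3` are hypothetical, each prox is returned as a Python list standing in for a set, and the tie case $|x|=\sqrt{2\lambda}$ is detected by exact floating-point comparison, which is fragile outside of hand-picked inputs.

```python
import math

def prox_example1_g2(x, lam):
    """Set-valued prox of g2 (value -lam at 0, else 0), returned as a sorted list."""
    threshold = math.sqrt(2.0 * lam)
    if abs(x) < threshold:
        return [0.0]
    if abs(x) > threshold:
        return [x]
    # Tie: both 0 and x are global minimizers.
    return sorted({0.0, x})

def prox_example1_g3(x):
    """Set-valued prox of g3 (value lam at 0, else 0); empty at x = 0 for any lam > 0."""
    return [x] if x != 0 else []
```

The empty list returned by `prox_example1_g3(0.0)` mirrors the fact that the infimum is not attained there.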

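The first prox theorem below gives a sufficient condition for the prox of a function to be a singleton: if $f$ is proper, closed, and convex, then $\mathrm{prox}_f(\mathbf{x})$ is a singleton, that is, the prox exists and is unique. Although this condition is only sufficient, it explains why the prox of $g_1$ above is always a singleton. Conversely, since $\mathrm{prox}_{g_2}$ and $\mathrm{prox}_{g_3}$ are not always singletons, neither $g_2$ nor $g_3$ is a proper closed convex function.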

Theorem 1 (first prox theorem) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then $\mathrm{prox}_f(\mathbf{x})$ is a singleton for every $\mathbf{x}\in\mathbb{E}$.

Proof: For any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\tilde f(\mathbf{u,x}),$ where $\tilde f(\mathbf{u,x})=f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2$. Since $\frac{1}{2}\Vert\cdot-\mathbf{x}\Vert^2$ is closed and strongly convex and $f$ is closed and convex, Lemma 1 of Chapter 5 together with Theorem 2(ii) of Chapter 2 shows that $\tilde f(\cdot,\mathbf{x})$ is closed and strongly convex. Clearly $\tilde f(\cdot,\mathbf{x})$ is also proper. Hence, by Theorem 7 of Chapter 5, $\tilde f(\cdot,\mathbf{x})$ has a unique global minimizer over $\mathbb{E}$.

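Since the vast majority of the functions considered in this chapter are proper, closed, and convex, we treat $\mathrm{prox}_f$ as a single-valued mapping from $\mathbb{E}$ to $\mathbb{E}$ and write $\mathrm{prox}_f(\mathbf{x})=\mathbf{y}$ rather than $\mathrm{prox}_f(\mathbf{x})=\{\mathbf{y}\}$.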

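If we relax the assumptions of the first prox theorem and require only that the function be proper and closed, we can still show, under a coercivity assumption, that $\mathrm{prox}_f(\mathbf{x})$ is never empty.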

Theorem 2 (nonemptiness of the prox under closedness and coercivity) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed function, and assume that the function $\mathbf{u}\mapsto f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2$ is coercive for every $\mathbf{x}\in\mathbb{E}$. Then $\mathrm{prox}_f(\mathbf{x})\ne\emptyset$ for every $\mathbf{x}\in\mathbb{E}$.

Proof: For any $\mathbf{x}\in\mathbb{E}$, the function $h(\mathbf{u})\equiv f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2$ is proper, closed, and coercive. By Theorem 5 of Chapter 2, $h$ attains its minimum over $\mathbb{E}$, so $\mathrm{prox}_f(\mathbf{x})$ is nonempty.

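Both $g_2$ and $g_3$ from Example 1 satisfy the coercivity assumption, but only $g_2$ is closed. It is therefore no surprise that $\mathrm{prox}_{g_3}(x)$ is empty for certain $x$, whereas $\mathrm{prox}_{g_2}(x)$ is never empty.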

2. Examples of Proximal Mappings

This section computes the proximal mappings of several proper closed convex functions. By Theorem 1, all of them are single-valued.

2.1 Constant Functions

If $f\equiv c\in\mathbb{R}$, then $\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{c+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}=\mathbf{x}.$ Therefore, $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x}}$ is the identity mapping.

2.2 Affine Functions

Let $f(\mathbf{x})=\langle\mathbf{a,x}\rangle+b$, where $\mathbf{a}\in\mathbb{E},\,b\in\mathbb{R}$. Then $\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\langle\mathbf{a,u}\rangle+b+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\langle\mathbf{a,x}\rangle+b-\frac{1}{2}\Vert\mathbf{a}\Vert^2+\frac{1}{2}\Vert\mathbf{u-(x-a)}\Vert^2\right\}\\&=\mathbf{x-a}.\end{aligned}$ Therefore, $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x-a}}$ is a translation.

2.3 Convex Quadratic Functions

Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}\in\mathbb{S}_+^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$. Then $\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{R}^n}\left\{\frac{1}{2}\mathbf{u}^T\mathbf{Au}+\mathbf{b}^T\mathbf{u}+c+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$ Since the objective is strictly convex, its minimizer is the point at which the gradient vanishes: $\mathbf{Au}+\mathbf{b}+\mathbf{u-x}=\mathbf{0}\Rightarrow (\mathbf{A+I})\mathbf{u}=\mathbf{x-b}.$ Therefore, $\boxed{\mathrm{prox}_f(\mathbf{x})=(\mathbf{A+I})^{-1}(\mathbf{x-b}).}$

2.4 One-Dimensional Examples

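The following lemma collects the prox computations of several one-dimensional proper closed convex functions. These results play an important role in later derivations.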

Lemma 1 $\begin{aligned}g_1(x)&=\left\{\begin{array}{ll}\mu x, & x\ge0,\\\infty, & x<0,\end{array}\right. &&\mathrm{prox}_{g_1}(x)=[x-\mu]_+,\\g_2(x)&=\lambda|x|, &&\mathrm{prox}_{g_2}(x)=[|x|-\lambda]_+\mathrm{sgn}(x),\\g_3(x)&=\left\{\begin{array}{ll}\lambda x^3, & x\ge0,\\\infty, & x<0,\end{array}\right. &&\mathrm{prox}_{g_3}(x)=\frac{-1+\sqrt{1+12\lambda[x]_+}}{6\lambda},\\g_4(x)&=\left\{\begin{array}{ll}-\lambda\log x, & x>0,\\\infty, & x\le0,\end{array}\right. &&\mathrm{prox}_{g_4}(x)=\frac{x+\sqrt{x^2+4\lambda}}{2},\\g_5(x)&=\delta_{[0,\eta]\cap\mathbb{R}}(x), &&\mathrm{prox}_{g_5}(x)=\min\{\max\{x,0\},\eta\},\end{aligned}$ where $\lambda\in\mathbb{R}_{++},\,\eta\in[0,\infty],\,\mu\in\mathbb{R}$.

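Proof: The proofs below repeatedly use two facts:
(i) if a convex function $f$ satisfies $f'(u)=0$ at some point $u$, then $u$ is a global minimizer of $f$;
(ii) if a convex function has a global minimizer and its optimal value is not attained at any point of differentiability, then it is attained at a point of nondifferentiability.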

Prox of $g_1$: by definition, $\mathrm{prox}_{g_1}(x)$ is the global minimizer of the function $f(u)=\left\{\begin{array}{ll}\infty, & u<0,\\f_1(u), & u\ge0,\end{array}\right.$ where $f_1(u)=\mu u+\frac{1}{2}(u-x)^2$. First, $f'_1(u)=0$ if and only if $u=x-\mu$. If $x>\mu$, then $f'(x-\mu)=f'_1(x-\mu)=0$, and hence $\mathrm{prox}_{g_1}(x)=x-\mu$; if $x\le\mu$, the optimal value of $f$ is not attained at a point of differentiability, so it can only be attained at $0$. Therefore $\mathrm{prox}_{g_1}(x)=[x-\mu]_+$.
Prox of $g_2$: $\mathrm{prox}_{g_2}(x)$ is the global minimizer of the function $h(u)=\left\{\begin{array}{ll}h_1(u)\equiv\lambda u+\frac{1}{2}(u-x)^2, & u>0,\\h_2(u)\equiv-\lambda u+\frac{1}{2}(u-x)^2, & u\le0.\end{array}\right.$ If $x>\lambda$, then $u=x-\lambda>0$ satisfies $0=h_1'(u)=\lambda+u-x$, and hence $\mathrm{prox}_{g_2}(x)=x-\lambda$. Similarly, if $x<-\lambda$, then $\mathrm{prox}_{g_2}(x)=x+\lambda$. If $|x|\le\lambda$, then $\mathrm{prox}_{g_2}(x)$ must be $0$, the only point of nondifferentiability of $h$.
Prox of $g_3$: $\mathrm{prox}_{g_3}(x)$ is the global minimizer of the function $s(u)=\left\{\begin{array}{ll}\lambda u^3+\frac{1}{2}(u-x)^2, & u\ge0,\\\infty, & u<0.\end{array}\right.$ If the global minimizer is positive, then $\tilde u=\mathrm{prox}_{g_3}(x)$ satisfies $s'(\tilde u)=0$, that is, $3\lambda\tilde u^2+\tilde u-x=0.$ This equation has a positive root if and only if $x>0$, in which case $\mathrm{prox}_{g_3}(x)=\tilde u=\frac{-1+\sqrt{1+12\lambda x}}{6\lambda}$; if $x\le0$, the global minimizer of $s$ must be a point of nondifferentiability, which can only be the point $0$ of the effective domain.
Prox of $g_4$: $\tilde u=\mathrm{prox}_{g_4}(x)$ is the global minimizer over $u>0$ of $t(u)=-\lambda\log u+\frac{1}{2}(u-x)^2$. Setting the derivative of $t$ to $0$ gives $-\frac{\lambda}{\tilde u}+(\tilde u-x)=0\Rightarrow\tilde u^2-\tilde ux-\lambda=0.$ Since this equation always has a positive root, $\tilde u$ is attained in $\mathbb{R}_{++}$, and $\mathrm{prox}_{g_4}(x)=\tilde u=\frac{x+\sqrt{x^2+4\lambda}}{2}.$
Prox of $g_5$: first assume $\eta<\infty$. Then $\tilde u=\mathrm{prox}_{g_5}(x)$ is the global minimizer of $w(u)=\frac{1}{2}(u-x)^2$ over $[0,\eta]$. Clearly, the global minimizer of $w$ over $\mathbb{R}$ is $u=x$. Hence, if $0\le x\le\eta$, then $\tilde u=x$; if $x<0$, then $w$ is increasing on $[0,\eta]$, so $\tilde u=0$; if $x>\eta$, then $w$ is decreasing on $[0,\eta]$, so $\tilde u=\eta$. Thus $\mathrm{prox}_{g_5}(x)=\tilde u=\left\{\begin{array}{ll}x, & 0\le x\le\eta,\\0, & x<0,\\\eta, & x>\eta,\end{array}\right.=\min\{\max\{x,0\},\eta\}.$ Now consider $\eta=\infty$. In this case $g_5(x)=\delta_{[0,\infty)}(x)$ is exactly $g_1$ with $\mu=0$, so $\mathrm{prox}_{g_5}(x)=[x]_+$, which can also be written as $\mathrm{prox}_{g_5}(x)=\min\{\max\{x,0\},\infty\}.$
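
The five closed forms of Lemma 1 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and `brute_force_prox` is a crude grid search added here purely as a sanity check of the formulas, not as a method from the text.

```python
import math

def prox_g1(x, mu):
    # g1(u) = mu*u for u >= 0, +inf otherwise
    return max(x - mu, 0.0)

def prox_g2(x, lam):
    # g2(u) = lam*|u| (soft thresholding), written piecewise
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def prox_g3(x, lam):
    # g3(u) = lam*u**3 for u >= 0, +inf otherwise
    return (-1.0 + math.sqrt(1.0 + 12.0 * lam * max(x, 0.0))) / (6.0 * lam)

def prox_g4(x, lam):
    # g4(u) = -lam*log(u) for u > 0, +inf otherwise
    return (x + math.sqrt(x * x + 4.0 * lam)) / 2.0

def prox_g5(x, eta):
    # g5 = indicator of [0, eta]; eta may be math.inf
    return min(max(x, 0.0), eta)

def brute_force_prox(g, x, lo=-10.0, hi=10.0, n=200001):
    # Grid search for the minimizer of g(u) + 0.5*(u - x)**2 (sanity check only)
    best_u, best_val = None, math.inf
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        val = g(u) + 0.5 * (u - x) ** 2
        if val < best_val:
            best_u, best_val = u, val
    return best_u
```

For instance, comparing `prox_g3(2.0, 0.5)` with `brute_force_prox(lambda u: 0.5 * u**3 if u >= 0 else math.inf, 2.0)` should give agreement up to the grid resolution.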

3. Prox Calculus Rules

In this section we present several results that help compute proximal mappings. Some of them require no convexity or closedness assumptions at all.

Theorem 3 (prox of separable functions) Let $f:\mathbb{E}_1\times\mathbb{E}_2\times\cdots\times\mathbb{E}_m\to(-\infty,\infty]$ be given by $f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mf_i(\mathbf{x}_i),\quad\forall\mathbf{x}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m.$ Then for all $\mathbf{x}_1\in\mathbb{E}_1,\mathbf{x}_2\in\mathbb{E}_2,\ldots,\mathbf{x}_m\in\mathbb{E}_m$, $\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\mathrm{prox}_{f_1}(\mathbf{x}_1)\times\mathrm{prox}_{f_2}(\mathbf{x}_2)\times\cdots\times\mathrm{prox}_{f_m}(\mathbf{x}_m).$ ²

Proof: $\begin{aligned}\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)&=\arg\min_{\mathbf{y}_1,\mathbf{y}_2,\ldots,\mathbf{y}_m}\sum_{i=1}^m\left[\frac{1}{2}\Vert\mathbf{y}_i-\mathbf{x}_i\Vert^2+f_i(\mathbf{y}_i)\right]\\&=\prod_{i=1}^m\arg\min_{\mathbf{y}_i}\left[\frac{1}{2}\Vert\mathbf{y}_i-\mathbf{x}_i\Vert^2+f_i(\mathbf{y}_i)\right]\\&=\prod_{i=1}^m\mathrm{prox}_{f_i}(\mathbf{x}_i).\end{aligned}$

Example 2 (prox of the $\ell_1$-norm) Let $g:\mathbb{R}^n\to\mathbb{R}$ be given by $g(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_1$, where $\lambda>0$. Then $g(\mathbf{x})=\sum_{i=1}^n\varphi(x_i),$ where $\varphi(t)=\lambda|t|$. By the formula for $\mathrm{prox}_{g_2}$ in Lemma 1, $\mathrm{prox}_{\varphi}(s)=\mathcal{T}_{\lambda}(s)$, where $\mathcal{T}_{\lambda}$ is defined by $\mathcal{T}_{\lambda}(y)=[|y|-\lambda]_+\mathrm{sgn}(y)=\left\{\begin{array}{ll}y-\lambda, & y\ge\lambda,\\0, & |y|<\lambda,\\y+\lambda, & y\le-\lambda.\end{array}\right.$ The function $\mathcal{T}_{\lambda}$ is called the soft thresholding function; its graph is shown in the figure below.
[Figure: graph of the soft thresholding function $\mathcal{T}_{\lambda}$]
By Theorem 3, $\mathrm{prox}_g(\mathbf{x})=\left(\mathcal{T}_{\lambda}(x_j)\right)_{j=1}^n$. For convenience, we extend the definition of the soft thresholding function to $\mathbb{R}^n$: for any $\mathbf{x}\in\mathbb{R}^n$, $\mathcal{T}_{\lambda}(\mathbf{x})\equiv\left(\mathcal{T}_{\lambda}(x_j)\right)_{j=1}^n=[|\mathbf{x}|-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x}).$ Therefore, $\boxed{\mathrm{prox}_g(\mathbf{x})=\mathcal{T}_{\lambda}(\mathbf{x}).}$
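
As a minimal sketch of Example 2, the componentwise soft thresholding can be written with plain Python lists. The names `soft_threshold` and `l1_prox_objective` are hypothetical helpers introduced here; the second one only evaluates the objective that the prox minimizes, so that the result can be compared against other candidate points.

```python
def soft_threshold(x, lam):
    """Apply T_lam componentwise to a list of floats (prox of lam * ||x||_1)."""
    out = []
    for xi in x:
        if xi > lam:
            out.append(xi - lam)
        elif xi < -lam:
            out.append(xi + lam)
        else:
            out.append(0.0)
    return out

def l1_prox_objective(u, x, lam):
    """Objective lam*||u||_1 + 0.5*||u - x||^2 whose minimizer is the prox."""
    l1 = sum(abs(ui) for ui in u)
    dist2 = sum((ui - xi) ** 2 for ui, xi in zip(u, x))
    return lam * l1 + 0.5 * dist2
```

Evaluating `l1_prox_objective` at `soft_threshold(x, lam)` and at any other point, such as `x` itself, gives a quick check that the thresholded point does not have a larger objective value.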

Example 3 (prox of the negative sum of logs) Let $g:\mathbb{R}^n\to(-\infty,\infty]$ be given by $g(\mathbf{x})=\left\{\begin{array}{ll}-\lambda\sum_{j=1}^n\log x_j, & \mathbf{x}>\mathbf{0},\\\infty, & \text{otherwise},\end{array}\right.$ where $\lambda>0$. Then $g(\mathbf{x})=\sum_{i=1}^n\varphi(x_i)$, where $\varphi(t)=\left\{\begin{array}{ll}-\lambda\log t, & t>0,\\\infty, & t\le0.\end{array}\right.$ By the formula for $\mathrm{prox}_{g_4}$ in Lemma 1, $\mathrm{prox}_{\varphi}(s)=\frac{s+\sqrt{s^2+4\lambda}}{2}.$ Finally, by Theorem 3, $\boxed{\mathrm{prox}_g(\mathbf{x})=\left(\mathrm{prox}_{\varphi}(x_j)\right)_{j=1}^n=\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n.}$

Example 4 (prox of the $\ell_0$-norm) Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_0$, where $\lambda>0,\,\Vert\mathbf{x}\Vert_0=\#\{i:x_i\ne0\}$. Then for any $\mathbf{x}\in\mathbb{R}^n$, $f(\mathbf{x})=\sum_{i=1}^nI(x_i),$ where $I(t)=\left\{\begin{array}{ll}\lambda, & t\ne0,\\0, & t=0.\end{array}\right.$ Note that $I(\cdot)=J(\cdot)+\lambda$, where $J(t)=\left\{\begin{array}{ll}0, & t\ne0,\\-\lambda, & t=0,\end{array}\right.$ and by the formula for $\mathrm{prox}_{g_2}$ in Example 1, $\mathrm{prox}_J(s)=\left\{\begin{array}{ll}\{0\}, & |s|<\sqrt{2\lambda},\\\{s\}, & |s|>\sqrt{2\lambda},\\\{0,s\}, & |s|=\sqrt{2\lambda}.\end{array}\right.$ We introduce the hard thresholding mapping $\mathcal{H}_{\alpha}$, defined by $\mathcal{H}_{\alpha}(s)\equiv\left\{\begin{array}{ll}\{0\}, & |s|<\alpha,\\\{s\}, & |s|>\alpha,\\\{0,s\}, & |s|=\alpha.\end{array}\right.$ Hence $\mathrm{prox}_J(s)=\mathcal{H}_{\sqrt{2\lambda}}(s)$. Since $I$ and $J$ differ by a constant, $\mathrm{prox}_I=\mathrm{prox}_J$. Then, by Theorem 3, $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathcal{H}_{\sqrt{2\lambda}}(x_1)\times\mathcal{H}_{\sqrt{2\lambda}}(x_2)\times\cdots\times\mathcal{H}_{\sqrt{2\lambda}}(x_n).}$

Theorem 4 (prox under scaling and translation) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, and let $\lambda\ne0,\,\mathbf{a}\in\mathbb{E}$. Define $f(\mathbf{x})=g(\lambda\mathbf{x+a})$. Then $\mathrm{prox}_f(\mathbf{x})=\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x+a})-\mathbf{a}\right].$

Proof: $\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\lambda\mathbf{u+a})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&\overset{\mathbf{z}=\lambda\mathbf{u+a}}{=}\frac{1}{\lambda}\left[\arg\min_{\mathbf{z}}\left\{g(\mathbf{z})+\frac{1}{2}\left\Vert\frac{1}{\lambda}(\mathbf{z-a})-\mathbf{x}\right\Vert^2\right\}-\mathbf{a}\right]\\&=\frac{1}{\lambda}\left[\arg\min_{\mathbf{z}}\left\{\lambda^2g(\mathbf{z})+\frac{1}{2}\Vert\mathbf{z}-(\lambda\mathbf{x+a})\Vert^2\right\}-\mathbf{a}\right]\\&=\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x+a})-\mathbf{a}\right].\end{aligned}$
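
To see the scaling-and-translation rule at work, here is a small one-dimensional sketch under the assumption $g(t)=|t|$, so that $\mathrm{prox}_{\lambda^2g}$ is soft thresholding with threshold $\lambda^2$. The function names are hypothetical and the example is chosen only because the result can be checked by hand.

```python
def soft(y, t):
    """Scalar soft thresholding T_t(y)."""
    if y > t:
        return y - t
    if y < -t:
        return y + t
    return 0.0

def prox_abs_affine(x, lam, a):
    """Prox of f(u) = |lam*u + a| via the scaling/translation rule with g = |.|."""
    # prox_{lam^2 g} is soft thresholding with threshold lam^2
    return (soft(lam * x + a, lam * lam) - a) / lam

def prox_objective(u, x, lam, a):
    """Objective |lam*u + a| + 0.5*(u - x)^2 that prox_abs_affine should minimize."""
    return abs(lam * u + a) + 0.5 * (u - x) ** 2
```

With $\lambda=2$, $a=1$, $x=3$ the rule gives $\frac{1}{2}[\mathcal{T}_4(7)-1]=1$, which agrees with setting the derivative $2+u-3$ of the objective to zero on the branch $2u+1>0$.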

Theorem 5 (prox of $\lambda g(\cdot/\lambda)$) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, and let $\lambda\ne0$. Define $f(\mathbf{x})=\lambda g(\mathbf{x}/\lambda)$. Then $\mathrm{prox}_f(\mathbf{x})=\lambda\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda).$

Proof: $\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{\lambda g\left(\frac{\mathbf{u}}{\lambda}\right)+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&\overset{\mathbf{z}=\mathbf{u}/\lambda}{=}\lambda\arg\min_{\mathbf{z}}\left\{\lambda g(\mathbf{z})+\frac{1}{2}\Vert\lambda\mathbf{z-x}\Vert^2\right\}\\&=\lambda\arg\min_{\mathbf{z}}\left\{\frac{g(\mathbf{z})}{\lambda}+\frac{1}{2}\left\Vert\mathbf{z}-\frac{\mathbf{x}}{\lambda}\right\Vert^2\right\}\\&=\lambda\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda).\end{aligned}$

Theorem 6 (prox under quadratic perturbation) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, and let $f(\mathbf{x})=g(\mathbf{x})+\frac{c}{2}\Vert\mathbf{x}\Vert^2+\langle\mathbf{a,x}\rangle+\gamma$, where $c\ge0,\,\mathbf{a}\in\mathbb{E},\,\gamma\in\mathbb{R}$. Then $\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x-a}}{c+1}\right).$

Proof: $\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\mathbf{u})+\frac{c}{2}\Vert\mathbf{u}\Vert^2+\langle\mathbf{a,u}\rangle+\gamma+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\mathbf{u})+\frac{c+1}{2}\left\Vert\mathbf{u}-\left(\frac{\mathbf{x-a}}{c+1}\right)\right\Vert^2\right\}\\&=\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x-a}}{c+1}\right).\end{aligned}$

Example 5 Consider the function $f:\mathbb{R}\to(-\infty,\infty]$ defined, for any $x\in\mathbb{R}$, by $f(x)=\left\{\begin{array}{ll}\mu x, & 0\le x\le\alpha,\\\infty, & \text{otherwise},\end{array}\right.$ where $\mu\in\mathbb{R},\,\alpha\in[0,\infty]$. First note that $f$ can be written as $f(x)=\delta_{[0,\alpha]\cap\mathbb{R}}(x)+\mu x.$ By the formula for $\mathrm{prox}_{g_5}$ in Lemma 1, $\mathrm{prox}_{\delta_{[0,\alpha]\cap\mathbb{R}}}(x)=\min\{\max\{x,0\},\alpha\}$. Applying Theorem 6 (with $c=0,\,\mathbf{a}=\mu,\,\gamma=0$), we obtain, for any $x\in\mathbb{R}$, $\boxed{\mathrm{prox}_f(x)=\mathrm{prox}_{\delta_{[0,\alpha]\cap\mathbb{R}}}(x-\mu)=\min\{\max\{x-\mu,0\},\alpha\}.}$

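Unfortunately, no general formula is known for the prox of the composition of a function with a general affine mapping. The situation changes when the associated linear transformation satisfies a certain orthogonality condition.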

Theorem 7 (prox of a composition with an affine mapping) Let $g:\mathbb{R}^m\to(-\infty,\infty]$ be a proper closed convex function, and let $f(\mathbf{x})=g(\mathcal{A}(\mathbf{x})+\mathbf{b})$, where $\mathbf{b}\in\mathbb{R}^m$ and $\mathcal{A}:\mathbb{V}\to\mathbb{R}^m$ is a linear transformation satisfying $\mathcal{A}\circ\mathcal{A}^T=\alpha\mathcal{I}$ for some constant $\alpha>0$. Then for any $\mathbf{x}\in\mathbb{V}$, $\mathrm{prox}_f(\mathbf{x})=\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T\left(\mathrm{prox}_{\alpha g}(\mathcal{A}(\mathbf{x})+\mathbf{b})-\mathcal{A}(\mathbf{x})-\mathbf{b}\right).$

Proof: By definition, $\mathrm{prox}_f(\mathbf{x})$ is the optimal solution of the problem $\min_{\mathbf{u}\in\mathbb{V}}\left\{g(\mathcal{A}(\mathbf{u})+\mathbf{b})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$ After introducing a splitting variable $\mathbf{z}$, this problem is equivalent to the constrained problem $\begin{array}{ll}\min_{\mathbf{u}\in\mathbb{V},\mathbf{z}\in\mathbb{R}^m} & g(\mathbf{z})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\\\mathrm{s.t.} & \mathbf{z}=\mathcal{A}(\mathbf{u})+\mathbf{b}.\end{array}$ Let $(\tilde{\mathbf{z}},\tilde{\mathbf{u}})$ be its optimal solution³, and note that $\tilde{\mathbf{u}}=\mathrm{prox}_f(\mathbf{x})$. Fix $\mathbf{z}=\tilde{\mathbf{z}}$. Then $\tilde{\mathbf{u}}$ is the optimal solution of the problem $\begin{array}{ll}\min_{\mathbf{u}\in\mathbb{V}} & \frac{1}{2}\Vert\mathbf{u-x}\Vert^2\\\mathrm{s.t.} & \mathcal{A}(\mathbf{u})=\tilde{\mathbf{z}}-\mathbf{b}.\end{array}$ Since strong duality holds for this problem (see Chapter 3), we have the optimality conditions: there exists $\mathbf{y}\in\mathbb{R}^m$ such that $\begin{aligned}\tilde{\mathbf{u}}&\in\arg\min_{\mathbf{u}\in\mathbb{V}}\left\{\frac{1}{2}\Vert\mathbf{u-x}\Vert^2+\langle\mathbf{y},\mathcal{A}(\mathbf{u})-\tilde{\mathbf{z}}+\mathbf{b}\rangle\right\},\\\mathcal{A}(\tilde{\mathbf{u}})&=\tilde{\mathbf{z}}-\mathbf{b}.\end{aligned}$ The first relation gives $\tilde{\mathbf{u}}=\mathbf{x}-\mathcal{A}^T(\mathbf{y}).$ Substituting this into the second relation yields $\mathcal{A}\left(\mathbf{x}-\mathcal{A}^T(\mathbf{y})\right)=\tilde{\mathbf{z}}-\mathbf{b}.$ The orthogonality condition on $\mathcal{A}$ then gives $\alpha\mathbf{y}=\mathcal{A}(\mathbf{x})+\mathbf{b}-\tilde{\mathbf{z}},$ and therefore $\tilde{\mathbf{u}}=\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T(\tilde{\mathbf{z}}-\mathcal{A}(\mathbf{x})-\mathbf{b}).$ This expresses $\tilde{\mathbf{u}}$ in terms of $\tilde{\mathbf{z}}$, which allows us to eliminate $\mathbf{u}$ from the problem.
Consequently, $\tilde{\mathbf{z}}$ is given by $\begin{aligned}\tilde{\mathbf{z}}&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{g(\mathbf{z})+\frac{1}{2}\left\Vert\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T(\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b})-\mathbf{x}\right\Vert^2\right\}\\&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{g(\mathbf{z})+\frac{1}{2\alpha^2}\left\Vert\mathcal{A}^T(\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b})\right\Vert^2\right\}\\&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{\alpha g(\mathbf{z})+\frac{1}{2}\left\Vert\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b}\right\Vert^2\right\}\\&=\mathrm{prox}_{\alpha g}\left(\mathcal{A}(\mathbf{x})+\mathbf{b}\right),\end{aligned}$ where the third equality uses $\Vert\mathcal{A}^T(\mathbf{w})\Vert^2=\langle\mathcal{A}\mathcal{A}^T(\mathbf{w}),\mathbf{w}\rangle=\alpha\Vert\mathbf{w}\Vert^2$. Finally, substituting this expression for $\tilde{\mathbf{z}}$ in terms of $\mathbf{x}$ into the expression for $\tilde{\mathbf{u}}$ in terms of $\tilde{\mathbf{z}}$ completes the proof.

Example 6 Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, where $\mathbb{E}=\mathbb{R}^d$, and let $f:\mathbb{E}^m\to(-\infty,\infty]$ be given by $f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=g(\mathbf{x}_1+\mathbf{x}_2+\cdots+\mathbf{x}_m).$ Then $f$ can be written as a composition: $f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=g(\mathcal{A}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m))$, where $\mathcal{A}:\mathbb{E}^m\to\mathbb{E}$ is the linear transformation $\mathcal{A}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\mathbf{x}_1+\mathbf{x}_2+\cdots+\mathbf{x}_m.$ The adjoint transformation $\mathcal{A}^T:\mathbb{E}\to\mathbb{E}^m$ is $\mathcal{A}^T(\mathbf{x})=(\mathbf{x},\mathbf{x},\ldots,\mathbf{x}),$ so for any $\mathbf{x}\in\mathbb{E}$, $\mathcal{A}(\mathcal{A}^T(\mathbf{x}))=m\mathbf{x}.$ Therefore, taking $\alpha=m,\,\mathbf{b}=\mathbf{0}$ in Theorem 7, for any $(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)\in\mathbb{E}^m$, $\boxed{\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)_j=\mathbf{x}_j+\frac{1}{m}\left(\mathrm{prox}_{mg}\left(\sum_{i=1}^m\mathbf{x}_i\right)-\sum_{i=1}^m\mathbf{x}_i\right),\quad j=1,2,\ldots,m.}$

Example 7 Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f(\mathbf{x})=|\mathbf{a}^T\mathbf{x}|$, where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\}$. We can write $f$ as the composition $f(\mathbf{x})=g(\mathbf{a}^T\mathbf{x})$, where $g(t)=|t|$. By the formula for $\mathrm{prox}_{g_2}$ in Lemma 1, $\mathrm{prox}_{\lambda g}=\mathcal{T}_{\lambda}$, where $\mathcal{T}_{\lambda}(x)=[|x|-\lambda]_+\mathrm{sgn}(x)$ is the soft thresholding function. Taking $\alpha=\Vert\mathbf{a}\Vert^2,\,\mathbf{b}=\mathbf{0}$, and $\mathcal{A}:\mathbf{x}\mapsto\mathbf{a}^T\mathbf{x}$ in Theorem 7, we obtain $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x}+\frac{1}{\Vert\mathbf{a}\Vert^2}\left(\mathcal{T}_{\Vert\mathbf{a}\Vert^2}(\mathbf{a}^T\mathbf{x})-\mathbf{a}^T\mathbf{x}\right)\mathbf{a}.}$
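
The boxed formula for the prox of $|\mathbf{a}^T\mathbf{x}|$ can be sketched as follows, using plain lists for vectors. The names `soft` and `prox_abs_linear` are hypothetical, and the code assumes $\mathbf{a}\ne\mathbf{0}$ as in the example.

```python
def soft(y, t):
    """Scalar soft thresholding T_t(y)."""
    if y > t:
        return y - t
    if y < -t:
        return y + t
    return 0.0

def prox_abs_linear(x, a):
    """Prox of f(x) = |a^T x| for a nonzero vector a (lists of floats)."""
    norm_sq = sum(ai * ai for ai in a)             # alpha = ||a||^2
    inner = sum(ai * xi for ai, xi in zip(a, x))   # a^T x
    coef = (soft(inner, norm_sq) - inner) / norm_sq
    return [xi + coef * ai for xi, ai in zip(x, a)]
```

A simple hand check: for `a = [1.0, 0.0]` the function $f$ reduces to $|x_1|$, so the first coordinate is soft-thresholded with threshold $1$ and the second is left unchanged.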

Theorem 8 (norm composition) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where $g:\mathbb{R}\to(-\infty,\infty]$ is a proper closed convex function satisfying $\mathrm{dom}(g)\subset[0,\infty)$. Then $\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x=0}.\end{array}\right.$

Proof: By definition, $\mathrm{prox}_f(\mathbf{0})$ is the set of global minimizers of the problem $\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}\Vert^2\right\}=\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u}\Vert^2\right\}.$ With the change of variables $w=\Vert\mathbf{u}\Vert$, the problem becomes the equivalent⁴ $\min_{w\in\mathbb{R}}\left\{g(w)+\frac{1}{2}w^2\right\}.$ The optimal solution of this problem is $\mathrm{prox}_g(0)$ ⁵, so $\mathrm{prox}_f(\mathbf{0})$ consists of all $\mathbf{u}$ satisfying $\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)$. Now consider the case $\mathbf{x\ne0}$. $\begin{aligned}\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}&=\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u}\Vert^2-\langle\mathbf{u,x}\rangle+\frac{1}{2}\Vert\mathbf{x}\Vert^2\right\}\\&=\min_{\alpha\in\mathbb{R}_+}\min_{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\alpha}\left\{g(\alpha)+\frac{1}{2}\alpha^2-\langle\mathbf{u,x}\rangle+\frac{1}{2}\Vert\mathbf{x}\Vert^2\right\}.\end{aligned}$ By the Cauchy-Schwarz inequality, the solution of the inner minimization problem is $\mathbf{u}=\alpha\mathbf{x}/\Vert\mathbf{x}\Vert$, with optimal value $g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2$. Hence, to obtain $\mathrm{prox}_f(\mathbf{x})$ in this case, it suffices to solve the outer minimization problem: $\begin{aligned}\alpha^*&=\arg\min_{\alpha\in\mathbb{R}_+}\left\{g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2\right\}\\&=\arg\min_{\alpha\in\mathbb{R}}\left\{g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2\right\}\\&=\mathrm{prox}_g(\Vert\mathbf{x}\Vert),\end{aligned}$ where the second equality holds because $\mathrm{dom}(g)\subset[0,\infty)$. Therefore $\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}$.

Example 8 (prox of the Euclidean norm) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert$, where $\lambda>0$ and $\Vert\cdot\Vert$ is the Euclidean norm. Then $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where $g(t)=\left\{\begin{array}{ll}\lambda t, & t\ge0,\\\infty, & t<0.\end{array}\right.$ By Theorem 8, for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x=0}.\end{array}\right.$ By the formula for $\mathrm{prox}_{g_1}$ in Lemma 1, $\mathrm{prox}_g(t)=[t-\lambda]_+$. Hence $\mathrm{prox}_g(0)=0$ and $\mathrm{prox}_g(\Vert\mathbf{x}\Vert)=[\Vert\mathbf{x}\Vert-\lambda]_+$. Substituting, we obtain $\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}[\Vert\mathbf{x}\Vert-\lambda]_+\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\mathbf{0}, & \mathbf{x=0}\end{array}\right.=\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\lambda\}}\right)\mathbf{x}.}$
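
The closed form of Example 8 is often called block soft thresholding; a minimal sketch in plain Python is given below. The name `prox_l2_norm` is hypothetical, and the code assumes $\lambda>0$ so that the denominator is never zero.

```python
import math

def prox_l2_norm(x, lam):
    """Prox of lam * ||x||_2 via (1 - lam / max(||x||, lam)) * x, with lam > 0."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    scale = 1.0 - lam / max(norm, lam)
    return [scale * xi for xi in x]
```

Every vector whose norm is at most `lam` is mapped to the zero vector, and longer vectors are shortened by exactly `lam` along their own direction.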

Example 9 (prox of the cubic Euclidean norm) Let $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert^3$, where $\lambda>0$. Then $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where $g(t)=\left\{\begin{array}{ll}\lambda t^3, & t\ge0,\\\infty, & t<0.\end{array}\right.$ First, by Theorem 8, for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x=0}.\end{array}\right.$ Next, by the formula for $\mathrm{prox}_{g_3}$ in Lemma 1, $\mathrm{prox}_g(t)=\frac{-1+\sqrt{1+12\lambda[t]_+}}{6\lambda}$. Hence $\mathrm{prox}_g(0)=0$, and $\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\frac{-1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}{6\lambda}\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\mathbf{0}, & \mathbf{x=0}\end{array}\right.=\frac{2}{1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}\mathbf{x}.}$

Example 10 (prox of the negative Euclidean norm) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=-\lambda\Vert\mathbf{x}\Vert$, where $\lambda>0$. Here $f$ is not convex, so we cannot claim that the prox is single-valued. However, $f$ is closed, and the function $\mathbf{u}\mapsto f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2$ is coercive for every $\mathbf{x}\in\mathbb{E}$. Hence, by Theorem 2, $\mathrm{prox}_f(\mathbf{x})$ is always nonempty. To compute it, first write $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where $g(t)=\left\{\begin{array}{ll}-\lambda t, & t\ge0,\\\infty, & t<0.\end{array}\right.$ Then, by Theorem 8, for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x=0}.\end{array}\right.$ By the formula for $\mathrm{prox}_{g_1}$ in Lemma 1 (with $\mu=-\lambda$), $\mathrm{prox}_g(t)=[t+\lambda]_+$. Substituting gives $\mathrm{prox}_g(0)=\lambda$ and $\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\left(1+\frac{\lambda}{\Vert\mathbf{x}\Vert}\right)\mathbf{x}, & \mathbf{x\ne0},\\\{\mathbf{u}:\Vert\mathbf{u}\Vert=\lambda\}, & \mathbf{x=0}.\end{array}\right.}$

Example 11 (prox of the absolute value over a symmetric interval) Consider the function $f:\mathbb{R}\to(-\infty,\infty]$ given by $f(x)=\left\{\begin{array}{ll}\lambda|x|, & |x|\le\alpha,\\\infty, & \text{otherwise},\end{array}\right.$ where $\lambda\in[0,\infty),\,\alpha\in[0,\infty]$. Then $f(x)=g(|x|)$, where $g(x)=\left\{\begin{array}{ll}\lambda x, & 0\le x\le\alpha,\\\infty, & \text{otherwise}.\end{array}\right.$ By Theorem 8, for any $x$, $\mathrm{prox}_f(x)=\left\{\begin{array}{ll}\mathrm{prox}_g(|x|)\frac{x}{|x|}, & x\ne0,\\\{u\in\mathbb{R}:|u|=\mathrm{prox}_g(0)\}, & x=0.\end{array}\right.$ By Example 5, $\mathrm{prox}_g(x)=\min\{\max\{x-\lambda,0\},\alpha\}$. Substituting and noting that $\frac{x}{|x|}=\mathrm{sgn}(x)$ for all $x\ne0$, we obtain $\boxed{\mathrm{prox}_f(x)=\min\{\max\{|x|-\lambda,0\},\alpha\}\mathrm{sgn}(x).}$

Example 12 (prox of the weighted $\ell_1$-norm over a box) Consider the function $f:\mathbb{R}^n\to(-\infty,\infty]$ given by $f(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^n\omega_i|x_i|, & -\boldsymbol{\alpha}\le\mathbf{x}\le\boldsymbol{\alpha},\\\infty, & \text{otherwise},\end{array}\right.$ where $\mathbf{x}\in\mathbb{R}^n,\,\boldsymbol{\omega}\in\mathbb{R}_+^n,\,\boldsymbol{\alpha}\in[0,\infty]^n$. Then $f(\mathbf{x})=\sum_{i=1}^nf_i(x_i)$, where $f_i(x)=\left\{\begin{array}{ll}\omega_i|x|, & -\alpha_i\le x\le\alpha_i,\\\infty, & \text{otherwise}.\end{array}\right.$ By Example 11 and Theorem 3, $\boxed{\mathrm{prox}_f(\mathbf{x})=\left(\min\{\max\{|x_i|-\omega_i,0\},\alpha_i\}\mathrm{sgn}(x_i)\right)_{i=1}^n.}$
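
The componentwise formula of Example 12 can be sketched as below. The name `prox_box_weighted_l1` is hypothetical; unbounded box sides are represented by `math.inf`, and the sign is applied with `math.copysign`, which may return `-0.0` for a negative input that is thresholded to zero (numerically equal to `0.0`).

```python
import math

def prox_box_weighted_l1(x, w, alpha):
    """Prox of sum_i w_i*|x_i| restricted to the box -alpha <= x <= alpha."""
    out = []
    for xi, wi, ai in zip(x, w, alpha):
        # Shrink the magnitude by w_i, then clip it to [0, alpha_i]
        magnitude = min(max(abs(xi) - wi, 0.0), ai)
        out.append(math.copysign(magnitude, xi))
    return out
```

Setting every `alpha[i]` to `math.inf` recovers weighted soft thresholding, while setting every `w[i]` to zero recovers the projection onto the symmetric box.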

3.1 Summary of Prox Calculus Rules

$f(\mathbf{x})$	$\mathrm{prox}_f(\mathbf{x})$	Assumptions	Theorem
$\sum_{i=1}^mf_i(\mathbf{x}_i)$	$\mathrm{prox}_{f_1}(\mathbf{x}_1)\times\cdots\times\mathrm{prox}_{f_m}(\mathbf{x}_m)$		3
$g(\lambda\mathbf{x}+\mathbf{a})$	$\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x}+\mathbf{a})-\mathbf{a}\right]$	$\lambda\ne0,\,\mathbf{a}\in\mathbb{E},\,g$ proper	4
$\lambda g(\mathbf{x}/\lambda)$	$\lambda\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda)$	$\lambda\ne0,\,g$ proper	5
$g(\mathbf{x})+\frac{c}{2}\Vert\mathbf{x}\Vert^2+\langle\mathbf{a,x}\rangle+\gamma$	$\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x-a}}{c+1}\right)$	$\mathbf{a}\in\mathbb{E},\,c\ge0,\,\gamma\in\mathbb{R},\,g$ proper	6
$g(\mathcal{A}(\mathbf{x})+\mathbf{b})$	$\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T\left(\mathrm{prox}_{\alpha g}(\mathcal{A}(\mathbf{x})+\mathbf{b})-\mathcal{A}(\mathbf{x})-\mathbf{b}\right)$	$\mathbf{b}\in\mathbb{R}^m,\,\mathcal{A}:\mathbb{V}\to\mathbb{R}^m,\,\mathcal{A}\circ\mathcal{A}^T=\alpha\mathcal{I},\,g$ proper closed convex, $\alpha>0$	7
$g(\Vert\mathbf{x}\Vert)$	$\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x\ne0},\\\{\mathbf{u}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x=0}.\end{array}\right.$	$g$ proper closed convex, $\mathrm{dom}(g)\subset[0,\infty)$	8

4. Prox of Indicator Functions: Orthogonal Projections

4.1 The First Projection Theorem

Let $g:\mathbb{E}\to(-\infty,\infty]$ be given by $g(\mathbf{x})=\delta_C(\mathbf{x})$, where $C$ is a nonempty set. Then $\mathrm{prox}_g(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\delta_C(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}=\arg\min_{\mathbf{u}\in C}\Vert\mathbf{u-x}\Vert^2=P_C(\mathbf{x}).$ Thus the proximal mapping of the indicator function of a set is the orthogonal projection operator onto that set.

Theorem 9 Let $C\subset\mathbb{E}$ be nonempty. Then $\mathrm{prox}_{\delta_C}(\mathbf{x})=P_C(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{E}$.

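If, in addition to being nonempty, $C$ is closed and convex, then the indicator function $\delta_C$ is proper, closed, and convex, and by the first prox theorem the orthogonal projection operator is single-valued.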

Theorem 10 (first projection theorem) Let $C\subset\mathbb{E}$ be a nonempty closed convex set. Then $P_C(\mathbf{x})$ is a singleton for every $\mathbf{x}\in\mathbb{E}$.

4.2 Examples in $\mathbb{R}^n$

Lemma 2 (orthogonal projections onto subsets of $\mathbb{R}^n$) The following are some nonempty closed convex subsets of $\mathbb{R}^n$ together with their orthogonal projections:
$\begin{aligned}& \text{nonnegative orthant} &&C_1=\mathbb{R}_+^n, &&[\mathbf{x}]_+,\\&\text{box} &&C_2=\text{Box}[\boldsymbol{\ell},\mathbf{u}], &&(\min\{\max\{x_i,\ell_i\},u_i\})_{i=1}^n,\\&\text{affine set} &&C_3=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{Ax=b}\}, &&\mathbf{x}-\mathbf{A}^T(\mathbf{AA}^T)^{-1}(\mathbf{Ax-b}),\\&\ell_2\text{ ball} &&C_4=B_{\Vert\cdot\Vert_2}[\mathbf{c},r], &&\mathbf{c}+\frac{r}{\max\{\Vert\mathbf{x-c}\Vert_2,r\}}(\mathbf{x-c}),\\&\text{half-space} &&C_5=\{\mathbf{x}:\mathbf{a}^T\mathbf{x}\le\alpha\}, &&\mathbf{x}-\frac{[\mathbf{a}^T\mathbf{x}-\alpha]_+}{\Vert\mathbf{a}\Vert^2}\mathbf{a},\end{aligned}$ where $\boldsymbol{\ell}\in[-\infty,\infty)^n,\,\mathbf{u}\in(-\infty,\infty]^n:\boldsymbol{\ell}\le\mathbf{u},\,\mathbf{A}\in\mathbb{R}^{m\times n}:\text{rank}(\mathbf{A})=m,\,\mathbf{b}\in\mathbb{R}^m,\,\mathbf{c}\in\mathbb{R}^n,\,r>0,\,\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,\alpha\in\mathbb{R}$.

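The claims of Lemma 2 are easy to verify. Note that although we extend the notion of a box to the unbounded case, a box is always a subset of $\mathbb{R}^n$. For example, $\text{Box}[\mathbf{0},\infty\mathbf{e}]=\mathbb{R}_+^n$.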
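
Three of the projections in Lemma 2 can be sketched with plain Python lists as follows. The function names are hypothetical; infinite box bounds are represented by `-math.inf` and `math.inf`, and the affine-set projection is omitted here because it requires solving a linear system.

```python
import math

def project_box(x, lower, upper):
    """Projection onto Box[lower, upper]; bounds may be -inf / +inf."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lower, upper)]

def project_ball(x, c, r):
    """Projection onto the l2 ball with center c and radius r > 0."""
    d = [xi - ci for xi, ci in zip(x, c)]
    dist = math.sqrt(sum(di * di for di in d))
    scale = r / max(dist, r)   # equals 1 when x is already inside the ball
    return [ci + scale * di for ci, di in zip(c, d)]

def project_halfspace(x, a, alpha):
    """Projection onto {y : a^T y <= alpha} for a nonzero vector a."""
    excess = max(sum(ai * xi for ai, xi in zip(a, x)) - alpha, 0.0)
    norm_sq = sum(ai * ai for ai in a)
    return [xi - (excess / norm_sq) * ai for xi, ai in zip(x, a)]
```

The nonnegative orthant is the special case `project_box(x, [0.0] * n, [math.inf] * n)` with `n = len(x)`.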

4.3 Projection onto the Intersection of a Hyperplane and a Box

Theorem 11 (orthogonal projection onto the intersection of a hyperplane and a box) Let $C\subset\mathbb{R}^n$ be given by $C=H_{\mathbf{a},b}\cap\text{Box}[\boldsymbol{\ell},\mathbf{u}]=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{a}^T\mathbf{x}=b,\,\boldsymbol{\ell}\le\mathbf{x}\le\mathbf{u}\},$ where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R},\,\boldsymbol{\ell}\in[-\infty,\infty)^n,\,\mathbf{u}\in(-\infty,\infty]^n$. Assume that $C\ne\emptyset$. Then $P_C(\mathbf{x})=P_{\text{Box}[\boldsymbol{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a}),$ where $\text{Box}[\boldsymbol{\ell},\mathbf{u}]=\{\mathbf{y}\in\mathbb{R}^n:\ell_i\le y_i\le u_i,\,i=1,2,\ldots,n\}$ and $\mu^*$ is a solution of the equation $\mathbf{a}^TP_{\text{Box}[\boldsymbol{\ell},\mathbf{u}]}(\mathbf{x}-\mu\mathbf{a})=b$ in the variable $\mu$.

Proof: For any $\mathbf{x}\in\mathbb{R}^n$, its orthogonal projection onto $C$ is the unique optimal solution of the problem $\min_{\mathbf{y}}\left\{\frac{1}{2}\Vert\mathbf{y-x}\Vert_2^2:\mathbf{a}^T\mathbf{y}=b,\,\boldsymbol{\ell}\le\mathbf{y}\le\mathbf{u}\right\}.$ The Lagrangian of this problem is $L(\mathbf{y};\mu)=\frac{1}{2}\Vert\mathbf{y-x}\Vert_2^2+\mu(\mathbf{a}^T\mathbf{y}-b)=\frac{1}{2}\Vert\mathbf{y}-(\mathbf{x}-\mu\mathbf{a})\Vert^2_2-\frac{\mu^2}{2}\Vert\mathbf{a}\Vert_2^2+\mu(\mathbf{a}^T\mathbf{x}-b).$ Since strong duality holds for this problem, we have the optimality conditions: $\mathbf{y}^*$ is an optimal solution if and only if there exists $\mu^*\in\mathbb{R}$ such that $\begin{aligned}\mathbf{y}^*&\in\arg\min_{\boldsymbol{\ell}\le\mathbf{y}\le\mathbf{u}}L(\mathbf{y};\mu^*),\\\mathbf{a}^T\mathbf{y}^*&=b.\end{aligned}$ Using the expression for the Lagrangian, the first condition gives $\mathbf{y}^*=P_{\text{Box}[\boldsymbol{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a}),$ and the feasibility condition becomes $\mathbf{a}^TP_{\text{Box}[\boldsymbol{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})=b.$ ⁶

Corollary 1 (orthogonal projection onto the unit simplex) For any $\mathbf{x}\in\mathbb{R}^n$, $P_{\Delta_n}(\mathbf{x})=[\mathbf{x}-\mu^*\mathbf{e}]_+,$ where $\mu^*$ is a solution of the equation $\mathbf{e}^T[\mathbf{x}-\mu^*\mathbf{e}]_+-1=0$.

Proof: Take $\mathbf{a=e},\,b=1,\,\ell_i=0,\,u_i=\infty,\,i=1,2,\ldots,n$ in Theorem 11, and note that in this case $P_{\text{Box}[\boldsymbol{\ell},\mathbf{u}]}(\mathbf{x})=[\mathbf{x}]_+$.
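
Corollary 1 leaves open how to find $\mu^*$. One simple option, sketched below, is bisection on the scalar function $\mu\mapsto\mathbf{e}^T[\mathbf{x}-\mu\mathbf{e}]_+-1$, which is nonincreasing in $\mu$. Bisection is a choice made for this illustration rather than a method prescribed by the text, and the name `project_simplex`, the bracket, and the iteration count are assumptions.

```python
def project_simplex(x, iters=100):
    """Projection onto the unit simplex via bisection on the multiplier mu."""
    def phi(mu):
        # e^T [x - mu*e]_+ - 1, nonincreasing in mu
        return sum(max(xi - mu, 0.0) for xi in x) - 1.0

    # phi(lo) >= n - 1 >= 0 and phi(hi) = -1 < 0, so a root lies in [lo, hi]
    lo, hi = min(x) - 1.0, max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [max(xi - mu, 0.0) for xi in x]
```

For example, for `x = [2.0, 0.0]` the root is $\mu^*=1$ and the projection is approximately `[1.0, 0.0]`; a point already in the simplex, such as `[0.2, 0.8]`, is returned essentially unchanged.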

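In the next two subsections we also discuss orthogonal projections onto level sets and onto epigraphs, which yield further results on orthogonal projection operators.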

4.4 Orthogonal Projection onto Level Sets

Theorem 12 (orthogonal projection onto level sets) Let $C=\mathrm{Lev}(f,\alpha)=\{\mathbf{x}\in\mathbb{E}:f(\mathbf{x})\le\alpha\}$, where $f:\mathbb{E}\to(-\infty,\infty]$ is a proper closed convex function and $\alpha\in\mathbb{R}$. Assume that there exists $\hat{\mathbf{x}}\in\mathbb{E}$ such that $f(\hat{\mathbf{x}})<\alpha$. Then $P_C(\mathbf{x})=\left\{\begin{array}{ll}P_{\mathrm{dom}(f)}(\mathbf{x}), & f\left(P_{\mathrm{dom}(f)}(\mathbf{x})\right)\le\alpha,\\\mathrm{prox}_{\lambda^*f}(\mathbf{x}), & \text{otherwise},\end{array}\right.$ where $\lambda^*$ is any positive root of the equation $\varphi(\lambda)\equiv f(\mathrm{prox}_{\lambda f}(\mathbf{x}))-\alpha=0$. In addition, $\varphi$ is nonincreasing.

Proof: The orthogonal projection of $\mathbf{x}$ onto $C$ is the optimal solution of the problem $\min_{\mathbf{y}\in\mathbb{E}}\left\{\frac{1}{2}\Vert\mathbf{y-x}\Vert^2:f(\mathbf{y})\le\alpha,\,\mathbf{y}\in X\right\},$ where $X=\mathrm{dom}(f)$. The Lagrangian of this problem is (for $\lambda\ge0$) $L(\mathbf{y};\lambda)=\frac{1}{2}\Vert\mathbf{y-x}\Vert^2+\lambda f(\mathbf{y})-\alpha\lambda.$ Since strong duality holds for this problem, we have the optimality conditions: $\mathbf{y}^*$ is an optimal solution if and only if there exists $\lambda^*\in\mathbb{R}_+$ such that $\begin{aligned}\mathbf{y}^*&\in\arg\min_{\mathbf{y}\in X}L(\mathbf{y};\lambda^*),\\f(\mathbf{y}^*)&\le\alpha,\\\lambda^*(f(\mathbf{y}^*)-\alpha)&=0.\end{aligned}$ (i) If $P_X(\mathbf{x})$ exists and $f(P_X(\mathbf{x}))\le\alpha$, then $\mathbf{y}^*=P_X(\mathbf{x}),\,\lambda^*=0$ satisfy the optimality conditions;
(ii) if $P_X(\mathbf{x})$ does not exist⁷ or $f(P_X(\mathbf{x}))>\alpha$, then necessarily $\lambda^*>0$, and the optimality conditions become $\mathbf{y}^*=\mathrm{prox}_{\lambda^*f}(\mathbf{x}),\,f(\mathrm{prox}_{\lambda^*f}(\mathbf{x}))=\alpha$. This yields the expression for $P_C(\mathbf{x})$ stated in the theorem.

We now show that $\varphi$ is nonincreasing. Take any $0\le\lambda_1<\lambda_2$, and write $\mathbf{v}_1=\mathrm{prox}_{\lambda_1f}(\mathbf{x}),\,\mathbf{v}_2=\mathrm{prox}_{\lambda_2f}(\mathbf{x})$. Since $\mathbf{v}_1$ minimizes $\frac{1}{2}\Vert\mathbf{v}-\mathbf{x}\Vert^2+\lambda_1f(\mathbf{v})$ and $\mathbf{v}_2$ minimizes $\frac{1}{2}\Vert\mathbf{v}-\mathbf{x}\Vert^2+\lambda_2f(\mathbf{v})$, we have $\begin{aligned}&\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_2)-\alpha)\\&=\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_1(f(\mathbf{v}_2)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-\alpha)\\&\ge\frac{1}{2}\Vert\mathbf{v}_1-\mathbf{x}\Vert^2+\lambda_1(f(\mathbf{v}_1)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-\alpha)\\&=\frac{1}{2}\Vert\mathbf{v}_1-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_1)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1))\\&\ge\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_2)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1)).\end{aligned}$ Therefore $(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1))\le0$. Since $\lambda_1<\lambda_2$, it follows that $f(\mathbf{v}_2)\le f(\mathbf{v}_1)$. Finally, $\varphi(\lambda_2)=f(\mathbf{v}_2)-\alpha\le f(\mathbf{v}_1)-\alpha=\varphi(\lambda_1).$

Example 13 (orthogonal projection onto the intersection of a half-space and a box) Consider the set $C=H_{\mathbf{a},b}^-\cap\text{Box}[\mathbf{\ell,u}]=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{a}^T\mathbf{x}\le b,\,\mathbf{\ell\le x\le u}\},$ where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R},\,\mathbf{\ell}\in[-\infty,\infty)^n,\,\mathbf{u}\in(-\infty,\infty]^n$. Assume that $C\ne\emptyset$. Then $C=\mathrm{Lev}(f,b)$, where $f(\mathbf{x})=\mathbf{a}^T\mathbf{x}+\delta_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x})$. For any $\lambda>0$, $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathrm{prox}_{\lambda\mathbf{a}^T(\cdot)+\delta_{\text{Box}[\mathbf{\ell,u}]}(\cdot)}(\mathbf{x})\overset{\text{Theorem 6}}{=}\mathrm{prox}_{\delta_{\text{Box}[\mathbf{\ell,u}]}}(\mathbf{x-\lambda a})=P_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x-\lambda a}).$ By Theorem 12, $\boxed{\begin{aligned}P_C(\mathbf{x})&=\left\{\begin{array}{ll}P_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x})\le b,\\P_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x-\lambda^*a}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x})>b,\end{array}\right.\\ \text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)&=\mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x-\lambda a})-b.\end{aligned}}$

Example 14 (orthogonal projection onto the $\ell_1$ ball) Let $C=B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]=\{\mathbf{x}\in\mathbb{R}^n:\Vert\mathbf{x}\Vert_1\le\alpha\}$, where $\alpha>0$. Then $C=\mathrm{Lev}(f,\alpha)$, where $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1$. In Example 2 we obtained $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathcal{T}_{\lambda}(\mathbf{x}),\quad\forall\mathbf{x}\in\mathbb{R}^n,$ where $\mathcal{T}_{\lambda}(\mathbf{x})=[|\mathbf{x}|-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x})$ is the soft thresholding operator. By Theorem 12, $\boxed{\begin{aligned}P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]}(\mathbf{x})&=\left\{\begin{array}{ll}\mathbf{x}, & \Vert\mathbf{x}\Vert_1\le\alpha,\\\mathcal{T}_{\lambda^*}(\mathbf{x}), & \Vert\mathbf{x}\Vert_1>\alpha,\end{array}\right.\\\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)&=\Vert\mathcal{T}_{\lambda}(\mathbf{x})\Vert_1-\alpha.\end{aligned}}$
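The recipe for projecting onto the $\ell_1$ ball (soft thresholding with a scalar root $\lambda^*$) can be sketched as follows. This is an illustrative sketch under stated assumptions: bisection is an assumed root finder (the text only says $\lambda^*$ is a positive root of a nonincreasing function), and the names `soft_threshold` and `project_l1_ball` are hypothetical.

```python
import math


def soft_threshold(x, lam):
    """Componentwise soft thresholding: sgn(x_i) * max(|x_i| - lam, 0)."""
    return [math.copysign(max(abs(xi) - lam, 0.0), xi) for xi in x]


def project_l1_ball(x, alpha, tol=1e-12, max_iter=200):
    """Sketch: project x onto {y : ||y||_1 <= alpha}, alpha > 0.

    Hypothetical helper; bisection on lambda is an assumed root finder.
    """
    if sum(abs(xi) for xi in x) <= alpha:
        return list(x)  # x already lies in the ball

    def phi(lam):
        # phi(lam) = ||T_lam(x)||_1 - alpha, nonincreasing in lam
        return sum(max(abs(xi) - lam, 0.0) for xi in x) - alpha

    lo = 0.0                        # phi(0) = ||x||_1 - alpha > 0
    hi = max(abs(xi) for xi in x)   # phi(hi) = -alpha < 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return soft_threshold(x, 0.5 * (lo + hi))
```

For example, with `x = [3.0, -1.0]` and `alpha = 2.0` the root is $\lambda^*=1$, so the projection is approximately `[2.0, 0.0]`.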

The next example uses a generalization of the soft thresholding mapping, the two-sided soft thresholding operator: for any $\mathbf{a,b}\in(-\infty,\infty]^n$, define $\mathcal{S}_{\mathbf{a,b}}(\mathbf{x})=(\min\{\max\{|x_i|-a_i,0\},b_i\}\mathrm{sgn}(x_i))_{i=1}^n,\quad\forall\mathbf{x}\in\mathbb{R}^n.$ The graph of the function $t\mapsto\mathcal{S}_{1,2}(t)$ is shown in the figure below.

[Figure: graph of the function $t\mapsto\mathcal{S}_{1,2}(t)$.]
The soft thresholding operator is a special case of the two-sided soft thresholding operator: $\mathcal{S}_{\lambda\mathbf{e},\infty\mathbf{e}}=\mathcal{T}_{\lambda}.$

Example 15 (orthogonal projection onto the intersection of a weighted $\ell_1$ ball and a box) Let $C\subset\mathbb{R}^n$ be given by $C=\left\{\mathbf{x}\in\mathbb{R}^n:\sum_{i=1}^n\omega_i|x_i|\le\beta,\,-\bm{\alpha}\le\mathbf{x}\le\bm{\alpha}\right\},$ where $\bm{\omega}\in\mathbb{R}_+^n,\,\bm{\alpha}\in[0,\infty]^n,\,\beta\in\mathbb{R}_{++}$. Then $C=\mathrm{Lev}(f,\beta)$, where $f(\mathbf{x})=\bm{\omega}^T|\mathbf{x}|+\delta_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^n\omega_i|x_i|, & -\bm{\alpha}\le\mathbf{x}\le\bm{\alpha},\\\infty, & \text{otherwise},\end{array}\right.\quad\forall\mathbf{x}\in\mathbb{R}^n.$ By Example 12, for any $\lambda>0,\,\mathbf{x}\in\mathbb{R}^n$, $\mathrm{prox}_{\lambda f}(\mathbf{x})=(\min\{\max\{|x_i|-\lambda\omega_i,0\},\alpha_i\}\mathrm{sgn}(x_i))_{i=1}^n=\mathcal{S}_{\lambda\bm{\omega},\bm{\alpha}}(\mathbf{x}).$ Finally, by Theorem 12, $\boxed{\begin{aligned}P_C(\mathbf{x})&=\left\{\begin{array}{ll}P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x}), & \bm{\omega}^T\left|P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right|\le\beta,\\\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x}), & \bm{\omega}^T\left|P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right|>\beta,\end{array}\right.\\\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)&=\bm{\omega}^T\left|\mathcal{S}_{\lambda\bm{\omega},\bm{\alpha}}(\mathbf{x})\right|-\beta.\end{aligned}}$

To show why Theorem 12 needs to address the existence of $P_{\mathrm{dom}(f)}(\mathbf{x})$, we now give an example in which the effective domain of $f$ is not closed.

Example 16 Let $C=\{\mathbf{x}\in\mathbb{R}_{++}^n:\prod_{i=1}^nx_i\ge\alpha\},$ where $\alpha>0$. Then $C$ can be written as $C=\left\{\mathbf{x}\in\mathbb{R}_{++}^n:-\sum_{i=1}^n\log x_i\le-\log\alpha\right\},$ so $C=\mathrm{Lev}(f,-\log\alpha)$, where $f:\mathbb{R}^n\to(-\infty,\infty]$ is the negative sum of logs: $f(\mathbf{x})=\left\{\begin{array}{ll}-\sum_{i=1}^n\log x_i, & \mathbf{x}\in\mathbb{R}_{++}^n,\\\infty, & \text{otherwise}.\end{array}\right.$ In Example 3 we showed that for any $\mathbf{x}\in\mathbb{R}^n$, $\mathrm{prox}_{\lambda f}(\mathbf{x})=\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n.$ Theorem 12 then yields a formula for the orthogonal projection onto $C$. Note that if $\mathbf{x}\in C$, then $P_{\mathbb{R}_{++}^n}(\mathbf{x})=\mathbf{x}$ and $f(\mathbf{x})\le-\log\alpha$. If $\mathbf{x}\notin\mathbb{R}_{++}^n$, then $P_{\mathbb{R}^n_{++}}(\mathbf{x})$ does not exist, and we directly have $P_C(\mathbf{x})=\mathrm{prox}_{\lambda^*f}(\mathbf{x})$. If $\mathbf{x}\in\mathbb{R}_{++}^n$ but $f(\mathbf{x})>-\log\alpha$, then again $P_C(\mathbf{x})=\mathrm{prox}_{\lambda^* f}(\mathbf{x})$. The last two cases together are exactly the case $\mathbf{x}\notin C$. Hence $\boxed{\begin{aligned}P_C(\mathbf{x})&=\left\{\begin{array}{ll}\mathbf{x}, & \mathbf{x}\in C,\\\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)_{j=1}^n, & \mathbf{x}\notin C,\end{array}\right. \\\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)&=-\sum_{j=1}^n\log\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)+\log\alpha.\end{aligned}}$

4.5 Orthogonal Projection onto Epigraphs

Using Theorem 12, we can derive a theorem on the orthogonal projection onto the epigraph of a convex function.

Theorem 13 (orthogonal projection onto epigraphs) Let $C=\mathrm{epi}(g)=\{(\mathbf{x},t)\in\mathbb{E}\times\mathbb{R}:g(\mathbf{x})\le t\},$ where $g:\mathbb{E}\to\mathbb{R}$ is a convex function. Then $P_C((\mathbf{x},s))=\left\{\begin{array}{ll}(\mathbf{x},s), & g(\mathbf{x})\le s,\\\left(\mathrm{prox}_{\lambda^*g}(\mathbf{x}),s+\lambda^*\right), & g(\mathbf{x})>s,\end{array}\right.$ where $\lambda^*$ is any positive root of the function $\psi(\lambda)=g(\mathrm{prox}_{\lambda g}(\mathbf{x}))-\lambda-s$. In addition, $\psi$ is nonincreasing.

Proof: Define $f:\mathbb{E}\times\mathbb{R}\to\mathbb{R}$ by $f(\mathbf{x},t)\equiv g(\mathbf{x})-t$. Then $\begin{aligned}\mathrm{prox}_{\lambda f}(\mathbf{x},s)&=\arg\min_{\mathbf{y},t}\left\{\frac{1}{2}\Vert\mathbf{y-x}\Vert^2+\frac{1}{2}(t-s)^2+\lambda f(\mathbf{y},t)\right\}\\&=\arg\min_{\mathbf{y},t}\left\{\frac{1}{2}\Vert\mathbf{y-x}\Vert^2+\frac{1}{2}(t-s)^2+\lambda g(\mathbf{y})-\lambda t\right\}.\end{aligned}$ Since the problem is separable, $\begin{aligned}\mathrm{prox}_{\lambda f}(\mathbf{x},s)&=\left(\arg\min_{\mathbf{y}}\left\{\frac{1}{2}\Vert\mathbf{y-x}\Vert^2+\lambda g(\mathbf{y})\right\},\arg\min_t\left\{\frac{1}{2}(t-s)^2-\lambda t\right\}\right)\\&=\left(\mathrm{prox}_{\lambda g}(\mathbf{x}),\mathrm{prox}_{\lambda h}(s)\right),\end{aligned}$ where $h(t)\equiv-t$. By Section 2.2, $\mathrm{prox}_{\lambda h}(z)=z+\lambda,\,\forall z \in\mathbb{R}$. Hence $\mathrm{prox}_{\lambda f}(\mathbf{x},s)=\left(\mathrm{prox}_{\lambda g}(\mathbf{x}),s+\lambda\right).$ Since $\mathrm{epi}(g)=\mathrm{Lev}(f,0)$, Theorem 12 (noting that $\mathrm{dom}(f)=\mathbb{E}\times\mathbb{R}$) gives $P_C\left((\mathbf{x},s)\right)=\left\{\begin{array}{ll}(\mathbf{x},s), & g(\mathbf{x})\le s,\\\left(\mathrm{prox}_{\lambda^*g}(\mathbf{x}),s+\lambda^*\right), & g(\mathbf{x})>s,\end{array}\right.$ where $\lambda^*$ is any positive root of the function $\psi(\lambda)=g(\mathrm{prox}_{\lambda g}(\mathbf{x}))-\lambda-s$, and $\psi$ is nonincreasing.

Example 17 (orthogonal projection onto the Lorentz cone) Consider the Lorentz cone $L^n=\{(\mathbf{x},t)\in\mathbb{R}^n\times\mathbb{R}:\Vert\mathbf{x}\Vert_2\le t\}$. We show that for any $(\mathbf{x},s)\in\mathbb{R}^n\times\mathbb{R}$, $\boxed{P_{L^n}(\mathbf{x},s)=\left\{\begin{array}{ll}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge|s|,\\(\mathbf{0},0), & s<\Vert\mathbf{x}\Vert_2<-s,\\(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s.\end{array}\right.}$ Applying Theorem 13 directly gives $P_{L^n}((\mathbf{x},s))=\left\{\begin{array}{ll}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s,\\\left(\mathrm{prox}_{\lambda^*\Vert\cdot\Vert_2}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_2>s,\end{array}\right.$ where $\lambda^*$ is any positive root of the function $\psi(\lambda)=\Vert\mathrm{prox}_{\lambda\Vert\cdot\Vert_2}(\mathbf{x})\Vert_2-\lambda-s$. Let $(\mathbf{x},s)\in\mathbb{R}^n\times\mathbb{R}$ be such that $\Vert\mathbf{x}\Vert_2>s$. By Example 8, $\mathrm{prox}_{\lambda\Vert\cdot\Vert_2}(\mathbf{x})=\left[1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert_2,\lambda\}}\right]\mathbf{x}.$ Substituting this into the expression for $\psi$ gives $\psi(\lambda)=\left\{\begin{array}{ll}\Vert\mathbf{x}\Vert_2-2\lambda-s, & \lambda\le\Vert\mathbf{x}\Vert_2,\\-\lambda-s, & \lambda\ge\Vert\mathbf{x}\Vert_2.\end{array}\right.$ Thus $\psi$ is piecewise linear, and its unique positive root is $\lambda^*=\left\{\begin{array}{ll}\frac{\Vert\mathbf{x}\Vert_2-s}{2}, & \Vert\mathbf{x}\Vert_2\ge-s,\\-s, & \Vert\mathbf{x}\Vert_2<-s.\end{array}\right.$ Therefore, when $\Vert\mathbf{x}\Vert_2>s$, $\begin{aligned}\left(\mathrm{prox}_{\lambda^*\Vert\cdot\Vert_2}(\mathbf{x}),s+\lambda^*\right)&=\left(\left[1-\frac{\lambda^*}{\max\{\Vert\mathbf{x}\Vert_2,\lambda^*\}}\right]\mathbf{x},s+\lambda^*\right)\\&=\left\{\begin{array}{ll}\left(\left[1-\frac{\Vert\mathbf{x}\Vert_2-s}{2\Vert\mathbf{x}\Vert_2}\right]\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge-s,\\(\mathbf{0},0), & \Vert\mathbf{x}\Vert_2<-s\end{array}\right.\\&=\left\{\begin{array}{ll}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge-s,\\(\mathbf{0},0), & \Vert\mathbf{x}\Vert_2<-s.\end{array}\right.\end{aligned}$ Finally, the claim follows by noting that $\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2\ge|s|\}=\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2>s,\Vert\mathbf{x}\Vert_2\ge-s\}\cup\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2=s\}$.
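The closed-form projection onto the Lorentz cone derived in Example 17 translates into a few lines of code. The sketch below is only an illustration; the function name `project_lorentz` is hypothetical, and vectors are represented as plain Python lists.

```python
import math


def project_lorentz(x, s):
    """Sketch: project (x, s) onto the Lorentz cone {(x, t) : ||x||_2 <= t}.

    Hypothetical helper implementing the three-case formula of Example 17.
    """
    nx = math.sqrt(sum(xi * xi for xi in x))
    if nx <= s:
        # (x, s) already belongs to the cone
        return list(x), s
    if nx <= -s:
        # the projection is the apex of the cone
        # (at nx == -s the first formula also gives the origin)
        return [0.0] * len(x), 0.0
    # remaining case: nx > |s|, so nx > 0 and the division is safe
    c = (nx + s) / (2.0 * nx)
    return [c * xi for xi in x], (nx + s) / 2.0
```

For example, `project_lorentz([3.0, 4.0], 0.0)` returns `([1.5, 2.0], 2.5)`: here $\Vert\mathbf{x}\Vert_2=5$, and the output satisfies $\Vert(1.5,2)\Vert_2=2.5$, so it lies on the boundary of the cone.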

Example 18 (orthogonal projection onto the epigraph of the $\ell_1$-norm) Let $C=\{(\mathbf{y},t)\in\mathbb{R}^n\times\mathbb{R}:\Vert\mathbf{y}\Vert_1\le t\}.$ Applying Theorem 13 directly, together with the fact that $\mathrm{prox}_{\lambda\Vert\cdot\Vert_1}=\mathcal{T}_{\lambda}$ for any $\lambda>0$, we obtain $\boxed{\begin{aligned}P_C((\mathbf{x},s))&=\left\{\begin{array}{ll}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_1\le s,\\\left(\mathcal{T}_{\lambda^*}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_1>s,\end{array}\right.\\\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)&=\Vert\mathcal{T}_{\lambda}(\mathbf{x})\Vert_1-\lambda-s.\end{aligned}}$

4.6 Summary of Orthogonal Projection Computations

Set $C$	$P_C(\mathbf{x})$	Assumptions	Reference
$\mathbb{R}_+^n$	$[\mathbf{x}]_+$	-	Lemma 2
$\text{Box}[\mathbf{\ell,u}]$	$P_C(\mathbf{x})_i=\min\{\max\{x_i,\ell_i\},u_i\}$	$\ell_i\le u_i$	Lemma 2
$B_{\Vert\cdot\Vert_2}[\mathbf{c},r]$	$\mathbf{c}+\frac{r}{\max\{\Vert\mathbf{x-c}\Vert_2,r\}}(\mathbf{x-c})$	$\mathbf{c}\in\mathbb{R}^n,\,r>0$	Lemma 2
$\{\mathbf{x}:\mathbf{Ax=b}\}$	$\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}(\mathbf{Ax-b})$	$\mathbf{A}\in\mathbb{R}^{m\times n},\,\mathbf{b}\in\mathbb{R}^m,\,\mathbf{A}$ has full row rank	Lemma 2
$\{\mathbf{x}:\mathbf{a}^T\mathbf{x}\le b\}$	$\mathbf{x}-\frac{[\mathbf{a}^T\mathbf{x}-b]_+}{\Vert\mathbf{a}\Vert^2}\mathbf{a}$	$\mathbf{0}\ne\mathbf{a}\in\mathbb{R}^n,\,b\in\mathbb{R}$	Lemma 2
$\Delta_n$	$[\mathbf{x}-\mu^*\mathbf{e}]_+$, where $\mu^*\in\mathbb{R}$ satisfies $\mathbf{e}^T[\mathbf{x-\mu^*e}]_+=1$	-	Corollary 1
$H_{\mathbf{a},b}\cap\text{Box}[\mathbf{\ell,u}]$	$P_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})$, where $\mu^*\in\mathbb{R}$ satisfies $\mathbf{a}^TP_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})=b$	$\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R}$	Theorem 11
$H_{\mathbf{a},b}^-\cap\text{Box}[\mathbf{\ell,u}]$	$\left\{\begin{array}{ll}P_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x})\le b,\\P_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x-\lambda^*a}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x})>b,\end{array}\right.$ where $\lambda^*>0$ satisfies $\mathbf{a}^TP_{\text{Box}[\mathbf{\ell,u}]}(\mathbf{x-\lambda^* a})=b$	$\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R}$	Example 13
$B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]$	$\left\{\begin{array}{ll}\mathbf{x}, & \Vert\mathbf{x}\Vert_1\le\alpha,\\\mathcal{T}_{\lambda^*}(\mathbf{x}), & \Vert\mathbf{x}\Vert_1>\alpha,\end{array}\right.$ where $\lambda^*>0$ satisfies $\Vert\mathcal{T}_{\lambda^*}(\mathbf{x})\Vert_1=\alpha$	$\alpha>0$	Example 14
$\{\mathbf{x}:\bm{\omega}^T\mathrm{abs}(\mathbf{x})\le\beta,\,-\bm{\alpha}\le\mathbf{x}\le\bm{\alpha}\}$	$\left\{\begin{array}{ll}P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x}), & \bm{\omega}^T\mathrm{abs}\left(P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right)\le\beta,\\\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x}), & \bm{\omega}^T\mathrm{abs}\left(P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right)>\beta,\end{array}\right.$ where $\lambda^*>0$ satisfies $\bm{\omega}^T\mathrm{abs}\left(\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x})\right)=\beta$	$\bm{\omega}\in\mathbb{R}_+^n,\,\bm{\alpha}\in[0,\infty]^n,\,\beta\in\mathbb{R}_{++}$	Example 15
$\{\mathbf{x}>\mathbf{0}:\prod x_i\ge\alpha\}$	$\left\{\begin{array}{ll}\mathbf{x}, & \mathbf{x}\in C,\\\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)_{j=1}^n, & \mathbf{x}\notin C,\end{array}\right.$ where $\lambda^*>0$ satisfies $\sum_{j=1}^n\log\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)=\log\alpha$	$\alpha>0$	Example 16
$\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2\le s\}$	$\left\{\begin{array}{ll}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge\mathrm{abs}(s),\\(\mathbf{0},0), & s<\Vert\mathbf{x}\Vert_2<-s,\\(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s.\end{array}\right.$		Example 17
$\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_1\le s\}$	$\left\{\begin{array}{ll}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_1\le s,\\\left(\mathcal{T}_{\lambda^*}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_1>s,\end{array}\right.$ where $\lambda^*>0$ satisfies $\Vert\mathcal{T}_{\lambda^*}(\mathbf{x})\Vert_1-\lambda^*-s=0$		Example 18

5. The Second Prox Theorem

We prove the second prox theorem using Fermat's optimality condition from Chapter 3.

Theorem 14 (second prox theorem) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then for any $\mathbf{x,u}\in\mathbb{E}$, the following three statements are equivalent:
(i) $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$;
(ii) $\mathbf{x-u}\in\partial f(\mathbf{u})$;
(iii) $\langle\mathbf{x-u,y-u}\rangle\le f(\mathbf{y})-f(\mathbf{u}),\,\forall\mathbf{y}\in\mathbb{E}$.

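Proof: By definition, $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$ if and only if $\mathbf{u}$ is the optimal solution of the problem $\min_{\mathbf{v}}\left\{f(\mathbf{v})+\frac{1}{2}\Vert\mathbf{v-x}\Vert^2\right\}.$ By Fermat's optimality condition from Chapter 3 and the sum rule of subdifferential calculus (Theorem 15), this is equivalent to $\mathbf{0}\in\partial f(\mathbf{u})+\mathbf{u-x}.$ Hence (i) and (ii) are equivalent. The equivalence of (ii) and (iii) follows from the definition of the subgradient.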

A direct consequence of the second prox theorem is that, for a proper closed convex function, $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$ if and only if $\mathbf{x}$ is a global minimizer of $f$.

Corollary 2 Let $f$ be a proper closed convex function. Then $\mathbf{x}$ is a global minimizer of $f$ if and only if $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$.

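Proof: $\mathbf{x}$ is a global minimizer of $f$ if and only if $\mathbf{0}\in\partial f(\mathbf{x})$, that is, if and only if $\mathbf{x-x}\in\partial f(\mathbf{x})$. By the equivalence of (i) and (ii) in the second prox theorem, this is equivalent to $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$.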

When $f=\delta_C$, where $C$ is a nonempty closed convex set, the equivalence of (i) and (iii) in the second prox theorem yields the second projection theorem.

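Theorem 15 (second projection theorem) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\mathbf{u}\in C$. Then $\mathbf{u}=P_C(\mathbf{x})$ if and only if $\langle\mathbf{x-u,y-u}\rangle\le0,\quad\forall\mathbf{y}\in C.$
In other words, $\mathbf{u}$ is the projection of $\mathbf{x}$ onto $C$ if and only if $\mathbf{x-u}$ makes an angle of at least $90^\circ$ with every $\mathbf{y-u},\,\mathbf{y}\in C$.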

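Another direct consequence of the second prox theorem is the firm nonexpansivity of the proximal operator. A special case of this result is Theorem 1 of Chapter 5.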

Theorem 16 (firm nonexpansivity of the proximal operator) Let $f$ be a proper closed convex function. Then for any $\mathbf{x,y}\in\mathbb{E}$:
(i) (firm nonexpansivity) $\langle\mathbf{x-y},\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\rangle\ge\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2;$
(ii) (nonexpansivity) $\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert\le\Vert\mathbf{x-y}\Vert.$

Proof: (i) Write $\mathbf{u}=\mathrm{prox}_f(\mathbf{x}),\,\mathbf{v}=\mathrm{prox}_f(\mathbf{y})$. By the equivalence of (i) and (ii) in the second prox theorem, $\mathbf{x-u}\in\partial f(\mathbf{u}),\,\mathbf{y-v}\in\partial f(\mathbf{v}).$ By the subgradient inequality, $\begin{aligned}f(\mathbf{v})&\ge f(\mathbf{u})+\langle\mathbf{x-u,v-u}\rangle,\\f(\mathbf{u})&\ge f(\mathbf{v})+\langle\mathbf{y-v,u-v}\rangle.\end{aligned}$ Adding the two inequalities gives $0\ge\langle\mathbf{y-x+u-v,u-v}\rangle\Rightarrow\langle\mathbf{x-y,u-v}\rangle\ge\Vert\mathbf{u-v}\Vert^2,$ that is, $\langle\mathbf{x-y},\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\rangle\ge\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2.$

(ii) If $\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_f(\mathbf{y})$, the claim is trivial. Now assume that $\mathrm{prox}_f(\mathbf{x})\ne\mathrm{prox}_f(\mathbf{y})$. By (i) and the Cauchy-Schwarz inequality, $\begin{aligned}\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2&\le\langle\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y}),\mathbf{x-y}\rangle\\&\le\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert\cdot\Vert\mathbf{x-y}\Vert.\end{aligned}$ Dividing both sides by $\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert$ completes the proof.

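The next lemma shows how to compute the prox of the distance function to a nonempty closed convex set. Its proof uses both the second prox theorem and the second projection theorem.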

Lemma 3 (prox of the distance function) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{\lambda d_C}(\mathbf{x})=\left\{\begin{array}{ll}(1-\theta)\mathbf{x}+\theta P_C(\mathbf{x}), & d_C(\mathbf{x})>\lambda,\\P_C(\mathbf{x}), & d_C(\mathbf{x})\le\lambda,\end{array}\right.$ where⁸ $\theta=\frac{\lambda}{d_C(\mathbf{x})}.$
Proof: Let $\mathbf{u}=\mathrm{prox}_{\lambda d_C}(\mathbf{x})$. By the second prox theorem, $\mathbf{x-u}\in\lambda\partial d_C(\mathbf{u}).$ We consider two cases.

Case 1: $\mathbf{u}\notin C$. By Example 16 of Chapter 3, $\mathbf{x-u}=\lambda\frac{\mathbf{u}-P_C(\mathbf{u})}{d_C(\mathbf{u})}.$ Writing $\alpha=\frac{\lambda}{d_C(\mathbf{u})}$, we get $\mathbf{u}=\frac{1}{\alpha+1}\mathbf{x}+\frac{\alpha}{\alpha+1}P_C(\mathbf{u})$, or equivalently $\mathbf{x}-P_C(\mathbf{u})=(\alpha+1)(\mathbf{u}-P_C(\mathbf{u})).$ By the second projection theorem, $P_C(\mathbf{u})=P_C(\mathbf{x})$ if and only if $\langle\mathbf{x}-P_C(\mathbf{u}),\mathbf{y}-P_C(\mathbf{u})\rangle\le0,\quad\forall\mathbf{y}\in C.$ Substituting the expression for $\mathbf{x}-P_C(\mathbf{u})$, this is equivalent to $(\alpha+1)\langle\mathbf{u}-P_C(\mathbf{u}),\mathbf{y}-P_C(\mathbf{u})\rangle\le0,\quad\forall\mathbf{y}\in C,$ which holds by the second projection theorem applied to $\mathbf{u}$. Hence $P_C(\mathbf{u})=P_C(\mathbf{x})$. Consequently, $d_C(\mathbf{x})=\Vert\mathbf{x}-P_C(\mathbf{x})\Vert=\Vert\mathbf{x}-P_C(\mathbf{u})\Vert=(\alpha+1)\Vert\mathbf{u}-P_C(\mathbf{u})\Vert=(\alpha+1)d_C(\mathbf{u})=d_C(\mathbf{u})+\lambda\,(>\lambda),$ and $\frac{1}{\alpha+1}=\frac{d_C(\mathbf{u})}{\lambda+d_C(\mathbf{u})}=\frac{d_C(\mathbf{x})-\lambda}{d_C(\mathbf{x})}=1-\theta.$ Therefore $\mathrm{prox}_{\lambda d_C}(\mathbf{x})=(1-\theta)\mathbf{x}+\theta P_C(\mathbf{x}).$
Case 2: $\mathbf{u}\in C$. We show that $\mathbf{u}=P_C(\mathbf{x})$. To this end, let $\mathbf{v}\in C$. Since $\mathbf{u}=\mathrm{prox}_{\lambda d_C}(\mathbf{x})$, we have $\lambda d_C(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\le\lambda d_C(\mathbf{v})+\frac{1}{2}\Vert\mathbf{v-x}\Vert^2.$ Because $d_C(\mathbf{u})=d_C(\mathbf{v})=0$, it follows that $\Vert\mathbf{u-x}\Vert\le\Vert\mathbf{v-x}\Vert.$ Hence $\mathbf{u}=\arg\min_{\mathbf{v}\in C}\Vert\mathbf{v-x}\Vert=P_C(\mathbf{x}).$ Again by Example 16 of Chapter 3, the optimality condition in this case becomes $\frac{\mathbf{x}-P_C(\mathbf{x})}{\lambda}\in N_C(\mathbf{u})\cap B[\mathbf{0},1].$ In particular, $\left\Vert\frac{\mathbf{x}-P_C(\mathbf{x})}{\lambda}\right\Vert\le1\Rightarrow d_C(\mathbf{x})=\Vert P_C(\mathbf{x})-\mathbf{x}\Vert\le\lambda.$
Since Cases 1 and 2 correspond to $d_C(\mathbf{x})>\lambda$ and $d_C(\mathbf{x})\le\lambda$ respectively, the proof is complete.
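As a concrete instance of Lemma 3, the sketch below takes $C$ to be the Euclidean ball $B_{\Vert\cdot\Vert_2}[\mathbf{0},r]$, for which $P_C(\mathbf{x})=\frac{r}{\max\{\Vert\mathbf{x}\Vert_2,r\}}\mathbf{x}$ and $d_C(\mathbf{x})=\max\{\Vert\mathbf{x}\Vert_2-r,0\}$. The choice of $C$ and the function name `prox_dist_ball` are assumptions made only for illustration.

```python
import math


def prox_dist_ball(x, lam, r=1.0):
    """Sketch: prox of lam * d_C at x, for C the Euclidean ball of radius r.

    Hypothetical helper illustrating the two cases of Lemma 3.
    """
    nx = math.sqrt(sum(xi * xi for xi in x))
    proj = [r * xi / max(nx, r) for xi in x]  # P_C(x)
    dist = max(nx - r, 0.0)                   # d_C(x)
    if dist <= lam:
        # close to C: the prox lands on the projection itself
        return proj
    # far from C: move from x toward P_C(x) by a fraction theta
    theta = lam / dist
    return [(1.0 - theta) * xi + theta * pi for xi, pi in zip(x, proj)]
```

With `x = [4.0, 0.0]`, `lam = 1.0`, `r = 1.0` we have $d_C(\mathbf{x})=3>\lambda$ and $\theta=1/3$, so the result is approximately `[3.0, 0.0]`: the point moves a distance $\lambda$ toward $C$.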

6. Moreau Decomposition

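An important property of the proximal operator is the Moreau decomposition theorem, which links the proximal operator of a proper closed convex function with the proximal operator of its conjugate.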

Theorem 17 (Moreau decomposition) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_f(\mathbf{x})+\mathrm{prox}_{f^*}(\mathbf{x})=\mathbf{x}.$
Proof: Let $\mathbf{x}\in\mathbb{E}$ and write $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$. By the second prox theorem, $\mathbf{x-u}\in\partial f(\mathbf{u})$; by the conjugate subgradient theorem, this is equivalent to $\mathbf{u}\in\partial f^*(\mathbf{x-u})$. Applying the second prox theorem once more gives $\mathbf{x-u}=\mathrm{prox}_{f^*}(\mathbf{x})$. Therefore $\mathrm{prox}_f(\mathbf{x})+\mathrm{prox}_{f^*}(\mathbf{x})=\mathbf{u}+(\mathbf{x-u})=\mathbf{x}.$

Theorem 18 (extended Moreau decomposition) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{\lambda f}(\mathbf{x})+\lambda\mathrm{prox}_{\lambda^{-1}f^*}(\mathbf{x}/\lambda)=\mathbf{x}.$

Proof: By the Moreau decomposition, for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\mathrm{prox}_{(\lambda f)^*}(\mathbf{x})\overset{\text{Chapter 4, Theorem 7}}{=}\mathbf{x}-\mathrm{prox}_{\lambda f^*(\cdot/\lambda)}(\mathbf{x}).$ By Theorem 5, $\mathrm{prox}_{\lambda f^*(\cdot/\lambda)}(\mathbf{x})=\lambda\mathrm{prox}_{\lambda^{-1}f^*}(\mathbf{x}/\lambda).$ Combining this with the previous identity completes the proof.
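The extended Moreau decomposition can be checked numerically on a simple pair. The sketch below assumes $f=\Vert\cdot\Vert_1$, whose conjugate is the indicator of the unit $\ell_\infty$ ball (a standard fact not derived in this section), so that $\mathrm{prox}_{\lambda f}=\mathcal{T}_{\lambda}$ and $\mathrm{prox}_{\lambda^{-1}f^*}$ is componentwise clipping to $[-1,1]$. The function names are hypothetical.

```python
import math


def soft_threshold(x, lam):
    """prox of lam * ||.||_1: componentwise soft thresholding."""
    return [math.copysign(max(abs(xi) - lam, 0.0), xi) for xi in x]


def clip_unit(x):
    """Projection onto the unit l_inf ball, i.e. the prox of its indicator."""
    return [min(max(xi, -1.0), 1.0) for xi in x]


def moreau_residual(x, lam):
    """Largest componentwise deviation from
    prox_{lam f}(x) + lam * prox_{f^*/lam}(x / lam) = x,  with f = ||.||_1."""
    p = soft_threshold(x, lam)
    q = [lam * qi for qi in clip_unit([xi / lam for xi in x])]
    return max(abs(pi + qi - xi) for pi, qi, xi in zip(p, q, x))
```

Up to floating-point rounding, `moreau_residual(x, lam)` should be zero for every vector `x` and every `lam > 0`.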

6.1 Support Functions

Using the Moreau decomposition, we can derive a formula for the prox of the support function of a given nonempty closed convex set.

Theorem 19 (prox of support functions) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{\lambda\sigma_C}(\mathbf{x})=\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda).$

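Proof: Note that $(\sigma_C)^*=\delta_C$ (Example 3 of Chapter 4), so $\mathrm{prox}_{\lambda^{-1}\sigma_C^*}=P_C$; the result follows directly from the extended Moreau decomposition.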

Example 19 (prox of norms) Let $f:\mathbb{E}\to\mathbb{R}$ be defined by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_{\alpha}$, where $\lambda>0$ and $\Vert\cdot\Vert_{\alpha}$ is any norm on $\mathbb{E}$. By Example 12 of Chapter 2, $\Vert\mathbf{x}\Vert_{\alpha}=\sigma_C(\mathbf{x}),$ where $C=B_{\Vert\cdot\Vert_{\alpha,*}}[\mathbf{0},1]=\{\mathbf{x}\in\mathbb{E}:\Vert\mathbf{x}\Vert_{\alpha,*}\le1\}$ and $\Vert\cdot\Vert_{\alpha,*}$ is the dual norm of $\Vert\cdot\Vert_{\alpha}$. By Theorem 19, $\boxed{\mathrm{prox}_{\lambda\Vert\cdot\Vert_{\alpha}}(\mathbf{x})=\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_{\alpha,*}}[\mathbf{0},1]}(\mathbf{x}/\lambda).}$

Example 20 (prox of the $\ell_{\infty}$-norm) Directly from Example 19, for any $\lambda>0,\,\mathbf{x}\in\mathbb{R}^n$, $\boxed{\mathrm{prox}_{\lambda\Vert\cdot\Vert_{\infty}}(\mathbf{x})=\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},1]}(\mathbf{x}/\lambda).}$ By Example 14, the orthogonal projection onto the $\ell_1$ ball can be computed by finding a root of a nonincreasing one-dimensional function.

Example 21 (prox of the max function) Consider the max function $g:\mathbb{R}^n\to\mathbb{R}$ defined by $g(\mathbf{x})=\max(\mathbf{x})\equiv\max\{x_1,x_2,\ldots,x_n\}$. By Example 7 of Chapter 2, $\max(\mathbf{x})=\sigma_{\Delta_n}(\mathbf{x}).$ Hence, by Theorem 19, for any $\lambda>0,\,\mathbf{x}\in\mathbb{R}^n$, $\boxed{\mathrm{prox}_{\lambda\max(\cdot)}(\mathbf{x})=\mathbf{x}-\lambda P_{\Delta_n}(\mathbf{x}/\lambda).}$ The orthogonal projection onto the unit simplex can be computed as in Corollary 1.
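The formula $\mathrm{prox}_{\lambda\max(\cdot)}(\mathbf{x})=\mathbf{x}-\lambda P_{\Delta_n}(\mathbf{x}/\lambda)$ can be sketched in code as follows. The simplex projection inside uses bisection on $\mu$, which is an assumed root finder rather than a method prescribed by the text, and both function names are hypothetical.

```python
def project_simplex(x, tol=1e-12, max_iter=200):
    """Sketch: projection onto the unit simplex via bisection on mu."""
    def h(mu):
        # h(mu) = e^T [x - mu e]_+ - 1, nonincreasing in mu
        return sum(max(xi - mu, 0.0) for xi in x) - 1.0

    lo, hi = min(x) - 1.0, max(x)  # h(lo) >= 0 and h(hi) = -1 < 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    mu = 0.5 * (lo + hi)
    return [max(xi - mu, 0.0) for xi in x]


def prox_max(x, lam):
    """Sketch: prox of lam * max(.) computed as x - lam * P_simplex(x / lam)."""
    p = project_simplex([xi / lam for xi in x])
    return [xi - lam * pi for xi, pi in zip(x, p)]
```

For example, `prox_max([3.0, 1.0], 1.0)` is approximately `[2.0, 1.0]`: only the largest component is reduced. For `prox_max([1.0, 1.0], 1.0)` the reduction is shared equally, giving approximately `[0.5, 0.5]`.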

Example 22 (prox of the sum of the $k$ largest components) Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=x_{[1]}+x_{[2]}+\cdots+x_{[k]},$ where $k\in\{1,2,\ldots,n\}$ and, for each $i$, $x_{[i]}$ denotes the $i$-th largest component of $\mathbf{x}$. It is not difficult to show that $f=\sigma_C$, where $C=\{\mathbf{y}\in\mathbb{R}^n:\mathbf{e}^T\mathbf{y}=k,\,\mathbf{0}\le\mathbf{y}\le\mathbf{e}\}.$ Indeed, take any $\mathbf{x}\in\mathbb{R}^n,\,\mathbf{y}\in C$, let $y_{[i]}$ denote the component of $\mathbf{y}$ paired with $x_{[i]}$, and write $t\triangleq\sum_{i=k+1}^ny_{[i]}$, so that $\sum_{i=1}^ky_{[i]}=k-t$. Then $\begin{aligned}\sum_{i=1}^nx_iy_i&=\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{i=k+1}^nx_{[i]}y_{[i]}\\&=\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{j=k+1}^ny_{[j]}\left[\sum_{i=k+1}^nx_{[i]}\left(\frac{y_{[i]}}{\sum_{j=k+1}^ny_{[j]}}\right)\right]\\&\le\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{i=k+1}^ny_{[i]}x_{[k+1]}\\&=k\left[\sum_{i=1}^kx_{[i]}\left(\frac{1}{k}y_{[i]}\right)+\frac{t}{k}x_{[k+1]}\right],\end{aligned}$ where the inequality holds because the bracketed term is a weighted average of $x_{[k+1]},\ldots,x_{[n]}$ (if $t=0$, the second sum vanishes and the inequality is trivial). It remains to show that for any $t\in[0,k]$, $\sum_{i=1}^kx_{[i]}\left(\frac{1}{k}y_{[i]}\right)+\frac{t}{k}x_{[k+1]}\le\frac{1}{k}\sum_{i=1}^kx_{[i]}.$ Since $y_{[i]}\le1$ and $x_{[i]}\ge x_{[k]}$ for $i\le k$, we have $\sum_{i=1}^kx_{[i]}\frac{1-y_{[i]}}{k}\ge x_{[k]}\left(1-1+\frac{t}{k}\right)\ge\frac{t}{k}x_{[k+1]},$ and rearranging gives the claim. Hence $\sigma_C(\mathbf{x})=\max_{\mathbf{y}\in C}\langle\mathbf{y,x}\rangle\le\sum_{i=1}^kx_{[i]}.$ The upper bound on the right-hand side is attained (take $y_{[i]}=1$ for $i\le k$ and $y_{[i]}=0$ otherwise), so equality holds and $\sigma_C=f$. Therefore, by Theorem 19, for any $\mathbf{x}\in\mathbb{R}^n$, $\boxed{\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\lambda P_{\{\mathbf{y}:\mathbf{e}^T\mathbf{y}=k,\,\mathbf{0\le y\le e}\}}(\mathbf{x}/\lambda).}$ The orthogonal projection can be computed as in Theorem 11.

Example 23 (prox of the sum of the $k$ largest absolute values) Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\sum_{i=1}^k\left|x_{\langle i\rangle}\right|,$ where $k\in\{1,2,\ldots,n\}$ and $x_{\langle i\rangle}$ is the component of $\mathbf{x}$ with the $i$-th largest absolute value. As in Example 22, one can show that $f(\mathbf{x})=\max\left\{\sum_{i=1}^nz_ix_i:\Vert\mathbf{z}\Vert_1\le k,\,\mathbf{-e\le z\le e}\right\}.$ Hence $f=\sigma_C$, where $C=\{\mathbf{z}\in\mathbb{R}^n:\Vert\mathbf{z}\Vert_1\le k,\,\mathbf{-e\le z\le e}\}.$ Therefore, by Theorem 19, for any $\mathbf{x}\in\mathbb{R}^n$, $\boxed{\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\lambda P_{\{\mathbf{y}:\Vert\mathbf{y}\Vert_1\le k,\,\mathbf{-e\le y\le e}\}}(\mathbf{x}/\lambda).}$ The orthogonal projection can be computed as in Example 15.

7. The Moreau Envelope

7.1 Definition and Basic Properties

Definition 2 (Moreau envelope) Given a proper closed convex function $f:\mathbb{E}\to(-\infty,\infty]$ and $\mu>0$, the Moreau envelope of $f$ is the function $M_f^{\mu}(\mathbf{x})=\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{x-u}\Vert^2\right\}.$ The parameter $\mu$ is called the smoothing parameter⁹; the next subsection explains this terminology. By the first prox theorem, the minimization problem in the definition of the Moreau envelope has a unique solution, namely $\mathrm{prox}_{\mu f}(\mathbf{x})$. Hence $M_f^{\mu}(\mathbf{x})$ is always a real number: $M_f^{\mu}(\mathbf{x})=f(\mathrm{prox}_{\mu f}(\mathbf{x}))+\frac{1}{2\mu}\Vert\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\Vert^2.$

Example 24 (Moreau envelope of indicator functions) Let $f=\delta_C$, where $C\subset\mathbb{E}$ is a nonempty closed convex set. Then $\mathrm{prox}_{\mu f}(\mathbf{x})=P_C(\mathbf{x})$. Hence for any $\mathbf{x}\in\mathbb{E}$, $\boxed{M_{\delta_C}^{\mu}(\mathbf{x})=\delta_C(P_C(\mathbf{x}))+\frac{1}{2\mu}\Vert\mathbf{x}-P_C(\mathbf{x})\Vert^2=\frac{1}{2\mu}d_C^2(\mathbf{x}).}$

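The next example shows that the Moreau envelope of the Euclidean norm is the Huber function, defined as $H_{\mu}(\mathbf{x})=\left\{\begin{array}{ll}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu.\end{array}\right.$ The graph of the one-dimensional Huber function is shown in the figure below. The figure shows that the larger $\mu$ is, the smoother the function becomes.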
[Figure: graphs of the one-dimensional Huber function $H_{\mu}$.]

Example 25 (Moreau envelope of the Euclidean norm: the Huber function) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=\Vert\mathbf{x}\Vert$. By Example 8, for any $\mathbf{x}\in\mathbb{E},\,\mu>0$, $\mathrm{prox}_{\mu f}(\mathbf{x})=\left(1-\frac{\mu}{\max\{\Vert\mathbf{x}\Vert,\mu\}}\right)\mathbf{x}.$ Therefore, $\boxed{M_{\Vert\cdot\Vert}^{\mu}(\mathbf{x})=\Vert\mathrm{prox}_{\mu f}(\mathbf{x})\Vert+\frac{1}{2\mu}\Vert\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\Vert^2=\left\{\begin{array}{ll}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu\end{array}\right.=H_{\mu}(\mathbf{x}).}$
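In one dimension, the identity $M_{|\cdot|}^{\mu}=H_{\mu}$ can be checked directly by evaluating the envelope through the prox. The following sketch is for illustration only; the function names are hypothetical.

```python
import math


def prox_abs(x, mu):
    """prox of mu * |.| at the scalar x (soft thresholding)."""
    return math.copysign(max(abs(x) - mu, 0.0), x)


def moreau_env_abs(x, mu):
    """Moreau envelope of |.|, evaluated as f(prox) + (x - prox)^2 / (2 mu)."""
    p = prox_abs(x, mu)
    return abs(p) + (x - p) ** 2 / (2.0 * mu)


def huber(x, mu):
    """One-dimensional Huber function H_mu."""
    if abs(x) <= mu:
        return x * x / (2.0 * mu)
    return abs(x) - mu / 2.0
```

For instance, with `mu = 1.0` both `moreau_env_abs(3.0, 1.0)` and `huber(3.0, 1.0)` evaluate to `2.5`, and both `moreau_env_abs(0.5, 1.0)` and `huber(0.5, 1.0)` evaluate to `0.125`.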

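Note that the Moreau envelope is in fact the infimal convolution of $f$ with the function $\omega_{\mu}(\mathbf{x})=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2$, that is, $M_f^{\mu}=f\square\omega_{\mu}.$ Hence, by Theorem 8 of Chapter 2, if $f$ is proper closed and convex (closedness is in fact not needed), then $M_f^{\mu}$ is convex.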

Theorem 20 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, let $\omega_{\mu}$ be defined as above, and let $\mu>0$. Then
(i) $M_f^{\mu}=f\square\omega_{\mu}$;
(ii) $M_f^{\mu}:\mathbb{E}\to\mathbb{R}$ is a real-valued convex function.

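Using the relation between infimal convolution and conjugation, Theorem 20 together with Theorem 9 of Chapter 4 yields an expression for the conjugate of the Moreau envelope.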

Corollary 3 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, let $\omega_{\mu}$ be defined as above, and let $\mu>0$. Then $(M_f^{\mu})^*=f^*+\omega_{\frac{1}{\mu}},$ where $\omega_{\frac{1}{\mu}}$ is defined using the dual norm of $\Vert\cdot\Vert$.

We now give several calculus rules for the Moreau envelope.

Lemma 4 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda,\,\mu>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\lambda M_f^{\mu}(\mathbf{x})=M_{\lambda f}^{\mu/\lambda}(\mathbf{x}).$

Proof: For any $\mathbf{x}\in\mathbb{E}$, $\begin{aligned}\lambda M_f^{\mu}(\mathbf{x})&=\lambda\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\min_{\mathbf{u}}\left\{\lambda f(\mathbf{u})+\frac{1}{2\mu/\lambda}\Vert\mathbf{u-x}\Vert^2\right\}\\&=M_{\lambda f}^{\mu/\lambda}(\mathbf{x}).\end{aligned}$

Theorem 21 (Moreau envelope of separable functions) Let $\mathbb{E}=\mathbb{E}_1\times\mathbb{E}_2\times\cdots\times\mathbb{E}_m$, and let $f:\mathbb{E}\to(-\infty,\infty]$ be defined by $f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mf_i(\mathbf{x}_i),\quad\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\,\ldots,\mathbf{x}_m\in\mathbb{E}_m,$ where for each $i$, $f_i:\mathbb{E}_i\to(-\infty,\infty]$ is a proper closed convex function. Then, given $\mu>0$, for any $\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\,\ldots,\mathbf{x}_m\in\mathbb{E}_m$, $M_f^{\mu}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mM_{f_i}^{\mu}(\mathbf{x}_i).$

Proof: For any $\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\ldots,\mathbf{x}_m\in\mathbb{E}_m$, writing $\mathbf{x}=(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)$, we have $\begin{aligned}M_f^{\mu}(\mathbf{x})&=\min_{\mathbf{u}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m}\left\{f(\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_m)+\frac{1}{2\mu}\Vert(\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_m)-\mathbf{x}\Vert^2\right\}\\&=\min_{\mathbf{u}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m}\left\{\sum_{i=1}^mf_i(\mathbf{u}_i)+\frac{1}{2\mu}\sum_{i=1}^m\Vert\mathbf{u}_i-\mathbf{x}_i\Vert^2\right\}\\&=\sum_{i=1}^m\min_{\mathbf{u}_i\in\mathbb{E}_i}\left\{f_i(\mathbf{u}_i)+\frac{1}{2\mu}\Vert\mathbf{u}_i-\mathbf{x}_i\Vert^2\right\}\\&=\sum_{i=1}^mM_{f_i}^{\mu}(\mathbf{x}_i).\end{aligned}$

Example 26 (Moreau envelope of the $\ell_1$-norm) Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ defined by $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1$. Note that $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1=\sum_{i=1}^ng(x_i),$ where $g(t)=|t|$. By Example 25, $M_g^{\mu}=H_{\mu}$. Then, by Theorem 21, for any $\mathbf{x}\in\mathbb{R}^n$, $M_f^{\mu}(\mathbf{x})=\sum_{i=1}^nM_g^{\mu}(x_i)=\sum_{i=1}^nH_{\mu}(x_i).$

7.2 Differentiability of the Moreau Envelope

Theorem 22 (smoothness of the Moreau envelope) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then $M_f^{\mu}$ is $\frac{1}{\mu}$-smooth on $\mathbb{E}$¹⁰, and for any $\mathbf{x}\in\mathbb{E}$, $\nabla M_f^{\mu}(\mathbf{x})=\frac{1}{\mu}(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})).$

Proof: By Theorem 20(i), $M_f^{\mu}=f\square\omega_{\mu}$, where $\omega_{\mu}=\frac{1}{2\mu}\Vert\cdot\Vert^2$. By Theorem 9 of Chapter 5 (with $\omega=\omega_{\mu},\,L=\frac{1}{\mu}$), $M_f^{\mu}$ is $\frac{1}{\mu}$-smooth. Since $\mathrm{prox}_{\mu f}(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2\right\},$ the vector $\mathbf{u(x)}$ in Theorem 9 of Chapter 5 is exactly $\mathrm{prox}_{\mu f}(\mathbf{x})$, and $\nabla M_f^{\mu}(\mathbf{x})=\nabla \omega_{\mu}(\mathbf{x-u(x)})=\frac{1}{\mu}(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})).$

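Example 27 ($1$-smoothness of $\frac{1}{2}d_C^2$) We derived the gradient of $\frac{1}{2}d_C^2$ in Example 9 of Chapter 3, and its $1$-smoothness was discussed twice, in Examples 3 and 13 of Chapter 5. Here we revisit it from the viewpoint of the Moreau envelope. Let $C\subset\mathbb{E}$ be a nonempty closed convex set. By Example 24, $\frac{1}{2}d_C^2=M_{\delta_C}^1$. Hence, by Theorem 22, $\frac{1}{2}d_C^2$ is $1$-smooth and $\nabla\left(\frac{1}{2}d_C^2\right)(\mathbf{x})=\mathbf{x}-\mathrm{prox}_{\delta_C}(\mathbf{x})=\mathbf{x}-P_C(\mathbf{x}).$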

Example 28 (smoothness of the Huber function) The Huber function is defined as $H_{\mu}(\mathbf{x})=\left\{\begin{array}{ll}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu.\end{array}\right.$ By Example 25, $H_{\mu}=M_f^{\mu}$, where $f$ is the Euclidean norm $f(\mathbf{x})=\Vert\mathbf{x}\Vert$. Hence, by Theorem 22, $H_{\mu}$ is $\frac{1}{\mu}$-smooth and $\begin{aligned}\nabla H_{\mu}(\mathbf{x})&=\frac{1}{\mu}\left(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\right)\\&\overset{\text{Example 8}}{=}\frac{1}{\mu}\left(\mathbf{x}-\left(1-\frac{\mu}{\max\{\Vert\mathbf{x}\Vert,\mu\}}\right)\mathbf{x}\right)\\&=\left\{\begin{array}{ll}\frac{1}{\mu}\mathbf{x}, & \Vert\mathbf{x}\Vert\le\mu,\\\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \Vert\mathbf{x}\Vert>\mu.\end{array}\right.\end{aligned}$ This also shows that the two pieces of the Huber function join smoothly where $\Vert\mathbf{x}\Vert=\mu$.

7.3 Prox of the Moreau Envelope

Theorem 23 below shows that, for a proper closed convex function $f$ whose prox is known, the prox of its Moreau envelope can also be computed.

Theorem 23 (prox of the Moreau envelope) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{M_f^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{1}{\mu+1}\left(\mathrm{prox}_{(\mu+1)f}(\mathbf{x})-\mathbf{x}\right).$

Proof: First note that $\min_{\mathbf{u}}\left\{M_f^{\mu}(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}=\min_{\mathbf{u}}\min_{\mathbf{y}}\left\{f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u-y}\Vert^2+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$ Exchanging the order of minimization gives $\min_{\mathbf{y}}\min_{\mathbf{u}}\left\{f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u-y}\Vert^2+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$ The optimal solution of the inner subproblem is attained where the gradient vanishes, that is, $\frac{1}{\mu}(\mathbf{u-y})+(\mathbf{u-x})=\mathbf{0}\Rightarrow\mathbf{u}=\mathbf{u}_{\mu}=\frac{\mu\mathbf{x}+\mathbf{y}}{\mu+1}.$ Hence the optimal value of the inner subproblem is $\begin{aligned}f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u}_{\mu}-\mathbf{y}\Vert^2+\frac{1}{2}\Vert\mathbf{u}_{\mu}-\mathbf{x}\Vert^2&=f(\mathbf{y})+\frac{1}{2\mu}\left\Vert\frac{\mu\mathbf{x}-\mu\mathbf{y}}{\mu+1}\right\Vert^2+\frac{1}{2}\left\Vert\frac{\mathbf{y-x}}{\mu+1}\right\Vert^2\\&=f(\mathbf{y})+\frac{1}{2(\mu+1)}\Vert\mathbf{x-y}\Vert^2.\end{aligned}$ Therefore, the $\mathbf{y}$ appearing in the expression for the optimal solution $\mathbf{u}$ of the original problem is the solution of $\min_{\mathbf{y}}\left\{f(\mathbf{y})+\frac{1}{2(\mu+1)}\Vert\mathbf{x-y}\Vert^2\right\}$, that is, $\mathbf{y}=\mathrm{prox}_{(\mu+1)f}(\mathbf{x})$. In summary, $\mathrm{prox}_{M_f^{\mu}}(\mathbf{x})=\frac{1}{\mu+1}\left(\mu\mathbf{x}+\mathrm{prox}_{(\mu+1)f}(\mathbf{x})\right),$ which coincides with the formula in the theorem.

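Combining Theorem 23 with Lemma 4 yields Corollary 4 below.

Corollary 4 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda,\mu>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathrm{prox}_{\lambda M_f^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right).$

Proof: Since $\lambda M_f^{\mu}=M_{\lambda f}^{\mu/\lambda}$, applying Theorem 23 to $\lambda f$ with parameter $\mu/\lambda$, and using $\frac{1}{\mu/\lambda+1}=\frac{\lambda}{\mu+\lambda}$ and $(\mu/\lambda+1)\lambda f=(\mu+\lambda)f$, gives $\mathrm{prox}_{\lambda M_f^{\mu}}(\mathbf{x})=\mathrm{prox}_{M_{\lambda f}^{\mu/\lambda}}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right).$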

Example 29 (prox of $\frac{\lambda}{2}d_C^2$) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Consider the function $f=\frac{1}{2}d_C^2$. By Example 27, $f=M_g^1$, where $g=\delta_C$. Since $\mathrm{prox}_g=P_C$, and $(\lambda+1)g=\delta_C$ as well, Corollary 4 gives for any $\mathbf{x}\in\mathbb{E}$ $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathrm{prox}_{\lambda M_g^1}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\lambda+1}\left(\mathrm{prox}_{(\lambda+1)g}(\mathbf{x})-\mathbf{x}\right)=\mathbf{x}+\frac{\lambda}{\lambda+1}(P_C(\mathbf{x})-\mathbf{x}).$ $\boxed{\mathrm{prox}_{\frac{\lambda}{2}d_C^2}(\mathbf{x})=\frac{\lambda}{\lambda+1}P_C(\mathbf{x})+\frac{1}{\lambda+1}\mathbf{x}.}$
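The boxed formula can be tried out numerically. The following is a minimal Python sketch, assuming $C$ is a box in $\mathbb{R}^n$ so that the projection is componentwise clipping; the helper names `project_box` and `prox_half_sq_dist` are illustrative and do not come from the book.

```python
def project_box(x, lower, upper):
    """Orthogonal projection onto Box[lower, upper] by componentwise clipping."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def prox_half_sq_dist(x, lam, project):
    """prox of (lam/2) * d_C^2 at x, given a projection onto C (Example 29)."""
    p = project(x)
    w = lam / (lam + 1.0)
    # Convex combination of P_C(x) and x with weights lam/(lam+1) and 1/(lam+1).
    return [w * pi + (1.0 - w) * xi for pi, xi in zip(p, x)]


def unit_box_projection(v):
    """Projection onto the unit box [0, 1]^2 (an assumed example set)."""
    return project_box(v, [0.0, 0.0], [1.0, 1.0])


x = [3.0, -1.0]
u = prox_half_sq_dist(x, 1.0, unit_box_projection)
print(u)  # [2.0, -0.5]
```

As $\lambda$ grows, the weight $\frac{\lambda}{\lambda+1}$ tends to 1 and the prox approaches the projection $P_C(\mathbf{x})$ itself.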

Example 30 (prox of the Huber function) Consider the function $f(\mathbf{x})=\lambda H_{\mu}(\mathbf{x})$. By Example 25, $H_{\mu}=M_g^{\mu}$, where $g(\mathbf{x})=\Vert\mathbf{x}\Vert$. Hence, by Corollary 4 and Example 8, for any $\lambda>0$ and $\mathbf{x}\in\mathbb{E}$, $\begin{aligned}\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})&=\mathrm{prox}_{\lambda M_g^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)g}(\mathbf{x})-\mathbf{x}\right)\\&=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\left(1-\frac{\mu+\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}-\mathbf{x}\right).\end{aligned}$ Simplifying, we obtain $\boxed{\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})=\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}.}$
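One way to sanity-check the boxed formula is through the optimality condition $\lambda\nabla H_{\mu}(\mathbf{u})+\mathbf{u}-\mathbf{x}=\mathbf{0}$ at $\mathbf{u}=\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})$, using the gradient from Example 28. The Python sketch below does this in $\mathbb{R}^n$ with the Euclidean norm; the function names are illustrative choices, not part of the source.

```python
import math


def prox_huber(x, lam, mu):
    """prox of lam * H_mu at x (Example 30), Euclidean norm on R^n."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    scale = 1.0 - lam / max(norm_x, mu + lam)
    return [scale * xi for xi in x]


def grad_huber(u, mu):
    """Gradient of H_mu (Example 28): u / max(||u||, mu)."""
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    return [ui / max(norm_u, mu) for ui in u]


def optimality_residual(x, lam, mu):
    """Norm of lam * grad H_mu(u) + u - x at u = prox_huber(x, lam, mu)."""
    u = prox_huber(x, lam, mu)
    g = grad_huber(u, mu)
    return math.sqrt(sum((lam * gi + ui - xi) ** 2 for gi, ui, xi in zip(g, u, x)))


# One point in the linear regime (||x|| > mu + lam), one in the quadratic regime.
print(optimality_residual([3.0, 4.0], 1.0, 1.0))
print(optimality_residual([0.3, 0.4], 1.0, 1.0))
```

Both residuals should be zero up to rounding error, since $\lambda H_{\mu}$ is differentiable and convex.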

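In analogy with the Moreau decomposition formula for the proximal operator, a decomposition formula can also be derived for the Moreau envelope.

Theorem 24 (Moreau envelope decomposition) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then for any $\mathbf{x}\in\mathbb{E}$, $M_f^{\mu}(\mathbf{x})+M_{f^*}^{1/\mu}(\mathbf{x}/\mu)=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2.$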

Proof: For any $\mathbf{x}\in\mathbb{E}$, $M_f^{\mu}(\mathbf{x})=\min_{\mathbf{u}\in\mathbb{E}}\{f(\mathbf{u})+\psi(\mathbf{u})\},$ where $\psi(\mathbf{u})=\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2$. By Fenchel's duality theorem, $M_f^{\mu}(\mathbf{x})=\max_{\mathbf{v}\in\mathbb{E}}\{-f^*(\mathbf{v})-\psi^*(-\mathbf{v})\}=-\min_{\mathbf{v}\in\mathbb{E}}\{f^*(\mathbf{v})+\psi^*(-\mathbf{v})\}.$ Let $\phi(\cdot)=\frac{1}{2}\Vert\cdot-\mathbf{x}\Vert^2$. Then $\phi^*(\mathbf{v})=\frac{1}{2}\Vert\mathbf{v}\Vert^2+\langle\mathbf{x,v}\rangle.$ Since $\psi=\frac{1}{\mu}\phi$, Theorem 7(i) of Chapter 4 gives $\psi^*(\mathbf{v})=\frac{1}{\mu}\phi^*(\mu\mathbf{v})=\frac{\mu}{2}\Vert\mathbf{v}\Vert^2+\langle\mathbf{x,v}\rangle.$ Therefore $\begin{aligned}M_f^{\mu}(\mathbf{x})&=-\min_{\mathbf{v}\in\mathbb{E}}\left\{f^*(\mathbf{v})+\frac{\mu}{2}\Vert\mathbf{v}\Vert^2-\langle\mathbf{x,v}\rangle\right\}\\&=-\min_{\mathbf{v}\in\mathbb{E}}\left\{f^*(\mathbf{v})+\frac{\mu}{2}\Vert\mathbf{v}-\mathbf{x}/\mu\Vert^2-\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2\right\}\\&=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2-M_{f^*}^{1/\mu}(\mathbf{x}/\mu).\end{aligned}$
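The decomposition can be checked on a one-dimensional instance. The Python sketch below assumes $f=|\cdot|$ on $\mathbb{R}$, so that $M_f^{\mu}$ is the Huber function $H_{\mu}$ and $f^*=\delta_{[-1,1]}$, which gives $M_{f^*}^{1/\mu}(y)=\frac{\mu}{2}d_{[-1,1]}^2(y)$. The function names are illustrative.

```python
def huber_1d(x, mu):
    """M_f^mu for f = |.| on R, i.e. the one-dimensional Huber function H_mu."""
    return x * x / (2.0 * mu) if abs(x) <= mu else abs(x) - mu / 2.0


def env_conj_1d(y, mu):
    """M_{f*}^{1/mu}(y) for f* = indicator of [-1, 1]: (mu/2) * dist(y, [-1, 1])^2."""
    d = max(abs(y) - 1.0, 0.0)
    return 0.5 * mu * d * d


def decomposition_gap(x, mu):
    """Left-hand side minus right-hand side of the identity in Theorem 24."""
    return huber_1d(x, mu) + env_conj_1d(x / mu, mu) - x * x / (2.0 * mu)


for x in (-3.0, -0.2, 0.0, 0.7, 5.0):
    print(x, decomposition_gap(x, 0.5))
```

Each printed gap should vanish up to rounding, both for $|x|\le\mu$ (where the second envelope is zero) and for $|x|>\mu$.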

8. Miscellaneous prox computations

This section collects several special prox computations. Their proofs do not rely on any of the results established so far in this chapter.

8.1 Norm of a linear transformation over $\mathbb{R}^n$

Lemma 5 Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\Vert\mathbf{Ax}\Vert_2$, where $\mathbf{A}\in\mathbb{R}^{m\times n}$ has full row rank, and let $\lambda>0$. Then $\mathrm{prox}_{\lambda f}(\mathbf{x})=\left\{\begin{array}{ll}\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}, & \left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2\le\lambda,\\\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax}, & \left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2>\lambda,\end{array}\right.$ where $\alpha^*$ is the unique positive root of the strictly decreasing function $g(\alpha)=\left\Vert\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax}\right\Vert_2^2-\lambda^2$.

Proof: $\mathrm{prox}_{\lambda f}(\mathbf{x})$ is the unique optimal solution of $\min_{\mathbf{u}\in\mathbb{R}^n}\left\{\lambda\Vert\mathbf{Au}\Vert_2+\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2\right\}.$ This problem is equivalent to $\min_{\mathbf{u}\in\mathbb{R}^n,\,\mathbf{z}\in\mathbb{R}^m}\left\{\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\lambda\Vert\mathbf{z}\Vert_2:\mathbf{z=Au}\right\}.$ Its Lagrangian is $\begin{aligned}L(\mathbf{u,z;y})&=\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T(\mathbf{z-Au})\\&=\left[\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2-\left(\mathbf{A}^T\mathbf{y}\right)^T\mathbf{u}\right]+\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right].\end{aligned}$ Since the Lagrangian is separable in $\mathbf{u}$ and $\mathbf{z}$, the dual objective function can be written as $\min_{\mathbf{u,z}}L(\mathbf{u,z;y})=\min_{\mathbf{u}}\left[\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2-\left(\mathbf{A}^T\mathbf{y}\right)^T\mathbf{u}\right]+\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right].$ The minimizer of the problem in $\mathbf{u}$ is $\tilde{\mathbf{u}}=\mathbf{x}+\mathbf{A}^T\mathbf{y}$, with optimal value $-\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}-\left(\mathbf{Ax}\right)^T\mathbf{y}.$ The minimization problem in $\mathbf{z}$ can be written as $\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right]=-\max_{\mathbf{z}}\left[(-\mathbf{y})^T\mathbf{z}-\lambda\Vert\mathbf{z}\Vert_2\right]=-g^*(-\mathbf{y}),$ where $g(\cdot)=\lambda\Vert\cdot\Vert_2$. Since $g^*(\mathbf{w})=\lambda\delta_{B_{\Vert\cdot\Vert_2}[\mathbf{0},1]}(\mathbf{w}/\lambda)=\delta_{B_{\Vert\cdot\Vert_2}[\mathbf{0},\lambda]}(\mathbf{w})$ (by Theorem 7(i) of Chapter 4 and Section 4.12), it follows that $\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right]=\left\{\begin{array}{ll}0, & \Vert\mathbf{y}\Vert_2\le\lambda,\\-\infty, & \Vert\mathbf{y}\Vert_2>\lambda.\end{array}\right.$ We thus arrive at the dual problem $\max_{\mathbf{y}\in\mathbb{R}^m}\left\{-\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}-\left(\mathbf{Ax}\right)^T\mathbf{y}:\Vert\mathbf{y}\Vert_2\le\lambda\right\}.$ Note that strong duality holds.
We first rewrite the dual problem in the equivalent form $\min_{\mathbf{y}\in\mathbb{R}^m}\left\{\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}+\left(\mathbf{Ax}\right)^T\mathbf{y}:\Vert\mathbf{y}\Vert_2^2\le\lambda^2\right\}.$ We already know that $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}+\mathbf{A}^T\mathbf{y},$ where $\mathbf{y}$ is an optimal solution of the dual problem. The dual problem is convex and satisfies Slater's condition (see Theorem 28 of Chapter 3), so $\mathbf{y}$ is optimal if and only if there exists $\alpha^*$ such that $\begin{aligned}\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)\mathbf{y}+\mathbf{Ax}&=\mathbf{0},\\\alpha^*\left(\Vert\mathbf{y}\Vert_2^2-\lambda^2\right)&=0,\\\Vert\mathbf{y}\Vert_2^2&\le\lambda^2,\\\alpha^*&\ge0.\end{aligned}$

Case 1: $\alpha^*=0$. Then $\mathbf{y}=-\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}$ (the matrix $\mathbf{AA}^T$ is invertible because $\mathbf{A}$ has full row rank). This $\mathbf{y}$ is optimal if and only if $\left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2\le\lambda$. In this case, $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}.$
Case 2: If $\left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2>\lambda$, then $\alpha^*>0$. In this case $\mathbf{y}=-\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax},$ and by complementary slackness, $\Vert\mathbf{y}\Vert_2^2=\lambda^2.$ Combining the two, $\alpha^*$ is a positive root of the function $g(\alpha)=\left\Vert\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax}\right\Vert_2^2-\lambda^2$. One can verify that $g(\alpha)$ is strictly decreasing for $\alpha\ge0$, so $\alpha^*$ is uniquely determined.
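To illustrate the two cases and the root-finding step, here is a minimal Python sketch restricted to the special case $m=1$, that is, $\mathbf{A}=\mathbf{a}^T$ for a nonzero vector $\mathbf{a}$, so that $\mathbf{AA}^T=\Vert\mathbf{a}\Vert_2^2$ is a scalar and $f(\mathbf{x})=|\mathbf{a}^T\mathbf{x}|$. The restriction to $m=1$, the use of bisection, and the name `prox_abs_linear` are assumptions made for the illustration; a general implementation would solve linear systems with $\mathbf{AA}^T+\alpha\mathbf{I}$ instead.

```python
def prox_abs_linear(a, x, lam, iters=200):
    """prox of lam * |a^T x| at x: Lemma 5 with m = 1 and A = a^T (a nonzero)."""
    aa = sum(ai * ai for ai in a)             # A A^T, a positive scalar
    ax = sum(ai * xi for ai, xi in zip(a, x))  # A x, a scalar
    if abs(ax) / aa <= lam:
        alpha = 0.0                            # Case 1
    else:
        # Case 2: g(alpha) = (ax / (aa + alpha))^2 - lam^2 is strictly
        # decreasing on alpha >= 0, with g(0) > 0 and g(|ax| / lam) < 0.
        lo, hi = 0.0, abs(ax) / lam
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if (ax / (aa + mid)) ** 2 - lam ** 2 > 0.0:
                lo = mid
            else:
                hi = mid
        alpha = 0.5 * (lo + hi)
    y = -ax / (aa + alpha)                     # dual optimal solution
    return [xi + ai * y for ai, xi in zip(a, x)]


print(prox_abs_linear([1.0, 0.0], [3.0, 5.0], 1.0))  # approximately [2.0, 5.0]
print(prox_abs_linear([1.0, 0.0], [0.5, 5.0], 1.0))  # first entry is set to 0
```

With $\mathbf{a}=(1,0)^T$ the function reduces to $\lambda|x_1|$, so the result agrees with soft thresholding of the first coordinate. For $m=1$ the root is also available in closed form, $\alpha^*=|\mathbf{a}^T\mathbf{x}|/\lambda-\Vert\mathbf{a}\Vert_2^2$; bisection is used here only to mirror the general procedure.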

8.2 Squared $\ell_1$-norm

The prox of the $\ell_1$-norm is the soft-thresholding function (see Example 2). The prox of the squared $\ell_1$-norm, however, is harder to compute. In Lemma 6 below we first show that $\Vert\mathbf{x}\Vert_1^2$ is the optimal value of an optimization problem. The statement uses the function $\varphi(s,t)=\left\{\begin{array}{ll}\frac{s^2}{t}, & t>0,\\0, & s=0,\,t=0,\\\infty, & \text{otherwise}.\end{array}\right.$ By Example 13 of Chapter 2, $\varphi$ is a closed convex function (even though it is not continuous at $(s, t) = (0, 0)$).

Lemma 6 (variational representation of $\Vert\cdot\Vert_1^2$) For any $\mathbf{x}\in\mathbb{R}^n$, $\min_{\bm{\lambda}\in\Delta_n}\sum_{j=1}^n\varphi(x_j,\lambda_j)=\Vert\mathbf{x}\Vert_1^2.$ An optimal solution of this problem is $\tilde\lambda_j=\left\{\begin{array}{ll}\frac{|x_j|}{\Vert\mathbf{x}\Vert_1}, & \mathbf{x\ne0},\\\frac{1}{n}, & \mathbf{x=0},\end{array}\right.\quad j=1,2,\ldots,n.$

Proof: By the Weierstrass theorem for closed functions, the problem has an optimal solution, which we denote by $\bm{\lambda}^*\in\Delta_n$. Define $\begin{aligned}I_0&=\{i\in\{1,2,\ldots,n\}:\lambda_i^*=0\},\\I_1&=\{i\in\{1,2,\ldots,n\}:\lambda_i^*>0\}.\end{aligned}$ By the definition of $I_0,I_1$, $\sum_{i\in I_1}\lambda_i^*=\sum_{i=1}^n\lambda_i^*=1.$ For $i\in I_0$ we must have $x_i=0$, since otherwise $\varphi(x_i,\lambda_i^*)=\infty$. By the Cauchy-Schwarz inequality, $\sum_{j=1}^n|x_j|=\sum_{j\in I_1}|x_j|=\sum_{j\in I_1}\frac{|x_j|}{\sqrt{\lambda_j^*}}\sqrt{\lambda_j^*}\le\sqrt{\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}}\cdot\sqrt{\sum_{j\in I_1}\lambda_j^*}=\sqrt{\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}}.$ Hence $\sum_{j=1}^n\varphi(x_j,\lambda_j^*)=\sum_{j\in I_1}\varphi(x_j,\lambda_j^*)=\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}\ge\Vert\mathbf{x}\Vert_1^2.$ On the other hand, since $\bm{\lambda}^*$ is an optimal solution, $\sum_{j=1}^n\varphi(x_j,\lambda_j^*)\le\sum_{j=1}^n\varphi(x_j,\tilde\lambda_j)=\Vert\mathbf{x}\Vert_1^2.$ Therefore the optimal value of the problem is $\Vert\mathbf{x}\Vert_1^2$, and $\tilde{\bm{\lambda}}$ is an optimal solution.

Lemma 7 (prox of $\Vert\cdot\Vert_1^2$) Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1^2$, and let $\rho>0$. Then $\mathrm{prox}_{\rho f}(\mathbf{x})=\left\{\begin{array}{ll}\left(\frac{\lambda_ix_i}{\lambda_i+2\rho}\right)_{i=1}^n, & \mathbf{x\ne0},\\\mathbf{0}, & \mathbf{x=0},\end{array}\right.$ where $\lambda_i=\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+$ and $\mu^*$ is any positive root of the nonincreasing function $\psi(\mu)=\sum_{i=1}^n\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu}}-2\rho\right]_+-1$.

Proof: If $\mathbf{x=0}$, then clearly $\mathrm{prox}_{\rho f}(\mathbf{x})=\arg\min_{\mathbf{u}}\{\frac{1}{2}\Vert\mathbf{u}\Vert_2^2+\rho\Vert\mathbf{u}\Vert_1^2\}=\mathbf{0}$. Now assume that $\mathbf{x\ne0}$. By Lemma 6, $\mathbf{u}=\mathrm{prox}_{\rho f}(\mathbf{x})$ if and only if it is the $\mathbf{u}$-part of an optimal solution of $\min_{\mathbf{u}\in\mathbb{R}^n,\,\bm{\lambda}\in\Delta_n}\left\{\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\rho\sum_{i=1}^n\varphi(u_i,\lambda_i)\right\}.$ Minimizing first with respect to $\mathbf{u}$ gives $u_i=\frac{\lambda_ix_i}{\lambda_i+2\rho}$ ¹¹, and the problem becomes $\begin{array}{ll}\min_{\bm{\lambda}} & \sum\limits_{i=1}^n\dfrac{\rho x_i^2}{\lambda_i+2\rho}\\\mathrm{s.t.} & \mathbf{e}^T\bm{\lambda}=1,\\&\bm{\lambda}\ge\mathbf{0}.\end{array}$ Note that strong duality holds for this problem. The Lagrangian is $L(\bm{\lambda};\mu)=\sum_{i=1}^n\left(\frac{\rho x_i^2}{\lambda_i+2\rho}+\lambda_i\mu\right)-\mu.$ $\bm{\lambda}^*$ is an optimal solution if and only if there exists $\mu^*$ such that $\begin{aligned}\bm{\lambda}^*&\in\arg\min_{\bm{\lambda}\ge\mathbf{0}}L(\bm{\lambda};\mu^*),\\\mathbf{e}^T\bm{\lambda}^*&=1.\end{aligned}$ Since the minimum is finite and attained, and $\mathbf{x\ne0}$, we must have $\mu^*>0$ (if $\mu^*=0$, the minimum is not attained; if $\mu^*<0$, the minimum is $-\infty$). Setting the derivative with respect to each $\lambda_i$ to zero and taking the constraint $\lambda_i\ge0$ into account gives $\lambda_i^*=\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+.$ Hence $\mu^*$ must satisfy $\sum_{i=1}^n\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+=1.$
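Lemma 7 translates into a short procedure: find a root $\mu^*$ of $\psi$, form the weights $\lambda_i$, then scale each $x_i$. The Python sketch below is one possible realization, assuming bisection on the interval $(0,\max_i x_i^2/(4\rho)]$, on which $\psi$ goes from $+\infty$ down to $-1$; the bracketing interval, the iteration count, and the name `prox_sq_l1` are choices made for this illustration.

```python
import math


def prox_sq_l1(x, rho, iters=200):
    """prox of rho * ||.||_1^2 at x, following Lemma 7 with bisection on mu."""
    max_abs = max(abs(xi) for xi in x)
    if max_abs == 0.0:
        return [0.0 for _ in x]

    def weights(mu):
        # lambda_i(mu) = [sqrt(rho) * |x_i| / sqrt(mu) - 2 * rho]_+
        s = math.sqrt(rho / mu)
        return [max(s * abs(xi) - 2.0 * rho, 0.0) for xi in x]

    # psi(mu) = sum(weights(mu)) - 1 is nonincreasing, tends to +inf as
    # mu -> 0+, and equals -1 at mu = max_abs^2 / (4 * rho).
    lo, hi = 0.0, max_abs * max_abs / (4.0 * rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = weights(0.5 * (lo + hi))
    return [li * xi / (li + 2.0 * rho) for li, xi in zip(lam, x)]


print(prox_sq_l1([3.0, 1.0], 1.0))   # approximately [1.0, 0.0]
print(prox_sq_l1([3.0, 2.5], 0.5))   # approximately [7/6, 2/3]
```

For $n=1$ the simplex constraint forces $\lambda_1=1$, and the formula reduces to $x/(1+2\rho)$, the prox of $\rho x^2$, which gives a quick consistency check.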

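8.3 Orthogonal projection onto the set of $s$-sparse vectors

Let $s\in\{1,2,\ldots,n\}$ and consider the set $C_s=\{\mathbf{x}\in\mathbb{R}^n:\Vert\mathbf{x}\Vert_0\le s\}.$ The set $C_s$ consists of all $s$-sparse vectors, that is, vectors with at most $s$ nonzero entries.

$C_s$ is not convex (for $s<n$). For example, with $n = 2$, $(0,1)^T,(1,0)^T\in C_1$, but $(0.5,0.5)^T=0.5(0,1)^T+0.5(1,0)^T\notin C_1$.
$C_s$ is closed. It is a level set of the closed function $\Vert\cdot\Vert_0$ (see Example 3 of Chapter 2).

By Theorem 2, $P_{C_s}(\mathbf{x})=\mathrm{prox}_{\delta_{C_s}}(\mathbf{x})$ is nonempty for every $\mathbf{x}$, but it is not necessarily a singleton.

To describe $P_{C_s}$ explicitly, we introduce some notation. For $\mathbf{x}\in\mathbb{R}^n$ and an index set $S\subset\{1,2,\ldots,n\}$,

$\mathbf{x}_S$ is the vector consisting of the components of $\mathbf{x}$ whose indices are in $S$;
the matrix $\mathbf{U}_S$ is the submatrix of the identity matrix consisting of the columns whose indices are in $S$;
the set $S^c$ is the complement of $S$ in $\{1,2,\ldots,n\}$: $S^c=\{1,2,\ldots,n\}\setminus S$;
$x_{\langle i\rangle}$ is the component of $\mathbf{x}$ with the $i$-th largest absolute value.

Lemma 8 below shows that $P_{C_s}(\mathbf{x})$ consists of the vectors obtained by keeping $s$ components of $\mathbf{x}$ with the largest absolute values and setting the rest to zero. It is precisely because several components of $\mathbf{x}$ may have the same absolute value that $P_{C_s}(\mathbf{x})$ can fail to be a singleton.

Lemma 8 (orthogonal projection onto $C_s$) Let $s\in\{1,2,\ldots,n\}$ and $\mathbf{x}\in\mathbb{R}^n$. Then $P_{C_s}(\mathbf{x})=\left\{\mathbf{U}_S\mathbf{x}_S:|S|=s,\,S\subset\{1,2,\ldots,n\},\,\sum_{i\in S}|x_i|=\sum_{i=1}^s\left|x_{\langle i\rangle}\right|\right\}.$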

Proof: By its definition, $C_s$ can be written as $C_s=\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}A_S,$ where $A_S=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{x}_{S^c}=\mathbf{0}\}$. Each $A_S$ is a closed convex set, so $P_{A_S}(\mathbf{x})$ can be regarded as a vector. For the finitely many closed convex sets $A_S$ we have $P_{C_s}(\mathbf{x})=P_{\bigcup_{S\subset\{1,2,\ldots,n\},|S|=s}A_S}(\mathbf{x})\subset\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}\{P_{A_S}(\mathbf{x})\}.$ Indeed, for any $\mathbf{y}\in P_{C_s}(\mathbf{x})$ we have $\mathbf{y}\in C_s=\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}A_S$, so there exists some $S\subset\{1,2,\ldots,n\}$ with $|S|=s$ such that $\mathbf{y}\in A_S$. On the one hand, $\mathbf{y}\in P_{C_s}(\mathbf{x})\Rightarrow\Vert\mathbf{y}-\mathbf{x}\Vert=\min_{\mathbf{u}\in C_s}\Vert\mathbf{u-x}\Vert\le\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert;$ on the other hand, $\mathbf{y}\in A_S\Rightarrow\Vert\mathbf{y-x}\Vert\ge\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert.$ Combining the two yields $\Vert\mathbf{y-x}\Vert=\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert\Rightarrow\mathbf{y}=P_{A_S}(\mathbf{x})\in\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}\{P_{A_S}(\mathbf{x})\}.$ The same argument shows that $P_{C_s}(\mathbf{x})=\left\{P_{A_S}(\mathbf{x}):\Vert P_{A_S}(\mathbf{x})-\mathbf{x}\Vert=\min_{S'\subset\{1,2,\ldots,n\},\,|S'|=s}\left\Vert P_{A_{S'}}(\mathbf{x})-\mathbf{x}\right\Vert\right\}.$ Now $P_{A_S}(\mathbf{x})$ is the optimal solution of the problem $\min_{\mathbf{y}\in\mathbb{R}^n}\left\{\Vert\mathbf{y-x}\Vert_2^2:\mathbf{y}_{S^c}=\mathbf{0}\right\}=\min_{\mathbf{y}\in\mathbb{R}^n}\left\{\Vert\mathbf{y}_S-\mathbf{x}_S\Vert_2^2+\Vert\mathbf{x}_{S^c}\Vert_2^2:\mathbf{y}_{S^c}=\mathbf{0}\right\},$ which is clearly $\mathbf{y}_S=\mathbf{x}_S,\,\mathbf{y}_{S^c}=\mathbf{0}$, that is, $\mathbf{y}=\mathbf{U}_S\mathbf{x}_S$, with optimal value $\Vert\mathbf{x}_{S^c}\Vert_2^2$. Therefore the vectors in $P_{C_s}(\mathbf{x})$ are of the form $\mathbf{U}_S\mathbf{x}_S$, where $S$ has cardinality $s$ and attains the smallest value of $\Vert\mathbf{x}_{S^c}\Vert_2^2$. Equivalently, $S$ must contain $s$ components of largest absolute value, that is, $S:|S|=s,\,S\subset\{1,2,\ldots,n\},\,\sum_{i\in S}|x_i|=\sum_{i=1}^s\left|x_{\langle i\rangle}\right|.$

Example 31 Suppose that $n = 4$. Then $P_{C_2}\left[(2,3,-2,1)^T\right]=\left\{(2,3,0,0)^T,(0,3,-2,0)^T\right\}.$
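Lemma 8 and Example 31 can be reproduced with a few lines of Python. The sketch below enumerates all index sets of size $s$, which is only sensible for small $n$, and also shows the usual "keep the $s$ largest magnitudes" rule that returns a single element of the projection. The function names are illustrative. The test `min over S >= max over S^c` used in the code is an equivalent way of stating the sum condition of Lemma 8 that avoids comparing floating-point sums.

```python
from itertools import combinations


def project_sparse_all(x, s):
    """All elements of P_{C_s}(x), by enumeration over index sets (Lemma 8)."""
    n = len(x)
    result = set()
    for S in combinations(range(n), s):
        inside = min(abs(x[i]) for i in S)
        outside = max((abs(x[i]) for i in range(n) if i not in S), default=0)
        # S holds s entries of largest magnitude iff no excluded entry is larger.
        if inside >= outside:
            result.add(tuple(x[i] if i in S else 0 for i in range(n)))
    return result


def project_sparse_one(x, s):
    """One element of P_{C_s}(x): keep the s entries of largest magnitude."""
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    keep = set(order[:s])
    return [x[i] if i in keep else 0 for i in range(len(x))]


print(project_sparse_all([2, 3, -2, 1], 2))
print(project_sparse_one([2, 3, -2, 1], 2))  # [2, 3, 0, 0]
```

Because Python's sort is stable, ties in `project_sparse_one` are broken in favor of the smaller index, which is why it returns $(2,3,0,0)^T$ rather than $(0,3,-2,0)^T$ here.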

9. Summary of prox computations

$f(\mathbf{x})$	$\mathrm{dom}(f)$	$\mathrm{prox}_f(\mathbf{x})$	Assumptions	Reference
$\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c$	$\mathbb{R}^n$	$(\mathbf{A+I})^{-1}\mathbf{(x-b)}$	$\mathbf{A}\in\mathbb{S}_+^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$	Section 2.3
$\lambda x^3$	$\mathbb{R}_+$	$\frac{-1+\sqrt{1+12\lambda[x]_+}}{6\lambda}$	$\lambda>0$	Lemma 1
$\mu x$	$[0,\alpha]\cap\mathbb{R}$	$\min\{\max\{x-\mu,0\},\alpha\}$	$\mu\in\mathbb{R},\,\alpha\in[0,\infty]$	Example 5
$\lambda\Vert\mathbf{x}\Vert$	$\mathbb{E}$	$\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\lambda\}}\right)\mathbf{x}$	$\Vert\cdot\Vert$ Euclidean norm, $\lambda>0$	Example 8
$-\lambda\Vert\mathbf{x}\Vert$	$\mathbb{E}$	$\left\{\begin{array}{ll}\left(1+\frac{\lambda}{\Vert\mathbf{x}\Vert}\right)\mathbf{x}, & \mathbf{x\ne0},\\\{\mathbf{u}:\Vert\mathbf{u}\Vert=\lambda\}, & \mathbf{x=0}.\end{array}\right.$	$\Vert\cdot\Vert$ Euclidean norm, $\lambda>0$	Example 10
$\lambda\Vert\mathbf{x}\Vert_1$	$\mathbb{R}^n$	$\mathcal{T}_{\lambda}(\mathbf{x})=[\mathrm{abs}(\mathbf{x})-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x})$	$\lambda>0$	Example 2
$\Vert\bm{\omega}\odot\mathbf{x}\Vert_1$	$\text{Box}[-\bm{\alpha},\bm{\alpha}]$	$\mathcal{S}_{\bm{\omega},\bm{\alpha}}(\mathbf{x})$	$\bm{\alpha}\in[0,\infty]^n,\,\bm{\omega}\in\mathbb{R}_+^n$	Example 12
$\lambda\Vert\mathbf{x}\Vert_{\infty}$	$\mathbb{R}^n$	$\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},1]}(\mathbf{x}/\lambda)$	$\lambda>0$	Example 20
$\lambda\Vert\mathbf{x}\Vert_a$	$\mathbb{E}$	$\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_{a,*}}[\mathbf{0},1]}(\mathbf{x}/\lambda)$	$\Vert\cdot\Vert_a$ any norm, $\lambda>0$	Example 19
$\lambda\Vert\mathbf{x}\Vert_0$	$\mathbb{R}^n$	$\mathcal{H}_{\sqrt{2\lambda}}(x_1)\times\cdots\times\mathcal{H}_{\sqrt{2\lambda}}(x_n)$	$\lambda>0$	Example 4
$\lambda\Vert\mathbf{x}\Vert^3$	$\mathbb{E}$	$\frac{2}{1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}\mathbf{x}$	$\Vert\cdot\Vert$ Euclidean norm, $\lambda>0$	Example 9
$-\lambda\sum_{j=1}^n\log x_j$	$\mathbb{R}_{++}^n$	$\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n$	$\lambda>0$	Example 3
$\delta_C(\mathbf{x})$	$\mathbb{E}$	$P_C(\mathbf{x})$	$\emptyset\ne C\subset\mathbb{E}$	Theorem 9
$\lambda\sigma_C(\mathbf{x})$	$\mathbb{E}$	$\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda)$	$\lambda>0,\,C\ne\emptyset$ closed and convex	Theorem 19
$\lambda\max\{x_i\}$	$\mathbb{R}^n$	$\mathbf{x}-\lambda P_{\Delta_n}(\mathbf{x}/\lambda)$	$\lambda>0$	Example 21
$\lambda\sum_{i=1}^kx_{[i]}$	$\mathbb{R}^n$	$\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda),\,C=H_{\mathbf{e},k}\cap\text{Box}[\mathbf{0,e}]$	$\lambda>0$	Example 22
$\lambda\sum_{i=1}^k\mathrm{abs}\left(x_{\langle i\rangle}\right)$	$\mathbb{R}^n$	$\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda),\,C=B_{\Vert\cdot\Vert_1}[\mathbf{0},k]\cap\text{Box}[\mathbf{-e,e}]$	$\lambda>0$	Example 23
$\lambda M_f^{\mu}(\mathbf{x})$	$\mathbb{E}$	$\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right)$	$\lambda,\,\mu>0,\,f$ proper closed convex	Corollary 4
$\lambda d_C(\mathbf{x})$	$\mathbb{E}$	$\mathbf{x}+\min\left\{\frac{\lambda}{d_C(\mathbf{x})},1\right\}(P_C(\mathbf{x})-\mathbf{x})$	$\emptyset\ne C$ closed and convex, $\lambda>0$	Lemma 3
$\frac{\lambda}{2}d_C^2(\mathbf{x})$	$\mathbb{E}$	$\frac{\lambda}{\lambda+1}P_C(\mathbf{x})+\frac{1}{\lambda+1}\mathbf{x}$	$\emptyset\ne C$ closed and convex, $\lambda>0$	Example 29
$\lambda H_{\mu}(\mathbf{x})$	$\mathbb{E}$	$\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}$	$\lambda,\,\mu>0$	Example 30
$\rho\Vert\mathbf{x}\Vert_1^2$	$\mathbb{R}^n$	$\left(\frac{v_ix_i}{v_i+2\rho}\right)_{i=1}^n,\,\mathbf{v}=\left[\sqrt{\frac{\rho}{\mu}}\mathrm{abs}(\mathbf{x})-2\rho\right]_+,\,\mathbf{e}^T\mathbf{v}=1$	$\rho>0$	Lemma 7
$\lambda\Vert\mathbf{Ax}\Vert_2$	$\mathbb{R}^n$	$\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax}$, where $\alpha^*=0$ if $\Vert\mathbf{v}_0\Vert_2\le\lambda$ and otherwise $\alpha^*>0$ satisfies $\Vert\mathbf{v}_{\alpha^*}\Vert_2=\lambda$; here $\mathbf{v}_{\alpha}\equiv\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax}$	$\mathbf{A}\in\mathbb{R}^{m\times n}$ with full row rank, $\lambda>0$	Lemma 5

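¹ In English, "proximal" is often abbreviated as "prox".
² If $f:\mathbb{R}^n\to\mathbb{R}$ is a proper closed convex separable function, $f(\mathbf{x})=\sum_{i=1}^nf_i(x_i),$ where each $f_i$ is a proper closed convex function of one variable, then by the first prox theorem the conclusion of Theorem 3 can be written as $\mathrm{prox}_f(\mathbf{x})=\left(\mathrm{prox}_{f_i}(x_i)\right)_{i=1}^n.$
³ The existence and uniqueness of $\tilde{\mathbf{z}},\tilde{\mathbf{u}}$ follow from $g$ being a proper closed convex function.
⁴ The equivalence follows from $\mathrm{dom}(g)\subset[0,\infty)$.
⁵ This is unique because $g$ is proper closed and convex.
⁶ The projection onto the box $\text{Box}[\bm{\ell},\mathbf{u}]$ can be computed componentwise using Lemma 2; the equation $\mathbf{a}^TP_{\text{Box}}(\mathbf{x}-\mu\mathbf{a})=b$ can be solved by a simple root-finding method such as bisection. This works because $\varphi(\mu)=\mathbf{a}^TP_{\text{Box}}(\mathbf{x}-\mu\mathbf{a})-b$ is monotone. Indeed, $\varphi(\mu)=\sum_{i=1}^na_i\min\{\max\{x_i-\mu a_i,\ell_i\},u_i\}-b$, and for every $i$ the map $\mu\mapsto a_i\min\{\max\{x_i-\mu a_i,\ell_i\},u_i\}$ is nonincreasing.
⁷ The theorem assumes that $f$ is closed, but this does not imply that $\mathrm{dom}(f)$ is a closed set, in which case $P_{\mathrm{dom}(f)}(\mathbf{x})$ need not exist. See Example 16 for a counterexample.
⁸ Here $\theta$ is used only when $d_C(\mathbf{x})>\lambda>0$, so $\theta$ is well defined.
⁹ Not to be confused with the smoothness parameter $L$ of Chapter 5.
¹⁰ The definition of the Moreau envelope amounts to adding the strongly convex term $\frac{1}{2\mu}\Vert\mathbf{x-u}\Vert^2$ to the original function $f$. By Theorem 22, the larger $\mu$ is, the smaller the smoothness constant $\frac{1}{\mu}$ of $M_f^{\mu}$, and the less the strongly convex term contributes to the optimization problem. This is why $\mu$ is called the smoothing parameter.
¹¹ Note that if $\lambda_i=0$, this formula gives $u_i=0$, so it also covers the point where $\varphi$ is discontinuous.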
