First Order Methods in Optimization Ch6. The Proximal Operator

Chapter 6: The Proximal Operator

Throughout this chapter, the underlying space $\mathbb{E}$ is assumed to be Euclidean.

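This chapter introduces the proximal mapping, which underlies many of the algorithms in the second half of the book. Because Moreau was the first to study the proximal operator and its properties, the mapping is also called the "Moreau proximal mapping".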

1. Definition, Existence, and Uniqueness

Definition 1 (proximal mapping) Given a function $f:\mathbb{E}\to(-\infty,\infty]$, the proximal mapping $\mathrm{prox}_f$ of $f$ is defined by $$\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\},\quad\forall\mathbf{x}\in\mathbb{E}.$$ That is, $\mathrm{prox}_f$ maps each $\mathbf{x}\in\mathbb{E}$ to a subset of $\mathbb{E}$. This subset may be empty, may be a singleton, or may contain several vectors. The following example illustrates all three cases.

Example 1 Consider the following three functions from $\mathbb{R}$ to $\mathbb{R}$: $$\begin{aligned}g_1(x)&\equiv0,\\g_2(x)&=\begin{cases}0, & x\ne0,\\-\lambda, & x=0,\end{cases}\\g_3(x)&=\begin{cases}0, & x\ne0,\\\lambda, & x=0,\end{cases}\end{aligned}$$ where $\lambda>0$ is a given constant. Note that $g_2$ and $g_3$ are discontinuous.

  • The prox of $g_1$: $$\mathrm{prox}_{g_1}(x)=\arg\min_{u\in\mathbb{R}}\left\{g_1(u)+\frac{1}{2}(u-x)^2\right\}=\arg\min_{u\in\mathbb{R}}\left\{\frac{1}{2}(u-x)^2\right\}=\{x\},$$ so the prox of $g_1$ is always a singleton;
  • The prox of $g_2$: write $\mathrm{prox}_{g_2}(x)=\arg\min_{u\in\mathbb{R}}\tilde g_2(u,x)$, where $$\tilde g_2(u,x)\equiv g_2(u)+\frac{1}{2}(u-x)^2=\begin{cases}-\lambda+\frac{x^2}{2}, & u=0,\\\frac{1}{2}(u-x)^2, & u\ne0.\end{cases}$$ If $x\ne0$, the minimum of $\frac{1}{2}(u-x)^2$ over $\mathbb{R}\setminus\{0\}$ is attained at $u=x$ $(\ne0)$, with minimal value $0$. In this case, if $0>-\lambda+\frac{x^2}{2}$, the unique global minimizer of $\tilde g_2(\cdot,x)$ over $\mathbb{R}$ is $u=0$; if $0<-\lambda+\frac{x^2}{2}$, the unique global minimizer is $u=x$; and if $0=-\lambda+\frac{x^2}{2}$, both $0$ and $x$ are global minimizers. Finally, if $x=0$, then clearly $\mathrm{prox}_{g_2}(x)=\{0\}$. Altogether, $$\mathrm{prox}_{g_2}(x)=\begin{cases}\{0\}, & |x|<\sqrt{2\lambda},\\\{x\}, & |x|>\sqrt{2\lambda},\\\{0,x\}, & |x|=\sqrt{2\lambda};\end{cases}$$
  • The prox of $g_3$: the computation is similar to the one for $g_2$, so we state the result directly: $$\mathrm{prox}_{g_3}(x)=\begin{cases}\{x\}, & x\ne0,\\\emptyset, & x=0.\end{cases}$$

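The first prox theorem below gives a sufficient condition for the prox to be a singleton: if $f$ is proper, closed, and convex, then $\mathrm{prox}_f(\mathbf{x})$ is always a singleton, that is, the prox exists and is unique. Although the condition is only sufficient, it explains why the prox of $g_1$ above is always a singleton. Conversely, since $\mathrm{prox}_{g_2}$ and $\mathrm{prox}_{g_3}$ are not always singletons, neither $g_2$ nor $g_3$ is a proper closed convex function.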

Theorem 1 (first prox theorem) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then $\mathrm{prox}_f(\mathbf{x})$ is a singleton for every $\mathbf{x}\in\mathbb{E}$.

Proof: For any $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\tilde f(\mathbf{u},\mathbf{x}),$$ where $\tilde f(\mathbf{u},\mathbf{x})=f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2$. Since $\frac{1}{2}\Vert\cdot-\mathbf{x}\Vert^2$ is closed and strongly convex and $f$ is closed and convex, Lemma 1 of Chapter 5 together with Theorem 2(ii) of Chapter 2 shows that $\tilde f(\cdot,\mathbf{x})$ is closed and strongly convex. Clearly $\tilde f(\cdot,\mathbf{x})$ is also proper. Hence, by Theorem 7 of Chapter 5, $\tilde f(\cdot,\mathbf{x})$ has a unique global minimizer over $\mathbb{E}$.

Since the vast majority of the functions considered in this chapter are proper, closed, and convex, we treat $\mathrm{prox}_f$ as a single-valued mapping from $\mathbb{E}$ to $\mathbb{E}$ and write $\mathrm{prox}_f(\mathbf{x})=\mathbf{y}$ rather than $\mathrm{prox}_f(\mathbf{x})=\{\mathbf{y}\}$.

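If we relax the assumptions of the first prox theorem and require only that the function be proper and closed, we can still show, under a suitable coercivity assumption, that $\mathrm{prox}_f(\mathbf{x})$ is never empty.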

Theorem 2 (nonemptiness of the prox under closedness and coercivity) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed function, and assume that the function $\mathbf{u}\mapsto f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2$ is coercive for every $\mathbf{x}\in\mathbb{E}$. Then $\mathrm{prox}_f(\mathbf{x})\ne\emptyset$ for every $\mathbf{x}\in\mathbb{E}$.

Proof: For any $\mathbf{x}\in\mathbb{E}$, the function $h(\mathbf{u})\equiv f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2$ is proper, closed, and coercive. By Theorem 5 of Chapter 2, $h$ attains its minimum over $\mathbb{E}$, so $\mathrm{prox}_f(\mathbf{x})$ is nonempty.

Both $g_2$ and $g_3$ in Example 1 satisfy the coercivity assumption, but only $g_2$ is closed. It is therefore no surprise that $\mathrm{prox}_{g_2}(x)$ is never empty, whereas $\mathrm{prox}_{g_3}(x)$ is empty for certain values of $x$.

2. Examples of Proximal Mappings

This section computes the proximal mappings of several proper closed convex functions. By Theorem 1, all of them are single-valued.

2.1 Constant Functions

Let $f\equiv c\in\mathbb{R}$. Then $$\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{c+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}=\mathbf{x}.$$ Therefore, $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x}}$ is the identity mapping.

2.2 Affine Functions

Let $f(\mathbf{x})=\langle\mathbf{a},\mathbf{x}\rangle+b$, where $\mathbf{a}\in\mathbb{E},\,b\in\mathbb{R}$. Then $$\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\langle\mathbf{a},\mathbf{u}\rangle+b+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\langle\mathbf{a},\mathbf{x}\rangle+b-\frac{1}{2}\Vert\mathbf{a}\Vert^2+\frac{1}{2}\Vert\mathbf{u}-(\mathbf{x}-\mathbf{a})\Vert^2\right\}\\&=\mathbf{x}-\mathbf{a}.\end{aligned}$$ Therefore, $\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x}-\mathbf{a}}$ is a translation.

2.3 Convex Quadratic Functions

Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}+\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}\in\mathbb{S}_+^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$. Then $$\mathrm{prox}_f(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{R}^n}\left\{\frac{1}{2}\mathbf{u}^T\mathbf{A}\mathbf{u}+\mathbf{b}^T\mathbf{u}+c+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}.$$ Since the objective is strictly convex, its minimizer is the point where the gradient vanishes: $$\mathbf{A}\mathbf{u}+\mathbf{b}+\mathbf{u}-\mathbf{x}=\mathbf{0}\Rightarrow(\mathbf{A}+\mathbf{I})\mathbf{u}=\mathbf{x}-\mathbf{b}.$$ Because $\mathbf{A}$ is positive semidefinite, $\mathbf{A}+\mathbf{I}$ is positive definite and hence invertible. Therefore, $$\boxed{\mathrm{prox}_f(\mathbf{x})=(\mathbf{A}+\mathbf{I})^{-1}(\mathbf{x}-\mathbf{b}).}$$
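The formula above amounts to a single linear solve. The following is a minimal sketch in plain Python, assuming a small dense matrix stored as nested lists; the helper names `solve_linear` and `prox_quadratic` are illustrative and do not come from the text.

```python
def solve_linear(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy [M | rhs] so the inputs stay untouched.
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (aug[i][n] - s) / aug[i][i]
    return u


def prox_quadratic(A, b, x):
    """prox of f(u) = 0.5 u^T A u + b^T u + c, i.e. the solution of (A + I) u = x - b."""
    n = len(x)
    A_plus_I = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return solve_linear(A_plus_I, [x[i] - b[i] for i in range(n)])


A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite, so also in S_+^n
b = [1.0, -1.0]
x = [3.0, 2.0]
u = prox_quadratic(A, b, x)
# The gradient A u + b + u - x of the prox objective should vanish at u.
residual = [sum(A[i][j] * u[j] for j in range(2)) + b[i] + u[i] - x[i]
            for i in range(2)]
```

The constant $c$ does not enter the computation, which matches the formula. For larger problems one would normally factor $\mathbf{A}+\mathbf{I}$ once with a linear algebra library and reuse the factorization.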

2.4 One-Dimensional Examples

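The following lemma collects the prox computations for several one-dimensional proper closed convex functions. These results play an important role in later derivations.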

Lemma 1 The following functions on $\mathbb{R}$ have the prox mappings listed next to them: $$\begin{aligned}g_1(x)&=\begin{cases}\mu x, & x\ge0,\\\infty, & x<0,\end{cases} &&\mathrm{prox}_{g_1}(x)=[x-\mu]_+,\\g_2(x)&=\lambda|x|, &&\mathrm{prox}_{g_2}(x)=[|x|-\lambda]_+\mathrm{sgn}(x),\\g_3(x)&=\begin{cases}\lambda x^3, & x\ge0,\\\infty, & x<0,\end{cases} &&\mathrm{prox}_{g_3}(x)=\frac{-1+\sqrt{1+12\lambda[x]_+}}{6\lambda},\\g_4(x)&=\begin{cases}-\lambda\log x, & x>0,\\\infty, & x\le0,\end{cases} &&\mathrm{prox}_{g_4}(x)=\frac{x+\sqrt{x^2+4\lambda}}{2},\\g_5(x)&=\delta_{[0,\eta]\cap\mathbb{R}}(x), &&\mathrm{prox}_{g_5}(x)=\min\{\max\{x,0\},\eta\},\end{aligned}$$ where $\lambda\in\mathbb{R}_{++},\,\eta\in[0,\infty],\,\mu\in\mathbb{R}$.

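Proof: The proof repeatedly uses two facts:
(i) if a convex function $f$ satisfies $f'(u)=0$ at some point $u$, then $u$ is a global minimizer of $f$;
(ii) if a convex function has a global minimizer and the minimum is not attained at a point of differentiability, then it must be attained at a point of nondifferentiability.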

  • The prox of $g_1$: by definition, $\mathrm{prox}_{g_1}(x)$ is the global minimizer of $$f(u)=\begin{cases}\infty, & u<0,\\f_1(u), & u\ge0,\end{cases}$$ where $f_1(u)=\mu u+\frac{1}{2}(u-x)^2$. First, $f_1'(u)=0$ if and only if $u=x-\mu$. If $x>\mu$, then $f'(x-\mu)=f_1'(x-\mu)=0$, so $\mathrm{prox}_{g_1}(x)=x-\mu$. If $x\le\mu$, then $f'$ does not vanish at any point of $(0,\infty)$, so the minimum of $f$ is not attained at a point of differentiability and can only be attained at $0$. Hence $\mathrm{prox}_{g_1}(x)=[x-\mu]_+$.
  • The prox of $g_2$: $\mathrm{prox}_{g_2}(x)$ is the global minimizer of $$h(u)=\begin{cases}h_1(u)\equiv\lambda u+\frac{1}{2}(u-x)^2, & u>0,\\h_2(u)\equiv-\lambda u+\frac{1}{2}(u-x)^2, & u\le0.\end{cases}$$ If $x>\lambda$, then $u=x-\lambda>0$ satisfies $h_1'(u)=\lambda+u-x=0$, so $\mathrm{prox}_{g_2}(x)=x-\lambda$. Similarly, if $x<-\lambda$, then $\mathrm{prox}_{g_2}(x)=x+\lambda$. If $|x|\le\lambda$, then $\mathrm{prox}_{g_2}(x)$ must be the unique point of nondifferentiability of $h$, namely $0$. The three cases combine into $\mathrm{prox}_{g_2}(x)=[|x|-\lambda]_+\mathrm{sgn}(x)$.
  • The prox of $g_3$: $\mathrm{prox}_{g_3}(x)$ is the global minimizer of $$s(u)=\begin{cases}\lambda u^3+\frac{1}{2}(u-x)^2, & u\ge0,\\\infty, & u<0.\end{cases}$$ If the global minimizer is positive, then $\tilde u=\mathrm{prox}_{g_3}(x)$ satisfies $s'(\tilde u)=0$, that is, $$3\lambda\tilde u^2+\tilde u-x=0.$$ This equation has a positive root if and only if $x>0$, in which case $\mathrm{prox}_{g_3}(x)=\tilde u=\frac{-1+\sqrt{1+12\lambda x}}{6\lambda}$. If $x\le0$, the global minimizer of $s$ can only be a point of nondifferentiability, which must be the point $0$ of the domain.
  • The prox of $g_4$: $\tilde u=\mathrm{prox}_{g_4}(x)$ is the global minimizer over $u>0$ of $t(u)=-\lambda\log u+\frac{1}{2}(u-x)^2$. Setting the derivative of $t$ to $0$ gives $$-\frac{\lambda}{\tilde u}+(\tilde u-x)=0\Rightarrow\tilde u^2-\tilde ux-\lambda=0.$$ Since this equation always has a positive root, $\tilde u$ is attained in $\mathbb{R}_{++}$ and $$\mathrm{prox}_{g_4}(x)=\tilde u=\frac{x+\sqrt{x^2+4\lambda}}{2}.$$
  • The prox of $g_5$: first assume $\eta<\infty$. In this case $\tilde u=\mathrm{prox}_{g_5}(x)$ is the global minimizer of $w(u)=\frac{1}{2}(u-x)^2$ over $[0,\eta]$. Clearly the global minimizer of $w$ over $\mathbb{R}$ is $u=x$. Hence, if $0\le x\le\eta$, then $\tilde u=x$; if $x<0$, then $w$ is increasing on $[0,\eta]$, so $\tilde u=0$; if $x>\eta$, then $w$ is decreasing on $[0,\eta]$, so $\tilde u=\eta$. Thus $$\mathrm{prox}_{g_5}(x)=\tilde u=\begin{cases}x, & 0\le x\le\eta,\\0, & x<0,\\\eta, & x>\eta,\end{cases}=\min\{\max\{x,0\},\eta\}.$$ Now consider $\eta=\infty$. Then $g_5(x)=\delta_{[0,\infty)}(x)$ is exactly $g_1$ with $\mu=0$, so $\mathrm{prox}_{g_5}(x)=[x]_+$. This can also be written as $\mathrm{prox}_{g_5}(x)=\min\{\max\{x,0\},\infty\}$.
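As a sanity check on the five closed forms of Lemma 1, they can be compared with a brute-force minimization of $g(u)+\frac{1}{2}(u-x)^2$ over a fine grid. The sketch below is only illustrative: the function names, the grid range $[-5,5]$, and the grid size are assumptions made for this check, not part of the lemma.

```python
import math


def pos(t):
    """Positive part [t]_+."""
    return max(t, 0.0)


def prox_g1(x, mu):    # g1(u) = mu*u for u >= 0, +inf otherwise
    return pos(x - mu)


def prox_g2(x, lam):   # g2(u) = lam*|u|
    return math.copysign(pos(abs(x) - lam), x)


def prox_g3(x, lam):   # g3(u) = lam*u**3 for u >= 0, +inf otherwise
    return (-1.0 + math.sqrt(1.0 + 12.0 * lam * pos(x))) / (6.0 * lam)


def prox_g4(x, lam):   # g4(u) = -lam*log(u) for u > 0, +inf otherwise
    return (x + math.sqrt(x * x + 4.0 * lam)) / 2.0


def prox_g5(x, eta):   # g5 = indicator of [0, eta]; eta may be math.inf
    return min(max(x, 0.0), eta)


def grid_prox(g, x, lo=-5.0, hi=5.0, n=20001):
    """Brute-force minimizer of g(u) + 0.5*(u - x)**2 over a uniform grid."""
    best_u, best_val = None, math.inf
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        val = g(u) + 0.5 * (u - x) ** 2
        if val < best_val:
            best_u, best_val = u, val
    return best_u


# Example: the closed form for g2 against the grid search.
lam = 0.5
x = 1.7
closed_form = prox_g2(x, lam)
brute_force = grid_prox(lambda u: lam * abs(u), x)
```

The two values should agree up to the grid spacing. The grid search is only meaningful when the true minimizer lies inside the chosen range.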

3. Prox Calculus Rules

This section presents several results that help in computing proximal mappings. Some of them require no convexity or closedness assumptions at all.

Theorem 3 (prox of separable functions) Let $f:\mathbb{E}_1\times\mathbb{E}_2\times\cdots\times\mathbb{E}_m\to(-\infty,\infty]$ be defined by $$f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mf_i(\mathbf{x}_i),\quad\forall\mathbf{x}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m.$$ Then for all $\mathbf{x}_1\in\mathbb{E}_1,\mathbf{x}_2\in\mathbb{E}_2,\ldots,\mathbf{x}_m\in\mathbb{E}_m$, $$\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\mathrm{prox}_{f_1}(\mathbf{x}_1)\times\mathrm{prox}_{f_2}(\mathbf{x}_2)\times\cdots\times\mathrm{prox}_{f_m}(\mathbf{x}_m).$$

Proof: $$\begin{aligned}\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)&=\arg\min_{\mathbf{y}_1,\mathbf{y}_2,\ldots,\mathbf{y}_m}\sum_{i=1}^m\left[\frac{1}{2}\Vert\mathbf{y}_i-\mathbf{x}_i\Vert^2+f_i(\mathbf{y}_i)\right]\\&=\prod_{i=1}^m\arg\min_{\mathbf{y}_i}\left[\frac{1}{2}\Vert\mathbf{y}_i-\mathbf{x}_i\Vert^2+f_i(\mathbf{y}_i)\right]\\&=\prod_{i=1}^m\mathrm{prox}_{f_i}(\mathbf{x}_i).\end{aligned}$$

Example 2 (prox of the $\ell_1$-norm) Let $g:\mathbb{R}^n\to\mathbb{R}$ be defined by $g(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_1$, where $\lambda>0$. Then $$g(\mathbf{x})=\sum_{i=1}^n\varphi(x_i),$$ where $\varphi(t)=\lambda|t|$. By $\mathrm{prox}_{g_2}$ in Lemma 1, $\mathrm{prox}_{\varphi}(s)=\mathcal{T}_{\lambda}(s)$, where $\mathcal{T}_{\lambda}$ is defined by $$\mathcal{T}_{\lambda}(y)=[|y|-\lambda]_+\mathrm{sgn}(y)=\begin{cases}y-\lambda, & y\ge\lambda,\\0, & |y|<\lambda,\\y+\lambda, & y\le-\lambda.\end{cases}$$ The function $\mathcal{T}_{\lambda}$ is called the soft thresholding function; its graph is shown in the figure below.
[Figure: graph of the soft thresholding function $\mathcal{T}_{\lambda}$]
By Theorem 3, $\mathrm{prox}_g(\mathbf{x})=\left(\mathcal{T}_{\lambda}(x_j)\right)_{j=1}^n$. For convenience, we extend the definition of the soft thresholding function to $\mathbb{R}^n$: for all $\mathbf{x}\in\mathbb{R}^n$, $$\mathcal{T}_{\lambda}(\mathbf{x})\equiv\left(\mathcal{T}_{\lambda}(x_j)\right)_{j=1}^n=[|\mathbf{x}|-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x}).$$ Therefore, $$\boxed{\mathrm{prox}_g(\mathbf{x})=\mathcal{T}_{\lambda}(\mathbf{x}).}$$
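The vector soft thresholding operator is a one-line componentwise rule. The sketch below uses plain Python lists; the names `soft_threshold` and `prox_objective` are illustrative choices, and the objective helper is included only to let one compare the prox point with nearby points.

```python
def soft_threshold(x, lam):
    """Componentwise T_lam(x_j) = max(|x_j| - lam, 0) * sgn(x_j)."""
    out = []
    for xj in x:
        mag = max(abs(xj) - lam, 0.0)
        out.append(mag if xj >= 0 else -mag)
    return out


def prox_objective(u, x, lam):
    """Value of lam*||u||_1 + 0.5*||u - x||^2, the function the prox minimizes."""
    return (lam * sum(abs(uj) for uj in u)
            + 0.5 * sum((uj - xj) ** 2 for uj, xj in zip(u, x)))


x = [3.0, -0.2, 0.5, -4.0]
p = soft_threshold(x, 1.0)   # entries with |x_j| <= 1 are mapped to zero
```

Entries whose magnitude is at most $\lambda$ are set to zero and the remaining entries are shrunk toward zero by $\lambda$, which is why this operator tends to produce sparse vectors.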

Example 3 (prox of the negative sum of logs) Let $g:\mathbb{R}^n\to(-\infty,\infty]$ be defined by $$g(\mathbf{x})=\begin{cases}-\lambda\sum_{j=1}^n\log x_j, & \mathbf{x}>\mathbf{0},\\\infty, & \text{otherwise},\end{cases}$$ where $\lambda>0$. Then $g(\mathbf{x})=\sum_{i=1}^n\varphi(x_i)$, where $$\varphi(t)=\begin{cases}-\lambda\log t, & t>0,\\\infty, & t\le0.\end{cases}$$ By $\mathrm{prox}_{g_4}$ in Lemma 1, $$\mathrm{prox}_{\varphi}(s)=\frac{s+\sqrt{s^2+4\lambda}}{2}.$$ Finally, by Theorem 3, $$\boxed{\mathrm{prox}_g(\mathbf{x})=\left(\mathrm{prox}_{\varphi}(x_j)\right)_{j=1}^n=\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n.}$$
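A short numerical check of this formula: each component of the result should be positive and satisfy the stationarity equation $u_j-x_j-\lambda/u_j=0$ from the proof of Lemma 1. The function name `prox_neg_log_sum` and the sample data are illustrative assumptions.

```python
import math


def prox_neg_log_sum(x, lam):
    """Componentwise prox of -lam * sum(log x_j): (x_j + sqrt(x_j^2 + 4*lam)) / 2."""
    return [(xj + math.sqrt(xj * xj + 4.0 * lam)) / 2.0 for xj in x]


x = [-1.0, 0.0, 2.5]
lam = 0.7
u = prox_neg_log_sum(x, lam)
# Stationarity of -lam*log(u_j) + 0.5*(u_j - x_j)**2 in each coordinate.
residuals = [uj - xj - lam / uj for uj, xj in zip(u, x)]
```

Note that the output is strictly positive even when an input component is negative or zero, as the domain of $g$ requires.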

Example 4 (prox of the $\ell_0$-norm) Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_0$, where $\lambda>0$ and $\Vert\mathbf{x}\Vert_0=\#\{i:x_i\ne0\}$. Then for all $\mathbf{x}\in\mathbb{R}^n$, $$f(\mathbf{x})=\sum_{i=1}^nI(x_i),$$ where $$I(t)=\begin{cases}\lambda, & t\ne0,\\0, & t=0.\end{cases}$$ Note that $I(\cdot)=J(\cdot)+\lambda$, where $$J(t)=\begin{cases}0, & t\ne0,\\-\lambda, & t=0,\end{cases}$$ and, by $\mathrm{prox}_{g_2}$ in Example 1, $$\mathrm{prox}_J(s)=\begin{cases}\{0\}, & |s|<\sqrt{2\lambda},\\\{s\}, & |s|>\sqrt{2\lambda},\\\{0,s\}, & |s|=\sqrt{2\lambda}.\end{cases}$$ We introduce the hard thresholding mapping $\mathcal{H}_{\alpha}$, defined by $$\mathcal{H}_{\alpha}(s)\equiv\begin{cases}\{0\}, & |s|<\alpha,\\\{s\}, & |s|>\alpha,\\\{0,s\}, & |s|=\alpha.\end{cases}$$ Thus $\mathrm{prox}_J(s)=\mathcal{H}_{\sqrt{2\lambda}}(s)$. Since $I$ and $J$ differ only by a constant, $\mathrm{prox}_I=\mathrm{prox}_J$. Hence, by Theorem 3, $$\boxed{\mathrm{prox}_f(\mathbf{x})=\mathcal{H}_{\sqrt{2\lambda}}(x_1)\times\mathcal{H}_{\sqrt{2\lambda}}(x_2)\times\cdots\times\mathcal{H}_{\sqrt{2\lambda}}(x_n).}$$
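Because the hard thresholding mapping is set-valued, a faithful implementation has to return sets. The sketch below represents each set as a list and builds the Cartesian product with `itertools.product`; the names `hard_threshold`, `prox_l0`, and `l0_objective` are illustrative. Exact floating-point ties $|s|=\alpha$ are fragile in practice, so the sample data is chosen to make $\sqrt{2\lambda}$ exactly representable.

```python
import math
from itertools import product


def hard_threshold(s, alpha):
    """Set-valued H_alpha(s), returned as a sorted list of its elements."""
    if abs(s) < alpha:
        return [0.0]
    if abs(s) > alpha:
        return [s]
    return sorted({0.0, s})   # tie: both 0 and s are minimizers


def prox_l0(x, lam):
    """All elements of the prox of lam*||.||_0 at x (product of the H sets)."""
    alpha = math.sqrt(2.0 * lam)
    return [list(p) for p in product(*(hard_threshold(xj, alpha) for xj in x))]


def l0_objective(u, x, lam):
    """Value of lam*||u||_0 + 0.5*||u - x||^2."""
    nnz = sum(1 for uj in u if uj != 0.0)
    return lam * nnz + 0.5 * sum((uj - xj) ** 2 for uj, xj in zip(u, x))


x = [2.0, 0.5, -3.0]
lam = 2.0                      # alpha = sqrt(4) = 2, so x[0] sits exactly on the tie
points = prox_l0(x, lam)
values = [l0_objective(p, x, lam) for p in points]
```

With this data the prox contains two points, and both attain the same objective value, which illustrates why the mapping cannot be single-valued for this nonconvex function.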

Theorem 4 (prox under scaling and translation) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, and let $\lambda\ne0,\,\mathbf{a}\in\mathbb{E}$. Define $f(\mathbf{x})=g(\lambda\mathbf{x}+\mathbf{a})$. Then $$\mathrm{prox}_f(\mathbf{x})=\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x}+\mathbf{a})-\mathbf{a}\right].$$

Proof: $$\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\lambda\mathbf{u}+\mathbf{a})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&\overset{\mathbf{z}=\lambda\mathbf{u}+\mathbf{a}}{=}\frac{1}{\lambda}\left[\arg\min_{\mathbf{z}}\left\{g(\mathbf{z})+\frac{1}{2}\left\Vert\frac{1}{\lambda}(\mathbf{z}-\mathbf{a})-\mathbf{x}\right\Vert^2\right\}-\mathbf{a}\right]\\&=\frac{1}{\lambda}\left[\arg\min_{\mathbf{z}}\left\{\lambda^2g(\mathbf{z})+\frac{1}{2}\Vert\mathbf{z}-(\lambda\mathbf{x}+\mathbf{a})\Vert^2\right\}-\mathbf{a}\right]\\&=\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x}+\mathbf{a})-\mathbf{a}\right].\end{aligned}$$
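To see Theorem 4 at work, take the one-dimensional case $g=|\cdot|$, so that $\mathrm{prox}_{\lambda^2g}=\mathcal{T}_{\lambda^2}$ by Lemma 1 and $f(u)=|\lambda u+a|$. The sketch below compares the resulting formula with a brute-force grid search; the function names, the grid range, and the sample parameters are assumptions made for the illustration.

```python
import math


def soft(t, tau):
    """Scalar soft thresholding T_tau(t)."""
    mag = max(abs(t) - tau, 0.0)
    return mag if t >= 0 else -mag


def prox_abs_affine(x, lam, a):
    """prox of f(u) = |lam*u + a| via Theorem 4: (T_{lam^2}(lam*x + a) - a) / lam."""
    return (soft(lam * x + a, lam * lam) - a) / lam


def grid_prox(f, x, lo=-10.0, hi=10.0, n=40001):
    """Brute-force minimizer of f(u) + 0.5*(u - x)**2 over a uniform grid."""
    best_u, best_val = None, math.inf
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        val = f(u) + 0.5 * (u - x) ** 2
        if val < best_val:
            best_u, best_val = u, val
    return best_u


lam, a, x = 2.0, 1.0, 3.0
by_theorem = prox_abs_affine(x, lam, a)
by_grid = grid_prox(lambda u: abs(lam * u + a), x)
```

The theorem allows negative $\lambda$ as well; replacing `lam = 2.0` by `lam = -2.0` exercises that case with the same code.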

Theorem 5 (prox of $\lambda g(\cdot/\lambda)$) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function and let $\lambda\ne0$. Define $f(\mathbf{x})=\lambda g(\mathbf{x}/\lambda)$. Then $$\mathrm{prox}_f(\mathbf{x})=\lambda\,\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda).$$

Proof: $$\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{\lambda g\left(\frac{\mathbf{u}}{\lambda}\right)+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&\overset{\mathbf{z}=\mathbf{u}/\lambda}{=}\lambda\arg\min_{\mathbf{z}}\left\{\lambda g(\mathbf{z})+\frac{1}{2}\Vert\lambda\mathbf{z}-\mathbf{x}\Vert^2\right\}\\&=\lambda\arg\min_{\mathbf{z}}\left\{\frac{g(\mathbf{z})}{\lambda}+\frac{1}{2}\left\Vert\mathbf{z}-\frac{\mathbf{x}}{\lambda}\right\Vert^2\right\}\\&=\lambda\,\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda).\end{aligned}$$
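A quick consistency check of Theorem 5, assuming $\lambda>0$ and taking $g(t)=-\log t$ for $t>0$ (that is, $g_4$ of Lemma 1 with parameter $1$). Then $f(u)=-\lambda\log(u/\lambda)$ differs from $-\lambda\log u$ only by a constant, so the result should coincide with the $g_4$ formula with parameter $\lambda$. The function names below are illustrative.

```python
import math


def prox_scaled_neg_log(s, c):
    """prox of c * (-log) at s, from the g_4 formula of Lemma 1 with parameter c."""
    return (s + math.sqrt(s * s + 4.0 * c)) / 2.0


def prox_via_theorem5(x, lam):
    """prox of f(u) = lam * g(u / lam) with g = -log, assuming lam > 0."""
    return lam * prox_scaled_neg_log(x / lam, 1.0 / lam)


x, lam = 1.3, 2.5
u = prox_via_theorem5(x, lam)
direct = prox_scaled_neg_log(x, lam)
# Stationarity of -lam*log(u/lam) + 0.5*(u - x)**2 reads u - x - lam/u = 0.
residual = u - x - lam / u
```

Both routes give the same point and the stationarity residual vanishes up to rounding, as the theorem predicts for this choice of $g$.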

Theorem 6 (prox under quadratic perturbation) Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, and let $f(\mathbf{x})=g(\mathbf{x})+\frac{c}{2}\Vert\mathbf{x}\Vert^2+\langle\mathbf{a},\mathbf{x}\rangle+\gamma$, where $c>0,\,\mathbf{a}\in\mathbb{E},\,\gamma\in\mathbb{R}$. Then $$\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x}-\mathbf{a}}{c+1}\right).$$

Proof: $$\begin{aligned}\mathrm{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\mathbf{u})+\frac{c}{2}\Vert\mathbf{u}\Vert^2+\langle\mathbf{a},\mathbf{u}\rangle+\gamma+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}\\&=\arg\min_{\mathbf{u}}\left\{g(\mathbf{u})+\frac{c+1}{2}\left\Vert\mathbf{u}-\left(\frac{\mathbf{x}-\mathbf{a}}{c+1}\right)\right\Vert^2\right\}\\&=\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x}-\mathbf{a}}{c+1}\right).\end{aligned}$$

Example 5 Consider the function $f:\mathbb{R}\to(-\infty,\infty]$ defined for all $x\in\mathbb{R}$ by $$f(x)=\begin{cases}\mu x, & 0\le x\le\alpha,\\\infty, & \text{otherwise},\end{cases}$$ where $\mu\in\mathbb{R},\,\alpha\in[0,\infty]$. First note that $f$ can be written as $$f(x)=\delta_{[0,\alpha]\cap\mathbb{R}}(x)+\mu x.$$ By $\mathrm{prox}_{g_5}$ in Lemma 1, $\mathrm{prox}_{\delta_{[0,\alpha]\cap\mathbb{R}}}(x)=\min\{\max\{x,0\},\alpha\}$. Applying Theorem 6 with $g=\delta_{[0,\alpha]\cap\mathbb{R}}$, $c=0,\,\mathbf{a}=\mu,\,\gamma=0$ (the proof of Theorem 6 only uses $c+1>0$, so $c=0$ is admissible), we obtain for all $x\in\mathbb{R}$ $$\boxed{\mathrm{prox}_f(x)=\mathrm{prox}_g(x-\mu)=\min\{\max\{x-\mu,0\},\alpha\}.}$$
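The boxed formula is a shifted clip. The following sketch evaluates it and provides the smooth part of the prox objective so the result can be compared with other feasible points; the function names are illustrative, and `math.inf` is used to represent $\alpha=\infty$.

```python
import math


def prox_linear_on_interval(x, mu, alpha=math.inf):
    """prox of f(u) = mu*u on [0, alpha] (+inf elsewhere): clip x - mu to [0, alpha]."""
    return min(max(x - mu, 0.0), alpha)


def objective(u, x, mu):
    """mu*u + 0.5*(u - x)**2, the prox objective restricted to feasible u."""
    return mu * u + 0.5 * (u - x) ** 2


p_bounded = prox_linear_on_interval(2.0, 0.5, 1.0)    # x - mu = 1.5 exceeds alpha = 1
p_unbounded = prox_linear_on_interval(-1.0, -3.0)     # alpha = inf, x - mu = 2
p_at_zero = prox_linear_on_interval(0.2, 1.0)         # x - mu < 0
```

The three calls cover the three branches of the clip: the upper bound is active, no bound is active, and the lower bound is active.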

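Unfortunately, no general formula is known for the prox of the composition of a function with a general affine mapping. The situation changes, however, when the associated linear transformation satisfies a certain orthogonality condition.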

Theorem 7 (prox of a composition with an affine mapping) Let $g:\mathbb{R}^m\to(-\infty,\infty]$ be a proper closed convex function, and let $f(\mathbf{x})=g(\mathcal{A}(\mathbf{x})+\mathbf{b})$, where $\mathbf{b}\in\mathbb{R}^m$ and $\mathcal{A}:\mathbb{V}\to\mathbb{R}^m$ is a linear transformation satisfying $\mathcal{A}\circ\mathcal{A}^T=\alpha\mathcal{I}$ for some constant $\alpha>0$. Then for all $\mathbf{x}\in\mathbb{V}$, $$\mathrm{prox}_f(\mathbf{x})=\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T\left(\mathrm{prox}_{\alpha g}(\mathcal{A}(\mathbf{x})+\mathbf{b})-\mathcal{A}(\mathbf{x})-\mathbf{b}\right).$$

Proof: By definition, $\mathrm{prox}_f(\mathbf{x})$ is the optimal solution of $$\min_{\mathbf{u}\in\mathbb{V}}\left\{g(\mathcal{A}(\mathbf{u})+\mathbf{b})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}.$$ After introducing a splitting variable $\mathbf{z}$, this problem is equivalent to the constrained problem $$\begin{array}{ll}\min_{\mathbf{u}\in\mathbb{V},\mathbf{z}\in\mathbb{R}^m} & g(\mathbf{z})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\\\mathrm{s.t.} & \mathbf{z}=\mathcal{A}(\mathbf{u})+\mathbf{b}.\end{array}$$ Let $(\tilde{\mathbf{z}},\tilde{\mathbf{u}})$ be its optimal solution, and note that $\tilde{\mathbf{u}}=\mathrm{prox}_f(\mathbf{x})$. Fix $\mathbf{z}=\tilde{\mathbf{z}}$. Then $\tilde{\mathbf{u}}$ is the optimal solution of $$\begin{array}{ll}\min_{\mathbf{u}\in\mathbb{V}} & \frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\\\mathrm{s.t.} & \mathcal{A}(\mathbf{u})=\tilde{\mathbf{z}}-\mathbf{b}.\end{array}$$ Since strong duality holds for this problem (see Chapter 3), its optimality conditions apply: there exists $\mathbf{y}\in\mathbb{R}^m$ such that $$\begin{aligned}\tilde{\mathbf{u}}&\in\arg\min_{\mathbf{u}\in\mathbb{V}}\left\{\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2+\langle\mathbf{y},\mathcal{A}(\mathbf{u})-\tilde{\mathbf{z}}+\mathbf{b}\rangle\right\},\\\mathcal{A}(\tilde{\mathbf{u}})&=\tilde{\mathbf{z}}-\mathbf{b}.\end{aligned}$$ The first relation gives $$\tilde{\mathbf{u}}=\mathbf{x}-\mathcal{A}^T(\mathbf{y}).$$ Substituting this into the second relation yields $$\mathcal{A}\left(\mathbf{x}-\mathcal{A}^T(\mathbf{y})\right)=\tilde{\mathbf{z}}-\mathbf{b}.$$ Using the condition $\mathcal{A}\circ\mathcal{A}^T=\alpha\mathcal{I}$, we obtain $$\alpha\mathbf{y}=\mathcal{A}(\mathbf{x})+\mathbf{b}-\tilde{\mathbf{z}},$$ and therefore $$\tilde{\mathbf{u}}=\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T(\tilde{\mathbf{z}}-\mathcal{A}(\mathbf{x})-\mathbf{b}).$$ This expresses $\tilde{\mathbf{u}}$ in terms of $\tilde{\mathbf{z}}$ and thereby eliminates the variable $\mathbf{u}$. Consequently, $$\begin{aligned}\tilde{\mathbf{z}}&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{g(\mathbf{z})+\frac{1}{2}\left\Vert\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T(\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b})-\mathbf{x}\right\Vert^2\right\}\\&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{g(\mathbf{z})+\frac{1}{2\alpha^2}\left\Vert\mathcal{A}^T(\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b})\right\Vert^2\right\}\\&=\arg\min_{\mathbf{z}\in\mathbb{R}^m}\left\{\alpha g(\mathbf{z})+\frac{1}{2}\left\Vert\mathbf{z}-\mathcal{A}(\mathbf{x})-\mathbf{b}\right\Vert^2\right\}\\&=\mathrm{prox}_{\alpha g}\left(\mathcal{A}(\mathbf{x})+\mathbf{b}\right),\end{aligned}$$ where the third equality uses $\Vert\mathcal{A}^T(\mathbf{v})\Vert^2=\langle\mathcal{A}(\mathcal{A}^T(\mathbf{v})),\mathbf{v}\rangle=\alpha\Vert\mathbf{v}\Vert^2$ and multiplies the objective by $\alpha$. Finally, substituting this expression for $\tilde{\mathbf{z}}$ into the expression for $\tilde{\mathbf{u}}$ completes the proof.

Example 6 Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, where $\mathbb{E}=\mathbb{R}^d$, and let $f:\mathbb{E}^m\to(-\infty,\infty]$ be defined by $$f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=g(\mathbf{x}_1+\mathbf{x}_2+\cdots+\mathbf{x}_m).$$ Then $f$ can be written as a composition: $f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=g(\mathcal{A}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m))$, where $\mathcal{A}:\mathbb{E}^m\to\mathbb{E}$ is the linear transformation $$\mathcal{A}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\mathbf{x}_1+\mathbf{x}_2+\cdots+\mathbf{x}_m.$$ The adjoint transformation $\mathcal{A}^T:\mathbb{E}\to\mathbb{E}^m$ is $$\mathcal{A}^T(\mathbf{x})=(\mathbf{x},\mathbf{x},\ldots,\mathbf{x}),$$ so for all $\mathbf{x}\in\mathbb{E}$, $$\mathcal{A}(\mathcal{A}^T(\mathbf{x}))=m\mathbf{x}.$$ Hence, taking $\alpha=m,\,\mathbf{b}=\mathbf{0}$ in Theorem 7, for all $(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)\in\mathbb{E}^m$, $$\boxed{\mathrm{prox}_f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)_j=\mathbf{x}_j+\frac{1}{m}\left(\mathrm{prox}_{mg}\left(\sum_{i=1}^m\mathbf{x}_i\right)-\sum_{i=1}^m\mathbf{x}_i\right),\quad j=1,2,\ldots,m.}$$
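The boxed formula says that every block receives the same correction. A minimal sketch, assuming for illustration that $g=\lambda\Vert\cdot\Vert_1$ on $\mathbb{R}^d$, so that $\mathrm{prox}_{mg}$ is soft thresholding with threshold $m\lambda$; the names `soft_threshold` and `prox_of_sum` are hypothetical.

```python
def soft_threshold(v, tau):
    """Componentwise soft thresholding, the prox of tau * ||.||_1."""
    return [max(abs(t) - tau, 0.0) * (1.0 if t >= 0 else -1.0) for t in v]


def prox_of_sum(blocks, prox_mg):
    """prox of f(x_1, ..., x_m) = g(x_1 + ... + x_m).

    `blocks` is a list of m vectors in R^d; `prox_mg` must implement prox_{m g}.
    """
    m = len(blocks)
    d = len(blocks[0])
    total = [sum(blk[k] for blk in blocks) for k in range(d)]
    p = prox_mg(total)
    # Every block is shifted by the same vector (prox_{mg}(total) - total) / m.
    shift = [(p[k] - total[k]) / m for k in range(d)]
    return [[blk[k] + shift[k] for k in range(d)] for blk in blocks]


lam = 0.5
blocks = [[1.0, 2.0], [3.0, -2.5]]   # m = 2 blocks in R^2
m = len(blocks)
result = prox_of_sum(blocks, lambda v: soft_threshold(v, m * lam))
```

A useful check is that the blocks of the result sum to $\mathrm{prox}_{mg}\left(\sum_i\mathbf{x}_i\right)$, which corresponds to the relation $\mathcal{A}(\tilde{\mathbf{u}})=\tilde{\mathbf{z}}$ in the proof of Theorem 7 with $\mathbf{b}=\mathbf{0}$.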

Example 7 Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=|\mathbf{a}^T\mathbf{x}|$, where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\}$. We can write $f$ as the composition $f(\mathbf{x})=g(\mathbf{a}^T\mathbf{x})$, where $g(t)=|t|$. By $\mathrm{prox}_{g_2}$ in Lemma 1, $\mathrm{prox}_{\lambda g}=\mathcal{T}_{\lambda}$, where $\mathcal{T}_{\lambda}(x)=[|x|-\lambda]_+\mathrm{sgn}(x)$ is the soft thresholding function. Taking $\alpha=\Vert\mathbf{a}\Vert^2,\,\mathbf{b}=\mathbf{0}$, and $\mathcal{A}:\mathbf{x}\mapsto\mathbf{a}^T\mathbf{x}$ in Theorem 7, we obtain $$\boxed{\mathrm{prox}_f(\mathbf{x})=\mathbf{x}+\frac{1}{\Vert\mathbf{a}\Vert^2}\left(\mathcal{T}_{\Vert\mathbf{a}\Vert^2}(\mathbf{a}^T\mathbf{x})-\mathbf{a}^T\mathbf{x}\right)\mathbf{a}.}$$
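The boxed formula translates directly into a few lines of code. In the sketch below the function name `prox_abs_linear` and the sample vectors are illustrative; the two calls exercise the case where $\mathbf{a}^T\mathbf{x}$ is shrunk and the case where it is thresholded to zero.

```python
def prox_abs_linear(a, x):
    """prox of f(x) = |a^T x| for a != 0, following the boxed formula."""
    alpha = sum(ai * ai for ai in a)                 # ||a||^2
    t = sum(ai * xi for ai, xi in zip(a, x))         # a^T x
    mag = max(abs(t) - alpha, 0.0)
    soft = mag if t >= 0 else -mag                   # T_{||a||^2}(a^T x)
    coef = (soft - t) / alpha
    return [xi + coef * ai for ai, xi in zip(a, x)]


a = [1.0, 2.0]
u_far = prox_abs_linear(a, [3.0, 4.0])    # |a^T x| = 11 > ||a||^2 = 5
u_near = prox_abs_linear(a, [1.0, 0.5])   # |a^T x| = 2 <= 5
inner_near = sum(ai * ui for ai, ui in zip(a, u_near))
```

When $|\mathbf{a}^T\mathbf{x}|\le\Vert\mathbf{a}\Vert^2$ the soft thresholding returns zero and the formula reduces to the orthogonal projection of $\mathbf{x}$ onto the hyperplane $\{\mathbf{u}:\mathbf{a}^T\mathbf{u}=0\}$, so `inner_near` should vanish up to rounding.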

Theorem 8 (norm composition) Let $f:\mathbb{E}\to\mathbb{R}$ be defined by $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where $g:\mathbb{R}\to(-\infty,\infty]$ is a proper closed convex function satisfying $\mathrm{dom}(g)\subset[0,\infty)$. Then $$\mathrm{prox}_f(\mathbf{x})=\begin{cases}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x}=\mathbf{0}.\end{cases}$$

Proof. By definition, $\mathrm{prox}_f(\mathbf{0})$ is the set of global minimizers of
$$\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}\Vert^2\right\}=\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u}\Vert^2\right\}.$$
With the change of variables $w=\Vert\mathbf{u}\Vert$, this becomes the equivalent problem (see footnote 4)
$$\min_{w\in\mathbb{R}}\left\{g(w)+\frac{1}{2}w^2\right\}.$$
The optimal solution of this problem is $\mathrm{prox}_g(0)$ (see footnote 5), so $\mathrm{prox}_f(\mathbf{0})$ consists of all $\mathbf{u}$ with $\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)$. Now consider the case $\mathbf{x}\ne\mathbf{0}$. We have
$$\begin{aligned}\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}&=\min_{\mathbf{u}\in\mathbb{E}}\left\{g(\Vert\mathbf{u}\Vert)+\frac{1}{2}\Vert\mathbf{u}\Vert^2-\langle\mathbf{u},\mathbf{x}\rangle+\frac{1}{2}\Vert\mathbf{x}\Vert^2\right\}\\&=\min_{\alpha\in\mathbb{R}_+}\min_{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\alpha}\left\{g(\alpha)+\frac{1}{2}\alpha^2-\langle\mathbf{u},\mathbf{x}\rangle+\frac{1}{2}\Vert\mathbf{x}\Vert^2\right\}.\end{aligned}$$
By the Cauchy-Schwarz inequality, the inner minimization problem is solved by $\mathbf{u}=\alpha\mathbf{x}/\Vert\mathbf{x}\Vert$, with optimal value $g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2$. To obtain $\mathrm{prox}_f(\mathbf{x})$ it therefore remains to solve the outer minimization problem, where the constraint $\alpha\in\mathbb{R}_+$ can be dropped because $\mathrm{dom}(g)\subseteq[0,\infty)$:
$$\begin{aligned}\alpha^*&=\arg\min_{\alpha\in\mathbb{R}_+}\left\{g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2\right\}\\&=\arg\min_{\alpha\in\mathbb{R}}\left\{g(\alpha)+\frac{1}{2}(\alpha-\Vert\mathbf{x}\Vert)^2\right\}\\&=\mathrm{prox}_g(\Vert\mathbf{x}\Vert).\end{aligned}$$
Hence $\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}$.

Example 8 (prox of the Euclidean norm). Let $f:\mathbb{E}\to\mathbb{R}$ be defined by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert$, where $\lambda>0$ and $\Vert\cdot\Vert$ is the Euclidean norm. Then $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where
$$g(t)=\left\{\begin{array}{ll}\lambda t, & t\ge0,\\\infty, & t<0.\end{array}\right.$$
By Theorem 8, for every $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x}=\mathbf{0}.\end{array}\right.$$
By the formula for $\mathrm{prox}_{g_1}$ in Lemma 1, $\mathrm{prox}_g(t)=[t-\lambda]_+$. Therefore $\mathrm{prox}_g(0)=0$ and $\mathrm{prox}_g(\Vert\mathbf{x}\Vert)=[\Vert\mathbf{x}\Vert-\lambda]_+$. Substituting, we obtain
$$\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}[\Vert\mathbf{x}\Vert-\lambda]_+\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\mathbf{0}, & \mathbf{x}=\mathbf{0}\end{array}\right.=\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\lambda\}}\right)\mathbf{x}.}$$
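The closed form of Example 8 (often called block soft-thresholding) translates directly into code. A minimal Python sketch, assuming list-of-floats vectors and $\lambda>0$; the name `prox_l2_norm` is illustrative.

```python
import math


def prox_l2_norm(x, lam):
    """prox of f(x) = lam * ||x||_2 with lam > 0, via (1 - lam / max(||x||, lam)) * x."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    factor = 1.0 - lam / max(norm, lam)   # equals 0 whenever ||x|| <= lam
    return [factor * xi for xi in x]


print(prox_l2_norm([0.3, 0.4], 1.0))  # [0.0, 0.0], since ||x|| = 0.5 <= lam
```

Using $\max\{\Vert\mathbf{x}\Vert,\lambda\}$ in the denominator avoids a separate branch for $\mathbf{x}=\mathbf{0}$, since the denominator is never smaller than $\lambda>0$.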

Example 9 (prox of the cubed Euclidean norm). Let $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert^3$, where $\lambda>0$. Then $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where
$$g(t)=\left\{\begin{array}{ll}\lambda t^3, & t\ge0,\\\infty, & t<0.\end{array}\right.$$
First, by Theorem 8, for every $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x}=\mathbf{0}.\end{array}\right.$$
Next, by the formula for $\mathrm{prox}_{g_3}$ in Lemma 1, $\mathrm{prox}_g(t)=\frac{-1+\sqrt{1+12\lambda[t]_+}}{6\lambda}$. Therefore $\mathrm{prox}_g(0)=0$ and
$$\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\frac{-1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}{6\lambda}\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\mathbf{0}, & \mathbf{x}=\mathbf{0}\end{array}\right.=\frac{2}{1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}\mathbf{x}.}$$
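As a sanity check on Example 9, the result $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$ should satisfy the stationarity condition $\mathbf{u}+3\lambda\Vert\mathbf{u}\Vert\mathbf{u}=\mathbf{x}$, that is, $3\lambda r^2+r=\Vert\mathbf{x}\Vert$ with $r=\Vert\mathbf{u}\Vert$. The Python sketch below (illustrative name `prox_cubed_norm`, list-of-floats vectors assumed) evaluates the closed form and that residual.

```python
import math


def prox_cubed_norm(x, lam):
    """prox of f(x) = lam * ||x||_2^3 with lam > 0, using the closed form of Example 9."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    factor = 2.0 / (1.0 + math.sqrt(1.0 + 12.0 * lam * norm))
    return [factor * xi for xi in x]


# Residual of 3*lam*r^2 + r = ||x|| for x = (3, 4), lam = 1 (so ||x|| = 5).
u = prox_cubed_norm([3.0, 4.0], 1.0)
r = math.sqrt(sum(ui * ui for ui in u))
print(abs(3.0 * r * r + r - 5.0))  # should be close to zero
```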

Example 10 (prox of the negative Euclidean norm). Let $f:\mathbb{E}\to\mathbb{R}$ be defined by $f(\mathbf{x})=-\lambda\Vert\mathbf{x}\Vert$, where $\lambda>0$. Here $f$ is not convex, so we cannot conclude that the prox is single-valued. However, $f$ is closed, and the mapping $\mathbf{u}\mapsto f(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2$ is coercive for every $\mathbf{x}\in\mathbb{E}$. By Theorem 2, $\mathrm{prox}_f(\mathbf{x})$ is therefore always nonempty. To compute it, first write $f(\mathbf{x})=g(\Vert\mathbf{x}\Vert)$, where
$$g(t)=\left\{\begin{array}{ll}-\lambda t, & t\ge0,\\\infty, & t<0.\end{array}\right.$$
Then, by Theorem 8, for every $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \mathbf{x}\ne\mathbf{0},\\\{\mathbf{u}\in\mathbb{E}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}, & \mathbf{x}=\mathbf{0}.\end{array}\right.$$
By the formula for $\mathrm{prox}_{g_1}$ in Lemma 1, $\mathrm{prox}_g(t)=[t+\lambda]_+$. Substituting, we obtain $\mathrm{prox}_g(0)=\lambda$ and
$$\boxed{\mathrm{prox}_f(\mathbf{x})=\left\{\begin{array}{ll}\left(1+\frac{\lambda}{\Vert\mathbf{x}\Vert}\right)\mathbf{x}, & \mathbf{x}\ne\mathbf{0},\\\{\mathbf{u}:\Vert\mathbf{u}\Vert=\lambda\}, & \mathbf{x}=\mathbf{0}.\end{array}\right.}$$

Example 11 (prox of the absolute value over a symmetric interval). Consider the function $f:\mathbb{R}\to(-\infty,\infty]$ defined by
$$f(x)=\left\{\begin{array}{ll}\lambda|x|, & |x|\le\alpha,\\\infty, & \text{otherwise},\end{array}\right.$$
where $\lambda\in[0,\infty),\,\alpha\in[0,\infty]$. Then $f(x)=g(|x|)$, where
$$g(x)=\left\{\begin{array}{ll}\lambda x, & 0\le x\le\alpha,\\\infty, & \text{otherwise}.\end{array}\right.$$
By Theorem 8, for every $x\in\mathbb{R}$,
$$\mathrm{prox}_f(x)=\left\{\begin{array}{ll}\mathrm{prox}_g(|x|)\frac{x}{|x|}, & x\ne0,\\\{u\in\mathbb{R}:|u|=\mathrm{prox}_g(0)\}, & x=0.\end{array}\right.$$
By Example 5, $\mathrm{prox}_g(x)=\min\{\max\{x-\lambda,0\},\alpha\}$. Substituting and noting that $\frac{x}{|x|}=\mathrm{sgn}(x)$ for all $x\ne0$, we obtain
$$\boxed{\mathrm{prox}_f(x)=\min\{\max\{|x|-\lambda,0\},\alpha\}\,\mathrm{sgn}(x).}$$

Example 12 (prox of the weighted $\ell_1$-norm over a box). Consider the function $f:\mathbb{R}^n\to(-\infty,\infty]$ defined by
$$f(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^n\omega_i|x_i|, & -\bm{\alpha}\le\mathbf{x}\le\bm{\alpha},\\\infty, & \text{otherwise},\end{array}\right.$$
where $\mathbf{x}\in\mathbb{R}^n,\,\bm{\omega}\in\mathbb{R}_+^n,\,\bm{\alpha}\in[0,\infty]^n$. Then $f(\mathbf{x})=\sum_{i=1}^nf_i(x_i)$, where
$$f_i(x)=\left\{\begin{array}{ll}\omega_i|x|, & -\alpha_i\le x\le\alpha_i,\\\infty, & \text{otherwise}.\end{array}\right.$$
By Example 11 and Theorem 3,
$$\boxed{\mathrm{prox}_f(\mathbf{x})=\left(\min\{\max\{|x_i|-\omega_i,0\},\alpha_i\}\,\mathrm{sgn}(x_i)\right)_{i=1}^n.}$$
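Because the prox in Example 12 is separable, it can be computed one coordinate at a time. A minimal Python sketch, assuming list-of-floats inputs and using `math.inf` for unbounded components of $\bm{\alpha}$; the name `prox_weighted_l1_box` is illustrative.

```python
import math


def prox_weighted_l1_box(x, w, alpha):
    """Componentwise min{max{|x_i| - w_i, 0}, alpha_i} * sgn(x_i)."""
    out = []
    for xi, wi, ai in zip(x, w, alpha):
        magnitude = min(max(abs(xi) - wi, 0.0), ai)
        sign = (xi > 0) - (xi < 0)        # sgn(xi) as -1, 0 or 1
        out.append(magnitude * sign)
    return out


# First coordinate is clipped at alpha_1 = 1.5, second is thresholded to zero.
print(prox_weighted_l1_box([3.0, -0.5, -4.0], [1.0, 1.0, 1.0], [1.5, math.inf, math.inf]))
```

The same function also evaluates the two-sided soft-thresholding operator $\mathcal{S}_{\mathbf{a},\mathbf{b}}$ used later in this chapter, with `w` in the role of $\mathbf{a}$ and `alpha` in the role of $\mathbf{b}$.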

3.1 Summary of Prox Computations

| $f(\mathbf{x})$ | $\mathrm{prox}_f(\mathbf{x})$ | Assumptions | Theorem |
|---|---|---|---|
| $\sum_{i=1}^mf_i(\mathbf{x}_i)$ | $\mathrm{prox}_{f_1}(\mathbf{x}_1)\times\cdots\times\mathrm{prox}_{f_m}(\mathbf{x}_m)$ | | 3 |
| $g(\lambda\mathbf{x}+\mathbf{a})$ | $\frac{1}{\lambda}\left[\mathrm{prox}_{\lambda^2g}(\lambda\mathbf{x}+\mathbf{a})-\mathbf{a}\right]$ | $\lambda\ne0,\,\mathbf{a}\in\mathbb{E}$, $g$ proper | 4 |
| $\lambda g(\mathbf{x}/\lambda)$ | $\lambda\,\mathrm{prox}_{g/\lambda}(\mathbf{x}/\lambda)$ | $\lambda\ne0$, $g$ proper | 5 |
| $g(\mathbf{x})+\frac{c}{2}\Vert\mathbf{x}\Vert^2+\langle\mathbf{a},\mathbf{x}\rangle+\gamma$ | $\mathrm{prox}_{\frac{1}{c+1}g}\left(\frac{\mathbf{x}-\mathbf{a}}{c+1}\right)$ | $\mathbf{a}\in\mathbb{E},\,c>0,\,\gamma\in\mathbb{R}$, $g$ proper | 6 |
| $g(\mathcal{A}(\mathbf{x})+\mathbf{b})$ | $\mathbf{x}+\frac{1}{\alpha}\mathcal{A}^T\left(\mathrm{prox}_{\alpha g}(\mathcal{A}(\mathbf{x})+\mathbf{b})-\mathcal{A}(\mathbf{x})-\mathbf{b}\right)$ | $\mathbf{b}\in\mathbb{R}^m,\,\mathcal{A}:\mathbb{V}\to\mathbb{R}^m,\,\mathcal{A}\circ\mathcal{A}^T=\alpha\mathcal{I},\,\alpha>0$, $g$ proper closed convex | 7 |
| $g(\Vert\mathbf{x}\Vert)$ | $\mathrm{prox}_g(\Vert\mathbf{x}\Vert)\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}$ if $\mathbf{x}\ne\mathbf{0}$; $\{\mathbf{u}:\Vert\mathbf{u}\Vert=\mathrm{prox}_g(0)\}$ if $\mathbf{x}=\mathbf{0}$ | $g$ proper closed convex, $\mathrm{dom}(g)\subseteq[0,\infty)$ | 8 |

4. Prox of Indicator Functions: Orthogonal Projections

4.1 The First Projection Theorem

Let $g:\mathbb{E}\to(-\infty,\infty]$ be defined by $g(\mathbf{x})=\delta_C(\mathbf{x})$, where $C$ is a nonempty set. Then
$$\mathrm{prox}_g(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{\delta_C(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u}-\mathbf{x}\Vert^2\right\}=\arg\min_{\mathbf{u}\in C}\Vert\mathbf{u}-\mathbf{x}\Vert^2=P_C(\mathbf{x}).$$
Thus the proximal mapping of the indicator function of a set is the orthogonal projection operator onto that same set.

Theorem 9. Let $C\subset\mathbb{E}$ be nonempty. Then $\mathrm{prox}_{\delta_C}(\mathbf{x})=P_C(\mathbf{x})$ for all $\mathbf{x}\in\mathbb{E}$.

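If, in addition to being nonempty, $C$ is closed and convex, then the indicator function $\delta_C$ is proper, closed and convex, and hence by the first prox theorem the orthogonal projection operator is single-valued.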

Theorem 10 (first projection theorem). Let $C\subset\mathbb{E}$ be a nonempty closed convex set. Then $P_C(\mathbf{x})$ is a singleton for every $\mathbf{x}\in\mathbb{E}$.

4.2 Examples in $\mathbb{R}^n$

Lemma 2 (orthogonal projections onto subsets of $\mathbb{R}^n$). The following are some nonempty closed convex subsets of $\mathbb{R}^n$ together with their orthogonal projections:
$$\begin{aligned}&\text{nonnegative orthant} &&C_1=\mathbb{R}_+^n, &&[\mathbf{x}]_+,\\&\text{box} &&C_2=\text{Box}[\bm{\ell},\mathbf{u}], &&(\min\{\max\{x_i,\ell_i\},u_i\})_{i=1}^n,\\&\text{affine set} &&C_3=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{A}\mathbf{x}=\mathbf{b}\}, &&\mathbf{x}-\mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}(\mathbf{A}\mathbf{x}-\mathbf{b}),\\&\ell_2\text{ ball} &&C_4=B_{\Vert\cdot\Vert_2}[\mathbf{c},r], &&\mathbf{c}+\frac{r}{\max\{\Vert\mathbf{x}-\mathbf{c}\Vert_2,r\}}(\mathbf{x}-\mathbf{c}),\\&\text{half-space} &&C_5=\{\mathbf{x}:\mathbf{a}^T\mathbf{x}\le\alpha\}, &&\mathbf{x}-\frac{[\mathbf{a}^T\mathbf{x}-\alpha]_+}{\Vert\mathbf{a}\Vert^2}\mathbf{a},\end{aligned}$$
where $\bm{\ell}\in[-\infty,\infty)^n$ and $\mathbf{u}\in(-\infty,\infty]^n$ with $\bm{\ell}\le\mathbf{u}$, $\mathbf{A}\in\mathbb{R}^{m\times n}$ with $\text{rank}(\mathbf{A})=m$, $\mathbf{b}\in\mathbb{R}^m$, $\mathbf{c}\in\mathbb{R}^n$, $r>0$, $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\}$, and $\alpha\in\mathbb{R}$.

The claims of Lemma 2 are easy to verify. Note that although we extend the notion of a box to the unbounded case, a box is always a subset of $\mathbb{R}^n$. For example, $\text{Box}[\mathbf{0},\infty\mathbf{e}]=\mathbb{R}_+^n$.
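Four of the five projections in Lemma 2 need only elementwise arithmetic and inner products. The Python sketch below implements them for list-of-floats vectors; all function names are illustrative. The affine-set projection is left out because it requires solving a linear system with $\mathbf{A}\mathbf{A}^T$, which is beyond a standard-library sketch.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def proj_nonneg(x):
    """Projection onto the nonnegative orthant: [x]_+."""
    return [max(xi, 0.0) for xi in x]


def proj_box(x, lo, hi):
    """Projection onto Box[lo, hi]; entries of lo/hi may be -inf/+inf."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lo, hi)]


def proj_l2_ball(x, c, r):
    """Projection onto the Euclidean ball with center c and radius r > 0."""
    d = [xi - ci for xi, ci in zip(x, c)]
    scale = r / max(math.sqrt(dot(d, d)), r)
    return [ci + scale * di for ci, di in zip(c, d)]


def proj_halfspace(x, a, alpha):
    """Projection onto {y : a^T y <= alpha} for a nonzero vector a."""
    step = max(dot(a, x) - alpha, 0.0) / dot(a, a)
    return [xi - step * ai for xi, ai in zip(x, a)]


print(proj_halfspace([2.0, 2.0], [1.0, 1.0], 2.0))  # [1.0, 1.0]
```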

4.3 Projection onto the Intersection of a Hyperplane and a Box

Theorem 11 (orthogonal projection onto the intersection of a hyperplane and a box). Let $C\subset\mathbb{R}^n$ be given by
$$C=H_{\mathbf{a},b}\cap\text{Box}[\bm{\ell},\mathbf{u}]=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{a}^T\mathbf{x}=b,\,\bm{\ell}\le\mathbf{x}\le\mathbf{u}\},$$
where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R},\,\bm{\ell}\in[-\infty,\infty)^n,\,\mathbf{u}\in(-\infty,\infty]^n$. Assume that $C\ne\emptyset$. Then
$$P_C(\mathbf{x})=P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a}),$$
where $\text{Box}[\bm{\ell},\mathbf{u}]=\{\mathbf{y}\in\mathbb{R}^n:\ell_i\le y_i\le u_i,\,i=1,2,\ldots,n\}$ and $\mu^*$ is a solution of the equation $\mathbf{a}^TP_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\mu\mathbf{a})=b$.

Proof. For any $\mathbf{x}\in\mathbb{R}^n$, its orthogonal projection onto $C$ is the unique optimal solution of
$$\min_{\mathbf{y}}\left\{\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert_2^2:\mathbf{a}^T\mathbf{y}=b,\,\bm{\ell}\le\mathbf{y}\le\mathbf{u}\right\}.$$
The Lagrangian of this problem is
$$L(\mathbf{y};\mu)=\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert_2^2+\mu(\mathbf{a}^T\mathbf{y}-b)=\frac{1}{2}\Vert\mathbf{y}-(\mathbf{x}-\mu\mathbf{a})\Vert^2_2-\frac{\mu^2}{2}\Vert\mathbf{a}\Vert_2^2+\mu(\mathbf{a}^T\mathbf{x}-b).$$
Since strong duality holds for this problem, we have the following optimality conditions: $\mathbf{y}^*$ is an optimal solution if and only if there exists $\mu^*\in\mathbb{R}$ such that
$$\begin{aligned}\mathbf{y}^*&\in\arg\min_{\bm{\ell}\le\mathbf{y}\le\mathbf{u}}L(\mathbf{y};\mu^*),\\\mathbf{a}^T\mathbf{y}^*&=b.\end{aligned}$$
Using the expression of the Lagrangian, the first condition reads
$$\mathbf{y}^*=P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a}),$$
and the feasibility condition becomes
$$\mathbf{a}^TP_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})=b$$
(see footnote 6).

Corollary 1 (orthogonal projection onto the unit simplex). For every $\mathbf{x}\in\mathbb{R}^n$,
$$P_{\Delta_n}(\mathbf{x})=[\mathbf{x}-\mu^*\mathbf{e}]_+,$$
where $\mu^*$ is a solution of the equation $\mathbf{e}^T[\mathbf{x}-\mu^*\mathbf{e}]_+-1=0$.

Proof. Apply Theorem 11 with $\mathbf{a}=\mathbf{e},\,b=1,\,\ell_i=0,\,u_i=\infty,\,i=1,2,\ldots,n$, and note that in this case $P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x})=[\mathbf{x}]_+$.
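Corollary 1 reduces the simplex projection to a scalar root-finding problem in $\mu$. The sketch below uses plain bisection, which is one possible choice and not something prescribed by the text; it relies on the fact that $h(\mu)=\mathbf{e}^T[\mathbf{x}-\mu\mathbf{e}]_+-1$ is continuous and nonincreasing, with $h(\min_ix_i-1)\ge0$ and $h(\max_ix_i)=-1$. The name `proj_simplex` and the tolerance parameters are illustrative.

```python
def proj_simplex(x, tol=1e-12, max_iter=200):
    """Projection onto the unit simplex via bisection on mu (a sketch, not the only method)."""
    lo = min(x) - 1.0   # h(lo) >= n - 1 >= 0
    hi = max(x)         # h(hi) = -1 < 0
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        h = sum(max(xi - mu, 0.0) for xi in x) - 1.0
        if h > 0:
            lo = mu
        else:
            hi = mu
        if hi - lo < tol:
            break
    mu = 0.5 * (lo + hi)
    return [max(xi - mu, 0.0) for xi in x]


p = proj_simplex([2.0, 0.0])
print(p, sum(p))  # approximately [1.0, 0.0] with entries summing to about 1
```

Sorting-based methods can find $\mu^*$ exactly; bisection is used here only because it mirrors the statement of the corollary most directly.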

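In the next two subsections we discuss orthogonal projections onto level sets and onto epigraphs, which yield further results on the orthogonal projection operator.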

4.4 Orthogonal Projection onto Level Sets

Theorem 12 (orthogonal projection onto level sets). Let $C=\mathrm{Lev}(f,\alpha)=\{\mathbf{x}\in\mathbb{E}:f(\mathbf{x})\le\alpha\}$, where $f:\mathbb{E}\to(-\infty,\infty]$ is a proper closed convex function and $\alpha\in\mathbb{R}$. Assume that there exists $\hat{\mathbf{x}}\in\mathbb{E}$ such that $f(\hat{\mathbf{x}})<\alpha$. Then
$$P_C(\mathbf{x})=\left\{\begin{array}{ll}P_{\mathrm{dom}(f)}(\mathbf{x}), & f\left(P_{\mathrm{dom}(f)}(\mathbf{x})\right)\le\alpha,\\\mathrm{prox}_{\lambda^*f}(\mathbf{x}), & \text{otherwise},\end{array}\right.$$
where $\lambda^*$ is any positive root of the equation $\varphi(\lambda)\equiv f(\mathrm{prox}_{\lambda f}(\mathbf{x}))-\alpha=0$. In addition, $\varphi$ is nonincreasing.

Proof. The orthogonal projection of $\mathbf{x}$ onto $C$ is the optimal solution of
$$\min_{\mathbf{y}\in\mathbb{E}}\left\{\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert^2:f(\mathbf{y})\le\alpha,\,\mathbf{y}\in X\right\},$$
where $X=\mathrm{dom}(f)$. The Lagrangian of this problem is, for $\lambda\ge0$,
$$L(\mathbf{y};\lambda)=\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert^2+\lambda f(\mathbf{y})-\alpha\lambda.$$
Since strong duality holds for this problem, we have the following optimality conditions: $\mathbf{y}^*$ is an optimal solution if and only if there exists $\lambda^*\in\mathbb{R}_+$ such that
$$\begin{aligned}\mathbf{y}^*&\in\arg\min_{\mathbf{y}\in X}L(\mathbf{y};\lambda^*),\\f(\mathbf{y}^*)&\le\alpha,\\\lambda^*(f(\mathbf{y}^*)-\alpha)&=0.\end{aligned}$$
(i) If $P_X(\mathbf{x})$ exists and $f(P_X(\mathbf{x}))\le\alpha$, then $\mathbf{y}^*=P_X(\mathbf{x}),\,\lambda^*=0$ satisfy the optimality conditions.
(ii) If $P_X(\mathbf{x})$ does not exist (see footnote 7) or $f(P_X(\mathbf{x}))>\alpha$, then necessarily $\lambda^*>0$, and the optimality conditions become $\mathbf{y}^*=\mathrm{prox}_{\lambda^*f}(\mathbf{x}),\,f(\mathrm{prox}_{\lambda^*f}(\mathbf{x}))=\alpha$. This gives the expression for $P_C(\mathbf{x})$ stated in the theorem.

We now show that $\varphi$ is nonincreasing. Take any $0\le\lambda_1<\lambda_2$ and let $\mathbf{v}_1=\mathrm{prox}_{\lambda_1f}(\mathbf{x}),\,\mathbf{v}_2=\mathrm{prox}_{\lambda_2f}(\mathbf{x})$. Then, using that $\mathbf{v}_1$ minimizes $\frac{1}{2}\Vert\mathbf{v}-\mathbf{x}\Vert^2+\lambda_1f(\mathbf{v})$ in the first inequality and that $\mathbf{v}_2$ minimizes $\frac{1}{2}\Vert\mathbf{v}-\mathbf{x}\Vert^2+\lambda_2f(\mathbf{v})$ in the second,
$$\begin{aligned}&\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_2)-\alpha)\\&=\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_1(f(\mathbf{v}_2)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-\alpha)\\&\ge\frac{1}{2}\Vert\mathbf{v}_1-\mathbf{x}\Vert^2+\lambda_1(f(\mathbf{v}_1)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-\alpha)\\&=\frac{1}{2}\Vert\mathbf{v}_1-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_1)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1))\\&\ge\frac{1}{2}\Vert\mathbf{v}_2-\mathbf{x}\Vert^2+\lambda_2(f(\mathbf{v}_2)-\alpha)+(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1)).\end{aligned}$$
Therefore $(\lambda_2-\lambda_1)(f(\mathbf{v}_2)-f(\mathbf{v}_1))\le0$. Since $\lambda_1<\lambda_2$, it follows that $f(\mathbf{v}_2)\le f(\mathbf{v}_1)$. Finally,
$$\varphi(\lambda_2)=f(\mathbf{v}_2)-\alpha\le f(\mathbf{v}_1)-\alpha=\varphi(\lambda_1).$$

Example 13 (orthogonal projection onto the intersection of a half-space and a box). Consider the set
$$C=H_{\mathbf{a},b}^-\cap\text{Box}[\bm{\ell},\mathbf{u}]=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{a}^T\mathbf{x}\le b,\,\bm{\ell}\le\mathbf{x}\le\mathbf{u}\},$$
where $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R},\,\bm{\ell}\in[-\infty,\infty)^n,\,\mathbf{u}\in(-\infty,\infty]^n$. Assume that $C\ne\emptyset$. Then $C=\mathrm{Lev}(f,b)$, where $f(\mathbf{x})=\mathbf{a}^T\mathbf{x}+\delta_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x})$. For every $\lambda>0$,
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathrm{prox}_{\lambda\mathbf{a}^T(\cdot)+\delta_{\text{Box}[\bm{\ell},\mathbf{u}]}(\cdot)}(\mathbf{x})\overset{\text{Theorem 6}}{=}\mathrm{prox}_{\delta_{\text{Box}[\bm{\ell},\mathbf{u}]}}(\mathbf{x}-\lambda\mathbf{a})=P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\lambda\mathbf{a}).$$
By Theorem 12,
$$\boxed{\begin{aligned}&P_C(\mathbf{x})=\left\{\begin{array}{ll}P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}), & \mathbf{a}^TP_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x})\le b,\\P_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\lambda^*\mathbf{a}), & \mathbf{a}^TP_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x})>b,\end{array}\right.\\&\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)=\mathbf{a}^TP_{\text{Box}[\bm{\ell},\mathbf{u}]}(\mathbf{x}-\lambda\mathbf{a})-b.\end{aligned}}$$
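Since Theorem 12 guarantees that $\varphi$ is nonincreasing, the root $\lambda^*$ in Example 13 can be located by bracketing and bisection. The Python sketch below makes that choice as an assumption (the text does not fix a root-finding method); it doubles the right endpoint until $\varphi\le0$ and then bisects. The names `proj_halfspace_box` and `proj_box` are illustrative.

```python
def proj_box(x, lo, hi):
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lo, hi)]


def proj_halfspace_box(x, a, b, lo, hi, tol=1e-12, max_iter=200):
    """Projection onto {y : a^T y <= b, lo <= y <= hi}, assumed nonempty (a sketch)."""
    def phi(lam):
        y = proj_box([xi - lam * ai for xi, ai in zip(x, a)], lo, hi)
        return sum(ai * yi for ai, yi in zip(a, y)) - b

    y0 = proj_box(x, lo, hi)
    if sum(ai * yi for ai, yi in zip(a, y0)) <= b:
        return y0                       # first case of the boxed formula

    left, right = 0.0, 1.0              # phi(left) > 0; grow right until phi(right) <= 0
    for _ in range(max_iter):
        if phi(right) <= 0:
            break
        left, right = right, 2.0 * right
    else:
        raise ValueError("no sign change found; the set C may be empty")

    for _ in range(max_iter):           # bisection on the nonincreasing function phi
        mid = 0.5 * (left + right)
        if phi(mid) > 0:
            left = mid
        else:
            right = mid
        if right - left < tol:
            break
    lam = 0.5 * (left + right)
    return proj_box([xi - lam * ai for xi, ai in zip(x, a)], lo, hi)


print(proj_halfspace_box([5.0, 0.5], [1.0, 0.0], 1.0, [0.0, 0.0], [3.0, 1.0]))
```

In the printed example the constraint is $x_1\le1$ inside the box $[0,3]\times[0,1]$, so the result should be close to $(1, 0.5)$.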

Example 14 (orthogonal projection onto the $\ell_1$ ball). Let $C=B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]=\{\mathbf{x}\in\mathbb{R}^n:\Vert\mathbf{x}\Vert_1\le\alpha\}$, where $\alpha>0$. Then $C=\mathrm{Lev}(f,\alpha)$, where $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1$. In Example 2 we obtained
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathcal{T}_{\lambda}(\mathbf{x}),\quad\forall\mathbf{x}\in\mathbb{R}^n,$$
where $\mathcal{T}_{\lambda}(\mathbf{x})=[|\mathbf{x}|-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x})$. By Theorem 12,
$$\boxed{\begin{aligned}&P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]}(\mathbf{x})=\left\{\begin{array}{ll}\mathbf{x}, & \Vert\mathbf{x}\Vert_1\le\alpha,\\\mathcal{T}_{\lambda^*}(\mathbf{x}), & \Vert\mathbf{x}\Vert_1>\alpha,\end{array}\right.\\&\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)=\Vert\mathcal{T}_{\lambda}(\mathbf{x})\Vert_1-\alpha.\end{aligned}}$$
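For the $\ell_1$ ball, $\varphi(0)=\Vert\mathbf{x}\Vert_1-\alpha>0$ and $\varphi(\max_i|x_i|)=-\alpha<0$, so the interval $[0,\max_i|x_i|]$ brackets $\lambda^*$. The Python sketch below bisects on that interval; bisection is an assumption made here for simplicity, and the names `soft_threshold_vec` and `proj_l1_ball` are illustrative.

```python
def soft_threshold_vec(x, lam):
    """Componentwise T_lam(x) = [|x| - lam]_+ * sgn(x)."""
    return [max(abs(xi) - lam, 0.0) * ((xi > 0) - (xi < 0)) for xi in x]


def proj_l1_ball(x, alpha, tol=1e-12, max_iter=200):
    """Projection onto {y : ||y||_1 <= alpha}, alpha > 0, via bisection on lambda (a sketch)."""
    if sum(abs(xi) for xi in x) <= alpha:
        return list(x)
    lo, hi = 0.0, max(abs(xi) for xi in x)   # phi(lo) > 0 and phi(hi) = -alpha < 0
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        phi = sum(abs(v) for v in soft_threshold_vec(x, lam)) - alpha
        if phi > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    return soft_threshold_vec(x, 0.5 * (lo + hi))


print(proj_l1_ball([3.0, 1.0], 2.0))  # approximately [2.0, 0.0], with lambda* = 1
```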

The next example uses a generalization of the soft-thresholding mapping, the two-sided soft-thresholding operator: for any $\mathbf{a},\mathbf{b}\in(-\infty,\infty]^n$, define
$$\mathcal{S}_{\mathbf{a},\mathbf{b}}(\mathbf{x})=(\min\{\max\{|x_i|-a_i,0\},b_i\}\,\mathrm{sgn}(x_i))_{i=1}^n,\quad\forall\mathbf{x}\in\mathbb{R}^n.$$
The graph of the function $t\mapsto\mathcal{S}_{1,2}(t)$ is shown in the figure below.

[Figure: graph of the function $t\mapsto\mathcal{S}_{1,2}(t)$.]
The soft-thresholding operator is a special case of the two-sided soft-thresholding operator: $\mathcal{S}_{\lambda\mathbf{e},\infty\mathbf{e}}=\mathcal{T}_{\lambda}$.

Example 15 (orthogonal projection onto the intersection of a weighted $\ell_1$ ball and a box). Let $C\subset\mathbb{R}^n$ be given by
$$C=\left\{\mathbf{x}\in\mathbb{R}^n:\sum_{i=1}^n\omega_i|x_i|\le\beta,\,-\bm{\alpha}\le\mathbf{x}\le\bm{\alpha}\right\},$$
where $\bm{\omega}\in\mathbb{R}_+^n,\,\bm{\alpha}\in[0,\infty]^n,\,\beta\in\mathbb{R}_{++}$. Then $C=\mathrm{Lev}(f,\beta)$, where
$$f(\mathbf{x})=\bm{\omega}^T|\mathbf{x}|+\delta_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})=\left\{\begin{array}{ll}\sum_{i=1}^n\omega_i|x_i|, & -\bm{\alpha}\le\mathbf{x}\le\bm{\alpha},\\\infty, & \text{otherwise},\end{array}\right.\quad\forall\mathbf{x}\in\mathbb{R}^n.$$
By Example 12, for every $\lambda>0$ and $\mathbf{x}\in\mathbb{R}^n$,
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=(\min\{\max\{|x_i|-\lambda\omega_i,0\},\alpha_i\}\,\mathrm{sgn}(x_i))_{i=1}^n=\mathcal{S}_{\lambda\bm{\omega},\bm{\alpha}}(\mathbf{x}).$$
Finally, by Theorem 12,
$$\boxed{\begin{aligned}&P_C(\mathbf{x})=\left\{\begin{array}{ll}P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x}), & \bm{\omega}^T\left|P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right|\le\beta,\\\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x}), & \bm{\omega}^T\left|P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right|>\beta,\end{array}\right.\\&\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)=\bm{\omega}^T\left|\mathcal{S}_{\lambda\bm{\omega},\bm{\alpha}}(\mathbf{x})\right|-\beta.\end{aligned}}$$

To show why the discussion of the existence of $P_{\mathrm{dom}(f)}(\mathbf{x})$ in Theorem 12 is necessary, we now give an example in which the effective domain of $f$ is not closed.

Example 16. Let
$$C=\left\{\mathbf{x}\in\mathbb{R}_{++}^n:\prod_{i=1}^nx_i\ge\alpha\right\},$$
where $\alpha>0$. Then $C$ can be written as
$$C=\left\{\mathbf{x}\in\mathbb{R}_{++}^n:-\sum_{i=1}^n\log x_i\le-\log\alpha\right\},$$
so $C=\mathrm{Lev}(f,-\log\alpha)$, where $f:\mathbb{R}^n\to(-\infty,\infty]$ is the negative sum of logs:
$$f(\mathbf{x})=\left\{\begin{array}{ll}-\sum_{i=1}^n\log x_i, & \mathbf{x}\in\mathbb{R}_{++}^n,\\\infty, & \text{otherwise}.\end{array}\right.$$
In Example 3 we showed that for every $\mathbf{x}\in\mathbb{R}^n$,
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n.$$
Theorem 12 then yields a formula for the orthogonal projection onto $C$. Note that if $\mathbf{x}\in C$, then $P_{\mathbb{R}_{++}^n}(\mathbf{x})=\mathbf{x}$ and $f(\mathbf{x})\le-\log\alpha$. If $\mathbf{x}\notin\mathbb{R}_{++}^n$, then $P_{\mathbb{R}^n_{++}}(\mathbf{x})$ does not exist, and we directly have $P_C(\mathbf{x})=\mathrm{prox}_{\lambda^*f}(\mathbf{x})$. If $\mathbf{x}\in\mathbb{R}_{++}^n$ and $f(\mathbf{x})>-\log\alpha$, then again $P_C(\mathbf{x})=\mathrm{prox}_{\lambda^*f}(\mathbf{x})$. The last two cases together are exactly the case $\mathbf{x}\notin C$. Therefore,
$$\boxed{\begin{aligned}&P_C(\mathbf{x})=\left\{\begin{array}{ll}\mathbf{x}, & \mathbf{x}\in C,\\\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)_{j=1}^n, & \mathbf{x}\notin C,\end{array}\right.\\&\text{where }\lambda^*\text{ is any positive root of }\varphi(\lambda)=-\sum_{j=1}^n\log\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)+\log\alpha.\end{aligned}}$$
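The formula of Example 16 can be turned into a small numerical routine. The Python sketch below is one possible realization under stated assumptions: $\lambda^*$ is found by doubling and bisection, $\varphi$ is only evaluated at $\lambda>0$ (at $\lambda=0$ the logarithm may be undefined when some $x_j\le0$), and for $x_j<0$ the algebraically equal form $2\lambda/(\sqrt{x_j^2+4\lambda}-x_j)$ is used to limit cancellation. All names are illustrative.

```python
import math


def prox_neg_log_sum(x, lam):
    """prox of lam * f for f(x) = -sum(log x_j): (x_j + sqrt(x_j^2 + 4*lam)) / 2 componentwise."""
    out = []
    for xj in x:
        root = math.sqrt(xj * xj + 4.0 * lam)
        out.append(0.5 * (xj + root) if xj >= 0 else 2.0 * lam / (root - xj))
    return out


def proj_product_superlevel(x, alpha, tol=1e-12, max_iter=200):
    """Projection onto {y > 0 : prod(y_i) >= alpha}, alpha > 0 (a sketch)."""
    log_alpha = math.log(alpha)
    if all(xj > 0 for xj in x) and sum(math.log(xj) for xj in x) >= log_alpha:
        return list(x)                   # x already belongs to C

    def phi(lam):
        return -sum(math.log(v) for v in prox_neg_log_sum(x, lam)) + log_alpha

    lo, hi = 0.0, 1.0                    # phi > 0 near 0; grow hi until phi(hi) <= 0
    for _ in range(max_iter):
        if phi(hi) <= 0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("failed to bracket a root of phi")

    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if phi(lam) > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    return prox_neg_log_sum(x, 0.5 * (lo + hi))


print(proj_product_superlevel([1.0, 1.0], 4.0))  # approximately [2.0, 2.0]
```

In the printed example the answer follows from symmetry: the closest point to $(1,1)$ with $y_1y_2\ge4$ is $(2,2)$, attained at $\lambda^*=2$.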

4.5 Orthogonal Projection onto Epigraphs

Using Theorem 12, we can state a theorem on the orthogonal projection onto the epigraph of a convex function.

Theorem 13 (orthogonal projection onto epigraphs). Let
$$C=\mathrm{epi}(g)=\{(\mathbf{x},t)\in\mathbb{E}\times\mathbb{R}:g(\mathbf{x})\le t\},$$
where $g:\mathbb{E}\to\mathbb{R}$ is a convex function. Then
$$P_C((\mathbf{x},s))=\left\{\begin{array}{ll}(\mathbf{x},s), & g(\mathbf{x})\le s,\\\left(\mathrm{prox}_{\lambda^*g}(\mathbf{x}),s+\lambda^*\right), & g(\mathbf{x})>s,\end{array}\right.$$
where $\lambda^*$ is any positive root of the function $\psi(\lambda)=g(\mathrm{prox}_{\lambda g}(\mathbf{x}))-\lambda-s$. In addition, $\psi$ is nonincreasing.

Proof. Define $f:\mathbb{E}\times\mathbb{R}\to\mathbb{R}$ by $f(\mathbf{x},t)\equiv g(\mathbf{x})-t$. Then
$$\begin{aligned}\mathrm{prox}_{\lambda f}(\mathbf{x},s)&=\arg\min_{\mathbf{y},t}\left\{\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert^2+\frac{1}{2}(t-s)^2+\lambda f(\mathbf{y},t)\right\}\\&=\arg\min_{\mathbf{y},t}\left\{\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert^2+\frac{1}{2}(t-s)^2+\lambda g(\mathbf{y})-\lambda t\right\}.\end{aligned}$$
Since the problem is separable,
$$\begin{aligned}\mathrm{prox}_{\lambda f}(\mathbf{x},s)&=\left(\arg\min_{\mathbf{y}}\left\{\frac{1}{2}\Vert\mathbf{y}-\mathbf{x}\Vert^2+\lambda g(\mathbf{y})\right\},\arg\min_t\left\{\frac{1}{2}(t-s)^2-\lambda t\right\}\right)\\&=\left(\mathrm{prox}_{\lambda g}(\mathbf{x}),\mathrm{prox}_{\lambda h}(s)\right),\end{aligned}$$
where $h(t)\equiv-t$. By Section 2.2, $\mathrm{prox}_{\lambda h}(z)=z+\lambda$ for all $z\in\mathbb{R}$. Hence
$$\mathrm{prox}_{\lambda f}(\mathbf{x},s)=\left(\mathrm{prox}_{\lambda g}(\mathbf{x}),s+\lambda\right).$$
Since $\mathrm{epi}(g)=\mathrm{Lev}(f,0)$, Theorem 12 (noting that $\mathrm{dom}(f)=\mathbb{E}\times\mathbb{R}$, so that $P_{\mathrm{dom}(f)}$ is the identity) gives
$$P_C\left((\mathbf{x},s)\right)=\left\{\begin{array}{ll}(\mathbf{x},s), & g(\mathbf{x})\le s,\\\left(\mathrm{prox}_{\lambda^*g}(\mathbf{x}),s+\lambda^*\right), & g(\mathbf{x})>s,\end{array}\right.$$
where $\lambda^*$ is any positive root of the function $\psi(\lambda)=g(\mathrm{prox}_{\lambda g}(\mathbf{x}))-\lambda-s$, and $\psi$ is nonincreasing.

Example 17 (orthogonal projection onto the Lorentz cone) Consider the Lorentz cone $L^n=\{(\mathbf{x},t)\in\mathbb{R}^n\times\mathbb{R}:\Vert\mathbf{x}\Vert_2\le t\}$. We show that for all $(\mathbf{x},s)\in\mathbb{R}^n\times\mathbb{R}$, $$\boxed{P_{L^n}(\mathbf{x},s)=\begin{cases}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge|s|,\\(\mathbf{0},0), & s<\Vert\mathbf{x}\Vert_2<-s,\\(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s.\end{cases}}$$ Applying Theorem 13 directly gives $$P_{L^n}((\mathbf{x},s))=\begin{cases}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s,\\\left(\mathrm{prox}_{\lambda^*\Vert\cdot\Vert_2}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_2>s,\end{cases}$$ where $\lambda^*$ is any positive root of the function $\psi(\lambda)=\Vert\mathrm{prox}_{\lambda\Vert\cdot\Vert_2}(\mathbf{x})\Vert_2-\lambda-s$. Let $(\mathbf{x},s)\in\mathbb{R}^n\times\mathbb{R}$ satisfy $\Vert\mathbf{x}\Vert_2>s$. By Example 8, $$\mathrm{prox}_{\lambda\Vert\cdot\Vert_2}(\mathbf{x})=\left[1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert_2,\lambda\}}\right]\mathbf{x}.$$ Substituting this into the expression for $\psi$ yields $$\psi(\lambda)=\begin{cases}\Vert\mathbf{x}\Vert_2-2\lambda-s, & \lambda\le\Vert\mathbf{x}\Vert_2,\\-\lambda-s, & \lambda\ge\Vert\mathbf{x}\Vert_2.\end{cases}$$ So $\psi$ is piecewise linear, and its unique positive root is $$\lambda^*=\begin{cases}\frac{\Vert\mathbf{x}\Vert_2-s}{2}, & \Vert\mathbf{x}\Vert_2\ge-s,\\-s, & \Vert\mathbf{x}\Vert_2<-s.\end{cases}$$ Therefore, when $\Vert\mathbf{x}\Vert_2>s$, $$\begin{aligned}\left(\mathrm{prox}_{\lambda^*\Vert\cdot\Vert_2}(\mathbf{x}),s+\lambda^*\right)&=\left(\left[1-\frac{\lambda^*}{\max\{\Vert\mathbf{x}\Vert_2,\lambda^*\}}\right]\mathbf{x},s+\lambda^*\right)\\&=\begin{cases}\left(\left[1-\frac{\Vert\mathbf{x}\Vert_2-s}{2\Vert\mathbf{x}\Vert_2}\right]\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge-s,\\(\mathbf{0},0), & \Vert\mathbf{x}\Vert_2<-s\end{cases}\\&=\begin{cases}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge-s,\\(\mathbf{0},0), & \Vert\mathbf{x}\Vert_2<-s.\end{cases}\end{aligned}$$ Finally, note that $$\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2\ge|s|\}=\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2>s,\ \Vert\mathbf{x}\Vert_2\ge-s\}\cup\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2=s\},$$ and that for $\Vert\mathbf{x}\Vert_2=s>0$ the first branch of the boxed formula returns $(\mathbf{x},s)$, in agreement with the third. This completes the proof.
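The three-case formula of Example 17 translates almost line by line into code. The following is a minimal sketch, assuming vectors are plain Python lists; the function name `project_lorentz_cone` is illustrative and does not come from the text.

```python
import math

def project_lorentz_cone(x, s):
    """Project (x, s) onto {(x, t): ||x||_2 <= t} with the three-case formula."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x <= s:
        # The point already lies in the cone.
        return list(x), s
    if norm_x < -s:
        # Case s < ||x||_2 < -s: the projection is the origin.
        return [0.0] * len(x), 0.0
    # Remaining case: ||x||_2 > s and ||x||_2 >= -s, which forces norm_x > 0.
    scale = (norm_x + s) / (2.0 * norm_x)
    return [scale * v for v in x], (norm_x + s) / 2.0
```

In the last branch the returned pair lies on the boundary of the cone, since the norm of the first component equals the second component, $(\Vert\mathbf{x}\Vert_2+s)/2$.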

Example 18 (orthogonal projection onto the epigraph of the $\ell_1$-norm) Let $$C=\{(\mathbf{y},t)\in\mathbb{R}^n\times\mathbb{R}:\Vert\mathbf{y}\Vert_1\le t\}.$$ By Theorem 13 and the fact that $\mathrm{prox}_{\lambda\Vert\cdot\Vert_1}=\mathcal{T}_{\lambda}$ for all $\lambda>0$, we obtain $$\boxed{P_C((\mathbf{x},s))=\begin{cases}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_1\le s,\\\left(\mathcal{T}_{\lambda^*}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_1>s,\end{cases}}$$ where $\lambda^*$ is any positive root of the function $\varphi(\lambda)=\Vert\mathcal{T}_{\lambda}(\mathbf{x})\Vert_1-\lambda-s$.
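Since $\varphi$ is monotonically decreasing (Theorem 13), its root can be located by bisection. The sketch below assumes plain Python lists as vectors; the names `soft_threshold` and `project_l1_epigraph`, the bracket `[0, max(||x||_inf, -s)]`, and the fixed iteration count are choices made here for illustration, not part of the text.

```python
import math

def soft_threshold(x, lam):
    """Componentwise soft-thresholding T_lam(x)_i = sign(x_i) * max(|x_i| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def project_l1_epigraph(x, s, iters=200):
    """Project (x, s) onto {(y, t): ||y||_1 <= t} by bisection on phi."""
    if sum(abs(v) for v in x) <= s:
        return list(x), s

    def phi(lam):
        return sum(abs(v) for v in soft_threshold(x, lam)) - lam - s

    lo = 0.0                                              # phi(lo) = ||x||_1 - s > 0
    hi = max(max((abs(v) for v in x), default=0.0), -s)   # T_hi(x) = 0, so phi(hi) = -hi - s <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return soft_threshold(x, lam), s + lam
```

For instance, with $\mathbf{x}=(3,0)$ and $s=-1$ the equation $\varphi(\lambda)=3-2\lambda+1=0$ gives $\lambda^*=2$, so the projection is $((1,0),1)$.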

4.6 Summary of orthogonal projection computations

| Set ($C$) | $P_C(\mathbf{x})$ | Assumptions | Reference |
| --- | --- | --- | --- |
| $\mathbb{R}_+^n$ | $[\mathbf{x}]_+$ | - | Lemma 2 |
| $\text{Box}[\mathbf{\ell},\mathbf{u}]$ | $P_C(\mathbf{x})_i=\min\{\max\{x_i,\ell_i\},u_i\}$ | $\ell_i\le u_i$ | Lemma 2 |
| $B_{\Vert\cdot\Vert_2}[\mathbf{c},r]$ | $\mathbf{c}+\frac{r}{\max\{\Vert\mathbf{x-c}\Vert_2,r\}}(\mathbf{x-c})$ | $\mathbf{c}\in\mathbb{R}^n,\,r>0$ | Lemma 2 |
| $\{\mathbf{x}:\mathbf{Ax=b}\}$ | $\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}(\mathbf{Ax-b})$ | $\mathbf{A}\in\mathbb{R}^{m\times n},\,\mathbf{b}\in\mathbb{R}^m$, $\mathbf{A}$ has full row rank | Lemma 2 |
| $\{\mathbf{x}:\mathbf{a}^T\mathbf{x}\le b\}$ | $\mathbf{x}-\frac{[\mathbf{a}^T\mathbf{x}-b]_+}{\Vert\mathbf{a}\Vert^2}\mathbf{a}$ | $\mathbf{0}\ne\mathbf{a}\in\mathbb{R}^n,\,b\in\mathbb{R}$ | Lemma 2 |
| $\Delta_n$ | $[\mathbf{x}-\mu^*\mathbf{e}]_+$, where $\mu^*\in\mathbb{R}$ satisfies $\mathbf{e}^T[\mathbf{x}-\mu^*\mathbf{e}]_+=1$ | - | Corollary 1 |
| $H_{\mathbf{a},b}\cap\text{Box}[\mathbf{\ell},\mathbf{u}]$ | $P_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})$, where $\mu^*\in\mathbb{R}$ satisfies $\mathbf{a}^TP_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\mu^*\mathbf{a})=b$ | $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R}$ | Theorem 11 |
| $H_{\mathbf{a},b}^-\cap\text{Box}[\mathbf{\ell},\mathbf{u}]$ | $\begin{cases}P_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x})\le b,\\P_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\lambda^*\mathbf{a}), & \mathbf{a}^TP_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x})>b,\end{cases}$ where $\lambda^*>0$ satisfies $\mathbf{a}^TP_{\text{Box}[\mathbf{\ell},\mathbf{u}]}(\mathbf{x}-\lambda^*\mathbf{a})=b$ | $\mathbf{a}\in\mathbb{R}^n\setminus\{\mathbf{0}\},\,b\in\mathbb{R}$ | Example 13 |
| $B_{\Vert\cdot\Vert_1}[\mathbf{0},\alpha]$ | $\begin{cases}\mathbf{x}, & \Vert\mathbf{x}\Vert_1\le\alpha,\\\mathcal{T}_{\lambda^*}(\mathbf{x}), & \Vert\mathbf{x}\Vert_1>\alpha,\end{cases}$ where $\lambda^*>0$ satisfies $\Vert\mathcal{T}_{\lambda^*}(\mathbf{x})\Vert_1=\alpha$ | $\alpha>0$ | Example 14 |
| $\{\mathbf{x}:\bm{\omega}^T\mathrm{abs}(\mathbf{x})\le\beta,\,-\bm{\alpha}\le\mathbf{x}\le\bm{\alpha}\}$ | $\begin{cases}P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x}), & \bm{\omega}^T\mathrm{abs}\left(P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right)\le\beta,\\\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x}), & \bm{\omega}^T\mathrm{abs}\left(P_{\text{Box}[-\bm{\alpha},\bm{\alpha}]}(\mathbf{x})\right)>\beta,\end{cases}$ where $\lambda^*>0$ satisfies $\bm{\omega}^T\mathrm{abs}\left(\mathcal{S}_{\lambda^*\bm{\omega},\bm{\alpha}}(\mathbf{x})\right)=\beta$ | $\bm{\omega}\in\mathbb{R}_+^n,\,\bm{\alpha}\in[0,\infty]^n,\,\beta\in\mathbb{R}_{++}$ | Example 15 |
| $\{\mathbf{x}>\mathbf{0}:\prod x_i\ge\alpha\}$ | $\begin{cases}\mathbf{x}, & \mathbf{x}\in C,\\\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)_{j=1}^n, & \mathbf{x}\notin C,\end{cases}$ where $\lambda^*>0$ satisfies $\sum_{j=1}^n\log\left(\frac{x_j+\sqrt{x_j^2+4\lambda^*}}{2}\right)=\log\alpha$ | $\alpha>0$ | Example 16 |
| $\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_2\le s\}$ | $\begin{cases}\left(\frac{\Vert\mathbf{x}\Vert_2+s}{2\Vert\mathbf{x}\Vert_2}\mathbf{x},\frac{\Vert\mathbf{x}\Vert_2+s}{2}\right), & \Vert\mathbf{x}\Vert_2\ge\lvert s\rvert,\\(\mathbf{0},0), & s<\Vert\mathbf{x}\Vert_2<-s,\\(\mathbf{x},s), & \Vert\mathbf{x}\Vert_2\le s\end{cases}$ | - | Example 17 |
| $\{(\mathbf{x},s):\Vert\mathbf{x}\Vert_1\le s\}$ | $\begin{cases}(\mathbf{x},s), & \Vert\mathbf{x}\Vert_1\le s,\\\left(\mathcal{T}_{\lambda^*}(\mathbf{x}),s+\lambda^*\right), & \Vert\mathbf{x}\Vert_1>s,\end{cases}$ where $\lambda^*>0$ satisfies $\Vert\mathcal{T}_{\lambda^*}(\mathbf{x})\Vert_1-\lambda^*-s=0$ | - | Example 18 |

5. The second prox theorem

We prove the second prox theorem using Fermat's optimality condition from Chapter 3.

Theorem 14 (second prox theorem) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then for all $\mathbf{x},\mathbf{u}\in\mathbb{E}$, the following three statements are equivalent:
(i) $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$;
(ii) $\mathbf{x-u}\in\partial f(\mathbf{u})$;
(iii) $\langle\mathbf{x-u},\mathbf{y-u}\rangle\le f(\mathbf{y})-f(\mathbf{u})$ for all $\mathbf{y}\in\mathbb{E}$.

Proof: By definition, $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$ if and only if $\mathbf{u}$ is the optimal solution of the problem $$\min_{\mathbf{v}}\left\{f(\mathbf{v})+\frac{1}{2}\Vert\mathbf{v-x}\Vert^2\right\}.$$ By Fermat's optimality condition from Chapter 3 and the sum rule for subdifferentials (Theorem 15), this is equivalent to $$\mathbf{0}\in\partial f(\mathbf{u})+\mathbf{u-x}.$$ Hence (i) and (ii) are equivalent. The equivalence of (ii) and (iii) follows from the definition of the subgradient.

A direct consequence of the second prox theorem is that, for a proper closed convex function $f$, $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$ holds if and only if $\mathbf{x}$ is a global minimizer of $f$.

Corollary 2 Let $f$ be a proper closed convex function. Then $\mathbf{x}$ is a global minimizer of $f$ if and only if $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$.

Proof: $\mathbf{x}$ is a global minimizer of $f$ if and only if $\mathbf{0}\in\partial f(\mathbf{x})$, that is, if and only if $\mathbf{x-x}\in\partial f(\mathbf{x})$. By the equivalence of (i) and (ii) in the second prox theorem, this is equivalent to $\mathbf{x}=\mathrm{prox}_f(\mathbf{x})$.

Taking $f=\delta_C$, where $C$ is a nonempty closed convex set, the equivalence of (i) and (iii) in the second prox theorem yields the second projection theorem.

Theorem 15 (second projection theorem) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\mathbf{u}\in C$. Then $\mathbf{u}=P_C(\mathbf{x})$ if and only if $$\langle\mathbf{x-u},\mathbf{y-u}\rangle\le0,\quad\forall\mathbf{y}\in C.$$
In other words, $\mathbf{u}$ is the projection of $\mathbf{x}$ onto $C$ if and only if $\mathbf{x-u}$ makes a right or obtuse angle with every $\mathbf{y-u}$, $\mathbf{y}\in C$.

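Another direct consequence of the second prox theorem is the firm nonexpansivity of the proximal operator. A special case of it is Theorem 1 of Chapter 5.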

Theorem 16 (firm nonexpansivity of the proximal operator) Let $f$ be a proper closed convex function. Then for all $\mathbf{x},\mathbf{y}\in\mathbb{E}$,
(i) (firm nonexpansivity) $$\langle\mathbf{x-y},\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\rangle\ge\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2;$$
(ii) (nonexpansivity) $$\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert\le\Vert\mathbf{x-y}\Vert.$$

Proof: (i) Write $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$ and $\mathbf{v}=\mathrm{prox}_f(\mathbf{y})$. By the equivalence of (i) and (ii) in the second prox theorem, $$\mathbf{x-u}\in\partial f(\mathbf{u}),\quad\mathbf{y-v}\in\partial f(\mathbf{v}).$$ By the subgradient inequality, $$\begin{aligned}f(\mathbf{v})&\ge f(\mathbf{u})+\langle\mathbf{x-u},\mathbf{v-u}\rangle,\\f(\mathbf{u})&\ge f(\mathbf{v})+\langle\mathbf{y-v},\mathbf{u-v}\rangle.\end{aligned}$$ Adding the two inequalities gives $$0\ge\langle\mathbf{y-x+u-v},\mathbf{u-v}\rangle\ \Rightarrow\ \langle\mathbf{x-y},\mathbf{u-v}\rangle\ge\Vert\mathbf{u-v}\Vert^2,$$ which is exactly $$\langle\mathbf{x-y},\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\rangle\ge\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2.$$

(ii) If $\mathrm{prox}_f(\mathbf{x})=\mathrm{prox}_f(\mathbf{y})$, the claim is trivial. Assume now that $\mathrm{prox}_f(\mathbf{x})\ne\mathrm{prox}_f(\mathbf{y})$. By (i) and the Cauchy-Schwarz inequality, $$\begin{aligned}\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert^2&\le\langle\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y}),\mathbf{x-y}\rangle\\&\le\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert\cdot\Vert\mathbf{x-y}\Vert.\end{aligned}$$ Dividing both sides by $\Vert\mathrm{prox}_f(\mathbf{x})-\mathrm{prox}_f(\mathbf{y})\Vert$ gives the result.
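The two inequalities of Theorem 16 are easy to check numerically for a concrete prox. The sketch below uses $f=\lambda\Vert\cdot\Vert_1$, whose prox is the soft-thresholding operator $\mathcal{T}_\lambda$ (as recalled in Example 18); the helper names and the sample points are assumptions made here for illustration only.

```python
import math

def prox_l1(x, lam):
    """Soft-thresholding, i.e. the prox of lam * ||.||_1."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def firm_gap(x, y, lam):
    """<x - y, u - v> - ||u - v||^2 with u = prox(x), v = prox(y); Theorem 16(i) says this is >= 0."""
    u, v = prox_l1(x, lam), prox_l1(y, lam)
    d = [a - b for a, b in zip(u, v)]
    inner = sum((a - b) * c for a, b, c in zip(x, y, d))
    return inner - sum(c * c for c in d)

def nonexpansive_gap(x, y, lam):
    """||x - y|| - ||u - v||; Theorem 16(ii) says this is >= 0."""
    u, v = prox_l1(x, lam), prox_l1(y, lam)
    dist_xy = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    dist_uv = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return dist_xy - dist_uv

# Hypothetical sample points, chosen only to exercise different thresholding regimes.
samples = [[2.0, -3.0], [0.5, 1.0], [-0.2, 0.1], [4.0, 4.0]]
gaps = [(firm_gap(x, y, 1.0), nonexpansive_gap(x, y, 1.0)) for x in samples for y in samples]
```

Such a check is of course no substitute for the proof; it only illustrates what the inequalities say on a few points.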

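The next lemma shows how to compute the prox of the distance function to a nonempty closed convex set. Its proof uses both the second prox theorem and the second projection theorem.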

Lemma 3 (prox of the distance function) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Then for all $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_{\lambda d_C}(\mathbf{x})=\begin{cases}(1-\theta)\mathbf{x}+\theta P_C(\mathbf{x}), & d_C(\mathbf{x})>\lambda,\\P_C(\mathbf{x}), & d_C(\mathbf{x})\le\lambda,\end{cases}$$ where$^8$ $$\theta=\frac{\lambda}{d_C(\mathbf{x})}.$$
Proof: Let $\mathbf{u}=\mathrm{prox}_{\lambda d_C}(\mathbf{x})$. By the second prox theorem, $$\mathbf{x-u}\in\lambda\partial d_C(\mathbf{u}).$$ We distinguish two cases.

  • Case 1: $\mathbf{u}\notin C$. By Example 16 of Chapter 3, $$\mathbf{x-u}=\lambda\frac{\mathbf{u}-P_C(\mathbf{u})}{d_C(\mathbf{u})}.$$ Set $\alpha=\frac{\lambda}{d_C(\mathbf{u})}$. Then $$\mathbf{u}=\frac{1}{\alpha+1}\mathbf{x}+\frac{\alpha}{\alpha+1}P_C(\mathbf{u}),$$ or equivalently $$\mathbf{x}-P_C(\mathbf{u})=(\alpha+1)(\mathbf{u}-P_C(\mathbf{u})).$$ By the second projection theorem, $P_C(\mathbf{u})=P_C(\mathbf{x})$ if and only if $$\langle\mathbf{x}-P_C(\mathbf{u}),\mathbf{y}-P_C(\mathbf{u})\rangle\le0,\quad\forall\mathbf{y}\in C.$$ Substituting the expression for $\mathbf{x}-P_C(\mathbf{u})$, this is equivalent to $$(\alpha+1)\langle\mathbf{u}-P_C(\mathbf{u}),\mathbf{y}-P_C(\mathbf{u})\rangle\le0,\quad\forall\mathbf{y}\in C,$$ which holds by the second projection theorem applied to $\mathbf{u}$. Hence $P_C(\mathbf{u})=P_C(\mathbf{x})$. Consequently, $$d_C(\mathbf{x})=\Vert\mathbf{x}-P_C(\mathbf{x})\Vert=\Vert\mathbf{x}-P_C(\mathbf{u})\Vert=(\alpha+1)\Vert\mathbf{u}-P_C(\mathbf{u})\Vert=(\alpha+1)d_C(\mathbf{u})=d_C(\mathbf{u})+\lambda\ (>\lambda),$$ and $$\frac{1}{\alpha+1}=\frac{d_C(\mathbf{u})}{\lambda+d_C(\mathbf{u})}=\frac{d_C(\mathbf{x})-\lambda}{d_C(\mathbf{x})}=1-\theta.$$ Therefore $$\mathrm{prox}_{\lambda d_C}(\mathbf{x})=(1-\theta)\mathbf{x}+\theta P_C(\mathbf{x}).$$
  • Case 2: $\mathbf{u}\in C$. We show that $\mathbf{u}=P_C(\mathbf{x})$. To this end, let $\mathbf{v}\in C$. Since $\mathbf{u}=\mathrm{prox}_{\lambda d_C}(\mathbf{x})$, $$\lambda d_C(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\le\lambda d_C(\mathbf{v})+\frac{1}{2}\Vert\mathbf{v-x}\Vert^2,$$ and because $d_C(\mathbf{u})=d_C(\mathbf{v})=0$, it follows that $$\Vert\mathbf{u-x}\Vert\le\Vert\mathbf{v-x}\Vert.$$ Hence $$\mathbf{u}=\arg\min_{\mathbf{v}\in C}\Vert\mathbf{v-x}\Vert=P_C(\mathbf{x}).$$ Again by Example 16 of Chapter 3, the optimality condition now reads $$\frac{\mathbf{x}-P_C(\mathbf{x})}{\lambda}\in N_C(\mathbf{u})\cap B[\mathbf{0},1];$$ in particular, $$\left\Vert\frac{\mathbf{x}-P_C(\mathbf{x})}{\lambda}\right\Vert\le1\ \Rightarrow\ d_C(\mathbf{x})=\Vert P_C(\mathbf{x})-\mathbf{x}\Vert\le\lambda.$$
    Case 1 forces $d_C(\mathbf{x})>\lambda$ and Case 2 forces $d_C(\mathbf{x})\le\lambda$. Since the two cases are exhaustive, each occurs exactly under the corresponding condition, and the formula follows.
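Lemma 3 reduces the prox of $\lambda d_C$ to a single projection onto $C$. The sketch below takes the projection as a function argument and illustrates it with the unit Euclidean ball $B_{\Vert\cdot\Vert_2}[\mathbf{0},1]$; the function names are hypothetical and vectors are assumed to be plain Python lists.

```python
import math

def project_unit_ball(x):
    """Projection onto the unit Euclidean ball: x / max(||x||_2, 1)."""
    norm_x = math.sqrt(sum(v * v for v in x))
    scale = 1.0 / max(norm_x, 1.0)
    return [scale * v for v in x]

def prox_distance(x, lam, project):
    """prox of lam * d_C at x, given a routine `project` computing P_C."""
    p = project(x)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))   # d_C(x)
    if dist <= lam:
        return p                                                # close to C: land on P_C(x)
    theta = lam / dist
    return [(1.0 - theta) * a + theta * b for a, b in zip(x, p)]
```

Geometrically, when $d_C(\mathbf{x})>\lambda$ the point moves a distance $\lambda$ along the segment from $\mathbf{x}$ to $P_C(\mathbf{x})$; otherwise it is mapped to $P_C(\mathbf{x})$ itself. For example, with $\mathbf{x}=(3,0)$ and $\lambda=1$ one gets $d_C(\mathbf{x})=2$, $\theta=1/2$ and the prox $(2,0)$.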

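6. Moreau decomposition

An important property of the proximal operator is the Moreau decomposition theorem, which links the proximal operator of a proper closed convex function to that of its conjugate.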

Theorem 17 (Moreau decomposition) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function. Then for all $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_f(\mathbf{x})+\mathrm{prox}_{f^*}(\mathbf{x})=\mathbf{x}.$$
Proof: Let $\mathbf{x}\in\mathbb{E}$ and write $\mathbf{u}=\mathrm{prox}_f(\mathbf{x})$. By the second prox theorem, $\mathbf{x-u}\in\partial f(\mathbf{u})$. By the conjugate subgradient theorem, this is equivalent to $\mathbf{u}\in\partial f^*(\mathbf{x-u})$. Applying the second prox theorem once more (to $f^*$) gives $\mathbf{x-u}=\mathrm{prox}_{f^*}(\mathbf{x})$. Therefore $$\mathrm{prox}_f(\mathbf{x})+\mathrm{prox}_{f^*}(\mathbf{x})=\mathbf{u}+(\mathbf{x-u})=\mathbf{x}.$$

Theorem 18 (extended Moreau decomposition) Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda>0$. Then for all $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_{\lambda f}(\mathbf{x})+\lambda\mathrm{prox}_{\lambda^{-1}f^*}(\mathbf{x}/\lambda)=\mathbf{x}.$$

Proof: By the Moreau decomposition, for all $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\mathrm{prox}_{(\lambda f)^*}(\mathbf{x})\overset{\text{Ch. 4, Thm. 7}}{=}\mathbf{x}-\mathrm{prox}_{\lambda f^*(\cdot/\lambda)}(\mathbf{x}).$$ By Theorem 5, $$\mathrm{prox}_{\lambda f^*(\cdot/\lambda)}(\mathbf{x})=\lambda\mathrm{prox}_{\lambda^{-1}f^*}(\mathbf{x}/\lambda).$$ Combining the two identities gives the result.
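A one-dimensional sanity check of Theorem 18: for $f=|\cdot|$ the conjugate is $f^*=\delta_{[-1,1]}$, so $\mathrm{prox}_{\lambda f}$ is soft-thresholding and $\mathrm{prox}_{\lambda^{-1}f^*}$ is the projection onto $[-1,1]$ (any positive multiple of an indicator function has the same prox). The function names below are illustrative assumptions, not notation from the text.

```python
import math

def prox_abs(x, lam):
    """prox of lam * |.| at the scalar x (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def prox_conjugate_abs(z):
    """prox of any positive multiple of the indicator of [-1, 1]: clip z to [-1, 1]."""
    return min(max(z, -1.0), 1.0)

def moreau_residual(x, lam):
    """prox_{lam f}(x) + lam * prox_{f*/lam}(x / lam) - x, which Theorem 18 says is 0."""
    return prox_abs(x, lam) + lam * prox_conjugate_abs(x / lam) - x
```

For example, with $x=3$ and $\lambda=2$: soft-thresholding gives $1$, clipping $3/2$ gives $1$, and $1+2\cdot1=3=x$.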

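6.1 Support functions

Using the Moreau decomposition, we can derive a formula for the prox of the support function of a given nonempty closed convex set.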

Theorem 19 (prox of support functions) Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Then for all $\mathbf{x}\in\mathbb{E}$, $$\mathrm{prox}_{\lambda\sigma_C}(\mathbf{x})=\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda).$$

Proof: Note that $(\sigma_C)^*=\delta_C$ (Example 3 of Chapter 4), so $\mathrm{prox}_{\lambda^{-1}(\sigma_C)^*}=P_C$; the result then follows directly from the extended Moreau decomposition.

Example 19 (prox of norms) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=\lambda\Vert\mathbf{x}\Vert_{\alpha}$, where $\lambda>0$ and $\Vert\cdot\Vert_{\alpha}$ is any norm on $\mathbb{E}$. By Example 12 of Chapter 2, $$\Vert\mathbf{x}\Vert_{\alpha}=\sigma_C(\mathbf{x}),$$ where $$C=B_{\Vert\cdot\Vert_{\alpha,*}}[\mathbf{0},1]=\{\mathbf{x}\in\mathbb{E}:\Vert\mathbf{x}\Vert_{\alpha,*}\le1\}$$ and $\Vert\cdot\Vert_{\alpha,*}$ is the dual norm of $\Vert\cdot\Vert_{\alpha}$. By Theorem 19, $$\boxed{\mathrm{prox}_{\lambda\Vert\cdot\Vert_{\alpha}}(\mathbf{x})=\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_{\alpha,*}}[\mathbf{0},1]}(\mathbf{x}/\lambda).}$$

Example 20 (prox of the $\ell_{\infty}$-norm) By Example 19, for all $\lambda>0$ and $\mathbf{x}\in\mathbb{R}^n$, $$\boxed{\mathrm{prox}_{\lambda\Vert\cdot\Vert_{\infty}}(\mathbf{x})=\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},1]}(\mathbf{x}/\lambda).}$$ By Example 14, the orthogonal projection onto the $\ell_1$ ball can be computed by finding a root of a monotonically decreasing one-dimensional function.

Example 21 (prox of the max function) Consider the max function $g:\mathbb{R}^n\to\mathbb{R}$ given by $g(\mathbf{x})=\max(\mathbf{x})\equiv\max\{x_1,x_2,\ldots,x_n\}$. By Example 7 of Chapter 2, $$\max(\mathbf{x})=\sigma_{\Delta_n}(\mathbf{x}).$$ Hence, by Theorem 19, for all $\lambda>0$ and $\mathbf{x}\in\mathbb{R}^n$, $$\boxed{\mathrm{prox}_{\lambda\max(\cdot)}(\mathbf{x})=\mathbf{x}-\lambda P_{\Delta_n}(\mathbf{x}/\lambda).}$$ The orthogonal projection onto the unit simplex can be computed as in Corollary 1.
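The boxed formula of Example 21 needs only a simplex projection. The sketch below finds the scalar $\mu^*$ of Corollary 1 with a sort-based search, which is one standard way to solve $\mathbf{e}^T[\mathbf{x}-\mu\mathbf{e}]_+=1$ and is a choice made here rather than the method prescribed by the text; the function names are hypothetical.

```python
def project_simplex(x):
    """Projection onto the unit simplex: [x - mu*e]_+ with e^T [x - mu*e]_+ = 1."""
    u = sorted(x, reverse=True)
    cumulative = 0.0
    mu = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - 1.0) / j
        if uj - candidate > 0.0:
            mu = candidate          # keep the candidate from the largest admissible j
    return [max(v - mu, 0.0) for v in x]

def prox_max(x, lam):
    """prox of lam * max(.) via x - lam * P_simplex(x / lam)."""
    p = project_simplex([v / lam for v in x])
    return [v - lam * pv for v, pv in zip(x, p)]
```

For $\mathbf{x}=(3,1)$ and $\lambda=1$, the projection of $\mathbf{x}$ onto $\Delta_2$ is $(1,0)$, so the prox is $(2,1)$: only the largest component is reduced, by exactly $\lambda$.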

Example 22 (prox of the sum of the $k$ largest components) Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $$f(\mathbf{x})=x_{[1]}+x_{[2]}+\cdots+x_{[k]},$$ where $k\in\{1,2,\ldots,n\}$ and, for each $i$, $x_{[i]}$ denotes the $i$-th largest component of $\mathbf{x}$. It is not hard to show that $f=\sigma_C$, where $$C=\{\mathbf{y}\in\mathbb{R}^n:\mathbf{e}^T\mathbf{y}=k,\,\mathbf{0}\le\mathbf{y}\le\mathbf{e}\}.$$ Indeed, let $\mathbf{x}\in\mathbb{R}^n$ and $\mathbf{y}\in C$, write $y_{[i]}$ for the component of $\mathbf{y}$ that multiplies $x_{[i]}$, and set $t\triangleq\sum_{i=k+1}^ny_{[i]}$, so that $\sum_{i=1}^ky_{[i]}=k-t$. If $t>0$, then $$\begin{aligned}\sum_{i=1}^nx_iy_i&=\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{i=k+1}^nx_{[i]}y_{[i]}\\&=\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{i=k+1}^ny_{[i]}\left[\sum_{i=k+1}^nx_{[i]}\left(\frac{y_{[i]}}{\sum_{j=k+1}^ny_{[j]}}\right)\right]\\&\le\sum_{i=1}^kx_{[i]}y_{[i]}+\sum_{i=k+1}^ny_{[i]}x_{[k+1]}\\&=k\left[\sum_{i=1}^kx_{[i]}\left(\frac{1}{k}y_{[i]}\right)+\frac{t}{k}x_{[k+1]}\right],\end{aligned}$$ and for $t=0$ the first and last expressions coincide. It therefore suffices to show that $$\sum_{i=1}^kx_{[i]}\left(\frac{1}{k}y_{[i]}\right)+\frac{t}{k}x_{[k+1]}\le\frac{1}{k}\sum_{i=1}^kx_{[i]}.$$ Since $0\le y_{[i]}\le1$ and $x_{[i]}\ge x_{[k]}$ for $i\le k$, $$\sum_{i=1}^kx_{[i]}\frac{1-y_{[i]}}{k}\ge x_{[k]}\sum_{i=1}^k\frac{1-y_{[i]}}{k}=x_{[k]}\left(1-\frac{k-t}{k}\right)=\frac{t}{k}x_{[k]}\ge\frac{t}{k}x_{[k+1]},$$ and rearranging gives the claim. Hence $$\sigma_C(\mathbf{x})=\max_{\mathbf{y}\in C}\langle\mathbf{y},\mathbf{x}\rangle\le\sum_{i=1}^kx_{[i]}.$$ The upper bound is attained by taking $y_i=1$ on the indices of the $k$ largest components of $\mathbf{x}$ and $y_i=0$ elsewhere, so equality holds and $\sigma_C=f$. Therefore, by Theorem 19, for all $\mathbf{x}\in\mathbb{R}^n$, $$\boxed{\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\lambda P_{\{\mathbf{y}:\mathbf{e}^T\mathbf{y}=k,\,\mathbf{0}\le\mathbf{y}\le\mathbf{e}\}}(\mathbf{x}/\lambda).}$$ The orthogonal projection can be computed as in Theorem 11.

Example 23 (prox of the sum of the $k$ largest absolute values) Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $$f(\mathbf{x})=\sum_{i=1}^k\left|x_{\langle i\rangle}\right|,$$ where $k\in\{1,2,\ldots,n\}$ and $x_{\langle i\rangle}$ is the component of $\mathbf{x}$ with the $i$-th largest absolute value. As in Example 22, one can show that $$f(\mathbf{x})=\max\left\{\sum_{i=1}^nz_ix_i:\Vert\mathbf{z}\Vert_1\le k,\,-\mathbf{e}\le\mathbf{z}\le\mathbf{e}\right\}.$$ Hence $f=\sigma_C$, where $$C=\{\mathbf{z}\in\mathbb{R}^n:\Vert\mathbf{z}\Vert_1\le k,\,-\mathbf{e}\le\mathbf{z}\le\mathbf{e}\}.$$ Therefore, by Theorem 19, for all $\mathbf{x}\in\mathbb{R}^n$, $$\boxed{\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\lambda P_{\{\mathbf{y}:\Vert\mathbf{y}\Vert_1\le k,\,-\mathbf{e}\le\mathbf{y}\le\mathbf{e}\}}(\mathbf{x}/\lambda).}$$ The orthogonal projection can be computed as in Example 15.

7. The Moreau envelope

7.1 Definition and basic properties

Definition 2 (Moreau envelope) Given a proper closed convex function $f:\mathbb{E}\to(-\infty,\infty]$ and $\mu>0$, the Moreau envelope of $f$ is the function $$M_f^{\mu}(\mathbf{x})=\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{x-u}\Vert^2\right\}.$$ The parameter $\mu$ is called the smoothing parameter$^9$; the next subsection explains this terminology. By the first prox theorem, the minimization problem in the definition of the Moreau envelope has a unique solution, namely $\mathrm{prox}_{\mu f}(\mathbf{x})$. Hence $M_f^{\mu}(\mathbf{x})$ is always a real number: $$M_f^{\mu}(\mathbf{x})=f(\mathrm{prox}_{\mu f}(\mathbf{x}))+\frac{1}{2\mu}\Vert\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\Vert^2.$$

Example 24 (Moreau envelope of indicator functions) Let $f=\delta_C$, where $C\subset\mathbb{E}$ is a nonempty closed convex set. Then $\mathrm{prox}_{\mu f}(\mathbf{x})=P_C(\mathbf{x})$. Hence, for all $\mathbf{x}\in\mathbb{E}$, $$\boxed{M_{\delta_C}^{\mu}(\mathbf{x})=\delta_C(P_C(\mathbf{x}))+\frac{1}{2\mu}\Vert\mathbf{x}-P_C(\mathbf{x})\Vert^2=\frac{1}{2\mu}d_C^2(\mathbf{x}).}$$

The next example shows that the Moreau envelope of the Euclidean norm is the Huber function, defined by $$H_{\mu}(\mathbf{x})=\begin{cases}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu.\end{cases}$$ The graph of the one-dimensional Huber function is shown in the figure below. As the figure suggests, the larger $\mu$ is, the smoother the function becomes.
[Figure: graph of the one-dimensional Huber function $H_{\mu}$]

Example 25 (Moreau envelope of the Euclidean norm: the Huber function) Let $f:\mathbb{E}\to\mathbb{R}$ be given by $f(\mathbf{x})=\Vert\mathbf{x}\Vert$. By Example 8, for all $\mathbf{x}\in\mathbb{E}$ and $\mu>0$, $$\mathrm{prox}_{\mu f}(\mathbf{x})=\left(1-\frac{\mu}{\max\{\Vert\mathbf{x}\Vert,\mu\}}\right)\mathbf{x}.$$ Therefore, $$\boxed{M_{\Vert\cdot\Vert}^{\mu}(\mathbf{x})=\Vert\mathrm{prox}_{\mu f}(\mathbf{x})\Vert+\frac{1}{2\mu}\Vert\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\Vert^2=\begin{cases}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu\end{cases}=H_{\mu}(\mathbf{x}).}$$
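The identity $M_{\Vert\cdot\Vert}^{\mu}=H_{\mu}$ can be checked numerically by evaluating the envelope through the prox formula of Example 25 and comparing it with the closed form. This is a minimal sketch with illustrative function names, assuming vectors are plain Python lists.

```python
import math

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

def prox_norm(x, mu):
    """prox of mu * ||.||_2: (1 - mu / max(||x||, mu)) * x."""
    scale = 1.0 - mu / max(norm2(x), mu)
    return [scale * v for v in x]

def moreau_envelope_norm(x, mu):
    """f(prox) + ||x - prox||^2 / (2 mu) with f the Euclidean norm."""
    p = prox_norm(x, mu)
    residual = [a - b for a, b in zip(x, p)]
    return norm2(p) + norm2(residual) ** 2 / (2.0 * mu)

def huber(x, mu):
    """Huber function H_mu."""
    n = norm2(x)
    return n * n / (2.0 * mu) if n <= mu else n - mu / 2.0
```

For $\mathbf{x}=(3,4)$ and $\mu=2$: the prox is $(1.8,2.4)$ with norm $3$, the residual has norm $2$, so the envelope equals $3+4/4=4=\Vert\mathbf{x}\Vert-\mu/2$.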

Note that the Moreau envelope is exactly the infimal convolution of $f$ with the function $\omega_{\mu}(\mathbf{x})=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2$, that is, $$M_f^{\mu}=f\square\omega_{\mu}.$$ Hence, by Theorem 8 of Chapter 2, if $f$ is proper closed and convex (closedness is in fact not needed), then $M_f^{\mu}$ is convex.

Theorem 20 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, let $\omega_{\mu}$ be defined as above, and let $\mu>0$. Then
(i) $M_f^{\mu}=f\square\omega_{\mu}$;
(ii) $M_f^{\mu}:\mathbb{E}\to\mathbb{R}$ is a real-valued convex function.

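Using the relation between infimal convolution and conjugation, Theorem 20 together with Theorem 9 of Chapter 4 yields an expression for the conjugate of the Moreau envelope.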

Corollary 3 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function, let $\omega_{\mu}$ be defined as above, and let $\mu>0$. Then $$(M_f^{\mu})^*=f^*+\omega_{\frac{1}{\mu}},$$ where $\omega_{\frac{1}{\mu}}$ is defined with the dual norm of $\Vert\cdot\Vert$ (which coincides with $\Vert\cdot\Vert$ itself in the Euclidean setting of this chapter).

We now give several calculus rules for the Moreau envelope.

Lemma 4 Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda,\mu>0$. Then for all $\mathbf{x}\in\mathbb{E}$, $$\lambda M_f^{\mu}(\mathbf{x})=M_{\lambda f}^{\mu/\lambda}(\mathbf{x}).$$

Proof: For all $\mathbf{x}\in\mathbb{E}$, $$\begin{aligned}\lambda M_f^{\mu}(\mathbf{x})&=\lambda\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2\right\}\\&=\min_{\mathbf{u}}\left\{\lambda f(\mathbf{u})+\frac{1}{2\mu/\lambda}\Vert\mathbf{u-x}\Vert^2\right\}\\&=M_{\lambda f}^{\mu/\lambda}(\mathbf{x}).\end{aligned}$$

定理21 (可分函数的Moreau包络) 设 E = E 1 × E 2 × ⋯ × E m \mathbb{E}=\mathbb{E}_1\times\mathbb{E}_2\times\cdots\times\mathbb{E}_m E=E1×E2××Em, f : E → ( − ∞ , ∞ ] f:\mathbb{E}\to(-\infty,\infty] f:E(,]定义为 f ( x 1 , x 2 , … , x m ) = ∑ i = 1 m f i ( x i ) , x 1 ∈ E 1 ,   x 2 ∈ E 2 ,   … , x m ∈ E m , f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mf_i(\mathbf{x}_i),\quad\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\,\ldots,\mathbf{x}_m\in\mathbb{E}_m, f(x1,x2,,xm)=i=1mfi(xi),x1E1,x2E2,,xmEm,这里对 ∀ i \forall i i, f i : E i → ( − ∞ , ∞ ] f_i:\mathbb{E}_i\to(-\infty,\infty] fi:Ei(,]是正常闭凸函数. 则给定 μ > 0 \mu>0 μ>0, 对 ∀ x 1 ∈ E 1 ,   x 2 ∈ E 2 ,   … , x m ∈ E m \forall\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\,\ldots,\mathbf{x}_m\in\mathbb{E}_m x1E1,x2E2,,xmEm, M f μ ( x 1 , x 2 , … , x m ) = ∑ i = 1 m M f i μ ( x i ) . M_f^{\mu}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)=\sum_{i=1}^mM_{f_i}^{\mu}(\mathbf{x}_i). Mfμ(x1,x2,,xm)=i=1mMfiμ(xi).

Proof: For any $\mathbf{x}_1\in\mathbb{E}_1,\,\mathbf{x}_2\in\mathbb{E}_2,\,\ldots,\,\mathbf{x}_m\in\mathbb{E}_m$, write $\mathbf{x}=(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_m)$. Because both the function and the squared norm are separable across the blocks, the joint minimization splits into $m$ independent ones:
$$\begin{aligned}M_f^{\mu}(\mathbf{x})&=\min_{\mathbf{u}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m}\left\{f(\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_m)+\frac{1}{2\mu}\Vert(\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_m)-\mathbf{x}\Vert^2\right\}\\&=\min_{\mathbf{u}_i\in\mathbb{E}_i,\,i=1,2,\ldots,m}\left\{\sum_{i=1}^mf_i(\mathbf{u}_i)+\frac{1}{2\mu}\sum_{i=1}^m\Vert\mathbf{u}_i-\mathbf{x}_i\Vert^2\right\}\\&=\sum_{i=1}^m\min_{\mathbf{u}_i\in\mathbb{E}_i}\left\{f_i(\mathbf{u}_i)+\frac{1}{2\mu}\Vert\mathbf{u}_i-\mathbf{x}_i\Vert^2\right\}\\&=\sum_{i=1}^mM_{f_i}^{\mu}(\mathbf{x}_i).\end{aligned}$$

Example 26 (Moreau envelope of the $\ell_1$-norm). Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ defined by $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1$. Note that
$$f(\mathbf{x})=\Vert\mathbf{x}\Vert_1=\sum_{i=1}^ng(x_i),$$
where $g(t)=|t|$. By Example 25, $M_g^{\mu}=H_{\mu}$ (the one-dimensional Huber function). Theorem 21 then gives, for all $\mathbf{x}\in\mathbb{R}^n$,
$$M_f^{\mu}(\mathbf{x})=\sum_{i=1}^nM_g^{\mu}(x_i)=\sum_{i=1}^nH_{\mu}(x_i).$$
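The identity of Example 26 can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the minimizer defining $M_f^{\mu}(\mathbf{x})$ is $\mathrm{prox}_{\mu f}(\mathbf{x})$, which for the $\ell_1$-norm is componentwise soft thresholding.

```python
import numpy as np

def soft_threshold(x, tau):
    """prox of tau * ||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def moreau_env_l1(x, mu):
    """M_f^mu(x) for f = ||.||_1, evaluated at the minimizer u = prox_{mu f}(x)."""
    u = soft_threshold(x, mu)
    return np.sum(np.abs(u)) + np.sum((u - x) ** 2) / (2.0 * mu)

def huber_scalar_sum(x, mu):
    """sum_i H_mu(x_i), with H_mu the one-dimensional Huber function."""
    a = np.abs(x)
    return np.sum(np.where(a <= mu, a ** 2 / (2.0 * mu), a - mu / 2.0))

x = np.array([0.3, -2.0, 1.0, 0.0])
mu = 1.0
# Both values should agree up to rounding.
print(moreau_env_l1(x, mu), huber_scalar_sum(x, mu))
```

Evaluating the envelope through the prox avoids any numerical minimization, which is why the comparison is exact up to floating-point error.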

7.2 Differentiability of the Moreau Envelope

Theorem 22 (smoothness of the Moreau envelope). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then $M_f^{\mu}$ is a $\frac{1}{\mu}$-smooth function on $\mathbb{E}$ (see footnote 10), and for all $\mathbf{x}\in\mathbb{E}$,
$$\nabla M_f^{\mu}(\mathbf{x})=\frac{1}{\mu}(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})).$$

Proof: By Theorem 20(i), $M_f^{\mu}=f\square\omega_{\mu}$, where $\omega_{\mu}=\frac{1}{2\mu}\Vert\cdot\Vert^2$. By Theorem 9 of Chapter 5 (with $\omega=\omega_{\mu},\,L=\frac{1}{\mu}$), $M_f^{\mu}$ is $\frac{1}{\mu}$-smooth. Since
$$\mathrm{prox}_{\mu f}(\mathbf{x})=\arg\min_{\mathbf{u}\in\mathbb{E}}\left\{f(\mathbf{u})+\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2\right\},$$
the vector $\mathbf{u(x)}$ in Theorem 9 of Chapter 5 is exactly $\mathrm{prox}_{\mu f}(\mathbf{x})$, and therefore
$$\nabla M_f^{\mu}(\mathbf{x})=\nabla \omega_{\mu}(\mathbf{x-u(x)})=\frac{1}{\mu}(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})).$$
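As a sanity check of the gradient formula in Theorem 22, the following sketch compares it with central finite differences for $f=\Vert\cdot\Vert_1$. The helper names and the test point are assumptions made for illustration; the test point deliberately avoids $|x_i|=\mu$, where the second derivative of the envelope jumps.

```python
import numpy as np

def soft_threshold(x, tau):
    """prox of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def envelope(x, mu):
    """Moreau envelope of the l1-norm, evaluated through its prox."""
    u = soft_threshold(x, mu)
    return np.sum(np.abs(u)) + np.sum((u - x) ** 2) / (2.0 * mu)

def envelope_grad(x, mu):
    """Gradient formula of Theorem 22: (x - prox_{mu f}(x)) / mu."""
    return (x - soft_threshold(x, mu)) / mu

def finite_diff_grad(fun, x, h=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

x = np.array([0.3, -2.0, 1.7, 0.0])
mu = 0.5
g_formula = envelope_grad(x, mu)
g_numeric = finite_diff_grad(lambda z: envelope(z, mu), x)
print(np.max(np.abs(g_formula - g_numeric)))  # should be tiny
```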

Example 27 ($1$-smoothness of $\frac{1}{2}d_C^2$). We derived the gradient of $\frac{1}{2}d_C^2$ in Example 9 of Chapter 3, and its $1$-smoothness was discussed twice, in Examples 3 and 13 of Chapter 5. Here we revisit it from the viewpoint of the Moreau envelope. Let $C\subset\mathbb{E}$ be a nonempty closed convex set. By Example 24, $\frac{1}{2}d_C^2=M_{\delta_C}^1$. Hence, by Theorem 22, $\frac{1}{2}d_C^2$ is $1$-smooth and
$$\nabla\left(\frac{1}{2}d_C^2\right)(\mathbf{x})=\mathbf{x}-\mathrm{prox}_{\delta_C}(\mathbf{x})=\mathbf{x}-P_C(\mathbf{x}).$$

Example 28 (smoothness of the Huber function). The Huber function is defined by
$$H_{\mu}(\mathbf{x})=\left\{\begin{array}{ll}\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2, & \Vert\mathbf{x}\Vert\le\mu,\\\Vert\mathbf{x}\Vert-\frac{\mu}{2}, & \Vert\mathbf{x}\Vert>\mu.\end{array}\right.$$
By Example 25, $H_{\mu}=M_f^{\mu}$, where $f$ is the Euclidean norm $f(\mathbf{x})=\Vert\mathbf{x}\Vert$. Hence, by Theorem 22, $H_{\mu}$ is $\frac{1}{\mu}$-smooth and
$$\begin{aligned}\nabla H_{\mu}(\mathbf{x})&=\frac{1}{\mu}\left(\mathbf{x}-\mathrm{prox}_{\mu f}(\mathbf{x})\right)\\&\overset{\text{Example 8}}{=}\frac{1}{\mu}\left(\mathbf{x}-\left(1-\frac{\mu}{\max\{\Vert\mathbf{x}\Vert,\mu\}}\right)\mathbf{x}\right)\\&=\left\{\begin{array}{ll}\frac{1}{\mu}\mathbf{x}, & \Vert\mathbf{x}\Vert\le\mu,\\\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert}, & \Vert\mathbf{x}\Vert>\mu.\end{array}\right.\end{aligned}$$
This also shows that the two pieces of the Huber function are joined smoothly where $\Vert\mathbf{x}\Vert=\mu$.
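A minimal sketch of the Huber function and its gradient for $\mathbb{E}=\mathbb{R}^n$ with the Euclidean norm follows. The function names are hypothetical; the gradient uses the single expression $\mathbf{x}/\max\{\Vert\mathbf{x}\Vert,\mu\}$, which covers both cases of the formula in Example 28.

```python
import numpy as np

def huber(x, mu):
    """Huber function H_mu on R^n with the Euclidean norm."""
    r = np.linalg.norm(x)
    return r ** 2 / (2.0 * mu) if r <= mu else r - mu / 2.0

def huber_grad(x, mu):
    """x / max(||x||, mu): equals x/mu inside the ball and x/||x|| outside."""
    return x / max(np.linalg.norm(x), mu)

x_out = np.array([3.0, 4.0])   # ||x|| = 5 > mu
x_in = np.array([0.3, 0.4])    # ||x|| = 0.5 <= mu
print(huber(x_out, 1.0), huber_grad(x_out, 1.0))
print(huber(x_in, 1.0), huber_grad(x_in, 1.0))
```

On the sphere $\Vert\mathbf{x}\Vert=\mu$ both branches give $\mathbf{x}/\mu$, which is the smooth junction mentioned above.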

7.3 Prox of the Moreau Envelope

Theorem 23 below shows that if the prox of a proper closed convex function $f$ is known, then the prox of its Moreau envelope can be computed from it.

Theorem 23 (prox of the Moreau envelope). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then for all $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_{M_f^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{1}{\mu+1}\left(\mathrm{prox}_{(\mu+1)f}(\mathbf{x})-\mathbf{x}\right).$$

Proof: First note that
$$\min_{\mathbf{u}}\left\{M_f^{\mu}(\mathbf{u})+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}=\min_{\mathbf{u}}\min_{\mathbf{y}}\left\{f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u-y}\Vert^2+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$$
Exchanging the order of minimization gives
$$\min_{\mathbf{y}}\min_{\mathbf{u}}\left\{f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u-y}\Vert^2+\frac{1}{2}\Vert\mathbf{u-x}\Vert^2\right\}.$$
The inner problem is minimized where the gradient with respect to $\mathbf{u}$ vanishes, that is,
$$\frac{1}{\mu}(\mathbf{u-y})+(\mathbf{u-x})=\mathbf{0}\Rightarrow\mathbf{u}=\mathbf{u}_{\mu}=\frac{\mu\mathbf{x}+\mathbf{y}}{\mu+1}.$$
Hence the optimal value of the inner problem is
$$\begin{aligned}f(\mathbf{y})+\frac{1}{2\mu}\Vert\mathbf{u}_{\mu}-\mathbf{y}\Vert^2+\frac{1}{2}\Vert\mathbf{u}_{\mu}-\mathbf{x}\Vert^2&=f(\mathbf{y})+\frac{1}{2\mu}\left\Vert\frac{\mu\mathbf{x}-\mu\mathbf{y}}{\mu+1}\right\Vert^2+\frac{1}{2}\left\Vert\frac{\mathbf{y-x}}{\mu+1}\right\Vert^2\\&=f(\mathbf{y})+\frac{1}{2(\mu+1)}\Vert\mathbf{x-y}\Vert^2.\end{aligned}$$
Therefore the $\mathbf{y}$ appearing in the expression for the optimal $\mathbf{u}$ is the solution of
$$\min_{\mathbf{y}}\left\{f(\mathbf{y})+\frac{1}{2(\mu+1)}\Vert\mathbf{x-y}\Vert^2\right\},$$
that is, $\mathbf{y}=\mathrm{prox}_{(\mu+1)f}(\mathbf{x})$. In summary,
$$\mathrm{prox}_{M_f^{\mu}}(\mathbf{x})=\frac{1}{\mu+1}\left(\mu\mathbf{x}+\mathrm{prox}_{(\mu+1)f}(\mathbf{x})\right)=\mathbf{x}+\frac{1}{\mu+1}\left(\mathrm{prox}_{(\mu+1)f}(\mathbf{x})-\mathbf{x}\right).$$
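The formula of Theorem 23 can be compared against a brute-force minimization in one dimension. The sketch below takes $f=|\cdot|$, so that $M_f^{\mu}$ is the scalar Huber function and $\mathrm{prox}_{(\mu+1)f}$ is soft thresholding with threshold $\mu+1$. The names, grid width, and grid resolution are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def soft(x, tau):
    """Scalar soft thresholding: prox of tau * |.|."""
    return np.sign(x) * max(abs(x) - tau, 0.0)

def prox_envelope_formula(x, mu):
    """Theorem 23 for f = |.|: x + (prox_{(mu+1) f}(x) - x) / (mu + 1)."""
    return x + (soft(x, mu + 1.0) - x) / (mu + 1.0)

def prox_envelope_grid(x, mu, half_width=10.0, n=200001):
    """Brute force: minimize H_mu(u) + 0.5 * (u - x)^2 over a fine grid."""
    u = np.linspace(-half_width, half_width, n)
    a = np.abs(u)
    hub = np.where(a <= mu, a ** 2 / (2.0 * mu), a - mu / 2.0)
    return u[np.argmin(hub + 0.5 * (u - x) ** 2)]

for x in (3.0, 0.6, -1.2):
    print(x, prox_envelope_formula(x, 0.5), prox_envelope_grid(x, 0.5))
```

The two columns should agree up to the grid spacing.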

Combining Theorem 23 with Lemma 4 yields Corollary 4 below.

Corollary 4. Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\lambda,\mu>0$. Then for all $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_{\lambda M_f^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right).$$

Proof: By Lemma 4, $\lambda M_f^{\mu}=M_{\lambda f}^{\mu/\lambda}$. Applying Theorem 23 to the function $\lambda f$ with parameter $\mu/\lambda$ gives
$$\mathrm{prox}_{\lambda M_f^{\mu}}(\mathbf{x})=\mathrm{prox}_{M_{\lambda f}^{\mu/\lambda}}(\mathbf{x})=\mathbf{x}+\frac{1}{\mu/\lambda+1}\left(\mathrm{prox}_{(\mu/\lambda+1)\lambda f}(\mathbf{x})-\mathbf{x}\right)=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right).$$

Example 29 (prox of $\frac{\lambda}{2}d_C^2$). Let $C\subset\mathbb{E}$ be a nonempty closed convex set and let $\lambda>0$. Consider the function $f=\frac{1}{2}d_C^2$. By Example 27, $f=M_g^1$, where $g=\delta_C$. Since $(\lambda+1)\delta_C=\delta_C$, we have $\mathrm{prox}_{(\lambda+1)g}=\mathrm{prox}_g=P_C$, so by Corollary 4, for all $\mathbf{x}\in\mathbb{E}$,
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathrm{prox}_{\lambda M_g^1}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\lambda+1}\left(\mathrm{prox}_{(\lambda+1)g}(\mathbf{x})-\mathbf{x}\right)=\mathbf{x}+\frac{\lambda}{\lambda+1}(P_C(\mathbf{x})-\mathbf{x}).$$
That is,
$$\boxed{\mathrm{prox}_{\frac{\lambda}{2}d_C^2}(\mathbf{x})=\frac{\lambda}{\lambda+1}P_C(\mathbf{x})+\frac{1}{\lambda+1}\mathbf{x}.}$$
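To make the formula of Example 29 concrete, the sketch below takes $C$ to be a box, for which the projection is a componentwise clip. The choice of set and all function names are assumptions for illustration. The result is checked against the optimality condition $\mathbf{u}-\mathbf{x}+\lambda(\mathbf{u}-P_C(\mathbf{u}))=\mathbf{0}$, which uses the gradient from Example 27.

```python
import numpy as np

def project_box(x, lo, hi):
    """Orthogonal projection onto the box {z : lo <= z_i <= hi}."""
    return np.clip(x, lo, hi)

def prox_half_sq_dist(x, lam, project):
    """prox of (lam/2) * d_C^2: a convex combination of P_C(x) and x."""
    return (lam * project(x) + x) / (lam + 1.0)

proj = lambda z: project_box(z, -1.0, 1.0)
x = np.array([2.0, -3.0, 0.5])
lam = 3.0
u = prox_half_sq_dist(x, lam, proj)
residual = u - x + lam * (u - proj(u))   # should vanish at the prox point
print(u, np.max(np.abs(residual)))
```

Passing the projection as a callable keeps the prox formula independent of the particular set $C$.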

Example 30 (prox of the Huber function). Consider the function $f(\mathbf{x})=\lambda H_{\mu}(\mathbf{x})$. By Example 25, $H_{\mu}=M_g^{\mu}$, where $g(\mathbf{x})=\Vert\mathbf{x}\Vert$. Hence, by Corollary 4 and Example 8, for all $\lambda>0$ and $\mathbf{x}\in\mathbb{E}$,
$$\begin{aligned}\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})&=\mathrm{prox}_{\lambda M_g^{\mu}}(\mathbf{x})=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)g}(\mathbf{x})-\mathbf{x}\right)\\&=\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\left(1-\frac{\mu+\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}-\mathbf{x}\right).\end{aligned}$$
Simplifying, we obtain
$$\boxed{\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})=\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}.}$$
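The boxed formula can be verified through the optimality condition of the prox problem: since $H_{\mu}$ is differentiable, $\mathbf{u}=\mathrm{prox}_{\lambda H_{\mu}}(\mathbf{x})$ must satisfy $\mathbf{u}+\lambda\nabla H_{\mu}(\mathbf{u})=\mathbf{x}$. The following sketch (hypothetical function names, Euclidean norm on $\mathbb{R}^n$ assumed) evaluates that residual.

```python
import numpy as np

def prox_huber(x, lam, mu):
    """Closed form of Example 30: (1 - lam / max(||x||, mu + lam)) * x."""
    return (1.0 - lam / max(np.linalg.norm(x), mu + lam)) * x

def huber_grad(u, mu):
    """Gradient of the Huber function: u / max(||u||, mu)."""
    return u / max(np.linalg.norm(u), mu)

lam, mu = 2.0, 1.0
for x in (np.array([3.0, 4.0]), np.array([0.6, 0.8])):
    u = prox_huber(x, lam, mu)
    residual = u + lam * huber_grad(u, mu) - x
    print(u, np.max(np.abs(residual)))
```

The first point lies in the regime $\Vert\mathbf{x}\Vert>\mu+\lambda$ and the second in the regime $\Vert\mathbf{x}\Vert\le\mu+\lambda$, so both branches of the $\max$ are exercised.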

Analogously to the Moreau decomposition formula for the proximal operator, a decomposition formula can also be derived for the Moreau envelope.

Theorem 24 (Moreau envelope decomposition). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper closed convex function and let $\mu>0$. Then for all $\mathbf{x}\in\mathbb{E}$,
$$M_f^{\mu}(\mathbf{x})+M_{f^*}^{1/\mu}(\mathbf{x}/\mu)=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2.$$

Proof: For any $\mathbf{x}\in\mathbb{E}$,
$$M_f^{\mu}(\mathbf{x})=\min_{\mathbf{u}\in\mathbb{E}}\{f(\mathbf{u})+\psi(\mathbf{u})\},$$
where $\psi(\mathbf{u})=\frac{1}{2\mu}\Vert\mathbf{u-x}\Vert^2$. By Fenchel's duality theorem,
$$M_f^{\mu}(\mathbf{x})=\max_{\mathbf{v}\in\mathbb{E}}\{-f^*(\mathbf{v})-\psi^*(-\mathbf{v})\}=-\min_{\mathbf{v}\in\mathbb{E}}\{f^*(\mathbf{v})+\psi^*(-\mathbf{v})\}.$$
Let $\phi(\cdot)=\frac{1}{2}\Vert\cdot-\mathbf{x}\Vert^2$. Then
$$\phi^*(\mathbf{v})=\frac{1}{2}\Vert\mathbf{v}\Vert^2+\langle\mathbf{x},\mathbf{v}\rangle.$$
Since $\psi=\frac{1}{\mu}\phi$, Theorem 7(i) of Chapter 4 gives
$$\psi^*(\mathbf{v})=\frac{1}{\mu}\phi^*(\mu\mathbf{v})=\frac{\mu}{2}\Vert\mathbf{v}\Vert^2+\langle\mathbf{x},\mathbf{v}\rangle.$$
Therefore, completing the square,
$$\begin{aligned}M_f^{\mu}(\mathbf{x})&=-\min_{\mathbf{v}\in\mathbb{E}}\left\{f^*(\mathbf{v})+\frac{\mu}{2}\Vert\mathbf{v}\Vert^2-\langle\mathbf{x},\mathbf{v}\rangle\right\}\\&=-\min_{\mathbf{v}\in\mathbb{E}}\left\{f^*(\mathbf{v})+\frac{\mu}{2}\Vert\mathbf{v}-\mathbf{x}/\mu\Vert^2-\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2\right\}\\&=\frac{1}{2\mu}\Vert\mathbf{x}\Vert^2-M_{f^*}^{1/\mu}(\mathbf{x}/\mu).\end{aligned}$$
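A one-dimensional numerical check of Theorem 24 is sketched below for $f=|\cdot|$. It relies on two facts that are assumed here rather than taken from this section: the conjugate of $|\cdot|$ is the indicator of $[-1,1]$, and the Moreau envelope of an indicator $\delta_C$ with parameter $1/\mu$ is $\frac{\mu}{2}d_C^2$ (directly from the definition of the envelope). The function names are hypothetical.

```python
def huber(x, mu):
    """M_f^mu for f = |.|, i.e. the scalar Huber function."""
    a = abs(x)
    return a * a / (2.0 * mu) if a <= mu else a - mu / 2.0

def env_conjugate(y, mu):
    """M_{f*}^{1/mu}(y) for f* = indicator of [-1, 1]: (mu/2) * dist(y, [-1, 1])^2."""
    d = max(abs(y) - 1.0, 0.0)
    return 0.5 * mu * d * d

def decomposition_gap(x, mu):
    """Left side minus right side of Theorem 24; should be zero."""
    return huber(x, mu) + env_conjugate(x / mu, mu) - x * x / (2.0 * mu)

for x in (-3.0, -0.2, 0.0, 0.5, 4.0):
    for mu in (0.5, 1.0, 2.0):
        print(x, mu, decomposition_gap(x, mu))
```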

8. Further Results on Prox Computations

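In this section we collect several special examples of prox computations. Their proofs are not tied to any specific result established earlier in this chapter.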

8.1 Norm of a Linear Transformation over $\mathbb{R}^n$

Lemma 5. Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\Vert\mathbf{Ax}\Vert_2$, where $\mathbf{A}\in\mathbb{R}^{m\times n}$ has full row rank, and let $\lambda>0$. Then
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\left\{\begin{array}{ll}\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}, & \left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2\le\lambda,\\\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax}, & \left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2>\lambda,\end{array}\right.$$
where $\alpha^*$ is the unique positive root of the strictly decreasing function $g(\alpha)=\left\Vert\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax}\right\Vert_2^2-\lambda^2$.

Proof: $\mathrm{prox}_{\lambda f}(\mathbf{x})$ is the unique optimal solution of
$$\min_{\mathbf{u}\in\mathbb{R}^n}\left\{\lambda\Vert\mathbf{Au}\Vert_2+\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2\right\}.$$
This problem is equivalent to
$$\min_{\mathbf{u}\in\mathbb{R}^n,\,\mathbf{z}\in\mathbb{R}^m}\left\{\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\lambda\Vert\mathbf{z}\Vert_2:\mathbf{z=Au}\right\}.$$
Its Lagrangian is
$$\begin{aligned}L(\mathbf{u,z;y})&=\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T(\mathbf{z-Au})\\&=\left[\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2-\left(\mathbf{A}^T\mathbf{y}\right)^T\mathbf{u}\right]+\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right].\end{aligned}$$
Since the Lagrangian is separable in $\mathbf{u}$ and $\mathbf{z}$, the dual objective function can be written as
$$\min_{\mathbf{u,z}}L(\mathbf{u,z;y})=\min_{\mathbf{u}}\left[\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2-\left(\mathbf{A}^T\mathbf{y}\right)^T\mathbf{u}\right]+\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right].$$
The minimizer of the problem in $\mathbf{u}$ is $\tilde{\mathbf{u}}=\mathbf{x}+\mathbf{A}^T\mathbf{y}$, with corresponding optimal value
$$-\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}-\left(\mathbf{Ax}\right)^T\mathbf{y}.$$
The minimization problem in $\mathbf{z}$ can be written as
$$\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right]=-\max_{\mathbf{z}}\left[(-\mathbf{y})^T\mathbf{z}-\lambda\Vert\mathbf{z}\Vert_2\right]=-g^*(-\mathbf{y}),$$
where $g(\cdot)=\lambda\Vert\cdot\Vert_2$. Since $g^*(\mathbf{w})=\lambda\delta_{B_{\Vert\cdot\Vert_2}[\mathbf{0},1]}(\mathbf{w}/\lambda)=\delta_{B_{\Vert\cdot\Vert_2}[\mathbf{0},\lambda]}(\mathbf{w})$ (by Theorem 7(i) of Chapter 4 and Section 4.12), we have
$$\min_{\mathbf{z}}\left[\lambda\Vert\mathbf{z}\Vert_2+\mathbf{y}^T\mathbf{z}\right]=\left\{\begin{array}{ll}0, & \Vert\mathbf{y}\Vert_2\le\lambda,\\-\infty, & \Vert\mathbf{y}\Vert_2>\lambda.\end{array}\right.$$
We thus arrive at the dual problem
$$\max_{\mathbf{y}\in\mathbb{R}^m}\left\{-\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}-\left(\mathbf{Ax}\right)^T\mathbf{y}:\Vert\mathbf{y}\Vert_2\le\lambda\right\}.$$
Note that strong duality holds. We first rewrite the dual problem in the equivalent form
$$\min_{\mathbf{y}\in\mathbb{R}^m}\left\{\frac{1}{2}\mathbf{y}^T\mathbf{AA}^T\mathbf{y}+\left(\mathbf{Ax}\right)^T\mathbf{y}:\Vert\mathbf{y}\Vert_2^2\le\lambda^2\right\}.$$
We already know that
$$\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}+\mathbf{A}^T\mathbf{y},$$
where $\mathbf{y}$ is an optimal solution of the dual problem. The dual problem is convex and satisfies Slater's condition (see Theorem 28 of Chapter 3), so $\mathbf{y}$ is optimal if and only if there exists $\alpha^*$ such that
$$\begin{aligned}\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)\mathbf{y}+\mathbf{Ax}&=\mathbf{0},\\\alpha^*\left(\Vert\mathbf{y}\Vert_2^2-\lambda^2\right)&=0,\\\Vert\mathbf{y}\Vert_2^2&\le\lambda^2,\\\alpha^*&\ge0.\end{aligned}$$

  • Case 1: $\alpha^*=0$. Then $\mathbf{y}=-\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}$. This $\mathbf{y}$ is optimal if and only if $\left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2\le\lambda$, in which case $\mathrm{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}$.
  • Case 2: If $\left\Vert\left(\mathbf{AA}^T\right)^{-1}\mathbf{Ax}\right\Vert_2>\lambda$, then $\alpha^*>0$. In this case $\mathbf{y}=-\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax}$, and by complementary slackness $\Vert\mathbf{y}\Vert_2^2=\lambda^2$. Combining the two, $\alpha^*$ is a positive root of the function $g(\alpha)=\left\Vert\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax}\right\Vert_2^2-\lambda^2$. One can verify that $g(\alpha)$ is strictly decreasing for $\alpha\ge0$, so $\alpha^*$ is uniquely determined.
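The two cases of Lemma 5 translate into a short routine. The sketch below is one possible implementation under stated assumptions: the root $\alpha^*$ is located by plain bisection (the lemma does not prescribe a root-finding method), the function name and tolerances are hypothetical, and $\mathbf{A}$ is assumed to have full row rank so that $\mathbf{AA}^T$ is invertible.

```python
import numpy as np

def prox_norm_linear(x, A, lam, tol=1e-12, max_iter=200):
    """prox of lam * ||A u||_2 at x, following the two cases of Lemma 5."""
    AAt = A @ A.T
    Ax = A @ x
    m = A.shape[0]
    y0 = np.linalg.solve(AAt, Ax)
    if np.linalg.norm(y0) <= lam:            # case 1: alpha* = 0
        return x - A.T @ y0

    def g(alpha):
        y = np.linalg.solve(AAt + alpha * np.eye(m), Ax)
        return y @ y - lam ** 2

    lo, hi = 0.0, 1.0                        # case 2: g(0) > 0, g is decreasing
    while g(hi) > 0:                         # grow the bracket until g(hi) <= 0
        hi *= 2.0
    for _ in range(max_iter):                # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    alpha = 0.5 * (lo + hi)
    return x - A.T @ np.linalg.solve(AAt + alpha * np.eye(m), Ax)

# With A = I the result should match the prox of lam * ||x||_2 (Example 8).
print(prox_norm_linear(np.array([3.0, 4.0]), np.eye(2), 1.0))
```

Because $g$ is strictly decreasing on $[0,\infty)$, bisection is safe here; a Newton-type iteration would typically converge faster but needs the derivative of $g$.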

8.2 Squared $\ell_1$-Norm

The prox of the $\ell_1$-norm is the soft thresholding function (see Example 2). The prox of the squared $\ell_1$-norm, however, is harder to compute. In Lemma 6 below, we first show that $\Vert\mathbf{x}\Vert_1^2$ is the optimal value of an optimization problem. The argument uses the function
$$\varphi(s,t)=\left\{\begin{array}{ll}\frac{s^2}{t}, & t>0,\\0, & s=0,\,t=0,\\\infty, & \text{otherwise}.\end{array}\right.$$
By Example 13 of Chapter 2, $\varphi$ is a closed convex function (even though it is not continuous at $(s,t)=(0,0)$).

Lemma 6 (variational representation of $\Vert\cdot\Vert_1^2$). For all $\mathbf{x}\in\mathbb{R}^n$,
$$\min_{\bm{\lambda}\in\Delta_n}\sum_{j=1}^n\varphi(x_j,\lambda_j)=\Vert\mathbf{x}\Vert_1^2.$$
An optimal solution of this problem is
$$\tilde\lambda_j=\left\{\begin{array}{ll}\frac{|x_j|}{\Vert\mathbf{x}\Vert_1}, & \mathbf{x\ne0},\\\frac{1}{n}, & \mathbf{x=0},\end{array}\right.\quad j=1,2,\ldots,n.$$

Proof: By the Weierstrass theorem for closed functions, the problem has an optimal solution, which we denote by $\bm{\lambda}^*\in\Delta_n$. Define
$$\begin{aligned}I_0&=\{i\in\{1,2,\ldots,n\}:\lambda_i^*=0\},\\I_1&=\{i\in\{1,2,\ldots,n\}:\lambda_i^*>0\}.\end{aligned}$$
By the definition of $I_0$ and $I_1$,
$$\sum_{i\in I_1}\lambda_i^*=\sum_{i=1}^n\lambda_i^*=1.$$
For $i\in I_0$ we must have $x_i=0$, since otherwise $\varphi(x_i,\lambda_i^*)=\infty$. By the Cauchy-Schwarz inequality,
$$\sum_{j=1}^n|x_j|=\sum_{j\in I_1}|x_j|=\sum_{j\in I_1}\frac{|x_j|}{\sqrt{\lambda_j^*}}\sqrt{\lambda_j^*}\le\sqrt{\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}}\cdot\sqrt{\sum_{j\in I_1}\lambda_j^*}=\sqrt{\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}}.$$
Squaring both sides gives
$$\sum_{j=1}^n\varphi(x_j,\lambda_j^*)=\sum_{j\in I_1}\varphi(x_j,\lambda_j^*)=\sum_{j\in I_1}\frac{x_j^2}{\lambda_j^*}\ge\Vert\mathbf{x}\Vert_1^2.$$
On the other hand, since $\bm{\lambda}^*$ is an optimal solution of the problem,
$$\sum_{j=1}^n\varphi(x_j,\lambda_j^*)\le\sum_{j=1}^n\varphi(x_j,\tilde\lambda_j)=\Vert\mathbf{x}\Vert_1^2.$$
Therefore the optimal value of the problem is $\Vert\mathbf{x}\Vert_1^2$, and $\tilde{\bm{\lambda}}$ is an optimal solution.
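Lemma 6 can be illustrated numerically: the objective at $\tilde{\bm{\lambda}}$ equals $\Vert\mathbf{x}\Vert_1^2$, and no other point of the simplex does better. In the sketch below the function names are hypothetical, and random points of $\Delta_n$ are drawn from a Dirichlet distribution purely as a convenient sampling assumption.

```python
import numpy as np

def phi(s, t):
    """The function phi(s, t) from Section 8.2."""
    if t > 0:
        return s * s / t
    return 0.0 if s == 0 and t == 0 else np.inf

def objective(x, lam):
    """sum_j phi(x_j, lam_j)."""
    return sum(phi(s, t) for s, t in zip(x, lam))

def lam_tilde(x):
    """The optimal solution given in Lemma 6."""
    n1 = np.sum(np.abs(x))
    return np.abs(x) / n1 if n1 > 0 else np.full(x.size, 1.0 / x.size)

x = np.array([1.0, -2.0, 0.0, 3.0])
best = objective(x, lam_tilde(x))
rng = np.random.default_rng(0)
samples = [objective(x, rng.dirichlet(np.ones(x.size))) for _ in range(1000)]
print(best, np.sum(np.abs(x)) ** 2, min(samples))
```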

Lemma 7 (prox of $\Vert\cdot\Vert_1^2$). Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1^2$, and let $\rho>0$. Then
$$\mathrm{prox}_{\rho f}(\mathbf{x})=\left\{\begin{array}{ll}\left(\frac{\lambda_ix_i}{\lambda_i+2\rho}\right)_{i=1}^n, & \mathbf{x\ne0},\\\mathbf{0}, & \mathbf{x=0},\end{array}\right.$$
where $\lambda_i=\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+$ and $\mu^*$ is any positive root of the nonincreasing function
$$\psi(\mu)=\sum_{i=1}^n\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu}}-2\rho\right]_+-1.$$

Proof: If $\mathbf{x=0}$, then clearly $\mathrm{prox}_{\rho f}(\mathbf{x})=\arg\min_{\mathbf{u}}\left\{\frac{1}{2}\Vert\mathbf{u}\Vert_2^2+\rho\Vert\mathbf{u}\Vert_1^2\right\}=\mathbf{0}$. Now assume $\mathbf{x\ne0}$. By Lemma 6, $\mathbf{u}=\mathrm{prox}_{\rho f}(\mathbf{x})$ if and only if it is the $\mathbf{u}$-part of an optimal solution of
$$\min_{\mathbf{u}\in\mathbb{R}^n,\,\bm{\lambda}\in\Delta_n}\left\{\frac{1}{2}\Vert\mathbf{u-x}\Vert_2^2+\rho\sum_{i=1}^n\varphi(u_i,\lambda_i)\right\}.$$
Minimizing first with respect to $\mathbf{u}$ gives $u_i=\frac{\lambda_ix_i}{\lambda_i+2\rho}$ (see footnote 11), and the problem becomes
$$\begin{array}{ll}\min_{\bm{\lambda}} & \sum\limits_{i=1}^n\dfrac{\rho x_i^2}{\lambda_i+2\rho}\\\mathrm{s.t.} & \mathbf{e}^T\bm{\lambda}=1,\\&\bm{\lambda}\ge\mathbf{0}.\end{array}$$
Note that strong duality holds for this problem. Its Lagrangian is
$$L(\bm{\lambda};\mu)=\sum_{i=1}^n\left(\frac{\rho x_i^2}{\lambda_i+2\rho}+\lambda_i\mu\right)-\mu.$$
$\bm{\lambda}^*$ is an optimal solution if and only if there exists $\mu^*$ such that
$$\begin{aligned}\bm{\lambda}^*&\in\arg\min_{\bm{\lambda}\ge\mathbf{0}}L(\bm{\lambda};\mu^*),\\\mathbf{e}^T\bm{\lambda}^*&=1.\end{aligned}$$
Since the minimum is finite and attained, and $\mathbf{x\ne0}$, we must have $\mu^*>0$ (if $\mu^*=0$, the minimum is not attained; if $\mu^*<0$, the minimum is $-\infty$). Setting the derivative to zero yields
$$\lambda_i^*=\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+.$$
Hence $\mu^*$ must satisfy
$$\sum_{i=1}^n\left[\frac{\sqrt{\rho}|x_i|}{\sqrt{\mu^*}}-2\rho\right]_+=1.$$
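Lemma 7 suggests a simple numerical procedure: find a positive root $\mu^*$ of $\psi$ and plug the resulting $\lambda_i$ into the formula. The sketch below uses bisection, which is an assumption of this example and not part of the lemma. The bracket exploits that $\psi(\mu)=-1$ once $\mu\ge\max_i x_i^2/(4\rho)$ and that $\psi(\mu)\to\infty$ as $\mu\to0^+$ when $\mathbf{x\ne0}$.

```python
import numpy as np

def prox_sq_l1(x, rho, iters=200):
    """prox of rho * ||.||_1^2 at x via the root of psi from Lemma 7."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    if not np.any(a):
        return np.zeros_like(x)

    def lam_of(mu):
        return np.maximum(np.sqrt(rho) * a / np.sqrt(mu) - 2.0 * rho, 0.0)

    lo, hi = 0.0, np.max(a) ** 2 / (4.0 * rho)   # psi(hi) = -1 < 0
    for _ in range(iters):                       # bisection on psi
        mid = 0.5 * (lo + hi)
        if np.sum(lam_of(mid)) - 1.0 > 0:
            lo = mid
        else:
            hi = mid
    lam = lam_of(0.5 * (lo + hi))
    return lam * x / (lam + 2.0 * rho)

x = np.array([3.0, -1.0, 0.2])
u = prox_sq_l1(x, 0.1)
print(u)
```

A useful independent check is the optimality condition of the prox problem, which says that $\mathbf{u}$ is the soft thresholding of $\mathbf{x}$ with threshold $2\rho\Vert\mathbf{u}\Vert_1$.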

8.3 Orthogonal Projection onto the Set of $s$-Sparse Vectors

Let $s\in\{1,2,\ldots,n\}$ and consider the set
$$C_s=\{\mathbf{x}\in\mathbb{R}^n:\Vert\mathbf{x}\Vert_0\le s\}.$$
The set $C_s$ contains all $s$-sparse vectors, that is, the vectors with at most $s$ nonzero entries.

  • $C_s$ is not convex. For example, for $n=2$ we have $(0,1)^T,(1,0)^T\in C_1$, but $(0.5,0.5)^T=0.5(0,1)^T+0.5(1,0)^T\notin C_1$.
  • $C_s$ is closed. It is a level set of the closed function $\Vert\cdot\Vert_0$ (see Example 3 of Chapter 2).

By Theorem 2, $P_{C_s}(\mathbf{x})=\mathrm{prox}_{\delta_{C_s}}(\mathbf{x})$ is nonempty for every $\mathbf{x}$, but it is not necessarily a singleton.

To describe $P_{C_s}$ explicitly, we introduce some notation. For $\mathbf{x}\in\mathbb{R}^n$ and an index set $S\subset\{1,2,\ldots,n\}$:

  • $\mathbf{x}_S$ is the vector consisting of the components of $\mathbf{x}$ whose indices are in $S$;
  • the matrix $\mathbf{U}_S$ is the submatrix of the identity matrix consisting of the columns whose indices are in $S$;
  • the set $S^c$ is the complement of $S$ in $\{1,2,\ldots,n\}$: $S^c=\{1,2,\ldots,n\}\setminus S$;
  • $x_{\langle i\rangle}$ is the component of $\mathbf{x}$ with the $i$-th largest absolute value.

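Lemma 8 below shows that $P_{C_s}(\mathbf{x})$ consists of the vectors that keep $s$ components of $\mathbf{x}$ with the largest absolute values and set the remaining components to zero. It is precisely because $\mathbf{x}$ may have several components of equal absolute value that $P_{C_s}(\mathbf{x})$ can fail to be a singleton.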

Lemma 8 (orthogonal projection onto $C_s$). Let $s\in\{1,2,\ldots,n\}$ and $\mathbf{x}\in\mathbb{R}^n$. Then
$$P_{C_s}(\mathbf{x})=\left\{\mathbf{U}_S\mathbf{x}_S:|S|=s,\,S\subset\{1,2,\ldots,n\},\,\sum_{i\in S}|x_i|=\sum_{i=1}^s\left|x_{\langle i\rangle}\right|\right\}.$$

Proof: By the definition of $C_s$, it can be written as
$$C_s=\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}A_S,$$
where $A_S=\{\mathbf{x}\in\mathbb{R}^n:\mathbf{x}_{S^c}=\mathbf{0}\}$. Each $A_S$ is a closed convex set, so $P_{A_S}(\mathbf{x})$ can be regarded as a vector. For the finitely many closed convex sets $A_S$, we have
$$P_{C_s}(\mathbf{x})=P_{\bigcup_{S\subset\{1,2,\ldots,n\},|S|=s}A_S}(\mathbf{x})\subset\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}\{P_{A_S}(\mathbf{x})\}.$$
Indeed, any $\mathbf{y}\in P_{C_s}(\mathbf{x})$ satisfies $\mathbf{y}\in C_s=\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}A_S$, so there exists some $S\subset\{1,2,\ldots,n\}$ with $|S|=s$ such that $\mathbf{y}\in A_S$. On the one hand,
$$\mathbf{y}\in P_{C_s}(\mathbf{x})\Rightarrow\Vert\mathbf{y}-\mathbf{x}\Vert=\min_{\mathbf{u}\in C_s}\Vert\mathbf{u-x}\Vert\le\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert,$$
and on the other hand,
$$\mathbf{y}\in A_S\Rightarrow\Vert\mathbf{y-x}\Vert\ge\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert.$$
Combining the two gives
$$\Vert\mathbf{y-x}\Vert=\min_{\mathbf{u}\in A_S}\Vert\mathbf{u-x}\Vert\Rightarrow\mathbf{y}=P_{A_S}(\mathbf{x})\in\bigcup_{S\subset\{1,2,\ldots,n\},\,|S|=s}\{P_{A_S}(\mathbf{x})\}.$$
The same analysis also shows that
$$P_{C_s}(\mathbf{x})=\left\{P_{A_S}(\mathbf{x}):\Vert P_{A_S}(\mathbf{x})-\mathbf{x}\Vert=\min_{S'\subset\{1,2,\ldots,n\},\,|S'|=s}\left\Vert P_{A_{S'}}(\mathbf{x})-\mathbf{x}\right\Vert\right\}.$$
Moreover, $P_{A_S}(\mathbf{x})$ is the optimal solution of the problem
$$\min_{\mathbf{y}\in\mathbb{R}^n}\left\{\Vert\mathbf{y-x}\Vert_2^2:\mathbf{y}_{S^c}=\mathbf{0}\right\}=\min_{\mathbf{y}\in\mathbb{R}^n}\left\{\Vert\mathbf{y}_S-\mathbf{x}_S\Vert_2^2+\Vert\mathbf{x}_{S^c}\Vert_2^2:\mathbf{y}_{S^c}=\mathbf{0}\right\},$$
which is clearly $\mathbf{y}_S=\mathbf{x}_S,\,\mathbf{y}_{S^c}=\mathbf{0}$, that is, $\mathbf{y}=\mathbf{U}_S\mathbf{x}_S$, with optimal value $\Vert\mathbf{x}_{S^c}\Vert_2^2$. Hence the vectors in $P_{C_s}(\mathbf{x})$ have the form $\mathbf{U}_S\mathbf{x}_S$, where $S$ has cardinality $s$ and attains the smallest value of $\Vert\mathbf{x}_{S^c}\Vert_2^2$. This is equivalent to
$$S:|S|=s,\,S\subset\{1,2,\ldots,n\},\,\sum_{i\in S}|x_i|=\sum_{i=1}^s\left|x_{\langle i\rangle}\right|.$$

Example 31. Suppose that $n=4$. Then
$$P_{C_2}\left[(2,3,-2,1)^T\right]=\left\{(2,3,0,0)^T,(0,3,-2,0)^T\right\}.$$
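The set described in Lemma 8 can be enumerated directly for small $n$. The sketch below is a brute-force illustration with a hypothetical function name: it loops over all index sets of size $s$, so its cost grows combinatorially, and it treats sums that agree up to floating-point tolerance as ties.

```python
from itertools import combinations

import numpy as np

def project_sparse_all(x, s):
    """All elements of P_{C_s}(x): keep s entries of largest absolute value."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    best = np.sum(np.sort(a)[::-1][:s])      # sum of the s largest |x_i|
    out = []
    for S in combinations(range(x.size), s):
        idx = list(S)
        if np.isclose(np.sum(a[idx]), best):
            y = np.zeros_like(x)
            y[idx] = x[idx]                  # y = U_S x_S
            out.append(y)
    return out

for y in project_sparse_all([2, 3, -2, 1], 2):
    print(y)
```

For the vector of Example 31 this returns the two projections listed there. When a single projection suffices, keeping the $s$ largest entries after one sort is enough.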

9. Summary of Prox Computations for Special Functions

| $f(\mathbf{x})$ | $\mathrm{dom}(f)$ | $\mathrm{prox}_f(\mathbf{x})$ | Assumptions | Reference |
| --- | --- | --- | --- | --- |
| $\frac{1}{2}\mathbf{x}^T\mathbf{Ax}+\mathbf{b}^T\mathbf{x}+c$ | $\mathbb{R}^n$ | $(\mathbf{A+I})^{-1}(\mathbf{x-b})$ | $\mathbf{A}\in\mathbb{S}_+^n,\,\mathbf{b}\in\mathbb{R}^n,\,c\in\mathbb{R}$ | Section 2.3 |
| $\lambda x^3$ | $\mathbb{R}_+$ | $\frac{-1+\sqrt{1+12\lambda[x]_+}}{6\lambda}$ | $\lambda>0$ | Lemma 1 |
| $\mu x$ | $[0,\alpha]\cap\mathbb{R}$ | $\min\{\max\{x-\mu,0\},\alpha\}$ | $\mu\in\mathbb{R},\,\alpha\in[0,\infty]$ | Example 5 |
| $\lambda\Vert\mathbf{x}\Vert$ | $\mathbb{E}$ | $\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\lambda\}}\right)\mathbf{x}$ | $\Vert\cdot\Vert$ is the Euclidean norm, $\lambda>0$ | Example 8 |
| $-\lambda\Vert\mathbf{x}\Vert$ | $\mathbb{E}$ | $\begin{cases}\left(1+\frac{\lambda}{\Vert\mathbf{x}\Vert}\right)\mathbf{x}, & \mathbf{x\ne0},\\\{\mathbf{u}:\Vert\mathbf{u}\Vert=\lambda\}, & \mathbf{x=0}\end{cases}$ | $\Vert\cdot\Vert$ is the Euclidean norm, $\lambda>0$ | Example 10 |
| $\lambda\Vert\mathbf{x}\Vert_1$ | $\mathbb{R}^n$ | $\mathcal{T}_{\lambda}(\mathbf{x})=[\mathrm{abs}(\mathbf{x})-\lambda\mathbf{e}]_+\odot\mathrm{sgn}(\mathbf{x})$ | $\lambda>0$ | Example 2 |
| $\Vert\bm{\omega}\odot\mathbf{x}\Vert_1$ | $\text{Box}[-\bm{\alpha},\bm{\alpha}]$ | $\mathcal{S}_{\bm{\omega},\bm{\alpha}}(\mathbf{x})$ | $\bm{\alpha}\in[0,\infty]^n,\,\bm{\omega}\in\mathbb{R}_+^n$ | Example 12 |
| $\lambda\Vert\mathbf{x}\Vert_{\infty}$ | $\mathbb{R}^n$ | $\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_1}[\mathbf{0},1]}(\mathbf{x}/\lambda)$ | $\lambda>0$ | Example 20 |
| $\lambda\Vert\mathbf{x}\Vert_a$ | $\mathbb{E}$ | $\mathbf{x}-\lambda P_{B_{\Vert\cdot\Vert_{a,*}}[\mathbf{0},1]}(\mathbf{x}/\lambda)$ | $\Vert\cdot\Vert_a$ is an arbitrary norm, $\lambda>0$ | Example 19 |
| $\lambda\Vert\mathbf{x}\Vert_0$ | $\mathbb{R}^n$ | $\mathcal{H}_{\sqrt{2\lambda}}(x_1)\times\cdots\times\mathcal{H}_{\sqrt{2\lambda}}(x_n)$ | $\lambda>0$ | Example 4 |
| $\lambda\Vert\mathbf{x}\Vert^3$ | $\mathbb{E}$ | $\frac{2}{1+\sqrt{1+12\lambda\Vert\mathbf{x}\Vert}}\mathbf{x}$ | $\Vert\cdot\Vert$ is the Euclidean norm, $\lambda>0$ | Example 9 |
| $-\lambda\sum_{j=1}^n\log x_j$ | $\mathbb{R}_{++}^n$ | $\left(\frac{x_j+\sqrt{x_j^2+4\lambda}}{2}\right)_{j=1}^n$ | $\lambda>0$ | Example 3 |
| $\delta_C(\mathbf{x})$ | $\mathbb{E}$ | $P_C(\mathbf{x})$ | $\emptyset\ne C\subset\mathbb{E}$ | Theorem 9 |
| $\lambda\sigma_C(\mathbf{x})$ | $\mathbb{E}$ | $\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda)$ | $\lambda>0$, $C\ne\emptyset$ closed and convex | Theorem 19 |
| $\lambda\max\{x_i\}$ | $\mathbb{R}^n$ | $\mathbf{x}-\lambda P_{\Delta_n}(\mathbf{x}/\lambda)$ | $\lambda>0$ | Example 21 |
| $\lambda\sum_{i=1}^kx_{[i]}$ | $\mathbb{R}^n$ | $\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda),\,C=H_{\mathbf{e},k}\cap\text{Box}[\mathbf{0},\mathbf{e}]$ | $\lambda>0$ | Example 22 |
| $\lambda\sum_{i=1}^k\mathrm{abs}\left(x_{\langle i\rangle}\right)$ | $\mathbb{R}^n$ | $\mathbf{x}-\lambda P_C(\mathbf{x}/\lambda),\,C=B_{\Vert\cdot\Vert_1}[\mathbf{0},k]\cap\text{Box}[-\mathbf{e},\mathbf{e}]$ | $\lambda>0$ | Example 23 |
| $\lambda M_f^{\mu}(\mathbf{x})$ | $\mathbb{E}$ | $\mathbf{x}+\frac{\lambda}{\mu+\lambda}\left(\mathrm{prox}_{(\mu+\lambda)f}(\mathbf{x})-\mathbf{x}\right)$ | $\lambda,\,\mu>0$, $f$ proper closed convex | Corollary 4 |
| $\lambda d_C(\mathbf{x})$ | $\mathbb{E}$ | $\mathbf{x}+\min\left\{\frac{\lambda}{d_C(\mathbf{x})},1\right\}(P_C(\mathbf{x})-\mathbf{x})$ | $\emptyset\ne C$ closed and convex, $\lambda>0$ | Lemma 3 |
| $\frac{\lambda}{2}d_C^2(\mathbf{x})$ | $\mathbb{E}$ | $\frac{\lambda}{\lambda+1}P_C(\mathbf{x})+\frac{1}{\lambda+1}\mathbf{x}$ | $\emptyset\ne C$ closed and convex, $\lambda>0$ | Example 29 |
| $\lambda H_{\mu}(\mathbf{x})$ | $\mathbb{E}$ | $\left(1-\frac{\lambda}{\max\{\Vert\mathbf{x}\Vert,\mu+\lambda\}}\right)\mathbf{x}$ | $\lambda,\,\mu>0$ | Example 30 |
ρ ∥ x ∥ 1 2 \rho\Vert\mathbf{x}\Vert_1^2 ρx12 R n \mathbb{R}^n Rn ( v i x i v i + 2 ρ ) i = 1 n ,   v = [ ρ μ a b s ( x ) − 2 ρ ] + ,   e T v = 1 \left(\frac{v_ix_i}{v_i+2\rho}\right)_{i=1}^n,\,\mathbf{v}=\left[\sqrt{\frac{\rho}{\mu}}\mathrm{abs}(\mathbf{x})-2\rho\right]_+,\,\mathbf{e}^T\mathbf{v}=1 (vi+2ρvixi)i=1n,v=[μρ abs(x)2ρ]+,eTv=1 ρ > 0 \rho>0 ρ>0引理7
λ ∥ A x ∥ 2 \lambda\Vert\mathbf{Ax}\Vert_2 λAx2 R n \mathbb{R}^n Rn x − A T ( A A T + α ∗ I ) − 1 A x ,   α ∗ = 0 ,   若 ∥ v 0 ∥ ≤ λ ; 否 则 ,   ∥ v α ∗ ∥ 2 = λ ;   v α ≡ ( A A T + α I ) − 1 A x \mathbf{x}-\mathbf{A}^T\left(\mathbf{AA}^T+\alpha^*\mathbf{I}\right)^{-1}\mathbf{Ax},\,\alpha^*=0,\,若\Vert\mathbf{v}_0\Vert\le\lambda; 否则,\,\Vert\mathbf{v}_{\alpha^*}\Vert_2=\lambda;\,\mathbf{v}_{\alpha}\equiv\left(\mathbf{AA}^T+\alpha\mathbf{I}\right)^{-1}\mathbf{Ax} xAT(AAT+αI)1Ax,α=0,v0λ;,vα2=λ;vα(AAT+αI)1Ax A ∈ R m × n \mathbf{A}\in\mathbb{R}^{m\times n} ARm×n行满秩, λ > 0 \lambda>0 λ>0引理5
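
Several closed-form entries of the table translate almost line by line into code. The following is a minimal sketch in pure Python, assuming vectors are plain lists of floats and the norm is Euclidean; the function names (`soft_threshold`, `hard_threshold`, `prox_neg_log`, `prox_cubed_norm`, `prox_huber`) are illustrative choices and do not come from the book. For $\lambda\Vert\mathbf{x}\Vert_0$ the prox is set-valued at $|x_i|=\sqrt{2\lambda}$; the sketch returns $0$ there, which is only one of the two minimizers.

```python
import math


def norm2(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def soft_threshold(x, lam):
    """prox of lam*||x||_1: [abs(x) - lam*e]_+ (.) sgn(x), elementwise."""
    out = []
    for xi in x:
        m = abs(xi) - lam
        out.append(math.copysign(m, xi) if m > 0 else 0.0)
    return out


def hard_threshold(x, lam):
    """One selection from prox of lam*||x||_0 (threshold sqrt(2*lam)).

    At a tie |x_i| == sqrt(2*lam) both 0 and x_i are minimizers; 0 is returned.
    """
    t = math.sqrt(2.0 * lam)
    return [xi if abs(xi) > t else 0.0 for xi in x]


def prox_neg_log(x, lam):
    """prox of -lam*sum(log x_j): (x_j + sqrt(x_j^2 + 4*lam)) / 2."""
    return [(xi + math.sqrt(xi * xi + 4.0 * lam)) / 2.0 for xi in x]


def prox_cubed_norm(x, lam):
    """prox of lam*||x||^3: 2 / (1 + sqrt(1 + 12*lam*||x||)) * x."""
    c = 2.0 / (1.0 + math.sqrt(1.0 + 12.0 * lam * norm2(x)))
    return [c * xi for xi in x]


def prox_huber(x, lam, mu):
    """prox of lam*H_mu: (1 - lam / max(||x||, mu + lam)) * x."""
    c = 1.0 - lam / max(norm2(x), mu + lam)
    return [c * xi for xi in x]
```

For instance, `soft_threshold([3.0, -0.5, -2.0], 1.0)` returns `[2.0, 0.0, -1.0]`: entries with magnitude at most $\lambda$ are set to zero and the others are shrunk toward zero by $\lambda$.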

  1. The word "proximal" is commonly abbreviated as "prox".

  2. If $f:\mathbb{R}^n\to\mathbb{R}$ is a proper closed convex separable function, $f(\mathbf{x})=\sum_{i=1}^nf_i(x_i)$, where each $f_i$ is a proper closed convex function of one variable, then by the first prox theorem the conclusion of Theorem 3 can be written as $\mathrm{prox}_f(\mathbf{x})=\left(\mathrm{prox}_{f_i}(x_i)\right)_{i=1}^n$.

  3. The existence and uniqueness of $\tilde{\mathbf{z}},\tilde{\mathbf{u}}$ follow from the fact that $g$ is a proper closed convex function.

  4. The equivalence follows from $\mathrm{dom}(g)\subset[0,\infty)$.

  5. It is unique because $g$ is proper closed and convex.

  6. The projection onto the box $\text{Box}[\bm{\ell},\mathbf{u}]$ can be computed componentwise using Lemma 2, and the equation $\mathbf{a}^TP_{\text{Box}}(\mathbf{x}-\mu\mathbf{a})=b$ can be solved with a simple root-finding method such as bisection. This works because $\varphi(\mu)=\mathbf{a}^TP_{\text{Box}}(\mathbf{x}-\mu\mathbf{a})-b$ is monotone. Indeed, $\varphi(\mu)=\sum_{i=1}^na_i\min\{\max\{x_i-\mu a_i,\ell_i\},u_i\}-b$, and for every $i$ the map $\mu\mapsto a_i\min\{\max\{x_i-\mu a_i,\ell_i\},u_i\}$ is nonincreasing.

  7. The theorem assumes that $f$ is closed, but this does not imply that $\mathrm{dom}(f)$ is a closed set, in which case $P_{\mathrm{dom}(f)}(\mathbf{x})$ need not exist. See Example 16 for a counterexample.

  8. Here $\theta$ is used only when $d_C(\mathbf{x})>\lambda>0$, so $\theta$ is well defined.

  9. This should not be confused with the smoothness parameter $L$ from Chapter 5.

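  10. The definition of the Moreau envelope in effect adds a strongly convex term $\frac{1}{2\mu}\Vert\mathbf{x}-\mathbf{u}\Vert^2$ to the original function $f$. Theorem 22 further shows that the larger $\mu$ is, the smaller the smoothness parameter of $M_f^{\mu}$ becomes, and the smaller the role the strongly convex term plays in the optimization problem. This is why $\mu$ is called the smoothing parameter.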

  11. Note that if $\lambda_i=0$, this formula gives $u_i=0$, so it also covers the points of discontinuity.
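
The root-finding procedure described in footnote 6 can be sketched as follows. This is a minimal illustration in pure Python, assuming finite box bounds and a nonempty set $H_{\mathbf{a},b}\cap\text{Box}[\bm{\ell},\mathbf{u}]$; the names `project_box`, `project_hyperplane_box` and `prox_sum_top_k`, as well as the tolerance and iteration cap, are hypothetical choices made here. Because $\varphi$ is nonincreasing, the bracket is kept with $\varphi(\text{left})\ge0\ge\varphi(\text{right})$, and the result is only approximate up to the bisection tolerance.

```python
def project_box(x, lo, hi):
    """Componentwise projection onto Box[lo, hi]."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def project_hyperplane_box(x, a, b, lo, hi, tol=1e-12, max_iter=200):
    """Approximate projection onto {y : a^T y = b} intersected with Box[lo, hi].

    Bisection on phi(mu) = a^T P_Box(x - mu*a) - b, which is nonincreasing.
    Assumes finite bounds and a nonempty intersection.
    """
    def shifted(mu):
        return project_box([xi - mu * ai for xi, ai in zip(x, a)], lo, hi)

    def phi(mu):
        return sum(ai * pi for ai, pi in zip(a, shifted(mu))) - b

    # Expand the bracket until phi(left) >= 0 >= phi(right).
    left, right = -1.0, 1.0
    for _ in range(max_iter):
        if phi(left) >= 0.0 and phi(right) <= 0.0:
            break
        if phi(left) < 0.0:
            left *= 2.0
        if phi(right) > 0.0:
            right *= 2.0
    else:
        raise ValueError("could not bracket a root; the set may be empty")

    # Plain bisection on the bracket.
    for _ in range(max_iter):
        if right - left <= tol:
            break
        mid = 0.5 * (left + right)
        if phi(mid) > 0.0:
            left = mid
        else:
            right = mid
    return shifted(0.5 * (left + right))


def prox_sum_top_k(x, lam, k):
    """prox of lam * (sum of the k largest entries): x - lam * P_C(x / lam),
    with C = H_{e,k} intersected with Box[0, e], as in the table row."""
    n = len(x)
    p = project_hyperplane_box([xi / lam for xi in x], [1.0] * n, float(k),
                               [0.0] * n, [1.0] * n)
    return [xi - lam * pi for xi, pi in zip(x, p)]
```

With $k=1$ the set $C$ is the unit simplex, so `prox_sum_top_k` then approximates the prox of $\lambda\max\{x_i\}$ from the table.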
