Notes on Proximal Algorithms

Nightmare004

Last modified: 2024-01-06 16:15:50

Category: Mathematics. Tags: algorithms, proximal algorithms

First published: 2022-05-14 17:27:17

Article link: https://blog.csdn.net/qq_39942341/article/details/124753874

Closed functions

Let $f:\mathbb{E}\to \left[-\infty,\infty\right]$.
If the epigraph of $f$ is closed, then $f$ is called a closed function.

Proper functions

Let $f:\mathbb{E}\to \left[-\infty,\infty\right]$.
If $f\left(\mathbf{x}\right)>-\infty$ for all $\mathbf{x}\in \mathbb{E}$
and there exists $\mathbf{x}\in\mathbb{E}$ with $f\left(\mathbf{x}\right)<+\infty$,
then $f$ is called a proper function.

Conjugate functions

Let $f:\mathbb{E}\to \left[-\infty,\infty\right]$. The conjugate function $f^*:\mathbb{E}^*\to\left[-\infty,\infty\right]$ of $f$ is defined by
$f^*\left(\mathbf{y}\right)=\max_{\mathbf{x}\in\mathbb{E}}\left\{\left\langle\mathbf{y},\mathbf{x}\right\rangle -f\left(\mathbf{x}\right)\right\}\quad \mathbf{y}\in\mathbb{E}^*$
If $f:\mathbb{E}\to \left(-\infty,\infty\right]$, then the conjugate $f^*$ is closed and convex, whether or not $f$ itself is convex.

Let $g\left(\mathbf{x}\right)=\alpha f\left(\mathbf{x}\right)$ with $\alpha>0$.
Then $g^*\left(\mathbf{y}\right)=\alpha f^*\left(\frac{\mathbf{y}}{\alpha}\right)$

Proof (the third step factors out $\alpha$, which requires $\alpha>0$):
$\begin{aligned} g^*\left(\mathbf{y}\right)&=\max_{\mathbf{x}}\left\{\left\langle \mathbf{y},\mathbf{x} \right\rangle -g\left(\mathbf{x}\right)\right\}\\ &=\max_{\mathbf{x}}\left\{\left\langle \mathbf{y},\mathbf{x} \right\rangle -\alpha f\left(\mathbf{x}\right)\right\}\\ &=\alpha\max_{\mathbf{x}}\left\{\left\langle \frac{\mathbf{y}}{\alpha},\mathbf{x} \right\rangle - f\left(\mathbf{x}\right)\right\}\\ &=\alpha f^*\left(\frac{\mathbf{y}}{\alpha}\right) \end{aligned}$
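
As a quick sanity check of this scaling rule, consider the one-dimensional quadratic $f\left(x\right)=\frac{1}{2}x^2$ (an example chosen here for illustration), whose conjugate is $f^*\left(y\right)=\frac{1}{2}y^2$. For $g=\alpha f$ with $\alpha>0$:
$\begin{aligned} g^*\left(y\right)&=\max_{x}\left\{yx-\frac{\alpha}{2}x^2\right\}=\frac{y^2}{2\alpha}\quad\text{(attained at } x=\frac{y}{\alpha}\text{)}\\ \alpha f^*\left(\frac{y}{\alpha}\right)&=\alpha\cdot\frac{1}{2}\left(\frac{y}{\alpha}\right)^2=\frac{y^2}{2\alpha} \end{aligned}$
Both expressions agree, as the rule predicts.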

Fenchel's inequality

Let $f$ be a proper function. Then for all $\mathbf{x}\in\mathbb{E},\mathbf{y}\in\mathbb{E}^*$
$f\left(\mathbf{x}\right)+f^*\left(\mathbf{y}\right)\ge \left\langle\mathbf{y},\mathbf{x}\right\rangle$

Biconjugate functions

Let $f:\mathbb{E}\to \left[-\infty,\infty\right]$. The biconjugate $f^{**}$ of $f$ is
$f^{**}\left(\mathbf{x}\right)=\max_{\mathbf{y}\in\mathbb{E}^*}\left\{\left\langle\mathbf{y},\mathbf{x}\right\rangle-f^*\left(\mathbf{y}\right)\right\},\quad\mathbf{x}\in\mathbb{E}$
By Fenchel's inequality, for every $\mathbf{y}\in\mathbb{E}^*$
$f\left(\mathbf{x}\right)\ge \left\langle\mathbf{y},\mathbf{x}\right\rangle-f^*\left(\mathbf{y}\right)\Rightarrow f\left(\mathbf{x}\right)\ge \max_{\mathbf{y}\in\mathbb{E}^*}\left\{\left\langle\mathbf{y},\mathbf{x}\right\rangle-f^*\left(\mathbf{y}\right)\right\}=f^{**}\left(\mathbf{x}\right)$
Hence $f^{**}\le f$
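
The inequality $f^{**}\le f$ can be observed numerically. The sketch below is only an illustration under stated assumptions: the double-well test function, the grid, and the helper name `conjugate_on_grid` are all hypothetical choices, and the maximum in the conjugate is taken over grid points only, so the result is a discrete approximation of $f^*$ and $f^{**}$.

```python
def conjugate_on_grid(values, grid):
    """Discrete conjugate: for each y in grid, max over grid points x of y*x - f(x)."""
    return [max(y * x - fx for x, fx in zip(grid, values)) for y in grid]

# Hypothetical nonconvex test function f(x) = (x^2 - 1)^2 sampled on [-3, 3].
grid = [i / 10 for i in range(-30, 31)]
f_vals = [(x * x - 1.0) ** 2 for x in grid]

f_conj = conjugate_on_grid(f_vals, grid)      # approximates f*
f_biconj = conjugate_on_grid(f_conj, grid)    # approximates f**

# Largest value of f** - f over the grid: non-positive up to rounding.
worst_violation = max(fb - fv for fb, fv in zip(f_biconj, f_vals))
# At x = 0 (index 30) the function is not convex, so f** lies strictly below f.
gap_at_zero = f_vals[30] - f_biconj[30]
print(worst_violation, gap_at_zero)
```

Because the test function is not convex, the gap at $x=0$ is strictly positive, which is consistent with equality $f^{**}=f$ needing convexity.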

Theorem 1

Let $f:\mathbb{E}\to \left(-\infty,\infty\right]$ be a proper closed convex function. Then $f^{**}=f$
Proof:
Omitted for now.

Conjugate subgradient theorem

Let $f:\mathbb{E}\to \left(-\infty,\infty\right]$ be a proper convex function. Then for any $\mathbf{x}\in \mathbb{E},\mathbf{y}\in\mathbb{E}^*$ the following two statements are equivalent:

(i) $\left\langle\mathbf{x},\mathbf{y} \right\rangle = f\left(\mathbf{x}\right)+f^*\left(\mathbf{y}\right)$
(ii) $\mathbf{y}\in \partial f\left(\mathbf{x}\right)$
If $f$ is also closed, then (i) and (ii) are equivalent to
(iii) $\mathbf{x}\in \partial f^*\left(\mathbf{y}\right)$

Proof:
By the definition of the subdifferential, $\mathbf{y}\in \partial f\left(\mathbf{x}\right)$ if and only if
$f\left(\mathbf{z}\right)\ge f\left(\mathbf{x}\right)+\left\langle \mathbf{y},\mathbf{z}-\mathbf{x}\right\rangle\quad \forall \mathbf{z}\in\mathbb{E}$
that is,
$\left\langle \mathbf{y},\mathbf{x}\right\rangle - f\left(\mathbf{x}\right)\ge \left\langle \mathbf{y},\mathbf{z}\right\rangle-f\left(\mathbf{z}\right) \quad \forall \mathbf{z}\in\mathbb{E}$
Taking the maximum over $\mathbf{z}$ on the right-hand side gives
$\left\langle\mathbf{x},\mathbf{y} \right\rangle - f\left(\mathbf{x}\right)\ge f^*\left(\mathbf{y}\right)$
Fenchel's inequality gives the reverse inequality, so this is equivalent to
$\left\langle\mathbf{x},\mathbf{y} \right\rangle = f\left(\mathbf{x}\right)+f^*\left(\mathbf{y}\right)$
which shows that (i) and (ii) are equivalent.
If $f$ is closed, then $f^{**}=f$ by Theorem 1.
Let $g=f^*$, so that $g^*=f$ and (i) reads
$\left\langle\mathbf{x},\mathbf{y} \right\rangle = g\left(\mathbf{y}\right)+g^*\left(\mathbf{x}\right)$
Applying the equivalence of (i) and (ii) to $g$ gives
$\mathbf{x}\in \partial g\left(\mathbf{y}\right)=\partial f^* \left(\mathbf{y}\right)$
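
A small worked example may help here; the choice $f\left(x\right)=\left|x\right|$ on $\mathbb{R}$ is made for illustration only. Its conjugate is the indicator of $\left[-1,1\right]$, and at $x=0$ the three statements read:
$\begin{aligned} &\text{(i)}\quad 0\cdot y=\left|0\right|+f^*\left(y\right)\Longleftrightarrow f^*\left(y\right)=0\Longleftrightarrow \left|y\right|\le 1\\ &\text{(ii)}\quad y\in\partial f\left(0\right)=\left[-1,1\right]\\ &\text{(iii)}\quad 0\in\partial f^*\left(y\right),\ \text{which holds exactly when } y \text{ minimizes } f^*,\ \text{i.e. } \left|y\right|\le 1 \end{aligned}$
All three conditions single out the same set $\left|y\right|\le 1$.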

Proximal operator

Assumptions:
1. $f:\mathbb{R}^n\to\mathbb{R}\cup\left\{\infty\right\}$ is a closed, proper, convex function.
2. $\operatorname{dom} f=\left\{\mathbf{x}: f\left(\mathbf{x}\right)<\infty\right\}$ is the set on which $f$ takes finite values.

Definition:
$\operatorname{prox}_f:\mathbb{R}^n\to\mathbb{R}^n$ is defined by
$\operatorname{prox}_f\left(\mathbf{v}\right)=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(f\left(\mathbf{x}\right)+\frac{1}{2}\|\mathbf{x}-\mathbf{v}\|^2\right)$

For a scaling parameter $\lambda>0$, applying the definition to $\lambda f$ and dividing the objective by $\lambda$ gives
$\operatorname{prox}_{\lambda f}\left(\mathbf{v}\right)=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(f\left(\mathbf{x}\right)+\frac{1}{2\lambda}\|\mathbf{x}-\mathbf{v}\|^2\right)$
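
When no closed form is known, the definition can be evaluated directly in one dimension. The following is a minimal sketch, not a recommended solver: the helper name `prox_1d`, the search bracket, and the iteration count are assumptions made for illustration, and ternary search relies on the objective being convex.

```python
def prox_1d(f, v, lam=1.0, lo=-1e3, hi=1e3, iters=200):
    """Approximate prox_{lam f}(v) for a convex f: R -> R by ternary search.

    Assumes the minimizer lies in [lo, hi].
    """
    def objective(x):
        return f(x) + (x - v) ** 2 / (2.0 * lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

# Check against the closed form v / (1 + 2*lam) for f(x) = x^2.
approx = prox_1d(lambda x: x * x, v=3.0, lam=0.5)
exact = 3.0 / (1.0 + 2.0 * 0.5)
print(approx, exact)
```

The closed form used for the check comes from setting the derivative $2x+\frac{x-v}{\lambda}$ to zero.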

Second prox theorem

Let $f:\mathbb{E}\to \left(-\infty,\infty\right]$ be a proper closed convex function. Then the following three statements are equivalent:
(i) $\mathbf{u}=\operatorname{prox}_f\left(\mathbf{x}\right)$
(ii) $\mathbf{x}-\mathbf{u}\in\partial f\left(\mathbf{u}\right)$
(iii) $\left\langle \mathbf{x}-\mathbf{u},\mathbf{y}-\mathbf{u}\right\rangle \le f\left(\mathbf{y}\right)-f\left(\mathbf{u}\right)\quad \forall \mathbf{y}\in\mathbb{E}$

Proof:
$\operatorname{prox}_f\left(\mathbf{x}\right)=\arg\min_{\mathbf{v}\in\mathbb{E}}\left(f\left(\mathbf{v}\right)+\frac{1}{2}\|\mathbf{v}-\mathbf{x}\|^2\right)$
By the first-order optimality condition, $\mathbf{u}$ is the minimizer if and only if $\mathbf{0}\in\partial f\left(\mathbf{u}\right)+\mathbf{u}-\mathbf{x}$,
that is, $\mathbf{x}-\mathbf{u}\in\partial f\left(\mathbf{u}\right)$.
By the definition of the subgradient, this is exactly (iii).

Lemma

Let $g:\mathbb{E}\to\left(-\infty,\infty\right]$ be a proper function, let $\lambda\neq 0$, and define
$f\left(\mathbf{x}\right)=\lambda g\left(\frac{\mathbf{x}}{\lambda}\right)$
Then
$\operatorname{prox}_f\left(\mathbf{x}\right)=\lambda\operatorname{prox}_{\frac{g}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right)$
Proof (substitute $\mathbf{u}=\lambda\mathbf{z}$ in the third step, then divide the objective by $\lambda^2$):
$\begin{aligned} \operatorname{prox}_f\left(\mathbf{x}\right)&=\arg\min_{\mathbf{u}}\left\{f\left(\mathbf{u}\right)+\frac{1}{2}\|\mathbf{u}-\mathbf{x}\|^2\right\}\\ &=\arg\min_{\mathbf{u}}\left\{\lambda g\left(\frac{\mathbf{u}}{\lambda}\right)+\frac{1}{2}\|\mathbf{u}-\mathbf{x}\|^2\right\}\\ &=\lambda \arg\min_{\mathbf{z}}\left\{\lambda g\left(\mathbf{z}\right)+\frac{1}{2}\|\lambda\mathbf{z}-\mathbf{x}\|^2\right\}\\ &=\lambda \arg\min_{\mathbf{z}}\left\{\frac{g\left(\mathbf{z}\right)}{\lambda}+\frac{1}{2}\|\mathbf{z}-\frac{\mathbf{x}}{\lambda}\|^2\right\}\\ &=\lambda\operatorname{prox}_{\frac{g}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right) \end{aligned}$

Moreau Decomposition

Let $f:\mathbb{E}\to \left(-\infty,\infty\right]$ be a proper closed convex function. Then
$\mathbf{x}=\operatorname{prox}_{f}(\mathbf{x})+\operatorname{prox}_{f^{*}}(\mathbf{x})$
where $f^*$ is the conjugate of $f$.
Proof:
Let $\mathbf{u}=\operatorname{prox}_{f}(\mathbf{x})$. Using the second prox theorem for $f$, the conjugate subgradient theorem, and then the second prox theorem for $f^*$:
$\begin{aligned} &\Longleftrightarrow \mathbf{x}-\mathbf{u} \in \partial f(\mathbf{u}) \\ &\Longleftrightarrow \mathbf{u} \in \partial f^{*}(\mathbf{x}-\mathbf{u}) \\ &\Longleftrightarrow \mathbf{x}-(\mathbf{x}-\mathbf{u}) \in \partial f^{*}(\mathbf{x}-\mathbf{u}) \\ &\Longleftrightarrow \mathbf{x}-\mathbf{u}=\operatorname{prox}_{f^{*}}(\mathbf{x}) \\ &\Longleftrightarrow \mathbf{x}=\mathbf{u}+\operatorname{prox}_{f^{*}}(\mathbf{x})=\operatorname{prox}_{f}(\mathbf{x})+\operatorname{prox}_{f^{*}}(\mathbf{x}) \end{aligned}$

Extended version: for $\lambda>0$,
$\mathbf{x}=\operatorname{prox}_{\lambda f}(\mathbf{x})+\lambda \operatorname{prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda)$

Proof (apply the decomposition to $\lambda f$, then use $\left(\lambda f\right)^*=\lambda f^*\left(\frac{\cdot}{\lambda}\right)$ and the lemma above):
$\begin{aligned} \mathbf{x}&=\operatorname{prox}_{\lambda f}(\mathbf{x})+\operatorname{prox}_{\left(\lambda f\right)^{*}}(\mathbf{x})\\ &=\operatorname{prox}_{\lambda f}(\mathbf{x})+\operatorname{prox}_{\lambda f^*\left(\frac{\cdot}{\lambda}\right)}(\mathbf{x})\\ &=\operatorname{prox}_{\lambda f}(\mathbf{x})+\lambda \operatorname{prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda) \end{aligned}$
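
The extended decomposition can be verified by hand on a simple case. The quadratic $f\left(\mathbf{x}\right)=\frac{1}{2}\|\mathbf{x}\|^2$ is an illustrative choice; it satisfies $f^*=f$, and setting the gradient of each prox objective to zero gives
$\begin{aligned} \operatorname{prox}_{\lambda f}\left(\mathbf{x}\right)&=\arg\min_{\mathbf{u}}\left\{\frac{1}{2}\|\mathbf{u}\|^2+\frac{1}{2\lambda}\|\mathbf{u}-\mathbf{x}\|^2\right\}=\frac{\mathbf{x}}{1+\lambda}\\ \operatorname{prox}_{\frac{f^*}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right)&=\arg\min_{\mathbf{u}}\left\{\frac{1}{2\lambda}\|\mathbf{u}\|^2+\frac{1}{2}\|\mathbf{u}-\frac{\mathbf{x}}{\lambda}\|^2\right\}=\frac{\mathbf{x}}{1+\lambda}\\ \operatorname{prox}_{\lambda f}\left(\mathbf{x}\right)+\lambda\operatorname{prox}_{\frac{f^*}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right)&=\frac{\mathbf{x}}{1+\lambda}+\frac{\lambda\mathbf{x}}{1+\lambda}=\mathbf{x} \end{aligned}$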

Examples

Projection

Consider a closed convex set $C\neq \emptyset$.
$I_C$ is the indicator function of $C$: $I_C\left(\mathbf{x}\right)=0$ if $\mathbf{x}\in C$, and $\infty$ otherwise.
$\operatorname{prox}_{I_C}\left(\mathbf{x}\right)=\arg\min_{\mathbf{y}}\left(I_C\left(\mathbf{y}\right)+\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2\right)=\arg\min_{\mathbf{y}\in C}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2=P_C\left(\mathbf{x}\right)$
In other words, the proximal operator of an indicator function is the Euclidean projection onto $C$.
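
For a concrete set, the projection is often available in closed form. The sketch below takes $C$ to be a box $\left[l,u\right]^n$, which is an assumption made for illustration (the text above covers any nonempty closed convex set); the function names are hypothetical.

```python
def project_box(x, lower, upper):
    """Euclidean projection of x onto the box {y : lower <= y_i <= upper}."""
    return [min(max(xi, lower), upper) for xi in x]

def indicator_box(x, lower, upper):
    """Indicator function of the box: 0 inside, +infinity outside."""
    return 0.0 if all(lower <= xi <= upper for xi in x) else float("inf")

x = [2.5, -0.3, -4.0]
p = project_box(x, -1.0, 1.0)
print(p)  # [1.0, -0.3, -1.0]
```

Coordinates already inside the interval are left unchanged, and the others are clipped to the nearest endpoint.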

1-norm

$\begin{aligned} &\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(\|\mathbf{x}\|_1+\frac{1}{2\lambda}\|\mathbf{x}-\mathbf{v}\|^2\right)\\ =&\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(\lambda\|\mathbf{x}\|_1+\frac{1}{2}\|\mathbf{x}-\mathbf{v}\|^2\right)\\ =&\arg\min_{\mathbf{x}\in\mathbb{R}^n}\sum_{i=1}^{n}\left(\lambda \left|x_i\right|+\frac{1}{2}\left(x_i-v_i\right)^2\right) \end{aligned}$
The objective is separable, so the $i$-th coordinate of the minimizer is $\arg\min_{x_i\in\mathbb{R}}\left(\lambda \left|x_i\right|+\frac{1}{2}\left(x_i-v_i\right)^2\right)$ and each coordinate can be handled on its own.

Let $g\left(x\right)=\lambda\left|x\right|+\frac{1}{2}\left(x-v\right)^2\left(\lambda>0,x\in\mathbb{R}\right)$, so that $\operatorname{prox}_{\lambda\left|\cdot\right|}\left(v\right)=\arg\min_x g\left(x\right)$.
$g\left(x\right)=\begin{cases} \lambda x+\frac{1}{2}\left(x-v\right)^2,&x>0\\ -\lambda x+\frac{1}{2}\left(x-v\right)^2,&x\le 0\\ \end{cases}$
$g'\left(x\right)=\begin{cases} \lambda+x-v,&x>0\\ -\lambda+x-v,&x<0\\ \end{cases}$

$\lambda+x-v=0\Rightarrow x=v-\lambda$
When $v>\lambda$:
$\forall x\in \left(-\infty,v-\lambda\right)\backslash\left\{0\right\},g'\left(x\right)<0$
$\forall x\in\left(v-\lambda,+\infty\right),g'\left(x\right)>0$
so $\operatorname{prox}_{\lambda \left|\cdot\right|}\left(v\right)=v-\lambda$

Similarly, when $v<-\lambda$:
$\operatorname{prox}_{\lambda \left|\cdot\right|}\left(v\right)=v+\lambda$

When $\left|v\right|\le\lambda$:
$\forall x\in\left(0,+\infty\right),g'\left(x\right)>0$
$\forall x\in\left(-\infty,0\right),g'\left(x\right)<0$
so $\operatorname{prox}_{\lambda \left|\cdot\right|}\left(v\right)=0$

$\operatorname{prox}_{\lambda \left|\cdot\right|}\left(v\right)=\begin{cases} v-\lambda,&v>\lambda\\ 0,&\left|v\right|\le\lambda\\ v+\lambda,&v<-\lambda\\ \end{cases}=\operatorname{sign}\left(v\right)\left[\left|v\right|-\lambda\right]_+=v-P_{\left[-\lambda,\lambda\right]}v$
where $P_{\left[-\lambda,\lambda\right]}$ denotes projection onto $\left[-\lambda,\lambda\right]$.

Therefore, for $f=\|\cdot\|_1$, with $\operatorname{sign}$, $\left|\cdot\right|$ and $\left[\cdot\right]_+$ applied componentwise,
$\operatorname{prox}_{\lambda f}\left(\mathbf{v}\right)=\operatorname{sign}\left(\mathbf{v}\right)\left[\left|\mathbf{v}\right|-\lambda\right]_+=\mathbf{v}-P_{\left[-\lambda,\lambda\right]^n}\mathbf{v}$
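
The componentwise formula, usually called soft thresholding, translates directly into code. This is a minimal sketch in plain Python; the function names `soft_threshold` and `via_projection` are hypothetical, and the second function only restates the result as $\mathbf{v}-P_{\left[-\lambda,\lambda\right]^n}\mathbf{v}$.

```python
def soft_threshold(v, lam):
    """prox of lam * ||.||_1: shrink each coordinate toward zero by lam."""
    def shrink(vi):
        if vi > lam:
            return vi - lam
        if vi < -lam:
            return vi + lam
        return 0.0
    return [shrink(vi) for vi in v]

def via_projection(v, lam):
    """Same operator written as v minus its projection onto [-lam, lam]^n."""
    return [vi - min(max(vi, -lam), lam) for vi in v]

v = [3.0, -0.5, 0.2, -2.0]
print(soft_threshold(v, 1.0))  # [2.0, 0.0, 0.0, -1.0]
print(via_projection(v, 1.0))  # [2.0, 0.0, 0.0, -1.0]
```

Coordinates with $\left|v_i\right|\le\lambda$ are set exactly to zero, which is why this operator produces sparse iterates.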

Example 3

$f\left(x\right)=\begin{cases} \mu x,&x\ge 0\\ \infty,&x<0 \end{cases}$
$\operatorname{prox}_{\lambda f}\left(v\right)=\left[v-\lambda \mu\right]_+$
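
The formula in Example 3 follows from a short one-dimensional calculation, sketched here for $\lambda>0$. Since $f=\infty$ for $x<0$, the minimization is restricted to $x\ge 0$:
$\begin{aligned} \operatorname{prox}_{\lambda f}\left(v\right)&=\arg\min_{x\ge 0}\left(\lambda\mu x+\frac{1}{2}\left(x-v\right)^2\right)\\ \frac{d}{dx}\left(\lambda\mu x+\frac{1}{2}\left(x-v\right)^2\right)&=\lambda\mu+x-v=0\Rightarrow x=v-\lambda\mu \end{aligned}$
If $v>\lambda\mu$, this stationary point is feasible and is the minimizer. Otherwise the derivative is nonnegative on $\left[0,+\infty\right)$, so the minimizer is $x=0$. Both cases together give $\left[v-\lambda\mu\right]_+$.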

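p-norm

Let $f=\|\cdot\|_p$ and let $\|\cdot\|_*$ denote the dual norm of $\|\cdot\|_p$.
(The dual norm of the $p$-norm is the $q$-norm with $\frac{1}{p}+\frac{1}{q}=1$; if $p = 1$, then $q=\infty$.)
The conjugate of a norm is the indicator function of the unit ball of its dual norm, that is,
$f^*\left(\mathbf{y}\right)=\begin{cases} 0,&\|\mathbf{y}\|_*\le 1\\ +\infty,&\|\mathbf{y}\|_*>1 \end{cases}$
By the Moreau decomposition,
$\mathbf{x}=\operatorname{prox}_{\lambda f}(\mathbf{x})+\lambda \operatorname{prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda)$
Since $\lambda>0$ and $f^*$ only takes the values $0$ and $+\infty$,
$\frac{1}{\lambda}f^*\left(\mathbf{y}\right)=f^*\left(\mathbf{y}\right)$
so $\operatorname{prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda)$ is the projection of $\mathbf{x}/\lambda$ onto the unit ball of the dual norm.
Scaling the ball and the point by $\lambda$ gives
$\operatorname{prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda)=P_{\|\mathbf{y}\|_*\le 1}\left(\frac{\mathbf{x}}{\lambda}\right)=\frac{1}{\lambda}P_{\|\mathbf{y}\|_*\le \lambda}\left(\mathbf{x}\right)$
Therefore
$\operatorname{prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-P_{\|\mathbf{y}\|_*\le \lambda}\left(\mathbf{x}\right)$
For $p=1$ this recovers $\mathbf{x}-P_{\left[-\lambda,\lambda\right]^n}\mathbf{x}$ from the 1-norm example.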
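
As an illustration, take $p=2$, where the dual norm is again the 2-norm and the projection onto a ball has a simple closed form. By the Moreau decomposition, $\operatorname{prox}_{\lambda\|\cdot\|_2}\left(\mathbf{x}\right)=\mathbf{x}-P_{\|\mathbf{y}\|_2\le\lambda}\left(\mathbf{x}\right)$. The sketch below assumes this special case; the function names are hypothetical.

```python
import math

def project_l2_ball(x, radius):
    """Euclidean projection of x onto the ball {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]

def prox_l2_norm(x, lam):
    """prox of lam * ||.||_2, computed as x minus its projection onto the lam-ball."""
    p = project_l2_ball(x, lam)
    return [xi - pi for xi, pi in zip(x, p)]

result = prox_l2_norm([3.0, 4.0], 1.0)   # ||x||_2 = 5, so x is scaled by 1 - 1/5
print(result)
```

Points with $\|\mathbf{x}\|_2\le\lambda$ are mapped to the origin, and all others are shrunk toward it by a distance $\lambda$.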

Properties

1. Minimizers are exactly the fixed points.
Let $\lambda>0$.
$\mathbf{x}^*$ is a minimizer of $f$ if and only if $\operatorname{prox}_{\lambda f}\left(\mathbf{x}^*\right)=\mathbf{x}^*$
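
One way to see this is to apply the second prox theorem to the function $\lambda f$, assuming $f$ is proper, closed and convex:
$\begin{aligned} \mathbf{x}^*=\operatorname{prox}_{\lambda f}\left(\mathbf{x}^*\right)&\Longleftrightarrow \mathbf{x}^*-\mathbf{x}^*\in\partial\left(\lambda f\right)\left(\mathbf{x}^*\right)=\lambda\partial f\left(\mathbf{x}^*\right)\\ &\Longleftrightarrow \mathbf{0}\in\partial f\left(\mathbf{x}^*\right)\\ &\Longleftrightarrow \mathbf{x}^* \text{ minimizes } f \end{aligned}$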

PPA

Proximal-point algorithm
$\mathbf{x}^{k+1}=\operatorname{prox}_{\lambda f}\left(\mathbf{x}^k\right)$
Convergence:
For a strongly convex function, $\operatorname{prox}_{\lambda f}$ is a contraction,
that is,
$\exists C_0<1$ such that
$\|\operatorname{prox}_{\lambda f}\left(\mathbf{x}\right)-\operatorname{prox}_{\lambda f}\left(\mathbf{y}\right)\|\le C_0\|\mathbf{x}-\mathbf{y}\|$
Since the minimizer $\mathbf{x}^*$ is a fixed point of $\operatorname{prox}_{\lambda f}$, it follows that
$\|\mathbf{x}^{k+1}-\mathbf{x}^*\|=\|\operatorname{prox}_{\lambda f}\left(\mathbf{x}^{k}\right)-\operatorname{prox}_{\lambda f}\left(\mathbf{x}^*\right)\|\le C_0\|\mathbf{x}^k-\mathbf{x}^*\|\le C_0^{k+1}\|\mathbf{x}^0-\mathbf{x}^*\|$

For a general convex function, $\operatorname{prox}_{\lambda f}$ is (firmly) nonexpansive:
$\|\operatorname{prox}_{\lambda f}\left(\mathbf{x}\right)-\operatorname{prox}_{\lambda f}\left(\mathbf{y}\right)\|^2\le\|\mathbf{x}-\mathbf{y}\|^2-\|\left(\mathbf{x}-\operatorname{prox}_{\lambda f}\left(\mathbf{x}\right)\right)-\left(\mathbf{y}-\operatorname{prox}_{\lambda f}\left(\mathbf{y}\right)\right)\|^2$
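
To make the contraction concrete, the sketch below runs the proximal-point iteration on the strongly convex scalar function $f\left(x\right)=\frac{a}{2}x^2$ with $a>0$. This function, the parameter values, and the function names are illustrative assumptions; for this $f$ the prox has the closed form $\frac{v}{1+\lambda a}$, so the contraction factor is $C_0=\frac{1}{1+\lambda a}$ and the minimizer is $x^*=0$.

```python
def prox_quadratic(v, lam, a):
    """prox_{lam f}(v) for f(x) = (a/2) x^2 with a > 0: closed form v / (1 + lam*a)."""
    return v / (1.0 + lam * a)

def proximal_point(x0, lam, a, num_iters):
    """Iterate x^{k+1} = prox_{lam f}(x^k) and return all iterates."""
    xs = [x0]
    for _ in range(num_iters):
        xs.append(prox_quadratic(xs[-1], lam, a))
    return xs

# With lam = a = 1 the contraction factor is C_0 = 1/2.
iterates = proximal_point(x0=10.0, lam=1.0, a=1.0, num_iters=20)
print(iterates[-1])
```

Each step halves the distance to the minimizer, matching the bound $\|\mathbf{x}^{k}-\mathbf{x}^*\|\le C_0^{k}\|\mathbf{x}^0-\mathbf{x}^*\|$.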

Proximal gradient method

Let
$f\left(\mathbf{x}\right)=g\left(\mathbf{x}\right)+h\left(\mathbf{x}\right)$
where $g$ is convex and differentiable with $\operatorname{dom}\left(g\right)=\mathbb{R}^n$,
and $h$ is convex but not necessarily differentiable.

If $f$ is differentiable, gradient descent can be applied directly.

Gradient descent can be read as minimizing a second-order Taylor model of $f$ around $\mathbf{x}$ in which higher-order terms are dropped and the Hessian is replaced by $\frac{1}{t}\mathbf{I}$,
that is,
$\mathbf{x}^+=\arg\min_{\mathbf{z}} \left(f\left(\mathbf{x}\right)+\nabla f\left(\mathbf{x}\right)^T\left(\mathbf{z}-\mathbf{x}\right)+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2\right)$

For the composite problem, apply the same Taylor approximation to $g$ only and leave $h$ unchanged. Terms that do not depend on $\mathbf{z}$ can be dropped or added without changing the minimizer, which allows completing the square:

$\begin{aligned} \mathbf{x}^+&=\arg\min_{\mathbf{z}} \left(g\left(\mathbf{x}\right)+\nabla g\left(\mathbf{x}\right)^T\left(\mathbf{z}-\mathbf{x}\right)+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h\left(\mathbf{z}\right)\right)\\ &=\arg\min_{\mathbf{z}} \left(\nabla g\left(\mathbf{x}\right)^T\left(\mathbf{z}-\mathbf{x}\right)+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h\left(\mathbf{z}\right)\right)\\ &=\arg\min_{\mathbf{z}} \left(\frac{t}{2}\|\nabla g\left(\mathbf{x}\right)\|^2+\nabla g\left(\mathbf{x}\right)^T\left(\mathbf{z}-\mathbf{x}\right)+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h\left(\mathbf{z}\right)\right)\\ &=\arg\min_{\mathbf{z}} \left(\frac{1}{2t}\|\mathbf{z}-\left(\mathbf{x}-t\nabla g\left(\mathbf{x}\right)\right)\|^2+h\left(\mathbf{z}\right)\right)\\ &=\operatorname{prox}_{t h}\left(\mathbf{x}-t\nabla g\left(\mathbf{x}\right)\right) \end{aligned}$

In effect, each iteration takes one gradient descent step on $g$ and then applies the proximal operator of $h$.

Steps:
Choose an initial point $\mathbf{x}^{(0)}$,
then repeat
$\mathbf{x}^{(k+1)}=\operatorname{prox}_{t h}\left(\mathbf{x}^{(k)}-t\nabla g\left(\mathbf{x}^{(k)}\right)\right)$

Special cases:
$h = 0$: gradient descent
$h=I_C$: projected gradient descent
$g = 0$: the proximal-point algorithm (PPA)
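
The steps above can be sketched on a small lasso-type problem, $g\left(\mathbf{x}\right)=\frac{1}{2}\|A\mathbf{x}-\mathbf{b}\|^2$ and $h\left(\mathbf{x}\right)=\mu\|\mathbf{x}\|_1$. The problem data, the function names, and the step size $t=1/\|A\|_F^2$ are assumptions made for illustration; the step size is a conservative choice that does not exceed $1/L$, where $L$ is the largest eigenvalue of $A^TA$. The prox of $t h$ is soft thresholding with threshold $t\mu$.

```python
import math

def soft_threshold(v, tau):
    """prox of tau * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def residual(A, b, x):
    """Compute A x - b for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def grad_g(A, b, x):
    """Gradient of g(x) = 0.5 * ||A x - b||^2, namely A^T (A x - b)."""
    r = residual(A, b, x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def objective(A, b, x, mu):
    """f(x) = 0.5 * ||A x - b||^2 + mu * ||x||_1."""
    r = residual(A, b, x)
    return 0.5 * sum(ri * ri for ri in r) + mu * sum(abs(xj) for xj in x)

def proximal_gradient(A, b, mu, t, num_iters):
    """Repeat x <- prox_{t h}(x - t * grad g(x)), starting from zero."""
    x = [0.0] * len(A[0])
    history = [objective(A, b, x, mu)]
    for _ in range(num_iters):
        g = grad_g(A, b, x)
        x = soft_threshold([xj - t * gj for xj, gj in zip(x, g)], t * mu)
        history.append(objective(A, b, x, mu))
    return x, history

# Hypothetical toy data.
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [1.0, 0.0, 1.0]
t = 1.0 / sum(aij * aij for row in A for aij in row)   # 1 / ||A||_F^2 <= 1 / L
x_hat, history = proximal_gradient(A, b, mu=0.5, t=t, num_iters=500)
print(x_hat, history[-1])
```

With a step size of at most $1/L$ the objective values in `history` should not increase from one iteration to the next.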

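Convergence

Assume $g$ is convex and its gradient is Lipschitz continuous with constant $L > 0$,
and $h$ is convex. Below, $f^*$ denotes the optimal value of $f$ (not the conjugate) and $\mathbf{x}^*$ a minimizer.

With a fixed step size $t\le\frac{1}{L}$:
$f\left(\mathbf{x}^{(k)}\right)-f^*\le\frac{\|\mathbf{x}^{(0)}-\mathbf{x}^*\|^2}{2tk}$
With backtracking line search:
$f\left(\mathbf{x}^{(k)}\right)-f^*\le\frac{\|\mathbf{x}^{(0)}-\mathbf{x}^*\|^2}{2t_{min}k}$
$t_{min}=\min\left\{1,\frac{\beta}{L}\right\}$
where $\beta$ is the shrink factor of the backtracking line search.

In both cases the convergence rate is $O\left(\frac{1}{k}\right)$.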

Acceleration

$\mathbf{v}=\mathbf{x}^{(k-1)}+\frac{k-2}{k+1}\left(\mathbf{x}^{(k-1)}-\mathbf{x}^{(k-2)}\right)$
$\mathbf{x}^{(k)}=\operatorname{prox}_{t h}\left(\mathbf{v}-t\nabla g\left(\mathbf{v}\right)\right)$
The extrapolation step makes this resemble the momentum method.
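
A minimal sketch of the accelerated iteration follows, on a separable problem where the answer is known in closed form: $g\left(\mathbf{x}\right)=\sum_i\frac{d_i}{2}\left(x_i-c_i\right)^2$ with $d_i>0$ and $h\left(\mathbf{x}\right)=\mu\|\mathbf{x}\|_1$. The problem, its data, the convention $\mathbf{x}^{(-1)}=\mathbf{x}^{(0)}$, and the function names are assumptions made for illustration. Here $L=\max_i d_i$, and the exact minimizer is obtained by soft thresholding $c_i$ with threshold $\mu/d_i$.

```python
import math

def shrink(v, tau):
    """Scalar soft thresholding: prox of tau * |.| at v."""
    return math.copysign(max(abs(v) - tau, 0.0), v)

def accelerated_proximal_gradient(d, c, mu, num_iters):
    """Minimize sum_i 0.5*d_i*(x_i - c_i)^2 + mu*||x||_1 with weight (k-2)/(k+1)."""
    t = 1.0 / max(d)                 # fixed step 1/L with L = max_i d_i
    x_prev = [0.0] * len(d)          # plays the role of x^(k-2)
    x = list(x_prev)                 # plays the role of x^(k-1)
    for k in range(1, num_iters + 1):
        w = (k - 2) / (k + 1)
        # Extrapolated point v, then one proximal gradient step taken from v.
        v = [xi + w * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        grad = [di * (vi - ci) for di, vi, ci in zip(d, v, c)]
        x_next = [shrink(vi - t * gi, t * mu) for vi, gi in zip(v, grad)]
        x_prev, x = x, x_next
    return x

# Hypothetical toy data.
d = [1.0, 2.0, 4.0]
c = [2.0, -1.0, 0.1]
mu = 0.5
x_acc = accelerated_proximal_gradient(d, c, mu, num_iters=2000)
x_closed = [shrink(ci, mu / di) for ci, di in zip(c, d)]
print(x_acc, x_closed)
```

Unlike the unaccelerated method, the objective values of this scheme need not decrease at every iteration, so the check here compares the final iterate with the closed-form minimizer.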