Convex Optimization Study Notes 2


Nightmare004


Article link: https://blog.csdn.net/qq_39942341/article/details/121882316


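This post studies the gradient projection method for constrained optimization, specifically for minimizing a continuously differentiable function over a closed convex set. It first defines stationary points and proves that every local minimum is a stationary point. It then shows that, for a continuously differentiable convex function, the stationary points are exactly the global optima. Next it introduces the orthogonal projection operator and the sufficient decrease lemma, and proves the decrease property of the backtracking step-size rule. Finally it discusses the sparsity-constrained problem and the IHT method, proving convergence properties of the sequence generated by IHT and its relation to stationary points.

(Summary generated automatically by CSDN.)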

The problem this time is
$\begin{array}{lll} \text { (P) } & \min & f(\mathbf{x}) \\ & \text { s.t. } & \mathbf{x} \in C \end{array}$
where $f$ is a continuously differentiable function defined on a closed convex set $C$.

Stationarity

Definition of a stationary point

Let $f$ be a continuously differentiable function defined on a closed convex set $C$.
A point $\mathbf{x}^*\in C$ is called a stationary point if $\forall \mathbf{x}\in C,\ \nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)\ge 0$.
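To make the definition concrete, here is a minimal Python/NumPy sketch that assumes $C$ is a box $[\mathbf{l},\mathbf{u}]$ (an illustrative choice, not part of the post). Over a box, $\min_{\mathbf{y}\in C}\nabla f(\mathbf{x}^*)^T(\mathbf{y}-\mathbf{x}^*)$ splits into one-dimensional minima attained at $l_i$ or $u_i$, and $\mathbf{x}^*$ is stationary exactly when this minimum is $0$ (it is never positive, since $\mathbf{y}=\mathbf{x}^*$ gives $0$). The helper name `stationarity_gap` and the test function $f(\mathbf{x})=\frac12\|\mathbf{x}-\mathbf{c}\|^2$ are hypothetical.

```python
import numpy as np

def stationarity_gap(g, x, lo, hi):
    """Return min_{y in [lo, hi]} g^T (y - x) for a point x in the box.

    A linear function is minimized over a box coordinate by coordinate, at
    either lo_i or hi_i.  The value is never positive (y = x gives 0), and
    x is a stationary point exactly when it equals 0."""
    return float(np.sum(np.minimum(g * (lo - x), g * (hi - x))))

# Toy problem (assumed): f(x) = 0.5 * ||x - c||^2 on C = [-1, 1]^3, grad f(x) = x - c
c = np.array([2.0, -3.0, 0.5])
lo, hi = -np.ones(3), np.ones(3)
x_star = np.clip(c, lo, hi)                          # (1, -1, 0.5), the minimizer over C
x_bad = np.zeros(3)
print(stationarity_gap(x_star - c, x_star, lo, hi))  # zero gap: stationary
print(stationarity_gap(x_bad - c, x_bad, lo, hi))    # -5.5: not stationary
```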

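Theorem 1

Let $f$ be a continuously differentiable function defined on a closed convex set $C$.
If $\mathbf{x}^*$ is a local minimum of (P), then $\mathbf{x}^*$ is a stationary point of (P).

Proof:
Suppose $\mathbf{x}^*$ is not a stationary point. Then
$\exists \mathbf{x} \in C,\ \nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)<0$
Let $\mathbf{d}=\mathbf{x}-\mathbf{x}^*$.
Then the directional derivative is $f'(\mathbf{x}^*;\mathbf{d})=\nabla f(\mathbf{x}^*)^T\mathbf{d}<0$,
so $\exists \epsilon>0,\ \forall t\in (0,\epsilon),\ f(\mathbf{x}^*+t\mathbf{d})<f(\mathbf{x}^*)$.
Since $C$ is convex, for every $t\in(0,1]$
$\mathbf{x}^*+t\mathbf{d}=(1-t)\mathbf{x}^*+t\mathbf{x}\in C$.
Hence for $t\in(0,\min\{\epsilon,1\})$ the points $\mathbf{x}^*+t\mathbf{d}$ are feasible, arbitrarily close to $\mathbf{x}^*$, and strictly better, so $\mathbf{x}^*$ is not a local minimum, a contradiction.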

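Theorem 2

Let $f$ be a continuously differentiable convex function defined on a closed convex set $C$.
Then $\mathbf{x}^*$ is a stationary point if and only if $\mathbf{x}^*$ is an optimal solution of (P).

Proof:
Optimal implies stationary: an optimal solution is in particular a local minimum, so this follows from Theorem 1.
Stationary implies optimal: let $\mathbf{x}^*$ be a stationary point of (P) and $\mathbf{x}\in C$. By the first-order characterization of convexity,
$f(\mathbf{x})\ge f(\mathbf{x}^*)+\nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)\ge f(\mathbf{x}^*)$
so $\mathbf{x}^*$ is an optimal solution.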

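Orthogonal projection operator

For a nonempty closed convex set $C$, the orthogonal projection $P_C(\mathbf{x})=\operatorname{argmin}_{\mathbf{y}\in C}\|\mathbf{y}-\mathbf{x}\|^2$ exists and is unique.

Second projection theorem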

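Let $C$ be a closed convex set and $\mathbf{x}\in \mathbb{R}^n$.
Then $\mathbf{z}=P_C(\mathbf{x})$ if and only if $\mathbf{z}\in C$ and
$(\mathbf{x}-\mathbf{z})^T(\mathbf{y}-\mathbf{z})\le 0,\ \forall \mathbf{y}\in C$

Proof:
$\mathbf{z}=P_C(\mathbf{x})$ if and only if it is the optimal solution of
$\begin{array}{ll} \min & g(\mathbf{y}) \equiv\|\mathbf{y}-\mathbf{x}\|^{2} \\ \text { s.t. } & \mathbf{y} \in C . \end{array}$
Since $g$ is convex and continuously differentiable, by Theorem 2 this holds if and only if $\mathbf{z}$ is a stationary point,
that is,
$\nabla g(\mathbf{z})^T(\mathbf{y}-\mathbf{z})\ge 0,\ \forall \mathbf{y}\in C$
Since $\nabla g(\mathbf{z})=2(\mathbf{z}-\mathbf{x})$, this is equivalent to
$(\mathbf{x}-\mathbf{z})^T(\mathbf{y}-\mathbf{z})\le 0,\ \forall \mathbf{y}\in C$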
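A quick numerical sanity check of the second projection theorem, again assuming for illustration that $C$ is a box, on which $P_C$ is coordinate-wise clipping (`np.clip`). The inequality should hold for every sampled $\mathbf{y}\in C$ when $\mathbf{z}=P_C(\mathbf{x})$, and fail for some $\mathbf{y}$ when $\mathbf{z}$ is a feasible point that is not the projection. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
lo, hi = -np.ones(n), np.ones(n)            # box C = [-1, 1]^n (assumed)

def project_box(x):
    # Orthogonal projection onto the box: clip each coordinate
    return np.clip(x, lo, hi)

x = 3.0 * rng.standard_normal(n)            # a point typically outside C
z = project_box(x)
ys = rng.uniform(lo, hi, size=(1000, n))    # random points of C

# (x - z)^T (y - z) <= 0 for every sampled y in C
vals = (ys - z) @ (x - z)
print(vals.max() <= 1e-12)                  # True

# A feasible point that is not the projection violates the inequality for some y
z_wrong = np.zeros(n)
print(((ys - z_wrong) @ (x - z_wrong)).max() > 0)
```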

Theorem 3

Let $C$ be a closed convex set. Then
1. for all $\mathbf{v},\mathbf{w}\in\mathbb{R}^n$,
$\left(P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right)^{T}(\mathbf{v}-\mathbf{w}) \geq\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\|^{2}$
2. for all $\mathbf{v},\mathbf{w}\in\mathbb{R}^n$ (nonexpansiveness),
$\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\| \leq\|\mathbf{v}-\mathbf{w}\|$
Proof:
1.
By the second projection theorem (applied at $\mathbf{v}$ with $\mathbf{y}=P_C(\mathbf{w})$, and at $\mathbf{w}$ with $\mathbf{y}=P_C(\mathbf{v})$),
$\left(\mathbf{v}-P_{C}(\mathbf{v})\right)^{T}\left(P_{C}(\mathbf{w})-P_{C}(\mathbf{v})\right) \leq 0$
$\left(\mathbf{w}-P_{C}(\mathbf{w})\right)^{T}\left(P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right) \leq 0$
Adding the two gives
$\left(P_{C}(\mathbf{w})-P_{C}(\mathbf{v})\right)^{T}\left(\mathbf{v}-\mathbf{w}+P_{C}(\mathbf{w})-P_{C}(\mathbf{v})\right) \leq 0$
that is, $\left(P_{C}(\mathbf{w})-P_{C}(\mathbf{v})\right)^{T}(\mathbf{v}-\mathbf{w})+\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\|^{2}\le 0$, or
$\left(P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right)^{T}(\mathbf{v}-\mathbf{w}) \geq\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\|^{2}$
2.
If $P_{C}(\mathbf{v})=P_{C}(\mathbf{w})$, the claim is trivial.
If $P_{C}(\mathbf{v})\neq P_{C}(\mathbf{w})$, then by the Cauchy-Schwarz inequality
$\left(P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right)^{T}(\mathbf{v}-\mathbf{w}) \leq\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\| \cdot\|\mathbf{v}-\mathbf{w}\|,$
so, combined with part 1,
$\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\|^{2} \leq\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\| \cdot\|\mathbf{v}-\mathbf{w}\|$
and dividing by $\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\|>0$,
$\left\|P_{C}(\mathbf{v})-P_{C}(\mathbf{w})\right\| \leq\|\mathbf{v}-\mathbf{w}\|$
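Both parts of Theorem 3 can be spot-checked numerically. This sketch assumes $C$ is the closed Euclidean ball of radius $r$ (an illustrative choice), whose projection is $P_C(\mathbf{x})=\mathbf{x}$ if $\|\mathbf{x}\|\le r$ and $r\mathbf{x}/\|\mathbf{x}\|$ otherwise; `project_ball` is a hypothetical helper.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Projection onto the closed Euclidean ball {y : ||y|| <= r}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

rng = np.random.default_rng(1)
ok1 = ok2 = True
for _ in range(1000):
    v, w = 3 * rng.standard_normal(4), 3 * rng.standard_normal(4)
    dp = project_ball(v) - project_ball(w)
    # Part 1: (P(v) - P(w))^T (v - w) >= ||P(v) - P(w)||^2
    ok1 &= dp @ (v - w) >= dp @ dp - 1e-12
    # Part 2: ||P(v) - P(w)|| <= ||v - w||
    ok2 &= np.linalg.norm(dp) <= np.linalg.norm(v - w) + 1e-12
print(ok1, ok2)  # True True
```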

Theorem 4

Let $f$ be a continuously differentiable function defined on a closed convex set $C$, and let $s > 0$. Then $\mathbf{x}^*\in C$ is a stationary point of (P)
if and only if
$\mathbf{x}^{*}=P_{C}\left(\mathbf{x}^{*}-s \nabla f\left(\mathbf{x}^{*}\right)\right)$
Proof:
By the second projection theorem, $\mathbf{x}^{*}=P_{C}\left(\mathbf{x}^{*}-s \nabla f\left(\mathbf{x}^{*}\right)\right)$
if and only if
$\left(\mathbf{x}^{*}-s \nabla f\left(\mathbf{x}^{*}\right)-\mathbf{x}^{*}\right)^{T}\left(\mathbf{x}-\mathbf{x}^{*}\right) \leq 0,\ \forall \mathbf{x}\in C$
i.e., $-s\nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)\le 0$; dividing by $s>0$,
$\nabla f\left(\mathbf{x}^{*}\right)^{T}\left(\mathbf{x}-\mathbf{x}^{*}\right) \geq 0, \ \forall \mathbf{x}\in C$
which is exactly the definition of a stationary point.
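Theorem 4 turns stationarity into a fixed-point test that can be evaluated for any $s>0$. A minimal sketch on an assumed toy problem, $f(\mathbf{x})=\frac12\|\mathbf{x}-\mathbf{c}\|^2$ over the box $[-1,1]^3$, whose minimizer is $\mathbf{c}$ clipped to the box; `is_fixed_point` is a hypothetical helper name:

```python
import numpy as np

# Toy problem (assumed): f(x) = 0.5*||x - c||^2 over the box C = [-1, 1]^3
c = np.array([2.0, -3.0, 0.5])
lo, hi = -np.ones(3), np.ones(3)

def grad(x):
    return x - c

def is_fixed_point(x, s, tol=1e-12):
    # Theorem 4: x is stationary  <=>  x = P_C(x - s * grad f(x)), for any s > 0
    return bool(np.linalg.norm(x - np.clip(x - s * grad(x), lo, hi)) <= tol)

x_star = np.clip(c, lo, hi)   # (1, -1, 0.5), the minimizer over the box
print([is_fixed_point(x_star, s) for s in (0.1, 1.0, 10.0)])       # [True, True, True]
print([is_fixed_point(np.zeros(3), s) for s in (0.1, 1.0, 10.0)])  # [False, False, False]
```

Since the answer does not depend on $s$, any convenient $s>0$ can be used in practice.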

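Gradient projection method

Pick an arbitrary initial point $\mathbf{x}_0\in C$. For $k=0,1,2,\cdots$:
a) choose a step size $t_k$ by a line search;
b) $\mathbf{x}_{k+1}=P_{C}\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right)$;
c) if $\|\mathbf{x}_k-\mathbf{x}_{k+1}\|\le \epsilon$, stop and output $\mathbf{x}_{k+1}$.

This is essentially the ordinary gradient method; the only difference is that a plain gradient step may leave $C$, so the gradient projection method projects the result back onto $C$.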
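The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the post's implementation: the line search in step a) is replaced by a constant step $t_k=1/L$, $C$ is assumed to be a box so that $P_C$ is `np.clip`, and `gradient_projection` is a hypothetical name. For the toy quadratic below one can check by hand that the constrained minimizer is $(1,0)$: $\partial f/\partial x_2=x_1+2x_2+1>0$ on the box forces $x_2=0$, and then $\partial f/\partial x_1=3x_1-4<0$ on $[0,1]$ pushes $x_1$ to its upper bound.

```python
import numpy as np

def gradient_projection(grad, project, x0, step, eps=1e-8, max_iter=10_000):
    """Gradient projection with a constant step size (a simple stand-in for
    the line search of step a).  Stops when ||x_k - x_{k+1}|| <= eps."""
    x = project(x0)                            # make sure x_0 lies in C
    for _ in range(max_iter):
        x_new = project(x - step * grad(x))    # step b)
        if np.linalg.norm(x - x_new) <= eps:   # step c)
            return x_new
        x = x_new
    return x

# Toy problem (assumed): f(x) = 0.5 x^T Q x - b^T x on the box [0, 1]^2
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, -1.0])
grad = lambda x: Q @ x - b
project = lambda x: np.clip(x, 0.0, 1.0)
L = np.linalg.eigvalsh(Q).max()               # Lipschitz constant of grad f
x_hat = gradient_projection(grad, project, np.zeros(2), step=1.0 / L)
print(x_hat)                                  # [1. 0.]
```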

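Sufficient decrease lemma for constrained problems

Let $f\in C_{L}^{1,1}(C)$,
where $C$ is a closed convex set.
Then for all $\mathbf{x}\in C$ and $t\in(0,\frac{2}{L})$,
$f(\mathbf{x})-f\left(P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right) \geq t\left(1-\frac{L t}{2}\right)\left\|\frac{1}{t}\left(\mathbf{x}-P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right)\right\|^{2}$
Proof:
Let $\mathbf{x}^{+}=P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))$.
By the descent lemma,
$f\left(\mathbf{x}^{+}\right) \leq f(\mathbf{x})+\left\langle\nabla f(\mathbf{x}), \mathbf{x}^{+}-\mathbf{x}\right\rangle+\frac{L}{2}\left\|\mathbf{x}-\mathbf{x}^{+}\right\|^{2}$
By the second projection theorem (with $\mathbf{y}=\mathbf{x}\in C$),
$\begin{aligned} \left\langle\mathbf{x}-t \nabla f(\mathbf{x})-\mathbf{x}^{+}, \mathbf{x}-\mathbf{x}^{+}\right\rangle &\leq 0\\ \left\langle\nabla f(\mathbf{x}), \mathbf{x}^{+}-\mathbf{x}\right\rangle &\leq-\frac{1}{t}\left\|\mathbf{x}^{+}-\mathbf{x}\right\|^{2} \end{aligned}$
Combining the two,
$f\left(\mathbf{x}^{+}\right) \leq f(\mathbf{x})-\left(\frac{1}{t}-\frac{L}{2}\right)\left\|\mathbf{x}-\mathbf{x}^{+}\right\|^{2}$
and since $\left(\frac{1}{t}-\frac{L}{2}\right)\left\|\mathbf{x}-\mathbf{x}^{+}\right\|^{2}=t\left(1-\frac{L t}{2}\right)\left\|\frac{1}{t}\left(\mathbf{x}-\mathbf{x}^{+}\right)\right\|^{2}$,
$f(\mathbf{x})-f\left(P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right) \geq t\left(1-\frac{L t}{2}\right)\left\|\frac{1}{t}\left(\mathbf{x}-P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right)\right\|^{2}$
(The argument does not use $t<\frac{2}{L}$; that restriction only guarantees that the right-hand side is nonnegative.)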

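Definition of the gradient mapping

$G_{M}(\mathbf{x})=M\left[\mathbf{x}-P_{C}\left(\mathbf{x}-\frac{1}{M} \nabla f(\mathbf{x})\right)\right]$
where $M > 0$. By Theorem 4, for $\mathbf{x}\in C$ we have $G_M(\mathbf{x})=\mathbf{0}$ if and only if $\mathbf{x}$ is a stationary point of (P); when $C=\mathbb{R}^n$, $G_M(\mathbf{x})=\nabla f(\mathbf{x})$.

Since $\frac{1}{t}\left(\mathbf{x}-P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right)=G_{\frac{1}{t}}(\mathbf{x})$, the sufficient decrease lemma can also be written as
$f(\mathbf{x})-f\left(P_{C}(\mathbf{x}-t \nabla f(\mathbf{x}))\right) \geq t\left(1-\frac{L t}{2}\right)\left\|G_{\frac{1}{t}}(\mathbf{x})\right\|^{2}$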
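A small numerical check of the gradient mapping and of the rewritten sufficient decrease inequality. The instance is assumed for illustration only: a strongly convex quadratic $f(\mathbf{x})=\frac12\mathbf{x}^TQ\mathbf{x}-\mathbf{b}^T\mathbf{x}$ over the box $[-1,1]^3$, with $L=\lambda_{\max}(Q)$; `grad_mapping` is a hypothetical helper implementing $G_M$.

```python
import numpy as np

def grad_mapping(grad, project, x, M):
    # G_M(x) = M * [x - P_C(x - grad f(x) / M)]
    return M * (x - project(x - grad(x) / M))

# Toy instance (assumed): f(x) = 0.5 x^T Q x - b^T x on the box [-1, 1]^3
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
Q = A.T @ A + np.eye(3)
b = 5 * rng.standard_normal(3)
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
project = lambda x: np.clip(x, -1.0, 1.0)
L = np.linalg.eigvalsh(Q).max()

worst = np.inf
for _ in range(200):
    x = rng.uniform(-1, 1, 3)                  # x in C
    t = rng.uniform(0, 2 / L)                  # t in (0, 2/L)
    x_plus = project(x - t * grad(x))
    lhs = f(x) - f(x_plus)
    rhs = t * (1 - L * t / 2) * np.sum(grad_mapping(grad, project, x, 1 / t) ** 2)
    worst = min(worst, lhs - rhs)
print(worst >= -1e-9)   # True: the inequality held in every trial
```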

Theorem 6

Let $f$ be a continuously differentiable function on $\mathbb{R}^n$ and $C$ a closed convex set.
Let $L_1\ge L_2>0$. Then for all $\mathbf{x}\in \mathbb{R}^n$
$\left\|G_{L_{1}}(\mathbf{x})\right\| \geq\left\|G_{L_{2}}(\mathbf{x})\right\|$
and
$\frac{\left\|G_{L_{1}}(\mathbf{x})\right\|}{L_{1}} \leq \frac{\left\|G_{L_{2}}(\mathbf{x})\right\|}{L_{2}}$

Proof: By the second projection theorem,
for all $\mathbf{v}\in \mathbb{R}^n$ and $\mathbf{w}\in C$,
$\left\langle\mathbf{v}-P_{C}(\mathbf{v}), P_{C}(\mathbf{v})-\mathbf{w}\right\rangle \geq 0$
Take $\mathbf{v}=\mathbf{x}-\frac{1}{L_{1}} \nabla f(\mathbf{x})$ and $\mathbf{w}=P_{C}\left(\mathbf{x}-\frac{1}{L_{2}} \nabla f(\mathbf{x})\right)$.
Then
$\left\langle\mathbf{x}-\frac{1}{L_{1}} \nabla f(\mathbf{x})-P_{C}\left(\mathbf{x}-\frac{1}{L_{1}} \nabla f(\mathbf{x})\right), P_{C}\left(\mathbf{x}-\frac{1}{L_{1}} \nabla f(\mathbf{x})\right)-P_{C}\left(\mathbf{x}-\frac{1}{L_{2}} \nabla f(\mathbf{x})\right)\right\rangle \geq 0$
or, since $P_{C}\left(\mathbf{x}-\frac{1}{L_{i}} \nabla f(\mathbf{x})\right)=\mathbf{x}-\frac{1}{L_i}G_{L_i}(\mathbf{x})$,
$\left\langle\frac{1}{L_{1}} G_{L_{1}}(\mathbf{x})-\frac{1}{L_{1}} \nabla f(\mathbf{x}), \frac{1}{L_{2}} G_{L_{2}}(\mathbf{x})-\frac{1}{L_{1}} G_{L_{1}}(\mathbf{x})\right\rangle \geq 0 \qquad (1)$
Swapping the roles of $L_1$ and $L_2$,
$\left\langle\frac{1}{L_{2}} G_{L_{2}}(\mathbf{x})-\frac{1}{L_{2}} \nabla f(\mathbf{x}), \frac{1}{L_{1}} G_{L_{1}}(\mathbf{x})-\frac{1}{L_{2}} G_{L_{2}}(\mathbf{x})\right\rangle \geq 0 \qquad (2)$
Computing $L_1\cdot(1)+L_2\cdot(2)$ gives
$\left\langle G_{L_{1}}(\mathbf{x})-G_{L_{2}}(\mathbf{x}), \frac{1}{L_{2}} G_{L_{2}}(\mathbf{x})-\frac{1}{L_{1}} G_{L_{1}}(\mathbf{x})\right\rangle \geq 0,$
and expanding,
$\frac{1}{L_{1}}\left\|G_{L_{1}}(\mathbf{x})\right\|^{2}+\frac{1}{L_{2}}\left\|G_{L_{2}}(\mathbf{x})\right\|^{2} \leq\left(\frac{1}{L_{1}}+\frac{1}{L_{2}}\right) G_{L_{1}}(\mathbf{x})^{T} G_{L_{2}}(\mathbf{x})$
By the Cauchy-Schwarz inequality,
$\frac{1}{L_{1}}\left\|G_{L_{1}}(\mathbf{x})\right\|^{2}+\frac{1}{L_{2}}\left\|G_{L_{2}}(\mathbf{x})\right\|^{2} \leq\left(\frac{1}{L_{1}}+\frac{1}{L_{2}}\right)\left\|G_{L_{1}}(\mathbf{x})\right\| \cdot\left\|G_{L_{2}}(\mathbf{x})\right\|$
If $G_{L_2}(\mathbf{x})=\mathbf{0}$, this inequality forces $G_{L_1}(\mathbf{x})=\mathbf{0}$ as well, so both claims hold.
If $G_{L_2}(\mathbf{x})\neq \mathbf{0}$,
let $t=\frac{\left\|G_{L_{1}}(\mathbf{x})\right\|}{\left\|G_{L_{2}}(\mathbf{x})\right\|}$.
Dividing by $\left\|G_{L_{2}}(\mathbf{x})\right\|^2$ gives
$\begin{aligned} \frac{1}{L_1}t^2-\left(\frac{1}{L_1}+\frac{1}{L_2}\right)t+\frac{1}{L_2}&\le 0\\ (t-1)\left(\frac{1}{L_1}t-\frac{1}{L_2}\right)&\le 0\\ 1\le t&\le\frac{L_1}{L_2} \end{aligned}$
where the last step uses $\frac{L_1}{L_2}\ge 1$.
Therefore
$\left\|G_{L_{2}}(\mathbf{x})\right\| \leq\left\|G_{L_{1}}(\mathbf{x})\right\| \leq \frac{L_{1}}{L_{2}}\left\|G_{L_{2}}(\mathbf{x})\right\|,$
which gives both inequalities (the second after dividing by $L_1$).
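Theorem 6 says that $M\mapsto\|G_M(\mathbf{x})\|$ is nondecreasing while $M\mapsto\|G_M(\mathbf{x})\|/M$ is nonincreasing. A sketch that checks both on a grid of $M$ values, assuming (for illustration only) a diagonal quadratic over the box $[-1,1]^3$; `grad_mapping_norm` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = np.diag([1.0, 4.0, 9.0])
b = np.array([10.0, -10.0, 3.0])
grad = lambda x: Q @ x - b
project = lambda x: np.clip(x, -1.0, 1.0)    # C = [-1, 1]^3 (assumed)

def grad_mapping_norm(x, M):
    # ||G_M(x)|| with G_M(x) = M * [x - P_C(x - grad f(x) / M)]
    return np.linalg.norm(M * (x - project(x - grad(x) / M)))

x = rng.uniform(-1, 1, 3)
Ms = np.logspace(-2, 2, 50)                   # increasing values of M
norms = np.array([grad_mapping_norm(x, M) for M in Ms])
# ||G_M(x)|| nondecreasing in M, and ||G_M(x)|| / M nonincreasing in M
print(np.all(np.diff(norms) >= -1e-9), np.all(np.diff(norms / Ms) <= 1e-9))  # True True
```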

Backtracking

Choose parameters $(s,\alpha,\beta)$ with $s>0,\ \alpha\in(0,1),\ \beta \in (0,1)$.
Start with $t_k=s$.
While
$f\left(\mathbf{x}_{k}\right)-f\left(P_{C}\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right)\right)<\alpha t_{k}\left\|G_{\frac{1}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2}$
holds,
set $t_k=\beta t_k$.
In other words, $t_k=s\beta^{i_k}$, where $i_k$ is the smallest nonnegative integer such that
$f\left(\mathbf{x}_{k}\right)-f\left(P_{C}\left(\mathbf{x}_{k}-s \beta^{i_{k}} \nabla f\left(\mathbf{x}_{k}\right)\right)\right) \geq \alpha s \beta^{i_{k}}\left\|G_{\frac{1}{s \beta^{i_k}}}\left(\mathbf{x}_{k}\right)\right\|^{2}$

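If $t_k=s$ already satisfies the condition,
then $t_k=s$.
Otherwise the condition is first satisfied at $t_k=s\beta^{i_k}$ with $i_k\ge 1$,
so the previous trial step $\frac{t_k}{\beta}$ does not satisfy it,
that is,
$f\left(\mathbf{x}_{k}\right)-f\left(P_{C}\left(\mathbf{x}_{k}-\frac{t_k}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right)\right)<\alpha \frac{t_{k}}{\beta}\left\|G_{\frac{\beta}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2}$

Substituting $\mathbf{x}=\mathbf{x}_k,\ t=\frac{t_k}{\beta}$ into the sufficient decrease lemma (its proof does not use $t<\frac{2}{L}$, so it applies to any $t>0$) gives
$f\left(\mathbf{x}_{k}\right)-f \left(P_{C}\left(\mathbf{x}_{k}-\frac{t_k}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right)\right) \geq \frac{t_k}{\beta}\left(1-\frac{L t_k}{2\beta}\right)\left\|G_{\frac{\beta}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2}$
Here $G_{\frac{\beta}{t_k}}(\mathbf{x}_k)\neq\mathbf{0}$, since otherwise the first inequality would read $0<0$. Comparing the right-hand sides of the two inequalities,
$\frac{t_k}{\beta}\left(1-\frac{L t_k}{2\beta}\right)\left\|G_{\frac{\beta}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2} <\alpha \frac{t_{k}}{\beta}\left\|G_{\frac{\beta}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2} \Rightarrow 1-\frac{L t_k}{2\beta}<\alpha \Rightarrow t_k> \frac{2(1-\alpha)\beta}{L}$
In summary,
$t_{k} \geq \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\}$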
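A minimal sketch of the backtracking rule, assuming a box constraint so that $P_C$ is `np.clip`; `backtracking_step` and the quadratic test instance are hypothetical. It also checks the lower bound $t_k\ge\min\{s,2(1-\alpha)\beta/L\}$ on every accepted step. Only a few iterations are run, so that floating-point round-off near the solution does not interfere with this exact-arithmetic bound.

```python
import numpy as np

def backtracking_step(f, grad, project, x, s=1.0, alpha=0.5, beta=0.5):
    """Backtracking rule from the text: start at t = s and multiply by beta
    until  f(x) - f(P_C(x - t grad f(x))) >= alpha * t * ||G_{1/t}(x)||^2.
    The loop terminates because the condition holds once t <= 2(1 - alpha)/L."""
    g = grad(x)
    t = s
    while True:
        x_new = project(x - t * g)
        G = (x - x_new) / t                       # gradient mapping G_{1/t}(x)
        if f(x) - f(x_new) >= alpha * t * (G @ G):
            return t, x_new
        t *= beta

# Toy instance (assumed): f(x) = 0.5 x^T Q x - b^T x on the box [-1, 1]^2
Q = np.array([[10.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, 3.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
project = lambda x: np.clip(x, -1.0, 1.0)
L = np.linalg.eigvalsh(Q).max()

s, alpha, beta = 1.0, 0.5, 0.5
x = np.array([1.0, -1.0])
steps = []
for _ in range(10):
    t, x = backtracking_step(f, grad, project, x, s, alpha, beta)
    steps.append(t)
print(min(steps) >= min(s, 2 * (1 - alpha) * beta / L))   # True
```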

Lemma 1

Consider problem (P), where
$C$ is a closed convex set and
$f\in C_L^{1,1}(C)$ is bounded below.
Let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by the gradient projection method,
with either a constant step size $t_k=\bar{t}\in(0,\frac{2}{L})$ or backtracking (with parameters $(s,\alpha,\beta)$ satisfying $s>0,\alpha\in(0,1),\beta\in(0,1)$).
Then for all $k\ge 0$
$f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}$
where
$M=\begin{cases}\bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { constant step size } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text { backtracking }\end{cases}$
and
$d=\begin{cases}1 / \bar{t} & \text { constant step size } \\ 1 / s & \text { backtracking }\end{cases}$

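Proof:
Constant step size: substitute $t=\bar{t}$ into the sufficient decrease lemma; the claim follows directly.

Backtracking:
Backtracking only shrinks the step, starting from $s$, so $t_k\le s$, i.e., $\frac{1}{t_k}\ge\frac{1}{s}$.
By Theorem 6 (with $L_1=\frac{1}{t_k}$ and $L_2=\frac{1}{s}$),
$\left\|G_{1 / t_{k}}\left(\mathbf{x}_{k}\right)\right\| \geq\left\|G_{1 / s}\left(\mathbf{x}_{k}\right)\right\|$
so, using the acceptance condition of backtracking and the lower bound on $t_k$,
$f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq \alpha t_k\left\|G_{\frac{1}{t_k}}\left(\mathbf{x}_{k}\right)\right\|^{2}\ge \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\}\left\|G_{1 / s}\left(\mathbf{x}_{k}\right)\right\|^2$
which proves the claim.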

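Convergence

Consider problem (P), where
$C$ is a closed convex set and
$f\in C_L^{1,1}(C)$ is bounded below.
Let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by the gradient projection method,
with either a constant step size $t_k=\bar{t}\in(0,\frac{2}{L})$ or backtracking (with parameters $(s,\alpha,\beta)$ satisfying $s>0,\alpha\in(0,1),\beta\in(0,1)$). Then
(a) the sequence $\{f(\mathbf{x}_k)\}$ is nonincreasing, and $f(\mathbf{x}_{k+1})<f(\mathbf{x}_k)$ unless $\mathbf{x}_k$ is a stationary point;
(b) $G_d(\mathbf{x}_k)\to \mathbf{0}$ as $k\to \infty$,
where $d=\begin{cases}1 / \bar{t} & \text { constant step size } \\ 1 / s & \text { backtracking }\end{cases}$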

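Proof:
(a)
By Lemma 1,
$f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}\ge 0$
so the sequence is nonincreasing.
Equality $f(\mathbf{x}_{k+1})=f(\mathbf{x}_k)$ forces $G_{d}\left(\mathbf{x}_{k}\right)=\mathbf{0}$,
which by Theorem 4 means that $\mathbf{x}_k$ is a stationary point.

(b)
Since $\{f(\mathbf{x}_k)\}$ is nonincreasing and bounded below, it converges,
so $f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\to 0$ as $k\to \infty$.
By Lemma 1, $M\|G_d(\mathbf{x}_k)\|^2\le f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})$, hence $G_d(\mathbf{x}_k)\to \mathbf{0}$ as $k\to \infty$.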

Rate of convergence

Under the same assumptions as above,
let $f^*$ be the limit of the sequence $\{f(\mathbf{x}_k)\}$. Then for all $n=0,1,2,\cdots$
$\min _{k=0,1, \ldots, n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}}$
where
$M=\begin{cases}\bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { constant step size } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text { backtracking }\end{cases}$
and
$d=\begin{cases}1 / \bar{t} & \text { constant step size } \\ 1 / s & \text { backtracking }\end{cases}$

Proof:
Summing Lemma 1 over $k=0,\dots,n$ (the left-hand side telescopes):
$\begin{aligned} \sum_{k=0}^{n}\left(f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)\right) &\geq \sum_{k=0}^{n}M\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}\\ f\left(\mathbf{x}_{0}\right)-f\left(\mathbf{x}_{n+1}\right) &\geq M \sum_{k=0}^{n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2} \end{aligned}$
Since the sequence is nonincreasing, $f(\mathbf{x}_{n+1})\ge f^*$,
so
$f\left(\mathbf{x}_{0}\right)-f^*\ge f\left(\mathbf{x}_{0}\right)-f\left(\mathbf{x}_{n+1}\right) \geq M \sum_{k=0}^{n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}$
Moreover,
$\sum_{k=0}^{n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}\ge (n+1) \min _{k=0,1, \ldots, n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}$
so
$f\left(\mathbf{x}_{0}\right)-f^*\ge M(n+1) \min _{k=0,1, \ldots, n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\|^{2}$
and therefore
$\min _{k=0,1, \ldots, n}\left\|G_{d}\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}}$
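The rate can be checked numerically. The proof actually shows $f(\mathbf{x}_0)-f(\mathbf{x}_{n+1})\ge M(n+1)\min_{k\le n}\|G_d(\mathbf{x}_k)\|^2$, which implies the stated bound because $f(\mathbf{x}_{n+1})\ge f^*$; the sketch below checks this stronger form so that $f^*$ need not be known. It assumes a least-squares objective over the box $[0,1]^5$ and the constant step $\bar t=1/L$, so that $M=\bar t(1-\bar tL/2)=\frac{1}{2L}$ and $d=1/\bar t=L$; the instance and variable names are illustrative.

```python
import numpy as np

# Toy instance (assumed): f(x) = 0.5*||A x - b||^2 over the box [0, 1]^5
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
project = lambda x: np.clip(x, 0.0, 1.0)
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of grad f

t_bar = 1.0 / L                               # constant step in (0, 2/L)
M, d = t_bar * (1 - t_bar * L / 2), 1.0 / t_bar

x = project(rng.standard_normal(5))
f0, G_norms = f(x), []
for n in range(200):
    G_norms.append(np.linalg.norm(d * (x - project(x - grad(x) / d))))  # ||G_d(x_n)||
    x = project(x - t_bar * grad(x))                                    # x_{n+1}
    # min_{k<=n} ||G_d(x_k)|| <= sqrt((f(x_0) - f(x_{n+1})) / (M (n+1)))
    assert min(G_norms) <= (1 + 1e-9) * np.sqrt((f0 - f(x)) / (M * (n + 1)))
print("bound verified for n = 0..199")
```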

Convex functions

$\begin{array}{lll} \text { (P) } & \min & f(\mathbf{x}) \\ & \text { s.t. } & \mathbf{x} \in C \end{array}$
where $f\in C_L^{1,1}(C)$ is a convex function defined on the closed convex set $C$.
Let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by the gradient projection method with constant step size $\bar{t}\in (0,\frac{1}{L}]$.
Assume the optimal solution set $X^*$ is nonempty, and let $f^*$ be the optimal value. Then
(a) for all $k\ge 0$ and $\mathbf{x}^*\in X^*$,
$2\bar{t}\left(f\left(\mathbf{x}_{k+1}\right)-f\left(\mathbf{x}^{*}\right)\right) \leq\left\|\mathbf{x}_{k}-\mathbf{x}^{*}\right\|^{2}-\left\|\mathbf{x}_{k+1}-\mathbf{x}^{*}\right\|^{2}$
(b) for all $n\ge 1$,
$f\left(\mathbf{x}_{n}\right)-f^{*} \leq \frac{\left\|\mathbf{x}_{0}-\mathbf{x}^{*}\right\|^{2}}{2 \bar{t} n}$

Proof:
(a)
By the descent lemma,
$f\left(\mathbf{x}_{k+1}\right) \leq f\left(\mathbf{x}_{k}\right)+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k+1}-\mathbf{x}_{k}\right\rangle+\frac{L}{2}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2}$
By the first-order characterization of convex functions,
$f(\mathbf{x}_k)\le f(\mathbf{x}^*)+\left\langle \nabla f(\mathbf{x}_k),\mathbf{x}_k-\mathbf{x}^* \right\rangle$
so
$f\left(\mathbf{x}_{k+1}\right) \leq f\left(\mathbf{x}^{*}\right)+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k}-\mathbf{x}^{*}\right\rangle+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k+1}-\mathbf{x}_{k}\right\rangle+\frac{L}{2}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2} .$
By the second projection theorem (with $\mathbf{y}=\mathbf{x}^*\in C$),
$\left\langle\mathbf{x}_{k}-\bar{t} \nabla f\left(\mathbf{x}_{k}\right)-\mathbf{x}_{k+1}, \mathbf{x}^{*}-\mathbf{x}_{k+1}\right\rangle \leq 0$
hence
$\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k+1}-\mathbf{x}^{*}\right\rangle \leq \frac{1}{\bar{t}}\left\langle\mathbf{x}_{k}-\mathbf{x}_{k+1}, \mathbf{x}_{k+1}-\mathbf{x}^{*}\right\rangle$
Combining these with $\bar{t}\le \frac{1}{L}$ (so that $\frac{L}{2}\le\frac{1}{2\bar{t}}$),
$\begin{aligned} f\left(\mathbf{x}_{k+1}\right) & \leq f\left(\mathbf{x}^{*}\right)+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k}-\mathbf{x}^{*}\right\rangle+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k+1}-\mathbf{x}_{k}\right\rangle+\frac{L}{2}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2} \\ &=f\left(\mathbf{x}^{*}\right)+\left\langle\nabla f\left(\mathbf{x}_{k}\right), \mathbf{x}_{k+1}-\mathbf{x}^{*}\right\rangle+\frac{L}{2}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2} \\ & \leq f\left(\mathbf{x}^{*}\right)+\frac{1}{\bar{t}}\left\langle\mathbf{x}_{k}-\mathbf{x}_{k+1}, \mathbf{x}_{k+1}-\mathbf{x}^{*}\right\rangle+\frac{L}{2}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2} \\ & \leq f\left(\mathbf{x}^{*}\right)+\frac{1}{\bar{t}}\left\langle\mathbf{x}_{k}-\mathbf{x}_{k+1}, \mathbf{x}_{k+1}-\mathbf{x}^{*}\right\rangle+\frac{1}{2 \bar{t}}\left\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\right\|^{2} \\ &=f\left(\mathbf{x}^{*}\right)+\frac{1}{2 \bar{t}}\left(\left\|\mathbf{x}_{k}-\mathbf{x}^{*}\right\|^{2}-\left\|\mathbf{x}_{k+1}-\mathbf{x}^{*}\right\|^{2}\right) \end{aligned}$
where the last step expands $\left\|\mathbf{x}_{k}-\mathbf{x}^{*}\right\|^{2}=\left\|(\mathbf{x}_{k}-\mathbf{x}_{k+1})+(\mathbf{x}_{k+1}-\mathbf{x}^{*})\right\|^{2}$. Multiplying both sides by $2\bar{t}$ gives (a).
(b)
Sum (a) over $k=0,\dots,n-1$. The left-hand side telescopes, and since $\{f(\mathbf{x}_k)\}$ is nonincreasing (by the convergence theorem, as $\bar{t}\le\frac{1}{L}<\frac{2}{L}$), $f(\mathbf{x}_{k+1})\ge f(\mathbf{x}_n)$ for $k\le n-1$:
$\begin{aligned} \sum_{k=0}^{n-1}\left( \left\|\mathbf{x}_{k+1}-\mathbf{x}^{*}\right\|^{2}-\left\|\mathbf{x}_{k}-\mathbf{x}^{*}\right\|^{2}\right) &\leq \sum_{k=0}^{n-1}2 \bar{t}\left(f\left(\mathbf{x}^{*}\right)-f\left(\mathbf{x}_{k+1}\right)\right)\\ \left\|\mathbf{x}_{n}-\mathbf{x}^{*}\right\|^{2}-\left\|\mathbf{x}_{0}-\mathbf{x}^{*}\right\|^{2} &\leq 2n \bar{t}\left(f\left(\mathbf{x}^{*}\right)-f\left(\mathbf{x}_{n}\right)\right) \end{aligned}$
Therefore
$f\left(\mathbf{x}_{n}\right)-f^{*} \leq \frac{\left\|\mathbf{x}_{0}-\mathbf{x}^{*}\right\|^{2}-\left\|\mathbf{x}_{n}-\mathbf{x}^{*}\right\|^{2}}{2 \bar{t} n} \leq \frac{\left\|\mathbf{x}_{0}-\mathbf{x}^{*}\right\|^{2}}{2 \bar{t} n}$

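Lemma 2

Under the same assumptions,
the generated sequence $\{\mathbf{x}_k\}_{k\ge 0}$ satisfies, for all $\mathbf{x}^*\in X^*$ and $k\ge 0$,
$\left\|\mathbf{x}_{k+1}-\mathbf{x}^{*}\right\| \leq\left\|\mathbf{x}_{k}-\mathbf{x}^{*}\right\|$
Proof:
This follows directly from (a), since $f(\mathbf{x}_{k+1})-f(\mathbf{x}^*)\ge 0$.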
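Part (b) of the convex-case result and Lemma 2 can be checked together on an instance whose optimal solution is known in closed form. The sketch assumes a separable convex quadratic $f(\mathbf{x})=\frac12\sum_i w_i(x_i-c_i)^2$ over $[-1,1]^4$, for which $\mathbf{x}^*$ is $\mathbf{c}$ clipped to the box, and uses $\bar t=1/L$ with $L=\max_i w_i$; the instance is illustrative only.

```python
import numpy as np

# Toy convex instance (assumed): f(x) = 0.5 * sum_i w_i (x_i - c_i)^2 on [-1, 1]^4.
# It is separable, so the optimal solution is simply x* = clip(c, -1, 1).
w = np.array([1.0, 2.0, 5.0, 10.0])
c = np.array([3.0, -0.5, 0.2, -4.0])
f = lambda x: 0.5 * np.sum(w * (x - c) ** 2)
grad = lambda x: w * (x - c)
project = lambda x: np.clip(x, -1.0, 1.0)
L = w.max()
t_bar = 1.0 / L                                   # constant step in (0, 1/L]

x_star = project(c)
f_star = f(x_star)
x = np.array([-1.0, 1.0, -1.0, 1.0])              # x_0 in C
r0 = np.sum((x - x_star) ** 2)                    # ||x_0 - x*||^2
dist_prev = r0
for n in range(1, 101):
    x = project(x - t_bar * grad(x))              # x_n
    dist = np.sum((x - x_star) ** 2)
    assert dist <= dist_prev + 1e-12                          # Lemma 2
    assert f(x) - f_star <= r0 / (2 * t_bar * n) + 1e-12      # bound (b)
    dist_prev = dist
print("O(1/n) bound and Lemma 2 hold for n = 1..100")
```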

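Theorem 7

Under the same assumptions,
let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by the gradient projection method with constant step size $\bar{t}\in (0,\frac{1}{L}]$.
Then the sequence converges to an optimal solution.

Proof:
Omitted.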

Sparsity-constrained problem

$\begin{array}{lll} \text { (S) } & \min & f(\mathbf{x}) \\ & \text { s.t. } & \|\mathbf{x}\|_{0} \leq s \end{array}$
where $f:\mathbb{R}^n\to \mathbb{R}$ is continuously differentiable and bounded below, its gradient is Lipschitz continuous with constant $L_f$, and $s > 0$ is an integer.
Here $\|\mathbf{x}\|_{0}=\left|\left\{i: x_{i} \neq 0\right\}\right|$ is the number of nonzero components of $\mathbf{x}$.

Let
$I_{1}(\mathbf{x}) \equiv\left\{i: x_{i} \neq 0\right\}$
$I_{0}(\mathbf{x}) \equiv\left\{i: x_{i} = 0\right\}$
$C_{s}=\left\{\mathbf{x}:\|\mathbf{x}\|_{0} \leq s\right\}$
so problem (S) can be written as
$\min\left\{f(\mathbf{x}):\mathbf{x}\in C_s\right\}$
In addition,
let $M_i(\mathbf{x})$ denote the $i$-th largest absolute value among the components of $\mathbf{x}$.
We now study this problem.

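L-stationarity

The orthogonal projection results above were stated for convex sets, but $C_s$ is not convex, so the minimizer defining the projection need not be unique; $P_{C_s}$ is therefore a set-valued map:
$P_{C_{s}}(\mathbf{x})=\operatorname{argmin}_{\mathbf{y}}\left\{\|\mathbf{y}-\mathbf{x}\|: \mathbf{y} \in C_{s}\right\}$
Since the objective is still coercive and $C_s$ is closed, this set is always nonempty.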
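Although $C_s$ is not convex, an element of $P_{C_s}(\mathbf{x})$ is easy to compute: for a fixed support $S$ with $|S|\le s$, the best choice is $y_i=x_i$ on $S$, leaving the error $\sum_{i\notin S}x_i^2$, so it suffices to keep the $s$ entries of largest absolute value (hard thresholding). When entries tie in absolute value, different choices give different elements of $P_{C_s}(\mathbf{x})$. A sketch in Python; `project_sparse` is a hypothetical name:

```python
import numpy as np

def project_sparse(x, s):
    """Return one element of P_{C_s}(x): keep the s entries of x with the
    largest absolute values and set the rest to zero (hard thresholding).
    Ties are broken by the (stable) sort order."""
    y = np.zeros_like(x)
    keep = np.argsort(-np.abs(x), kind="stable")[:s]
    y[keep] = x[keep]
    return y

y = project_sparse(np.array([0.5, -3.0, 2.0, 0.1]), 2)
print(y)   # keeps -3.0 and 2.0, zeroes the other two entries
```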

Definition of an L-stationary point

A point $\mathbf{x}^*\in C_s$ satisfying
$\left[\mathrm{NC}_{L}\right] \quad \mathbf{x}^{*} \in P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)$
is called an L-stationary point of problem (S).

Lemma 3

For any $L>0$, a point $\mathbf{x}^* \in C_s$ satisfies $\left[\mathrm{NC}_L\right]$ if and only if
$\left|\frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)\right| \begin{cases}\leq L M_{s}\left(\mathbf{x}^{*}\right) & \text { if } i \in I_{0}\left(\mathbf{x}^{*}\right), \\ =0 & \text { if } i \in I_{1}\left(\mathbf{x}^{*}\right) .\end{cases}$

Proof:
Necessity: suppose $\mathbf{x}^*$ satisfies $\left[\mathrm{NC}_L\right]$.
Note that each component of an element of $P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)$ is either $0$ or $x_j^*-\frac{1}{L}\frac{\partial f}{\partial x_j}(\mathbf{x}^*)$.
Since $\mathbf{x}^{*} \in P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)$,
if $i\in I_{1}\left(\mathbf{x}^{*}\right)$, then
$x_{i}^{*}=x_{i}^{*}-\frac{1}{L} \frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)$, i.e., $\frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)=0$.
If $i\in I_{0}\left(\mathbf{x}^{*}\right)$, the $i$-th component was set to zero, and a zeroed component of a best $s$-sparse approximation cannot exceed in absolute value the $s$-th largest absolute value of that approximation, so $\left|x_{i}^{*}-\frac{1}{L} \frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)\right| \leq M_{s}\left(\mathbf{x}^{*}\right)$.
Since $x_i^*=0$, this gives $\left| \frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)\right| \leq LM_{s}\left(\mathbf{x}^{*}\right)$.

Sufficiency:
If $\|\mathbf{x}^*\|_0<s$, then $M_s(\mathbf{x}^*)=0$, so the conditions give $\nabla f(\mathbf{x}^*)=\mathbf{0}$.
In this case $P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)=P_{C_s}(\mathbf{x}^*)=\left\{\mathbf{x}^*\right\}$.

If $\|\mathbf{x}^*\|_0=s$, then
$\left|x_{i}^{*}-\frac{1}{L} \frac{\partial f}{\partial x_{i}}\left(\mathbf{x}^{*}\right)\right| \begin{cases}=\left|x_{i}^{*}\right|, & i \in I_{1}\left(\mathbf{x}^{*}\right), \\ \leq M_{s}\left(\mathbf{x}^{*}\right), & i \in I_{0}\left(\mathbf{x}^{*}\right) .\end{cases}$
so the $s$ components indexed by $I_1(\mathbf{x}^*)$ are among the $s$ largest in absolute value of $\mathbf{x}^*-\frac{1}{L}\nabla f(\mathbf{x}^*)$, and keeping exactly those components (whose values equal $x_i^*$) yields $\mathbf{x}^*$; hence $\mathbf{x}^*\in P_{C_s}\left(\mathbf{x}^*-\frac{1}{L}\nabla f(\mathbf{x}^*)\right)$.
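The conditions in Lemma 3 can be checked component by component. A sketch, assuming for illustration $f(\mathbf{x})=\frac12\|\mathbf{x}-\mathbf{c}\|^2$ (so $\nabla f(\mathbf{x})=\mathbf{x}-\mathbf{c}$ and $L_f=1$), $s=2$ and $L=2$; `M_s` and `is_L_stationary` are hypothetical helpers:

```python
import numpy as np

def M_s(x, s):
    # s-th largest absolute value among the components of x
    return np.sort(np.abs(x))[::-1][s - 1]

def is_L_stationary(x, g, L, s, tol=1e-12):
    """Check the component-wise conditions of Lemma 3 for x in C_s,
    where g = grad f(x)."""
    on = x != 0                                     # I_1(x); ~on is I_0(x)
    return bool(np.all(np.abs(g[on]) <= tol)
                and np.all(np.abs(g[~on]) <= L * M_s(x, s) + tol))

# Example (assumed): f(x) = 0.5 * ||x - c||^2, grad f(x) = x - c
c = np.array([3.0, -2.0, 0.5, 0.1])
s, L = 2, 2.0
x_a = np.array([3.0, -2.0, 0.0, 0.0])   # keeps the two largest entries of c
x_b = np.array([3.0, 0.0, 0.5, 0.0])    # another feasible point of C_2
print(is_L_stationary(x_a, x_a - c, L, s))   # True
print(is_L_stationary(x_b, x_b - c, L, s))   # False
```

As a cross-check against $[\mathrm{NC}_L]$ itself: $\mathbf{x}_a-\frac12\nabla f(\mathbf{x}_a)=(3,-2,0.25,0.05)$, whose two largest entries reproduce $\mathbf{x}_a$, while $\mathbf{x}_b-\frac12\nabla f(\mathbf{x}_b)=(3,-1,0.5,0.05)$ keeps $(3,-1)$ instead, so $\mathbf{x}_b$ is not L-stationary.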

Lemma 4

Suppose $f\in C_{L_f}^{1,1}(\mathbb{R}^n)$ and $L>L_f$. Then for any $\mathbf{x}\in C_s$ and $\mathbf{y}\in\mathbb{R}^n$ satisfying
$\mathbf{y} \in P_{C_{s}}\left(\mathbf{x}-\frac{1}{L} \nabla f(\mathbf{x})\right)$
we have
$f(\mathbf{x})-f(\mathbf{y}) \geq \frac{L-L_{f}}{2}\|\mathbf{x}-\mathbf{y}\|^{2}$

Proof:
Since
$\mathbf{y} \in \operatorname{argmin}_{\mathbf{z} \in C_{s}}\left\|\mathbf{z}-\left(\mathbf{x}-\frac{1}{L} \nabla f(\mathbf{x})\right)\right\|^{2}$
and
$\begin{aligned} h_{L}(\mathbf{z}, \mathbf{x}) &=f(\mathbf{x})+\langle\nabla f(\mathbf{x}), \mathbf{z}-\mathbf{x}\rangle+\frac{L}{2}\|\mathbf{z}-\mathbf{x}\|^{2} \\ &=\frac{L}{2}\left\|\mathbf{z}-\left(\mathbf{x}-\frac{1}{L} \nabla f(\mathbf{x})\right)\right\|^{2}+\underbrace{f(\mathbf{x})-\frac{1}{2 L}\|\nabla f(\mathbf{x})\|^{2}}_{\text {constant w.r.t. } \mathbf{z}}, \end{aligned}$
we have
$\mathbf{y} \in \operatorname{argmin}_{\mathbf{z} \in C_{s}} h_{L}(\mathbf{z}, \mathbf{x})$
and therefore, since $\mathbf{x}\in C_s$,
$h_{L}(\mathbf{y}, \mathbf{x}) \leq h_{L}(\mathbf{x}, \mathbf{x})=f(\mathbf{x})$
By the descent lemma, $f(\mathbf{y})\le h_{L_f}(\mathbf{y},\mathbf{x})$, so
$f(\mathbf{x})-f(\mathbf{y}) \geq f(\mathbf{x})-h_{L_{f}}(\mathbf{y}, \mathbf{x})$
Using
$h_{L_{f}}(\mathbf{y}, \mathbf{x})=h_{L}(\mathbf{y}, \mathbf{x})-\frac{L-L_{f}}{2}\|\mathbf{y}-\mathbf{x}\|^{2}$
we get
$f(\mathbf{x})-f(\mathbf{y}) \geq f(\mathbf{x})-h_{L}(\mathbf{y}, \mathbf{x})+\frac{L-L_{f}}{2}\|\mathbf{x}-\mathbf{y}\|^{2} \geq \frac{L-L_{f}}{2}\|\mathbf{x}-\mathbf{y}\|^{2}$
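A numerical spot check of Lemma 4, assuming a least-squares objective $f(\mathbf{x})=\frac12\|A\mathbf{x}-\mathbf{b}\|^2$ (for which $L_f=\|A\|_2^2$, the squared largest singular value of $A$), $s=3$ and $L=1.5L_f$. The projection onto $C_s$ is computed by hard thresholding; the instance and all names are illustrative.

```python
import numpy as np

def project_sparse(x, s):
    # One element of P_{C_s}(x): keep the s largest-magnitude entries
    y = np.zeros_like(x)
    keep = np.argsort(-np.abs(x), kind="stable")[:s]
    y[keep] = x[keep]
    return y

# Assumed least-squares instance: f(x) = 0.5 * ||A x - b||^2, L_f = ||A||_2^2
rng = np.random.default_rng(5)
A = rng.standard_normal((10, 20))
b = rng.standard_normal(10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L_f = np.linalg.norm(A, 2) ** 2
s, L = 3, 1.5 * L_f                        # any L > L_f

ok = True
for _ in range(200):
    x = project_sparse(rng.standard_normal(20), s)     # a random point of C_s
    y = project_sparse(x - grad(x) / L, s)             # y in P_{C_s}(x - grad f(x)/L)
    ok &= f(x) - f(y) >= (L - L_f) / 2 * np.sum((x - y) ** 2) - 1e-9
print(ok)   # True
```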

Theorem 8

Let $f\in C_{L_f}^{1,1}(\mathbb{R}^n)$ and $L>L_f$.
If $\mathbf{x}^*$ is an optimal solution of problem (S), then
(i) $\mathbf{x}^*$ is an L-stationary point;
(ii) the set $P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)$ has exactly one element.
Proof:
Suppose $\mathbf{y}\in P_{C_{s}}\left(\mathbf{x}^{*}-\frac{1}{L} \nabla f\left(\mathbf{x}^{*}\right)\right)$
and $\mathbf{x}^*\neq \mathbf{y}$.
By Lemma 4,
$f\left(\mathbf{x}^{*}\right)-f(\mathbf{y}) \geq \frac{L-L_{f}}{2}\left\|\mathbf{x}^{*}-\mathbf{y}\right\|^{2}>0$
which contradicts the optimality of $\mathbf{x}^*$, since $\mathbf{y}\in C_s$. As this set is nonempty, it must equal $\{\mathbf{x}^*\}$, which proves both (i) and (ii).

IHT method (iterative hard thresholding)

Input: $L>L_f$.
Choose a starting point $\mathbf{x}^0\in C_s$.
Iterate $\mathbf{x}^{k+1}\in P_{C_s}\left(\mathbf{x}^k-\frac{1}{L}\nabla f(\mathbf{x}^k)\right),\quad k=0,1,\cdots$
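A minimal sketch of IHT in Python, assuming a sparse least-squares instance and $L=1.1\,\|A\|_2^2>L_f$; `iht` and `project_sparse` are hypothetical names, and ties in hard thresholding are broken by sort order (one valid choice from the set $P_{C_s}$). The returned history of $f(\mathbf{x}^k)$ makes the monotone decrease established in Lemma 5 below observable. No claim is made that this recovers `x_true`; the theory here only concerns decrease and L-stationarity.

```python
import numpy as np

def project_sparse(x, s):
    # Hard thresholding: one element of P_{C_s}(x)
    y = np.zeros_like(x)
    keep = np.argsort(-np.abs(x), kind="stable")[:s]
    y[keep] = x[keep]
    return y

def iht(f, grad, x0, s, L, n_iter=500):
    """Iterative hard thresholding: x^{k+1} in P_{C_s}(x^k - grad f(x^k) / L).
    Returns the last iterate and the history of f(x^k)."""
    x = project_sparse(x0, s)              # x^0 must lie in C_s
    history = [f(x)]
    for _ in range(n_iter):
        x = project_sparse(x - grad(x) / L, s)
        history.append(f(x))
    return x, history

# Assumed sparse least-squares instance
rng = np.random.default_rng(6)
A = rng.standard_normal((30, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
b = A @ x_true
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L = 1.1 * np.linalg.norm(A, 2) ** 2        # L > L_f = ||A||_2^2

x_hat, hist = iht(f, grad, np.zeros(50), s=3, L=L)
print(np.all(np.diff(hist) <= 1e-9))       # True: f(x^k) is nonincreasing
```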

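Lemma 5

Let $f\in C_{L_f}^{1,1}(\mathbb{R}^n)$ be bounded below.
Let $\left\{\mathbf{x}^k\right\}_{k\ge 0}$ be the sequence generated by the IHT method with constant step size $\frac{1}{L}$, where $L>L_f$. Then
(a) $f\left(\mathbf{x}^{k}\right)-f\left(\mathbf{x}^{k+1}\right) \geq \frac{L-L_{f}}{2}\left\|\mathbf{x}^{k}-\mathbf{x}^{k+1}\right\|^{2}$
(b) $\left\{f\left(\mathbf{x}^{k}\right)\right\}_{k \geq 0}$ is nonincreasing
(c) $\left\|\mathbf{x}^{k}-\mathbf{x}^{k+1}\right\| \rightarrow 0$
(d) for all $k=0,1,2,\cdots$, if $\mathbf{x}^k\neq \mathbf{x}^{k+1}$, then $f\left(\mathbf{x}^{k+1}\right)<f\left(\mathbf{x}^{k}\right)$

Proof:
(a) Follows directly from Lemma 4 with $\mathbf{x}=\mathbf{x}^k$ and $\mathbf{y}=\mathbf{x}^{k+1}$.
(b) Immediate from (a).
(c) $\left\{f\left(\mathbf{x}^{k}\right)\right\}_{k \geq 0}$ is nonincreasing and bounded below, hence convergent,
so $f(\mathbf{x}^k)-f(\mathbf{x}^{k+1})\to 0$ and the claim follows from (a).
(d) Immediate from (a), since $L-L_f>0$.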

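Theorem 9

Let $\left\{\mathbf{x}^k\right\}_{k\ge 0}$ be the sequence generated by the IHT method with constant step size $\frac{1}{L}$,
where $L>L_f$.
Then every accumulation point of $\left\{\mathbf{x}^k\right\}_{k\ge 0}$ is an L-stationary point.

Proof: omitted.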
