Introduction to Convex Optimization 12

Theoretical Analysis of Accelerated Gradient Descent

For the final algorithm in Introduction to Convex Optimization 11, we now analyze how fast $\lambda_k$ tends to 0.

Theorem: If $\gamma_0\geq \mu$, then $\lambda_k\leq \min\left\{\left(1-\sqrt{\frac{\mu}{L}}\right)^k, \frac{4L}{(2\sqrt{L}+k\sqrt{\gamma_0})^2}\right\}$.

Proof: If $\gamma_k\geq \mu$, then $\gamma_{k+1}=La_k^2=(1-a_k)\gamma_k+a_k\mu \geq \mu$. Since the theorem assumes $\gamma_0\geq \mu$, induction gives $\gamma_k\geq \mu$ for all $k$, and hence $a_k=\sqrt{\gamma_{k+1}/L}\geq \sqrt{\frac{\mu}{L}}$. By Lemma 2 of Introduction to Convex Optimization 11, $\lambda_k=\prod\limits_{i=0}^{k-1}(1-a_i)$; substituting $a_i\geq \sqrt{\frac{\mu}{L}}$ yields $\lambda_k\leq \left(1-\sqrt{\frac{\mu}{L}}\right)^k$.
Next, let $b_k=\frac{1}{\sqrt{\lambda_k}}$. Since $\{\lambda_k\}$ is a decreasing sequence, we obtain:
$$\begin{aligned} b_{k+1}-b_k&=\frac{\sqrt{\lambda_k}-\sqrt{\lambda_{k+1}}}{\sqrt{\lambda_k\lambda_{k+1}}}\\ &=\frac{\lambda_k-\lambda_{k+1}}{\sqrt{\lambda_k\lambda_{k+1}}(\sqrt{\lambda_k}+\sqrt{\lambda_{k+1}})}\\ &\geq \frac{\lambda_k-\lambda_{k+1}}{2\lambda_k\sqrt{\lambda_{k+1}}}\\ &=\frac{\lambda_k-(1-a_k)\lambda_k}{2\lambda_k\sqrt{\lambda_{k+1}}}\\ &=\frac{a_k}{2\sqrt{\lambda_{k+1}}}\\ &\geq \frac{1}{2}\sqrt{\frac{\gamma_0}{L}} \end{aligned}$$
The last inequality uses $\gamma_k\geq \gamma_0\lambda_k$, which follows by induction from $\gamma_{k+1}\geq (1-a_k)\gamma_k$ and $\lambda_{k+1}=(1-a_k)\lambda_k$, so that $La_k^2=\gamma_{k+1}\geq \gamma_0\lambda_{k+1}$.
Because $b_0=1$ (as $\lambda_0=1$), summing these differences gives $b_k\geq 1+\frac{k}{2}\sqrt{\frac{\gamma_0}{L}}$, which is equivalent to $\lambda_k\leq \frac{4L}{(2\sqrt{L}+k\sqrt{\gamma_0})^2}$.
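The bound on $\lambda_k$ can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `lambda_sequence` and `lambda_bound` and the constants `L`, `mu`, `gamma0` are illustrative assumptions. It generates $a_k$, $\gamma_k$, $\lambda_k$ from the recursion $La_k^2=(1-a_k)\gamma_k+a_k\mu$ and compares $\lambda_k$ with the bound in the theorem.

```python
import math

def lambda_sequence(L, mu, gamma0, n):
    """Return [lambda_0, ..., lambda_n] generated by L a^2 = (1 - a) gamma + a mu."""
    lams = [1.0]  # lambda_0 = 1 (empty product)
    gamma = gamma0
    for _ in range(n):
        # a_k is the positive root of L a^2 + (gamma - mu) a - gamma = 0.
        b = gamma - mu
        a = (-b + math.sqrt(b * b + 4.0 * L * gamma)) / (2.0 * L)
        gamma = (1.0 - a) * gamma + a * mu  # gamma_{k+1} = L a_k^2
        lams.append(lams[-1] * (1.0 - a))   # lambda_{k+1} = (1 - a_k) lambda_k
    return lams

def lambda_bound(L, mu, gamma0, k):
    """Right-hand side of the theorem: min of the linear and sublinear rates."""
    linear = (1.0 - math.sqrt(mu / L)) ** k
    sublinear = 4.0 * L / (2.0 * math.sqrt(L) + k * math.sqrt(gamma0)) ** 2
    return min(linear, sublinear)

# Illustrative constants (assumptions), chosen so that gamma0 >= mu.
L, mu, gamma0 = 10.0, 0.1, 10.0
lams = lambda_sequence(L, mu, gamma0, 50)
ok = all(lams[k] <= lambda_bound(L, mu, gamma0, k) + 1e-12 for k in range(51))
print(ok)
```

The quadratic for $a_k$ always has exactly one root in $(0,1)$ when $L>\mu$ and $\gamma_k>0$, which is why the sketch takes the positive root directly.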

Theorem: If we take $\gamma_0=L$, then the sequence $\{x_k\}_{k=0}^{\infty}$ generated by this process satisfies $f(x_k)-f^*\leq L \min\left\{\left(1-\sqrt{\frac{\mu}{L}}\right)^k,\frac{4}{(k+2)^2}\right\}\|x_0-x^*\|^2$. This shows that the scheme is optimal for unconstrained minimization of functions from $\mathfrak{F}_{\mu,L}^{1,1}(\mathbb{R}^n)$ with $\mu \geq 0$.

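Proof: We use $f(x_0)-f^*\leq \frac{L}{2}\|x_0-x^*\|^2$ together with the previous theorem. With $\gamma_0=L$, the second term of the bound on $\lambda_k$ becomes $\frac{4L}{(2\sqrt{L}+k\sqrt{L})^2}=\frac{4}{(k+2)^2}$, and $f(x_0)-f^*+\frac{\gamma_0}{2}\|x_0-x^*\|^2\leq L\|x_0-x^*\|^2$. Combined with the estimate $f(x_k)-f^*\leq \lambda_k\left[f(x_0)-f^*+\frac{\gamma_0}{2}\|x_0-x^*\|^2\right]$, this gives the inequality above.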
Below is a variant of the final algorithm in Introduction to Convex Optimization 11; the difference lies in the choice of step size.

  1. Choose $x_0\in \mathbb{R}^n$ and $\gamma_0 > 0$; set $v_0=x_0$.
  2. At the $k$-th iteration ($k\geq 0$):
    2.1 Compute $a_k\in(0,1)$ from the equation $La_k^2=(1-a_k)\gamma_k+a_k\mu$, and set $\gamma_{k+1}=(1-a_k)\gamma_k+a_k\mu$.
    2.2 Choose $y_k=\frac{a_k\gamma_kv_k+\gamma_{k+1}x_k}{\gamma_k+a_k\mu}$, and compute $f(y_k)$ and $\nabla f(y_k)$.
    2.3 Set $x_{k+1}=y_k-\frac{1}{L}\nabla f(y_k)$.
    2.4 Set $v_{k+1}=\frac{(1-a_k)\gamma_kv_k+a_k\mu y_k-a_k\nabla f(y_k)}{\gamma_{k+1}}$.
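The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `accelerated_scheme`, the list-based vectors, and the diagonal quadratic test problem are all chosen here for illustration.

```python
import math

def accelerated_scheme(grad, x0, L, mu, gamma0, iters):
    """Run `iters` iterations of the scheme with the auxiliary sequence v_k."""
    x, v, gamma = list(x0), list(x0), gamma0
    for _ in range(iters):
        # Step 2.1: a_k in (0, 1) solves L a^2 = (1 - a) gamma_k + a mu.
        b = gamma - mu
        a = (-b + math.sqrt(b * b + 4.0 * L * gamma)) / (2.0 * L)
        gamma_next = (1.0 - a) * gamma + a * mu
        # Step 2.2: y_k is a weighted combination of v_k and x_k.
        denom = gamma + a * mu
        y = [(a * gamma * vi + gamma_next * xi) / denom for vi, xi in zip(v, x)]
        g = grad(y)
        # Step 2.3: gradient step of length 1/L from y_k.
        x = [yi - gi / L for yi, gi in zip(y, g)]
        # Step 2.4: update of the auxiliary point v_k.
        v = [((1.0 - a) * gamma * vi + a * mu * yi - a * gi) / gamma_next
             for vi, yi, gi in zip(v, y, g)]
        gamma = gamma_next
    return x

# Illustrative test problem (an assumption): f(x) = 0.5 * sum(d_i * x_i^2),
# so mu = min(d), L = max(d), x* = 0 and f* = 0.
d = [0.1, 1.0, 10.0]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

x_k = accelerated_scheme(grad, [1.0, 1.0, 1.0], L=10.0, mu=0.1, gamma0=10.0, iters=100)
print(f(x_k))
```

Taking `gamma0 = L` here matches the choice $\gamma_0=L$ in the theorem above, so the printed value can be compared against that bound.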

Using the equalities in the algorithm above, we can eliminate some of the variables. First we eliminate $v_k$. From step 2.2, $a_k\gamma_kv_k=(\gamma_k+a_k\mu)y_k-\gamma_{k+1}x_k$, so
$$\begin{aligned} v_{k+1}&=\frac{1}{\gamma_{k+1}}\left\{\frac{1-a_k}{a_k}\left[(\gamma_k+a_k\mu)y_k-\gamma_{k+1}x_k\right]+a_k\mu y_k-a_k\nabla f(y_k)\right\}\\ &=\frac{1}{\gamma_{k+1}}\left\{\frac{(1-a_k)\gamma_k}{a_k}y_k+\mu y_k\right\}-\frac{1-a_k}{a_k}x_k-\frac{a_k}{\gamma_{k+1}}\nabla f(y_k)\\ &=x_k+\frac{1}{a_k}(y_k-x_k)-\frac{1}{a_kL}\nabla f(y_k)\\ &=x_k+\frac{1}{a_k}\left[(y_k-x_k)-\frac{1}{L}\nabla f(y_k)\right]\\ &=x_k+\frac{1}{a_k}(x_{k+1}-x_k) \end{aligned}$$
Therefore,
$$\begin{aligned} y_{k+1}&=\frac{1}{\gamma_{k+1}+a_{k+1}\mu}(a_{k+1}\gamma_{k+1}v_{k+1}+\gamma_{k+2}x_{k+1})\\ &=x_{k+1}+\frac{a_{k+1}\gamma_{k+1}(v_{k+1}-x_{k+1})}{\gamma_{k+1}+a_{k+1}\mu}\\ &=x_{k+1}+\beta_{k}(x_{k+1}-x_{k}) \end{aligned}$$
where $\beta_{k}=\frac{a_{k+1}\gamma_{k+1}(1-a_k)}{a_k(\gamma_{k+1}+a_{k+1}\mu)}$.
Next we eliminate $\gamma_k$, using the identity $a^2_{k}L=(1-a_k)\gamma_k+\mu a_k\equiv \gamma_{k+1}$. This gives
$$\begin{aligned} \beta_k&=\frac{a_{k+1}\gamma_{k+1}(1-a_k)}{a_k(\gamma_{k+1}+a_{k+1}\mu)}\\ &=\frac{a_{k+1}\gamma_{k+1}(1-a_k)}{a_k(\gamma_{k+1}+a^2_{k+1}L-(1-a_{k+1})\gamma_{k+1})}\\ &=\frac{\gamma_{k+1}(1-a_k)}{a_k(\gamma_{k+1}+a_{k+1}L)}\\ &=\frac{a_k(1-a_k)}{a^2_{k}+a_{k+1}} \end{aligned}$$
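As a sanity check on this simplification, the two expressions for $\beta_k$ can be compared numerically. The snippet below is a small Python sketch with illustrative constants (the values of `L`, `mu`, and `a_k` are assumptions chosen for the example).

```python
import math

L, mu = 10.0, 0.1
a_k = 0.5                   # any value in (0, 1)
gamma_next = L * a_k ** 2   # gamma_{k+1} = L a_k^2

# a_{k+1} in (0, 1) solves L a^2 = (1 - a) gamma_{k+1} + a mu.
b = gamma_next - mu
a_next = (-b + math.sqrt(b * b + 4.0 * L * gamma_next)) / (2.0 * L)

# beta_k before and after eliminating gamma_{k+1}.
beta_full = a_next * gamma_next * (1.0 - a_k) / (a_k * (gamma_next + a_next * mu))
beta_short = a_k * (1.0 - a_k) / (a_k ** 2 + a_next)
print(abs(beta_full - beta_short))
```

The two values agree only because `a_next` satisfies the defining quadratic; plugging in an arbitrary `a_next` would break the equality.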
Therefore, the algorithm above can be written in the following form:

  1. Choose $x_0\in \mathbb{R}^n$ and $a_0\in (0,1)$. Set $y_0=x_0$ and $q=\frac{\mu}{L}$.
  2. At the $k$-th iteration ($k\geq 0$):
    2.1 Compute $f(y_k)$ and $\nabla f(y_k)$, and set $x_{k+1}=y_k-\frac{1}{L}\nabla f(y_k)$.
    2.2 Compute $a_{k+1}\in (0,1)$ from the equation $a^2_{k+1}=(1-a_{k+1})a^2_{k}+qa_{k+1}$, and set $\beta_k=\frac{a_k(1-a_k)}{a^2_{k}+a_{k+1}}$, $y_{k+1}=x_{k+1}+\beta_k(x_{k+1}-x_k)$.
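The simplified scheme above might be implemented as in the following Python sketch. The name `constant_step_scheme` and the diagonal quadratic test problem are illustrative assumptions; only the update formulas come from the text.

```python
import math

def constant_step_scheme(grad, x0, L, mu, a0, iters):
    """Run `iters` iterations of the scheme that uses only x_k, y_k and a_k."""
    q = mu / L
    x, y, a = list(x0), list(x0), a0
    for _ in range(iters):
        # Step 2.1: gradient step of length 1/L from y_k.
        g = grad(y)
        x_next = [yi - gi / L for yi, gi in zip(y, g)]
        # Step 2.2: a_{k+1} in (0, 1) solves t^2 = (1 - t) a_k^2 + q t,
        # i.e. it is the positive root of t^2 + (a_k^2 - q) t - a_k^2 = 0.
        b = a * a - q
        a_next = (-b + math.sqrt(b * b + 4.0 * a * a)) / 2.0
        beta = a * (1.0 - a) / (a * a + a_next)
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x, a = x_next, a_next
    return x

# Illustrative test problem (an assumption): f(x) = 0.5 * sum(d_i * x_i^2),
# so mu = min(d), L = max(d), x* = 0 and f* = 0.
d = [0.1, 1.0, 10.0]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

x_k = constant_step_scheme(grad, [1.0, 1.0, 1.0], L=10.0, mu=0.1, a0=0.5, iters=100)
print(f(x_k))
```

Compared with the previous form, this version stores no $v_k$ or $\gamma_k$; the only state besides the iterates is the scalar $a_k$.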

Theorem: If $a_0\geq \sqrt{\frac{\mu}{L}}$ in the process above, then $f(x_k)-f^*\leq \min\left\{\left(1-\sqrt{\frac{\mu}{L}}\right)^k, \frac{4L}{(2\sqrt{L}+k\sqrt{\gamma_0})^2}\right\}\cdot\left[f(x_0)-f^*+\frac{\gamma_0}{2}\|x_0-x^*\|^2\right]$, where $\gamma_0=\frac{a_0(a_0L-\mu)}{1-a_0}$.

If we choose $a_0=\sqrt{\frac{\mu}{L}}$, which corresponds to $\gamma_0=\mu$, then $a_k=\sqrt{\frac{\mu}{L}}$ and $\beta_k=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$ for all $k$. The iteration then becomes $x_{k+1}=y_k-\frac{1}{L}\nabla f(y_k)$, $y_{k+1}=x_{k+1}+\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}(x_{k+1}-x_{k})$. This is the scheme proposed by Nesterov in 1983.
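With a constant momentum coefficient the iteration is only a few lines long. The following Python sketch illustrates it on a diagonal quadratic; the function name `nesterov_constant_momentum` and the test problem are assumptions made for the example.

```python
import math

def nesterov_constant_momentum(grad, x0, L, mu, iters):
    """Constant-momentum scheme: a_k = sqrt(mu/L) for all k."""
    beta = (math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))
    x, y = list(x0), list(x0)
    for _ in range(iters):
        g = grad(y)
        x_next = [yi - gi / L for yi, gi in zip(y, g)]          # gradient step from y_k
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]  # extrapolation
        x = x_next
    return x

# Illustrative test problem (an assumption): f(x) = 0.5 * sum(d_i * x_i^2),
# so mu = min(d), L = max(d), x* = 0 and f* = 0.
d = [0.1, 1.0, 10.0]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

x_k = nesterov_constant_momentum(grad, [1.0, 1.0, 1.0], L=10.0, mu=0.1, iters=100)
print(f(x_k))
```

This form requires $\mu>0$ to be known; with $\mu=0$ the coefficient becomes 1 and the linear rate $(1-\sqrt{\mu/L})^k$ degenerates.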
In addition, there is the heavy-ball scheme proposed by Polyak in 1964:
$x_{t+1}=x_t-a\nabla f(x_t)+\beta (x_t-x_{t-1})$, with $a=\frac{4}{(\sqrt{L}+\sqrt{\mu})^2}$ and $\beta=\left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2$. Written with the auxiliary sequence $y_t=x_t+\beta(x_t-x_{t-1})$, the update rule is $x_{t+1}=y_t-\frac{4}{(\sqrt{L}+\sqrt{\mu})^2}\nabla f(x_t)$, $y_{t+1}=x_{t+1}+\left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2(x_{t+1}-x_t)$. Note that the gradient is evaluated at $x_t$, not at $y_t$.
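A minimal Python sketch of the heavy-ball iteration follows, written in the single-sequence form $x_{t+1}=x_t-a\nabla f(x_t)+\beta(x_t-x_{t-1})$ with $a=\frac{4}{(\sqrt{L}+\sqrt{\mu})^2}$. The function name `heavy_ball`, the initialization $x_{-1}=x_0$, and the diagonal quadratic test problem are assumptions made for illustration.

```python
import math

def heavy_ball(grad, x0, L, mu, iters):
    """Heavy-ball iteration with the step size and momentum given in the text."""
    s = math.sqrt(L) + math.sqrt(mu)
    a = 4.0 / (s * s)
    beta = ((math.sqrt(L) - math.sqrt(mu)) / s) ** 2
    x_prev, x = list(x0), list(x0)  # assumed initialization: x_{-1} = x_0
    for _ in range(iters):
        g = grad(x)  # gradient at x_t, not at an extrapolated point
        x_next = [xi - a * gi + beta * (xi - xp)
                  for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

# Illustrative test problem (an assumption): f(x) = 0.5 * sum(d_i * x_i^2),
# so mu = min(d), L = max(d), x* = 0 and f* = 0.
d = [0.1, 1.0, 10.0]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

x_t = heavy_ball(grad, [1.0, 1.0, 1.0], L=10.0, mu=0.1, iters=200)
print(f(x_t))
```

The only structural difference from the Nesterov iteration is where the gradient is evaluated; the momentum term has the same shape.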
Finally, there is the FISTA scheme proposed by Beck and Teboulle in 2009:
$x_{t+1}=y_t-\frac{1}{L}\nabla f(y_t)$, $y_{t+1}=x_{t+1}+\frac{\lambda_t-1}{\lambda_{t+1}}(x_{t+1}-x_t)$, where $\lambda_0=0$ and $\lambda_{t+1}=\frac{1+\sqrt{1+4\lambda_t^2}}{2}$ for all $t \geq 0$. Since $\lambda_1=1$, the iteration is started at $t=1$ with $y_1=x_1$, so the first step carries no momentum.
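For a smooth objective (no proximal term), the FISTA update reduces to a gradient step plus a momentum step with a time-varying coefficient. The Python sketch below assumes the standard convention in which the momentum sequence starts at $\lambda_1=1$ and the coefficient is $\frac{\lambda_t-1}{\lambda_{t+1}}$; the function name `fista_smooth` and the diagonal quadratic test problem are illustrative assumptions.

```python
import math

def fista_smooth(grad, x0, L, iters):
    """FISTA-style momentum for a smooth objective (no proximal step)."""
    x_prev, y = list(x0), list(x0)  # start with y equal to the initial point
    lam = 1.0                       # lambda_1 = 1, obtained from lambda_0 = 0
    for _ in range(iters):
        g = grad(y)
        x = [yi - gi / L for yi, gi in zip(y, g)]  # gradient step from y_t
        lam_next = (1.0 + math.sqrt(1.0 + 4.0 * lam * lam)) / 2.0
        coef = (lam - 1.0) / lam_next              # zero on the first iteration
        y = [xi + coef * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, lam = x, lam_next
    return x_prev

# Illustrative test problem (an assumption): f(x) = 0.5 * sum(d_i * x_i^2),
# so L = max(d), x* = 0 and f* = 0.
d = [0.1, 1.0, 10.0]
f = lambda x: 0.5 * sum(di * xi * xi for di, xi in zip(d, x))
grad = lambda x: [di * xi for di, xi in zip(d, x)]

x_t = fista_smooth(grad, [1.0, 1.0, 1.0], L=10.0, iters=100)
print(f(x_t))
```

Unlike the constant-momentum variant, this scheme does not need $\mu$; the coefficient $\frac{\lambda_t-1}{\lambda_{t+1}}$ grows toward 1 as $t$ increases.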
