关于近端梯度下降的理解

近端梯度下降

目标函数

min ⁡ x Q ( x ) = f ( x ) + g ( x ) (1) {\min}_{x}Q(x) = f(x)+g(x) \tag {1} minxQ(x)=f(x)+g(x)(1)
f ( x ) f(x) f(x)为可微凸函数, g ( x ) g(x) g(x)非光滑, x ∈ H x \in {H} xH

考虑 f ( x ) f(x) f(x)

梯度下降
x k + 1 = x k − α ∇ f ( x ) (2) x_{k+1} = x_{k} - \alpha \nabla{f(x)} \tag {2} xk+1=xkαf(x)(2)

f ( x ) f(x) f(x) x k x_{k} xk处泰勒展开
f ( x k + 1 ) ≈ f ( x k ) + ∇ f ( x k ) ( x k + 1 − x k ) (3) f(x_{k+1}) \approx f(x_{k}) + \nabla{f(x_{k})} (x_{k+1}-x_{k}) \tag {3} f(xk+1)f(xk)+f(xk)(xk+1xk)(3)

式(2) + (3)
f ( x k + 1 ) = f ( x k ) − α ( ∇ f ( x k ) ) 2 f ( x k + 1 ) < f ( x k ) (4) \begin{aligned} f(x_{k+1}) = f(x_{k}) - \alpha (\nabla{f(x_{k})})^{2} \\ f(x_{k+1}) < f(x_{k}) \tag {4} \end{aligned} f(xk+1)=f(xk)α(f(xk))2f(xk+1)<f(xk)(4)

f ( x ) f(x) f(x)二阶泰勒展开
f ^ ( x ; x k ) ≈ f ( x k ) + ∇ f ( x k ) ( x − x k ) + ∇ 2 f ( x k ) 2 ( x − x k ) 2 (5) \widehat{f}(x;x_{k}) \approx f(x_{k}) + \nabla{f(x_{k})} (x-x_{k}) + \frac{\nabla^2 {f(x_{k})}}{2} (x-x_{k})^2 \tag {5} f (x;xk)f(xk)+f(xk)(xxk)+22f(xk)(xxk)2(5)

∇ f ( x k ) \nabla f(x_{k}) f(xk)满足L-Lipschitz条件
∣ ∇ f ( x ) − ∇ f ( x k ) ∣ ≤ L ∣ x − x k ∣ ∇ 2 f ( x k ) = ∣ ∇ f ( x ) − ∇ f ( x k ) ∣ ∣ x − x k ∣ ≤ L (6) \begin{aligned} \left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert \leq L\left\vert x - x_{k} \right\vert \\ \nabla^2 {f(x_{k})} = \frac{\left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert}{\left\vert x - x_{k} \right\vert} \leq L \tag {6} \end{aligned} f(x)f(xk)Lxxk2f(xk)=xxkf(x)f(xk)L(6)

式(5)+(6)
f ^ ( x ; x k ) ≈ f ( x k ) + ∇ f ( x k ) ( x − x k ) + L 2 ( x − x k ) 2 (7) \widehat{f}(x;x_{k}) \approx f(x_{k})+\nabla{f(x_{k})} (x-x_{k})+ \frac{L}{2} (x-x_{k})^2 \tag {7} f (x;xk)f(xk)+f(xk)(xxk)+2L(xxk)2(7)

式(7)配方
f ^ ( x ; x k ) ≈ L 2 ( ( x − x k ) 2 + 2 L ∇ f ( x k ) ( x − x k ) + ( 1 L ∇ f ( x k ) ) 2 ) − 1 2 L ( ∇ f ( x k ) ) 2 + f ( x k ) f ^ ( x ; x k ) ≈ L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 − 1 2 L ( ∇ f ( x k ) ) 2 + f ( x k ) (8) \begin{aligned} \widehat{f}(x;x_{k}) \approx \frac{L}{2} ((x-x_{k})^2 + \frac{2}{L} \nabla{f(x_{k})} (x-x_{k}) + (\frac{1}{L} \nabla{f(x_{k})})^2) - \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \\ \widehat{f}(x;x_{k}) \approx \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2- \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \tag {8} \end{aligned} f (x;xk)2L((xxk)2+L2f(xk)(xxk)+(L1f(xk))2)2L1(f(xk))2+f(xk)f (x;xk)2L(xxk+L1f(xk))22L1(f(xk))2+f(xk)(8)

x k + 1 = arg ⁡ min ⁡ x f ^ ( x ; x k ) x_{k+1} = \arg \min_{x} \widehat{f}(x;x_{k}) xk+1=argminxf (x;xk)
x k + 1 = x k + 1 L ∇ f ( x k ) (9) x_{k+1} = x_{k} +\frac{1}{L} \nabla{f(x_{k})} \tag {9} xk+1=xk+L1f(xk)(9)
考虑 g ( x ) g(x) g(x)非光滑
式(1)+(8)
Q ( x ) = L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 + h ( x k ) + g ( x ) h ( x k ) = f ( x k ) − 1 2 L ( ∇ f ( x k ) ) 2 x k + 1 = arg ⁡ min ⁡ x L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 + g ( x ) (10) \begin{aligned} Q(x)=\frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2 + h(x_{k}) + g(x) \\ h(x_{k}) = f(x_{k})-\frac{1}{2L} (\nabla{f(x_{k})} )^2 \\ x_{k+1} = \arg \min_{x} \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2+g(x) \tag {10} \end{aligned} Q(x)=2L(xxk+L1f(xk))2+h(xk)+g(x)h(xk)=f(xk)2L1(f(xk))2xk+1=argxmin2L(xxk+L1f(xk))2+g(x)(10)

近端投影
p r o x g ( x ) ( z ) = arg ⁡ min ⁡ x 1 2 ( x − z ) 2 + g ( x ) (11) prox_{g(x)}(z) = \arg \min_{x} \frac{1}{2} (x-z) ^2+g(x) \tag {11} proxg(x)(z)=argxmin21(xz)2+g(x)(11)

注:上述内容仅为个人学习过程中的笔记,如有不当的地方还望指正

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值