LASSO via Proximal Gradient Descent: Derivation and Code

Preparation:

from itertools import cycle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets
from copy import deepcopy

# Simulated data: 100 samples, 10 features, noise-free response
X = np.random.randn(100, 10)
y = np.dot(X, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Proximal Gradient Descent Framework

  1. Randomly initialize $\beta^{(0)}$ for iteration 0.
  2. For the $k$th iteration:
    ----Compute the gradient $\nabla f(\beta^{(k-1)})$
    ----Set $z = \beta^{(k-1)} - \frac{1}{L} \nabla f(\beta^{(k-1)})$
    ----Update $\beta^{(k)} = \mathrm{sgn}(z) \cdot \max\left[|z| - \frac{\lambda}{L},\; 0\right]$
    ----Check convergence: if converged, stop; otherwise continue updating
    Endfor

Here $f(\beta) = \frac{1}{2N}(Y - X\beta)^T (Y - X\beta)$ and $\nabla f(\beta) = -\frac{1}{N}X^T(Y - X\beta)$,
where $X$, $Y$, $\beta$ have sizes $N \times p$, $N \times 1$, $p \times 1$, i.e. $N$ samples and $p$ features. The parameter $L$ should be chosen at least as large as the Lipschitz constant of $\nabla f$ (see the derivation below), and $\frac{1}{L}$ can be regarded as the step size.
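The framework above can be sketched in code as follows (this is the iterative soft-thresholding form of proximal gradient descent; the function names, the value of lam, and the tolerance are illustrative choices, not from the original post):

```python
# A minimal sketch of the framework above applied to LASSO.
import numpy as np

def soft_threshold(z, thresh):
    # Elementwise soft-thresholding: sgn(z) * max(|z| - thresh, 0)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def lasso_pgd(X, y, lam, L=None, max_iter=5000, tol=1e-10):
    N, p = X.shape
    if L is None:
        # Largest eigenvalue of X^T X / N bounds the Lipschitz constant of grad f
        L = np.linalg.eigvalsh(X.T @ X / N)[-1]
    beta = np.zeros(p)
    for _ in range(max_iter):
        grad = -X.T @ (y - X @ beta) / N           # gradient of f at beta
        z = beta - grad / L                        # gradient step
        beta_new = soft_threshold(z, lam / L)      # proximal (shrinkage) step
        if np.max(np.abs(beta_new - beta)) < tol:  # convergence check
            return beta_new
        beta = beta_new
    return beta

np.random.seed(0)
X = np.random.randn(100, 10)
y = X @ np.arange(1.0, 11.0)
beta_hat = lasso_pgd(X, y, lam=0.01)
```

With noise-free data and a small lam, the estimate should land close to the true coefficients 1 through 10, with a slight shrinkage toward zero from the soft-thresholding step.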

Proximal Gradient Descent: Detailed Derivation

Consider the optimization problem
$$\min_x\; f(x) + \lambda \cdot g(x),$$
where $x \in \mathbb{R}^{p \times 1}$ and $f(x) \in \mathbb{R}$. Here $f(x)$ is a differentiable convex function, and $g(x)$ is convex but not necessarily differentiable.

Assume $\nabla f$ is Lipschitz continuous, i.e. there exists a constant $L$ such that for all $x, y$,
$$\|\nabla f(y) - \nabla f(x)\| \le L \|y - x\|.$$
Problems of this form can then be solved by proximal gradient descent.
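For the least-squares $f(\beta) = \frac{1}{2N}(Y - X\beta)^T(Y - X\beta)$ used in this post, the smallest valid Lipschitz constant of $\nabla f$ is the largest eigenvalue of $X^T X / N$. A quick numerical check (the data here is illustrative):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 10)
N = X.shape[0]

# Largest eigenvalue of X^T X / N: the tightest Lipschitz constant of grad f
L = np.linalg.eigvalsh(X.T @ X / N)[-1]
# Equivalently, the squared top singular value of X divided by N
L_svd = np.linalg.svd(X, compute_uv=False)[0] ** 2 / N
assert np.isclose(L, L_svd)
```

Any $L$ at least this large keeps the step size $\frac{1}{L}$ small enough for the algorithm to converge.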

Denote $x^{(k)}$
