02 L1 Regularization

This article covers the basic form of L1 regularization: its definition, and how it constrains model complexity by adding the sum of the absolute values of the parameters to the objective. It then presents the regularized objective function and its gradient, and shows that the objective can be approximated by a quadratic in a neighborhood of the optimum. Finally, it analyzes the regularized objective and derives the range of parameter values that are driven exactly to zero, explaining how L1 regularization produces sparse solutions.

Basic Form

The general form of $L^1$ regularization of the model parameters $w$ is:

$$\Omega(\theta) = \|w\|_1 = \sum_i |w_i|$$

i.e., the sum of the absolute values of the parameters; here $\theta$ is simply $w$. The parameters can also be regularized toward some other, nonzero value $w^{(o)}$. In that case the $L^1$ penalty becomes

$$\Omega(\theta) = \|w - w^{(o)}\|_1 = \sum_i |w_i - w_i^{(o)}|$$
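As a quick illustration, the penalty above can be computed in a few lines (a minimal sketch; the function name `l1_penalty` is my own, not from the article):

```python
import numpy as np

def l1_penalty(w, w0=None):
    """L1 penalty ||w - w0||_1; w0 defaults to the zero vector."""
    w = np.asarray(w, dtype=float)
    if w0 is None:
        w0 = np.zeros_like(w)
    return np.sum(np.abs(w - np.asarray(w0, dtype=float)))

print(l1_penalty([1.0, -2.0, 0.5]))            # 3.5
print(l1_penalty([1.0, -2.0], w0=[1.0, 0.0]))  # 2.0 (regularize toward w0)
```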

Regularized Objective Function

The regularized objective takes the form:

$$\tilde{J}(w; X, y) = \alpha \|w\|_1 + J(w; X, y)$$

with gradient:

$$\nabla_w \tilde{J}(w; X, y) = \alpha\, \mathrm{sign}(w) + \nabla_w J(w; X, y)$$

where $\mathrm{sign}(w)$ simply takes the sign of each element of $w$.
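The gradient formula can be sketched directly (the function names here are my own; note that NumPy's `sign` returns 0 at 0, one valid choice of subgradient for the non-differentiable point):

```python
import numpy as np

def l1_regularized_grad(grad_J, w, alpha):
    """Gradient of J~(w) = alpha*||w||_1 + J(w): alpha*sign(w) + grad_J(w)."""
    w = np.asarray(w, dtype=float)
    return alpha * np.sign(w) + grad_J(w)

# hypothetical unregularized objective J(w) = 0.5*||w||^2, so grad_J(w) = w
g = l1_regularized_grad(lambda w: w, np.array([2.0, -1.0, 0.0]), alpha=0.5)
print(g)  # [ 2.5 -1.5  0. ]
```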

Quadratic Approximation

Let $w^*$ be the weight vector at which the unregularized objective attains its minimum training error, i.e. $w^* = \arg\min_w J(w)$, and expand the objective to second order in a neighborhood of $w^*$. If the objective really is quadratic, this approximation is exact. The expansion of $J(w)$ is:

$$J(w) \approx J(w^*) + (w - w^*)^{T}\, \nabla_w J(w^*) + \frac{1}{2}(w - w^*)^{T} H\, (w - w^*)$$

where $\nabla_w J(w^*)$ is the gradient at the optimum and $H$ is the Hessian matrix of second derivatives of $J$ with respect to $w$, evaluated at $w^*$:

$$H = \begin{bmatrix}
\frac{\partial^2 J}{\partial w_1^2} & \frac{\partial^2 J}{\partial w_1\, \partial w_2} & \cdots & \frac{\partial^2 J}{\partial w_1\, \partial w_n} \\
\frac{\partial^2 J}{\partial w_2\, \partial w_1} & \frac{\partial^2 J}{\partial w_2^2} & \cdots & \frac{\partial^2 J}{\partial w_2\, \partial w_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 J}{\partial w_n\, \partial w_1} & \frac{\partial^2 J}{\partial w_n\, \partial w_2} & \cdots & \frac{\partial^2 J}{\partial w_n^2}
\end{bmatrix}$$

Because $w^*$ is a minimum, the gradient there is zero, so the first-order term vanishes and the approximation simplifies to:

$$\hat{J}(w) = J(w^*) + \frac{1}{2}(w - w^*)^{T} H\, (w - w^*)$$
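For a genuinely quadratic objective, the second-order expansion is exact, which is easy to verify numerically (a sketch; the diagonal Hessian, minimizer, and constant below are assumed values of my own choosing):

```python
import numpy as np

H = np.diag([2.0, 0.5])          # assumed positive-definite (diagonal) Hessian
w_star = np.array([1.0, -3.0])   # unregularized minimizer w*
c = 4.0                          # objective value J(w*)

def J(w):
    """A quadratic objective with minimum c at w_star."""
    d = w - w_star
    return c + 0.5 * d @ H @ d

def J_quad(w):
    """Second-order expansion around w_star; the first-order term vanishes."""
    d = w - w_star
    return J(w_star) + 0.5 * d @ H @ d

w = np.array([0.3, 2.0])
print(J(w), J_quad(w))  # identical: the expansion is exact for a quadratic
```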

Regularized Objective After Approximation

$$\hat{J}(w) = J(w) + \alpha\|w\|_1 \approx J(w^*) + \frac{1}{2}(w - w^*)^{T} H\, (w - w^*) + \alpha\|w\|_1$$

Analysis of $w^*$

j ^ ( w ) \hat{j}(w) j^(w)求导,并致其为零(这里假设Hessian矩阵是对角矩阵):
∇ w J ( w ; X , y ) = 0 + 2 ⋅ 1 2 H ( w − w ∗ ) ( w − w ∗ ) ′ + α ⋅ s i g n ( w ) = H ( w − w ∗ ) + α ⋅ s i g n ( w ) = 0 \nabla _wJ\left( w;X,y \right) =0+2\cdot \frac{1}{2}H\left( w-w^* \right) \left( w-w^* \right) ^{'}+\alpha \cdot sign\left( w \right) \\ = H\left( w-w^* \right) +\alpha \cdot sign\left( w \right) =0 wJ(w;X,y)=0+221H(ww)(ww)+αsign(w)=H(ww)+αsign(w)=0
针对每个 i i i,则可表示为:
H i i ( w i − w i ∗ ) + α ⋅ s i g n ( w i ) = 0 H_{ii}(w_i-w_{i}^{*})+\alpha \cdot sign(w_i)=0 Hii(wiwi)+αsign(wi)=0
Consider first $w_i = 0$. The objective value is then $\hat{J}(w) = J(w^*) + \frac{1}{2} H_{ii} (w_i^*)^2$, and since $w^*$ is a known quantity this value is fixed. Whether $w_i = 0$ is actually the minimum depends on the one-sided derivatives of the objective there:

  1. As $w_i \to 0^-$, $\mathrm{sign}(w_i) = -1$, so the derivative is $H_{ii}(w_i - w_i^*) + \alpha\, \mathrm{sign}(w_i) = -H_{ii} w_i^* - \alpha \le 0$ (the objective must be decreasing from the left), which gives $w_i^* \ge -\frac{\alpha}{H_{ii}}$.
  2. As $w_i \to 0^+$, $\mathrm{sign}(w_i) = +1$, so the derivative is $H_{ii}(w_i - w_i^*) + \alpha\, \mathrm{sign}(w_i) = -H_{ii} w_i^* + \alpha \ge 0$ (the objective must be increasing to the right), which gives $w_i^* \le \frac{\alpha}{H_{ii}}$.

Combining the two: $w_i = 0$ is optimal exactly when $-\frac{\alpha}{H_{ii}} \le w_i^* \le \frac{\alpha}{H_{ii}}$.

Now consider $w_i > 0$. The stationarity condition gives $w_i = w_i^* - \frac{\alpha}{H_{ii}}$, i.e. $w_i^* = w_i + \frac{\alpha}{H_{ii}} > \frac{\alpha}{H_{ii}}$. So when $w_i^* > \frac{\alpha}{H_{ii}}$, we have $w_i = w_i^* - \mathrm{sign}(w_i)\frac{\alpha}{H_{ii}} = \mathrm{sign}(w_i^*)\left(|w_i^*| - \frac{\alpha}{H_{ii}}\right)$.

Similarly, $w_i < 0$ requires $w_i^* < -\frac{\alpha}{H_{ii}}$, and then again $w_i = w_i^* - \mathrm{sign}(w_i)\frac{\alpha}{H_{ii}} = \mathrm{sign}(w_i^*)\left(|w_i^*| - \frac{\alpha}{H_{ii}}\right)$.

In summary:

  a. When $|w_i^*| \le \frac{\alpha}{H_{ii}}$: $w_i = 0$;
  b. When $w_i^* > \frac{\alpha}{H_{ii}}$: $w_i = \mathrm{sign}(w_i^*)\left(|w_i^*| - \frac{\alpha}{H_{ii}}\right)$;
  c. When $w_i^* < -\frac{\alpha}{H_{ii}}$: $w_i = \mathrm{sign}(w_i^*)\left(|w_i^*| - \frac{\alpha}{H_{ii}}\right)$.

All three cases are captured by the single formula:

$$w_i = \mathrm{sign}(w_i^*)\, \max\!\left(|w_i^*| - \frac{\alpha}{H_{ii}},\ 0\right)$$
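This closed-form solution is the familiar soft-thresholding operator: components of $w^*$ smaller in magnitude than $\alpha/H_{ii}$ are zeroed out, producing sparsity. A minimal sketch (the function name is my own; it assumes a diagonal Hessian with positive entries, as in the derivation above):

```python
import numpy as np

def l1_solution(w_star, alpha, H_diag):
    """w_i = sign(w_i*) * max(|w_i*| - alpha/H_ii, 0)  (soft thresholding)."""
    w_star = np.asarray(w_star, dtype=float)
    H_diag = np.asarray(H_diag, dtype=float)
    return np.sign(w_star) * np.maximum(np.abs(w_star) - alpha / H_diag, 0.0)

# components with |w_i*| <= alpha/H_ii are driven exactly to zero
w = l1_solution([3.0, -0.2, -1.5], alpha=1.0, H_diag=[1.0, 1.0, 2.0])
print(w)  # [ 2.   0.  -1. ]
```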