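Question (164): Why does L1 regularization make model parameters sparse?
Angles for answering:
- Geometric view: the shape of the solution space
- Calculus view: differentiating the objective function with the L1 penalty
- Bayesian prior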
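Shape of the solution space
Step 1. Equivalence between the regularization term and a constraint
Step 2. Geometric shapes of the L1 norm and the L2 norm
Step 3. If the optimum of the original (unconstrained) objective does not lie inside the solution space, then the optimum under the constraint must lie on the boundary of the solution space.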
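The three steps above can be checked numerically in two dimensions. The following is a minimal sketch, not a general solver: the quadratic loss, its unconstrained minimum at (2, 0.5), and the unit radius of both balls are all illustrative assumptions. Because that minimum lies outside both balls, Step 3 says the constrained optimum sits on the boundary, so the code only searches the boundary.

```python
import numpy as np

def loss(theta):
    """Illustrative quadratic loss (an assumption); unconstrained minimum at (2, 0.5)."""
    target = np.array([2.0, 0.5])
    return np.sum((theta - target) ** 2, axis=-1)

# Unit directions on a fine angular grid (angle 0 is included exactly).
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Boundary of the unit L2 ball: the unit circle itself.
l2_boundary = directions
# Boundary of the unit L1 ball: rescale each direction so that |x| + |y| = 1.
l1_boundary = directions / np.abs(directions).sum(axis=1, keepdims=True)

# Grid search for the best boundary point under each constraint.
theta_l1 = l1_boundary[np.argmin(loss(l1_boundary))]
theta_l2 = l2_boundary[np.argmin(loss(l2_boundary))]

print("L1-constrained optimum:", theta_l1)
print("L2-constrained optimum:", theta_l2)
```

With these assumed values, the L1-constrained optimum lands on the vertex (1, 0) of the diamond, so its second coordinate is exactly zero. The L2-constrained optimum lies on the circle with both coordinates nonzero.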
[Review: KKT conditions, complementary slackness]
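As a hedged sketch of how the KKT conditions connect Step 1 and Step 3, assume $L$ is convex and differentiable. The L1 budget $t > 0$ and the multiplier $\lambda$ are symbols introduced here for illustration; they do not appear in the original notes.

```latex
\begin{align*}
&\text{Constrained form:}
  && \min_{\bm\theta} \; L(\bm\theta)
     \quad \text{s.t.} \quad \|\bm\theta\|_1 \le t \\
&\text{Lagrangian:}
  && \mathcal{L}(\bm\theta, \lambda)
     = L(\bm\theta) + \lambda \left( \|\bm\theta\|_1 - t \right),
     \quad \lambda \ge 0 \\
&\text{Stationarity:}
  && 0 \in \nabla L(\bm\theta^*) + \lambda^* \, \partial \|\bm\theta^*\|_1 \\
&\text{Feasibility:}
  && \|\bm\theta^*\|_1 \le t, \qquad \lambda^* \ge 0 \\
&\text{Complementary slackness:}
  && \lambda^* \left( \|\bm\theta^*\|_1 - t \right) = 0
\end{align*}
```

If the unconstrained minimizers of $L$ all lie outside the feasible region, then $\nabla L(\bm\theta^*) \neq 0$, so stationarity forces $\lambda^* > 0$. Complementary slackness then gives $\|\bm\theta^*\|_1 = t$, meaning the optimum is on the boundary (Step 3). Setting $c = \lambda^*$, the stationarity condition is the same as the optimality condition of the penalized objective $L(\bm\theta) + c \|\bm\theta\|_1$ (Step 1).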
Calculus and superposition of functions
After L1 regularization is added to the loss function, the objective becomes $J(\bm \theta) = L(\bm \theta) + c \|\bm \theta\|_1$. When $\bm \theta > 0$, the gradient of $c \|\bm \theta\|_1$ equals $c$; when $\bm \theta < 0$, it equals $-c$. Therefore, if the gradient of $L(\bm \theta)$ lies within $(-c, c)$, the gradient of $J(\bm \theta)$ is always negative for $\bm \theta < 0$, indicating that $J(\bm \theta)$ is monotonically decreasing to the left of the origin; its gradient is always positive for $\bm \theta > 0$