---- Reposted from Paul T Mielke
L2 regularization: $\frac{\lambda}{2m}\Vert W\Vert^2$
L1 regularization: $\frac{\lambda}{2m}\Vert W\Vert_1$. It is said to shrink the model (by driving many weights to zero, making $W$ sparse), though in practice the effect does not seem significant.
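As a quick illustration of the two penalty terms above, here is a minimal NumPy sketch (the names `l2_penalty`, `l1_penalty`, `params`, `lambd`, and `m` are my own, not from the original post; the $\frac{\lambda}{2m}$ scaling for the L1 term follows the same convention as the L2 formula):

```python
import numpy as np

def l2_penalty(params, lambd, m):
    """L2 penalty: (lambda / (2m)) * sum of squared weights over all layers."""
    return (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in params.values())

def l1_penalty(params, lambd, m):
    """L1 penalty: (lambda / (2m)) * sum of |weights|; tends to push weights to zero."""
    return (lambd / (2 * m)) * sum(np.sum(np.abs(W)) for W in params.values())

# Toy example: two weight matrices, lambda = 0.7, m = 100 training examples.
rng = np.random.default_rng(0)
params = {"W1": rng.standard_normal((4, 3)), "W2": rng.standard_normal((1, 4))}
print(l2_penalty(params, lambd=0.7, m=100))
print(l1_penalty(params, lambd=0.7, m=100))
```

Either penalty would simply be added to the unregularized cost before backpropagation.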
It’s a good question, given that the number of features is completely independent of the number of training samples. One way to reason about it would be to say that the purpose of regularization is to reduce overfitting. But one of the other strategies for combatting overfitting is to get more training data, right? So multiplying the L2 regularization term by $\frac{1}{m}$ effectively just reduces the $\lambda$ value the more training data you have. In the limit it would go to zero, since the larger your training set, the smaller the chance that you have an overfitting problem.
I grant you that is a qualitative argument at best ;^) …
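To make that scaling argument concrete, here is a small numeric sketch (my own hypothetical numbers, not from the original post): with $\lambda$ and $W$ held fixed, the penalty added to the cost shrinks as $m$ grows.

```python
import numpy as np

# Fixed lambda and weights; only the training-set size m changes.
lambd = 0.7
W = np.ones((4, 3))  # ||W||^2 = 12, held constant

for m in [10, 100, 1000, 10_000]:
    penalty = (lambd / (2 * m)) * np.sum(W ** 2)
    print(f"m = {m:>6}: L2 penalty term = {penalty:.5f}")

# The penalty (and the (lambda / m) * W contribution to dW) goes to zero
# as m grows: more data means weaker effective regularization for the same lambda.
```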