Differences between the L1-norm and the L2-norm
As an error function
Used as the objective that is minimized when training the model parameters.
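$$\text{L1-norm (sum of absolute errors):}\quad S = \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert$$
$$\text{L2-norm (sum of squared errors, i.e. the squared L2 norm of the residuals):}\quad S = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$$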
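As a quick illustration, the two losses can be computed directly from their definitions. The sketch below uses made-up targets and predictions (hypothetical data, with one deliberately large residual) to show how squaring lets a single outlier dominate the L2 loss.

```python
def l1_loss(y, y_pred):
    """Sum of absolute residuals: S = sum_i |y_i - f(x_i)|."""
    return sum(abs(yi - fi) for yi, fi in zip(y, y_pred))


def l2_loss(y, y_pred):
    """Sum of squared residuals: S = sum_i (y_i - f(x_i))^2."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_pred))


# Hypothetical targets and model predictions; the last point is an outlier.
y = [1.0, 2.0, 3.0, 4.0, 20.0]
y_pred = [1.5, 2.0, 2.5, 4.0, 5.0]

print(l1_loss(y, y_pred))  # 0.5 + 0 + 0.5 + 0 + 15 = 16.0
print(l2_loss(y, y_pred))  # 0.25 + 0 + 0.25 + 0 + 225 = 225.5
```

The outlier contributes 15 of the 16 units of L1 loss but 225 of the 225.5 units of L2 loss, so a least-squares fit is pulled much harder toward it.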
As a regularization term
Added to the training objective to prevent overfitting.
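$$\text{L1 regularization:}\quad w^{*} = \arg\min_{w} \sum_{j} \Bigl(t(x_j) - \sum_{i} w_i h_i(x_j)\Bigr)^2 + \lambda \sum_{i=1}^{k} \lvert w_i \rvert$$
$$\text{L2 regularization:}\quad w^{*} = \arg\min_{w} \sum_{j} \Bigl(t(x_j) - \sum_{i} w_i h_i(x_j)\Bigr)^2 + \lambda \sum_{i=1}^{k} w_i^2$$
Here $t(x_j)$ is the target for sample $x_j$, $h_i(x_j)$ is the $i$-th of $k$ features, $w_i$ is its weight, and $\lambda \ge 0$ sets the strength of the penalty.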
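The two objectives differ only in the penalty term. A minimal sketch that evaluates either objective for a given weight vector, assuming the features are supplied as a list of rows `X` with `X[j][i]` = h_i(x_j) (the function name and the toy data are hypothetical):

```python
def regularized_objective(w, X, t, lam, penalty):
    """Squared-error data term plus lam times an L1 or L2 penalty on w.

    X[j][i] holds the feature value h_i(x_j); t[j] holds the target t(x_j).
    """
    data_term = sum(
        (tj - sum(wi * hi for wi, hi in zip(w, row))) ** 2
        for row, tj in zip(X, t)
    )
    if penalty == "l1":
        reg = sum(abs(wi) for wi in w)  # sum_i |w_i|
    elif penalty == "l2":
        reg = sum(wi ** 2 for wi in w)  # sum_i w_i^2
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return data_term + lam * reg


# Hypothetical toy problem: 3 samples, 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [1.0, 2.0, 3.0]
w = [1.0, -2.0]

print(regularized_objective(w, X, t, 0.5, "l1"))  # 32 + 0.5 * 3 = 33.5
print(regularized_objective(w, X, t, 0.5, "l2"))  # 32 + 0.5 * 5 = 34.5
```

Note that for $\lvert w_i \rvert < 1$ the L1 penalty is larger than the L2 penalty, which is one intuition for why L1 keeps pushing small weights all the way to zero.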
L2 loss function | L1 loss function |
---|---|
Not very robust to outliers | Robust to outliers |
Stable solution | Unstable solution |
Always a unique solution | Possibly multiple solutions |
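The robustness and uniqueness rows can be seen in the simplest possible model, a constant prediction $f(x) = c$. It is a standard result that the L2 loss is then minimized by the mean and the L1 loss by a median. The sketch below (hypothetical data) shows that one outlier drags the mean far away but leaves the median in place, and that with an even number of points every $c$ between the two middle values gives the same L1 loss, while the L2 loss has a single minimum.

```python
import statistics


def l1_cost(c, ys):
    """L1 loss of the constant prediction c."""
    return sum(abs(y - c) for y in ys)


def l2_cost(c, ys):
    """L2 (squared-error) loss of the constant prediction c."""
    return sum((y - c) ** 2 for y in ys)


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 100.0]  # one outlier

# L2 optimum (mean) moves a lot; L1 optimum (median) does not.
print(statistics.mean(clean), statistics.mean(dirty))      # 3.0 22.0
print(statistics.median(clean), statistics.median(dirty))  # 3.0 3.0

# Even number of points: L1 is flat between the middle values, L2 is not.
even = [1.0, 2.0, 3.0, 4.0]
candidates = (2.0, 2.5, 3.0)
print([l1_cost(c, even) for c in candidates])  # [4.0, 4.0, 4.0]
print([l2_cost(c, even) for c in candidates])  # [6.0, 5.0, 6.0]
```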
L2 regularization | L1 regularization |
---|---|
Computationally efficient (has a closed-form analytical solution) | Computationally inefficient in non-sparse cases |
Non-sparse outputs | Sparse outputs |
No feature selection | Built-in feature selection |
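The sparsity difference is easiest to see in a special case, assuming the feature matrix $H$ (with $H_{ji} = h_i(x_j)$) has orthonormal columns, $H^\top H = I$. Then both objectives above decouple per weight: with $z = H^\top t$ (the unregularized least-squares weights), L2 gives $w_i = z_i / (1 + \lambda)$ and L1 gives the soft-threshold $w_i = \operatorname{sign}(z_i)\max(\lvert z_i \rvert - \lambda/2, 0)$. This is only a sketch of that special case; for general features the L2 solution is $(H^\top H + \lambda I)^{-1} H^\top t$, while the L1 problem has no closed form and is usually solved iteratively (for example by coordinate descent). The function names and the values of `z` are hypothetical.

```python
def ridge_orthonormal(z, lam):
    """L2-regularized weights when H^T H = I: every z_i is shrunk by 1 / (1 + lam)."""
    return [zi / (1.0 + lam) for zi in z]


def lasso_orthonormal(z, lam):
    """L1-regularized weights when H^T H = I: soft-thresholding at lam / 2."""
    half = lam / 2.0
    out = []
    for zi in z:
        if zi > half:
            out.append(zi - half)
        elif zi < -half:
            out.append(zi + half)
        else:
            out.append(0.0)  # small weights are set exactly to zero
    return out


# Hypothetical least-squares weights z = H^T t for four features.
z = [3.0, -0.25, 0.125, -2.0]

print(ridge_orthonormal(z, 1.0))  # [1.5, -0.125, 0.0625, -1.0]  (shrunk, none zero)
print(lasso_orthonormal(z, 1.0))  # [2.5, 0.0, 0.0, -1.5]        (two features dropped)
```

L2 shrinks every weight but leaves all of them nonzero, whereas L1 removes the two small ones entirely, which is the built-in feature selection listed in the table.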