岭回归|Ridge Regression
Main idea: accept a slightly worse fit on the training data in exchange for better long-term predictions. ➡️ A little more bias, a significant drop in variance
Definition: add $\lambda \times \text{slope}^2$ to the objective function (Least Squares + Ridge Regression Penalty). $\lambda$ determines how severe the penalty is.
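The penalized objective can be written out directly. A minimal sketch in plain Python, using made-up toy data (the function name and numbers are illustrative, not from any library):

```python
# Ridge objective for a one-feature line y ≈ intercept + slope * x:
# sum of squared residuals + lambda * slope**2 (the intercept is not penalized).
def ridge_objective(intercept, slope, xs, ys, lam):
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss + lam * slope ** 2

# Made-up data: a larger lambda makes steep slopes more expensive.
xs, ys = [1, 2, 3], [1.1, 1.9, 3.2]
print(ridge_objective(0.0, 1.0, xs, ys, lam=0.0))  # plain least squares
print(ridge_objective(0.0, 1.0, xs, ys, lam=1.0))  # same fit + 1 * 1**2 penalty
```

With $\lambda = 0$ this is ordinary least squares; the two printed values differ by exactly the penalty term.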
Choose $\lambda$ with cross-validation, picking the value that results in the lowest variance.
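One way to sketch this selection: leave-one-out cross-validation over a grid of candidate $\lambda$ values. This is a simplified, no-intercept model with a hand-derived closed-form slope and made-up data, not a production recipe:

```python
# No-intercept ridge slope in 1D: minimizer of sum((y - b*x)^2) + lam * b^2,
# which works out to b = sum(x*y) / (sum(x*x) + lam).
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Leave-one-out cross-validation error for a given lambda.
def loo_cv_error(xs, ys, lam):
    err = 0.0
    for i in range(len(xs)):
        tr_x, tr_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        slope = ridge_slope(tr_x, tr_y, lam)  # fit without point i
        err += (ys[i] - slope * xs[i]) ** 2   # test on point i
    return err / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.3, 1.8, 3.4, 3.9]          # roughly y = x plus noise (made-up data)
lambdas = [0.0, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: loo_cv_error(xs, ys, lam))
print("best lambda:", best)
```

Larger $\lambda$ always shrinks the fitted slope; cross-validation picks the amount of shrinkage that predicts held-out points best.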
Compare the Least Squares line and the Ridge Regression line
- Variance: without the small amount of bias that the penalty creates, the Least Squares line has a large amount of variance.
- Sensitivity to $x$: when the slope of the line is steep, the prediction for $y$ is very sensitive to relatively small changes in $x$. The Ridge Regression line is less sensitive to changes in $x$.
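The sensitivity point can be made concrete: for a step of $\Delta x$, the prediction moves by $\text{slope} \times \Delta x$, so a flatter (ridge-shrunken) line reacts less. The slopes below are illustrative, not fitted:

```python
# A change of delta_x in the input moves the prediction by slope * delta_x,
# so the flatter ridge line is less sensitive to the same change in x.
slope_ls, slope_ridge = 2.0, 1.3   # hypothetical least-squares vs ridge slopes
delta_x = 0.5
print(slope_ls * delta_x)      # least squares: prediction moves by 1.0
print(slope_ridge * delta_x)   # ridge: prediction moves by 0.65
```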
When using a discrete variable to predict a continuous variable, the intercept is the average target value when $x=0$, the slope is the difference between the average targets in the two cases, and intercept + slope is the average target value when $x=1$ (the predicted value).
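For a 0/1 indicator, least squares reduces to group means, which a few lines of plain Python can verify on made-up measurements:

```python
# With a 0/1 indicator x, the least-squares fit is:
# intercept = mean(y | x == 0), slope = mean(y | x == 1) - mean(y | x == 0).
xs = [0, 0, 0, 1, 1, 1]
ys = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]   # made-up measurements for the two groups
mean0 = sum(y for x, y in zip(xs, ys) if x == 0) / xs.count(0)
mean1 = sum(y for x, y in zip(xs, ys) if x == 1) / xs.count(1)
intercept, slope = mean0, mean1 - mean0
print(intercept)           # 3.0  (average target when x = 0)
print(intercept + slope)   # 8.0  (average target when x = 1)
```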
A scenario where overfitting must be prevented: when there are more features than samples, use ridge regression.
- Improve predictions made for new data (i.e. reduce variance) by making the predictions less sensitive to the training data.
Lasso回归|Lasso Regression
- The sum of squared residuals (Least Squares) + $\lambda \times |\text{slope}|$ (absolute value). The y-intercept is not included in the penalty.
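Mirroring the ridge sketch, the lasso objective only swaps the squared penalty for an absolute-value one. Toy data and function name are again illustrative:

```python
# Lasso objective for a one-feature line: RSS + lambda * |slope|;
# as with ridge, the intercept is left out of the penalty.
def lasso_objective(intercept, slope, xs, ys, lam):
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss + lam * abs(slope)

xs, ys = [1, 2, 3], [1.1, 1.9, 3.2]   # made-up toy data
print(lasso_objective(0.0, 1.0, xs, ys, lam=0.0))  # plain least squares
print(lasso_objective(0.0, 1.0, xs, ys, lam=2.0))  # adds 2 * |1.0| = 2.0
```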
岭回归 vs Lasso回归|Ridge Regression vs Lasso Regression
Similarities
- Both make the prediction of the target variable less sensitive to the independent variables.
- When they shrink parameters, they don’t have to shrink them all equally.
- Setting $\lambda>0$ results in a smaller optimal slope in both methods.
Differences
Ridge regression can only shrink the slope asymptotically close to 0, while Lasso regression can shrink the slope all the way to 0.
Why (seen from the plot of slope values vs. objective function value)
- In Ridge regression, as $\lambda$ increases the optimal slope shifts toward 0, but the objective keeps a nice parabola shape, so no matter how large $\lambda$ is, the optimal slope will not be 0.
- In Lasso regression, as $\lambda$ increases the optimal slope shifts toward 0, and since the penalty puts a kink in the objective at 0, 0 ends up being the optimal slope.
This leads to a difference in when each method is preferable
- Lasso regression can exclude useless variables from the equation, so it is better than ridge regression at reducing variance in models that contain a lot of useless variables.
- Ridge regression is better when most variables are useful.
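The contrast can be checked numerically in the simplest possible setting: a one-feature, no-intercept model, where both penalized objectives have hand-derived closed-form minimizers (ridge divides by $\sum x^2 + \lambda$; lasso soft-thresholds $\sum xy$ at $\lambda/2$). This derivation is an assumption of these notes, not a library API:

```python
# Minimizer of sum((y - b*x)^2) + lam * b^2: shrinks toward 0, never reaches it.
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Minimizer of sum((y - b*x)^2) + lam * |b|: soft-thresholding, can hit exactly 0.
def lasso_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    mag = max(abs(sxy) - lam / 2.0, 0.0)   # the kink at 0 zeroes weak slopes
    return (1.0 if sxy >= 0 else -1.0) * mag / sxx

xs = [1.0, 2.0, 3.0]
ys = [0.2, 0.1, 0.3]   # made-up data with a weak, nearly useless predictor
for lam in [0.0, 1.0, 5.0]:
    print(lam, ridge_slope(xs, ys, lam), lasso_slope(xs, ys, lam))
```

For a large enough $\lambda$ the lasso slope is exactly 0 (the variable is excluded), while the ridge slope is merely small but still nonzero.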