岭回归|Ridge Regression
Main idea: accept a slightly worse fit on the training data in exchange for better long-term predictions. ➡️ A little more bias, a significant drop in variance
Definition: add $\lambda \times \text{slope}^2$ to the objective function (Least Squares + Ridge Regression Penalty). $\lambda$ determines how severe the penalty is.
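The penalized objective can be written out directly. A minimal sketch in plain Python, using made-up toy data (the function name and numbers are illustrative, not from any library):

```python
# Ridge objective for a one-feature line y ≈ intercept + slope * x:
# sum of squared residuals + lambda * slope**2 (the intercept is not penalized).
def ridge_objective(intercept, slope, xs, ys, lam):
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss + lam * slope ** 2

# Made-up data: a larger lambda makes steep slopes more expensive.
xs, ys = [1, 2, 3], [1.1, 1.9, 3.2]
print(ridge_objective(0.0, 1.0, xs, ys, lam=0.0))  # plain least squares
print(ridge_objective(0.0, 1.0, xs, ys, lam=1.0))  # same fit + 1 * 1**2 penalty
```

With $\lambda = 0$ this is ordinary least squares; the two printed values differ by exactly the penalty term.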
Choose $\lambda$ with cross-validation, picking the value that results in the lowest variance.
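One way to sketch this selection: leave-one-out cross-validation over a grid of candidate $\lambda$ values. This is a simplified, no-intercept model with a hand-derived closed-form slope and made-up data, not a production recipe:

```python
# No-intercept ridge slope in 1D: minimizer of sum((y - b*x)^2) + lam * b^2,
# which works out to b = sum(x*y) / (sum(x*x) + lam).
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Leave-one-out cross-validation error for a given lambda.
def loo_cv_error(xs, ys, lam):
    err = 0.0
    for i in range(len(xs)):
        tr_x, tr_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        slope = ridge_slope(tr_x, tr_y, lam)  # fit without point i
        err += (ys[i] - slope * xs[i]) ** 2   # test on point i
    return err / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.3, 1.8, 3.4, 3.9]          # roughly y = x plus noise (made-up data)
lambdas = [0.0, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: loo_cv_error(xs, ys, lam))
print("best lambda:", best)
```

Larger $\lambda$ always shrinks the fitted slope; cross-validation picks the amount of shrinkage that predicts held-out points best.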
Compare the Least Squares line and the Ridge Regression line
- Variance: without the small amount of bias that the penalty creates, the Least Squares line has a large amount of variance.
- Sensitivity to $x$: when the slope of the line is steep, the prediction for $y$ is very sensitive to relatively small changes in $x$. The Ridge Regression line is less sensitive to changes in $x$.
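The sensitivity point can be made concrete: for a step of $\Delta x$, the prediction moves by $\text{slope} \times \Delta x$, so a flatter (ridge-shrunken) line reacts less. The slopes below are illustrative, not fitted:

```python
# A change of delta_x in the input moves the prediction by slope * delta_x,
# so the flatter ridge line is less sensitive to the same change in x.
slope_ls, slope_ridge = 2.0, 1.3   # hypothetical least-squares vs ridge slopes
delta_x = 0.5
print(slope_ls * delta_x)      # least squares: prediction moves by 1.0
print(slope_ridge * delta_x)   # ridge: prediction moves by 0.65
```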
When using a discrete variable to predict a continuous variable, the intercept is the average target value when $x=0$, the slope is the difference between the average targets in the two cases, and intercept + slope is the average target value when $x=1$ (the predicted value).
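For a 0/1 indicator, least squares reduces to group means, which a few lines of plain Python can verify on made-up measurements:

```python
# With a 0/1 indicator x, the least-squares fit is:
# intercept = mean(y | x == 0), slope = mean(y | x == 1) - mean(y | x == 0).
xs = [0, 0, 0, 1, 1, 1]
ys = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]   # made-up measurements for the two groups
mean0 = sum(y for x, y in zip(xs, ys) if x == 0) / xs.count(0)
mean1 = sum(y for x, y in zip(xs, ys) if x == 1) / xs.count(1)
intercept, slope = mean0, mean1 - mean0
print(intercept)           # 3.0  (average target when x = 0)
print(intercept + slope)   # 8.0  (average target when x = 1)
```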
A scenario where overfitting must be prevented: when there are more features than samples, use ridge regression.
- Improve predictions made for new data (i.e. reduce variance) by making the predictions less sensitive to the training data.
Lasso回归|Lasso Regression
- The sum of squared residuals (Least Squares) + $\lambda \times |\text{slope}|$ (absolute value). The y-intercept is not included in the penalty.
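Mirroring the ridge sketch, the lasso objective only swaps the squared penalty for an absolute-value one. Toy data and function name are again illustrative:

```python
# Lasso objective for a one-feature line: RSS + lambda * |slope|;
# as with ridge, the intercept is left out of the penalty.
def lasso_objective(intercept, slope, xs, ys, lam):
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss + lam * abs(slope)

xs, ys = [1, 2, 3], [1.1, 1.9, 3.2]   # made-up toy data
print(lasso_objective(0.0, 1.0, xs, ys, lam=0.0))  # plain least squares
print(lasso_objective(0.0, 1.0, xs, ys, lam=2.0))  # adds 2 * |1.0| = 2.0
```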
岭回归 vs Lasso回归|Ridge Regression vs Lasso Regression
Similarities
- Both make the prediction of the target variable less sensitive to the independent variables.
- When they shrink parameters, they don’t have to shrink them all equally.
- Setting $\lambda>0$ results in a smaller optimal slope in both methods.
Differences
Ridge regression can only shrink the slope asymptotically close to 0, while Lasso regression can shrink the slope all the way to 0.
Why (seen from the plot of slope values vs. objective function value)
- In Ridge regression, as $\lambda$ increases the optimal slope shifts toward 0, but the objective keeps a nice parabola shape, so no matter how large $\lambda$ is, the optimal slope will not be 0.
- In Lasso regression, as $\lambda$ increases the optimal slope shifts toward 0, and since the penalty puts a kink in the objective at 0, 0 ends up being the optimal slope.
This leads to a difference in when each method is preferable
- Lasso regression can exclude useless variables from the equation, so it is better than ridge regression at reducing variance in models that contain a lot of useless variables.
- Ridge regression is better when most variables are useful.
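The contrast can be checked numerically in the simplest possible setting: a one-feature, no-intercept model, where both penalized objectives have hand-derived closed-form minimizers (ridge divides by $\sum x^2 + \lambda$; lasso soft-thresholds $\sum xy$ at $\lambda/2$). This derivation is an assumption of these notes, not a library API:

```python
# Minimizer of sum((y - b*x)^2) + lam * b^2: shrinks toward 0, never reaches it.
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Minimizer of sum((y - b*x)^2) + lam * |b|: soft-thresholding, can hit exactly 0.
def lasso_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    mag = max(abs(sxy) - lam / 2.0, 0.0)   # the kink at 0 zeroes weak slopes
    return (1.0 if sxy >= 0 else -1.0) * mag / sxx

xs = [1.0, 2.0, 3.0]
ys = [0.2, 0.1, 0.3]   # made-up data with a weak, nearly useless predictor
for lam in [0.0, 1.0, 5.0]:
    print(lam, ridge_slope(xs, ys, lam), lasso_slope(xs, ys, lam))
```

For a large enough $\lambda$ the lasso slope is exactly 0 (the variable is excluded), while the ridge slope is merely small but still nonzero.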