
Ridge Regression

  • Main idea: accept a slightly worse fit to the training data in exchange for better long-term predictions. ➡️ A little more bias, a significant drop in variance.

  • Definition: add $\lambda \times \text{slope}^2$ to the objective function (least squares + ridge regression penalty). $\lambda$ determines how severe the penalty is.

  • Choose $\lambda$ with cross-validation: try several candidate values and keep the one that results in the lowest variance (the lowest error on held-out data).

  • Compare the Least Squares line and Ridge regression line

    • Variance: without the small amount of bias that the penalty creates, the least squares line has a large amount of variance.
    • Sensitivity to $x$: when the slope of the line is steep, the prediction for $y$ is very sensitive to relatively small changes in $x$. The ridge regression line is less sensitive to changes in $x$.
  • When using a discrete (0/1) variable to predict a continuous variable, the intercept is the average target value when $x=0$, the slope is the difference between the average target values in the two cases, and intercept + slope is the average target value (the predicted value) when $x=1$.

  • When overfitting needs to be prevented, for example when there are more features than samples, use ridge regression.

    • It improves predictions made for new data (i.e. reduces variance) by making the predictions less sensitive to the training data.
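
The ridge penalty and the cross-validated choice of $\lambda$ can be sketched for a single predictor in plain Python. This is a minimal illustration, not a reference implementation: it assumes the intercept is not penalised, in which case the ridge slope has the closed form $S_{xy} / (S_{xx} + \lambda)$. The function names `ridge_fit` and `loocv_error`, the toy data, the candidate $\lambda$ values and the use of leave-one-out cross-validation are all assumptions made for this example.

```python
def ridge_fit(xs, ys, lam):
    """Fit y = intercept + slope * x by minimising
    (sum of squared residuals) + lam * slope**2, intercept not penalised."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # lam = 0 gives the least squares slope
    intercept = y_mean - slope * x_mean
    return intercept, slope


def loocv_error(xs, ys, lam):
    """Mean squared prediction error under leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out point i, fit on the rest, then predict the held-out point.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = ridge_fit(train_x, train_y, lam)
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total / len(xs)


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

# The slope shrinks as lam grows, so predictions react less to changes in x.
for lam in [0.0, 1.0, 10.0, 100.0]:
    intercept, slope = ridge_fit(xs, ys, lam)
    cv = loocv_error(xs, ys, lam)
    print(f"lambda={lam:6.1f}  slope={slope:.4f}  LOOCV MSE={cv:.4f}")

# Keep the candidate with the lowest cross-validated error.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: loocv_error(xs, ys, lam))
print("best lambda:", best_lam)
```

Which $\lambda$ wins depends entirely on the data; on a small, nearly linear toy set like this one the penalty may not help at all, and the benefit is expected mainly when least squares overfits.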

Lasso Regression

  • Objective: the sum of squared residuals (least squares) + $\lambda \times |\text{slope}|$ (the absolute value of the slope instead of its square). The y-intercept is not included in the penalty.
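
For a single predictor this objective can be minimised in closed form, which gives a quick way to experiment with it. The sketch below assumes the intercept is unpenalised and that the penalty is exactly $\lambda|\text{slope}|$, with no extra 1/2 or 1/n scaling of the squared-error term (libraries may scale the objective differently). The function name `lasso_fit` and the toy data are hypothetical.

```python
def lasso_fit(xs, ys, lam):
    """Fit y = intercept + slope * x by minimising
    (sum of squared residuals) + lam * |slope|, intercept not penalised."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Soft-thresholding: pull |sxy| towards 0 by lam / 2 and clip at 0.
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    slope = (shrunk if sxy >= 0 else -shrunk) / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

for lam in [0.0, 5.0, 10.0, 20.0]:
    intercept, slope = lasso_fit(xs, ys, lam)
    print(f"lambda={lam:5.1f}  intercept={intercept:.4f}  slope={slope:.4f}")
```

Once $\lambda$ reaches $2|S_{xy}|$ the clipped term is 0, so the fitted line becomes flat at the mean of $y$.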

Ridge Regression vs Lasso Regression

  • Similarities

    • Both make the prediction of the target variable less sensitive to the independent variables.
    • When they shrink parameters, they don’t have to shrink them equally.
    • Setting $\lambda>0$ results in a smaller optimal slope in both methods.
  • Differences

    • Ridge regression can only shrink the slope asymptotically close to 0, while Lasso regression can shrink the slope all the way to 0.

      Why this happens (viewed on a plot of objective function value against slope value):

      • In Ridge regression, as $\lambda$ increases, the optimal slope shifts towards 0, but the curve keeps its smooth parabola shape. No matter how large $\lambda$ is, the optimal slope will not be exactly 0.
      • In Lasso regression, as $\lambda$ increases, the optimal slope shifts towards 0, and because the absolute-value penalty puts a kink in the curve at 0, a large enough $\lambda$ makes 0 itself the optimal slope.


      • Lasso regression can exclude useless variables from the equation, so it is better than ridge regression at reducing the variance of models that contain a lot of useless variables.
      • Ridge regression is better when most variables are useful.
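
The parabola-versus-kink argument can be checked numerically by evaluating each objective over a grid of slope values and reading off where the minimum lies. This is a rough sketch under stated assumptions: one predictor, an unpenalised intercept set to its best value for each slope, and a hypothetical helper `best_slope_on_grid` with an arbitrary grid from -0.5 to 1.5. The toy data are made up for illustration.

```python
def objective(slope, xs, ys, lam, penalty):
    """Sum of squared residuals plus a ridge or lasso penalty on the slope."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # For a fixed slope, the best unpenalised intercept is y_mean - slope * x_mean.
    intercept = y_mean - slope * x_mean
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    if penalty == "ridge":
        return ssr + lam * slope ** 2
    return ssr + lam * abs(slope)  # lasso


def best_slope_on_grid(xs, ys, lam, penalty):
    """Slope on a fixed grid (step 0.001, includes 0.0) with the lowest objective."""
    grid = [i / 1000 for i in range(-500, 1501)]
    return min(grid, key=lambda s: objective(s, xs, ys, lam, penalty))


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

for lam in [0.0, 10.0, 40.0, 1000.0]:
    ridge_slope = best_slope_on_grid(xs, ys, lam, "ridge")
    lasso_slope = best_slope_on_grid(xs, ys, lam, "lasso")
    print(f"lambda={lam:7.1f}  ridge slope={ridge_slope:.3f}  lasso slope={lasso_slope:.3f}")
```

On this data the lasso minimum sits exactly on the grid point 0.0 once $\lambda$ is large enough, while the ridge minimum keeps moving towards 0 without reaching it. The grid resolution limits how small a ridge slope can be distinguished from 0.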
