Intro to Ridge and Lasso Regression


Recently my class has been covering topics of regression and classification. We can now use the data we have to make predictions, analyze the data better, and draw significant conclusions. We do this by building models to predict data and to see which features most influence a target variable. There are many ways to build these models depending on the data you are given and what you are trying to draw from it. For example, if you are trying to predict a continuous variable, you will want to use a linear regression model. But if you are trying to predict a categorical variable, you will want to use something like a decision tree or a logistic regression model. There are many other models that I won't get into in this post. Here, I will go over two key tools that will help your model predict data it hasn't seen before: Ridge and Lasso.


Regularization

Regularization is a technique that helps a model perform on data it has not seen before. This is very important when the model is overfit to the data it was trained on. Regularization also helps with the bias-variance tradeoff, which is essential to a good model. It can also help when you do not have as many data points as you would like: regularizing the model helps it generalize rather than overfit to the data you have. The two most common ways to fix these problems and regularize your model are Ridge and Lasso.


[Image: An example of getting the best bias-variance tradeoff]

Ridge Regression

Now we will get into Ridge regression. In Ridge regression, we fit a new line to our data to help regularize an overfit model. This may cause the training error of your model to increase, but it will help the model perform on unseen data. By doing this, we introduce bias into the model for a better tradeoff: a small increase in bias can produce a big drop in variance in an overfit model, which helps us in the long run. Now for the math behind changing the model. Instead of minimizing only the sum of the squared residuals as in linear regression, Ridge regression minimizes the sum of the squared residuals + lambda * the slope² (with several features, lambda times the sum of the squared coefficients). Lambda determines how large you want the penalty to be, so the higher the lambda, the more regularized the model will be. We can use cross-validation to find the best lambda.
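To make the penalty concrete, here is a tiny worked sketch of that cost for a one-feature model. The data points, candidate slopes, and lambda below are made up purely for illustration:

```python
import numpy as np

# Hypothetical 1-D data: y is roughly x with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
lam = 1.0  # the lambda penalty strength

def ridge_cost(slope):
    # sum of squared residuals + lambda * slope^2
    residuals = y - slope * x
    return np.sum(residuals ** 2) + lam * slope ** 2

print(ridge_cost(1.0))
print(ridge_cost(0.9))
```

Comparing the two candidate slopes shows how the cost trades fit against the size of the coefficient: the extra penalty for the steeper slope is outweighed by its much smaller residuals here.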


[Image: Ridge formula — sum of squared residuals + λ × slope²]

Fitting a ridge regression in its simplest form is shown below, where alpha is the lambda we can change.


from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)
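The cross-validation step mentioned above can be sketched with scikit-learn's RidgeCV. The synthetic data below is a stand-in for the X_train and y_train of the original example, and the alpha grid is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic data standing in for X_train / y_train
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Search a grid of alphas (lambda penalty strengths) by 5-fold cross-validation
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X, y)

# The alpha that gave the best cross-validated error
print(ridge_cv.alpha_)
```

RidgeCV simply refits the model on all the data with the winning alpha, so `ridge_cv` can be used for prediction directly.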

Ridge regression will also help you see the best features of the model, because it shrinks the coefficients of features that do not have a large effect on the target variable, so they are used less in the final model (though it never removes them entirely). As you can see, Ridge regression can be very helpful for overfit data and for regularizing a model.


Lasso Regression

Similar to Ridge regression, Lasso regression also helps regularize a model and can be very helpful for predicting on unseen data. However, Lasso regression is slightly different from Ridge regression. Let's start with the formula. Instead of a penalty that squares the slope, Lasso uses a similar equation but takes the absolute value of the slope rather than squaring it: it minimizes the sum of the squared residuals + lambda * the absolute value of the slope. And instead of just shrinking the less important features as Ridge does, Lasso may reduce their coefficients all the way to zero, so they have no impact on the final model. This leaves a much simpler model than the one we started with, which can be very helpful in interpreting the model.
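A quick sketch of that zeroing-out behavior, using scikit-learn's Lasso on synthetic data where only a few of the features truly matter (the dataset shape and alpha are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 10 features actually influence the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Lasso drives the coefficients of unimportant features exactly to zero
print(lasso.coef_)
print("features dropped:", int(np.sum(lasso.coef_ == 0)))
```

Printing `lasso.coef_` shows a mix of exact zeros and nonzero coefficients; the nonzero ones are the features Lasso kept, which is what makes the final model easier to interpret.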



Conclusion

Ridge and Lasso regression are very helpful when trying to regularize a model. The differences between them are important to note. Ridge regression is better to use when many features are important in the model, as it will penalize them but not drop the less important ones. Lasso is better when you only want to keep the most important features in your model. Both of these tools will greatly help your understanding of the model and will help it perform better on unseen data. Thanks for reading!



Translated from: https://medium.com/@chrisfiore13/intro-to-ridge-and-lasso-regression-e8b3271b5fd0
