How to use linear regression models to predict quadratic, root, and polynomial functions



by Björn Hartmann


When reading articles about machine learning, I often suspect that authors misunderstand the term “linear model.” Many authors suggest that linear models can only be applied if data can be described with a line. But this is way too restrictive.


Linear models assume the functional form is linear — not the relationship between your variables.


I’ll show you how you can improve your linear regressions with quadratic, root, and exponential functions.


So what’s the functional form?

The functional form is the equation you want to estimate.


Let us start with an example and think about how we could describe salaries of data scientists. Suppose an average data scientist (i) receives an entry-level salary (entry_level_salary) plus a bonus for each year of his experience (experience_i).


Thus, his salary (salary_i) is given by the following functional form:


salary_i = entry_level_salary + beta_1 * experience_i

Now, we can interpret the coefficient beta_1 as the bonus for each year of experience. And with this coefficient we can start making predictions by just knowing the level of experience.


As your machine learning model takes care of the coefficient beta_1, all you need to enter in R or any other software is:


model_1 <- lm(salary ~ experience)  # the intercept estimates entry_level_salary
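The snippets in this article use R. As a rough illustration of what `lm()` estimates for a single predictor, here is a closed-form ordinary least squares sketch in Python; the data values (an entry-level salary of 50,000 plus 2,000 per year) are invented for the example:

```python
# Minimal sketch of simple OLS with one predictor.
# All numbers are made up for illustration.

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

# Hypothetical salaries: 50,000 entry level plus 2,000 per year of experience
experience = [0, 1, 2, 3, 4, 5]
salary = [50000 + 2000 * e for e in experience]

intercept, beta_1 = fit_ols(experience, salary)
# intercept recovers the entry-level salary, beta_1 the yearly bonus
```

On this noise-free data the fit recovers both coefficients exactly; with real data they would only be estimates.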

Linearity in the functional form requires that the right-hand side of the equation is a sum of terms, each multiplied by its own coefficient.


Imagine our assumptions are right. Each point indicates one data scientist with his level of experience and salary. Finally, the red line shows our predictions.


Many aspiring data scientists already run similar predictions. But often that is all they do with linear models…


How to estimate quadratic models?

When we want to estimate a quadratic model, we cannot type in something like this:


model_2 <- lm(salary ~ experience^2)
>> This will NOT estimate a quadratic term

Most of these functions do not expect to transform your input variables for you. In R, `^` even has a special meaning inside a formula, so `experience^2` is silently treated as plain `experience`; other tools simply return an error. (In R you could wrap the term as `I(experience^2)`, but computing the variable beforehand keeps the sum-of-terms form explicit.)


Note: You need to compute experience^2 before adding it into your model. Thus, you will run:


# First, compute the square values of experience
experience_2 <- experience^2

# Then add them into your regression
model_2 <- lm(salary ~ experience_2)

In return, you get a nice quadratic function:

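To see why this transform-then-fit recipe works, here is a hedged Python sketch (not the author's code): squaring experience first reduces the quadratic model to an ordinary linear fit, which recovers the invented coefficients:

```python
# Sketch: a quadratic relationship becomes linear once we square the input.
# The coefficients (45,000 and 500) are invented for illustration.

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

experience = [1, 2, 3, 4, 5, 6]
salary = [45000 + 500 * e ** 2 for e in experience]

# First, compute the squared values -- then fit a plain linear model on them
experience_2 = [e ** 2 for e in experience]
intercept, beta_2 = fit_ols(experience_2, salary)
```

The model stays linear in its coefficients even though the fitted curve is a parabola in experience.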

Estimate root functions with linear models

Often we observe values that rise fast in the beginning and level off afterwards. Let us modify our example and estimate a typical learning curve.


In the beginning a learning curve tends to be very steep and slows down after some years.


There is one function that features such a trend: the root function. So we use the square root of experience to capture this relationship:


# First, compute the square root values of experience
sqrt_experience <- sqrt(experience)

# Then add them into your regression
model_3 <- lm(knowledge ~ sqrt_experience)

Again, make sure you compute the square root before you add it to your model:


Or you might want to use the logarithmic function, as it describes a similar trend. But its values are negative between zero and one. So make sure this is not a problem for you and your data.

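The same recipe works for the root transform. As a hedged Python sketch with invented coefficients (10 and 3): taking the square root first turns the learning-curve shape into a straight line that plain OLS can fit:

```python
import math

# Sketch: a root-shaped learning curve becomes linear in sqrt(experience).
# The coefficients (10 and 3) are invented for illustration.

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

experience = [1, 4, 9, 16, 25]
knowledge = [10 + 3 * math.sqrt(e) for e in experience]

# First, compute the square roots -- then fit the linear model on them
sqrt_experience = [math.sqrt(e) for e in experience]
intercept, beta = fit_ols(sqrt_experience, knowledge)
```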

Mastering linear models

Finally, you can even estimate higher-order polynomial functions or exponential functions. All you need to do is compute all variables before you add them into your linear model:


# First, compute polynomials
experience_2 <- experience^2
experience_3 <- experience^3

# Then add them into your regression
model_4 <- lm(salary ~ experience + experience_2 + experience_3)

Two cases where you should use other models

Although linear models can be applied to many cases, there are limitations. The most common ones fall into two categories:


1. Probabilities

If you want to estimate the probability of an event, you should use Probit, Logit, or Tobit models. Probabilities follow distributions that linear functions cannot capture. Depending on the distribution you assume, choose between the Probit, Logit, or Tobit model.

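To see why a different link is needed, here is a hedged Python sketch of a tiny Logit model fitted by gradient descent; the data are invented, and in practice you would use something like R's `glm(..., family = binomial)`:

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, lr=0.1, steps=5000):
    """One-predictor logistic regression via plain gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # prediction error
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Invented data: the event becomes likely as x grows
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(x, y)
p_low, p_high = sigmoid(b0), sigmoid(b0 + b1 * 7)
# predictions stay inside (0, 1), unlike a straight line
```

A straight line would eventually predict probabilities below 0 or above 1; the sigmoid link cannot.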

2. Count variables

Finally, when estimating a count variable you want to use a Poisson model. Count variables are variables that can only take integer values such as 1, 2, 3, 4.


For example, the number of children, the number of purchases a customer makes, or the number of accidents in a region.

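For counts, here is a hedged Python sketch of a Poisson regression (log link) fitted by gradient descent; the counts are invented, and in R you would typically call `glm(..., family = poisson)`:

```python
import math

def fit_poisson(x, y, lr=0.01, steps=20000):
    """One-predictor Poisson regression (log link) via gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = math.exp(b0 + b1 * xi) - yi  # fitted mean minus observed count
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Invented counts, e.g. accidents per region as traffic (x) grows
x = [0, 1, 2, 3, 4]
y = [2, 3, 4, 5, 7]

b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
# the log link keeps every fitted count positive
```

The log link guarantees non-negative predictions, which a straight line cannot.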

What to take away from this article

There are two things I want you to remember:


  1. Improve your linear models and try quadratic, root or polynomial functions.

  2. Always transform your data before you add them to your regression.


I uploaded the R code for all examples on GitHub. Feel free to download them, play with them, or share them with your friends and colleagues.


If you have any questions, write a comment below or contact me. I appreciate your feedback.


Originally published at: https://www.freecodecamp.org/news/learn-how-to-improve-your-linear-models-8294bfa8a731/
