Ordinary Least Squares (OLS) Regression

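Ordinary least squares (OLS) regression is the base model for linear regression: it determines the model parameters by minimizing the sum of squared differences between predicted and actual values. Beyond model accuracy, OLS also lets you analyze the importance of each variable, whether the data contain autocorrelation, and more. This article shows how to perform linear regression with Python's statsmodels library and explains each part of the model summary, including R-squared, adjusted R-squared, the F-statistic, and p-values, which are used to assess model performance and variable importance.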

OLS (Ordinary Least Squares) regression is the simplest linear regression model, also known as the base model for linear regression. Although it is simple, it is not given much weight in machine learning. Yet OLS tells you much more than the accuracy of the overall model: it also tells you how each variable has fared, whether there are unwanted variables, whether there is autocorrelation in the data, and so on.

It is also one of the easier and more intuitive techniques to understand, and it provides a good basis for learning more advanced concepts and techniques. This post explains how to perform linear regression using the statsmodels Python package.

Note: statsmodels also has a Logit regression model, which is similar to scikit-learn's LogisticRegression and works for classification problems.

OLS models the relationship between the X and y variables with the following simple formulas:

Simple linear regression:

$$y = b_0 + b_1 X$$

Multiple linear regression:

$$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + \varepsilon$$

Where

· b0: y-intercept

· b1, b2: slopes (the coefficients of the predictors)

· X, X1, X2: predictor variables

· y: target variable

· ε: error term (the part of y not explained by the predictors)
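To make the notation concrete, here is a small sketch that evaluates the equation for a single observation. The function name `predict` and the coefficient values are hypothetical, and the error term is left out because it is unobserved.

```python
def predict(b0, coefs, xs):
    """Evaluate y = b0 + b1*X1 + b2*X2 + ... for one observation (error term omitted)."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# Simple linear case: y = 1 + 2*X with X = 3
print(predict(1.0, [2.0], [3.0]))             # 7.0
# Multiple linear case: y = 0.5 + 2*X1 - 1*X2 with X1 = 4, X2 = 3
print(predict(0.5, [2.0, -1.0], [4.0, 3.0]))  # 5.5
```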

OLS is an estimator in which the values of b1 and b0 (from the above equation) are chosen in such a way as to minimize the sum of the squares of the differences between the observed dependent variable and predicted dependent variable. That’s why it’s named ordinary least squares.
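For the simple case, setting the derivatives of the sum of squared differences with respect to b0 and b1 to zero gives the standard closed-form solution. It is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The sketch below implements it in plain Python. The function name and the sample numbers are illustrative only.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1*x, minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean  # the fitted line passes through (x_mean, y_mean)
    return b0, b1

b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```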

Also, when the model tries to reduce the error between predicted and actual values, it is trying to cut down on losses and predict better. In effect, you are trying to estimate the impact of your predictors on the outcome.
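The sketch below shows what "cutting down on losses" means here, using plain Python and toy data invented for illustration. It computes the OLS line and then checks that nudging either coefficient away from the OLS values increases the sum of squared errors (SSE).

```python
def sse(b0, b1, x, y):
    """Sum of squared differences between observed y and predicted b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# OLS coefficients from the closed-form formulas
n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean

best = sse(b0, b1, x, y)
# Moving either coefficient away from the OLS values can only increase the loss
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(b0 + db0, b1 + db1, x, y) > best
print(round(b0, 2), round(b1, 2), round(best, 3))
```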

Note: Ideally, the assumptions of linear regression should be checked before building a model with OLS. The aim of this article is to interpret all the elements of an OLS model.
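Checking every assumption is beyond a short snippet, but a few quick residual diagnostics are easy to compute. The sketch below uses NumPy's least-squares solver on synthetic data, which is an assumption for illustration. It checks that the residuals average to zero and are orthogonal to the predictors, and it computes the Durbin-Watson statistic for first-order autocorrelation. statsmodels also reports Durbin-Watson in its OLS summary.

```python
import numpy as np

# Synthetic data (assumed): intercept + two predictors + independent noise
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
residuals = y - X @ beta

# With an intercept in the model, OLS residuals average to (numerically) zero
print(residuals.mean())

# Durbin-Watson: values near 2 suggest no first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(dw)

# Normal equations: residuals are orthogonal to every column of X
print(X.T @ residuals)
```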

Let's understand this better with an example. I have taken a simple dataset, the Advertising data:

Figure: Data under consideration. The data shape is 200 × 4.
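The Advertising data itself is not reproduced here. The sketch below therefore builds a synthetic stand-in with the same 200 × 4 shape and fits it with statsmodels' formula interface. The column names (TV, Radio, Newspaper, Sales) and the generating coefficients are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the Advertising data: 200 rows x 4 columns.
# Column names and coefficients are assumptions, not values from the real dataset.
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1.5, 200)
print(df.shape)  # (200, 4)

# The formula interface adds the intercept automatically
model = smf.ols("Sales ~ TV + Radio + Newspaper", data=df).fit()
print(model.params)
```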

In linear models, the coefficient of one variable depends on the other independent variables in the model. Hence, if variables are removed from or added to the data, the whole model is affected. For example, suppose that in the future we also have another advertising medium, say Social Media. We would then have to re-fit the model and re-calculate the coefficients and the intercept, because they depend on which variables (dimensions) the dataset contains.
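The sketch below shows this dependence on synthetic NumPy data invented for illustration. Here `x2` is strongly correlated with `x1`, and both affect `y`. Fitting `y` on `x1` alone gives a slope that absorbs part of `x2`'s effect (roughly 2 + 3·0.8 = 4.4 here). Adding `x2` moves the slope on `x1` back toward its true value of 2.

```python
import numpy as np

# Synthetic data (assumed): x2 is correlated with x1, and both drive y
rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 0.5, size=n)

def ols(*cols):
    """Return OLS coefficients [intercept, b1, b2, ...] for y on the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_only = ols(x1)      # slope on x1 absorbs part of x2's effect
b_both = ols(x1, x2)  # slope on x1 is close to its true value again
print(b_only)
print(b_both)
```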

In case you want to check out the formula for multiple linear regression:
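For reference, the multiple linear regression model and its OLS estimate are commonly written in matrix form. This is standard textbook notation, where X is the n × (p + 1) design matrix whose first column is all ones. The closed form assumes XᵀX is invertible, that is, no perfect multicollinearity.

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\beta} = (b_0, b_1, \dots, b_p)^\top$$

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^2 = (X^\top X)^{-1} X^\top \mathbf{y}$$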

In practice, it is not feasible to keep adding variables and re-checking their linear relationship. Instead, the idea is to pick the best variables using the following two steps:

1. Domain Knowledge

2. Statistical tests: not only parametric and non-parametric tests, but also checks for multicollinearity among the independent variables and for correlation between the independent variables and the target variable.
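One common multicollinearity check is the variance inflation factor (VIF), computed as VIF_j = 1 / (1 − R_j²). Here R_j² comes from regressing predictor j on all the other predictors. The sketch below computes it by hand with NumPy on synthetic data invented for illustration, and then prints each predictor's correlation with a hypothetical target. statsmodels also offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only; an intercept is added internally)."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
n = 300
a = rng.normal(size=n)
b = rng.normal(size=n)            # independent of a
c = a + 0.1 * rng.normal(size=n)  # nearly a copy of a, so a and c are collinear
v = vif(np.column_stack([a, b, c]))
print(v)  # large for a and c, close to 1 for b

# Correlation of each predictor with a (hypothetical) target
y = 2 * a + b + rng.normal(size=n)
corr_with_y = np.corrcoef(np.column_stack([a, b, c, y]), rowvar=False)[-1, :-1]
print(corr_with_y)
```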