statsmodels library: linear regression models

Contents

Simple linear regression model

Multiple linear regression model


These notes on linear regression models in the statsmodels library can be read alongside the companion notes on linear regression models in the sklearn library: https://blog.csdn.net/qq_57099024/article/details/122324764

Simple linear regression model

import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')  # load the tips dataset that ships with seaborn
print(tips.head())  # show the first five rows of tips
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
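# Specify the model: the response variable goes to the left of the tilde, the predictors to the right
model=smf.ols(formula='tip~total_bill',data=tips)
# Fit the model with the fit method
results=model.fit()
# Print the fitted model's results with the summary method
print(results.summary())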
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.457
Model:                            OLS   Adj. R-squared:                  0.454
Method:                 Least Squares   F-statistic:                     203.4
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           6.69e-34
Time:                        14:34:42   Log-Likelihood:                -350.54
No. Observations:                 244   AIC:                             705.1
Df Residuals:                     242   BIC:                             712.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.9203      0.160      5.761      0.000       0.606       1.235
total_bill     0.1050      0.007     14.260      0.000       0.091       0.120
==============================================================================
Omnibus:                       20.185   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               37.750
Skew:                           0.443   Prob(JB):                     6.35e-09
Kurtosis:                       4.711   Cond. No.                         53.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

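The results include the model's Intercept and the coefficient of total_bill. With these parameters we get the line y = 0.105x + 0.920. The numbers can be read as follows: for every one-unit increase in total_bill (that is, each additional dollar on the bill), the tip increases by 0.105 units, or about 10.5 cents. If you only need the coefficients, you can get them from the params attribute of results.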

print(results.params)

Intercept     0.920270
total_bill    0.105025
dtype: float64
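
As a quick check of the interpretation above, the fitted line can be evaluated by hand. The sketch below simply hard-codes the two coefficients printed by results.params; the helper name `predict_tip` is hypothetical and exists only for this illustration.

```python
# Coefficients copied from the results.params output above.
INTERCEPT = 0.920270
SLOPE = 0.105025  # coefficient of total_bill


def predict_tip(total_bill):
    """Evaluate the fitted line: tip = INTERCEPT + SLOPE * total_bill."""
    return INTERCEPT + SLOPE * total_bill


# Predicted tip for a 20-dollar bill.
print(round(predict_tip(20.0), 4))

# Raising the bill by one dollar raises the predicted tip by the slope.
print(round(predict_tip(21.0) - predict_tip(20.0), 6))
```

With the fitted model at hand, results.predict(pd.DataFrame({'total_bill': [20.0]})) should return the same prediction without copying the coefficients manually.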

Multiple linear regression model

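statsmodels automatically creates dummy variables for categorical predictors and drops one reference level per variable to avoid perfect multicollinearity. For example, sex has two categories, Male and Female. The first level, Male, is chosen as the reference, so no dummy column is created for it. Its effect is absorbed into the intercept, and the coefficient of sex[T.Female] measures the difference relative to Male.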

import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')
print(tips.head())
print('----'*10)  # print a separator line between outputs
# Pass several predictors by joining them with plus signs
model=smf.ols(formula='tip~total_bill+size+sex+smoker+day+time',data=tips)
results=model.fit()
print(results.summary())
print('----'*10)
print(results.params)
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
----------------------------------------
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.470
Model:                            OLS   Adj. R-squared:                  0.452
Method:                 Least Squares   F-statistic:                     26.06
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           1.20e-28
Time:                        16:27:12   Log-Likelihood:                -347.48
No. Observations:                 244   AIC:                             713.0
Df Residuals:                     235   BIC:                             744.4
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.5908      0.256      2.310      0.022       0.087       1.095
sex[T.Female]      0.0324      0.142      0.229      0.819      -0.247       0.311
smoker[T.No]       0.0864      0.147      0.589      0.556      -0.202       0.375
day[T.Fri]         0.1623      0.393      0.412      0.680      -0.613       0.937
day[T.Sat]         0.0408      0.471      0.087      0.931      -0.886       0.968
day[T.Sun]         0.1368      0.472      0.290      0.772      -0.793       1.066
time[T.Dinner]    -0.0681      0.445     -0.153      0.878      -0.944       0.808
total_bill         0.0945      0.010      9.841      0.000       0.076       0.113
size               0.1760      0.090      1.966      0.051      -0.000       0.352
==============================================================================
Omnibus:                       27.860   Durbin-Watson:                   2.096
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               52.555
Skew:                           0.607   Prob(JB):                     3.87e-12
Kurtosis:                       4.923   Cond. No.                         281.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
----------------------------------------
Intercept         0.590837
sex[T.Female]     0.032441
smoker[T.No]      0.086408
day[T.Fri]        0.162259
day[T.Sat]        0.040801
day[T.Sun]        0.136779
time[T.Dinner]   -0.068129
total_bill        0.094487
size              0.175992
dtype: float64
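
The dummy coding described at the start of this section can be tried on a small scale. The sketch below is a minimal illustration under stated assumptions: the `toy` DataFrame and its values are made up, and the column names only mimic tips. Because `sex` here holds plain strings rather than a pandas categorical, the levels are ordered alphabetically and the default reference level is Female. In tips the column is categorical, which is consistent with Male being the reference in the sex[T.Female] row above. The `C(...)` and `Treatment(reference=...)` formula syntax comes from patsy, the formula engine used by statsmodels, and lets you choose the reference level explicitly.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Tiny hand-made dataset (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "tip":        [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5],
    "total_bill": [10.0, 12.0, 18.0, 20.0, 25.0, 31.0, 33.0, 40.0],
    "sex": ["Male", "Female", "Male", "Female",
            "Male", "Female", "Male", "Female"],
})

# Default treatment coding: one level becomes the reference
# and gets no dummy column of its own.
default_fit = smf.ols("tip ~ total_bill + sex", data=toy).fit()
print(list(default_fit.params.index))

# Pick the reference level explicitly instead of relying on level order.
explicit_fit = smf.ols(
    "tip ~ total_bill + C(sex, Treatment(reference='Male'))", data=toy
).fit()
print(list(explicit_fit.params.index))

# Both codings describe the same model, so the fit quality is identical;
# only the meaning of the intercept and the dummy coefficient changes.
print(default_fit.rsquared, explicit_fit.rsquared)
```

Changing the reference level flips the sign of the dummy coefficient and shifts the intercept, while the coefficient of total_bill and the predictions stay the same.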