These notes on linear regression models in the statsmodels library can be read alongside the companion notes on linear regression models in the sklearn library: https://blog.csdn.net/qq_57099024/article/details/122324764
Simple linear regression model
import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')  # load the tips dataset bundled with seaborn
print(tips.head())  # show the first five rows of the tips dataset
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
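# Specify the model: the response variable goes to the left of the tilde, the predictors to the right
model=smf.ols(formula='tip~total_bill',data=tips)
# Fit the model with the fit method
results=model.fit()
# Inspect the fitted model with the summary method
print(results.summary())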
                            OLS Regression Results
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.457
Model:                            OLS   Adj. R-squared:                  0.454
Method:                 Least Squares   F-statistic:                     203.4
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           6.69e-34
Time:                        14:34:42   Log-Likelihood:                -350.54
No. Observations:                 244   AIC:                             705.1
Df Residuals:                     242   BIC:                             712.1
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.9203      0.160      5.761      0.000       0.606       1.235
total_bill     0.1050      0.007     14.260      0.000       0.091       0.120
==============================================================================
Omnibus:                       20.185   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               37.750
Skew:                           0.443   Prob(JB):                     6.35e-09
Kurtosis:                       4.711   Cond. No.                         53.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
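The results include the model's Intercept and the coefficient of total_bill. With these parameters we get the line y = 0.105x + 0.920. The numbers can be read as follows: for every one-unit increase in total_bill (that is, each additional dollar spent), the predicted tip increases by 0.105 units, or about 10.5 cents. If only the coefficients are needed, they can be obtained from the params attribute of results.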
print(results.params)
Intercept 0.920270
total_bill 0.105025
dtype: float64
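To make the interpretation concrete, the fitted line can be evaluated by hand. The following is a minimal sketch, assuming the coefficient values printed above; the helper name predict_tip is hypothetical and is not part of statsmodels.

```python
# Coefficients copied from the results.params output above.
INTERCEPT = 0.920270
SLOPE = 0.105025  # coefficient of total_bill


def predict_tip(total_bill):
    """Evaluate the fitted line tip = INTERCEPT + SLOPE * total_bill.

    predict_tip is a hypothetical helper used only for illustration.
    """
    return INTERCEPT + SLOPE * total_bill


# Predicted tip for a 20-dollar bill.
print(round(predict_tip(20.0), 2))

# Raising the bill by one dollar raises the predicted tip by SLOPE.
print(round(predict_tip(21.0) - predict_tip(20.0), 6))
```

With the fitted results object at hand, calling results.predict on a DataFrame that has a total_bill column should give the same values without copying the coefficients manually.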
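Multiple linear regression model
statsmodels automatically creates dummy variables for categorical predictors and drops one reference level from each of them to avoid perfect multicollinearity. For example, sex has two categories, Male and Female. The first category, Male, is chosen as the reference level, so no dummy column is created for it. Its effect is absorbed into the intercept, and the coefficient sex[T.Female] measures the difference relative to Male.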
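The idea of dropping a reference level can be sketched in plain Python. This is only an illustration of this style of dummy coding, not the code statsmodels actually runs; the helper treatment_code is hypothetical, and the level order (Male first, then Female) is an assumption taken from the example above.

```python
def treatment_code(values, levels):
    """Dummy-code a categorical column, using the first level as reference.

    treatment_code is a hypothetical helper. It only imitates the column
    naming seen in a statsmodels summary, for example [T.Female].
    """
    others = levels[1:]  # the reference level levels[0] gets no column
    return [
        {f"[T.{level}]": int(value == level) for level in others}
        for value in values
    ]


sex = ["Female", "Male", "Male", "Female"]
rows = treatment_code(sex, levels=["Male", "Female"])

# Each observation keeps one 0/1 column per non-reference level.
for value, row in zip(sex, rows):
    print(value, row)
```

A variable with k levels therefore contributes k - 1 columns. Keeping all k columns next to an intercept would make the columns sum to the intercept column, which is the multicollinearity that dropping the reference level avoids.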
import statsmodels.formula.api as smf
import seaborn as sns
import pandas as pd
tips=sns.load_dataset('tips')
print(tips.head())
print('----'*10)  # print a divider to separate the outputs
# Use plus signs to pass in multiple predictors
model=smf.ols(formula='tip~total_bill+size+sex+smoker+day+time',data=tips)
results=model.fit()
print(results.summary())
print('----'*10)
print(results.params)
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
----------------------------------------
                            OLS Regression Results
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.470
Model:                            OLS   Adj. R-squared:                  0.452
Method:                 Least Squares   F-statistic:                     26.06
Date:                Wed, 05 Jan 2022   Prob (F-statistic):           1.20e-28
Time:                        16:27:12   Log-Likelihood:                -347.48
No. Observations:                 244   AIC:                             713.0
Df Residuals:                     235   BIC:                             744.4
Df Model:                           8
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.5908      0.256      2.310      0.022       0.087       1.095
sex[T.Female]      0.0324      0.142      0.229      0.819      -0.247       0.311
smoker[T.No]       0.0864      0.147      0.589      0.556      -0.202       0.375
day[T.Fri]         0.1623      0.393      0.412      0.680      -0.613       0.937
day[T.Sat]         0.0408      0.471      0.087      0.931      -0.886       0.968
day[T.Sun]         0.1368      0.472      0.290      0.772      -0.793       1.066
time[T.Dinner]    -0.0681      0.445     -0.153      0.878      -0.944       0.808
total_bill         0.0945      0.010      9.841      0.000       0.076       0.113
size               0.1760      0.090      1.966      0.051      -0.000       0.352
==============================================================================
Omnibus:                       27.860   Durbin-Watson:                   2.096
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               52.555
Skew:                           0.607   Prob(JB):                     3.87e-12
Kurtosis:                       4.923   Cond. No.                         281.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
----------------------------------------
Intercept         0.590837
sex[T.Female]     0.032441
smoker[T.No]      0.086408
day[T.Fri]        0.162259
day[T.Sat]        0.040801
day[T.Sun]        0.136779
time[T.Dinner]   -0.068129
total_bill        0.094487
size              0.175992
dtype: float64
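As a rough check on how these coefficients combine, the prediction for a single row can be computed by hand. The sketch below assumes the coefficient values printed above and uses the first row of tips.head() (Female, non-smoker, Sun, Dinner, total_bill 16.99, size 2). The dictionary layout is an illustrative choice, not a statsmodels data structure.

```python
# Coefficients copied from the results.params output above.
params = {
    "Intercept": 0.590837,
    "sex[T.Female]": 0.032441,
    "smoker[T.No]": 0.086408,
    "day[T.Fri]": 0.162259,
    "day[T.Sat]": 0.040801,
    "day[T.Sun]": 0.136779,
    "time[T.Dinner]": -0.068129,
    "total_bill": 0.094487,
    "size": 0.175992,
}

# First row of tips.head(), dummy-coded by hand: a dummy is 1 when the
# row belongs to that level and 0 otherwise; Intercept is always 1.
row = {
    "Intercept": 1,
    "sex[T.Female]": 1,
    "smoker[T.No]": 1,
    "day[T.Fri]": 0,
    "day[T.Sat]": 0,
    "day[T.Sun]": 1,
    "time[T.Dinner]": 1,
    "total_bill": 16.99,
    "size": 2,
}

# The prediction is the sum of coefficient times value over all terms.
predicted_tip = sum(params[name] * row[name] for name in params)
print(round(predicted_tip, 4))
```

Reference levels such as Male do not appear in either dictionary: a row at the reference level simply has every dummy of that variable set to 0, so only the intercept accounts for it.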