Linear Regression

Import the required packages and create the data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm
import statsmodels.graphics.api as smg
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline
import seaborn as sns


x = [[0.18], [0.24], [0.3], [0.36], [0.41], [0.48], [0.49], [0.6], [0.62], [0.64], [0.71], [0.72], [0.82], [0.83], [0.96], [0.98],
     [1.0], [1.19], [1.2], [1.2], [1.24], [1.24], [1.25], [1.28], [1.29], [1.42], [1.49], [1.64], [1.96], [1.99], [2.0], [2.09],
     [2.38], [2.39], [2.4], [2.48], [2.5], [2.51], [2.56], [2.84], [3.92], [4.0], [4.76], [5.0]]
y = [[0.12], [0.16], [0.20], [0.24], [0.3], [0.32], [0.39], [0.4], [0.41], [0.45], [0.48], [0.5], [0.59], [0.6], [0.64], [0.67],
     [0.69], [0.7], [0.78], [0.8], [0.81], [0.82], [0.89], [0.9], [0.99], [1.1], [1.11], [1.15], [1.19], [1.2], [1.29], [1.34],
     [1.35], [1.38], [1.39], [1.56], [1.6], [1.64], [1.8], [2.36], [2.39], [2.4], [2.68], [2.76]]

Fit a linear regression:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x, y)
print("Slope:", model.coef_[0][0])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y = {model.coef_[0][0]}x + {model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][0]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x), 'r')
plt.show()
print("R^2:", r2_score(y, model.predict(x)))
Slope: 0.5799536736145978
Intercept: 0.1049651997671357
Fitted function: y = 0.5799536736145978x + 0.1049651997671357

(figure: scatter of the data with the fitted line)

R^2: 0.964665988316247

We can also fit the model with statsmodels (import statsmodels.api as sm):

X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x])})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result = model.fit()
print(result.params)
print("Intercept:", result.params['c'])
print("Slope:", result.params['x'])
c    0.104965
x    0.579954
dtype: float64
Intercept: 0.10496519976713606
Slope: 0.5799536736145978

The degree-1 fit does not look great, so we can try a higher-degree polynomial (still a linear model in its coefficients):

from sklearn.preprocessing import PolynomialFeatures

Degree 2:

poly = PolynomialFeatures(2, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("Degree-1 coefficient:", model.coef_[0][1])
print("Degree-2 coefficient:", model.coef_[0][2])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y = {model.coef_[0][2]}x^2 + {model.coef_[0][1]}x + {model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("R^2:", r2_score(y, model.predict(x_poly)))
Degree-1 coefficient: 0.7469011589908677
Degree-2 coefficient: -0.036181545253088934
Intercept: -0.021172564638651803
Fitted function: y = -0.036181545253088934x^2 + 0.7469011589908677x + -0.021172564638651803

(figure: scatter of the data with the fitted degree-2 curve)

R^2: 0.973319174087569

This fit is equivalent to fitting with a pipeline:

model = Pipeline([('poly', PolynomialFeatures(2)), ('linear', LinearRegression())])
model = model.fit(x, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x), 'g')
plt.show()
print("R^2:", r2_score(y, model.predict(x)))

(figure: scatter of the data with the degree-2 pipeline fit)

R^2: 0.973319174087569

model = make_pipeline(PolynomialFeatures(5), LinearRegression())
model = model.fit(x, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x), 'g')
plt.show()
print("R^2:", r2_score(y, model.predict(x)))

(figure: scatter of the data with the degree-5 pipeline fit)

R^2: 0.9758071614095045

Whether you use Pipeline or make_pipeline, the goal is the same: to chain several transformers and an estimator together. In the code above, PolynomialFeatures is the transformer and LinearRegression is the estimator.
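A quick sketch of that equivalence on toy data (the data below is illustrative): both constructions fit the same steps and differ only in how the step names are chosen.

```python
import numpy as np
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

xd = np.linspace(0, 5, 20).reshape(-1, 1)
yd = 0.6 * xd.ravel() + 0.1

# Pipeline takes explicit (name, step) pairs; make_pipeline derives the
# names automatically from the lowercased class names.
p1 = Pipeline([('poly', PolynomialFeatures(2)), ('linear', LinearRegression())])
p2 = make_pipeline(PolynomialFeatures(2), LinearRegression())

p1.fit(xd, yd)
p2.fit(xd, yd)
print(list(p2.named_steps))  # ['polynomialfeatures', 'linearregression']
```

Because both pipelines run identical steps on identical data, their predictions coincide.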

statsmodels (import statsmodels.api as sm) gives the same result:

X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x]),
                  "x^2": pd.Series([_[0] for _ in x])**2})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result = model.fit()
print(result.params)
print("Intercept:", result.params['c'])
print("Degree-1 coefficient:", result.params['x'])
print("Degree-2 coefficient:", result.params['x^2'])
c     -0.021173
x      0.746901
x^2   -0.036182
dtype: float64
Intercept: -0.02117256463865172
Degree-1 coefficient: 0.7469011589908683
Degree-2 coefficient: -0.036181545253089045

Degree 3:

poly = PolynomialFeatures(3, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("Degree-1 coefficient:", model.coef_[0][1])
print("Degree-2 coefficient:", model.coef_[0][2])
print("Degree-3 coefficient:", model.coef_[0][3])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y = {model.coef_[0][3]}x^3 + {model.coef_[0][2]}x^2 + {model.coef_[0][1]}x + {model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("R^2:", r2_score(y, model.predict(x_poly)))
Degree-1 coefficient: 0.6121695193895076
Degree-2 coefficient: 0.035088817026739584
Degree-3 coefficient: -0.009720786309123671
Intercept: 0.03716717408435666
Fitted function: y = -0.009720786309123671x^3 + 0.035088817026739584x^2 + 0.6121695193895076x + 0.03716717408435666

(figure: scatter of the data with the fitted degree-3 curve)

R^2: 0.9741082686199946

Degree 4:

poly = PolynomialFeatures(4, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("Degree-1 coefficient:", model.coef_[0][1])
print("Degree-2 coefficient:", model.coef_[0][2])
print("Degree-3 coefficient:", model.coef_[0][3])
print("Degree-4 coefficient:", model.coef_[0][4])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y = {model.coef_[0][4]}x^4 + {model.coef_[0][3]}x^3 + {model.coef_[0][2]}x^2 + {model.coef_[0][1]}x + {model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][4]*X**4 + model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("R^2:", r2_score(y, model.predict(x_poly)))
Degree-1 coefficient: 0.8237407976439868
Degree-2 coefficient: -0.15713093509088588
Degree-3 coefficient: 0.05250537654853375
Degree-4 coefficient: -0.00641270191634358
Intercept: -0.025611612440094467
Fitted function: y = -0.00641270191634358x^4 + 0.05250537654853375x^3 + -0.15713093509088588x^2 + 0.8237407976439868x + -0.025611612440094467

(figure: scatter of the data with the fitted degree-4 curve)

R^2: 0.9747156306781533

Degree 5:

poly = PolynomialFeatures(5, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("Degree-1 coefficient:", model.coef_[0][1])
print("Degree-2 coefficient:", model.coef_[0][2])
print("Degree-3 coefficient:", model.coef_[0][3])
print("Degree-4 coefficient:", model.coef_[0][4])
print("Degree-5 coefficient:", model.coef_[0][5])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y = {model.coef_[0][5]}x^5 + {model.coef_[0][4]}x^4 + {model.coef_[0][3]}x^3 + {model.coef_[0][2]}x^2 + {model.coef_[0][1]}x + {model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][5]*X**5 + model.coef_[0][4]*X**4 + model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("R^2:", r2_score(y, model.predict(x_poly)))
Degree-1 coefficient: 1.502446352683112
Degree-2 coefficient: -1.0642795452335645
Degree-3 coefficient: 0.5350710566181669
Degree-4 coefficient: -0.11523898348823729
Degree-5 coefficient: 0.008694307907109533
Intercept: -0.17266066636826705
Fitted function: y = 0.008694307907109533x^5 + -0.11523898348823729x^4 + 0.5350710566181669x^3 + -1.0642795452335645x^2 + 1.502446352683112x + -0.17266066636826705

(figure: scatter of the data with the fitted degree-5 curve)

R^2: 0.9758071614095045

It seems that the higher the polynomial degree, the better the fit on the training data.
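That is expected on training data: each higher-degree model nests the lower-degree ones, so the in-sample R^2 cannot decrease as the degree grows. A small sketch on synthetic data (the data-generating process below is an assumption for illustration, not the dataset above):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
xd = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
yd = 0.58 * xd.ravel() + 0.1 + rng.normal(0, 0.1, 40)

scores = []
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(xd, yd)
    scores.append(r2_score(yd, model.predict(xd)))

# Each higher-degree feature set contains the lower-degree one, so the
# least-squares training R^2 can only rise (or stay flat) with the degree.
print(scores)
```

A rising training R^2 therefore says little about which model generalizes better, which is why the residual checks below are needed.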

Let's compare the degree-5 fit against the quadratic fit from earlier (stored in result):

X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x]), "x^2": pd.Series([_[0] for _ in x])**2,
                  "x^3": pd.Series([_[0] for _ in x])**3, "x^4": pd.Series([_[0] for _ in x])**4, "x^5": pd.Series([_[0] for _ in x])**5})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result5 = model.fit()
print(result5.params)
print("Intercept:", result5.params['c'])
print("Degree-1 coefficient:", result5.params['x'])
print("Degree-2 coefficient:", result5.params['x^2'])
print("Degree-3 coefficient:", result5.params['x^3'])
print("Degree-4 coefficient:", result5.params['x^4'])
print("Degree-5 coefficient:", result5.params['x^5'])
c     -0.172661
x      1.502446
x^2   -1.064280
x^3    0.535071
x^4   -0.115239
x^5    0.008694
dtype: float64
Intercept: -0.17266066636842506
Degree-1 coefficient: 1.502446352683135
Degree-2 coefficient: -1.0642795452334017
Degree-3 coefficient: 0.5350710566180569
Degree-4 coefficient: -0.11523898348821326
Degree-5 coefficient: 0.00869430790710762
result.summary()

                      OLS Regression Results
Dep. Variable:              y     R-squared:            0.974
Model:                    OLS     Adj. R-squared:       0.973
Method:         Least Squares     F-statistic:          768.3
Date:        Thu, 24 Sep 2020     Prob (F-statistic): 3.18e-33
Time:                10:21:01     Log-Likelihood:      34.613
No. Observations:          44     AIC:                 -63.23
Df Residuals:              41     BIC:                 -57.87
Df Model:                   2
Covariance Type:    nonrobust

         coef   std err        t    P>|t|   [0.025   0.975]
c     -0.0212     0.046   -0.462    0.647   -0.114    0.071
x      0.7469     0.049   15.275    0.000    0.648    0.846
x^2   -0.0362     0.010   -3.584    0.001   -0.057   -0.016

Omnibus:       49.431     Durbin-Watson:      1.073
Prob(Omnibus):  0.000     Jarque-Bera (JB): 316.704
Skew:           2.578     Prob(JB):        1.69e-69
Kurtosis:      15.090     Cond. No.            27.2

result5.summary()

                      OLS Regression Results
Dep. Variable:              y     R-squared:            0.976
Model:                    OLS     Adj. R-squared:       0.973
Method:         Least Squares     F-statistic:          314.1
Date:        Thu, 24 Sep 2020     Prob (F-statistic): 8.20e-30
Time:                10:21:14     Log-Likelihood:      36.714
No. Observations:          44     AIC:                 -61.43
Df Residuals:              38     BIC:                 -50.72
Df Model:                   5
Covariance Type:    nonrobust

         coef   std err        t    P>|t|   [0.025   0.975]
c     -0.1727     0.149   -1.162    0.252   -0.473    0.128
x      1.5024     0.584    2.573    0.014    0.320    2.685
x^2   -1.0643     0.732   -1.453    0.154   -2.547    0.418
x^3    0.5351     0.379    1.412    0.166   -0.232    1.302
x^4   -0.1152     0.084   -1.365    0.180   -0.286    0.056
x^5    0.0087     0.007    1.293    0.204   -0.005    0.022

Omnibus:       38.240     Durbin-Watson:      1.163
Prob(Omnibus):  0.000     Jarque-Bera (JB): 157.238
Skew:           2.023     Prob(JB):        7.18e-35
Kurtosis:      11.331     Cond. No.        3.88e+04

Judging by R-squared, result5 (0.976) scores higher than result (0.974), but that alone is weak evidence that the degree-5 fit is better: training R-squared never decreases when terms are added, and we have not yet examined the residuals.

fig, ax = plt.subplots()
ref = np.linspace(-2., 2.1, 100)  # separate name so the data variable x is not clobbered
plt.plot(ref, ref, 'g')  # reference line y = x
smg.qqplot(result.resid, ax=ax, color='b')
smg.qqplot(result5.resid, ax=ax, color='r')
fig.tight_layout()

(figure: QQ plots of the residuals of result (blue) and result5 (red) against the green y = x reference line)

This plot also suggests, if only slightly, that the red points (result5) sit a bit closer to the green reference line than the blue points (result), so the degree-5 fit can be considered marginally better than the quadratic one.
