Import the required packages and create the data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm
import statsmodels.graphics.api as smg
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline
import seaborn as sns
x = [[0.18], [0.24], [0.3], [0.36], [0.41], [0.48], [0.49], [0.6], [0.62], [0.64], [0.71], [0.72], [0.82], [0.83], [0.96], [0.98],
[1.0], [1.19], [1.2], [1.2], [1.24], [1.24], [1.25], [1.28], [1.29], [1.42], [1.49], [1.64], [1.96], [1.99], [2.0], [2.09],
[2.38], [2.39], [2.4], [2.48], [2.5], [2.51], [2.56], [2.84], [3.92], [4.0], [4.76], [5.0]]
y = [[0.12], [0.16], [0.20], [0.24], [0.3], [0.32], [0.39], [0.4], [0.41], [0.45], [0.48], [0.5], [0.59], [0.6], [0.64], [0.67],
[0.69], [0.7], [0.78], [0.8], [0.81], [0.82], [0.89], [0.9], [0.99], [1.1], [1.11], [1.15], [1.19], [1.2], [1.29], [1.34],
[1.35], [1.38], [1.39], [1.56], [1.6], [1.64], [1.8], [2.36], [2.39], [2.4], [2.68], [2.76]]
Linear regression fit:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x, y)
print("Slope:", model.coef_[0][0])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y={model.coef_[0][0]}x+{model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][0]*X+model.intercept_[0], 'g')
plt.plot(x, model.predict(x), 'r')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x)))  # r2_score expects (y_true, y_pred)
Slope: 0.5799536736145978
Intercept: 0.1049651997671357
Fitted function: y=0.5799536736145978x+0.1049651997671357
Fit quality (R^2): 0.964665988316247
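Note that `r2_score` expects its arguments in the order `(y_true, y_pred)`; swapping them generally changes the value, because R^2 is not symmetric in its inputs. A minimal sketch with made-up numbers (not the data above):

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.9]

# R^2 measures how much variance of y_true is explained by y_pred,
# so the argument order matters.
print(r2_score(y_true, y_pred))
print(r2_score(y_pred, y_true))  # generally a different number
```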
We can also do the fit with import statsmodels.api as sm:
X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x])})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result = model.fit()
print(result.params)
print("Intercept:", result.params['c'])
print("Slope:", result.params['x'])
c 0.104965
x 0.579954
dtype: float64
Intercept: 0.10496519976713606
Slope: 0.5799536736145978
The first-degree fit does not look great, so we can try higher-degree polynomial fits:
from sklearn.preprocessing import PolynomialFeatures
Quadratic:
poly = PolynomialFeatures(2, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("First-degree coefficient:", model.coef_[0][1])
print("Second-degree coefficient:", model.coef_[0][2])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y={model.coef_[0][2]}x^2+{model.coef_[0][1]}x+{model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x_poly)))
First-degree coefficient: 0.7469011589908677
Second-degree coefficient: -0.036181545253088934
Intercept: -0.021172564638651803
Fitted function: y=-0.036181545253088934x^2+0.7469011589908677x+-0.021172564638651803
Fit quality (R^2): 0.973319174087569
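To make explicit what `PolynomialFeatures` does here: `fit_transform` expands each input column into its powers up to the requested degree, plus a bias column when `include_bias=True`. A minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0], [3.0]])
poly_demo = PolynomialFeatures(2, include_bias=True)

# Each row [x] becomes [1, x, x^2]
print(poly_demo.fit_transform(X_demo))
```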
This fit is equivalent to fitting with a pipeline:
model = Pipeline([('poly', PolynomialFeatures(2)), ('linear', LinearRegression())])
model = model.fit(x, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x), 'g')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x)))
Fit quality (R^2): 0.973319174087569
model = make_pipeline(PolynomialFeatures(5), LinearRegression())
model = model.fit(x, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x), 'g')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x)))
Fit quality (R^2): 0.9758071614095045
Whether you use Pipeline or make_pipeline, the goal is the same: to chain transformers and estimators together. In the code above, PolynomialFeatures is the transformer and LinearRegression is the estimator.
import statsmodels.api as sm gives the same result here as well:
X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x]), "x^2": pd.Series([_[0] for _ in x])**2})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result = model.fit()
print(result.params)
print("Intercept:", result.params['c'])
print("First-degree coefficient:", result.params['x'])
print("Second-degree coefficient:", result.params['x^2'])
c -0.021173
x 0.746901
x^2 -0.036182
dtype: float64
Intercept: -0.02117256463865172
First-degree coefficient: 0.7469011589908683
Second-degree coefficient: -0.036181545253089045
Cubic:
poly = PolynomialFeatures(3, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("First-degree coefficient:", model.coef_[0][1])
print("Second-degree coefficient:", model.coef_[0][2])
print("Third-degree coefficient:", model.coef_[0][3])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y={model.coef_[0][3]}x^3+{model.coef_[0][2]}x^2+{model.coef_[0][1]}x+{model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x_poly)))
First-degree coefficient: 0.6121695193895076
Second-degree coefficient: 0.035088817026739584
Third-degree coefficient: -0.009720786309123671
Intercept: 0.03716717408435666
Fitted function: y=-0.009720786309123671x^3+0.035088817026739584x^2+0.6121695193895076x+0.03716717408435666
Fit quality (R^2): 0.9741082686199946
Quartic:
poly = PolynomialFeatures(4, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("First-degree coefficient:", model.coef_[0][1])
print("Second-degree coefficient:", model.coef_[0][2])
print("Third-degree coefficient:", model.coef_[0][3])
print("Fourth-degree coefficient:", model.coef_[0][4])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y={model.coef_[0][4]}x^4+{model.coef_[0][3]}x^3+{model.coef_[0][2]}x^2+{model.coef_[0][1]}x+{model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][4]*X**4 + model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x_poly)))
First-degree coefficient: 0.8237407976439868
Second-degree coefficient: -0.15713093509088588
Third-degree coefficient: 0.05250537654853375
Fourth-degree coefficient: -0.00641270191634358
Intercept: -0.025611612440094467
Fitted function: y=-0.00641270191634358x^4+0.05250537654853375x^3+-0.15713093509088588x^2+0.8237407976439868x+-0.025611612440094467
Fit quality (R^2): 0.9747156306781533
Quintic:
poly = PolynomialFeatures(5, include_bias=True)
x_poly = poly.fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)
print("First-degree coefficient:", model.coef_[0][1])
print("Second-degree coefficient:", model.coef_[0][2])
print("Third-degree coefficient:", model.coef_[0][3])
print("Fourth-degree coefficient:", model.coef_[0][4])
print("Fifth-degree coefficient:", model.coef_[0][5])
print("Intercept:", model.intercept_[0])
print(f"Fitted function: y={model.coef_[0][5]}x^5+{model.coef_[0][4]}x^4+{model.coef_[0][3]}x^3+{model.coef_[0][2]}x^2+{model.coef_[0][1]}x+{model.intercept_[0]}")
plt.scatter(x, y)
X = np.linspace(0, 5.5, 100)
plt.plot(X, model.coef_[0][5]*X**5 + model.coef_[0][4]*X**4 + model.coef_[0][3]*X**3 + model.coef_[0][2]*X**2 + model.coef_[0][1]*X + model.intercept_[0], 'g')
plt.plot(x, model.predict(x_poly), 'r')
plt.show()
print("Fit quality (R^2):", r2_score(y, model.predict(x_poly)))
First-degree coefficient: 1.502446352683112
Second-degree coefficient: -1.0642795452335645
Third-degree coefficient: 0.5350710566181669
Fourth-degree coefficient: -0.11523898348823729
Fifth-degree coefficient: 0.008694307907109533
Intercept: -0.17266066636826705
Fitted function: y=0.008694307907109533x^5+-0.11523898348823729x^4+0.5350710566181669x^3+-1.0642795452335645x^2+1.502446352683112x+-0.17266066636826705
Fit quality (R^2): 0.9758071614095045
It appears that the higher the polynomial degree, the better the fit.
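Caution: these R^2 values are all computed on the training data, where a higher-degree nested model can never score worse, even when the extra terms only fit noise. A quick hedge is to score on held-out folds instead; a sketch with made-up, truly linear data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 5, 60).reshape(-1, 1)
y_demo = 0.58 * x_demo.ravel() + 0.10 + rng.normal(0, 0.05, 60)  # truly linear

means = {}
for degree in (1, 2, 5):
    pipe = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # mean R^2 over 5 held-out folds, not on the training points
    means[degree] = cross_val_score(pipe, x_demo, y_demo, cv=5, scoring='r2').mean()
    print(degree, means[degree])
```

On data like this, the extra degrees buy little or nothing out of sample, and with small noisy samples they often hurt.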
Let's compare the quintic fit with the quadratic fit (note that result above holds the quadratic OLS fit, not the first-degree one):
X = pd.DataFrame({"c": np.ones(len(x)), "x": pd.Series([_[0] for _ in x]), "x^2": pd.Series([_[0] for _ in x])**2,
"x^3": pd.Series([_[0] for _ in x])**3, "x^4": pd.Series([_[0] for _ in x])**4, "x^5": pd.Series([_[0] for _ in x])**5})
model = sm.OLS(pd.Series([_[0] for _ in y]), X)
result5 = model.fit()
print(result5.params)
print("Intercept:", result5.params['c'])
print("First-degree coefficient:", result5.params['x'])
print("Second-degree coefficient:", result5.params['x^2'])
print("Third-degree coefficient:", result5.params['x^3'])
print("Fourth-degree coefficient:", result5.params['x^4'])
print("Fifth-degree coefficient:", result5.params['x^5'])
c -0.172661
x 1.502446
x^2 -1.064280
x^3 0.535071
x^4 -0.115239
x^5 0.008694
dtype: float64
Intercept: -0.17266066636842506
First-degree coefficient: 1.502446352683135
Second-degree coefficient: -1.0642795452334017
Third-degree coefficient: 0.5350710566180569
Fourth-degree coefficient: -0.11523898348821326
Fifth-degree coefficient: 0.00869430790710762
result.summary()
| Dep. Variable: | y | R-squared: | 0.974 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.973 |
| Method: | Least Squares | F-statistic: | 768.3 |
| Date: | Thu, 24 Sep 2020 | Prob (F-statistic): | 3.18e-33 |
| Time: | 10:21:01 | Log-Likelihood: | 34.613 |
| No. Observations: | 44 | AIC: | -63.23 |
| Df Residuals: | 41 | BIC: | -57.87 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| c | -0.0212 | 0.046 | -0.462 | 0.647 | -0.114 | 0.071 |
| x | 0.7469 | 0.049 | 15.275 | 0.000 | 0.648 | 0.846 |
| x^2 | -0.0362 | 0.010 | -3.584 | 0.001 | -0.057 | -0.016 |

| Omnibus: | 49.431 | Durbin-Watson: | 1.073 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 316.704 |
| Skew: | 2.578 | Prob(JB): | 1.69e-69 |
| Kurtosis: | 15.090 | Cond. No. | 27.2 |
result5.summary()
| Dep. Variable: | y | R-squared: | 0.976 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.973 |
| Method: | Least Squares | F-statistic: | 314.1 |
| Date: | Thu, 24 Sep 2020 | Prob (F-statistic): | 8.20e-30 |
| Time: | 10:21:14 | Log-Likelihood: | 36.714 |
| No. Observations: | 44 | AIC: | -61.43 |
| Df Residuals: | 38 | BIC: | -50.72 |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| c | -0.1727 | 0.149 | -1.162 | 0.252 | -0.473 | 0.128 |
| x | 1.5024 | 0.584 | 2.573 | 0.014 | 0.320 | 2.685 |
| x^2 | -1.0643 | 0.732 | -1.453 | 0.154 | -2.547 | 0.418 |
| x^3 | 0.5351 | 0.379 | 1.412 | 0.166 | -0.232 | 1.302 |
| x^4 | -0.1152 | 0.084 | -1.365 | 0.180 | -0.286 | 0.056 |
| x^5 | 0.0087 | 0.007 | 1.293 | 0.204 | -0.005 | 0.022 |

| Omnibus: | 38.240 | Durbin-Watson: | 1.163 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 157.238 |
| Skew: | 2.023 | Prob(JB): | 7.18e-35 |
| Kurtosis: | 11.331 | Cond. No. | 3.88e+04 |
Comparing the R-squared values, result5's (0.976) is higher than result's (0.974), but that alone is not strong evidence that the quintic fit beats the quadratic fit, because we have not yet checked the residuals.
fig, ax = plt.subplots()
line = np.linspace(-2.0, 2.1, 100)  # reference line y = x; avoid reusing the name x, which still holds the data
ax.plot(line, line, 'g')
smg.qqplot(result.resid, ax=ax, color='b')
smg.qqplot(result5.resid, ax=ax, color='r')
fig.tight_layout()
plt.show()
This plot also suggests that the red points (quintic residuals) sit slightly closer to the green reference line than the blue points (quadratic residuals), so we can tentatively conclude that the quintic fit is somewhat better than the quadratic one.
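Besides the visual QQ comparison, residual normality can also be checked numerically, for example with the Jarque-Bera statistic that already appears in the summary tables above. A sketch on synthetic residual-like samples (not the fitted residuals):

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)
resid_normal = rng.normal(0.0, 1.0, 300)        # roughly normal "residuals"
resid_skewed = rng.exponential(1.0, 300) - 1.0  # heavily skewed "residuals"

# jarque_bera returns (statistic, p-value, skew, kurtosis);
# a large statistic / tiny p-value rejects normality of the residuals
for name, r in [("normal", resid_normal), ("skewed", resid_skewed)]:
    jb, p, skew, kurt = jarque_bera(r)
    print(name, jb, p)
```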