Linear Regression
1. Import the linear regression model
from sklearn import linear_model
linreg = linear_model.LinearRegression()
2. Load the Diabetes dataset and split it into a training set and a test set (the last 20 samples are held out for testing)
from sklearn import datasets
diabetes = datasets.load_diabetes()
x_train = diabetes.data[:-20]
y_train = diabetes.target[:-20]
x_test = diabetes.data[-20:]
y_test = diabetes.target[-20:]
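The slicing above always holds out the last 20 rows. As an alternative sketch (not the approach used in these notes), scikit-learn's train_test_split can draw a shuffled hold-out set of the same size. The random_state value and the *_rand variable names are arbitrary choices made for this illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()

# test_size=20 (an int) asks for exactly 20 test samples;
# random_state=0 is an arbitrary seed so the split is repeatable.
x_train_rand, x_test_rand, y_train_rand, y_test_rand = train_test_split(
    diabetes.data, diabetes.target, test_size=20, random_state=0
)

print(x_train_rand.shape, x_test_rand.shape)
```

A shuffled split will generally give a different score from the one reported in step 6, because the test rows differ.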
3. Train the estimator created in step 1 by calling fit() on the training data
linreg.fit(x_train,y_train)
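To make the training step less of a black box, here is a minimal sketch of an ordinary least-squares fit done directly with NumPy and compared against LinearRegression. The names design, manual_coef and manual_intercept are introduced only for this illustration, and the sketch does not claim to reproduce scikit-learn's internal implementation, only its result.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x_train = diabetes.data[:-20]
y_train = diabetes.target[:-20]

# Append a column of ones so the intercept is estimated with the coefficients.
design = np.hstack([x_train, np.ones((x_train.shape[0], 1))])

# Solve the least-squares problem: minimize ||design @ w - y_train||^2.
w, residuals, rank, singular_values = np.linalg.lstsq(design, y_train, rcond=None)
manual_coef = w[:-1]
manual_intercept = w[-1]

linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)

# Both routes should agree up to floating-point error.
coef_matches = np.allclose(manual_coef, linreg.coef_)
intercept_matches = np.isclose(manual_intercept, linreg.intercept_)
print(coef_matches, intercept_matches)
```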
4. Read the fitted model's coef_ attribute to get the regression coefficient b for each feature
linreg.coef_
print(linreg.coef_)
Output (b values):
[ 3.03499549e-01 -2.37639315e+02 5.10530605e+02 3.27736980e+02
-8.14131709e+02 4.92814588e+02 1.02848452e+02 1.84606489e+02
7.43519617e+02 7.60951722e+01]
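The raw array is easier to read when each coefficient is paired with its feature name. The sketch below uses the feature_names attribute of the loaded dataset; coef_by_feature is a name introduced here for illustration.

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
linreg = linear_model.LinearRegression()
linreg.fit(diabetes.data[:-20], diabetes.target[:-20])

# Pair each feature name with its fitted coefficient, in column order.
coef_by_feature = dict(zip(diabetes.feature_names, linreg.coef_))

for name, b in coef_by_feature.items():
    print(f"{name:>4}: {b:10.3f}")
```

Because the columns of this dataset are already centered and scaled, the magnitudes of the coefficients can be compared with one another more directly than they could be for raw, unscaled features.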
5. Call predict() on the fitted model linreg and compare the predictions with the actual values
linreg.predict(x_test)
print(linreg.predict(x_test))
print(y_test)
Predicted values:
[197.61846908 155.43979328 172.88665147 111.53537279 164.80054784
131.06954875 259.12237761 100.47935157 117.0601052 124.30503555
218.36632793 61.19831284 132.25046751 120.3332925 52.54458691
194.03798088 102.57139702 123.56604987 211.0346317 52.60335674]
Actual values:
[233. 91. 111. 152. 120. 67. 310. 94. 183. 66. 173. 72. 49. 64.
48. 178. 104. 132. 220. 57.]
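Comparing two arrays by eye is hard, so a numeric summary of the differences can help. The following is a small sketch that computes the mean absolute error and the root mean squared error with NumPy; these two metrics are an addition for illustration and are not part of the original walkthrough.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x_train, y_train = diabetes.data[:-20], diabetes.target[:-20]
x_test, y_test = diabetes.data[-20:], diabetes.target[-20:]

linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)
y_pred = linreg.predict(x_test)

# Per-sample difference between prediction and actual value.
errors = y_pred - y_test

mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Both values are in the same units as the target, which makes them easier to interpret than the unitless score in the next step.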
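6. Evaluate the predictions with score(), which returns the coefficient of determination R^2 (the closer to 1, the better the fit)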
linreg.score(x_test,y_test)
print(linreg.score(x_test,y_test))
0.5850753022690574
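For a regressor, score() returns R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the actual values. The sketch below recomputes it by hand and checks it against score(); the names ss_res, ss_tot and r2_manual are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x_train, y_train = diabetes.data[:-20], diabetes.target[:-20]
x_test, y_test = diabetes.data[-20:], diabetes.target[-20:]

linreg = linear_model.LinearRegression()
linreg.fit(x_train, y_train)
y_pred = linreg.predict(x_test)

ss_res = np.sum((y_test - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total variation
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = linreg.score(x_test, y_test)
print(r2_manual, r2_sklearn)
```

Note that R^2 can be negative on a test set when the model predicts worse than simply using the mean of the actual values.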
Analyze the regression between a single feature and the target, and plot the result
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import datasets
diabetes = datasets.load_diabetes()
x_train = diabetes.data[:-20]
y_train = diabetes.target[:-20]
x_test = diabetes.data[-20:]
y_test = diabetes.target[-20:]
x0_test = x_test[:,0]
x0_train = x_train[:,0]
x0_test = x0_test[:,np.newaxis]
x0_train = x0_train[:,np.newaxis]
linreg = linear_model.LinearRegression()
linreg.fit(x0_train,y_train)
y = linreg.predict(x0_test)
plt.scatter(x0_test,y_test,color='blue')
plt.plot(x0_test,y,color='r',linewidth=3)
plt.show()
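The np.newaxis step in the code above is needed because fit() and predict() expect a 2-D array of shape (n_samples, n_features), while x_train[:,0] is 1-D. As a small sketch, here are three equivalent ways to obtain a single-column 2-D array; the variable names are chosen only for this illustration.

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
x_train = diabetes.data[:-20]

x0 = x_train[:, 0]                 # 1-D, shape (n_samples,)
x0_newaxis = x0[:, np.newaxis]     # 2-D, shape (n_samples, 1)
x0_reshape = x0.reshape(-1, 1)     # same result via reshape
x0_slice = x_train[:, 0:1]         # slicing keeps the column axis

print(x0.shape, x0_newaxis.shape, x0_reshape.shape, x0_slice.shape)
```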
Run a separate regression on each of the 10 features and plot the results
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import datasets
diabetes = datasets.load_diabetes()
linreg = linear_model.LinearRegression()
x_train = diabetes.data[:-20]
y_train = diabetes.target[:-20]
x_test = diabetes.data[-20:]
y_test = diabetes.target[-20:]
plt.figure(figsize=(8,12))
for f in range(0,10):
    xi_test = x_test[:,f]
    xi_train = x_train[:,f]
    xi_test = xi_test[:,np.newaxis] # add a column axis
    xi_train = xi_train[:,np.newaxis] # add a column axis
    linreg.fit(xi_train,y_train)
    y = linreg.predict(xi_test)
    plt.subplot(5,2,f+1)
    plt.scatter(xi_test,y_test,color='b')
    plt.plot(xi_test,y,color='r',linewidth=3)
plt.show()
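The ten plots can be complemented with a number per feature. As a sketch, the loop below fits one single-feature model per column and records its R^2 on the test set, then prints the features from best to worst. The scores dictionary and the ranking are additions for illustration; with only 20 test samples the ordering should be read with caution.

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x_train, y_train = diabetes.data[:-20], diabetes.target[:-20]
x_test, y_test = diabetes.data[-20:], diabetes.target[-20:]

scores = {}
for f, name in enumerate(diabetes.feature_names):
    model = linear_model.LinearRegression()
    # Indexing with [f] (a list) keeps the column axis, so no np.newaxis is needed.
    model.fit(x_train[:, [f]], y_train)
    scores[name] = model.score(x_test[:, [f]], y_test)

# Print features sorted by test-set R^2, highest first.
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>4}: {r2:7.3f}")
```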
Reference:
Fabio Nelli. Python数据分析实战: 第2版 (Python Data Analytics, 2nd ed.). Beijing: Posts & Telecom Press, 2019.11.