Imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn import datasets
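Notes:
LinearRegression is the linear regression model, abbreviated LR here.
from sklearn import datasets
datasets holds the datasets bundled with sklearn, which makes them convenient to work with.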
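Load the housing price data (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this call requires an older release)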
boston = datasets.load_boston()
data = boston['data']
target = boston['target']
Where:
data: the features (attributes) that influence the house price
target: the target values, i.e. the house prices
data.shape
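Result:
(506, 13)
That is, 506 samples with 13 attributes each.
There are as many target values as there are samples:
each sample corresponds to exactly one target value.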
target.shape
Result:
(506,)
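On scikit-learn versions where load_boston is not available, the rest of the walkthrough can still be followed with a stand-in dataset of the same shape. The sketch below is only an assumption for illustration: it uses make_regression to generate random data with 506 samples and 13 features, so the values are not real housing prices and the metrics obtained from it will not match the ones shown in this post.

```python
from sklearn.datasets import make_regression

# Synthetic stand-in with the same shape as the Boston data.
# These are random numbers, NOT real housing data.
data, target = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)

# One row of features per sample, and exactly one target per sample.
print(data.shape)    # (506, 13)
print(target.shape)  # (506,)
```

The first dimension of data and target must be equal, which is what "one sample, one target value" means in practice.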
Split the data into a training set and a test set
index = np.arange(0,506)
np.random.shuffle(index)
train = index[:406]
test = index[406:]
X_train = data[train]
Y_train = target[train]
X_test = data[test]
Y_true = target[test]
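train = index[:406]  # indices used to select the training data
test = index[406:]  # indices used to select the test data
X_train = data[train]
Y_train = target[train]
X_train and Y_train are selected with the same indices, so their rows correspond one-to-one.
X_test = data[test]
Y_true = target[test]
Y_true holds the held-out true prices, which the predicted prices are compared against.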
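Because np.random.shuffle is not seeded above, every run produces a different split and therefore slightly different metrics. A minimal sketch of a repeatable version of the same index-based split follows; the helper name split_indices and the seed value are hypothetical and not part of the original code.

```python
import numpy as np

def split_indices(n_samples, n_train, seed=0):
    """Return shuffled, non-overlapping train and test index arrays."""
    rng = np.random.default_rng(seed)    # seeded generator, so the shuffle is repeatable
    index = rng.permutation(n_samples)   # shuffled copy of 0 .. n_samples - 1
    return index[:n_train], index[n_train:]

train, test = split_indices(506, 406)
print(len(train), len(test))  # 406 100
```

The resulting train and test arrays can be used exactly like the ones above, for example data[train] and target[train]. scikit-learn also provides sklearn.model_selection.train_test_split, which does the shuffling and slicing in one call and accepts a random_state for the same purpose.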
Declare the model, then train and predict
model = LinearRegression()
model.fit(X_train,Y_train)
Y_pred = model.predict(X_test)
Y_pred
Result:
array([20.50075736, 20.72774437, 20.87802768, 26.07593572, 22.71525028,
16.92515468, 31.26798373, 27.68266563, 24.19091715, 3.21074807,
24.64495328, 22.4238403 , 33.06983933, 23.58841414, 23.0482368 ,
32.62319049, -6.32154137, 25.21415141, 20.63868884, 41.24112535,
22.3843775 , 16.879722 , 10.55782625, 37.89637563, 20.14648133,
24.36927798, 29.67241285, 25.8406856 , 13.76076023, 20.63249151,
23.83063483, 12.51935251, 20.06784866, 23.56816758, 3.51552896,
21.48507692, 20.72109649, 12.5002905 , 15.72982035, 3.62237213,
30.92930806, 16.68222375, 29.97118796, 15.31861122, 15.32386318,
15.66182274, 22.59024753, 21.78211966, 19.22998993, 23.88447986,
23.56576574, 26.84539069, 18.37990381, 21.22944698, 21.04193568,
27.97529157, 35.92040393, 13.79878956, 0.11307138, 34.14848414,
27.21442574, 13.14905896, 34.54987287, 13.16736632, 17.47280292,
21.58633347, 28.68348662, 22.82792296, 29.12485817, 32.35153257,
18.12161229, 18.97635659, 35.70416709, 26.64392257, 23.42360448,
34.70736954, 30.79135264, 30.90931709, 18.56556942, 19.53298063,
33.14057998, 31.98601864, 32.86551255, 22.85274017, 14.17267339,
22.77013423, 13.41294251, 19.09189614, 35.80394115, 27.83649619,
26.59389651, 17.78284028, 31.30005385, 20.39975022, 20.21175591,
11.98494675, 27.50218546, 23.17096047, 11.55379301, 16.07399542])
model.fit(X_train, Y_train)  # train the model on the training data
Y_pred = model.predict(X_test)  # predict prices for the test data
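To make fit and predict less of a black box: LinearRegression fits an ordinary least squares model, meaning it looks for the coefficients and intercept that minimize the sum of squared errors on the training data. The sketch below reproduces that idea with plain NumPy on small synthetic data. It illustrates the principle only and is not scikit-learn's actual implementation; all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # 50 synthetic samples, 3 features
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = X @ true_coef + true_intercept      # noise-free targets, so the fit can be exact

# "fit": append a column of ones so the intercept is learned as one more weight,
# then solve the least squares problem.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef, intercept = weights[:-1], weights[-1]

# "predict": apply the learned weights to the inputs.
y_pred = X @ coef + intercept
```

With noise-free data the recovered coef and intercept match the values used to generate y. On real data such as the housing prices, the fit is only approximate, which is why the predictions differ from Y_true.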
Y_true
Result:
array([19.5, 18.8, 20.9, 22.2, 17.4, 23.1, 32.5, 25.2, 21.4, 8.1, 21.4,
17. , 28.2, 21.4, 20. , 31.6, 7. , 23.8, 18.7, 48.8, 22.7, 19.1,
6.3, 37.6, 20.4, 23.4, 22.9, 24.1, 14.5, 21.4, 23. , 10.9, 17.1,
19.4, 8.4, 19.7, 16.7, 10.5, 15.6, 8.8, 29.4, 10.2, 23. , 15.7,
10.2, 15.6, 21.1, 18.9, 18.3, 20.1, 20.8, 23.3, 10.9, 22. , 21.7,
36.2, 33.4, 13.6, 17.9, 37.9, 23.9, 12.8, 34.9, 13.9, 15.1, 19. ,
23.7, 11.9, 26.4, 31.1, 14.5, 18.9, 35.1, 22.6, 21.7, 35.4, 29.1,
34.7, 19.8, 22.2, 33.1, 29. , 50. , 21.7, 13.1, 23.2, 17.2, 27.5,
38.7, 24.5, 22. , 19.6, 30.7, 24.1, 24.3, 13.4, 23.7, 25. , 13.8,
20.2])
Evaluating accuracy
Mean squared error (MSE)
from sklearn.metrics import mean_squared_error
mean_squared_error(Y_true,Y_pred)
Result:
19.924808455888186
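For reference, the mean squared error is simply the average of the squared differences between true and predicted values. A small sketch with made-up numbers (not the housing data) shows the computation by hand:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up values for illustration
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)      # average of the squared errors
rmse = np.sqrt(mse)                        # square root: same units as the target
print(mse)  # 0.375
```

Because the errors are squared, the MSE is in squared units of the target. Taking the square root of the value above (about 4.46) gives a typical error in the same units as the house prices.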
Mean absolute error (MAE)
from sklearn.metrics import mean_absolute_error
mean_absolute_error(Y_true,Y_pred)
Result:
3.1620001162484743
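The mean absolute error averages the absolute differences instead of the squared ones, so a single large miss influences it less than it influences the MSE. A minimal sketch with made-up numbers (not the housing data):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up values for illustration
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = np.mean(np.abs(y_true - y_pred))     # average of the absolute errors
print(mae)  # 0.5
```

The MAE is in the same units as the target, so the result above reads directly as the average size of the prediction error.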
R² (coefficient of determination)
from sklearn.metrics import r2_score
r2_score(Y_true,Y_pred)
Result:
0.7086439388540885
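R² compares the model's squared error with that of a baseline that always predicts the mean of the true values: 1.0 means perfect predictions, and 0.0 means no better than that baseline. A short sketch with made-up numbers (not the housing data) shows the formula:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up values for illustration
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)           # squared error of the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # squared error of the mean baseline
r2 = 1.0 - ss_res / ss_tot
print(r2)  # about 0.9486

# Always predicting the mean gives R² = 0 by construction.
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = 1.0 - np.sum((y_true - baseline) ** 2) / ss_tot
```

Read this way, the score of about 0.71 above says the model removes roughly 71% of the squared error that the mean-only baseline would make on the test set.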