Boston House Price Prediction: Bug Notes

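Task details
There are already plenty of introductions to this task online, so I will only briefly record the data-processing steps.
The task says to use only CRIM, RM and LSTAT as features and MEDV as the target to predict, so the first step is to load the dataset: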

# Load the Boston Housing data set from sklearn.datasets and print it
# NOTE: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
boston = load_boston()
print(boston)

Then pull out the three feature columns:

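import pandas as pd
# Build a DataFrame from the raw data, then keep only the three required features
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df_x = df[['CRIM', 'RM', 'LSTAT']]
# Wrapping the target in a DataFrame makes it 2-D (one column)
df_y = pd.DataFrame(boston.target)
print(df_x)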

As you can see, the feature data looks like this:


        CRIM     RM  LSTAT
0    0.00632  6.575   4.98
1    0.02731  6.421   9.14
2    0.02729  7.185   4.03
3    0.03237  6.998   2.94
4    0.06905  7.147   5.33
..       ...    ...    ...
501  0.06263  6.593   9.67
502  0.04527  6.120   9.08
503  0.06076  6.976   5.64
504  0.10959  6.794   6.48
505  0.04741  6.030   7.88
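
The shape of the target matters for the rest of these notes. Here is a minimal sketch of how wrapping a 1-D target in pd.DataFrame makes it 2-D. It uses made-up numbers in place of the real dataset, and the names target, y_frame and y_series are only for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for boston.target: a 1-D array of 5 made-up prices
target = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

y_frame = pd.DataFrame(target)   # 2-D: a single column named 0
y_series = pd.Series(target)     # 1-D

print(target.shape)    # (5,)
print(y_frame.shape)   # (5, 1)
print(y_series.shape)  # (5,)
```

Fitting LinearRegression on an (n, 1) target makes predict return an (n, 1) array, while a 1-D target gives 1-D predictions.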

Then split the data 70/30 into training and test sets, as required:

from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Initialize the linear regression model
reg = linear_model.LinearRegression()
# Split the data into 70% training and 30% testing data
# NOTE: We have to split both the independent variables (x) and the target, or dependent variable (y)
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.3, random_state=42)
# Train our model with the training data
reg.fit(x_train, y_train)
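
To see where the 152 test rows that appear later come from, here is a small sketch of the same split on synthetic arrays with 506 rows. X and y are placeholders, not the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the Boston data (506)
X = np.arange(506 * 3, dtype=float).reshape(506, 3)
y = np.arange(506, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 0.3 * 506 = 151.8, which scikit-learn rounds up for the test set
print(X_train.shape, X_test.shape)  # (354, 3) (152, 3)
```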

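The problems showed up at the last step, computing the evaluation metrics: whatever I tried with the pred and test data raised an error.
Here is how the errors are computed: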

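# Predict on the test set; the result has shape (152, 1)
y_pred = reg.predict(x_test)

error = []
#print(y_pred.shape)
pre = y_pred[:,0]       # take the only column of the predictions
test = y_test.values    # y_test is a DataFrame, so convert it to an ndarray first
test = test[:,0]

p = pre.tolist()
t = test.tolist()

# Per-sample error: target minus prediction
for i in range(len(y_pred)):
    error.append(t[i] - p[i])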

Here is what the pred data looks like:

(152, 1)
[[25.95010416]
 [31.04855815]
 [18.2497264 ]
 [26.3019832 ]
 [19.72042823]
......
 [16.17338714]
 [40.34645114]
 [21.0067563 ]
 [19.39310502]]

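As you can see, this is a 152-row, 1-column array, so we only need to take the whole first column for the calculation, hence pre = y_pred[:,0]. Applying the same indexing to test raises an error, because y_test is a pandas DataFrame rather than an ndarray and does not accept [:,0] inside plain square brackets.
So convert it to an ndarray first with .values, and then take the first column.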
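
The difference between indexing a DataFrame and indexing an ndarray can be sketched as follows. This uses a made-up one-column DataFrame as a stand-in for y_test. The .iloc and .ravel() variants are alternatives, not what this post uses:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for y_test: a one-column DataFrame with a shuffled index
y_test = pd.DataFrame([23.6, 32.4, 13.6], index=[173, 274, 491])

try:
    y_test[:, 0]            # NumPy-style indexing is not valid on a DataFrame
    failed = False
except Exception as exc:    # the exact exception type depends on the pandas version
    failed = True
    print(type(exc).__name__)

# Three ways to get the same 1-D array of values
a = y_test.values[:, 0]            # the approach used in this post
b = y_test.iloc[:, 0].to_numpy()   # positional indexing on the DataFrame
c = y_test.to_numpy().ravel()      # flatten the (3, 1) array
print(a.shape, np.array_equal(a, b), np.array_equal(a, c))  # (3,) True True
```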

The for loop then still raised an error. With this way of computing MAE I had to convert each ndarray to a list, so I added a tolist() conversion, and after that it worked.
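
As an alternative sketch (not the approach used in this post), once both arrays are 1-D the same errors can be computed with NumPy element-wise arithmetic, without a loop or tolist(). The values below are made up, and the flattening step is there to avoid a broadcasting pitfall:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])          # 1-D targets, shape (3,)
y_pred = np.array([[2.5], [5.0], [4.0]])    # predictions with shape (3, 1)

# Flatten the predictions first. Subtracting a (3, 1) array from a (3,)
# array would broadcast to a (3, 3) matrix instead of 3 residuals.
error = y_true - y_pred.ravel()

mae = np.mean(np.abs(error))   # mean absolute error
mse = np.mean(error ** 2)      # mean squared error
print(error.shape, mae, mse)
print((y_true - y_pred).shape)  # (3, 3): the broadcasting pitfall
```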

Next, output the MAE and MSE:

squaredError = []
absError = []
for val in error:
    squaredError.append(val * val)  # squared difference between target and prediction
    absError.append(abs(val))  # absolute value of the error


#print("Square Error: ", squaredError)
#print("Absolute Value of Error: ", absError)

print("MAE = ", sum(absError) / len(absError))  # mean absolute error (MAE)
print("MSE = ", sum(squaredError) / len(squaredError))  # mean squared error (MSE)
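
As a cross-check on the hand-written loop, scikit-learn ships mean_absolute_error and mean_squared_error in sklearn.metrics. This is a minimal sketch on made-up numbers. It is not part of the original task code, and the arrays are placeholders:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.0, 8.0])
y_pred = np.array([[2.0], [5.0], [4.0], [7.0]])   # shape (4, 1), like the predictions above

# Flatten the predictions so both inputs are 1-D
mae = mean_absolute_error(y_true, y_pred.ravel())
mse = mean_squared_error(y_true, y_pred.ravel())
print("MAE = ", mae)
print("MSE = ", mse)
```

If the library values match the loop's output on the real test set, the manual MAE and MSE calculation is behaving as intended.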

The full code is pasted below:

import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

#Load the Boston Housing Data Set from sklearn.datasets and print it
from sklearn.datasets import load_boston
boston = load_boston()
print(boston)

#Transform the data set into a data frame 
#NOTE: boston.data = the data we want, 
#      boston.feature_names = the column names of the data
#      boston.target = Our target variable or the price of the houses
df = pd.DataFrame(boston.data, columns = boston.feature_names)
df_x = df[['CRIM', 'RM','LSTAT']]
df_y = pd.DataFrame(boston.target)
print(df_x)

#Initialize the linear regression model
reg = linear_model.LinearRegression()
#Split the data into 70% training and 30% testing data
#NOTE: We have to split both the independent variables (x) and the target, or dependent variable (y)
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.3, random_state=42)
#Train our model with the training data
reg.fit(x_train, y_train)

#print our price predictions on our test data
y_pred = reg.predict(x_test)

mse = np.sum((y_test - y_pred) ** 2) / len(y_test)
mae = np.sum(np.absolute(y_test - y_pred)) / len(y_test)
print(mae, mse)

error = []
print(y_pred.shape)
print(y_pred)
pre = y_pred[:,0]
test = y_test.values
test = test[:,0]

p = pre.tolist()
t = test.tolist()

for i in range(len(y_pred)):
    error.append(t[i] - p[i])
squaredError = []
absError = []
for val in error:
    squaredError.append(val * val)  # squared difference between target and prediction
    absError.append(abs(val))  # absolute value of the error
 
 
#print("Square Error: ", squaredError)
#print("Absolute Value of Error: ", absError)
 
print("MAE = ", sum(absError) / len(absError))  # mean absolute error (MAE)
print("MSE = ", sum(squaredError) / len(squaredError))  # mean squared error (MSE)

#MAE =  4.111995393754926
#MSE =  29.975964330767486