Boston House Price Prediction: Bug Notes

Inko_yy

Article link: https://blog.csdn.net/NEU_YH/article/details/114479868

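Task details
There are already plenty of introductions to this task online, so this post only records the data-processing steps.
The task says to use only CRIM, RM, and LSTAT as features and MEDV as the target, so first load the dataset: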

#Load the Boston Housing Data Set from sklearn.datasets and print it
#NOTE: load_boston was removed in scikit-learn 1.2, so this call needs an older version
from sklearn.datasets import load_boston
boston = load_boston()
print(boston)
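
Because load_boston was removed in scikit-learn 1.2, the call above fails on current versions. Here is a minimal sketch of one workaround, following the recipe that scikit-learn's deprecation notice suggested: read the original raw file, in which each record spans two lines. The helper name parse_boston_raw and the two-record sample are assumptions made for illustration. For the real file at http://lib.stat.cmu.edu/datasets/boston, pass skiprows=22 to skip its header text.

```python
import io

import numpy as np
import pandas as pd

# Column order of the 13 features in the original data set
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def parse_boston_raw(source, skiprows=0):
    """Hypothetical helper: parse the raw two-lines-per-record layout.

    Each record is stored as one line of 11 values followed by one line
    of 3 values (the last of which is MEDV).
    """
    raw = pd.read_csv(source, sep=r"\s+", skiprows=skiprows, header=None)
    values = raw.values
    # Even rows hold the first 11 features, odd rows the last 2 plus MEDV
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    return pd.DataFrame(data, columns=FEATURE_NAMES), pd.Series(target, name="MEDV")


# Made-up sample with the same layout (NOT real Boston data)
sample = io.StringIO(
    "0.1 0 2 0 0.5 6.0 60 4 1 300 15\n"
    "390 5.0 24\n"
    "0.2 0 7 0 0.4 6.5 70 5 2 240 17\n"
    "395 9.0 21\n"
)
df, medv = parse_boston_raw(sample)
print(df[["CRIM", "RM", "LSTAT"]])
print(medv.tolist())
```

The returned df and medv can stand in for the DataFrame built from boston.data and for boston.target in the rest of the post.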

Then pull out the three feature columns:

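import pandas as pd

#Build a DataFrame from the raw feature matrix, then keep the three required columns
df = pd.DataFrame(boston.data, columns = boston.feature_names)
df_x = df[['CRIM', 'RM','LSTAT']]
#MEDV (the target) as a one-column DataFrame
df_y = pd.DataFrame(boston.target)
print(df_x)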

As you can see, the feature data looks like this:


        CRIM     RM  LSTAT
0    0.00632  6.575   4.98
1    0.02731  6.421   9.14
2    0.02729  7.185   4.03
3    0.03237  6.998   2.94
4    0.06905  7.147   5.33
..       ...    ...    ...
501  0.06263  6.593   9.67
502  0.04527  6.120   9.08
503  0.06076  6.976   5.64
504  0.10959  6.794   6.48
505  0.04741  6.030   7.88
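
One detail worth noting at this point: wrapping the target in pd.DataFrame makes it two-dimensional, which is what later leads to predictions of shape (152, 1). A small sketch with made-up target values (not the real MEDV column) shows the difference from a 1-D Series:

```python
import numpy as np
import pandas as pd

# Made-up target values standing in for boston.target
target = np.array([24.0, 21.5, 34.5, 33.0])

y_frame = pd.DataFrame(target)               # 2-D: a single column named 0
y_series = pd.Series(target, name="MEDV")    # 1-D

print(y_frame.shape)           # (4, 1)
print(y_series.shape)          # (4,)
print(list(y_frame.columns))   # [0]
```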

Then split the data into training and test sets 70/30, as the task requires:

from sklearn import linear_model
from sklearn.model_selection import train_test_split

#Initialize the linear regression model
reg = linear_model.LinearRegression()
#Split the data into 70% training and 30% testing data
#NOTE: We have to split the features (x, independent variables) and the target (y, dependent variable)
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.3, random_state=42)
#Train our model with the training data
reg.fit(x_train, y_train)
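
To see where the later (152, 1) shape comes from, here is a sketch on random stand-in data with the same 506 rows as the Boston set (the values are synthetic, only the shapes matter). With test_size=0.3, scikit-learn puts 152 rows in the test set and 354 in the training set, and a one-column DataFrame target yields 2-D predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the 506-row Boston data (not real values)
X = pd.DataFrame(rng.normal(size=(506, 3)), columns=["CRIM", "RM", "LSTAT"])
y = pd.DataFrame(rng.normal(size=506))   # one-column DataFrame, as in the post

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(x_train.shape, x_test.shape)       # (354, 3) (152, 3)

# 2-D target -> 2-D predictions
pred_2d = LinearRegression().fit(x_train, y_train).predict(x_test)
# 1-D target (first column as a Series) -> 1-D predictions
pred_1d = LinearRegression().fit(x_train, y_train.iloc[:, 0]).predict(x_test)
print(pred_2d.shape, pred_1d.shape)      # (152, 1) (152,)
```

Passing a 1-D target is one way to avoid the column slicing needed later, at the cost of deviating from the code in this post.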

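Finally, computing the evaluation metrics caused some trouble: the pred and test data kept raising errors.
First, here is how the errors are computed: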

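#Predict on the test set
y_pred = reg.predict(x_test)

error = []
#print(y_pred.shape)
pre = y_pred[:,0]       #first (only) column of the predictions
test = y_test.values    #DataFrame -> ndarray
test = test[:,0]        #first (only) column of the targets

p = pre.tolist()
t = test.tolist()

#Per-sample error: target minus prediction
for i in range(len(y_pred)):
    error.append(t[i] - p[i])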

Here is what pred looks like:

(152, 1)
[[25.95010416]
 [31.04855815]
 [18.2497264 ]
 [26.3019832 ]
 [19.72042823]
......
 [16.17338714]
 [40.34645114]
 [21.0067563 ]
 [19.39310502]]

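As you can see, this is 152 rows by 1 column, so taking the whole first column is enough for the calculation, hence pre = y_pred[:,0]. Handling test the same way raises an error, because y_test is a pandas DataFrame, which does not accept NumPy-style [:,0] indexing.
So convert it to an ndarray first (y_test.values) and then take the first column.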
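
A sketch of the indexing difference described above, using a small made-up DataFrame with shuffled row labels (as train_test_split would leave them). The exact exception raised by the failing call depends on the pandas version, so the code only reports its type name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for y_test: one column, shuffled row labels
y_test = pd.DataFrame([21.0, 34.5, 18.0], index=[173, 274, 491])

try:
    y_test[:, 0]                          # NumPy-style indexing on a DataFrame
    failed = False
except Exception as exc:                  # exception type varies across pandas versions
    failed = True
    print("DataFrame[:, 0] failed with", type(exc).__name__)

via_values = y_test.values[:, 0]          # what the post does
via_iloc = y_test.iloc[:, 0].to_numpy()   # positional alternative
print(via_values, via_iloc)
```

Both working variants give the same 1-D array; .iloc simply selects by position without leaving pandas first.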

After that, the for loop still raised an error for me. With this way of computing MAE, converting both ndarrays to plain lists with tolist() made the loop work.
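
A related pitfall when mixing these shapes, sketched with made-up numbers: if only one of the two arrays is flattened, NumPy broadcasting turns the subtraction into an n-by-n matrix instead of raising an error, so any mean taken over it is silently wrong. Flattening both sides, as done here with [:,0], avoids this.

```python
import numpy as np

pred = np.array([[2.5], [5.0], [4.0]])   # shape (3, 1), like y_pred
true = np.array([3.0, 5.0, 2.5])         # shape (3,), already flattened

wrong = true - pred                      # broadcasts to shape (3, 3)
right = true - pred[:, 0]                # both 1-D -> shape (3,)

print(wrong.shape)       # (3, 3)
print(right.tolist())    # [0.5, 0.0, -1.5]
```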

Next, output MAE and MSE:

squaredError = []
absError = []
for val in error:
    squaredError.append(val * val)#squared (target - prediction) difference
    absError.append(abs(val))#absolute error

#print("Square Error: ", squaredError)
#print("Absolute Value of Error: ", absError)

print("MAE = ", sum(absError) / len(absError))#mean absolute error (MAE)
print("MSE = ", sum(squaredError) / len(squaredError))#mean squared error (MSE)
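
The two loops above can also be written as vectorized NumPy operations. A minimal sketch, assuming both inputs are already 1-D; the function names and the sample numbers are made up for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error of two 1-D sequences."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def mse(y_true, y_pred):
    """Mean squared error of two 1-D sequences."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Made-up values, not results from the Boston data
t = [3.0, 5.0, 2.5, 7.0]
p = [2.5, 5.0, 4.0, 8.0]
print("MAE = ", mae(t, p))   # 0.75
print("MSE = ", mse(t, p))   # 0.875
```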

The full code is pasted below:

import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

#Load the Boston Housing Data Set from sklearn.datasets and print it
from sklearn.datasets import load_boston
boston = load_boston()
print(boston)

#Transform the data set into a data frame 
#NOTE: boston.data = the data we want, 
#      boston.feature_names = the column names of the data
#      boston.target = Our target variable or the price of the houses
df = pd.DataFrame(boston.data, columns = boston.feature_names)
df_x = df[['CRIM', 'RM','LSTAT']]
df_y = pd.DataFrame(boston.target)
print(df_x)

#Initialize the linear regression model
reg = linear_model.LinearRegression()
#Split the data into 70% training and 30% testing data
#NOTE: We have to split the features (x, independent variables) and the target (y, dependent variable)
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.3, random_state=42)
#Train our model with the training data
reg.fit(x_train, y_train)

#print our price predictions on our test data
y_pred = reg.predict(x_test)

#Vectorized check: convert y_test to an ndarray so each result is a scalar
mse = np.sum((y_test.values - y_pred) ** 2) / len(y_test)
mae = np.sum(np.absolute(y_test.values - y_pred)) / len(y_test)
print(mae,mse)

error = []
print(y_pred.shape)
print(y_pred)
pre = y_pred[:,0]
test = y_test.values
test = test[:,0]

p = pre.tolist()
t = test.tolist()

for i in range(len(y_pred)):
    error.append(t[i] - p[i])
squaredError = []
absError = []
for val in error:
    squaredError.append(val * val)#squared (target - prediction) difference
    absError.append(abs(val))#absolute error
 
 
#print("Square Error: ", squaredError)
#print("Absolute Value of Error: ", absError)
 
print("MAE = ", sum(absError) / len(absError))#mean absolute error (MAE)
print("MSE = ", sum(squaredError) / len(squaredError))#mean squared error (MSE)

#MAE =  4.111995393754926
#MSE =  29.975964330767486
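
As a cross-check, scikit-learn provides mean_absolute_error and mean_squared_error in sklearn.metrics. The sketch below uses synthetic stand-in data (random features and a made-up linear target, not the Boston values) and a 1-D Series target, so no reshaping is needed. It confirms that the hand-written formulas agree with the library functions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in data; the coefficients and noise are arbitrary
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["CRIM", "RM", "LSTAT"])
y = pd.Series(X["RM"] * 5.0 - X["LSTAT"] * 0.5 + rng.normal(size=200), name="MEDV")

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
y_pred = LinearRegression().fit(x_train, y_train).predict(x_test)   # shape (60,)

# Hand-written metrics on 1-D arrays
err = y_test.to_numpy() - y_pred
manual_mae = np.mean(np.abs(err))
manual_mse = np.mean(err ** 2)

print(np.isclose(manual_mae, mean_absolute_error(y_test, y_pred)))  # True
print(np.isclose(manual_mse, mean_squared_error(y_test, y_pred)))   # True
```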
