Predicting house prices:
Dataset overview
boston_dataset:
Boston house prices dataset
Data Set Characteristics:
:Number of Instances: 506
:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
:Attribute Information (in order):
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's
:Missing Attribute Values: None
:Creator: Harrison, D. and Rubinfeld, D.L.
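The B attribute listed above is a quadratic transform of Bk, so it is not monotonic in Bk. Evaluating the stated formula at a few points makes this concrete. This is plain arithmetic on the definition, assuming 0 ≤ Bk ≤ 1 because Bk is a proportion:

```latex
\begin{aligned}
B &= 1000\,(B_k - 0.63)^2 \\
B_k = 0 &\;\Rightarrow\; B = 1000 \times 0.63^2 = 396.9 \\
B_k = 0.63 &\;\Rightarrow\; B = 0 \\
B_k = 1 &\;\Rightarrow\; B = 1000 \times 0.37^2 = 136.9
\end{aligned}
```

Two different values of Bk on either side of 0.63 can therefore map to the same B. Keep this in mind when interpreting how a model uses this feature.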
This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. ‘Hedonic
prices and the demand for clean air’, J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, ‘Regression diagnostics
…’, Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.
The Boston house-price data has been used in many machine learning papers that address regression
problems.
References:
- Belsley, Kuh & Welsch, ‘Regression diagnostics: Identifying Influential Data and Sources of Collinearity’, Wiley, 1980. 244-261.
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
TensorFlow code
# -*- coding: utf-8 -*-
"""
Created on Sun Nov 21 17:48:53 2021
@MysteriousKnight: 23608
@Email: xingchenziyi@163.com
"""
import numpy as np
import matplotlib.pyplot as plt
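import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
# load_boston was removed in scikit-learn 1.2, so read the original StatLib copy directly.
# In that file each record spans two lines: 11 values, then B, LSTAT and MEDV.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.5, random_state=50)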
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_dim=x_train.shape[1], activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1)  # single linear output: the predicted MEDV
])
model.compile(
    optimizer="adam",
    loss="mse"  # mean squared error on MEDV (in $1000s)
)
model.summary()
history = model.fit(
    x_train,
    y_train,
    epochs=1000,
    batch_size=512,  # larger than the 253 training rows, so each epoch is a single full batch
    validation_data=(x_test, y_test)
)
# Flatten the (n, 1) output to (n,) so the error is computed element-wise against y_test
test_pre = model.predict(x_test).ravel()
print("y1 mse:%.4f" % np.mean(np.square(test_pre - y_test)))
loss = history.history['loss']
val_loss = history.history['val_loss']
# Plot the training and validation loss curves
plt.plot(loss, lw=3, label='loss')
plt.plot(val_loss, lw=3, label='val_loss')
plt.legend()
plt.show()
Loss curves (figure: training loss and val_loss per epoch)
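To put a number on what the loss curves show, one option is to find the epoch with the lowest `val_loss` and compare the two curves at the last epoch. A validation loss that bottoms out early and then climbs while `loss` keeps falling is the usual signature of overfitting. This is roughly the situation `tf.keras.callbacks.EarlyStopping(monitor="val_loss")` is designed to detect. The helper name `overfitting_summary` is hypothetical; it only assumes the two per-epoch lists already stored in `history.history`.

```python
import numpy as np


def overfitting_summary(loss, val_loss):
    """Summarize per-epoch train/validation losses (e.g. Keras history.history).

    Returns the 0-based epoch with the lowest validation loss and the
    validation-minus-training loss gap at the final epoch.
    """
    loss = np.asarray(loss, dtype=float)
    val_loss = np.asarray(val_loss, dtype=float)
    best_epoch = int(np.argmin(val_loss))           # where validation loss bottoms out
    final_gap = float(val_loss[-1] - loss[-1])      # large positive gap suggests overfitting
    return best_epoch, final_gap
```

Usage with the script's variables: `best_epoch, gap = overfitting_summary(history.history['loss'], history.history['val_loss'])`. A `best_epoch` far below the 1000 epochs trained suggests most of the later epochs only fit noise in the training split.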
Error (test-set MSE printed by the script):
y1 mse:590.4769
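When reading this number, keep NumPy broadcasting in mind. If predictions of shape (n, 1) are subtracted from targets of shape (n,), the result is an n × n matrix of every prediction/target pair, and its mean is not the per-sample MSE. `model.predict` returns a 2-D array for this model, so the prediction has to be flattened before the subtraction. The toy arrays `pred` and `y` below are hypothetical stand-ins chosen so that the true MSE is exactly 0:

```python
import numpy as np

# Hypothetical stand-ins: a model output shaped (n, 1) and targets shaped (n,).
# The predictions are perfect, so the correct MSE is 0.
pred = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])

diff = pred - y                               # broadcasts to (3, 3): every pred_i - y_j pair
broadcast_mse = np.mean(np.square(diff))      # mean over 9 pairs, not over 3 samples
per_sample_mse = np.mean(np.square(pred.ravel() - y))  # element-wise, shape (3,) before the mean

print(diff.shape, broadcast_mse, per_sample_mse)
```

Only the flattened version gives the perfect predictor a score of 0. Reshaping the target instead (`y.reshape(-1, 1)`) would work equally well.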
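Comparison of test-set targets and model predictions (figure):
Conclusion
As the loss curves show, the model overfits badly as the number of training epochs grows. I split the data 50/50 between the training and test sets, but with so little data overfitting keeps appearing. The only remedies I see are to enlarge the dataset, or to try adjusting the network architecture or its hyperparameters, which might yield a decent model. With just over 500 rows, the dataset is simply too small.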
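One inexpensive change to try alongside architecture or hyperparameter tweaks: the script feeds the raw features straight into the network, although their scales differ widely (for example TAX is in the hundreds while NOX is below 1). A common preprocessing step is to z-score each feature using the mean and standard deviation of the training split only, so no test information leaks into training. This is a sketch with a hypothetical helper name, not something the original experiment tried, and whether it reduces the overfitting here is untested. scikit-learn's `StandardScaler` does the same job.

```python
import numpy as np


def standardize_train_test(x_train, x_test, eps=1e-8):
    """Z-score features with statistics computed on the training split only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    # Guard against constant columns (e.g. a 0/1 dummy that never varies in a small split)
    std = np.where(std < eps, 1.0, std)
    return (x_train - mean) / std, (x_test - mean) / std
```

Usage: `x_train_s, x_test_s = standardize_train_test(x_train, x_test)`, then pass `x_train_s` and `x_test_s` to `model.fit` and `model.predict` in place of the raw arrays.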