Machine Learning [1. Predicting House Prices]

Predicting house prices:

Dataset introduction

boston_dataset:

Boston house prices dataset

Data Set Characteristics:

:Number of Instances: 506 

:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

:Attribute Information (in order):
    - CRIM     per capita crime rate by town
    - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
    - INDUS    proportion of non-retail business acres per town
    - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    - NOX      nitric oxides concentration (parts per 10 million)
    - RM       average number of rooms per dwelling
    - AGE      proportion of owner-occupied units built prior to 1940
    - DIS      weighted distances to five Boston employment centres
    - RAD      index of accessibility to radial highways
    - TAX      full-value property-tax rate per $10,000
    - PTRATIO  pupil-teacher ratio by town
    - B        1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
    - LSTAT    % lower status of the population
    - MEDV     Median value of owner-occupied homes in $1000's

:Missing Attribute Values: None

:Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. ‘Hedonic
prices and the demand for clean air’, J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, ‘Regression diagnostics
…’, Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.
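
Note that load_boston has been deprecated and was removed in scikit-learn 1.2, so the script below only runs on older scikit-learn versions. As a rough sketch (assuming network access and pandas installed), the same 506 x 13 feature matrix and MEDV target can be rebuilt from the original StatLib file, which is roughly the approach suggested when the loader was deprecated:

# Sketch: rebuild the Boston data from the original StatLib source.
# Each record spans two physical lines in the raw file.
import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 predictors
target = raw_df.values[1::2, 2]                                         # MEDV

print(features.shape, target.shape)  # (506, 13) (506,)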

References

  • Belsley, Kuh & Welsch, ‘Regression Diagnostics: Identifying Influential Data and Sources of Collinearity’, Wiley, 1980. 244-261.
  • Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

TensorFlow code

# -*- coding: utf-8 -*-
"""
Created on Sun Nov 21 17:48:53 2021

@MysteriousKnight: 23608
@Email: xingchenziyi@163.com   
 

"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import tensorflow as tf

# load the Boston housing data and split it 50/50 into training and test sets
data = load_boston()

x_train, x_test, y_train, y_test = train_test_split(
    data["data"], data.target, test_size=0.5, random_state=50)

# fully connected regression network: 13 numeric inputs -> 1 predicted MEDV value
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_dim=x_train.shape[1], activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1)
])

# mean squared error loss with the Adam optimizer
model.compile(
    optimizer="adam",
    loss="mse"
)


model.summary()

# train for 1000 epochs; the test split doubles as the validation set here
history = model.fit(
    x_train,
    y_train,
    epochs=1000,
    batch_size=512,
    validation_data=(x_test, y_test)
)

test_pre = model.predict(x_test)

# test_pre has shape (n, 1) while y_test has shape (n,); flatten it so the
# difference is element-wise instead of broadcasting to an (n, n) matrix
print("y1 mse:%.4f" % np.mean(np.square(test_pre.ravel() - y_test)))

loss = history.history['loss']
val_loss = history.history['val_loss']

# plot the training and validation loss curves
plt.plot(loss, lw=3, label='loss')
plt.plot(val_loss, lw=3, label='val_loss')
plt.legend()
plt.show()
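
Since the model was compiled with loss="mse", the manually computed test error can also be cross-checked with model.evaluate, which should agree with the printed value up to floating-point rounding (a small sketch, reusing the model and data from the script above):

# Cross-check: Keras reports the same mean squared error used as the training loss.
test_mse = model.evaluate(x_test, y_test, verbose=0)
print("evaluate mse: %.4f" % test_mse)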

Loss curves

[Figure: training and validation loss curves]

Error

y1 mse:590.4769

Comparison of test-set values and predictions:

[Figure: test-set values vs. model predictions]
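
The original plotting code for this comparison is not shown; a minimal sketch of one way to draw it, reusing y_test and test_pre from the script above:

# Sketch: overlay test-set targets and model predictions by sample index.
plt.figure()
plt.plot(y_test, lw=2, label="y_test")
plt.plot(test_pre.ravel(), lw=2, label="prediction")
plt.xlabel("test sample index")
plt.ylabel("MEDV ($1000s)")
plt.legend()
plt.show()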

Conclusion

As the loss curves show, the model overfits more and more severely as training progresses. I split the data 50/50 into training and test sets, but with so few samples the model keeps overfitting. The only real remedies are to get more data, or to adjust the network architecture and hyperparameters, which might still produce a decent model; with only about 500 rows, the dataset is simply too small.
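
As a hedged sketch of the "adjust the architecture or parameters" option (not part of the experiment above), a smaller network with dropout plus early stopping on validation loss is one common way to curb overfitting on a dataset this small:

# Sketch: a smaller model with dropout, trained with early stopping.
regularized = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_dim=x_train.shape[1], activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1)
])
regularized.compile(optimizer="adam", loss="mse")

# stop once validation loss stops improving and keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=50, restore_best_weights=True)

history2 = regularized.fit(
    x_train, y_train,
    epochs=1000, batch_size=64,
    validation_data=(x_test, y_test),
    callbacks=[early_stop], verbose=0)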
