Data download address:
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
The data can also be loaded with sklearn.datasets.load_boston (note: deprecated in scikit-learn 1.0 and removed in 1.2).
The dataset fields are described below:
CRIM: per capita crime rate by town.
ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS: proportion of non-retail business acres per town.
CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise).
NOX: nitric oxides concentration (parts per 10 million).
RM: average number of rooms per dwelling.
AGE: proportion of owner-occupied units built prior to 1940.
DIS: weighted distances to five Boston employment centres.
RAD: index of accessibility to radial highways.
TAX: full-value property-tax rate per $10,000.
PTRATIO: pupil-teacher ratio by town.
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
LSTAT: percentage of lower-status population.
MEDV: median value of owner-occupied homes, in $1000's.
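Because `load_boston` is gone from current scikit-learn, the raw file from the download address above (mirrored from CMU's StatLib) can be parsed directly. In that file each record spans two whitespace-separated lines: 11 values, then the remaining 3. A minimal sketch; the helper name `load_housing` is made up here, and the 22-line header skip and reshaping follow the replacement snippet suggested in scikit-learn's deprecation notice:

```python
import numpy as np
import pandas as pd

def load_housing(path):
    """Parse the StatLib/UCI housing file, where each record spans two
    whitespace-separated lines: 11 values, then the remaining 3."""
    raw = pd.read_csv(path, sep=r"\s+", skiprows=22, header=None)
    # even rows hold the first 11 columns, odd rows the last 3
    data = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])  # 13 features
    target = raw.values[1::2, 2]                                  # MEDV
    return data, target

# usage (downloads over the network):
# data, target = load_housing("http://lib.stat.cmu.edu/datasets/boston")
```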
Code implementation
from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV  # Lasso with built-in cross-validation, used to find the best alpha
import seaborn as sns  # visualization library
house = load_boston()
print(house.DESCR)  # view the dataset description
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506
    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
    :Missing Attribute Values: None
    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.

.. topic:: References

    - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
    - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
x = house.data    # feature matrix
y = house.target  # target vector
df = pd.DataFrame(x, columns=house.feature_names)  # build a DataFrame
df['Target'] = y  # append the target as a column
df.head()  # show the first five rows
|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
# draw a heatmap; each cell is the correlation coefficient between two variables
plt.figure(figsize=(15,15))  # create a figure with a custom size
p = sns.heatmap(df.corr(), annot=True, square=True)  # draw the heatmap
# df.corr() gives the pairwise Pearson correlation of the columns, in [-1, 1]:
# values near -1 indicate strong negative correlation, values near 1 strong positive correlation
# sns.heatmap parameter reference: https://www.cntofu.com/book/172/docs/30.md
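As a sanity check on what the heatmap displays, the same Pearson correlations can be computed directly with `df.corr()`. A small sketch on synthetic data (the column names `pos`, `neg`, and `noise` are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "pos": 2 * x + rng.normal(scale=0.1, size=200),   # strongly positive
    "neg": -3 * x + rng.normal(scale=0.1, size=200),  # strongly negative
    "noise": rng.normal(size=200),                    # unrelated to x
})
corr = df.corr()  # Pearson correlation matrix, values in [-1, 1]
print(corr.loc["x"].round(2))
```

The `x` row of the matrix shows values near 1 for `pos`, near -1 for `neg`, and near 0 for `noise`, which is exactly the pattern the heatmap colors encode.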
# data standardization: use fit_transform on training samples, transform on test samples
# X_train = ss.fit_transform(X_train)
# X_test = ss.transform(X_test)
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(x)  # standardize to zero mean and unit variance
print(X[:4])
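The rule in the comment above matters because `StandardScaler` learns the mean and standard deviation of whatever it is fit on; fitting it on the full dataset before splitting leaks test-set statistics into training. A leakage-free sketch on synthetic data (shapes and values are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic features
labels = rng.normal(size=100)                         # synthetic targets

x_tr, x_te, y_tr, y_te = train_test_split(data, labels, test_size=0.3,
                                          random_state=0)
scaler = StandardScaler()
x_tr_s = scaler.fit_transform(x_tr)  # learn mean/std from training data only
x_te_s = scaler.transform(x_te)      # reuse training statistics on the test set
```

After this, the training features have exactly zero mean and unit variance, while the test features are merely close to that, since they were scaled with statistics estimated on a different sample.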
[[-0.41978194  0.28482986 -1.2879095  -0.27259857 -0.14421743  0.41367189
  -0.12001342  0.1402136  -0.98284286 -0.66660821 -1.45900038  0.44105193
  -1.0755623 ]
 [-0.41733926 -0.48772236 -0.59338101 -0.27259857 -0.74026221  0.19427445
   0.36716642  0.55715988 -0.8678825  -0.98732948 -0.30309415  0.44105193
  -0.49243937]
 [-0.41734159 -0.48772236 -0.59338101 -0.27259857 -0.74026221  1.28271368
  -0.26581176  0.55715988 -0.8678825  -0.98732948 -0.30309415  0.39642699
  -1.2087274 ]
 [-0.41675042 -0.48772236 -1.30687771 -0.27259857 -0.83528384  1.01630251
  -0.80988851  1.07773662 -0.75292215 -1.10611514  0.1130321   0.41616284
  -1.36151682]]
# split the dataset
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
# note: this splits the unscaled x, so the standardized X above is never fed to
# the model; to train on standardized features, split X instead (and, strictly,
# fit the scaler on the training split only). Also, without a fixed random_state
# the split, and hence the numbers below, vary between runs.
# create and fit the model
model = LassoCV()
model.fit(x_train, y_train)
# best regularization strength alpha chosen by cross-validation
print(model.alpha_)
0.75101819879345
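`model.alpha_` is not a model weight: `LassoCV` fits Lasso over a grid of candidate alphas (stored in `alphas_`), cross-validates each, records the per-fold mean squared errors in `mse_path_` (shape `(n_alphas, n_folds)`), and keeps the alpha with the lowest mean CV error. A sketch on synthetic data showing that `alpha_` is exactly the grid point minimizing the mean CV error:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# synthetic regression problem where only 3 of 10 features matter
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
model = LassoCV(cv=5).fit(X, y)

mean_mse = model.mse_path_.mean(axis=1)    # average CV error for each alpha
best = model.alphas_[np.argmin(mean_mse)]  # grid point with the lowest error
print(model.alpha_, best)
```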
# fitted regression coefficients (not correlation coefficients); the exact
# zeros mark features that Lasso has dropped
print(model.coef_)
[-0.06338152  0.05225382 -0.          0.         -0.          1.81419358
  0.01723948 -0.78293398  0.31906904 -0.01794047 -0.78602545  0.01093303
 -0.68469353]
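The exact zeros in `coef_` (here INDUS, CHAS, and NOX) are the point of Lasso: the L1 penalty drives the coefficients of weakly contributing features all the way to zero, so the model doubles as a feature selector. A sketch on synthetic data where only two of eight features actually drive the target:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# only features 0 and 3 drive the target; the other six are pure noise
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.5).fit(X, y)
kept = np.flatnonzero(model.coef_)  # indices of the surviving features
print(kept)
```

The six noise features come out with exactly zero coefficients, while features 0 and 3 survive with shrunken but clearly nonzero weights.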
# coefficient of determination (R^2) on the test set
model.score(x_test, y_test)
0.6687838293074235
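For a regressor, `score` returns the coefficient of determination, R² = 1 - SS_res/SS_tot: the fraction of the target's variance explained by the model (1.0 is a perfect fit, 0.0 is no better than always predicting the mean, and it can go negative for very poor models). Computed by hand on a toy example and checked against `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2, r2_score(y_true, y_pred))
```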