波士顿房价预测

数据下载地址:

https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

使用sklearn.datasets.load_boston可加载相关数据 

数据描述如下:

CRIM:城镇人均犯罪率。

ZN:住宅用地超过 25000 sq.ft. 的比例。

INDUS:城镇非零售商用土地的比例。

CHAS:查理斯河空变量(如果边界是河流,则为1;否则为0)。

NOX:一氧化氮浓度。

RM:住宅平均房间数。

AGE:1940 年之前建成的自用房屋比例。

DIS:到波士顿五个中心区域的加权距离。

RAD:辐射性公路的接近指数。

TAX:每 10000 美元的全值财产税率。

PTRATIO:城镇师生比例。

B:1000(Bk-0.63)^ 2,其中 Bk 指代城镇中黑人的比例。

LSTAT:人口中地位低下者的比例。

MEDV:自住房的平均房价,以千美元计。

代码实现

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV #LassoCV用于实现Lasso的交叉验证,通常用于求解最佳参数
import seaborn as sns #可视化工具
house  = load_boston()
print(house.DESCR)  #查看数据描述
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
Index of /ml/machine-learning-databases/housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
.. topic:: References

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
x = house.data #取数据
y = house.target #取标签
df = pd.DataFrame(x,columns=house.feature_names) #构造表格型数据
df['Target'] = pd.DataFrame(y,columns=['Target'])
df.head() #打印前五行
CRIMZNINDUSCHASNOXRMAGEDISRADTAXPTRATIOBLSTATTarget
00.0063218.02.310.00.5386.57565.24.09001.0296.015.3396.904.9824.0
10.027310.07.070.00.4696.42178.94.96712.0242.017.8396.909.1421.6
20.027290.07.070.00.4697.18561.14.96712.0242.017.8392.834.0334.7
30.032370.02.180.00.4586.99845.86.06223.0222.018.7394.632.9433.4
40.069050.02.180.00.4587.14754.26.06223.0222.018.7396.905.33

36.2

#画热力图,数值为两个变量之间的相关系数
plt.figure(figsize=(15,15)) #创建自定义图像
p = sns.heatmap(df.corr(),annot=True,square=True) #画热力图
#data.corr()表示了data中的两个变量之间的相关性,取值范围为[-1,1],取值接近-1,表示反相关,类似反比例函数,取值接近1,表正相关。
#sns.heatmap参数介绍 https://www.cntofu.com/book/172/docs/30.md

#数据标准化 训练样本用fit_transform,而测试样本用transform
# X_train = ss.fit_transform(X_train)
# X_test = ss.transform(X_test)
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(x) #归一化数据
print(X[:4])
[[-0.41978194  0.28482986 -1.2879095  -0.27259857 -0.14421743  0.41367189
  -0.12001342  0.1402136  -0.98284286 -0.66660821 -1.45900038  0.44105193
  -1.0755623 ]
 [-0.41733926 -0.48772236 -0.59338101 -0.27259857 -0.74026221  0.19427445
   0.36716642  0.55715988 -0.8678825  -0.98732948 -0.30309415  0.44105193
  -0.49243937]
 [-0.41734159 -0.48772236 -0.59338101 -0.27259857 -0.74026221  1.28271368
  -0.26581176  0.55715988 -0.8678825  -0.98732948 -0.30309415  0.39642699
  -1.2087274 ]
 [-0.41675042 -0.48772236 -1.30687771 -0.27259857 -0.83528384  1.01630251
  -0.80988851  1.07773662 -0.75292215 -1.10611514  0.1130321   0.41616284
  -1.36151682]]
#切分数据集
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3)
#创建模型
model = LassoCV()
model.fit(x_train,y_train)

#lasso系数
print(model.alpha_)
0.75101819879345
#相关系数
print(model.coef_)
[-0.06338152  0.05225382 -0.          0.         -0.          1.81419358
  0.01723948 -0.78293398  0.31906904 -0.01794047 -0.78602545  0.01093303
 -0.68469353]
#决定系数
model.score(x_test,y_test)
0.6687838293074235
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

嗯呢嗯呢

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值