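The Boston Housing Price Problem
Why are homes in Boston so expensive, and how could we predict their prices? The Boston house price here is clearly a continuous variable, so we can try to treat this as a regression problem.
Today we'll get started with the simplest approach: linear regression.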
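To make "linear regression" concrete before touching the real data, here is a minimal sketch of ordinary least squares with a single feature, written in plain Python. The function name `fit_simple_linear_regression` and the toy numbers are assumptions made up for illustration; they are not taken from the Boston dataset or from scikit-learn.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy, made-up values (NOT from the Boston dataset): prices = 5 * rooms - 8
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [12.0, 17.0, 22.0, 27.0, 32.0]

slope, intercept = fit_simple_linear_regression(rooms, prices)
predicted = slope * 6.5 + intercept  # prediction for a hypothetical 6.5-room home
print(slope, intercept, predicted)   # 5.0 -8.0 24.5
```

Because the toy data lies exactly on a line, the fit recovers that line exactly. Real housing data is noisy, so the fitted line would only approximate the prices.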
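Data Loading and Model Training
The Boston housing data is freely available on Kaggle, and it can also be loaded directly from scikit-learn's datasets module. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the call below needs an older release.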
from sklearn.datasets import load_boston
boston_dataset = load_boston()
Let's take a look at this dataset.
print(boston_dataset.keys())
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
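The keys printed above fit together as follows: each row of `data` holds one value per entry in `feature_names`, and `target` holds one value per row. The sketch below shows that relationship with a plain dict standing in for the returned object. The helper `row_as_record`, the stand-in `toy_dataset`, and all of its numbers are hypothetical and only three feature columns are included.

```python
# Stand-in for the object returned by load_boston(); values are made up
# for illustration and are NOT real rows from the dataset.
toy_dataset = {
    "feature_names": ["CRIM", "RM", "AGE"],
    "data": [
        [0.10, 6.5, 65.0],
        [0.03, 7.1, 45.0],
    ],
    "target": [24.0, 34.5],
}


def row_as_record(dataset, index):
    """Pair each feature name with its value in one row and attach the target."""
    record = dict(zip(dataset["feature_names"], dataset["data"][index]))
    record["target"] = dataset["target"][index]
    return record


first = row_as_record(toy_dataset, 0)
print(first)
```

With the real dataset, the same pairing is what lets you label the columns of `data` when building a table for inspection.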
print(boston_dataset.DESCR)
.. _boston_dataset:
Boston house prices dataset
---------------------------
**Data Set Characteristics:**
:Number of Instances: 506
:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
:Attribute Information (in order):
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town