>Choosing the data
Load the Boston house-price data from scikit-learn's bundled datasets:
from sklearn import datasets
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release.
boston = datasets.load_boston()
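The loader returns a Bunch object, and its main fields (`data`, `target`, `feature_names`, `DESCR`) are shared by scikit-learn's other bundled datasets. As a minimal sketch of how those fields can be inspected, the snippet below uses `load_diabetes` as a stand-in. That choice is an assumption made only for illustration, since it ships with current scikit-learn releases; it is not the dataset used in this post.

```python
from sklearn import datasets

# Stand-in dataset (assumption): load_diabetes ships with scikit-learn
# and returns the same kind of Bunch object as the Boston loader.
diabetes = datasets.load_diabetes()

# Feature matrix: one row per sample, one column per feature.
print(diabetes.data.shape)

# Target vector: one value per sample.
print(diabetes.target.shape)

# Names of the feature columns, in column order.
print(diabetes.feature_names)
```

The same three attribute lookups should work on the `boston` object wherever `load_boston` is available.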
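The Boston dataset is a well-known dataset with 13 features that is commonly used for linear regression, and house-price prediction is also the first example in Andrew Ng's online course. We can print its description to see its attributes: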
print(boston.DESCR)
The output is:
Data Set Characteristics:
:Number of Instances: 506
:Number of Attributes: 13 numeric/categorical predictive
:Median Value (attribute 14) is usually the target
:Attribute Information (in order):
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX