Reading Notes
Chapter 2: End-to-End Machine Learning Project
Select a performance measure
RMSE
A typical performance measure for regression tasks is the Root Mean Square Error (RMSE).
RMSE(X, h) is the root mean square error measured on the dataset X using the hypothesis h.
MAE
Mean Absolute Error (MAE): the average of the absolute differences between the predictions and the labels.
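To make the two measures concrete, here is a minimal pure-Python sketch. The function names `rmse` and `mae` and the sample numbers are made up for illustration; they do not come from the housing data.

```python
import math

def rmse(predictions, labels):
    """Square root of the mean of the squared prediction errors."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(predictions, labels):
    """Mean of the absolute prediction errors."""
    absolute_errors = [abs(p - y) for p, y in zip(predictions, labels)]
    return sum(absolute_errors) / len(absolute_errors)

# Hypothetical labels and predictions; the last prediction is far off.
labels = [100.0, 200.0, 300.0, 400.0]
predictions = [110.0, 190.0, 300.0, 500.0]

print(mae(predictions, labels))   # 30.0
print(rmse(predictions, labels))  # about 50.5
```

Because the errors are squared before averaging, the single large error pulls RMSE well above MAE, so RMSE is the more sensitive of the two to outliers.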
Get the data
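Load the data with pandas
import os
import pandas as pd

# Assumption: HOUSING_PATH is not defined in these notes; this is a placeholder for the directory that holds housing.csv.
HOUSING_PATH = os.path.join("datasets", "housing")

# Returns a pandas DataFrame object containing all the data.
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)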
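The loading pattern can be tried without the real dataset. The sketch below writes a tiny stand-in CSV to a temporary directory and reads it back the same way. The helper name `load_csv` and the two sample rows are hypothetical.

```python
import os
import tempfile

import pandas as pd

def load_csv(directory, filename="housing.csv"):
    """Join the directory and file name, then read the CSV into a DataFrame."""
    csv_path = os.path.join(directory, filename)
    return pd.read_csv(csv_path)

# Write a two-row stand-in file (made-up values) and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "housing.csv"), "w") as f:
        f.write("median_income,ocean_proximity\n8.3,NEAR BAY\n7.2,INLAND\n")
    sample = load_csv(tmp_dir)

print(sample.shape)  # (2, 2)
```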
Take a quick look at the data structure
housing = load_housing_data()
housing.head()
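The info() method gives a quick overview of the data's structure: the number of rows, the number of columns, and the type of each attribute's values.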
housing.info()
Use value_counts() to see which values a single attribute takes and how many instances have each value.
housing["ocean_proximity"].value_counts()
Get an overall summary of the data with describe()
housing.describe()
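The inspection calls above can be tried on a toy DataFrame to see what each one returns. This is only a sketch: the four rows below are made up and merely reuse two column names in the style of the housing data.

```python
import pandas as pd

# Hypothetical four-row stand-in for the housing DataFrame.
toy = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "INLAND"],
})

# value_counts() returns a Series mapping each distinct value to its count.
counts = toy["ocean_proximity"].value_counts()
print(counts["INLAND"])  # 3

# describe() summarizes the numeric columns (count, mean, std, min, quartiles, max).
summary = toy.describe()
print(summary.loc["mean", "median_income"])
```

Note that on a DataFrame with mixed column types, describe() by default leaves out text columns such as ocean_proximity, which is why value_counts() is the better tool for categorical attributes.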
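Create a test set
scikit-learn provides several functions for splitting the data into a training set and a test set, for example train_test_split: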
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
test_set.head()
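A small sketch of what the test_size and random_state arguments do, using a made-up ten-row DataFrame in place of the housing data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-row DataFrame standing in for the housing data.
toy = pd.DataFrame({"value": range(10)})

# Two splits with the same random_state select the same rows.
train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))            # 8 2
print(test_a.index.equals(test_b.index))    # True
```

Fixing random_state keeps the test set identical from run to run, so the model never gets to see test instances across repeated experiments on the same data.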