Measure Your Model Validation

1---Model Validation-MAE

I will use model validation to measure the quality of my model. Measuring model quality is the key to iteratively improving your models.

In most applications, the relevant measure of model quality is predictive accuracy: how close the model's predictions are to what actually happens. But be careful: when measuring predictive accuracy, I shouldn't make predictions with my training data and compare those predictions to the target values in that same training data. And even with a proper comparison, checking predicted against actual values one home at a time turns up a mix of good and bad predictions, so looking through them all would be pointless.

So you'd first need to summarize this into a single metric.

Here I’ll start with one called Mean Absolute Error (also called MAE). (See also: Metrics and Loss Functions, Error Series: Mean Absolute Error, MAE)

The prediction error for each home is: error = actual − predicted. MAE converts each error to a positive number by taking its absolute value. We then take the average of those absolute errors. This is our measure of model quality.
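The definition above can be checked by hand. The following is a minimal sketch in plain Python; the function name `mean_absolute_error_by_hand` and the four prices are made up for illustration and are not taken from the Melbourne data.

```python
def mean_absolute_error_by_hand(actual, predicted):
    """Average of |actual - predicted| over all paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Take the absolute value so errors in opposite directions do not cancel out
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Made-up prices, for illustration only
actual_prices = [300_000, 450_000, 500_000, 620_000]
predicted_prices = [310_000, 430_000, 500_000, 600_000]

# Absolute errors are 10000, 20000, 0 and 20000, so the average is 12500
mae = mean_absolute_error_by_hand(actual_prices, predicted_prices)
print(mae)  # 12500.0
```

On average, these illustrative predictions are off by 12,500 dollars. scikit-learn's `mean_absolute_error`, used in the next section, performs the same calculation.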

2---How to calculate MAE

To calculate MAE, we first need a model.

import pandas as pd

# Load the data
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
# Drop every row that has a missing value in any column
filtered_melbourne_data = melbourne_data.dropna(axis=0)
# Choose target and features
y = filtered_melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea',
                      'YearBuilt', 'Lattitude', 'Longtitude']
X = filtered_melbourne_data[melbourne_features]

from sklearn.tree import DecisionTreeRegressor
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model
melbourne_model.fit(X, y)

# Calculate MAE on the same data the model was trained on
from sklearn.metrics import mean_absolute_error

predicted_home_prices = melbourne_model.predict(X)
print(mean_absolute_error(y, predicted_home_prices))
434.71594577146544

3---The problem with "In-Sample" Scores

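The value we just calculated can be called an “in-sample” score: we used a single sample of houses both to build the model and to evaluate it.
Imagine that, in the large real estate market, door 🚪 color is unrelated to home 🏠 price. However, in the sample of data we used to build the model, all homes with green 🍏 doors were very expensive. The model’s job is to find patterns that predict home prices, so it will pick up this pattern and always predict high prices for homes with green doors.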

Since this pattern was derived from the training data, the model will appear accurate on the training data. But when the model encounters new data, will the predictions still be accurate? 🤷‍♀️ If the pattern does not hold there, the model will be very inaccurate when used in practice.

A model’s practical value comes from making predictions on data it has never seen before, so we need to measure performance on data that was not used to build the model. The most straightforward way to do this is to exclude some data from the model-building process and then use it to test the model’s accuracy. This held-out data is called validation data.
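The gap between in-sample and validation scores can be reproduced on a toy problem. The sketch below is an assumption-laden illustration, not the Melbourne model: the data is synthetic, and `predict_nearest` is a hypothetical one-nearest-neighbour predictor that simply memorizes its training data, which is loosely similar to what an unrestricted decision tree can do.

```python
import random


def predict_nearest(train_x, train_y, x):
    """Copy the target of the closest training point (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mae(actual, predicted):
    """Mean absolute error of paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


rng = random.Random(0)
# Synthetic data (an assumption): price = 5000 * area + random noise
areas = rng.sample(range(50, 400), 80)  # 80 distinct areas
prices = [5000 * a + rng.gauss(0, 50_000) for a in areas]

# Hold out the last 20 homes as validation data
train_x, val_x = areas[:60], areas[60:]
train_y, val_y = prices[:60], prices[60:]

in_sample = mae(train_y, [predict_nearest(train_x, train_y, x) for x in train_x])
out_of_sample = mae(val_y, [predict_nearest(train_x, train_y, x) for x in val_x])

print("in-sample MAE: ", in_sample)
print("validation MAE:", out_of_sample)
```

Because every training home is its own nearest neighbour, the in-sample MAE is exactly zero, while the validation MAE reflects the noise the model memorized. The in-sample score says nothing about how the model behaves on new homes.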

4---"train_test_split"

The scikit-learn library has a function “train_test_split” to break up the data into two pieces: (1) training data, used to fit the model; (2) validation data, used to calculate MAE.

The split is made by a random number generator, so we need to give the random_state argument a fixed value (the seed) to make sure that we get the same split every time we run the script.

from sklearn.model_selection import train_test_split

# Split data into training and validation data, for both features and target.
# The split is based on a random number generator.
# Supplying a numeric value to the random_state argument
# guarantees we get the same split every time we run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model on the training data only
melbourne_model.fit(train_X, train_y)

# Get predicted prices on validation data
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
261764.75790832794
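To see why a fixed seed gives the same split on every run, here is a rough sketch of the idea in plain Python. `seeded_split` is a hypothetical helper written for this illustration; it is not how scikit-learn implements `train_test_split`, and the 25% validation fraction is chosen to mirror that function's default.

```python
import random


def seeded_split(rows, validation_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle row indices with a seeded RNG, then cut."""
    indices = list(range(len(rows)))
    # A generator created with the same seed always produces the same shuffle
    random.Random(seed).shuffle(indices)
    n_val = int(len(rows) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [rows[i] for i in train_idx], [rows[i] for i in val_idx]


rows = list(range(20))  # stand-in for 20 rows of data
train_a, val_a = seeded_split(rows, seed=0)
train_b, val_b = seeded_split(rows, seed=0)

print(val_a == val_b and train_a == train_b)  # True: same seed, same split
print(sorted(train_a + val_a) == rows)        # True: every row lands in exactly one piece
```

Without a fixed seed, each run would hold out different rows, and the validation MAE would change from run to run even if the model stayed the same.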

5---Analysis

My MAE for the "in-sample" data was about 435 dollars.

Out-of-sample it is more than 260,000 dollars.

As a point of reference, the average home value in the validation data is 1.1 million dollars. So the error in new data is about a quarter of the average home value.
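The comparison can be worked through with the numbers printed earlier. This is only a quick arithmetic check; the variable names are illustrative, and the average home value is the rounded 1.1 million figure quoted in the text rather than a value computed from the data.

```python
in_sample_mae = 434.71594577146544    # MAE measured on the training data
validation_mae = 261764.75790832794   # MAE measured on the validation data
average_home_value = 1_100_000        # rounded figure quoted in the text

# Validation error as a share of the average home value
relative_error = validation_mae / average_home_value
print(f"{relative_error:.1%}")  # 23.8%

# How many times larger the validation error is than the in-sample error
error_ratio = validation_mae / in_sample_mae
print(f"{error_ratio:.0f}x")
```

The validation error is several hundred times the in-sample error, which is why the in-sample score is a poor guide to how the model performs on new homes.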
