2020-12-06

We need to report test-set performance.

Why is a validation set not enough, and why do we need a separate test set?

We don’t strictly need a test set; in some cases it might be fine to have only training and validation sets. The purpose of the test set is to give us an unbiased estimate of the generalization performance, especially when we have many hyper-parameters to tune. There is a risk of overfitting to the validation set: although the model never sees the validation set, we do, and we tune the knobs to reconfigure our model accordingly. With many knobs to tune, the model might end up overly tuned to perform well on the validation set, yet fail to generalize to truly unseen data. That’s why it is beneficial to keep a separate test set to use once we are done configuring and training the model.
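To make that workflow concrete, here is a minimal sketch, assuming scikit-learn and synthetic toy data (the model, the hyper-parameter grid, and the split sizes are all illustrative assumptions, not from the note above): the validation set drives model selection, and the test set is consulted exactly once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset (an assumption for this sketch)
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_model, best_val_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:  # the hyper-parameter "knobs" we tune
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:  # model selection uses the validation set only
        best_model, best_val_acc = model, val_acc

# The test set is touched exactly once, after all tuning is finished
print("test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))
```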

Partition the data: (training set, validation set, test set)

Partition the data in an unbiased way; for an unbiased split, randomly shuffling the data before partitioning is usually good enough. If the label distribution in the dataset is heavily imbalanced, you might want to use stratified sampling to make sure each set contains a representative sample of every class. Another caveat is that some datasets contain duplicate samples, so we need to make sure the partitions are disjoint and we don’t use the same samples for both training and evaluation.
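As a sketch of such a split, assuming NumPy and scikit-learn (the 60/20/20 ratios and the toy data are illustrative assumptions): one common trick is to call train_test_split twice with stratify set, after dropping exact duplicate rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and heavily imbalanced labels (assumptions for this sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.choice([0, 1], size=1000, p=[0.9, 0.1])

# Drop exact duplicate rows first, so no sample can land in two partitions
X, idx = np.unique(X, axis=0, return_index=True)
y = y[idx]

# Two stratified splits give a 60/20/20 train/validation/test partition
# with the label ratio preserved in every set
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# The positive-class ratio should be (roughly) the same in every set
print([round(s.mean(), 3) for s in (y, y_train, y_val, y_test)])
```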

It’s not uncommon to train the model on one source and test it on another source that usually consists of more challenging samples. It might also be practical in some cases to train a model on loosely labelled, large-volume data, such as images crawled from the web, and test it on a small dataset with higher-quality labels, such as images manually annotated by humans. One thing to keep in mind is to choose validation and test sets that reflect the type of data you expect your model to receive in the future.

Error Summary

| Training loss | Validation loss | Diagnosis |
| --- | --- | --- |
| High | High | Underfitting: the model is not even fitting the training set well, so we want to increase capacity by using a larger model or training for longer. |
| Low | High | Overfitting: the model might be memorizing the training samples rather than learning anything meaningful. To curb overfitting, we can try shrinking the model or using regularization techniques. |
| Low | Low | Good fit: we can use the test set to get an unbiased estimate of the generalization performance (metrics: accuracy, precision, and recall). |
| High | Low | Unlikely: debug (are you evaluating on the same dataset you trained on?). |
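For the low/low row, a minimal sketch of the final, one-time test-set evaluation, assuming the best_model and the test split from the tuning sketch earlier (binary labels, so precision and recall default to the positive class):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_pred = best_model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
```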