We need to report test-set performance
Why isn't a validation set enough? Why do we need a separate test set?
We don't strictly need a test set; in some cases it might be okay to have only training and validation sets. The purpose of the test set is to give us an unbiased estimate of generalization performance. Especially when we have a lot of hyper-parameters to tune, there is a risk of overfitting to the validation set: although the model never sees the validation set, we do, and we tune the knobs to reconfigure our model accordingly. When there are a lot of knobs to tune, the model might end up overly tuned to perform well on the validation set, yet not generalize well to truly unseen data. That's why it is beneficial to keep a separate test set and use it only once we are done configuring and training our model.
Partition the data: (training set, validation set, test set)
Partition in an unbiased way: for an unbiased split, randomly shuffling the data before partitioning is usually good enough. If the distribution of labels in the dataset is heavily imbalanced, you might want to use stratified sampling, to make sure each set contains representative samples of every class. Another caveat is that some datasets contain duplicate samples, so we need to make sure the partitions are disjoint and we don't use the same samples for both training and evaluation.
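A stratified split like the one described above can be sketched in plain Python; the function name, fractions, and seed below are illustrative, not from the original text:

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=0):
    """Split sample indices into train/val/test sets while preserving
    the label proportions of the full dataset in each partition."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)

    train, val, test = [], [], []
    for y, idxs in by_label.items():
        rng.shuffle(idxs)  # shuffle within each class for an unbiased split
        n_val = int(len(idxs) * val_frac)
        n_test = int(len(idxs) * test_frac)
        val.extend(idxs[:n_val])
        test.extend(idxs[n_val:n_val + n_test])
        train.extend(idxs[n_val + n_test:])

    # sanity check: partitions are disjoint and cover every sample
    assert len(set(train) | set(val) | set(test)) == len(labels)
    return train, val, test
```

Because the shuffle and split happen per class, a rare class keeps roughly the same share in every partition instead of potentially vanishing from the validation or test set, which is the failure mode plain random shuffling risks on heavily imbalanced data.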
It's not uncommon to train a model on one source and test it on another source that usually consists of more challenging samples. It can also be practical in some cases to train a model on a large volume of loosely labelled data, such as images crawled from the web, and test it on a small dataset with higher-quality labels, such as images manually annotated by humans. One thing to keep in mind is to choose validation and test sets that reflect the type of data you expect your model to receive in the future.
Error Summary
| Training loss | Validation loss | Diagnosis |
|---|---|---|
| High | High | Underfitting: the model is not even fitting the training set well, so we want to increase capacity by using a larger model or training for longer. |
| Low | High | Overfitting: the model might be memorizing the training samples without learning anything meaningful. To curb overfitting, we can try shrinking the model or using regularization techniques. |
| Low | Low | Good fit: we can use the test set to get an unbiased estimate of generalization performance (metrics: accuracy, precision, and recall). |
| High | Low | Unlikely: debug (are training and validation drawn from the same dataset?) |
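The quadrants of the error summary above can be captured in a small helper; the function name and the loss cutoff are illustrative assumptions, since what counts as a "high" loss depends on the task and loss function:

```python
def diagnose(train_loss, val_loss, threshold=0.5):
    """Map a (training loss, validation loss) pair to the quadrants of the
    error summary. `threshold` is an illustrative cutoff for a "high" loss."""
    high_train = train_loss > threshold
    high_val = val_loss > threshold
    if high_train and high_val:
        return "underfitting: increase capacity or train longer"
    if not high_train and high_val:
        return "overfitting: shrink the model or add regularization"
    if not high_train and not high_val:
        return "good fit: evaluate once on the test set"
    return "unlikely: debug (same dataset for training and validation?)"
```

In practice one would compare loss curves over training rather than a single threshold, but the branching logic mirrors the table directly.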