Why do we need a validation set?

陈蒙_

Column: tensorflow

This is an original article by the author. Reposting is welcome; please keep a link to the original source.

Original link: https://blog.csdn.net/zhaizu/article/details/103135501

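The typical way to split data is into a training set and a test set. The problem with this scheme is that while repeatedly tuning hyperparameters (the learning rate, for example), we evaluate on the test set over and over. The model ends up paying too much attention to particular features of that test set, which leads to overfitting and a loss of generality.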

In the figure, “Tweak model” means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.

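That is why we need a validation set: the scheme above becomes training set / validation set / test set. Training and tuning are done on the first two sets, and the test set is used only at the very end to measure final performance.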
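The three-way workflow can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's code: the data is a synthetic noisy sine curve, the "tweak" being tuned is the polynomial degree passed to `np.polyfit`, the 60/20/20 split is arbitrary, and the helper names `split_three_ways` and `mse` are hypothetical. Every candidate is judged only on the validation set, and the test set is touched exactly once.

```python
import numpy as np

def split_three_ways(x, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle, then cut into disjoint train/val/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(len(x) * test_frac)
    n_val = int(len(x) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[test_idx], y[test_idx]))

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Synthetic data (an assumption purely for illustration): noisy sine curve
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)

(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_three_ways(x, y)

# "Tweak model": try several degrees, scoring each ONLY on the validation set
val_scores = {}
for degree in range(1, 10):
    coeffs = np.polyfit(train_x, train_y, degree)
    val_scores[degree] = mse(coeffs, val_x, val_y)

best_degree = min(val_scores, key=val_scores.get)

# Use the test set exactly once, on the model chosen via validation
best_coeffs = np.polyfit(train_x, train_y, best_degree)
test_mse = mse(best_coeffs, test_x, test_y)
print(f"best degree={best_degree}, "
      f"val MSE={val_scores[best_degree]:.3f}, test MSE={test_mse:.3f}")
```

A common variant refits the chosen model on the training and validation sets combined before the single test evaluation; the sketch keeps them separate for clarity.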

Test sets and validation sets “wear out” with repeated use. That is, the more you use the same data to make decisions about hyperparameter settings or other model improvements, the less confidence you’ll have that these results actually generalize to new, unseen data.
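The "wearing out" effect can be seen in a small simulation (an illustrative assumption, not from the original post). The labels are pure coin flips, independent of the features, so no model can genuinely beat 50% accuracy. Yet if we pick the best of 1,000 random linear classifiers by their score on one reused holdout set, that winner's holdout accuracy comes out noticeably above 50%, while on fresh data it falls back to about 50%. The sizes, the feature count, and the `accuracy` helper are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_holdout, n_fresh, n_models, n_features = 100, 10_000, 1_000, 20

# Features carry no information: labels are independent coin flips
holdout_X = rng.normal(size=(n_holdout, n_features))
fresh_X = rng.normal(size=(n_fresh, n_features))
holdout_y = rng.integers(0, 2, size=n_holdout)
fresh_y = rng.integers(0, 2, size=n_fresh)

# 1,000 random linear "models", each just a weight vector
W = rng.normal(size=(n_models, n_features))

def accuracy(w, X, y):
    """Fraction of rows where the sign of X @ w matches the 0/1 label."""
    return float(np.mean((X @ w > 0).astype(int) == y))

# Repeatedly "make decisions" using the same holdout set
holdout_acc = np.array([accuracy(w, holdout_X, holdout_y) for w in W])
best = int(np.argmax(holdout_acc))

best_holdout_acc = float(holdout_acc[best])
fresh_acc = accuracy(W[best], fresh_X, fresh_y)
print(f"best model on reused holdout: {best_holdout_acc:.2f}")
print(f"same model on fresh data:     {fresh_acc:.2f}")
```

The gap between the two numbers is pure selection optimism: the more candidates are compared on the same set, the larger it tends to get, which is why fresh data restores confidence.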

If possible, it’s a good idea to collect more data to “refresh” the test set and validation set. Starting anew is a great reset.