Training models: why is it necessary to split a dataset into training, validation and test sets?

baaos058949

Tags: python

Original link: http://www.cnblogs.com/SongHaoran/p/8359562.html

When training a model, no matter what kind of task it aims to achieve, we generally need to split the existing dataset into three parts: a training set, a validation set and a test set.
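As a rough illustration, a three-way split might look like the sketch below. The function name `split_dataset`, the 60/20/20 proportions and the fixed seed are all assumptions made for this example, not something prescribed by the post or by any library.

```python
import numpy as np

def split_dataset(X, y, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut the data into train / validation / test parts.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    n = len(X)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)  # random order of all sample indices

    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)

    # Non-overlapping slices of the shuffled indices
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]

    return (
        (X[train_idx], y[train_idx]),
        (X[val_idx], y[val_idx]),
        (X[test_idx], y[test_idx]),
    )

# Toy data: 50 samples with 2 features each
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

train, val, test = split_dataset(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 30 10 10
```

The important property is that the three index sets never overlap, so no sample is used for more than one purpose.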

The training set is used to train your model, that is, to calculate the gradients and update the parameters. The validation set is used to evaluate the model's performance during development. Now you might start wondering: these two sets sound like enough to do the job. You can update the model parameters and you can evaluate the model's performance, so why bother with a test set?

Well, that question makes perfect sense, and I will explain the importance of the test set below.

To paraphrase François Chollet's book Deep Learning with Python: while developing a model, tuning its configuration, such as selecting hyperparameters, is always guided by the feedback signal from the model's performance on the validation data. In a nutshell, every such tuning step leaks some information about the validation set into the model, and the model can quickly end up overfitting to the validation set, even though it is never trained on it directly. This notion is called an "information leak". To get an honest estimate of performance despite this leak, we need a completely different, never-before-seen dataset, the test set, to evaluate the model.
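The effect can be reproduced with a toy experiment. The sketch below is not from the book. It assumes a deliberately artificial setup in which the labels are pure coin flips, and each "tuning step" is stood in for by a randomly drawn linear classifier. Picking the candidate with the best validation score then gives an optimistic validation accuracy, while the untouched test set shows what the model can really do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for this demo): labels are coin flips, independent
# of the features, so no model can truly beat roughly 50% accuracy.
n_features = 20
X_val = rng.normal(size=(50, n_features))
y_val = rng.integers(0, 2, size=50)
X_test = rng.normal(size=(1000, n_features))
y_test = rng.integers(0, 2, size=1000)

def accuracy(weights, X, y):
    """Accuracy of a linear classifier that predicts 1 when X @ weights > 0."""
    predictions = (X @ weights > 0).astype(int)
    return float(np.mean(predictions == y))

# Hypothetical "tuning loop": try many candidate models and keep the one
# with the best validation accuracy, as one would when tuning hyperparameters.
candidates = [rng.normal(size=n_features) for _ in range(200)]
val_scores = [accuracy(w, X_val, y_val) for w in candidates]
best = int(np.argmax(val_scores))

best_val_acc = val_scores[best]
test_acc = accuracy(candidates[best], X_test, y_test)

print(f"best validation accuracy: {best_val_acc:.2f}")
print(f"test accuracy of that model: {test_acc:.2f}")
```

The selected model looks better than chance on the validation set only because it was chosen for that score. The test set, which played no part in the selection, is not fooled.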

Reposted from: https://www.cnblogs.com/SongHaoran/p/8359562.html
