Why do we need a validation set?

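The typical way to partition data is training set / test set. The problem with this scheme is that while repeatedly tuning hyperparameters such as the learning rate, we evaluate on the test set over and over. The model ends up paying too much attention to certain features of that particular test set, which causes overfitting and a loss of generality.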

In the figure, “Tweak model” means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.

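So we need a validation set, which changes the scheme above to training set / validation set / test set. Training and tuning are both done on the first two sets, and the test set is used only at the end to measure final performance.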
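As a rough illustration of the three-way partition, here is a minimal sketch in plain Python. The function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for this example and do not come from the original text.

```python
import random


def three_way_split(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle examples and split them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be between 0 and 1")

    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle so each split is a random sample

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                # held out until the very end
    val = shuffled[n_test:n_test + n_val]   # used to compare hyperparameter settings
    train = shuffled[n_test + n_val:]       # used to fit the model
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The intent is that hyperparameters are compared on `val` only, and `test` is touched once, after the final model has been chosen.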

Test sets and validation sets “wear out” with repeated use. That is, the more you use the same data to make decisions about hyperparameter settings or other model improvements, the less confidence you’ll have that these results actually generalize to new, unseen data.
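The "wearing out" effect can be seen in a small simulation. This is a toy sketch under assumed conditions, not an experiment from the original text: the labels are coin flips, each candidate "model" simply predicts one random binary feature, and the sizes (200 examples, 500 candidates) are arbitrary. No candidate can truly beat 50% accuracy, yet picking the best one on a reused test set gives a score that looks better than chance.

```python
import random

rng = random.Random(0)
N_EXAMPLES = 200    # size of each evaluation set (arbitrary)
N_CANDIDATES = 500  # number of "tweaks" tried against the same test set (arbitrary)


def make_dataset(n_examples, n_features):
    """Random binary features and coin-flip labels: there is nothing to learn."""
    features = [[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(n_examples)]
    labels = [rng.randint(0, 1) for _ in range(n_examples)]
    return features, labels


def accuracy(feature_index, features, labels):
    """Accuracy of the toy model 'predict the label as feature[feature_index]'."""
    hits = sum(row[feature_index] == y for row, y in zip(features, labels))
    return hits / len(labels)


test_x, test_y = make_dataset(N_EXAMPLES, N_CANDIDATES)
fresh_x, fresh_y = make_dataset(N_EXAMPLES, N_CANDIDATES)

# Evaluate every candidate on the same test set and keep the winner.
test_scores = [accuracy(j, test_x, test_y) for j in range(N_CANDIDATES)]
best = max(range(N_CANDIDATES), key=test_scores.__getitem__)
best_test_score = test_scores[best]

# Re-evaluate the winner on data that played no part in the selection.
fresh_score = accuracy(best, fresh_x, fresh_y)

print(f"score on the reused test set: {best_test_score:.3f}")
print(f"score on fresh data:          {fresh_score:.3f}")
```

Because the winner was selected using the test set, its test score is an optimistic estimate. The score on fresh data is the honest one, and in this setup it should hover around chance.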

If possible, it’s a good idea to collect more data to “refresh” the test set and validation set. Starting anew is a great reset.
