Andrew Ng's Vivid Explanation of Dev / Test Sets

Riverhope

Article link: https://blog.csdn.net/Riverhope/article/details/78695295

The dev set is also called the development set, or sometimes the hold-out cross-validation set. The workflow in machine learning is that you try lots of ideas, training different models on the training set, then use the dev set to evaluate the different ideas and pick one. You keep iterating to improve dev set performance until you finally have one classifier you're happy with, which you then evaluate on your test set.
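The workflow described above can be sketched in a few lines of Python. This is only a toy illustration, not code from the lecture: the data generator, the three "ideas" (`majority`, `coarse_threshold`, `fine_threshold`) and the set sizes are all made up for the example.

```python
import random

def make_examples(n, rng):
    # Hypothetical toy data: one feature x in [0, 1); the label is 1 when x > 0.6.
    return [(x, int(x > 0.6)) for x in (rng.random() for _ in range(n))]

def accuracy(predict, examples):
    # Single real-number metric: fraction of examples classified correctly.
    return sum(predict(x) == y for x, y in examples) / len(examples)

def train_majority(train):
    # Idea 1: always predict the most common training label.
    majority = int(sum(y for _, y in train) * 2 >= len(train))
    return lambda x: majority

def train_threshold(train, steps):
    # Ideas 2 and 3: pick the threshold on a grid that fits the training set best.
    grid = [i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: sum(int(x > t) == y for x, y in train))
    return lambda x: int(x > best)

rng = random.Random(0)
train = make_examples(600, rng)
dev = make_examples(200, rng)
test = make_examples(200, rng)

ideas = {
    "majority": train_majority,
    "coarse_threshold": lambda data: train_threshold(data, 4),
    "fine_threshold": lambda data: train_threshold(data, 100),
}

# Train every idea on the training set, then compare them on the dev set only.
models = {name: fit(train) for name, fit in ideas.items()}
dev_scores = {name: accuracy(model, dev) for name, model in models.items()}
best_name = max(dev_scores, key=dev_scores.get)

# The test set is touched once, for the model that won on the dev set.
test_score = accuracy(models[best_name], test)
print(best_name, dev_scores[best_name], test_score)
```

The point of the sketch is the order of operations: the dev set drives the choice between ideas, and the test set is used only at the end to evaluate the chosen classifier.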

Now let's say, by way of example, that you are building a cat classifier, and your data comes from several regions. If you set up the dev and test sets as shown in the picture, it's a very bad idea, because your dev and test sets come from different distributions.

Setting up a dev set plus a single real-number evaluation metric is like placing a target and telling your team where you think the bullseye is. Once you have established the dev set and the metric, the team can iterate very quickly: try different ideas, run experiments, and use the dev set and the metric to evaluate the classifiers and pick the best one. Machine learning teams are good at shooting different arrows at the target and iterating to get closer and closer to the bullseye, that is, to doing well on the dev set and metric.

The problem with setting up the dev and test sets this way is that you may waste months of work optimizing for the dev set, only to find that the result does not perform well on the test set. Having dev and test sets from different distributions is like setting a target, having your team spend months getting closer and closer to the bullseye, and then realizing after all that work that you are going to move the target somewhere else.

So, to avoid this, you should take all the data and randomly shuffle it into the dev and test sets, so that the dev and test sets really do come from the same distribution.
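As a rough sketch of the difference, the Python below contrasts a split by region with a random shuffle. The region names, the file names and the helper `shuffle_split` are hypothetical, chosen only for illustration; the lecture does not prescribe any particular code.

```python
import random
from collections import Counter

def shuffle_split(examples, dev_fraction=0.5, seed=0):
    # Shuffle a copy of all the data, then cut it into dev and test parts.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical pool: 1000 examples, each tagged with a made-up region name.
regions = ["region_a", "region_b", "region_c", "region_d"]
pool = [(f"img_{i}", regions[i % 4]) for i in range(1000)]

# Bad idea: dev and test are drawn from different regions,
# so they come from different distributions.
bad_dev = [e for e in pool if e[1] in ("region_a", "region_b")]
bad_test = [e for e in pool if e[1] in ("region_c", "region_d")]

# Better: shuffle everything first, so both sets mix all regions.
dev, test = shuffle_split(pool)

print("by region:", Counter(r for _, r in bad_dev), Counter(r for _, r in bad_test))
print("shuffled: ", Counter(r for _, r in dev), Counter(r for _, r in test))
```

After shuffling, every region should appear in both sets in roughly similar proportions, whereas the split by region leaves the two sets with no regions in common.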