people often talk about three datasets: (三种数据集)
The training data:
the training data is used by one or more
learning schemes to
come up with classifiers. (训练集:使用训练集来构造分类器)
The validation data:
the validation data is used to optimize parameters of those
classifier or to select a
particular one
(验证集:优化分类器的参数或者选择特定的分类器)
The test data:Then the test data is used to calculate
the error rate of the final, optimized, method.
(检验集:检查最终所得到的分类器的错误率)
Each of the three sets must be chosen independently:
The validation set must be different from the training set to obtain
good performance in the optimization or selection stage, and the
test set must be different from both to obtain a reliable estimate of
the true error rate.(三个集合往往相互独立)