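Q: What is the difference between the validation set and the test set?
A:
See the following Stack Overflow answer, which explains it thoroughly.
Reference: https://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-netwo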
The training and validation sets are used during training.
    for each epoch
        for each training data instance
            propagate error through the network
            adjust the weights
            calculate the accuracy over training data
        for each validation data instance
            calculate the accuracy over the validation data
        if the threshold validation accuracy is met
            exit training
        else
            continue training
Once you’re finished training, then you run against your testing set and verify that the accuracy is sufficient.
Training Set: this data set is used to adjust the weights on the neural network.
Validation Set: this data set is used to minimize overfitting. You’re not adjusting the weights of the network with this data set, you’re just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least the network hasn’t trained on it (i.e. validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you’re overfitting your neural network and you should stop training.
Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.
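The stopping logic described above can be sketched in Python. This is only an illustration under stated assumptions: `train_with_early_stopping`, `train_one_epoch`, `validation_accuracy`, and `patience` are hypothetical names that are not part of the quoted answer, and instead of the fixed accuracy threshold in the pseudocode, the sketch stops once validation accuracy has failed to improve for `patience` consecutive epochs. The simulated accuracy values are made up purely to exercise the loop.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    validation_accuracy: Callable[[], float],
    max_epochs: int = 100,
    patience: int = 3,
) -> Tuple[int, float]:
    """Train until validation accuracy stops improving.

    Returns (best_epoch, best_validation_accuracy).
    Both callables are hypothetical stand-ins for a real training setup.
    """
    best_acc = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        # Weights are adjusted using the training set only.
        train_one_epoch()
        # The validation set is only evaluated, never trained on.
        acc = validation_accuracy()

        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Validation accuracy has stalled: likely overfitting, so stop.
                break

    return best_epoch, best_acc


# Toy usage: simulated (made-up) validation accuracies, one per epoch.
simulated_val_acc = iter([0.60, 0.70, 0.75, 0.74, 0.75, 0.73, 0.72])
result = train_with_early_stopping(
    train_one_epoch=lambda: None,  # no real model in this sketch
    validation_accuracy=lambda: next(simulated_val_acc),
    patience=3,
)
print(result)
```

The test set does not appear anywhere in this loop. It would be evaluated once, after training has finished, on the model kept from the best epoch.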
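Q: Should data augmentation be applied before or after splitting the dataset?
A:
- The validation and test sets do not need augmentation; augmentation applies only to the training set;
- The split ratio refers to the ratio over the original data, not counting the augmented samples;
- If you augment first and split afterwards, information leaks: several augmented copies of the same image can end up in the training set and in the test (or validation) set at the same time.
Reference: https://blog.csdn.net/w18013886857/article/details/130092705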
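A minimal sketch of the "split first, then augment" order, using only the Python standard library. The names `split_then_augment` and `fake_augment`, the 80/20 ratio, and the string "image IDs" are all assumptions for illustration; a real pipeline would apply actual image transforms rather than renaming strings.

```python
import random


def split_then_augment(samples, augment, train_ratio=0.8, copies=2, seed=0):
    """Split the ORIGINAL samples first, then augment only the training part.

    `augment(sample, i)` is a hypothetical callable returning the i-th
    augmented variant of a sample.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)

    # The ratio is computed over the original data, before augmentation.
    n_train = int(len(shuffled) * train_ratio)
    train, held_out = shuffled[:n_train], shuffled[n_train:]

    # Augment the training samples only; held-out data stays untouched.
    augmented_train = list(train)
    for sample in train:
        augmented_train.extend(augment(sample, i) for i in range(copies))

    return augmented_train, held_out


def fake_augment(sample, i):
    # Stand-in for a real transform (flip, crop, etc.); it only tags the ID.
    return f"{sample}_aug{i}"


image_ids = [f"img_{i}" for i in range(10)]
train_set, test_set = split_then_augment(image_ids, fake_augment)

# Recover the source image of every training sample and check for leakage.
train_sources = {s.split("_aug")[0] for s in train_set}
print(len(train_set), len(test_set), train_sources.isdisjoint(test_set))
```

Because the split happens on the original images, no augmented copy of a held-out image can appear in the training set. Reversing the two steps would remove that guarantee.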