The paper 《Bag of Tricks for Image Classification with Convolutional Neural Networks》 notes:
“Using large batch size, however, may slow down the training progress. For convex problems, convergence rate decreases as batch size increases. Similar empirical results have been reported for neural networks [25]. In other words, for the same number of epochs, training with a large batch size results in a model with degraded validation accuracy compared to the ones trained with smaller batch sizes”
This statement raises a question:
In theory, increasing the batch size reduces the variance of the gradient estimate, so each update is less affected by the sampling noise of small mini-batches and training should converge faster. Why, then, does the final model generalize worse?
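A minimal sketch of the variance argument (my own illustration, not from either paper; the synthetic least-squares setup and all variable names are assumptions): it draws mini-batches of several sizes and measures how far their gradients scatter around the full-batch gradient, showing the spread shrinking roughly as 1/batch_size.

```python
# Sketch: mini-batch gradient variance vs. batch size on a toy
# least-squares problem (illustrative only, not from the cited papers).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n, d = 10_000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

w = np.zeros(d)  # current parameters (an arbitrary point, not the optimum)

def grad(idx):
    """Mini-batch gradient of the mean-squared-error loss at w."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

full_grad = grad(np.arange(n))

for batch_size in (8, 64, 512):
    # Draw many mini-batches and measure how far their gradients
    # scatter around the full-batch gradient.
    devs = []
    for _ in range(200):
        idx = rng.choice(n, size=batch_size, replace=False)
        devs.append(np.sum((grad(idx) - full_grad) ** 2))
    print(f"batch={batch_size:4d}  mean squared deviation ≈ {np.mean(devs):.3f}")

# Expected pattern: the deviation drops roughly in proportion to 1/batch_size,
# i.e. larger batches give noticeably less noisy gradient estimates.
```

This only confirms the premise of the question (larger batches do reduce gradient noise); the interesting part is why lower noise does not translate into better generalization, which is what the paper below investigates.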
With this question in mind, I found the paper 《ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA》, which offers two explanations for this phenomenon, supported by experiments:
(1) LB (large-batch) methods lack the explorative properties of SB (small-batch) methods and tend to zoom-in on the minimizer closest to the initial point.