The paper 《Bag of Tricks for Image Classification with Convolutional Neural Networks》 notes:
“Using large batch size, however, may slow down the training progress. For convex problems, convergence rate decreases as batch size increases. Similar empirical results have been reported for neural networks [25]. In other words, for the same number of epochs, training with a large batch size results in a model with degraded validation accuracy compared to the ones trained with smaller batch sizes”
This statement raises a question:
In theory, increasing the batch size reduces the variance of the gradient estimate, so each update is less affected by the sampling noise of small mini-batches and training should converge faster. Why, then, does the final model generalize worse?
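A minimal sketch of the variance argument (my own illustration, not from either paper; the synthetic least-squares setup and all variable names are assumptions): it draws mini-batches of several sizes and measures how far their gradients scatter around the full-batch gradient, showing the spread shrinking roughly as 1/batch_size.

```python
# Sketch: mini-batch gradient variance vs. batch size on a toy
# least-squares problem (illustrative only, not from the cited papers).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n, d = 10_000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

w = np.zeros(d)  # current parameters (an arbitrary point, not the optimum)

def grad(idx):
    """Mini-batch gradient of the mean-squared-error loss at w."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

full_grad = grad(np.arange(n))

for batch_size in (8, 64, 512):
    # Draw many mini-batches and measure how far their gradients
    # scatter around the full-batch gradient.
    devs = []
    for _ in range(200):
        idx = rng.choice(n, size=batch_size, replace=False)
        devs.append(np.sum((grad(idx) - full_grad) ** 2))
    print(f"batch={batch_size:4d}  mean squared deviation ≈ {np.mean(devs):.3f}")

# Expected pattern: the deviation drops roughly in proportion to 1/batch_size,
# i.e. larger batches give noticeably less noisy gradient estimates.
```

This only confirms the premise of the question (larger batches do reduce gradient noise); the interesting part is why lower noise does not translate into better generalization, which is what the paper below investigates.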
With this question in mind, I found the paper 《ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA》, which offers two explanations for this phenomenon, supported by experiments:
(1) LB (large-batch) methods lack the explorative properties of SB (small-batch) methods and tend to zoom-in on the minimizer closest to the initial point.