I read through a series of papers by Yang You et al. (only now realizing that the following four are actually just two papers...)
1. Scaling SGD Batch Size to 32K for ImageNet Training. https://arxiv.org/abs/1708.03888v1
2. Large Batch Training of Convolutional Networks. https://arxiv.org/abs/1708.03888v3
3. 100-epoch ImageNet Training with AlexNet in 24 Minutes. https://arxiv.org/abs/1709.05011v1
4. ImageNet Training in Minutes. https://arxiv.org/abs/1709.05011v10
A common way to speed up the training of large convolutional networks is to add more compute units; as the number of nodes increases, the batch size grows with it. However, training with a large batch size usually degrades model accuracy. We argue that the current approach to large-batch training (linear learning-rate scaling with warmup, described in Facebook's paper: Accurate, Large Min