Large batch size and the problem of model generalization

The paper "Bag of Tricks for Image Classification with Convolutional Neural Networks" states:

"Using large batch size, however, may slow down the training progress. For convex problems, convergence rate decreases as batch size increases. Similar empirical results have been reported for neural networks [25]. In other words, for the same number of epochs, training with a large batch size results in a model with degraded validation accuracy compared to the ones trained with smaller batch sizes."

This sentence raises a question:

In theory, a larger batch size reduces the variance of the stochastic gradient estimate, so each update is less affected by the noise of small-sample mini-batches and training should proceed more smoothly and quickly. Why, then, does the final model generalize worse?
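The premise itself is easy to check numerically: for a fixed parameter vector, the mini-batch gradient is an average of per-sample gradients, so its variance shrinks roughly as 1/B. Below is a minimal NumPy sketch on synthetic linear-regression data (all names and sizes are placeholders of mine, not from either paper) that measures this.

```python
# Minimal sketch: variance of the mini-batch gradient estimate vs. batch size B.
# For a fixed parameter vector w, the estimate is an average over B samples,
# so its variance should shrink roughly as 1/B.
import numpy as np

rng = np.random.default_rng(0)
N, d = 10_000, 20
X = rng.normal(size=(N, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.5 * rng.normal(size=N)

w = np.zeros(d)  # evaluate gradient noise at a fixed (untrained) point

def minibatch_grad(batch_size):
    """Gradient of 0.5 * ||X_b w - y_b||^2 / B on a random mini-batch."""
    idx = rng.choice(N, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

for B in [8, 64, 512]:
    grads = np.stack([minibatch_grad(B) for _ in range(2000)])
    # Total variance of the gradient estimate across random mini-batches
    print(f"B={B:4d}  gradient variance ~ {grads.var(axis=0).sum():.3f}")
```

The printed variance drops by roughly the same factor as the increase in B, which is exactly the intuition behind "large batches are less noisy"; the question is why this does not translate into better generalization.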

With this question in mind, I found the paper "ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA", which offers two explanations for this phenomenon and backs them with experiments:

(1) LB (large-batch) methods lack the explorative properties of SB (small-batch) methods and tend to zoom in on the minimizer closest to the initial point.

(2) LB methods tend to converge to sharp minimizers of the training function, and sharp minima are known to generalize poorly; the inherent noise in SB gradient estimates pushes the iterates out of the basins of sharp minimizers, so SB methods consistently converge to flat minimizers with better test performance.
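The paper visualizes the second point with a one-dimensional "parametric plot": evaluate the training loss along the straight line connecting the SB solution and the LB solution. Below is a minimal PyTorch sketch of that diagnostic on a synthetic problem; the model, data, and hyper-parameters are my own placeholders, not the networks used in the paper. A loss that rises steeply around alpha = 1 suggests the large-batch solution sits in a sharper region of the loss surface.

```python
# Sketch of the sharpness diagnostic: train the same small model from the same
# initial point with a small and a large batch size, then evaluate the training
# loss along w(alpha) = (1 - alpha) * w_SB + alpha * w_LB.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(4096, 16)
y = (X[:, :4].sum(dim=1, keepdim=True) > 0).float()  # simple synthetic labels

def make_model():
    return nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

init_model = make_model()  # both runs start from the same initial point
loss_fn = nn.BCEWithLogitsLoss()

def train(batch_size, epochs=30, lr=0.05):
    model = copy.deepcopy(init_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(X[idx]), y[idx]).backward()
            opt.step()
    return model

sb_model = train(batch_size=8)     # "small batch"
lb_model = train(batch_size=1024)  # "large batch"

probe = copy.deepcopy(init_model)
sb_params = [p.detach().clone() for p in sb_model.parameters()]
lb_params = [p.detach().clone() for p in lb_model.parameters()]

# Walk along the line between the two solutions and record the training loss.
for alpha in [-0.5, 0.0, 0.5, 1.0, 1.5]:
    with torch.no_grad():
        for p, ps, pl in zip(probe.parameters(), sb_params, lb_params):
            p.copy_((1 - alpha) * ps + alpha * pl)
        loss = loss_fn(probe(X), y).item()
    print(f"alpha={alpha:+.1f}  training loss={loss:.4f}")
```

In the paper, this kind of plot is drawn for their image and speech models, and the loss around the large-batch solution rises much more steeply than around the small-batch one, which is the numerical evidence for the "sharp minima" explanation.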
