Paper reading: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Paper link: https://arxiv.org/pdf/1706.02677
Background
1) Larger networks and larger datasets need longer training times.
Solution: distributed synchronous SGD, which divides SGD minibatches over a pool of parallel workers.
2) The main issue with large minibatches is optimization difficulty, not poor generalization.
Solution: the authors provide strategies for using large minibatches in place of small minibatches while maintaining training and generalization accuracy. Since large minibatches make optimization harder, they propose several tips to address this.
Large minibatch SGD
First, recall the SGD formulas:
1) Loss function:

L(w)=\frac{1}{|X|}\sum_{x \in X} l(x,w)

where X is the full training set, w are the network weights, and l(x,w) is the loss of a single sample x.
2) Weight update:

w_{t+1}=w_{t} - \eta \frac{1}{n} \sum_{x \in B} \nabla l(x,w_{t})

where B is a minibatch sampled from X, n = |B| is the minibatch size, \eta is the learning rate, and t is the iteration index.
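The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; `grad_fn` and the quadratic toy loss are hypothetical stand-ins for a real model's per-sample gradient.

```python
import numpy as np

def sgd_step(w, grad_fn, batch, lr):
    """One minibatch SGD step: w_{t+1} = w_t - lr * (1/n) * sum of per-sample gradients."""
    grads = np.stack([grad_fn(x, w) for x in batch])  # per-sample gradients ∇l(x, w_t)
    return w - lr * grads.mean(axis=0)               # average over the minibatch B

# Toy loss l(x, w) = 0.5 * (w - x)^2, so ∇l(x, w) = w - x.
grad_fn = lambda x, w: w - x
w = np.array([0.0])
batch = np.array([1.0, 2.0, 3.0])
w = sgd_step(w, grad_fn, batch, lr=0.5)
print(w)  # [1.] — the weight moves toward the batch mean 2.0
```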
Large minibatch
Linear Scaling Rule: when the minibatch size is multiplied by k, multiply the learning rate by k.
Warmup: ramp the learning rate gradually (the paper uses the first 5 epochs) from a small value up to the scaled rate, to avoid instability early in training.
BN: keep the per-worker minibatch size fixed so that the Batch Normalization statistics, which are computed per worker, stay consistent as the total minibatch grows.
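The linear scaling rule and gradual warmup combine into a simple learning-rate schedule. The sketch below assumes iteration-based warmup; `base_lr=0.1` and `warmup_iters=500` are illustrative values, not the paper's exact hyperparameters (the paper warms up over 5 epochs).

```python
def lr_schedule(iteration, k, base_lr=0.1, warmup_iters=500):
    """Linear scaling rule with gradual warmup.

    k: factor by which the minibatch size was multiplied.
    The target rate is k * base_lr; warmup ramps linearly up from base_lr.
    """
    target_lr = k * base_lr  # linear scaling rule
    if iteration < warmup_iters:
        alpha = iteration / warmup_iters          # fraction of warmup completed
        return base_lr + alpha * (target_lr - base_lr)
    return target_lr

print(lr_schedule(0, k=8))     # 0.1  (start of warmup)
print(lr_schedule(250, k=8))   # 0.45 (halfway through warmup)
print(lr_schedule(1000, k=8))  # 0.8  (scaled rate, warmup finished)
```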
Tips
Communication
The gradient of every parameter is aggregated via an allreduce operation. Before the allreduce, each GPU computes its own gradients; after the allreduce, every GPU holds the sum of the gradients from all GPUs.
Recommended references:
https://www.zhihu.com/question/60874090
https://www.jianshu.com/p/738ff3628543