This post is especially detailed:
Deep Learning Optimization Algorithms Explained (Momentum, RMSProp, Adam)
Adam(Adaptive Moment Estimation)
Initialization: V_dW = 0, S_dW = 0, V_db = 0, S_db = 0

On iteration t:
  compute dW, db on the current mini-batch
  V_dW = β1·V_dW + (1 − β1)·dW,   V_db = β1·V_db + (1 − β1)·db
  S_dW = β2·S_dW + (1 − β2)·dW²,  S_db = β2·S_db + (1 − β2)·db²
  V_dW_corrected = V_dW / (1 − β1^t),  V_db_corrected = V_db / (1 − β1^t)
  S_dW_corrected = S_dW / (1 − β2^t),  S_db_corrected = S_db / (1 − β2^t)
  W := W − α·V_dW_corrected / (√(S_dW_corrected) + ε)
  b := b − α·V_db_corrected / (√(S_db_corrected) + ε)
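The per-iteration update can be sketched in NumPy as a single function. This is a minimal sketch of the standard Adam step, not code from the original post; the function and variable names (`adam_step`, `V`, `S`, `t`) are chosen here for illustration:

```python
import numpy as np

def adam_step(W, dW, V, S, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array W with gradient dW.

    V and S are the running first- and second-moment estimates
    (initialized to zeros of the same shape as W); t is the 1-based
    iteration counter used for bias correction.
    """
    V = beta1 * V + (1 - beta1) * dW        # first moment: EWMA of gradients
    S = beta2 * S + (1 - beta2) * dW ** 2   # second moment: EWMA of squared gradients
    V_hat = V / (1 - beta1 ** t)            # bias correction (matters early,
    S_hat = S / (1 - beta2 ** t)            # when V and S are still near zero)
    W = W - alpha * V_hat / (np.sqrt(S_hat) + eps)
    return W, V, S
```

The same function would be called once per parameter tensor (W and b each keep their own V, S state), which mirrors the paired W/b updates written out above.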
When implementing Adam, what people usually do is just use the default values for β1, β2, and ε (the commonly recommended defaults are β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸). I don't think anyone ever really tunes ε. And then, try a range of values of α to see what works best.
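The advice to sweep only α can be sketched as a small experiment. Everything here is a hypothetical illustration: the toy loss f(w) = w², the helper name `final_loss`, and the candidate α values are not from the original text; β1, β2, and ε are fixed at their usual defaults.

```python
def final_loss(alpha, steps=200):
    """Run Adam on the toy loss f(w) = w**2 starting from w = 5.0
    and return the final loss value (illustrative helper)."""
    beta1, beta2, eps = 0.9, 0.999, 1e-8   # defaults, left untuned
    w, v, s = 5.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2 * w                           # gradient of w**2
        v = beta1 * v + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g * g
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        w -= alpha * v_hat / (s_hat ** 0.5 + eps)
    return w * w

# Try a range of values of alpha and keep whichever reaches the lowest loss.
best_alpha = min([0.001, 0.01, 0.1, 1.0], key=final_loss)
```

In a real setting the sweep would compare validation loss after training, but the shape of the search is the same: hold β1, β2, ε fixed and scan α on a log scale.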
So, where does the term ‘Adam’ come from?
Adam stands for Adaptive Moment Estimation
. So β1 computes the mean of the derivatives; this is called the first moment. And β2 computes an exponentially weighted average of the squared derivatives; that's called the second moment. So that gives rise to the name Adaptive Moment Estimation.