Five Optimizers for Neural Networks

Parameters to be optimized $w$, loss function $loss$, learning rate $lr$; one batch per iteration, with $t$ denoting the total number of batch iterations so far.

  1. Compute the gradient of the loss with respect to the current parameters at step $t$: $g_t = \nabla loss = \frac{\partial loss}{\partial w_t}$
  2. Compute the first-order momentum $m_t$ and the second-order momentum $V_t$ at step $t$
  3. Compute the descent step at step $t$: $\eta_t = lr \cdot m_t / \sqrt{V_t}$
  4. Compute the parameters at step $t+1$: $w_{t+1} = w_t - \eta_t = w_t - lr \cdot m_t / \sqrt{V_t}$

First-order momentum: a function of the gradient.
Second-order momentum: a function of the squared gradient.
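
As a minimal sketch of the four-step template above (all names here are illustrative, not from the original notes; each optimizer below amounts to a different choice of `first_moment` and `second_moment`):

```python
import math

def generic_step(w, grad_fn, lr, first_moment, second_moment):
    """One iteration of the four-step template (hypothetical helper)."""
    g = grad_fn(w)               # step 1: g_t = d(loss)/d(w_t)
    m = first_moment(g)          # step 2: first-order momentum m_t
    V = second_moment(g)         # step 2: second-order momentum V_t
    eta = lr * m / math.sqrt(V)  # step 3: descent step eta_t
    return w - eta               # step 4: w_{t+1} = w_t - eta_t
```

For example, plain SGD is the choice `first_moment=lambda g: g`, `second_moment=lambda g: 1.0`; the stateful optimizers below would need closures that carry $m_{t-1}$ or $V_{t-1}$ between calls.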

The five optimizers

1. SGD (no momentum): stochastic gradient descent

$m_t = g_t,\quad V_t = 1$
$\eta_t = lr \cdot m_t / \sqrt{V_t} = lr \cdot g_t$
$w_{t+1} = w_t - \eta_t = w_t - lr \cdot g_t = w_t - lr \cdot \frac{\partial loss}{\partial w_t}$
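
A minimal runnable sketch of SGD on a toy scalar loss $loss(w) = (w-3)^2$, so $g_t = 2(w_t - 3)$ (the loss, `grad_fn`, learning rate, and step count are illustrative assumptions, not from the source):

```python
grad_fn = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

w, lr = 0.0, 0.1
for t in range(100):
    g = grad_fn(w)  # g_t
    w = w - lr * g  # w_{t+1} = w_t - lr * g_t  (m_t = g_t, V_t = 1)

print(w)  # approaches the minimizer w = 3
```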

2. SGDM (SGD with momentum): adds first-order momentum on top of SGD

$m_t = \beta \cdot m_{t-1} + (1-\beta)\cdot g_t,\quad V_t = 1$
$\eta_t = lr \cdot m_t / \sqrt{V_t} = lr \cdot m_t = lr \cdot (\beta \cdot m_{t-1} + (1-\beta)\cdot g_t)$
$w_{t+1} = w_t - \eta_t = w_t - lr \cdot (\beta \cdot m_{t-1} + (1-\beta)\cdot g_t)$
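
The same toy problem with first-order momentum added (`beta = 0.9` is a common default, not a value from the source):

```python
grad_fn = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

w, lr, beta, m = 0.0, 0.1, 0.9, 0.0
for t in range(200):
    g = grad_fn(w)
    m = beta * m + (1.0 - beta) * g  # m_t = beta * m_{t-1} + (1 - beta) * g_t
    w = w - lr * m                   # V_t = 1, so eta_t = lr * m_t
```

The running average `m` smooths the update direction across batches, which damps oscillation when successive gradients disagree.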

3. Adagrad: adds second-order momentum on top of SGD

$m_t = g_t,\quad V_t = \sum_{\tau=1}^{t} g_\tau^2$
$\eta_t = lr \cdot m_t / \sqrt{V_t} = lr \cdot g_t \Big/ \sqrt{\sum_{\tau=1}^{t} g_\tau^2}$
$w_{t+1} = w_t - \eta_t = w_t - lr \cdot g_t \Big/ \sqrt{\sum_{\tau=1}^{t} g_\tau^2}$
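
Adagrad on the same toy loss; the small `eps` added to the denominator is a standard numerical-stability term that is not part of the formulas above:

```python
import math

grad_fn = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

w, lr, V, eps = 0.0, 0.5, 0.0, 1e-7
for t in range(200):
    g = grad_fn(w)
    V = V + g ** 2                         # V_t = sum of g_tau^2, tau = 1..t
    w = w - lr * g / (math.sqrt(V) + eps)  # step size only ever shrinks
```

Because $V_t$ accumulates without decay, the effective learning rate decreases monotonically, which can stall training late on.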

4. RMSProp: adds an exponentially weighted second-order momentum on top of SGD

$m_t = g_t,\quad V_t = \beta \cdot V_{t-1} + (1-\beta)\cdot g_t^2$
$\eta_t = lr \cdot m_t / \sqrt{V_t} = lr \cdot g_t \Big/ \sqrt{\beta \cdot V_{t-1} + (1-\beta)\cdot g_t^2}$
$w_{t+1} = w_t - \eta_t = w_t - lr \cdot g_t \Big/ \sqrt{\beta \cdot V_{t-1} + (1-\beta)\cdot g_t^2}$
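
RMSProp on the same toy loss; again `eps` is a standard stability term outside the formulas above:

```python
import math

grad_fn = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

w, lr, beta, V, eps = 0.0, 0.1, 0.9, 0.0, 1e-7
for t in range(200):
    g = grad_fn(w)
    V = beta * V + (1.0 - beta) * g ** 2   # exponentially weighted average of g^2
    w = w - lr * g / (math.sqrt(V) + eps)
```

Unlike Adagrad's ever-growing sum, the exponential average lets old squared gradients decay, so the step size can recover.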

5. Adam: combines the first-order momentum of SGDM with the second-order momentum of RMSProp

$m_t = \beta_1 \cdot m_{t-1} + (1-\beta_1)\cdot g_t$
Bias-corrected first-order momentum: $\widehat{m_t} = \dfrac{m_t}{1-\beta_1^t}$
$V_t = \beta_2 \cdot V_{t-1} + (1-\beta_2)\cdot g_t^2$
Bias-corrected second-order momentum: $\widehat{V_t} = \dfrac{V_t}{1-\beta_2^t}$
$\eta_t = lr \cdot \widehat{m_t} / \sqrt{\widehat{V_t}} = lr \cdot \dfrac{m_t}{1-\beta_1^t} \Big/ \sqrt{\dfrac{V_t}{1-\beta_2^t}}$
$w_{t+1} = w_t - \eta_t = w_t - lr \cdot \dfrac{m_t}{1-\beta_1^t} \Big/ \sqrt{\dfrac{V_t}{1-\beta_2^t}}$
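
Adam on the same toy loss, combining both momenta with bias correction (`beta1 = 0.9`, `beta2 = 0.999`, and `eps` are common defaults, assumed here rather than taken from the source):

```python
import math

grad_fn = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

w, lr = 0.0, 0.1
beta1, beta2, eps = 0.9, 0.999, 1e-7
m, V = 0.0, 0.0
for t in range(1, 201):                     # t starts at 1 so 1 - beta^t > 0
    g = grad_fn(w)
    m = beta1 * m + (1.0 - beta1) * g       # first-order momentum (SGDM part)
    V = beta2 * V + (1.0 - beta2) * g ** 2  # second-order momentum (RMSProp part)
    m_hat = m / (1.0 - beta1 ** t)          # bias-corrected m_t
    V_hat = V / (1.0 - beta2 ** t)          # bias-corrected V_t
    w = w - lr * m_hat / (math.sqrt(V_hat) + eps)
```

The corrections matter early on: at small $t$, $m_t$ and $V_t$ are biased toward their zero initialization, and dividing by $1-\beta^t$ rescales them to unbiased estimates.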

These notes are based on the video course 人工智能实践:Tensorflow笔记 (AI Practice: TensorFlow Notes).
