Neural Network 5: Optimizers

Neural network parameter optimizers are the methods that guide how a network's parameters are updated during training.
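
Every optimizer below fits one common update framework (this summary is distilled from the per-optimizer formulas in the following sections): from the gradient $g_t=\frac{\partial loss}{\partial W_t}$ it builds a first-order momentum $m_t$ and a second-order momentum $v_t$, then applies

$$\eta_t=lr\cdot\frac{m_t}{\sqrt{v_t}},\qquad W_{t+1}=W_t-\eta_t$$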

1. SGD

Plain SGD, without momentum.
$$W_{t+1}=W_t-lr\cdot\frac{\partial loss}{\partial W_t}$$
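
A minimal runnable sketch of this update, written in the same TensorFlow style as the snippets in the later sections. The toy variables w1, b1 and the dummy batch x, y_ below are placeholders standing in for the real ones used in the full training loop shown under Adam.

import tensorflow as tf

# hypothetical toy variables standing in for w1/b1 from the training loop below
w1 = tf.Variable(tf.random.normal([4, 3]))
b1 = tf.Variable(tf.zeros([3]))
lr = 0.1

x = tf.random.normal([8, 4])                              # dummy input batch
y_ = tf.one_hot(tf.zeros([8], dtype=tf.int32), depth=3)   # dummy one-hot labels

with tf.GradientTape() as tape:
    y = tf.nn.softmax(tf.matmul(x, w1) + b1)
    loss = tf.reduce_mean(tf.square(y_ - y))
grads = tape.gradient(loss, [w1, b1])

# SGD: step directly along the raw gradient, scaled by the learning rate
w1.assign_sub(lr * grads[0])
b1.assign_sub(lr * grads[1])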

2. SGDM

SGD with momentum: adds first-order momentum on top of plain SGD.
β is typically set to 0.9.
$$m_t=\beta\cdot m_{t-1}+(1-\beta)\cdot g_t,\qquad v_t=1$$
$$\eta_t=lr\cdot\frac{m_t}{\sqrt{v_t}}=lr\cdot\bigl(\beta\cdot m_{t-1}+(1-\beta)\cdot g_t\bigr)$$
$$W_{t+1}=W_t-\eta_t=W_t-lr\cdot\bigl(\beta\cdot m_{t-1}+(1-\beta)\cdot g_t\bigr)$$

        # SGDM parameter update: accumulate first-order momentum, then step along it
        m_w = beta * m_w + (1 - beta) * grads[0]
        m_b = beta * m_b + (1 - beta) * grads[1]
        w1.assign_sub(lr * m_w)
        b1.assign_sub(lr * m_b)


3. Adagrad

Adds second-order momentum; g_t denotes the gradient at step t.
$$m_t=g_t,\qquad v_t=\sum_{\tau=1}^{t}g_\tau^2$$
$$\eta_t=lr\cdot\frac{m_t}{\sqrt{v_t}}=lr\cdot\frac{g_t}{\sqrt{\sum_{\tau=1}^{t}g_\tau^2}}$$
$$W_{t+1}=W_t-\eta_t=W_t-lr\cdot\frac{g_t}{\sqrt{\sum_{\tau=1}^{t}g_\tau^2}}$$

        # Adagrad parameter update: v accumulates all squared gradients
        v_w += tf.square(grads[0])
        v_b += tf.square(grads[1])
        w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
        b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))


4. RMSProp

Also adds second-order momentum, but as an exponential moving average of the squared gradients rather than Adagrad's running sum.
$$m_t=g_t,\qquad v_t=\beta\cdot v_{t-1}+(1-\beta)\cdot g_t^2$$
$$\eta_t=lr\cdot\frac{m_t}{\sqrt{v_t}}=lr\cdot\frac{g_t}{\sqrt{\beta\cdot v_{t-1}+(1-\beta)\cdot g_t^2}}$$
$$W_{t+1}=W_t-\eta_t=W_t-lr\cdot\frac{g_t}{\sqrt{\beta\cdot v_{t-1}+(1-\beta)\cdot g_t^2}}$$

        # RMSProp parameter update: v is an exponential moving average of squared gradients
        v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
        v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
        w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
        b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))


5. Adam

Combines the first-order momentum of SGDM with the second-order momentum of RMSProp.
$$m_t=\beta_1\cdot m_{t-1}+(1-\beta_1)\cdot g_t,\qquad \text{bias-corrected first moment: }\hat{m}_t=\frac{m_t}{1-\beta_1^{\,t}}$$
$$v_t=\beta_2\cdot v_{t-1}+(1-\beta_2)\cdot g_t^2,\qquad \text{bias-corrected second moment: }\hat{v}_t=\frac{v_t}{1-\beta_2^{\,t}}$$
$$\eta_t=lr\cdot\frac{\hat{m}_t}{\sqrt{\hat{v}_t}},\qquad W_{t+1}=W_t-\eta_t=W_t-lr\cdot\frac{\hat{m}_t}{\sqrt{\hat{v}_t}}$$

# learning rate and containers used later for plotting
lr = 0.1
train_loss_results = []
test_acc = []
epochs = 500
loss_all = 0
# Adam optimizer state
m_w, m_b = 0, 0
v_w, v_b = 0, 0
beta1 = 0.9
beta2 = 0.999
delta_w, delta_b = 0, 0
global_step = 0

# training: the outer loop runs once per epoch (a full pass over the dataset), the inner loop once per batch
for epoch in range(epochs):
    for step, (x_train, y_train) in enumerate(train_db):
        # one optimization step per batch
        global_step += 1
        with tf.GradientTape() as tape:
            y = tf.matmul(x_train, w1) + b1
            y = tf.nn.softmax(y)
            y_ = tf.one_hot(y_train, depth=3)
            loss = tf.reduce_mean(tf.square(y_ - y))
            loss_all += loss.numpy()

        grads = tape.gradient(loss, [w1, b1])

        # Adam parameter update: first- and second-order momentum with bias correction
        m_w = beta1 * m_w + (1 - beta1) * grads[0]
        m_b = beta1 * m_b + (1 - beta1) * grads[1]
        v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
        v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])

        m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
        m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
        v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
        v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))

        w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
        b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))

    # print and record the loss averaged over the 4 batches of each epoch
    print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    train_loss_results.append(loss_all / 4)
    loss_all = 0
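
For reference, tf.keras ships built-in implementations of all five optimizers. Below is a minimal sketch (assuming TensorFlow 2.x; the optimizer variable names are just illustrative) of how the hand-written updates above map onto the built-in classes. Note that Keras's momentum SGD uses velocity = momentum * velocity - lr * grad rather than the (1 - β)-weighted average in the SGDM formula above, so the two match only up to a rescaling of the learning rate.

import tensorflow as tf

# built-in counterparts of the five optimizers derived above (sketch)
opt_sgd     = tf.keras.optimizers.SGD(learning_rate=0.1)                 # 1. SGD
opt_sgdm    = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)   # 2. SGDM
opt_adagrad = tf.keras.optimizers.Adagrad(learning_rate=0.1)             # 3. Adagrad
opt_rmsprop = tf.keras.optimizers.RMSprop(learning_rate=0.1, rho=0.9)    # 4. RMSProp
opt_adam    = tf.keras.optimizers.Adam(learning_rate=0.1,
                                       beta_1=0.9, beta_2=0.999)         # 5. Adam

# inside the training loop, the manual assign_sub updates would become:
# opt_adam.apply_gradients(zip(grads, [w1, b1]))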
