TensorFlow: accumulate gradients from multiple minibatches before updating the weights

1. Background

Neural networks are usually trained with mini-batch gradient descent, and plenty of experiments show that the batch size hyperparameter affects both how fast a model converges (training time) and how well it performs.

In general, the smaller the batch size, the slower the model converges; the larger the batch size, the faster it converges, and performance is often a bit better as well. You can explore the effect of batch size for yourself in the TensorFlow Playground.
However, GPU memory limits how far the batch size can be increased. So the question this article addresses is: when device memory is limited, how can we widen the usable range of batch sizes (making tuning more convenient)?

2. Approach

As the title says, the approach is to first accumulate the gradients computed on several minibatches within a batch, and only then update the weights. That is,
(1) split the whole dataset into batches;
(2) split each batch into minibatches, feed each minibatch to the network, compute the loss and the gradients, and store the gradients without updating the weights yet;
(3) sum the gradients obtained from all minibatches in the batch, then use the accumulated gradients to update the weights.

The result is exactly equivalent to feeding the whole batch to the network, computing the loss and the gradients, and updating the weights once: because differentiation is linear, the gradient of the summed batch loss equals the sum of the per-minibatch gradients.

Benefit: you can train with any effective batch size in [1, len(dataset)], whatever the GPU memory. If you have more memory, make the minibatch size larger; if memory is tight, make it smaller. For example, to get an effective batch size of 1024 on a device that only fits 128 examples at a time, accumulate over 1024 // 128 = 8 minibatches before each update.

3. TensorFlow implementation

In PyTorch, gradients accumulate by default as long as they are not zeroed, so the idea above is easy to implement; a minimal PyTorch sketch is shown below for comparison. In TensorFlow 1.x it is not as straightforward, so the rest of this section gives a full implementation, adapted from [1].
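A minimal PyTorch sketch of the same accumulate-then-step pattern (the toy model, optimizer, loss_fn and minibatches here are just placeholders for illustration, not from the original post):

import torch

# Toy setup for illustration only: one linear layer, SGD, two random minibatches
model = torch.nn.Linear(1, 1, bias=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
minibatches = [(torch.randn(2, 1), torch.randn(2, 1)) for _ in range(2)]

optimizer.zero_grad()                       # clear gradients left over from the previous batch
for mb_x, mb_y in minibatches:              # the minibatches that make up one batch
    loss = loss_fn(model(mb_x), mb_y)
    loss.backward()                         # .grad fields accumulate across backward() calls
optimizer.step()                            # a single update using the accumulated gradients

The full TensorFlow 1.x implementation: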

import tensorflow as tf
import numpy as np
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

x_data = np.array(range(1, 20))
num_dataset = len(x_data)
batch_size = 4
minibatch_size = 2
num_batches = num_dataset // batch_size                 # 19 // 4 = 4 batches (the last 3 examples are dropped)
minibatches_per_batch = batch_size // minibatch_size    # 4 // 2 = 2 minibatches per batch
with tf.Graph().as_default():
    x = tf.placeholder(dtype='float32', shape=None)
    w = tf.Variable(initial_value=4., dtype='float32')
    loss = w * w * x  # toy objective; x is a vector, so TF differentiates the sum over its elements

    # Optimizer definition - nothing different from any classical example
    opt = tf.train.GradientDescentOptimizer(0.1)

    # Retrieve all trainable variables you defined in your graph
    tvs = tf.trainable_variables()

    # Creation of a list of variables with the same shape as the trainable ones
    # initialized with zeros
    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
    zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]

    # Calls the compute_gradients function of the optimizer to obtain the list of gradients
    gvs = opt.compute_gradients(loss, tvs)

    # Adds to each element from the list you initialized earlier with zeros its gradient
    # (works because accum_vars and gvs are in the same order)
    accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]

    # Define the training step (part with variable value update)
    train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for batch_count in range(num_batches):
            # Before running each batch, zero out the gradients accumulated from the previous batch
            sess.run(zero_ops)

            batch_data = x_data[batch_count*batch_size: (batch_count+1)*batch_size]
            # Accumulate the gradients 'minibatches_per_batch' times in accum_vars using accum_ops
            for minibatch_count in range(minibatches_per_batch):
                minibatch_data = batch_data[minibatch_count*minibatch_size: (minibatch_count+1)*minibatch_size]
                accum_array = sess.run(accum_ops, feed_dict={x: minibatch_data})
                print("[%d][%d]" % (batch_count, minibatch_count), accum_array)
                print(sess.run(tvs))  # current value of w (unchanged until train_step runs)
            # Run the train_step ops to update the weights based on your accumulated gradients
            sess.run(train_step)

Output:

[0][0] [24.0]
[4.0]
[0][1] [80.0]
[4.0]
[1][0] [-88.0]
[-4.0]
[1][1] [-208.0]
[-4.0]
[2][0] [638.4]
[16.800001]
[2][1] [1411.2001]
[16.800001]
[3][0] [-6713.2803]
[-124.32001]
[3][1] [-14421.121]
[-124.32001]
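As a quick sanity check, here is a minimal sketch (reusing the same toy graph; it is not part of the original code) showing that feeding a whole batch at once yields the same gradient as the accumulated minibatch gradients: for the first batch [1, 2, 3, 4] with w = 4, the gradient of the summed loss w*w*x is 2*w*sum(x) = 2*4*10 = 80, matching the accumulated value 80.0 printed above.

import tensorflow as tf
import numpy as np

with tf.Graph().as_default():
    x = tf.placeholder(dtype='float32', shape=None)
    w = tf.Variable(initial_value=4., dtype='float32')
    loss = w * w * x
    grad = tf.gradients(loss, [w])[0]  # gradient of sum(loss) w.r.t. w, i.e. 2*w*sum(x)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # the full first batch in one shot: expect 2 * 4 * (1+2+3+4) = 80.0
        print(sess.run(grad, feed_dict={x: np.array([1., 2., 3., 4.])}))

Note that this construction accumulates the sum of the minibatch gradients, which is exactly what a single pass over the full batch produces with this loss; if an averaged gradient is preferred, the accumulated values can be divided by the number of minibatches before they are applied.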

References:
[1] https://stackoverflow.com/questions/46772685/how-to-accumulate-gradients-in-tensorflow
[2] https://www.zhihu.com/question/320783553
