References:
Optimizer source code in TF:
https://blog.csdn.net/Huang_Fj/article/details/102688509
How to accumulate gradients before the backward update:
https://stackoverflow.com/questions/46772685/how-to-accumulate-gradients-in-tensorflow
https://blog.csdn.net/weixin_41560402/article/details/106930463
Problem background and solution:
When GPU memory is limited but you want a larger batch_size, the idea is to split one large batch into many mini-batches: run the forward pass on each mini-batch, accumulate the gradients, and apply the backward update only once the accumulated mini-batches add up to the full batch_size.
The procedure is simple: it amounts to decomposing the optimizer's minimize into two steps, compute_gradients and apply_gradients, and accumulating the gradients after the first step. The first two links above explain the accumulation and update in detail, and the optimizer source analysis walks through minimize itself. The relevant minimize source is pasted here, followed by a sketch of the accumulation:
def minimize(self, loss, global_step=None, var_list=None,
             gate_gradients=GATE_OP, aggregation_method=None,
             colocate_gradients_with_ops=False, name=None,
             grad_loss=None):
  # Step 1: compute the gradients of loss w.r.t. the variables.
  grads_and_vars = self.compute_gradients(
      loss, var_list=var_list, gate_gradients=gate_gradients,
      aggregation_method=aggregation_method,
      colocate_gradients_with_ops=colocate_gradients_with_ops,
      grad_loss=grad_loss)

  vars_with_grad = [v for g, v in grads_and_vars if g is not None]
  if not vars_with_grad:
    raise ValueError(
        "No gradients provided for any variable, check your graph for ops"
        " that do not support gradients, between variables %s and loss %s." %
        ([str(v) for _, v in grads_and_vars], loss))

  # Step 2: apply the gradients to the variables.
  return self.apply_gradients(grads_and_vars, global_step=global_step,
                              name=name)