Introduction
MXNet provides automatic differentiation. Compared with Caffe, where you have to write the backward pass yourself, this saves us a lot of time.
Differentiation
For example:
y = 4 * x^2
x = [[1, 2], [3, 4]]
dy/dx = 8x = [[8, 16], [24, 32]]
Computing this with MXNet autograd:
import mxnet as mx
from mxnet import autograd as ag

x = mx.nd.array([[1, 2], [3, 4]])
x.attach_grad()           # allocate a buffer to hold the gradient of x
with ag.record():         # record the computation so it can be differentiated
    y = 4 * (x ** 2)
y.backward()
print(x.grad)
The result:
[[ 8. 16.]
[ 24. 32.]]
<NDArray 2x2 @cpu(0)>
API
1. x.attach_grad() allocates a buffer to store the gradient of x.
The grad_req parameter controls how the gradient is updated: 'write', 'add', or 'null'.

def attach_grad(self, grad_req='write', stype=None):
    """Attach a gradient buffer to this NDArray, so that `backward`
    can compute gradient with respect to it.

    Parameters
    ----------
    grad_req : {'write', 'add', 'null'}
        How gradient will be accumulated.
        - 'write': gradient will be overwritten on every backward.
        - 'add': gradient will be added to existing value on every backward.
        - 'null': do not compute gradient for this NDArray.
    stype : str, optional
        The storage type of the gradient array. Defaults to the same stype of
        this NDArray.
    """
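As a rough illustration of grad_req (my own sketch, not from the original post), the following shows how grad_req='add' accumulates gradients across backward calls, whereas the default 'write' overwrites them:

import mxnet as mx
from mxnet import autograd as ag

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad(grad_req='add')   # accumulate instead of overwrite

for _ in range(2):
    with ag.record():
        y = (x ** 2).sum()
    y.backward()

print(x.grad)   # two backward passes added together: 2 * 2x = [4. 8. 12.]

With 'add' you are responsible for clearing the buffer yourself (e.g. x.grad[:] = 0) once you no longer want to accumulate.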
2. ag.record()
To reduce computation and memory overhead, MXNet by default does not record the computations needed for computing gradients. We have to call record() to ask MXNet to record the computations that gradients will be taken with respect to (a small sketch follows the docstring below).
def record(train_mode=True):  # pylint: disable=redefined-outer-name
    """Returns an autograd recording scope context to be used in 'with' statement
    and captures code that needs gradients to be calculated.

    .. note:: When forwarding with train_mode=False, the corresponding backward
              should also use train_mode=False, otherwise gradient is undefined.

    Example::

        with autograd.record():
            y = model(x)
            backward([y])
        metric.update(...)
        optim.step(...)

    Parameters
    ----------
    train_mode: bool, default True
        Whether the forward pass is in training or predicting mode. This controls the behavior
        of some layers such as Dropout, BatchNorm.
    """
    return _RecordingStateScope(True, train_mode)
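As a quick sanity check (again my own sketch, not from the MXNet docs), only operations performed inside the record() scope contribute to the gradient:

import mxnet as mx
from mxnet import autograd as ag

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad()

y = x * 10            # outside record(): not tracked
with ag.record():
    z = x * x         # inside record(): tracked
z.backward()

print(x.grad)         # 2x = [2. 4. 6.], the gradient of z only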
3. backward() runs the backward pass (back-propagation).
backward(self, out_grad=None, retain_graph=False, train_mode=True)
out_grad is the head gradient. If the output is not the final scalar loss, i.e. a gradient is flowing in from a later layer, pass that upstream gradient in as out_grad.
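A small sketch of passing a head gradient (the values of head below are arbitrary, chosen only for illustration):

import mxnet as mx
from mxnet import autograd as ag

x = mx.nd.array([[1, 2], [3, 4]])
x.attach_grad()
with ag.record():
    y = 4 * (x ** 2)

head = mx.nd.array([[1, 1], [0.1, 0.1]])  # hypothetical upstream gradient dL/dy
y.backward(out_grad=head)

print(x.grad)   # dL/dx = head * 8x = [[8. 16.] [2.4 3.2]]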