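momentum is a parameter of the SGD (stochastic gradient descent) optimizer in the mxnet package. It is used when the optimizer updates the weights: it sets how much of the previous update (state) carries over into the current one. Specifically: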
rescaled_grad = lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
state = momentum * state + rescaled_grad
weight = weight - state
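To see the effect of momentum, here is a minimal sketch in plain Python that transcribes the three lines above literally for a single scalar weight. It is not MXNet's actual implementation: the function name sgd_momentum_step, the default values, and the reading of clip(grad, clip_gradient) as clamping to [-clip_gradient, clip_gradient] are assumptions made for illustration.

```python
def sgd_momentum_step(weight, grad, state, lr=0.1, momentum=0.9,
                      wd=0.0, rescale_grad=1.0, clip_gradient=None):
    """One update of a scalar weight, following the three lines above.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # clip(grad, clip_gradient): assumed to clamp grad to [-c, c]
    if clip_gradient is not None:
        grad = max(-clip_gradient, min(clip_gradient, grad))
    rescaled_grad = lr * rescale_grad * grad + wd * weight
    # momentum keeps a fraction of the previous update in state
    state = momentum * state + rescaled_grad
    weight = weight - state
    return weight, state


# Two steps with a constant gradient of 1.0, starting from state = 0
w, s = 1.0, 0.0
for _ in range(2):
    w, s = sgd_momentum_step(w, 1.0, s)
```

With momentum = 0.9 the second step is larger than the first, because state still holds 90% of the previous update. With momentum = 0 every step would simply be lr * grad.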
Reference: https://mxnet.incubator.apache.org/api/python/optimization/optimization.html#mxnet.optimizer.SGD