Hung-yi Lee's Machine Learning 2020: Study Notes and Reflections (5): New Optimizers for Deep Learning
New Optimizers for Deep Learning

Some Notations

$\theta_{t}$: the model parameters at step $t$.
$\nabla L(\theta_{t})$ or $g_{t}$: the gradient at $\theta_{t}$, used to compute $\theta_{t+1}$.
$m_{t+1}$: the momentum accumulated from step 0 through step $t$, used to compute $\theta_{t+1}$.
...
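To show how these symbols fit together, here is a minimal sketch of one common form of SGD with momentum. The excerpt does not give the update rule, so it is an assumption here: $m_{t+1} = \beta m_{t} - \eta g_{t}$ and $\theta_{t+1} = \theta_{t} + m_{t+1}$. The names `sgdm_step`, `run_sgdm`, `lr` (for $\eta$), and `beta` are hypothetical and chosen only for illustration.

```python
def sgdm_step(theta, m, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum step (assumed form, not taken from the notes).

    theta : parameters at step t        (theta_t)
    m     : accumulated momentum so far (m_t)
    grad  : gradient at theta_t         (g_t)
    Returns (theta_{t+1}, m_{t+1}).
    """
    m_next = beta * m - lr * grad   # m_{t+1} accumulates all past gradients
    theta_next = theta + m_next     # theta_{t+1} is computed from m_{t+1}
    return theta_next, m_next


def run_sgdm(grad_fn, theta0, steps, lr=0.1, beta=0.9):
    """Run `steps` updates starting from theta0 with zero initial momentum."""
    theta, m = theta0, 0.0
    for _ in range(steps):
        g = grad_fn(theta)          # g_t = gradient of L at theta_t
        theta, m = sgdm_step(theta, m, g, lr, beta)
    return theta, m


if __name__ == "__main__":
    # Toy loss L(theta) = theta^2, so the gradient is 2 * theta.
    final_theta, final_m = run_sgdm(lambda th: 2.0 * th, theta0=1.0, steps=500)
    print(final_theta, final_m)
```

On the toy loss $L(\theta) = \theta^2$, the first step from $\theta_0 = 1$ gives $g_0 = 2$, $m_1 = -0.2$, and $\theta_1 = 0.8$; later steps reuse $m_t$, so each update carries a decayed sum of every earlier gradient.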