1.
An overview of gradient descent optimization algorithms
http://sebastianruder.com/optimizing-gradient-descent/
2.
3.
4.
Faster than Momentum: Revealing the True Nature of Nesterov Accelerated Gradient (比Momentum更快:揭开Nesterov Accelerated Gradient的真面目)
https://zhuanlan.zhihu.com/p/22810533
5.
The Right Way to Implement Nesterov Momentum in Theano (用Theano实现Nesterov momentum的正确姿势)
https://zhuanlan.zhihu.com/p/20190387