https://blog.csdn.net/yinyu19950811/article/details/90476956 A summary of optimization methods and the problems with Adam (SGD, Momentum, AdaDelta, Adam, AdamW, LazyAdam)
https://www.cnblogs.com/maybe2030/p/9220921.html [Deep Learning] Commonly used activation functions & optimizers
Comparison animations of SGD, Momentum, AdaDelta, Adam, Adagrad, NAG, RMSprop:
https://raw.githubusercontent.com/cs231n/cs231n.github.io/master/assets/nn3/opt2.gif
https://raw.githubusercontent.com/cs231n/cs231n.github.io/master/assets/nn3/opt1.gif
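The gifs above animate how these optimizers traverse an ill-conditioned loss surface. As a rough companion, here is a minimal NumPy sketch of two of the update rules they visualize (SGD with momentum, and Adam) on a toy quadratic; all function names and hyperparameters are illustrative, not taken from the linked posts:

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=200):
    """Classic momentum: v <- beta*v - lr*grad;  w <- w + v."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad_fn(w)
        w = w + v
    return w

def adam(grad_fn, w0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    """Adam: bias-corrected first/second moment estimates scale the step."""
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g          # first moment (mean of grads)
        v = b2 * v + (1 - b2) * g * g      # second moment (uncentered var)
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy ill-conditioned quadratic f(w) = 0.5 * w^T A w — the kind of
# elongated bowl the cs231n animations are drawn on. Minimum at w = 0.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w

w_mom = sgd_momentum(grad, [1.0, 1.0])
w_adam = adam(grad, [1.0, 1.0])
```

Momentum accelerates along the shallow axis instead of zig-zagging, while Adam rescales each coordinate by its gradient magnitude; both behaviors are visible in the gifs.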