http://ruder.io/optimizing-gradient-descent/index.html#momentum

Reposted from: https://www.cnblogs.com/fuhang/p/8927240.html