http://sebastianruder.com/optimizing-gradient-descent/
Reposted from: https://www.cnblogs.com/zjpeng1234/p/5954228.html