http://blog.csdn.net/luo123n/article/details/48239963
http://sebastianruder.com/optimizing-gradient-descent/index.html#gradientdescentvariants