References:
http://ruder.io/optimizing-gradient-descent/index.html#adadelta
https://blog.csdn.net/u010089444/article/details/76725843