一、Gradient: the normal direction to the contour lines of the loss
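A small numerical check can make this concrete. The sketch below assumes a hypothetical quadratic loss L(w, b) = w^2 + 4b^2, which is not from the notes. It compares how much the loss changes for a tiny step along the gradient with a tiny step perpendicular to it. If the gradient is normal to the contour line, the perpendicular step should leave the loss almost unchanged.

```python
# Hypothetical loss chosen only for illustration: L(w, b) = w^2 + 4 b^2.
def loss(w, b):
    return w ** 2 + 4.0 * b ** 2

def grad(w, b):
    # Analytic partial derivatives (dL/dw, dL/db).
    return (2.0 * w, 8.0 * b)

def perpendicular(v):
    # Rotate a 2-D vector by 90 degrees.
    return (-v[1], v[0])

w0, b0 = 1.0, 0.5
g = grad(w0, b0)
t = perpendicular(g)          # candidate contour (tangent) direction
dot = g[0] * t[0] + g[1] * t[1]

eps = 1e-4
# Loss change after a tiny step along each direction.
change_along_tangent = loss(w0 + eps * t[0], b0 + eps * t[1]) - loss(w0, b0)
change_along_gradient = loss(w0 + eps * g[0], b0 + eps * g[1]) - loss(w0, b0)

print("gradient . tangent =", dot)
print("loss change along tangent :", change_along_tangent)
print("loss change along gradient:", change_along_gradient)
```

The step perpendicular to the gradient changes the loss only at second order in eps, so to first order it stays on the same contour line. The step along the gradient changes the loss at first order.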
二、Adagrad
Divide the learning rate of each parameter by the root mean square of its previous derivatives
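The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes that the root mean square includes the current derivative, that the base learning rate `lr` is constant, and that a small `eps` guards against division by zero. The function names and the example loss w^2 + 100b^2 are hypothetical.

```python
import math

def adagrad_minimize(grad_fn, params, lr=0.1, steps=200, eps=1e-8):
    """Sketch of the per-parameter rule: step = lr / RMS(past derivatives) * g.

    Assumptions (not from the notes): the RMS includes the current
    derivative, lr is constant, and eps avoids division by zero.
    """
    params = list(params)
    sum_sq = [0.0] * len(params)      # running sum of squared derivatives
    for t in range(1, steps + 1):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            sum_sq[i] += g * g
            rms = math.sqrt(sum_sq[i] / t)   # root mean square so far
            params[i] -= lr / (rms + eps) * g
    return params

def example_grad(p):
    # Gradient of the hypothetical loss w^2 + 100 b^2:
    # the b direction is 100 times steeper than the w direction.
    w, b = p
    return [2.0 * w, 200.0 * b]

result = adagrad_minimize(example_grad, [1.0, 1.0])
print(result)
```

Because each parameter's step is divided by the scale of its own derivatives, w and b follow nearly the same path here even though their gradients differ by a factor of 100.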
三、Comparison between different parameters
四、Formal Derivation
五、Taylor Series
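Taylor series: Let h(x) be any function infinitely differentiable around x = x0. Then, near x0, h(x) = h(x0) + h'(x0)(x - x0) + h''(x0)/2! (x - x0)^2 + ... When x is close to x0, the higher-order terms are negligible, so h(x) ≈ h(x0) + h'(x0)(x - x0).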
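As a rough numerical illustration, the sketch below truncates the Taylor series of h(x) = sin(x) around x0 = pi/4 and compares it with the true value at a nearby point. The choice of function, the expansion point, and the helper name `taylor_sin` are assumptions made for this example.

```python
import math

def taylor_sin(x, x0, order):
    """Truncated Taylor series of sin around x0, up to the given order."""
    # Derivatives of sin at x0 repeat with period 4: sin, cos, -sin, -cos.
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    total = 0.0
    for k in range(order + 1):
        total += derivs[k % 4] / math.factorial(k) * (x - x0) ** k
    return total

x0 = math.pi / 4
x = x0 + 0.1                      # a point close to x0

err1 = abs(math.sin(x) - taylor_sin(x, x0, 1))   # first-order (linear) error
err3 = abs(math.sin(x) - taylor_sin(x, x0, 3))   # third-order error

print("first-order error:", err1)
print("third-order error:", err3)
```

Close to x0, the linear approximation h(x0) + h'(x0)(x - x0) is already accurate, and keeping more terms shrinks the error further.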