http://sebastianruder.com/optimizing-gradient-descent/
Gradient descent variants
  Batch gradient descent
  Stochastic gradient descent
  Mini-batch gradient descent
Challenges
Gradient descent optimization algorithms
  Momentum
  Nesterov accelerated gradient
  Adagrad
  Adadelta
  RMSprop
  Adam
  Visualization of algorithms
  Which optimizer to choose?
Parallelizing and distributing SGD
  Hogwild!
  Downpour SGD
  Delay-tolerant Algorithms for SGD
  TensorFlow
  Elastic Averaging SGD
Additional strategies for optimizing SGD
  Shuffling and Curriculum Learning
  Batch normalization
  Early Stopping
  Gradient noise