An overview of gradient descent optimization algorithms

Table of contents:

- Gradient descent variants
  - Batch gradient descent
  - Stochastic gradient descent
  - Mini-batch gradient descent
- Challenges
- Gradient descent optimization algorithms
  - Momentum
  - Nesterov accelerated gradient
  - Adagrad
  - Adadelta
  - RMSprop
  - Adam
  - Visualization of algorithms
  - Which optimizer to choose?
- Parallelizing and distributing SGD
  - Hogwild!
  - Downpour SGD
  - Delay-tolerant Algorithms for SGD
  - TensorFlow
  - Elastic Averaging SGD
- Additional strategies for optimizing SGD
  - Shuffling and Curriculum Learning
  - Batch normalization
  - Early Stopping
  - Gradient noise
- Conclusion
- References