Adam Optimization Algorithm
During the history of deep learning, many researchers, including some very well-known ones, proposed optimization algorithms and showed that they worked well on a few problems, but those algorithms were subsequently shown not to generalize that well to the wide range of neural networks you might want to train. Over time, I think the deep learning community developed some amount of skepticism about new optimization algorithms; a lot of people felt that gradient descent with momentum really works well and that it was difficult to propose something that works much better. RMSprop and the Adam optimization algorithm, which we'll talk about in this video, are among the rare algorithms that have really stood up and have been shown to work well across a wide range of deep learning architectures. Adam is one of the algorithms I wouldn't hesitate to recommend you try, because many people have tried it and seen it work well on many problems. The Adam optimization algorithm basically takes momentum and RMSprop and puts them together.
Let's see how that works. To implement Adam, you initialize V_dw = 0, S_dw = 0, and similarly V_db = 0 and S_db = 0. Then, on iteration t,
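The transcript continues from this point with the update equations for iteration t. As a preview of where that leads, here is a minimal NumPy sketch of one Adam step in the standard formulation; the parameter names and the default values for learning_rate, beta1, beta2, and epsilon are the conventional ones and are assumptions here, not something stated in the transcript so far.

```python
import numpy as np

def adam_update(w, b, dw, db, v_dw, s_dw, v_db, s_db, t,
                learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for parameters w, b given gradients dw, db.

    v_* are the momentum-style (first moment) averages, s_* are the
    RMSprop-style (second moment) averages, and t is the 1-based
    iteration count used for bias correction. Hyperparameter defaults
    are the commonly used values, assumed here for illustration.
    """
    # Momentum-style exponentially weighted average of the gradients
    v_dw = beta1 * v_dw + (1 - beta1) * dw
    v_db = beta1 * v_db + (1 - beta1) * db

    # RMSprop-style exponentially weighted average of the squared gradients
    s_dw = beta2 * s_dw + (1 - beta2) * np.square(dw)
    s_db = beta2 * s_db + (1 - beta2) * np.square(db)

    # Bias correction, compensating for the zero initialization early on
    v_dw_corr = v_dw / (1 - beta1 ** t)
    v_db_corr = v_db / (1 - beta1 ** t)
    s_dw_corr = s_dw / (1 - beta2 ** t)
    s_db_corr = s_db / (1 - beta2 ** t)

    # Combined update: momentum numerator, RMSprop-style denominator
    w = w - learning_rate * v_dw_corr / (np.sqrt(s_dw_corr) + epsilon)
    b = b - learning_rate * v_db_corr / (np.sqrt(s_db_corr) + epsilon)

    return w, b, v_dw, s_dw, v_db, s_db
```

In practice you would initialize v_dw, s_dw, v_db, s_db to zeros of the same shape as dw and db, call this once per mini-batch with t incremented each time, and typically tune only the learning rate while leaving beta1, beta2, and epsilon at their defaults.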