Difference Between Rho and Decay Arguments in Keras RMSprop

https://stats.stackexchange.com/questions/351409/difference-between-rho-and-decay-arguments-in-keras-rmsprop

Short explanation

rho is the "Gradient moving average [also exponentially weighted average] decay factor" and decay is the "Learning rate decay over each update".

Long explanation

RMSProp is defined as follows

$$E[g^2]_t = \rho \, E[g^2]_{t-1} + (1-\rho)\, g_t^2$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

So RMSProp uses "rho" to compute an exponentially weighted average of the squared gradients.

Note that "rho" is a direct parameter of the RMSProp optimizer (it is used in the RMSProp formula).

Decay, on the other hand, handles learning rate decay. Learning rate decay is a mechanism generally applied independently of the chosen optimizer; Keras simply builds it into the RMSProp optimizer for convenience (as it does with other optimizers such as SGD and Adam, which all accept the same "decay" parameter). You may think of the "decay" parameter as "lr_decay".
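As a rough sketch of how the two arguments appear in practice (this assumes the standalone Keras 2.x API from the time of the question, where RMSprop still accepted a `decay` argument; the layer sizes are just illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# rho   -> decay factor of the running average of squared gradients
#          (used directly inside the RMSProp formula above).
# decay -> per-update learning rate decay; in standalone Keras the effective
#          rate is roughly lr / (1 + decay * iterations).
opt = RMSprop(lr=0.001, rho=0.9, epsilon=1e-7, decay=1e-4)

model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer=opt, loss="mse")
```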

It can be confusing at first that there are two decay parameters, but they are decaying different values.

  • "rho" is the decay factor or the exponentially weighted average over the square of the gradients.
  • "decay" decays the learning rate over time, so we can move even closer to the local minimum in the end of training.