Difference Between Rho and Decay Arguments in Keras RMSprop

https://stats.stackexchange.com/questions/351409/difference-between-rho-and-decay-arguments-in-keras-rmsprop

Short explanation

rho is the "Gradient moving average [also exponentially weighted average] decay factor" and decay is the "Learning rate decay over each update".

Long explanation

RMSProp is defined as follows

$$E[g^2]_t = \rho \, E[g^2]_{t-1} + (1-\rho)\, g_t^2$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

So RMSProp uses "rho" to compute an exponentially weighted average of the squared gradients.

Note that "rho" is a direct parameter of the RMSProp optimizer (it is used in the RMSProp formula).

Decay, on the other hand, handles learning rate decay. Learning rate decay is a mechanism generally applied independently of the chosen optimizer; Keras simply builds it into the RMSProp optimizer for convenience (as it does with other optimizers such as SGD and Adam, which all accept the same "decay" parameter). You may think of the "decay" parameter as "lr_decay".
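As a rough sketch of how the two arguments appear in practice (this assumes the standalone Keras 2.x API from the time of the question, where RMSprop still accepted a `decay` argument; the layer sizes are just illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# rho   -> decay factor of the running average of squared gradients
#          (used directly inside the RMSProp formula above).
# decay -> per-update learning rate decay; in standalone Keras the effective
#          rate is roughly lr / (1 + decay * iterations).
opt = RMSprop(lr=0.001, rho=0.9, epsilon=1e-7, decay=1e-4)

model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer=opt, loss="mse")
```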

It can be confusing at first that there are two decay parameters, but they are decaying different values.

  • "rho" is the decay factor or the exponentially weighted average over the square of the gradients.
  • "decay" decays the learning rate over time, so we can move even closer to the local minimum in the end of training.