Reinforcement Learning: The Discount Factor
This post covers a parameter that I found to be highly influential: the discount factor. It discusses time-based penalization as a way to achieve better performance, with the discount factor modified accordingly.
I assume that if you have landed on this post, you are already familiar with RL terminology. If that is not the case, I highly recommend reading these blogs first, as they provide great background: Intro1 and Intro2.
What is the role of the discount factor in RL?
The discount factor, 𝛾, is a real value in [0, 1] that governs how much the agent cares about the rewards it achieves in the past, present, and future. In other words, it relates the rewa