- Discounting to compensate for the rate of interest.
- Discounting to express uncertainty about the future.
- By discounting future rewards, one makes their infinite sum finite.
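The last point can be checked numerically. The sketch below is only an illustration: the function names `discounted_return` and `discounted_limit` are made up for these notes, and it assumes a constant reward at every step. With discount factor gamma < 1 the sum converges to reward / (1 - gamma), which stays finite but grows quickly as gamma approaches 1.

```python
def discounted_return(reward, gamma, steps):
    """Sum of gamma**t * reward for t = 0 .. steps-1 (constant reward assumed)."""
    total = 0.0
    discount = 1.0
    for _ in range(steps):
        total += discount * reward
        discount *= gamma
    return total


def discounted_limit(reward, gamma):
    """Closed-form limit of the infinite discounted sum; requires 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1) for the sum to be finite")
    return reward / (1.0 - gamma)


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99, 0.999):
        # The limit grows like 1 / (1 - gamma) as gamma approaches 1.
        print(gamma, discounted_limit(1.0, gamma))
```

For a reward of 1 per step, the limit is 2 at gamma = 0.5 and roughly 1000 at gamma = 0.999, so the values a discounted learner has to estimate become large when gamma is near 1.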
Advantages of R-learning
1. Better initial estimates
In discounted methods, when the discount factor is close to 1, the accumulated action values become very large and take a long time to converge.
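2. Faster propagation of rewards
In Q-learning, the effect of a reward propagates slowly and only locally. R-learning instead maintains a single shared average-reward estimate that every state can use.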
3. Value disambiguation
4. Linearity of undiscounted values
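To make the second advantage concrete, here is a minimal sketch of one tabular R-learning update. It follows the commonly cited form of Schwartz's update rule as I understand it, not code from these notes. The function name `r_learning_step`, the table layout (a dict mapping each state to a list of action values), and the step sizes `beta` and `alpha` are all assumptions for illustration.

```python
def r_learning_step(values, rho, s, a, reward, s_next, beta=0.1, alpha=0.01):
    """Apply one R-learning update in place on `values`; return the new rho.

    values: dict mapping state -> list of relative action values (assumed layout)
    rho:    current estimate of the average reward per step
    """
    # Check greediness before the table changes.
    was_greedy = values[s][a] == max(values[s])
    best_next = max(values[s_next])

    # Relative value update: the reward is measured against the average rho,
    # so no discount factor appears.
    values[s][a] += beta * (reward - rho + best_next - values[s][a])

    if was_greedy:
        # rho is one scalar shared by all states, so a reward seen in one
        # state immediately shifts the baseline used everywhere else.
        rho += alpha * (reward + best_next - max(values[s]) - rho)
    return rho


if __name__ == "__main__":
    table = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    rho = r_learning_step(table, 0.0, s=0, a=1, reward=1.0, s_next=1,
                          beta=0.5, alpha=0.1)
    print(table, rho)
```

Because `rho` is updated only after greedy actions, exploratory moves do not distort the average-reward estimate; whether to read `best_next` before or after the value update is a design choice this sketch fixes arbitrarily.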