Reinforcement Learning Exercise 5.10

Exercise 5.10: Derive the weighted-average update rule (5.8) from (5.7). Follow the pattern of the derivation of the unweighted rule (2.3).

Start from the definition:
$$
V_{n} \doteq \frac{\sum_{k=1}^{n-1} W_k G_k}{\sum_{k=1}^{n-1} W_k}, \qquad n \geq 2 \qquad \text{(5.7)}
$$
and denote by $C_n$ the cumulative sum of the weights given to the first $n$ returns, $C_n \doteq \sum_{k=1}^{n} W_k$, with $C_0 = 0$. Formula (5.7) then becomes:
$$
V_{n} \doteq \frac{\sum_{k=1}^{n-1} W_k G_k}{C_{n-1}}, \qquad n \geq 2
$$
Shifting the index by one gives:
$$
V_{n+1} \doteq \frac{\sum_{k=1}^{n} W_k G_k}{C_n}, \qquad n \geq 1
$$
$$
\begin{aligned}
\therefore V_{n+1} &= \frac{\sum_{k=1}^{n-1} W_k G_k}{C_n} + \frac{W_n G_n}{C_n} \\
&= \frac{C_{n-1}}{C_n} V_n + \frac{W_n G_n}{C_n} \\
&= \left(1 - \frac{W_n}{C_n}\right) V_n + \frac{W_n G_n}{C_n} \\
&= V_n + \frac{W_n}{C_n}\left(G_n - V_n\right), \qquad n \geq 1, \qquad \text{(5.8)}
\end{aligned}
$$
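The second equality uses $\sum_{k=1}^{n-1} W_k G_k = C_{n-1} V_n$, which is (5.7) rearranged, and the third uses $C_{n-1} = C_n - W_n$. For $n = 1$ the sum over the first $n - 1$ returns is empty and $C_1 = W_1$, so $V_2 = G_1$; rule (5.8) gives the same value for any choice of $V_1$, because $W_1 / C_1 = 1$.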
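As a quick numerical check of the result, the sketch below compares the direct ratio (5.7) with the incremental rule (5.8) on random data. The function names, the random weights and returns, and the starting value `v_init` are illustrative assumptions, not part of the exercise; the weights are drawn strictly positive so that $C_n > 0$.

```python
import random


def batch_weighted_average(weights, returns):
    """Direct evaluation of (5.7): weighted returns divided by total weight."""
    return sum(w * g for w, g in zip(weights, returns)) / sum(weights)


def incremental_weighted_average(weights, returns, v_init=0.0):
    """Apply update (5.8) one return at a time, keeping C_n as a running sum."""
    v = v_init  # V_1 is arbitrary; it is overwritten by the first update
    c = 0.0     # C_0 = 0
    for w, g in zip(weights, returns):
        c += w                  # C_n = C_{n-1} + W_n
        v += (w / c) * (g - v)  # V_{n+1} = V_n + (W_n / C_n)(G_n - V_n)
    return v


# Illustrative data (assumed): positive weights, Gaussian returns.
random.seed(0)
weights = [random.uniform(0.1, 2.0) for _ in range(50)]
returns = [random.gauss(0.0, 1.0) for _ in range(50)]

batch = batch_weighted_average(weights, returns)
incremental = incremental_weighted_average(weights, returns, v_init=5.0)
print(abs(batch - incremental) < 1e-9)
```

The two values should agree up to floating-point rounding, whatever `v_init` is, which matches the observation that $V_1$ drops out after the first update.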
