Reinforcement Learning Exercise 3.29

This exercise reformulates the Bellman equations for the four value functions in reinforcement learning ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the state-transition probability function $p$ and the expected-reward function $r$. The derivation is shown step by step, illustrating how each equation can be expressed using these two functions.

Exercise 3.29 Rewrite the four Bellman equations for the four value functions ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the three-argument function $p$ (3.4) and the two-argument function $r$ (3.5).

For $v_\pi$:

$$
\begin{aligned}
v_\pi(s) &= \sum_a \pi(a|s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_\pi(s') \bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{s', r} r\, p(s', r \mid s, a) + \gamma \sum_{s', r} v_\pi(s')\, p(s', r \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ \sum_{r} r\, p(r \mid s, a) + \gamma \sum_{s'} v_\pi(s')\, p(s' \mid s, a) \Bigr] \\
&= \sum_a \pi(a|s) \Bigl[ r(s,a) + \gamma \sum_{s'} v_\pi(s')\, p(s' \mid s, a) \Bigr]
\end{aligned}
$$
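The remaining three functions follow from the same marginalization of $p(s', r \mid s, a)$ into $r(s,a)$ and $p(s' \mid s, a)$; a sketch of the resulting forms:

$$
\begin{aligned}
q_\pi(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \sum_{a'} \pi(a'|s')\, q_\pi(s',a') \\
v_*(s) &= \max_a \Bigl[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, v_*(s') \Bigr] \\
q_*(s,a) &= r(s,a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} q_*(s',a')
\end{aligned}
$$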
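The key identity in the derivation above is that the four-argument form $\sum_{s',r} p(s',r|s,a)[r + \gamma v(s')]$ equals the two-argument form $r(s,a) + \gamma \sum_{s'} p(s'|s,a)\,v(s')$. A minimal numerical check on a hypothetical toy MDP (random transitions, deterministic reward per $(s,a,s')$ triple, arbitrary policy and value estimate; all names here are illustrative):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with rewards determined by (s, a, s'),
# so the joint p(s', r | s, a) is just p(s' | s, a) with reward R[s, a, s'].
rng = np.random.default_rng(0)
nS, nA, gamma = 2, 2, 0.9
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)          # p(s' | s, a), rows sum to 1
R = rng.random((nS, nA, nS))               # reward for transition (s, a, s')
pi = rng.random((nS, nA))
pi /= pi.sum(axis=1, keepdims=True)        # policy π(a | s)
v = rng.random(nS)                         # arbitrary value estimate to plug in

# Four-argument form: Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ v(s')]
lhs = np.einsum('sa,saz,saz->s', pi, P, R + gamma * v[None, None, :])

# Two-argument form: Σ_a π(a|s) [ r(s,a) + γ Σ_{s'} p(s'|s,a) v(s') ]
r_sa = np.einsum('saz,saz->sa', P, R)      # r(s,a) = E[R | s, a]
rhs = np.einsum('sa,sa->s', pi, r_sa + gamma * P @ v)

assert np.allclose(lhs, rhs)
```

Because the rewrite is pure marginalization, the two forms agree for any value estimate $v$, not only at the fixed point $v_\pi$.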
