Exercise 3.29 Rewrite the four Bellman equations for the four value functions ($v_\pi$, $v_*$, $q_\pi$, and $q_*$) in terms of the three-argument function $p$ (3.4) and the two-argument function $r$ (3.5).
For $v_\pi$:
$$
\begin{aligned}
v_\pi(s) &= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_\pi(s') \bigr] \\
&= \sum_a \pi(a \mid s) \Bigl[ \sum_{s', r} r \, p(s', r \mid s, a) + \sum_{s', r} \gamma v_\pi(s') \, p(s', r \mid s, a) \Bigr] \\
&= \sum_a \pi(a \mid s) \Bigl[ \sum_{r} r \, p(r \mid s, a) + \sum_{s'} \gamma v_\pi(s') \, p(s' \mid s, a) \Bigr] \\
&= \sum_a \pi(a \mid s) \Bigl[ r(s, a) + \sum_{s'} \gamma v_\pi(s') \, p(s' \mid s, a) \Bigr]
\end{aligned}
$$
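The rewriting above is pure algebra, so it can be sanity-checked numerically. The following is a minimal sketch, assuming a small randomly generated finite MDP; the helper names (`random_mdp`, `p3`, `r2`, `backup_four_arg`, `backup_three_arg`) are hypothetical and introduced only for illustration. Because the identity between the first and last lines holds for any vector of state values, not only for the true $v_\pi$, the check uses an arbitrary `v`.

```python
import random


def random_mdp(n_states, n_actions, rewards, seed=0):
    """Return (p, pi): random four-argument dynamics and a random policy.

    p[s][a] maps each outcome (s2, r) to the probability p(s2, r | s, a).
    pi[s][a] is the probability pi(a | s).
    """
    rng = random.Random(seed)
    p, pi = {}, {}
    for s in range(n_states):
        weights = [rng.random() for _ in range(n_actions)]
        pi[s] = [w / sum(weights) for w in weights]
        p[s] = {}
        for a in range(n_actions):
            outcomes = [(s2, r) for s2 in range(n_states) for r in rewards]
            weights = [rng.random() for _ in outcomes]
            total = sum(weights)
            p[s][a] = {o: w / total for o, w in zip(outcomes, weights)}
    return p, pi


def p3(p, s2, s, a):
    """Three-argument p(s2 | s, a): marginalize the reward out, as in (3.4)."""
    return sum(prob for (t, _r), prob in p[s][a].items() if t == s2)


def r2(p, s, a):
    """Two-argument r(s, a): expected immediate reward, as in (3.5)."""
    return sum(r * prob for (_t, r), prob in p[s][a].items())


def backup_four_arg(p, pi, v, s, gamma):
    """Right-hand side of the first line of the derivation."""
    return sum(
        pi[s][a]
        * sum(prob * (r + gamma * v[s2]) for (s2, r), prob in p[s][a].items())
        for a in range(len(pi[s]))
    )


def backup_three_arg(p, pi, v, s, gamma):
    """Right-hand side of the last line of the derivation."""
    return sum(
        pi[s][a]
        * (r2(p, s, a) + gamma * sum(p3(p, s2, s, a) * v[s2] for s2 in range(len(v))))
        for a in range(len(pi[s]))
    )


if __name__ == "__main__":
    p, pi = random_mdp(n_states=3, n_actions=2, rewards=[-1.0, 0.0, 2.0], seed=1)
    v = [0.5, -1.2, 3.0]  # arbitrary state values; the identity holds for any v
    for s in range(3):
        lhs = backup_four_arg(p, pi, v, s, gamma=0.9)
        rhs = backup_three_arg(p, pi, v, s, gamma=0.9)
        print(s, abs(lhs - rhs) < 1e-9)
```

The two backups should agree up to floating-point rounding for every state, which is what the derivation claims.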