Previous Blog
Before moving on to new material, let's review the key concepts and formulas from the previous two posts, [RL] 3 Finite Markov Decision Processes (1) and [RL] 3 Finite Markov Decision Processes (2) (a small code sketch of these equations follows the list):
- state: $S_t = s \in \mathcal{S}$
- action: $A_t = a \in \mathcal{A}(S_t)$
- reward: $R_t = r \in \mathcal{R} \subset \mathbb{R}$
- policy: $\pi_t(a \mid s) = \Pr(A_t = a \mid S_t = s)$
- return: $G_t \doteq \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}$
- Markov property: $p(s', r \mid s, a) \doteq \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}$ (1.1)
- expected reward for state-action: $r(s, a) \doteq \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a] = \sum_{r \in \mathcal{R}} r \sum_{s' \in \mathcal{S}} p(s', r \mid s, a)$ (1.2)
- state-transition probability: $p(s' \mid s, a) \doteq \Pr\{S_{t+1} = s' \mid S_t = s, A_t = a\} = \sum_{r \in \mathcal{R}} p(s', r \mid s, a)$ (1.3)
- expected reward for state-action-next-state: $r(s, a, s') \doteq \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] = \dfrac{\sum_{r \in \mathcal{R}} r \, p(s', r \mid s, a)}{p(s' \mid s, a)}$ (1.4)
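To make these definitions concrete, here is a minimal Python sketch that encodes the four-argument dynamics $p(s', r \mid s, a)$ of equation (1.1) as a lookup table for an invented two-state MDP, then derives $r(s,a)$, $p(s' \mid s, a)$, $r(s,a,s')$, and the discounted return $G_t$ exactly as in the formulas above. The states, actions, rewards, and probabilities are hypothetical, chosen only for illustration; they do not come from the original posts.

```python
# Hypothetical two-state MDP, used only to illustrate equations (1.1)-(1.4).
# p(s', r | s, a): for each (state, action), a list of (next_state, reward, probability).
p = {
    ("s0", "a0"): [("s0", 0.0, 0.7), ("s1", 1.0, 0.3)],
    ("s0", "a1"): [("s1", 2.0, 1.0)],
    ("s1", "a0"): [("s0", 0.0, 0.4), ("s1", 5.0, 0.6)],
}

def expected_reward(s, a):
    """r(s, a) = sum over s', r of r * p(s', r | s, a)  -- equation (1.2)."""
    return sum(prob * r for (_, r, prob) in p[(s, a)])

def transition_prob(s, a, s_next):
    """p(s' | s, a) = sum over r of p(s', r | s, a)  -- equation (1.3)."""
    return sum(prob for (sp, _, prob) in p[(s, a)] if sp == s_next)

def expected_reward_given_next(s, a, s_next):
    """r(s, a, s') = sum_r r * p(s', r | s, a) / p(s' | s, a)  -- equation (1.4)."""
    numer = sum(prob * r for (sp, r, prob) in p[(s, a)] if sp == s_next)
    return numer / transition_prob(s, a, s_next)

def discounted_return(rewards, gamma=0.9):
    """G_t = sum_k gamma^k * R_{t+k+1} over a finite reward sequence (the return)."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

print(expected_reward("s0", "a0"))                   # 0.7*0.0 + 0.3*1.0 = 0.3
print(transition_prob("s1", "a0", "s1"))             # 0.6
print(expected_reward_given_next("s1", "a0", "s1"))  # (0.6*5.0) / 0.6 = 5.0
print(discounted_return([1.0, 2.0, 5.0]))            # 1.0 + 0.9*2.0 + 0.81*5.0 = 6.85
```

Note how (1.2)-(1.4) are all marginals or conditionals of the single four-argument distribution in (1.1): once the table for $p(s', r \mid s, a)$ is given, everything else is a sum over it.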