Lecture 1: Basic Concepts
State
Action
State transition
State transition probability
Policy
Reward
Trajectory
Discounted return
Episode / trial
Terminal states
Terminal states and continuing tasks
Markov decision process (MDP)
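The concepts above can be tied together in one small sketch: a hypothetical two-state MDP with assumed transition probabilities, rewards, and a deterministic policy (none of these numbers come from the lecture; they are illustrative only). Sampling a trajectory and computing its discounted return shows how the pieces fit.

```python
import random

# Illustrative MDP (all states, probabilities, and rewards are assumptions).
states = ["s0", "s1", "terminal"]
actions = ["left", "right"]

# State transition probabilities: P[s][a] -> list of (next_state, prob)
P = {
    "s0": {"left": [("s0", 0.9), ("s1", 0.1)],
           "right": [("s1", 1.0)]},
    "s1": {"left": [("s0", 1.0)],
           "right": [("terminal", 1.0)]},
}

# Reward for taking action a in state s (assumed values)
R = {("s0", "left"): 0.0, ("s0", "right"): 0.0,
     ("s1", "left"): -1.0, ("s1", "right"): 1.0}

# A deterministic policy: maps each state to an action
policy = {"s0": "right", "s1": "right"}

def sample_trajectory(start="s0"):
    """Follow the policy until a terminal state; return (s, a, r) triples."""
    s, traj = start, []
    while s != "terminal":
        a = policy[s]
        next_states, probs = zip(*P[s][a])
        traj.append((s, a, R[(s, a)]))
        s = random.choices(next_states, weights=probs)[0]
    return traj

def discounted_return(traj, gamma=0.9):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    return sum(gamma ** t * r for t, (_, _, r) in enumerate(traj))

episode = sample_trajectory()   # one episode / trial ends at the terminal state
G = discounted_return(episode)  # here: 0 + 0.9 * 1 = 0.9
```

With this particular policy ("right" in both states) the episode is deterministic: s0 → s1 with reward 0, then s1 → terminal with reward 1, giving a discounted return of 0.9. A task with terminal states is episodic; a continuing task never terminates, which is why the discount factor gamma < 1 is needed to keep the return finite.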
Reference:
https://www.bilibili.com/video/BV1sd4y167NS?p=5&spm_id_from=pageDriver&vd_source=cf05b9584721d9d032da431e95a8dbdb