Chapter 5 Monte Carlo Methods
Monte Carlo methods require only experience: sample sequences of states, actions, and rewards from online or simulated interaction with an environment.
- episode-by-episode (estimates are updated once per episode)
- learn directly from raw experience without a model of the environment’s dynamics
5.1 Monte Carlo Policy Evaluation
- The value of a state is the expected return starting from that state. An obvious way to estimate it from experience, then, is simply to average the returns observed after visits to that state. As more returns are observed, the average should converge to the expected value.
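
The averaging idea can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the chapter: the function name `mc_policy_evaluation`, the episode format (a list of `(state, reward)` pairs, where `reward` is the reward received after leaving that state), the discount `gamma`, and the `first_visit` switch are all assumptions made for the example.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0, first_visit=True):
    """Estimate state values by averaging sampled returns.

    episodes: iterable of episodes; each episode is a list of
    (state, reward) pairs (a hypothetical format for this sketch).
    """
    return_sum = defaultdict(float)   # sum of returns seen per state
    return_count = defaultdict(int)   # number of returns seen per state

    for episode in episodes:
        # Compute the return following each time step, working backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (state, _) in enumerate(episode):
            # With first_visit=True, only the first occurrence of a state
            # in an episode contributes a return to the average.
            if first_visit and state in seen:
                continue
            seen.add(state)
            return_sum[state] += returns[t]
            return_count[state] += 1

    # The value estimate is the plain average of the collected returns.
    return {s: return_sum[s] / return_count[s] for s in return_sum}
```

For example, with the two episodes `[("A", 1.0), ("B", 2.0)]` and `[("A", 1.0), ("A", 3.0)]` and `gamma=1.0`, the first-visit returns for `A` are 3.0 and 4.0, so its estimate is 3.5, while `B` gets 2.0. The estimates are only updated after a full episode is available, which matches the episode-by-episode character noted above.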
- Each occurrence of state s in