Model-Free Prediction
Monte-Carlo Reinforcement Learning
- MC methods learn directly from episodes of experience: state values are estimated from complete episodes
- MC is model-free: no knowledge of MDP transitions / rewards
- MC learns from complete episodes: no bootstrapping
- MC uses the simplest possible idea: value = mean return
- Caveat: can only apply MC to episodic MDPs
- All episodes must terminate, otherwise no return (and hence no mean) can be computed
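
The idea "value = mean return" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the notes: the function name `mc_prediction`, the episode format (a list of `(state, reward)` pairs, where `reward` is received after leaving `state`), the discount factor `gamma`, and the choice of the first-visit variant (only the first occurrence of a state in each episode contributes a return).

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0):
    """First-visit Monte-Carlo estimate of state values (illustrative sketch).

    episodes: iterable of complete (terminated) episodes, each a list of
              (state, reward) pairs -- a hypothetical format chosen here.
    gamma:    discount factor (assumed; the notes do not specify one).
    """
    returns_sum = defaultdict(float)   # total first-visit return per state
    returns_count = defaultdict(int)   # number of episodes visiting the state

    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so the return accumulates as g = r + gamma * g.
        # Earlier visits overwrite later ones, so after the loop each state
        # holds the return that follows its FIRST visit in the episode.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g

        for state, ret in first_visit_return.items():
            returns_sum[state] += ret
            returns_count[state] += 1

    # Value estimate = mean of the observed returns.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    # Two toy episodes over states "A" and "B" (made-up data).
    toy_episodes = [
        [("A", 1.0), ("B", 2.0)],
        [("A", 0.0), ("B", 4.0)],
    ]
    print(mc_prediction(toy_episodes))
```

Note that the return for a state is only available once the episode has ended, which is why this approach needs terminating episodes and uses no bootstrapping. No transition or reward model appears anywhere in the sketch; only sampled experience is used.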