Reinforcement Learning: An Introduction, Reading Notes, Chapter 6: Temporal-Difference Learning
Chapter 6 Temporal-Difference Learning

TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.

6.1 TD Prediction
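To make the "combination" concrete, here is a minimal sketch of one tabular TD(0) prediction step, assuming the standard update V(S) <- V(S) + alpha * [R + gamma * V(S') - V(S)]. Like Monte Carlo, it learns from a sampled transition without a model of the environment; like DP, it bootstraps from the current estimate of the next state. The function name `td0_update`, the dictionary `values`, and the two-state example are hypothetical and chosen only for illustration.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error.

    All names here are illustrative assumptions, not from the notes.
    """
    # DP-like part: bootstrap from the current estimate of the next state.
    # A terminal next state is defined to have value 0.
    next_value = 0.0 if terminal else values[next_state]
    # MC-like part: the reward and next state come from one sampled step.
    td_error = reward + gamma * next_value - values[state]
    # Move the estimate a fraction alpha toward the bootstrapped target.
    values[state] += alpha * td_error
    return td_error


# One sampled transition A -> B with reward 1.
values = {"A": 0.0, "B": 0.5}
delta = td0_update(values, "A", reward=1.0, next_state="B")
```

In this example the target is 1.0 + 1.0 * 0.5 = 1.5, so the TD error is 1.5 and the estimate for A moves one tenth of the way toward it, while the estimate for B is left unchanged.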