- Blog posts (4)
Original · Reinforcement Learning: An Introduction reading notes, Chapter 6: Temporal-Difference Learning
Contents: Chapter 6 Temporal-Difference Learning; 6.1 TD Prediction. TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.
2020-07-19 20:29:37 · 190 views
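The excerpt above says TD learning combines Monte Carlo and DP ideas. The sketch below shows tabular TD(0) prediction, which illustrates both halves. Like Monte Carlo, it learns from sampled transitions rather than a model. Like DP, it bootstraps by updating V(s) toward r + γV(s′), which uses the current estimate of the next state.

The environment is an assumption chosen for illustration: a small random walk with a uniform left/right policy, where reward +1 is given only on reaching the right end. The function name and parameters are hypothetical.

```python
import random


def td0_random_walk(num_episodes=1000, alpha=0.1, gamma=1.0, n_states=5, seed=0):
    """Tabular TD(0) prediction of V under a uniform random policy.

    Hypothetical setup (illustration only): non-terminal states 1..n_states,
    terminal states 0 and n_states + 1, reward +1 only when stepping into
    the right terminal, 0 otherwise.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)  # terminal entries are never updated, so they stay 0
    for _ in range(num_episodes):
        s = (n_states + 1) // 2  # start each episode in the middle state
        while 0 < s < n_states + 1:  # run until a terminal state is reached
            s_next = s + rng.choice((-1, 1))  # sampled transition (Monte Carlo idea)
            r = 1.0 if s_next == n_states + 1 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped target
            # r + gamma * V(s') (the DP idea), scaled by step size alpha.
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:n_states + 1]  # values of the non-terminal states only
```

For this walk with γ = 1, the true value of state k is the probability of exiting on the right, k/(n_states + 1). With a small constant alpha, the estimates should settle near 1/6, 2/6, ..., 5/6, while still fluctuating slightly because the step size never decays.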
Original · Reinforcement Learning: An Introduction reading notes, Chapter 4: Dynamic Programming
Contents: 4.1 Policy Evaluation; 4.2 Policy Improvement; 4.3 Policy Iteration; 4.4 Value Iteration; 4.5 Asynchronous Dynamic Programming; 4.6 Generalized Policy Iteration
2020-07-19 19:26:36 · 239 views
Original · Reinforcement Learning: An Introduction reading notes, Chapter 5: Monte Carlo Methods
Contents: 5.1 Monte Carlo Policy Evaluation; 5.2 Monte Carlo Estimation of Action Values; 5.3 Monte Carlo Control; 5.4 On-Policy Monte Carlo Control; 5.6 Off-Policy Monte Carlo Control. Monte Carlo methods…
2020-07-10 15:52:05 · 238 views
Original · Reinforcement Learning: An Introduction reading notes, Chapter 3
Contents: Chapter 3 The Reinforcement Learning Problem; 3.2 Goals and Rewards (the goal of an RL agent; reward); 3.3 Returns. The goal of an RL agent: to maximize not immediate reward, b…
2020-07-07 20:49:27 · 183 views
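The Chapter 3 excerpt makes the point that the agent does not maximize immediate reward. In the book, the objective is the return: the (discounted) sum of future rewards, written here as G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ….

The sketch below computes the return for every time step of a finished episode. It uses the recursive form G_t = r_{t+1} + γ G_{t+1}. The function name and the convention that `rewards[k]` holds r_{k+1} are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma):
    """Compute [G_0, G_1, ..., G_{T-1}] for one finished episode.

    Assumed convention (illustration only): rewards[k] is the reward
    received after step k, i.e. r_{k+1}. Uses G_t = r_{t+1} + gamma * G_{t+1},
    with G_T = 0 after the final step.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # return after the last step is zero
    for t in reversed(range(len(rewards))):  # sweep backward through the episode
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

For example, `discounted_returns([1, 0, 2], 0.5)` gives `[1.5, 1.0, 2.0]`. A large reward that arrives later still raises the return of earlier steps, discounted by γ for each step of delay.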