TZX世界第一可爱 - CSDN Blog

Original post: Reinforcement Learning: An Introduction reading notes, Chapter 6: Temporal-Difference Learning

Chapter 6: Temporal-Difference Learning. Contents: 6.1 TD Prediction. TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.

2020-07-19 20:29:37 281

Original post: Reinforcement Learning: An Introduction reading notes, Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming. Contents: 4.1 Policy Evaluation; 4.2 Policy Improvement; 4.3 Policy Iteration; 4.4 Value Iteration; 4.5 Asynchronous Dynamic Programming; 4.6 Generalized Policy Iteration

2020-07-19 19:26:36 355

Original post: Reinforcement Learning: An Introduction reading notes, Chapter 5: Monte Carlo Methods

Chapter 5: Monte Carlo Methods. Contents: 5.1 Monte Carlo Policy Evaluation; 5.2 Monte Carlo Estimation of Action Values; 5.3 Monte Carlo Control; 5.4 On-Policy Monte Carlo Control; 5.6 Off-Policy Monte Carlo Control. Monte Carlo methods ...

2020-07-10 15:52:05 354

Original post: Reinforcement Learning: An Introduction reading notes, Chapter 3

Chapter 3: The Reinforcement Learning Problem. Contents: 3.2 Goals and Rewards (the agent's goal in RL; reward); 3.3 Returns. 3.2 Goals and Rewards: the agent's goal in RL is to maximize not immediate reward, b...

2020-07-07 20:49:27 283
