
[Original] Chapter 9: On-policy Prediction with Approximation

Chapter 9: On-policy Prediction with Approximation. Contents: 1 Introduction; 2 Determine the approximate function; 2.1 Mean Squared Value Error; 2.2 Stochastic gradient descent and semi-gradient methods to minimize VE; Gradient Monte Carlo; Semi-gradient (bootstrapping estimate)

2020-08-25 10:38:53 152
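
The semi-gradient prediction method this entry lists fits in a few lines. Below is a minimal sketch of semi-gradient TD(0) with a linear value function v(s, w) = wᵀx(s); the env, policy, and features interfaces are hypothetical, introduced here only for illustration and not taken from the original post.

    import numpy as np

    def semi_gradient_td0(env, policy, features, num_episodes, alpha=0.01, gamma=0.99):
        # Assumed (hypothetical) interfaces: env.reset() -> state,
        # env.step(a) -> (next_state, reward, done), policy(s) -> action,
        # features(s) -> 1-D numpy array x(s).
        w = np.zeros_like(features(env.reset()), dtype=float)
        for _ in range(num_episodes):
            state, done = env.reset(), False
            while not done:
                next_state, reward, done = env.step(policy(state))
                # Bootstrapped TD target; the gradient is taken only w.r.t.
                # v(S_t, w), which is why the method is called "semi-gradient".
                target = reward + (0.0 if done else gamma * features(next_state) @ w)
                w += alpha * (target - features(state) @ w) * features(state)
                state = next_state
        return w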

[Original] Chapter 16 Applications and 17 Frontiers

Notes of Chapter 16: Applications and Chapter 17: Frontiers. Questions: Why not combine function approximation and policy approximation with Dyna? The policy update can be realized by minimizing the TD error, and the value table can be replaced by an ANN or a linear approximation…

2020-08-25 10:38:41 291

[Original] Chapter 13: Policy Gradient Methods

Policy Gradient Methods. 0 Questions. Q1: For problems that are not MDPs, is it practical to learn a sequential policy model using a temporal convolutional network? Q2: Can a parameterized policy focus on an action subspace of interest, as action-value methods can?

2020-08-25 10:36:55 223
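
For reference next to these questions, the basic Monte Carlo policy-gradient (REINFORCE) update covered in the chapter is

    \theta_{t+1} = \theta_t + \alpha \, \gamma^t \, G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta_t),

where G_t is the return from time t and \pi(a \mid s, \theta) is the parameterized policy.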

[Original] Chapter 12: Eligibility Traces

Notes of Chapter 12: Eligibility Traces. Contents: 1 Introduction; 2 λ-return (offline λ-return algorithm); n-step return; λ-return of continuing tasks; λ-return of episodic/continuing (T = ∞) tasks; offline λ-return algorithm…

2020-08-25 10:36:33 643
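
The λ-returns named in this outline compound all n-step returns. In the standard form (Sutton & Barto, Ch. 12), for continuing tasks

    G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n},

and for episodic tasks with terminal time T

    G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{T-t-1} \lambda^{n-1} G_{t:t+n} + \lambda^{T-t-1} G_t.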

[Original] Chapter 10: On-policy Control with Approximation

Notes of Chapter 10: On-policy Control with Approximation. Contents: 1 Introduction; 2 On-policy control with approximation for episodic tasks; 2.1 General gradient-descent update for action-value prediction; 2.2 Gradient-descent update for semi-gradient n-step Sarsa…

2020-08-25 10:36:15 163
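
The semi-gradient n-step Sarsa update that section 2.2 refers to is, in the book's notation,

    w_{t+n} = w_{t+n-1} + \alpha \left[ G_{t:t+n} - \hat{q}(S_t, A_t, w_{t+n-1}) \right] \nabla \hat{q}(S_t, A_t, w_{t+n-1}),

with the n-step return bootstrapped from the current action-value estimate:

    G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} \hat{q}(S_{t+n}, A_{t+n}, w_{t+n-1}).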

[Original] Chapter 8: Planning and Learning with Tabular Methods

Chapter 8: Planning and Learning with Tabular Methods. Contents: Introduction; When the model is dynamic; When the model is large; Expected & sample updates; Decision-time planning; Heuristic search; Rollout algorithm; Monte Carlo tree search (MCTS). Introduction: Planning and learning…

2020-08-25 10:35:47 397
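
The planning-plus-learning loop summarized here can be illustrated with tabular Dyna-Q, which mixes direct Q-learning updates from real experience with simulated updates drawn from a learned model. A minimal sketch, assuming a hypothetical env interface with discrete states and actions (not from the original post):

    import random
    from collections import defaultdict

    def dyna_q(env, num_episodes, n_planning=10, alpha=0.1, gamma=0.95, eps=0.1):
        # Assumed (hypothetical) interface: env.reset() -> state,
        # env.step(a) -> (next_state, reward, done), env.actions -> list of actions.
        Q = defaultdict(float)
        model = {}  # (s, a) -> (r, s', done); a deterministic model for simplicity

        def greedy(s):
            return max(env.actions, key=lambda a: Q[(s, a)])

        for _ in range(num_episodes):
            s, done = env.reset(), False
            while not done:
                a = random.choice(env.actions) if random.random() < eps else greedy(s)
                s2, r, done = env.step(a)
                # direct RL: one Q-learning update from the real transition
                target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                # model learning, then planning from remembered transitions
                model[(s, a)] = (r, s2, done)
                for _ in range(n_planning):
                    (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                    ptarget = pr + (0.0 if pdone else gamma * Q[(ps2, greedy(ps2))])
                    Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
                s = s2
        return Q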

[Original] Chapter 6&7: Temporal-Difference Learning

Chapter 6&7: Temporal-Difference Learning. Contents: 1 Introduction; 2 n-step TD prediction (estimate V); 3 Off-policy n-step Sarsa; 4 Off-policy learning without importance sampling: the n-step Tree Backup algorithm; 5 Questions. 1 Introduction: Temporal-difference (TD) learning…

2020-08-25 10:35:26 635
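
The n-step TD prediction named in section 2 updates state values toward the n-step return,

    G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V_{t+n-1}(S_{t+n}),
    V_{t+n}(S_t) = V_{t+n-1}(S_t) + \alpha \left[ G_{t:t+n} - V_{t+n-1}(S_t) \right],

which reduces to one-step TD(0) for n = 1 and to the full Monte Carlo return when n reaches the end of the episode.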

[Original] Chapter 5: Monte Carlo Methods

Chapter 5: Monte Carlo Methods. Contents: 1 Introduction; 2 Policy evaluation (Monte Carlo prediction; on-policy); 3 Policy improvement (on-policy); 4 Generalized policy iteration (GPI; on-policy); 4.1 Monte Carlo control with Exploring Starts; 4.2 Monte Carlo control without Exploring Starts…

2020-08-25 10:34:40 549
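
The on-policy Monte Carlo prediction in section 2 can be sketched as first-visit MC, which averages the sample returns observed after each state's first visit in an episode. A minimal sketch, assuming a hypothetical generate_episode() helper (not from the original post) that returns (state, reward) pairs, each reward being the one received after leaving that state:

    from collections import defaultdict

    def first_visit_mc_prediction(generate_episode, num_episodes, gamma=1.0):
        # generate_episode() is an assumed helper returning [(S_t, R_{t+1}), ...]
        # for one episode generated by the target policy.
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        V = defaultdict(float)
        for _ in range(num_episodes):
            episode = generate_episode()
            first_visit = {}
            for t, (s, _) in enumerate(episode):
                first_visit.setdefault(s, t)
            G = 0.0
            # walk the episode backwards, accumulating the discounted return
            for t in reversed(range(len(episode))):
                s, r = episode[t]
                G = gamma * G + r
                if first_visit[s] == t:  # only the first visit of s contributes
                    returns_sum[s] += G
                    returns_count[s] += 1
                    V[s] = returns_sum[s] / returns_count[s]
        return V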

[Original] Chapter 4: Dynamic Programming

Notes of Chapter 4: Dynamic Programming. General dynamic programming (DP) needs to know the whole model (transition and reward functions). Bootstrapping means updating one estimate from another estimate; it is used to update estimates of the values of states…

2020-08-25 10:33:50 598
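
The bootstrapping described here is explicit in iterative policy evaluation, where each sweep rewrites every state's value estimate from the current estimates of its successors using the full model:

    v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_k(s') \right].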

RL book 2018.pdf

Classic reinforcement learning reference material

2020-06-12
