[Original] Chapter 9: On-policy Prediction with Approximation
Chapter 9: On-policy Prediction with Approximation · 1 Introduction · 2 Determine the approximate function · 2.1 Mean Squared Value Error · 2.2 Stochastic gradient descent and semi-gradient methods to minimize VE · Gradient Monte Carlo · Semi-gradient (bootstrapping estimate)
2020-08-25 10:38:53 152
[Original] Chapter 16 Applications and 17 Frontiers
Notes of chapters 16 Applications and 17 Frontiers · Questions · Why not combine function approximation and policy approximation with Dyna? The policy update can be realized by minimizing the TD error, and the value table can be replaced by an ANN or linear approximati
2020-08-25 10:38:41 291
[Original] Chapter 13: Policy Gradient Methods
Policy Gradient Methods · 0 Questions · Q1: For problems that are not MDPs, is it practical to learn a sequential policy model using a temporal convolution network? · Q2: Can a parameterized policy focus on some action space of interest, as action-value methods can?
2020-08-25 10:36:55 223
[Original] Chapter 12: Eligibility Traces
Notes of Chapter 12: Eligibility Traces · 1 Introduction · 2 λ-return (offline λ-return algorithm) · N-step return · λ-return of continuing tasks · λ-return of episodic/continuing (T=∞) tasks · Offline λ-return al
2020-08-25 10:36:33 643
[Original] Chapter 10: On-policy Control with Approximation
Notes of Chapter 10: On-policy Control with Approximation · 1 Introduction · 2 On-policy control with approximation for episodic tasks · 2.1 General gradient-descent update for action-value prediction · 2.2 Gradient-descent update for semi-gradient n-step Sars
2020-08-25 10:36:15 163
[Original] Chapter 8: Planning and Learning with Tabular Methods
Chapter 8: Planning and Learning with Tabular Methods · Introduction · When the model is dynamic · When the model is large · Expected & sample update · Decision-time planning · Heuristic search · Rollout algorithm · Monte Carlo tree search (MCTS) · Introduction: Planning and lea
2020-08-25 10:35:47 397
[Original] Chapter 6&7: Temporal-Difference Learning
Chapter 6&7: Temporal-Difference Learning · 1 Introduction · 2 n-step TD prediction (estimate V) · 3 Off-policy n-step Sarsa · 4 Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm · 5 Question · 1 Introduction: Temporal-difference (TD) lea
2020-08-25 10:35:26 635
[Original] Chapter 5: Monte Carlo Methods
Chapter 5: Monte Carlo Methods · 1 Introduction · 2 Policy evaluation (Monte Carlo prediction; on-policy) · 3 Policy improvement (on-policy) · 4 Generalized policy iteration (GPI; on-policy) · 4.1 Monte Carlo control with Exploring Starts · 4.2 Monte Carlo control without
2020-08-25 10:34:40 549
[Original] Chapter 4: Dynamic Programming
Notes of chapter 4: Dynamic Programming · General dynamic programming (DP) needs to know the whole model (transition and reward functions). Bootstrapping means updating one estimate from another estimate; it is used to update the estimates of the values of states. Th
2020-08-25 10:33:50 598