Reinforcement Learning
Notes on Reinforcement Learning: An Introduction
xiwang_chn
Chapter 9: On-policy Prediction with Approximation
Contents: 1 Introduction; 2 Determining the approximate function; 2.1 Mean Squared Value Error; 2.2 Stochastic gradient descent and semi-gradient methods to minimize VE (Gradient Monte Carlo; semi-gradient, i.e. bootstrapping estimate). Posted 2020-08-25.
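The semi-gradient idea named in this entry can be illustrated with a short sketch (not code from the post): semi-gradient TD(0) prediction with a linear value function v(s, w) = w·x(s). The `env.reset()`/`env.step(a)` interface, the `policy`, and the feature map `features(s)` are assumed placeholders.

```python
import numpy as np

def semi_gradient_td0(env, policy, features, n_features,
                      alpha=0.01, gamma=0.99, episodes=100):
    """Semi-gradient TD(0) prediction with a linear value function
    v(s, w) = w . x(s).  env, policy, and features are hypothetical
    interfaces supplied by the caller."""
    w = np.zeros(n_features)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            x = features(s)
            v = w @ x
            # Bootstrapped target; its own gradient w.r.t. w is ignored,
            # which is what makes the method "semi"-gradient.
            target = r if done else r + gamma * (w @ features(s_next))
            w += alpha * (target - v) * x
            s = s_next
    return w
```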
Chapters 16 (Applications) and 17 (Frontiers)
Questions: Why not combine function approximation and policy approximation with Dyna? The policy update can be realized by minimizing the TD error, and the value table can be replaced by an ANN or a linear approximation. Posted 2020-08-25.
Chapter 13: Policy Gradient Methods
0 Questions. Q1: For problems that are not MDPs, is it practical to learn a sequential policy model using a temporal convolutional network? Q2: Can a parameterized policy focus on an action space of interest as well as action-value methods can? Posted 2020-08-25.
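Although this entry only lists questions, the chapter's central method is the policy-gradient update; a rough REINFORCE-style sketch for a softmax policy over a small discrete action set follows. The feature map `features(s, a)`, the action list, and the recorded episode format are my assumptions, not anything from the post.

```python
import numpy as np

def softmax_policy(theta, features, s, actions):
    """Action preferences h(s,a) = theta . x(s,a) turned into probabilities."""
    prefs = np.array([theta @ features(s, a) for a in actions])
    prefs -= prefs.max()                     # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def reinforce_episode(theta, episode, features, actions, alpha=0.01, gamma=0.99):
    """One pass of a REINFORCE-style update over a recorded episode
    [(s, a, r), ...].  How the episode was generated is an assumption."""
    # Compute the returns G_t backwards once, then apply updates forwards.
    returns, G = [], 0.0
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for t, (s, a, _) in enumerate(episode):
        p = softmax_policy(theta, features, s, actions)
        # grad log pi(a|s) = x(s,a) - sum_b pi(b|s) x(s,b) for a softmax policy.
        grad_log = features(s, a) - sum(p[i] * features(s, b)
                                        for i, b in enumerate(actions))
        theta = theta + alpha * (gamma ** t) * returns[t] * grad_log
    return theta
```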
Chapter 12: Eligibility Traces
Contents: 1 Introduction; 2 λ-return (the offline λ-return algorithm): the n-step return, the λ-return for continuing tasks, the λ-return for episodic/continuing (T = ∞) tasks, and the offline λ-return algorithm. Posted 2020-08-25.
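As a minimal sketch of the backward-view counterpart of the λ-return listed here (an illustration, not the post's code): semi-gradient TD(λ) with an accumulating eligibility trace and linear features. The `env`, `policy`, and `features` interfaces are again hypothetical.

```python
import numpy as np

def semi_gradient_td_lambda(env, policy, features, n_features,
                            lam=0.9, alpha=0.01, gamma=0.99, episodes=100):
    """Semi-gradient TD(lambda) with an accumulating eligibility trace."""
    w = np.zeros(n_features)
    for _ in range(episodes):
        s, done = env.reset(), False
        z = np.zeros(n_features)              # eligibility trace
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            x = features(s)
            # The trace decays by gamma*lambda and accumulates the gradient x.
            z = gamma * lam * z + x
            v = w @ x
            v_next = 0.0 if done else w @ features(s_next)
            delta = r + gamma * v_next - v    # one-step TD error
            w += alpha * delta * z
            s = s_next
    return w
```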
Chapter 10: On-policy Control with Approximation
Contents: 1 Introduction; 2 On-policy control with approximation for episodic tasks; 2.1 The general gradient-descent update for action-value prediction; 2.2 The gradient-descent update for semi-gradient n-step Sarsa. Posted 2020-08-25.
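A sketch of the one-step case of the semi-gradient Sarsa update named in the contents (not the post's own code): episodic semi-gradient Sarsa with a linear q(s, a, w) = w·x(s, a) and an ε-greedy policy. The environment, feature map, and action list are assumed.

```python
import numpy as np

def semi_gradient_sarsa(env, features, n_features, actions,
                        alpha=0.01, gamma=0.99, epsilon=0.1, episodes=200):
    """Episodic one-step semi-gradient Sarsa with linear action values."""
    w = np.zeros(n_features)

    def q(s, a):
        return w @ features(s, a)

    def eps_greedy(s):
        if np.random.rand() < epsilon:
            return actions[np.random.randint(len(actions))]
        return max(actions, key=lambda a: q(s, a))

    for _ in range(episodes):
        s, done = env.reset(), False
        a = eps_greedy(s)
        while not done:
            s_next, r, done = env.step(a)
            x = features(s, a)
            if done:
                w += alpha * (r - q(s, a)) * x          # terminal target is just r
                break
            a_next = eps_greedy(s_next)
            # Bootstrapped Sarsa target; the target's gradient is ignored.
            w += alpha * (r + gamma * q(s_next, a_next) - q(s, a)) * x
            s, a = s_next, a_next
    return w
```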
Chapter 8: Planning and Learning with Tabular Methods
Contents: Introduction; When the model is dynamic; When the model is large; Expected & sample updates; Decision-time planning; Heuristic search; Rollout algorithms; Monte Carlo tree search (MCTS). Introduction: Planning and learning … Posted 2020-08-25.
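One way to make the planning-and-learning combination covered in this chapter concrete is tabular Dyna-Q, sketched below; the environment interface and the deterministic-model assumption are mine, not the post's.

```python
import random
from collections import defaultdict

def dyna_q(env, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
           planning_steps=10, episodes=100):
    """Tabular Dyna-Q: direct RL from real experience plus planning updates
    drawn from a learned deterministic model."""
    Q = defaultdict(float)                 # Q[(s, a)]
    model = {}                             # model[(s, a)] = (r, s_next, done)

    def eps_greedy(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = eps_greedy(s)
            s_next, r, done = env.step(a)
            # (a) Direct RL: one-step Q-learning update from real experience.
            best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            # (b) Model learning (assumes a deterministic environment).
            model[(s, a)] = (r, s_next, done)
            # (c) Planning: replay previously seen (s, a) pairs from the model.
            for _ in range(planning_steps):
                sp, ap = random.choice(list(model.keys()))
                rp, sp_next, dp = model[(sp, ap)]
                bn = 0.0 if dp else max(Q[(sp_next, b)] for b in actions)
                Q[(sp, ap)] += alpha * (rp + gamma * bn - Q[(sp, ap)])
            s = s_next
    return Q
```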
Chapters 6 & 7: Temporal-Difference Learning
Contents: 1 Introduction; 2 n-step TD prediction (estimating V); 3 Off-policy n-step Sarsa; 4 Off-policy learning without importance sampling: the n-step tree-backup algorithm; 5 Questions. 1 Introduction: Temporal-difference (TD) learning … Posted 2020-08-25.
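A sketch of the tabular n-step TD prediction of v_π mentioned in the contents; the episodic `env` and `policy` interfaces are assumed placeholders.

```python
from collections import defaultdict

def n_step_td_prediction(env, policy, n=4, alpha=0.1, gamma=0.99, episodes=100):
    """Tabular n-step TD prediction of the state-value function v_pi."""
    V = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        states, rewards = [s], [0.0]       # dummy R_0 so rewards[t+1] = R_{t+1}
        T, t = float('inf'), 0
        while True:
            if t < T:
                a = policy(states[t])
                s_next, r, done = env.step(a)
                states.append(s_next)
                rewards.append(r)
                if done:
                    T = t + 1
            tau = t - n + 1                # time whose estimate is updated
            if tau >= 0:
                # n-step return: discounted rewards plus a bootstrapped tail.
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, int(min(tau + n, T)) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```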
Chapter 5: Monte Carlo Methods
Contents: 1 Introduction; 2 Policy evaluation (Monte Carlo prediction; on-policy); 3 Policy improvement (on-policy); 4 Generalized policy iteration (GPI; on-policy); 4.1 Monte Carlo control with exploring starts; 4.2 Monte Carlo control without exploring starts. Posted 2020-08-25.
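The on-policy prediction step listed here can be sketched as first-visit Monte Carlo, which averages sampled returns per state; the episodic environment and policy interfaces are hypothetical.

```python
from collections import defaultdict

def first_visit_mc_prediction(env, policy, gamma=1.0, episodes=1000):
    """First-visit Monte Carlo prediction of v_pi from sampled episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for _ in range(episodes):
        # Generate one episode following the policy.
        episode = []
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            episode.append((s, r))
            s = s_next
        # Walk backwards, updating only the first visit to each state.
        G = 0.0
        for t in reversed(range(len(episode))):
            s_t, r_t = episode[t]
            G = gamma * G + r_t
            if s_t not in (st for st, _ in episode[:t]):   # first visit?
                returns_sum[s_t] += G
                returns_count[s_t] += 1
                V[s_t] = returns_sum[s_t] / returns_count[s_t]
    return V
```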
Chapter 4: Dynamic Programming
General dynamic programming: DP needs to know the whole model (transition and reward functions). Bootstrapping means updating one estimate from another estimate; it is used to update the estimates of state values. … Posted 2020-08-25.
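A small sketch of iterative policy evaluation, the basic DP backup built on the bootstrapping idea described above; the model format (`transitions[(s, a)]` as a list of (prob, next_state, reward, done) tuples) and `policy[s][a]` are assumed representations, not the post's.

```python
def iterative_policy_evaluation(states, actions, policy, transitions,
                                gamma=0.99, theta=1e-6):
    """Iterative policy evaluation with a known model (the DP setting).
    transitions[(s, a)] -> list of (prob, next_state, reward, done);
    policy[s][a] -> pi(a|s).  Both are hypothetical inputs."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v_new = 0.0
            for a in actions:
                for prob, s_next, reward, done in transitions[(s, a)]:
                    # Bellman expectation backup: bootstrap from the current V.
                    v_next = 0.0 if done else V[s_next]
                    v_new += policy[s][a] * prob * (reward + gamma * v_next)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V
```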
Notes on chapter 3: Finite Markov Decision Process
Contents: 1 Summary; 2 Questions; 3 Exercises; 3.1 Examples. Example: driverless cars. States: the speed and direction of the car, and the sensed information about the road. Actions: regulation of the s… Posted 2020-06-19.
Notes on chapter 1: Introduction
Response to chapter 1. 1 Summary: 1.1 Definition: Reinforcement learning trains a learning agent that observes, acts, and receives rewards, i.e. interacts with the environment, in order to maximize the long-run accumulated reward, which is also known as the expectation … Posted 2020-06-12.
Notes on chapter 2: Multi-armed bandits
Contents: 1 Summary; 1.1 Methods for updating the value table: the sample-average method and the exponential recency-weighted average method (constant step size); 1.2 Methods for selecting actions: greedy action selection and ε-greedy … Posted 2020-06-12.
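The two update rules and the two selection rules listed here fit in a few lines; the sketch below uses ε-greedy selection with either the sample-average step size 1/N(a) or a constant α. The `pull(a)` reward function is an assumed stand-in for the bandit arms.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, steps=1000, epsilon=0.1, alpha=None):
    """Epsilon-greedy action-value bandit.  With alpha=None the sample-average
    update is used; a constant alpha gives the exponential
    recency-weighted average."""
    Q = [0.0] * n_arms                    # value estimates
    N = [0] * n_arms                      # pull counts
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(n_arms)                 # explore
        else:
            a = max(range(n_arms), key=lambda i: Q[i])   # exploit (greedy)
        r = pull(a)
        N[a] += 1
        step = (1.0 / N[a]) if alpha is None else alpha
        Q[a] += step * (r - Q[a])         # incremental update toward the sample
    return Q
```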