Reinforcement Learning Overview
超级超级小天才
[Chapter 6] Reinforcement Learning (4) Policy Search
In the previous sections, we tried to learn the utility function or, more commonly, the action-value function, and then greedily selected the action with the highest Q-value: $\pi(s) = \arg\max_a Q(s,a)$. This means that once…
Original · 2021-05-30 11:50:30 · 187 views · 0 comments
[Chapter 5] Reinforcement Learning (3) Function Approximation
Function Approximation. We are learning Q-functions, but how do we represent or record the Q-values? For discrete, finite state and action spaces, we can use a large table of size $|S| \times |A|$ to represent the Q-values for…
Original · 2021-05-30 10:04:46 · 158 views · 0 comments
[Chapter 4] Reinforcement Learning (2) Model-Free Method
Model-Free RL Method. In a model-based method, we first need to model the environment by learning or estimating the transition and reward functions. In a model-free method, by contrast, we consider learning the value/utility functions $V(s)$ or $U(s)$ or ac…
Original · 2021-05-29 14:42:02 · 189 views · 0 comments
[Chapter 3] Reinforcement Learning (1) Model-Based Method
Reinforcement Learning. First, we assume that all environments in the following material are modeled by Markov decision processes. As we have seen, an MDP can be represented by a tuple $(S, A, T, R)$; the rewards are returned…
Original · 2021-05-29 12:54:12 · 225 views · 0 comments
[Chapter 2] Value Iteration and Policy Iteration
We now know that the most important step in computing an optimal policy is computing the value function. But how? (The following content is based entirely on infinite-horizon problems.) Solutions to this problem can be roughly divided into two categories: Va…
Original · 2021-05-28 23:03:22 · 285 views · 1 comment
[Chapter 1] Markov Decision Process and Value Function
Markov Decision Process. One of the most important problems in decision making is making sequential decisions, on which the agent's utility depends. At each time step, the agent selects an action to interact with the environment and makes it tran…
Original · 2021-05-28 14:32:56 · 194 views · 0 comments