RL-Zhao-(1): Basic Concepts [state value (v), action value (q), policy (π), reward, return, trajectories, episode]

u013250861

Last modified 2023-12-12 22:04:12

First published 2023-12-03 16:31:56

Article link: https://blog.csdn.net/u013250861/article/details/134766531

1.1 A grid world example

Consider the example shown in Figure 1.2, where a robot moves in a grid world. The robot, called the agent, can move between adjacent cells in the grid. At each time step, it occupies exactly one cell. The white cells are accessible, and the orange cells are forbidden. There is a target cell that the robot would like to reach. We will use such grid world examples throughout this book since they are intuitive for illustrating new concepts and algorithms.

The ultimate goal of the agent is to find a “good” policy that enables it to reach the target cell when starting from any initial cell. How can the “goodness” of a policy be defined? The idea is that the agent should reach the target without entering any forbidden cells, taking unnecessary detours, or colliding with the boundary of the grid.

1.2 State and action

The first concept to be introduced is the state, which describes the agent’s status with respect to the environment.

In the grid world example, the state corresponds to the agent’s location. Since there are nine cells, there are nine states as well.

They are indexed as s1, s2, ..., s9, as shown in Figure 1.3(a). The set of all the states is called the state space, denoted as S = {s1, ..., s9}.
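The state space described above can be sketched in a few lines of Python. This is only an illustration: it assumes a 3x3 grid with row-major indexing (s1 in the top-left cell, s9 in the bottom-right), whereas the actual layout is fixed by Figure 1.3(a), which is not reproduced here. The names `state_index`, `state_cell`, and `state_space` are hypothetical helpers, not notation from the book.

```python
# Sketch: enumerate the nine states of a 3x3 grid world.
# Assumption: row-major indexing (s1 top-left, s9 bottom-right);
# the real indexing is given by Figure 1.3(a).
N_ROWS, N_COLS = 3, 3


def state_index(row, col):
    """Map a 0-based (row, col) cell to its 1-based state index."""
    return row * N_COLS + col + 1


def state_cell(index):
    """Inverse mapping: 1-based state index to a 0-based (row, col) cell."""
    return divmod(index - 1, N_COLS)


# State space S = {s1, ..., s9}, one state per cell.
state_space = [
    f"s{state_index(r, c)}" for r in range(N_ROWS) for c in range(N_COLS)
]
```

Under this assumed indexing, `state_cell(5)` gives the centre cell `(1, 1)`, and `len(state_space)` is 9, matching the nine cells of the grid.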

1.3 State transition

1.4 Policy

1.5 Reward

1.6 Trajectories, returns, and episodes

1.7 Markov decision processes
