Spinning Up USER DOCUMENTATION
Environments
Spinning Up requires Python3, OpenAI Gym, and OpenMPI. MuJoCo is optional but preferred.
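As a quick sanity check, a minimal sketch like the following can verify that the core dependencies are importable (the environment names are illustrative, and the classic pre-0.26 Gym API is assumed):

    # Minimal sketch: check that the core dependencies are importable.
    # Assumes the classic Gym API; environment names are illustrative.
    import gym               # OpenAI Gym, required
    from mpi4py import MPI   # Python bindings backed by OpenMPI

    env = gym.make("CartPole-v1")   # runs without MuJoCo
    print("Gym OK, observation shape:", env.observation_space.shape)

    try:
        gym.make("HalfCheetah-v2")  # a MuJoCo environment; optional but preferred
        print("MuJoCo environments available.")
    except Exception:
        print("MuJoCo not installed; non-MuJoCo environments still work.")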
Algorithms
Spinning Up implements six algorithms: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).
The On-Policy Algorithms
Vanilla Policy Gradient (VPG) is the most basic, entry-level algorithm in the deep RL space. It led to stronger algorithms such as TRPO and PPO.
On-Policy:
- They don't use old data.
- They are weaker on sample efficiency.
- These algorithms directly optimize policy performance, trading off sample efficiency in favor of stability.
- TRPO and PPO are meant to make up the deficit in sample efficiency. (A sketch of the basic on-policy update follows this list.)
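For intuition, here is a minimal sketch of the basic on-policy update (a simplified REINFORCE-style illustration, not Spinning Up's implementation; the network sizes, learning rate, and classic Gym API are assumptions):

    # Minimal sketch of a vanilla policy gradient update on one episode.
    # Illustration only; not Spinning Up's implementation.
    import gym
    import torch
    import torch.nn as nn

    env = gym.make("CartPole-v1")
    obs_dim = env.observation_space.shape[0]
    n_acts = env.action_space.n

    # Small MLP policy; sizes are arbitrary.
    policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_acts))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

    # Collect one on-policy episode.
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        act = dist.sample()
        log_probs.append(dist.log_prob(act))
        obs, rew, done, _ = env.step(act.item())
        rewards.append(rew)

    # Policy gradient step: weight log-probs by the episode return.
    # The data is then discarded; on-policy methods don't reuse old data.
    ret = sum(rewards)
    loss = -(torch.stack(log_probs) * ret).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()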
The Off-Policy Algorithms
Deep Deterministic Policy Gradient (DDPG) is a foundational algorithm, just like VPG. DDPG is closely connected to Q-learning algorithms: it concurrently learns a Q-function and a policy, which are updated to improve each other.
Off-Policy:
- They can reuse old data very efficiently.
- They exploit the Bellman equations for optimality, which the Q-function can be trained to satisfy using any environment interaction data (as long as there is enough experience from the high-reward areas of the environment).
- They are less stable: there are no guarantees of strong policy performance.
- TD3 and SAC are meant to make up for this instability. (A sketch of the off-policy Q-update follows this list.)
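To make the Bellman-backup idea concrete, here is a minimal sketch of a DDPG-style Q-function update against a replay buffer (all names here, such as q_net, q_targ, pi_targ, and batch, are hypothetical placeholders, not Spinning Up's API):

    # Minimal sketch of an off-policy Q-function update from replayed data.
    # Names (q_net, q_targ, pi_targ, batch) are hypothetical placeholders.
    import torch

    gamma = 0.99  # discount factor

    def q_update(q_net, q_targ, pi_targ, optimizer, batch):
        # Batch is sampled from a replay buffer, so it may be old data.
        obs, act, rew, obs2, done = batch

        # Bellman backup: y = r + gamma * (1 - d) * Q_targ(s', mu_targ(s'))
        with torch.no_grad():
            y = rew + gamma * (1 - done) * q_targ(obs2, pi_targ(obs2))

        # Regress Q(s, a) toward the backup (mean-squared Bellman error).
        loss = ((q_net(obs, act) - y) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()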
Code Format
All implementations in Spinning Up adhere to a standard template. They are split into two files: an algorithm file, which contains the core logic of the algorithm, and a core file, which contains various utilities needed to run the algorithm.
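For example, PPO's implementation follows a layout along these lines (a sketch based on the Spinning Up repository; the comments are descriptive):

    spinup/algos/ppo/
        ppo.py    # algorithm file: the core logic (rollout collection, loss, update loop)
        core.py   # core file: utilities (network builders, helpers) used by ppo.py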