Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
- Anirudh Goyal1, Shagun Sodhani1, Jonathan Binas1, Xue Bin Peng2
- Sergey Levine2, Yoshua Bengio1†
- 1Mila, Université de Montréal
- 2University of California, Berkeley
- †CIFAR Senior Fellow
Abstract
Reinforcement learning agents that operate in diverse and complex environments can benefit from a structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for itself