Paper notes: Automatic Data Augmentation for Generalization in Reinforcement Learning

Love Q

Article link: https://blog.csdn.net/LoveQR1/article/details/127281852

Abstract:

Deep reinforcement learning (RL) agents often fail to generalize beyond their training environments. To alleviate this problem, recent work has proposed the use of data augmentation. However, different tasks tend to benefit from different types of augmentations and selecting the right one typically requires expert knowledge. In this paper, we introduce three approaches for automatically finding an effective augmentation for any RL task. These are combined with two novel regularization terms for the policy and value function, required to make the use of data augmentation theoretically sound for actor-critic algorithms. Our method achieves a new state-of-the-art on the Procgen benchmark and outperforms popular RL algorithms on DeepMind Control tasks with distractors. In addition, our agent learns policies and representations which are more robust to changes in the environment that are irrelevant for solving the task, such as the background.
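The abstract does not say how the three approaches search for an augmentation. One natural way to automate the choice, shown here only as an illustrative sketch, is to treat each candidate augmentation as an arm of a multi-armed bandit and pick arms with the UCB1 rule, using some scalar training signal (for example recent mean return) as the reward. The class name `UCBAugmentationSelector`, the candidate augmentation names, the exploration coefficient `c`, and the simulated rewards are all hypothetical; this is not presented as the authors' algorithm.

```python
import math
import random


class UCBAugmentationSelector:
    """Hypothetical UCB1 selector over a fixed list of augmentation names.

    Each augmentation is a bandit arm. The reward for an arm is whatever
    scalar signal the caller reports after using it (an assumption of this
    sketch, e.g. mean episodic return over a training phase).
    """

    def __init__(self, augmentations, c=1.0):
        self.augmentations = list(augmentations)
        self.c = c  # exploration coefficient (assumed value)
        self.counts = [0] * len(self.augmentations)
        self.mean_rewards = [0.0] * len(self.augmentations)
        self.total = 0

    def select(self):
        # Try every augmentation once before relying on the UCB score.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # UCB1 score: empirical mean plus an exploration bonus.
        scores = [
            mean + self.c * math.sqrt(math.log(self.total) / n)
            for mean, n in zip(self.mean_rewards, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the all-time mean reward for the chosen arm.
        self.counts[arm] += 1
        self.total += 1
        self.mean_rewards[arm] += (reward - self.mean_rewards[arm]) / self.counts[arm]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up reward model for demonstration only: "crop" is assumed to help most.
    true_means = {"crop": 0.8, "color-jitter": 0.5, "grayscale": 0.3}
    selector = UCBAugmentationSelector(true_means.keys(), c=1.0)
    for _ in range(500):
        arm = selector.select()
        name = selector.augmentations[arm]
        selector.update(arm, true_means[name] + rng.gauss(0.0, 0.1))
    print(dict(zip(selector.augmentations, selector.counts)))
```

With these simulated rewards the selector should end up choosing `crop` most often. In a real agent the reward would come from training, and because that signal drifts as the policy improves, a sliding-window mean may be a better fit than the all-time mean used here.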

Implementation: https://github.com/rraileanu/auto-drac
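The abstract mentions two regularization terms for the policy and value function but does not give their form. As a minimal sketch, assuming the policy term is a KL divergence between the action distributions on the original and the augmented observation, and the value term is the squared difference between the two value estimates, they could be computed as below. The helper names, the toy "networks", and `alpha_r = 0.1` are hypothetical; the paper itself should be consulted for the exact objective.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_categorical(p, q):
    # KL(p || q) for two categorical distributions given as probability lists.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def augmentation_regularizers(policy_logits_fn, value_fn, obs, augment):
    """Hypothetical policy and value regularizers for a single observation.

    policy_logits_fn, value_fn and augment are placeholders for the agent's
    policy head, value head and an augmentation f(s); all are assumptions.
    """
    aug_obs = augment(obs)
    # In an autodiff framework the clean-observation outputs would be treated
    # as fixed targets (no gradient); plain Python has no gradients anyway.
    pi_clean = softmax(policy_logits_fn(obs))
    pi_aug = softmax(policy_logits_fn(aug_obs))
    g_pi = kl_categorical(pi_clean, pi_aug)
    g_v = (value_fn(obs) - value_fn(aug_obs)) ** 2
    return g_pi, g_v


def regularized_loss(base_loss, g_pi, g_v, alpha_r=0.1):
    # Base actor-critic loss (to be minimized) plus the weighted regularizers.
    return base_loss + alpha_r * (g_pi + g_v)


if __name__ == "__main__":
    # Toy "networks" on a 2-dim observation, for illustration only.
    def policy(s):
        return [s[0], s[1], 0.0]

    def value(s):
        return s[0] + 2.0 * s[1]

    def shift(s):
        return [s[0] + 0.5, s[1]]  # stand-in augmentation

    g_pi, g_v = augmentation_regularizers(policy, value, [1.0, 0.0], shift)
    print(g_pi, g_v, regularized_loss(1.0, g_pi, g_v))
```

Both terms are zero when the augmentation leaves the agent's outputs unchanged, so they only penalize sensitivity to the transformation rather than the transformation itself.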