Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

Love Q

Category: Paper Sharing. Tags: Paper Reading

Article link: https://blog.csdn.net/LoveQR1/article/details/127893696

Abstract:

One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments. Recently, data augmentation techniques aiming at enhancing data diversity have demonstrated proven performance in improving the generalization ability of learned policies. However, due to the sensitivity of RL training, naively applying data augmentation, which transforms each pixel in a task-agnostic manner, may suffer from instability and damage the sample efficiency, thus further exacerbating the generalization performance. At the heart of this phenomenon is the diverged action distribution and high-variance value estimation in the face of augmented images. To alleviate this issue, we propose Task-aware Lipschitz Data Augmentation (TLDA) for visual RL, which explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels. To verify the effectiveness of TLDA, we conduct extensive experiments on DeepMind Control suite, CARLA and DeepMind Manipulation tasks, showing that TLDA improves both sample efficiency in training time and generalization in test time. It outperforms previous state-of-the-art methods across the 3 different visual control benchmarks.

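In plain terms: visual RL policies need to generalize to unseen environments, and data augmentation helps by diversifying the training data. Because RL training is sensitive, though, naive augmentation that transforms every pixel without regard to the task can destabilize training and hurt sample efficiency, which in turn degrades generalization. The root cause is that augmented images make the action distribution diverge and the value estimates high-variance. TLDA addresses this by explicitly identifying the task-relevant pixels, which have large Lipschitz constants, and augmenting only the task-irrelevant ones. Experiments on DeepMind Control Suite, CARLA, and DeepMind Manipulation tasks show gains in both training-time sample efficiency and test-time generalization, ahead of previous state-of-the-art methods on all 3 benchmarks.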
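
For reference, the notion of a Lipschitz constant that the abstract relies on can be written out. The first line is the standard definition for a function $f$. The second line is only an illustrative assumption about what a per-pixel version could look like for a policy $\pi$, an observation $o$, and a copy $\tilde{o}_{ij}$ perturbed around pixel $(i, j)$; the abstract does not give the paper's exact formula or distance measures, and the symbols $\tilde{o}_{ij}$ and $K_{ij}$ are introduced here for illustration.

```latex
% Standard definition: f is K-Lipschitz if, for all inputs x and y,
\| f(x) - f(y) \| \le K \, \| x - y \|

% Illustrative local estimate (an assumption, not the paper's stated formula):
% a large K_{ij} means a small change near pixel (i, j) moves the policy output a lot.
K_{ij} \approx \frac{\| \pi(o) - \pi(\tilde{o}_{ij}) \|}{\| o - \tilde{o}_{ij} \|}
```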

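Problem

Visual RL faces a dilemma: heavy data augmentation is essential for better generalization, but it significantly reduces sample efficiency and training stability.

Cause

- Data augmentation usually applies pixel-level image transformations, in which every pixel is transformed in a task-agnostic way.
- In visual RL, however, each pixel of an observation differs in how relevant it is to the task and the reward function.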
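
The idea of leaving task-relevant pixels alone can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `sensitivity_map` and `task_aware_augment`, the Gaussian-noise perturbation, the Euclidean distances, the patch size, and the quantile threshold are all assumptions chosen for illustration. The sketch perturbs one patch at a time, measures how much a policy's output moves relative to the input change, and then keeps the most sensitive pixels from the original image while taking the rest from an augmented copy.

```python
import numpy as np


def sensitivity_map(policy, obs, patch=4, noise_scale=0.1, seed=0):
    """Per-pixel estimate of output change / input change under local noise.

    Hypothetical helper: `policy` maps a 2-D float image to a 1-D array.
    """
    rng = np.random.default_rng(seed)
    h, w = obs.shape
    base = policy(obs)
    k = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            perturbed = obs.copy()
            block = perturbed[i:i + patch, j:j + patch]  # view into `perturbed`
            block += noise_scale * rng.standard_normal(block.shape)
            input_change = np.linalg.norm(perturbed - obs)
            output_change = np.linalg.norm(policy(perturbed) - base)
            # Every pixel in the patch shares the patch-level estimate.
            k[i:i + patch, j:j + patch] = output_change / (input_change + 1e-8)
    return k


def task_aware_augment(obs, augmented, k, quantile=0.75):
    """Keep pixels whose sensitivity is at or above the given quantile of k;
    take all other pixels from the augmented image."""
    keep = k >= np.quantile(k, quantile)
    return np.where(keep, obs, augmented)


def toy_policy(obs):
    """Toy stand-in for a policy: only the top-left 8x8 region matters."""
    return np.array([obs[:8, :8].sum()])


obs = np.random.default_rng(1).random((16, 16))
augmented = np.zeros_like(obs)  # stand-in for a strong augmentation
k = sensitivity_map(toy_policy, obs)
result = task_aware_augment(obs, augmented, k)
```

In this toy setup only the top-left region influences the policy, so `result` keeps that region from `obs` and takes everything else from `augmented`. With a real policy the sensitivity map would be noisier, and the threshold would need tuning.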
