[Paper Translation] Open X-Embodiment: Robotic Learning Datasets and RT-X Models [To Be Continued]

The opening is my personal understanding, for reference only:

For an agent in machine learning to perform well, three aspects matter: data, model, and knowledge.

  • Data exists so that, during training, the model can experience as many of the scenarios it may encounter in the future as possible; the more abundant and diverse the training data, the better. The training data determines the upper bound of the agent's capability.
  • The model's purpose is to give the agent greater capacity, so that after ingesting the data, the agent's learned capability can approach that upper bound as closely as possible.
  • Knowledge serves to 1) filter the clutter out of raw perceptual data and make full use of the useful information, and 2) improve training efficiency. Knowledge can be fixed common sense, or it can be continually learned from perceptual data (in which case the process can also be viewed as part of the model). Knowledge can be organized in many forms, and this is one place where new ideas can emerge. Causal discovery and causal reasoning are one such form (causal representation learning, causal discovery, interventional causal representation learning). Deep learning captures only correlations, not causation, so it performs well in familiar scenarios but poorly in unfamiliar ones; yet the causal regularities in an unfamiliar scenario may be exactly the same, and because deep learning cannot grasp that causality, its capability fails to generalize well to unfamiliar scenarios.

Training data for embodied intelligence is nowhere near as abundant as it is for vision; the paper translated here focuses on the data side of embodied intelligence.

The paper link is as follows:

https://robotics-transformer-x.github.io/paper.pdf

0. Abstract

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms.
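The abstract's "standardized data formats" refers to packaging trajectories from many different robots into a common episode structure. As a minimal sketch, the snippet below models one such episode as a sequence of timesteps pairing observations with actions, plus embodiment metadata. The field names (`robot_type`, `language_instruction`, a 7-D action) are illustrative assumptions, not the actual Open X-Embodiment schema.

```python
# A minimal sketch of a standardized robot-trajectory episode:
# a list of timesteps, each pairing an observation with the action taken,
# tagged with which embodiment the data came from. Field names are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Step:
    observation: Dict[str, list]   # e.g. proprioceptive state, camera images
    action: List[float]            # e.g. end-effector delta pose + gripper
    language_instruction: str      # natural-language task description

@dataclass
class Episode:
    robot_type: str                       # embodiment this episode came from
    steps: List[Step] = field(default_factory=list)

# Build a toy two-step episode for a pick task.
episode = Episode(robot_type="franka")
for _ in range(2):
    episode.steps.append(
        Step(
            observation={"state": [0.0] * 7},
            action=[0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
            language_instruction="pick up the block",
        )
    )

print(len(episode.steps))  # → 2
```

Because every dataset is flattened into the same step/episode shape, a single high-capacity model can be trained on trajectories pooled across all 22 robots, regardless of each platform's native logging format.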

I. INTRODUCTION

A central lesson from recent advances in machine learning and artificial intelligence is that large-scale learning from broad and diverse datasets can enable capable AI systems by providing for general-purpose pretrained models. In fact, large-scale general-purpose models typically trained on large and diverse datasets can often outperform their narrowly targeted counterparts trained on smaller but more task-specific data. For instance, open-vocabulary image classifiers (e.g., CLIP [1]) trained on large datasets scraped from the web tend to outperform fixed-vocabulary models trained on more limited datasets, and large language models [2, 3] trained on massive text corpora tend to outperform systems that are only trained on narrow task-specific datasets. Increasingly, the most effective way to tackle a given narrow task (e.g., in vision or NLP) is to adapt a general-purpose model. However, these lessons are difficult to apply in robotics: any single robotic domain might be too narrow, and while computer vision and NLP can leverage large datasets sourced from the web, comparably large and broad datasets for robotic interaction are hard to come by. Even the largest data collection efforts still end up with datasets that are a fraction of the size and diversity of benchmark datasets in vision (5-18M) [4, 5] and NLP (1.5B-4.5B) [6, 7]. Perhaps more importantly, such datasets are often still narrow along some axes of variation, either focusing on a single environment, a single set of objects, or a narrow range of tasks. How can we overcome these challenges in robotics and move the field of robotic learning toward the kind of large data regime that has been so successful in other domains? 