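The BAAI Community will host "BAAI Live No. 25 | Google Research Scientist Fei Xia: Embodied Reasoning with Language and Vision Models" on Wednesday, September 7, from 10:00 to 11:00. Fei Xia, a research scientist at Google, will give the keynote talk. Stay tuned.
Time: Wednesday, September 7, 2022, 10:00-11:00 (morning)
Format: online livestream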
01
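About the Speaker
Fei Xia is a research scientist on the robotics team at Google Brain. His main research focus is deploying robots in unstructured, complex environments. His representative work includes GibsonEnv, iGibson, and SayCan, and his research has been covered by WIRED, The Washington Post, The New York Times, and other media outlets. He received his PhD from Stanford University, where he was advised by Silvio Savarese and Leonidas Guibas. He has published papers at conferences and in journals including CVPR, CoRL, IROS, ICRA, NeurIPS, RA-L, and Nature Communications. His recent research applies foundation models to the decision-making of intelligent agents, and his team recently proposed the PaLM-SayCan model.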
02
About the Talk
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision-making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment.
We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator.
This line of work led to the PaLM-SayCan model, which uses a large language model to guide a robot in understanding and completing abstract instructions (see the accompanying figure).
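For readers who want a concrete picture of the idea in the abstract, here is a minimal Python sketch of how a language model's preference for a skill could be combined with a value function's estimate of whether that skill is feasible right now. It is an illustration under stated assumptions, not the speaker's implementation: the function names, the skill list, and all numeric scores are hypothetical, and multiplying the two signals is an assumption made here for illustration, since the abstract does not spell out how they are combined.

```python
from typing import Callable, Dict, List, Tuple


def select_skill(
    instruction: str,
    skills: List[str],
    llm_score: Callable[[str, str], float],
    value_fn: Callable[[str], float],
) -> Tuple[str, Dict[str, float]]:
    """Pick the skill that is both useful for the instruction and feasible now.

    llm_score(instruction, skill): how relevant the language model finds the skill.
    value_fn(skill): how likely the skill is to succeed in the current state.
    Combining them by multiplication is an assumption of this sketch.
    """
    combined = {s: llm_score(instruction, s) * value_fn(s) for s in skills}
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical stand-ins for a real language model and learned value functions.
# All numbers below are made up for illustration.
def toy_llm_score(instruction: str, skill: str) -> float:
    table = {"find a sponge": 0.5, "use a vacuum": 0.4, "go to the table": 0.1}
    return table.get(skill, 0.0)


def toy_value_fn(skill: str) -> float:
    # A low value for "use a vacuum" models a scene with no vacuum available.
    table = {"find a sponge": 0.8, "use a vacuum": 0.1, "go to the table": 0.9}
    return table.get(skill, 0.0)


skills = ["find a sponge", "use a vacuum", "go to the table"]
best, scores = select_skill("clean up the spill", skills, toy_llm_score, toy_value_fn)
print(best)
```

In this toy setup the language model alone rates "use a vacuum" almost as highly as "find a sponge", but the low feasibility value pushes it below the other options, which is the kind of grounding the abstract describes.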
03
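How to Register
Click "Read the original" ("查看原文") below or scan the QR code to open the registration page.
Scan the QR code below and follow the prompts to apply to join the interest discussion group. There you can connect with experts, practitioners, and university students in the field from China and abroad, share your views freely, discuss frontier topics, and exchange cutting-edge materials.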