Live Next Tuesday | Google DeepMind & UIUC: Decision Intelligence, RL-Based Post-Training of VLMs

BAAI Community (智源社区)

Published 2024-07-28 11:01:01

Original link: https://mp.weixin.qq.com/s?__biz=MzU5ODg0MTAwMw==&mid=2247551030&idx=1&sn=83effaa1c87a83e50f8475fda9fb8cd6&chksm=ffe33ef2d0f09154e80fc632877d410f6d8436132f6ac946a20608dc94b0977a13171cdad8b8&scene=126&sessionid=0

Topic: Decision Intelligence: RL-Based Post-Training of VLMs

Date: Tuesday, July 30, 10:30-11:30

Highlights:

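How can VLMs be made to solve in-the-wild decision-making tasks? This talk examines the question in detail, offers solutions at both the environment level and the algorithm level, and shows how they apply to real problems such as automating device-control tasks (for example, shopping on a phone automatically). It covers the fundamental challenges of using VLMs for in-the-wild decision making, such as handling the changes in observations caused by the stochasticity of in-the-wild tasks, and explains why today's mainstream approaches (prompting and SFT) achieve only limited results on such tasks. This motivates the use of reinforcement learning for in-the-wild decision making and raises the question of what properties an ideal RL algorithm should have. On the method side, the talk describes how we built parallel environments, reliable rewards, and an effective algorithm (automatic curriculum + doubly robust estimator + hard advantage-weighted regression, or AWR). Finally, the talk presents the performance of our approach (40 points above both GPT-4V and SFT) and uses case studies to show how this performance is achieved.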

How can VLMs be made suitable for in-the-wild decision making via RL? This talk discusses this problem comprehensively and provides solutions from the environment level to the algorithm level, with a real-life application to digital agents. It covers the fundamental challenges of training VLMs on in-the-wild decision-making tasks, such as stochasticity, non-stationarity, and distracting factors, and explains why existing methods like prompting and supervised fine-tuning (SFT) fail to solve these problems. It then explains why autonomous reinforcement learning can address these challenges, and why both environment scalability and algorithm scalability matter. Next, it presents our parallel environment and an RL algorithm that applies an automatic curriculum and a doubly robust estimator to hard advantage-weighted regression. Finally, it presents results showing that this approach significantly outperforms both prompting (>40% better than GPT-4V) and SFT, and uses qualitative case studies to show how this is achieved.

Speaker:

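Hao Bai (白昊) is a master's student in computer science at UIUC and a visiting scholar at UC Berkeley, advised by Sergey Levine and Yi Ma. His research focuses on solving real-world problems by building intelligent and reliable machine agents, which typically involves representation learning for foundation models and developing principled reinforcement learning algorithms. He completed his undergraduate studies at Zhejiang University and interned at MSRA. He has published several influential papers at top-tier venues such as JMLR, EMNLP, and WSDM.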

Jack Bai is a first-year M.S. student in Computer Science at the University of Illinois at Urbana-Champaign, and a visiting scholar at UC Berkeley under Prof. Sergey Levine. His research focuses on building intelligent and reliable machine agents that solve real-world tasks, which includes (1) representation learning for foundation models and (2) developing principled reinforcement learning algorithms. He was previously a visiting scholar with Prof. Yi Ma and a research assistant for Prof. Heng Ji and Prof. Chengxiang Zhai. Jack holds a dual undergraduate degree in Computer Engineering from UIUC and Zhejiang University. During his undergrad, he interned at Microsoft Research Asia (DKI Group), mentored by Dr. Shilin He. He has published several influential papers in top-tier machine learning conferences and journals, such as EMNLP, WSDM, and JMLR.