Event registration: Stanford & Google's video generation framework WonderJourney, presented by Hong-Xing Yu, a student of Jiajun Wu

BAAI Community (智源社区)

Published 2023-12-22 16:45:24

Tags: artificial intelligence

Original link: https://mp.weixin.qq.com/s?__biz=MzU5ODg0MTAwMw==&mid=2247542491&idx=2&sn=775119258c3b83fac83372200a0c8f94&chksm=ff15e91eecd4b693b87303e29ac303eecffde2fed7eb2a284e30ea47379cd6299e38b1a8687d&scene=126&sessionid=0

Talk title: WonderJourney, Create Your Own Open-Ended 3D World

Date: December 28 (Thursday), 11:00-12:00

Abstract:

Have you ever wondered what Alice saw on her adventure in Wonderland, but struggled to imagine it from the text or illustrations alone? In this talk, I will introduce our recent work, “WonderJourney: Going from Anywhere to Everywhere”. Starting from a single image or a piece of text, WonderJourney synthesizes a long series of diverse yet naturally connected 3D scenes, giving the user a unique experience of seeing a “wonderland”. WonderJourney is a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (given by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage a large language model (LLM) to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large vision-language model (VLM) to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary “wonderjourneys”.

Results can be viewed on the project website: https://kovenyu.com/wonderjourney/
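The abstract names three modules: an LLM that describes the next scene, a text-driven point cloud generator, and a VLM that verifies each result. The Python sketch below only illustrates how such modules could be chained in a loop; it is not the authors' implementation. Every name in it (`Scene`, `generate_journey`, the `stub_*` functions) is hypothetical, the three models are replaced by trivial stand-ins, the starting point is text only, and the retry-on-rejection policy is an assumption that the announcement does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Scene:
    description: str                          # text describing the scene
    points: List[Tuple[float, float, float]]  # placeholder for a point cloud


def generate_journey(
    start_description: str,
    describe_next: Callable[[List[str]], str],
    generate_scene: Callable[[str, Optional[Scene]], Scene],
    verify_scene: Callable[[Scene], bool],
    num_scenes: int = 3,
    max_retries: int = 2,
) -> List[Scene]:
    """Chain scenes: describe -> generate -> verify (hypothetical sketch)."""
    history = [start_description]
    scenes = [generate_scene(start_description, None)]
    for _ in range(num_scenes - 1):
        description = describe_next(history)
        # Assumed policy: regenerate a few times if the verifier rejects.
        for _ in range(max_retries + 1):
            candidate = generate_scene(description, scenes[-1])
            if verify_scene(candidate):
                scenes.append(candidate)
                history.append(description)
                break
        # If every attempt is rejected, this step is skipped.
    return scenes


# Trivial stand-ins for the LLM, the point cloud pipeline, and the VLM.
def stub_describe_next(history: List[str]) -> str:
    return f"scene {len(history) + 1} following '{history[-1]}'"


def stub_generate_scene(description: str, previous: Optional[Scene]) -> Scene:
    # Place each new scene one unit further along z than the previous one.
    z = 0.0 if previous is None else previous.points[-1][2] + 1.0
    return Scene(description, [(0.0, 0.0, z)])


def stub_verify_scene(scene: Scene) -> bool:
    return len(scene.points) > 0


if __name__ == "__main__":
    journey = generate_journey(
        "a village", stub_describe_next, stub_generate_scene, stub_verify_scene
    )
    for scene in journey:
        print(scene.description, scene.points)
```

Passing the three modules in as callables mirrors the modular framing in the abstract: any one of them could be swapped out without touching the loop.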

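Speaker:

Hong-Xing “Koven” Yu (俞洪兴) is a fourth-year PhD student at Stanford University, advised by Prof. Jiajun Wu. His research interest is machine perception, mainly physical scene understanding, dynamics models, and visual generative models. He has received China's National Scholarship multiple times, the Stanford SoE Fellowship, and the Qualcomm Fellowship. He has twice been nominated for the Nvidia Fellowship and once for the Meta Fellowship, and he has received a SIGGRAPH Asia Best Paper Award.

Follow Hong-Xing Yu: https://kovenyu.com/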

To register, scan the QR code below or click “Read the original”.
