3D-LLM: Injecting the 3D World into Large Language Models

最新推荐文章于 2024-07-08 11:38:25 发布

weixin_42762536

最新推荐文章于 2024-07-08 11:38:25 发布

阅读量142

点赞数 2

文章标签：语言模型人工智能自然语言处理

本文链接：https://blog.csdn.net/weixin_42762536/article/details/139672086

版权

论文的创新点有下面四条：

1. We introduce a new family of 3D-based Large Language models (3D-LLMs) that can take 3D points with features and language prompts as input, and perform a variety of 3D-related tasks .

2. We devise novel data collection pipelines that could generate large-scale 3D-language data . Based on the pipelines, we collect a dataset that has over 300k 3D-language data that cover a diverse set of 3D-related tasks, including but not limited to 3D captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on.

3. We use a 3D feature extractor that extracts meaningful 3D features from rendered multi-view images . We utilize 2D pretrained VLMs as our backbones for efficient training. We introduce a 3D localization mechanism for training the 3D-LLMs to better capture 3D spatial information.

4. We plan to release our 3D-LLMs, the 3D-language dataset, and language-aligned 3D features of the dataset for future research development.

使用输入3d场景的描述，使用chat gpt 生成描述语言，仔细看这个图很有意思，作者哪一个场景举例子，让gpt后续为其他场景生成描述性的语言

下面是模型的结构：其实也比较简单。

weixin_42762536

关注

2
点赞
踩
5

收藏

觉得还不错? 一键收藏
1
评论
3D-LLM: Injecting the 3D World into Large Language Models

使用输入3d场景的描述，使用chat gpt 生成描述语言，仔细看这个图很有意思，作者哪一个场景举例子，让gpt后续为其他场景生成描述性的语言。下面是模型的结构：其实也比较简单。
复制链接

扫一扫

3D-LLM: Injecting the 3D World into Large Language Models

“相关推荐”对你有帮助么？