3D-LLM: Injecting the 3D World into Large Language Models

论文的创新点有下面四条:
1. We introduce a new family of 3D-based Large Language models (3D-LLMs) that can take 3D points with features and language prompts as input, and perform a variety of 3D-related tasks .
2. We devise novel data collection pipelines that could generate large-scale 3D-language data . Based on the pipelines, we collect a dataset that has over 300k 3D-language data that cover a diverse set of 3D-related tasks, including but not limited to 3D captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on.
3. We use a 3D feature extractor that extracts meaningful 3D features from rendered multi-view images . We utilize 2D pretrained VLMs as our backbones for efficient training. We introduce a 3D localization mechanism for training the 3D-LLMs to better capture 3D spatial information.

        4. We plan to release our 3D-LLMs, the 3D-language dataset, and language-aligned 3D features of the dataset for future research development.

使用输入3d场景的描述,使用chat gpt 生成描述语言,仔细看这个图很有意思,作者哪一个场景举例子,让gpt后续为其他场景生成描述性的语言

下面是模型的结构:其实也比较简单。

  • 2
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值