SQA3D: SITUATED QUESTION ANSWERING IN 3D SCENES
embodied scene understanding
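SQA3D task: given a 3D scene, which can be provided as a 3D scan, an ego-centric video, or a bird's-eye view (BEV) picture, the agent must understand and localize its own situation, including its position, orientation, etc.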
The questions range from spatial relations to navigation, commonsense reasoning, and multi-hop reasoning.
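To make the "situated" part concrete, here is a minimal Python sketch of what one sample might look like and why the agent's orientation matters for a spatial-relation question. The class name, field names, the 2D floor-plane simplification, and the example values are all assumptions made for illustration; they are not the dataset's actual schema.

```python
import math
from dataclasses import dataclass


@dataclass
class SituatedSample:
    """One situated QA example. All field names are illustrative assumptions."""
    scene_id: str        # identifier of the 3D scene
    situation: str       # text description of the agent's state
    position: tuple      # (x, y) location on the floor plane, scene coordinates
    heading_rad: float   # facing direction as a yaw angle in radians
    question: str
    answer: str


def facing_vector(sample):
    """Unit vector on the floor plane pointing where the agent faces."""
    return (math.cos(sample.heading_rad), math.sin(sample.heading_rad))


def side_of_agent(sample, target_xy):
    """Return which side of the agent a target lies on (right-handed axes, z up)."""
    fx, fy = facing_vector(sample)
    dx = target_xy[0] - sample.position[0]
    dy = target_xy[1] - sample.position[1]
    # Sign of the 2D cross product: positive means counter-clockwise, i.e. left.
    cross = fx * dy - fy * dx
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "ahead or behind"


# Made-up example values, not taken from the dataset.
sample = SituatedSample(
    scene_id="example_scene",
    situation="Standing beside the desk, facing the window.",
    position=(0.0, 0.0),
    heading_rad=0.0,
    question="Which side is the chair on?",
    answer="left",
)
print(side_of_agent(sample, (1.0, 2.0)))
```

The same object ends up on the opposite side if the agent turns around (`heading_rad = math.pi`), which is why the situation has to be localized before the question can be answered.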
In total, SQA3D comprises 20.4k descriptions of 6.8k unique situations collected from 650 ScanNet scenes, and 33.4k questions about these situations.
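A quick back-of-the-envelope check of these totals gives a feel for the dataset's density. The inputs below are the rounded figures quoted above, so the ratios are approximate; the variable names are only for this sketch.

```python
# Rounded totals as quoted in the notes above.
num_scenes = 650
num_situations = 6_800
num_descriptions = 20_400
num_questions = 33_400

# Approximate per-unit averages derived from the rounded totals.
situations_per_scene = num_situations / num_scenes
descriptions_per_situation = num_descriptions / num_situations
questions_per_situation = num_questions / num_situations

print(f"situations per scene:       {situations_per_scene:.1f}")
print(f"descriptions per situation: {descriptions_per_situation:.1f}")
print(f"questions per situation:    {questions_per_situation:.1f}")
```

So each scene carries roughly ten situations, each situation has about three textual descriptions, and each situation is paired with close to five questions.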
Comparison with other datasets:
Data collection:
Dataset description:
Model:
Comparison results: