Valuable questions about VIMA (continuously updated)

As the input are images of single objects, so how does the model know the relative position and distance between objects #24

I have read this paper and it is very interesting. I assumed that images of the full scene were input to the model, but I didn't find anything about that. All I see is that objects in the full scene are extracted as images of single objects. How does the model know the relative position and distance between objects? Thank you very much.

Thanks for your interest in our project. For the object-centric representation, as mentioned in Sec. 4 (Tokenization), we also encode bounding box coordinates. These features are then fused with the objects' image features to produce object tokens.
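To make the answer above concrete, here is a minimal sketch of how a per-object token can carry position information even though the image input is only a crop. It is not VIMA's actual implementation: the `DetectedObject` container, the `(x_center, y_center, height, width)` box layout, the normalization, and the use of plain concatenation as the "fusion" step are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DetectedObject:
    """Hypothetical container: one cropped object plus where the crop came from."""
    image_feature: List[float]  # stand-in for the output of an image encoder
    bbox: Sequence[float]       # assumed layout: (x_center, y_center, height, width) in pixels


def encode_bbox(bbox: Sequence[float], image_height: int, image_width: int) -> List[float]:
    """Scale pixel coordinates to [0, 1] so they do not depend on image resolution."""
    x_c, y_c, h, w = bbox
    return [x_c / image_width, y_c / image_height, h / image_height, w / image_width]


def make_object_token(obj: DetectedObject, image_height: int, image_width: int) -> List[float]:
    """Fuse appearance and position; concatenation is an assumed, simplest-possible fusion."""
    return list(obj.image_feature) + encode_bbox(obj.bbox, image_height, image_width)


def center_offset(token_a: List[float], token_b: List[float], feature_dim: int) -> Tuple[float, float]:
    """Recover the normalized (dx, dy) between two objects from their tokens alone."""
    ax, ay = token_a[feature_dim], token_a[feature_dim + 1]
    bx, by = token_b[feature_dim], token_b[feature_dim + 1]
    return (bx - ax, by - ay)


# Two objects cropped from the same 128 x 256 (height x width) scene.
left = DetectedObject(image_feature=[0.1, 0.2], bbox=(32, 64, 16, 16))
right = DetectedObject(image_feature=[0.3, 0.4], bbox=(96, 64, 16, 16))

left_token = make_object_token(left, image_height=128, image_width=256)
right_token = make_object_token(right, image_height=128, image_width=256)
offset = center_offset(left_token, right_token, feature_dim=2)
```

Because each token keeps its box coordinates, a downstream model can in principle infer relative position and distance between objects without ever seeing the full scene image.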


Some questions about the input observation #38

Hi, I have a question: why does VIMA need both frontal and top-down views in the observation space? Can't it be given only the top-down view?

Hi there. For certain tasks, supplying only the top-down view might be suboptimal, such as Follow Order, where one object is stacked on another. Additionally, for legacy reasons, we used to have tasks where the frontal view was necessary to provide enough information for reasoning.


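Thanks to the authors for open-sourcing VIMA, which is excellent work. That said, I think there is still a lot of room for improvement; as it stands, the project is really hard to extend:

① The VIMA-Bench environment has almost no explanatory documentation, which makes it inconvenient for users to call.

② The overall workflow is not closed-loop. For example, the code for the model's training stage is not provided, so the work is hard to extend, and users cannot define custom prompts to run tasks, even within a specific task.

References: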

As the input are images of single objects, so how does the model know the relative position and distance between objects · Issue #24 · vimalabs/VIMA (github.com)

Some questions about the input observation · Issue #38 · vimalabs/VIMA (github.com)
