UMPNet: Universal Manipulation Policy Network for Articulated Objects

Overview

Accepted: RA-L/ICRA 2022
Project page
Reference write-up: https://blog.csdn.net/passer__jw767/article/details/137012261

Paper Notes

Motivation

The work takes a self-guided exploration perspective. It defines the goal of the manipulation policy π as generating a sequence of actions to interact with a random articulated object, such that the actions lead to novel states that have not been visited before. This lets the system learn through a self-guided exploration process, without explicit human demonstrations [23], a scripted policy [27], or pre-defined goal conditions [28].
The work moves from single-step action prediction to variable-length 6-DoF trajectory prediction.
It introduces the concept of Arrow-of-Time (AoT). The AoT label indicates whether an action will change the object state back to the past or forward into the future.
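The sketch below shows one way an AoT-style label could be computed for a single 1-DoF joint. It assumes the simulator reports the joint position after every action. The three classes (past, no change, future), the tolerance `eps`, and the function name `aot_label` are hypothetical choices made for illustration; they are not the paper's definition. A 1-DoF joint moves continuously, so the states visited so far form an interval. The sketch therefore approximates "back to the past" as landing inside that interval, and "forward into the future" as leaving it.

```python
from typing import List

# Hypothetical label values; the three-way split is an illustrative assumption.
PAST, NO_CHANGE, FUTURE = -1, 0, 1


def aot_label(visited: List[float], new_state: float, eps: float = 1e-3) -> int:
    """Label an action by where it moves a 1-DoF joint.

    visited: joint positions seen so far in the episode; visited[-1] is the current one.
    new_state: joint position after applying the action.
    """
    current = visited[-1]
    if abs(new_state - current) < eps:
        return NO_CHANGE  # the object essentially did not move
    # Continuous 1-DoF motion means the visited states span [min, max].
    lo, hi = min(visited), max(visited)
    if lo - eps <= new_state <= hi + eps:
        return PAST  # moved back into a region that was already visited
    return FUTURE  # reached a joint position outside the visited range


if __name__ == "__main__":
    history = [0.0, 0.1, 0.2]
    print(aot_label(history, 0.3), aot_label(history, 0.1), aot_label(history, 0.2))  # 1 -1 0
```

A real system would compare full object states, such as several joints or point clouds, rather than one scalar. The interval test is only valid in the 1-DoF case.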

Method Architecture

To explore novel object states, the system should be able to:
(a) choose the right position on the object to interact with,
(b) select a proper action direction, and
(c) consistently select actions in the following steps so that it keeps reaching novel states.

These correspond to three components of the architecture: action position selection (a), action distance (b), and Arrow-of-Time inference (c) for action direction selection.
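The three components can be illustrated as a single greedy selection step. This is a sketch under stated assumptions:

- `score_map` stands in for the per-pixel output of the position-selection network.
- `future_score` stands in for whatever the direction-selection heads (action distance plus AoT inference) predict for a candidate direction.
- Candidate directions are sampled uniformly on the unit sphere.

All of these names are hypothetical, and the sketch does not reproduce UMPNet's networks.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def select_position(score_map: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """(a) Return the (row, col) pixel with the highest interaction score."""
    rows, cols = len(score_map), len(score_map[0])
    return max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: score_map[rc[0]][rc[1]],
    )


def sample_unit_directions(n: int, rng: random.Random) -> List[Vec3]:
    """Sample n directions uniformly on the unit sphere via normalized Gaussians."""
    dirs: List[Vec3] = []
    while len(dirs) < n:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-9:  # skip a (practically impossible) zero vector
            dirs.append((x / norm, y / norm, z / norm))
    return dirs


def select_direction(candidates: List[Vec3], future_score: Callable[[Vec3], float]) -> Vec3:
    """(b)+(c) Pick the candidate that the hypothetical scorer rates
    most likely to push the object forward into a novel state."""
    return max(candidates, key=future_score)


if __name__ == "__main__":
    rng = random.Random(0)
    pixel = select_position([[0.1, 0.7], [0.9, 0.2]])  # (1, 0)
    # Toy scorer for the demo only: prefer directions pointing along +x.
    direction = select_direction(sample_unit_directions(64, rng), lambda d: d[0])
    print(pixel, direction)
```

Calling both selectors at every step and re-observing afterwards gives the multi-step behaviour described in (c). This holds only if the scorer keeps favouring directions that move the object into states it has not yet visited.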

  • Input: RGB-D images of the initial and current states, $o_{0}, o_{t} \in \mathbb{R}^{W\times H\times 4}$
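Here is a small sketch of how the two RGB-D observations could be assembled. It assumes RGB and depth arrive as separate grids. The helpers `make_rgbd` and `stack_observations` are hypothetical. Concatenating $o_0$ and $o_t$ along the channel axis is one common way to feed a pair of frames to a convolutional network, but these notes do not say whether UMPNet fuses them this way. Images are stored row-major here, so the shape reads H×W×4 rather than W×H×4.

```python
from typing import List, Sequence, Tuple

Image = List[List[List[float]]]  # image[row][col][channel]


def make_rgbd(rgb: Sequence[Sequence[Sequence[float]]], depth: Sequence[Sequence[float]]) -> Image:
    """Append depth as a 4th channel to every RGB pixel."""
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("RGB and depth images must have the same height and width")
    return [
        [[*pixel, z] for pixel, z in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


def shape(img: Image) -> Tuple[int, int, int]:
    """Return (height, width, channels) of a non-empty image."""
    return len(img), len(img[0]), len(img[0][0])


def stack_observations(o_0: Image, o_t: Image) -> Image:
    """Hypothetical fusion: concatenate initial and current frames channel-wise (4 + 4 = 8)."""
    if shape(o_0) != shape(o_t):
        raise ValueError("o_0 and o_t must have identical shapes")
    return [[p0 + pt for p0, pt in zip(row0, rowt)] for row0, rowt in zip(o_0, o_t)]


if __name__ == "__main__":
    rgb = [[(0.5, 0.5, 0.5)] * 3 for _ in range(2)]
    o_0 = make_rgbd(rgb, [[1.0] * 3 for _ in range(2)])  # initial state
    o_t = make_rgbd(rgb, [[0.8] * 3 for _ in range(2)])  # current state
    print(shape(o_0), shape(stack_observations(o_0, o_t)))  # (2, 3, 4) (2, 3, 8)
```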