Overview
Accepted: RAL/ICRA 2022
Project page
Article interpretation reference: https://blog.csdn.net/passer__jw767/article/details/137012261
Paper Analysis
Motivation
Starting from the idea of self-guided exploration, the goal of the manipulation policy π is defined as generating a sequence of actions that interact with a random articulated object so as to reach novel states that have not been visited before. This lets the system learn through a self-guided exploration process, without explicit human demonstrations [23], a scripted policy [27], or pre-defined goal conditions [28].
From single-step action prediction to variable-length trajectory prediction (6-DoF).
The paper introduces the concept of the Arrow of Time (AoT): an AoT label indicates whether an action will move the object state back toward the past or forward into the future.
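To make the AoT idea concrete, here is a toy illustration (not the paper's implementation): an action is labeled by whether it moves the object state away from the initial state ("forward", toward novelty) or back toward it ("backward"). The 1-D joint-angle state, the distance function, and the threshold are all illustrative assumptions.

```python
def aot_label(initial_state: float, before: float, after: float,
              eps: float = 1e-6) -> str:
    """Label an action by how it changes the distance from the initial state."""
    d_before = abs(before - initial_state)
    d_after = abs(after - initial_state)
    if d_after > d_before + eps:
        return "forward"   # moves into the future: a more novel state
    elif d_after < d_before - eps:
        return "backward"  # moves back toward an already-visited state
    return "neutral"       # no meaningful state change

# A door hinge opened from 0.0 to 0.3 rad; one action pushes it on to
# 0.5 rad (forward), another closes it back to 0.1 rad (backward).
print(aot_label(0.0, 0.3, 0.5))  # forward
print(aot_label(0.0, 0.3, 0.1))  # backward
```

In the paper this signal is learned by a network from observations rather than computed from ground-truth joint angles; the sketch only shows what the label means.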
Method Architecture
To explore novel object states, the system must be able to (a) choose the right position on the object to interact with, (b) select a proper action direction, and (c) consistently select actions in subsequent steps to keep reaching novel states. These requirements map onto the three components of the architecture: action position selection (a), action distance prediction (b), and Arrow-of-Time inference (c) for action direction selection.
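The composition of the three components above can be sketched as follows. All names (`select_action`, `position_score`, `aot_score`, `distance_net`) are hypothetical placeholders, not the paper's code: in the real system each scorer is a learned network, and candidates come from the observed point cloud.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Action:
    position: Vec3   # contact point on the object        -> component (a)
    direction: Vec3  # unit action direction              -> component (c)
    distance: float  # movement magnitude along direction -> component (b)

def select_action(
    candidate_positions: List[Vec3],
    candidate_directions: List[Vec3],
    position_score: Callable[[Vec3], float],       # (a) higher = better contact
    aot_score: Callable[[Vec3, Vec3], float],      # (c) > 0 means "forward" AoT
    distance_net: Callable[[Vec3, Vec3], float],   # (b) predicted magnitude
) -> Action:
    # (a) pick the most promising interaction position
    pos = max(candidate_positions, key=position_score)
    # (c) keep directions the AoT module predicts will push the state forward
    forward = [d for d in candidate_directions if aot_score(pos, d) > 0]
    direction = max(forward or candidate_directions,
                    key=lambda d: aot_score(pos, d))
    # (b) predict how far to move along the chosen direction
    return Action(pos, direction, distance_net(pos, direction))
```

With stub scorers, e.g. `position_score = lambda p: p[0]` and `aot_score = lambda p, d: d[2]`, the function picks the highest-scoring position and the most "forward" direction, which mirrors how the three modules divide the decision.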
- Input: RGB-D images of the initial and current states, $o_0, o_t \in \mathbb{R}^{W \times H \times 4}$