Learning Real-World Robot Policies by Dreaming: Quick Paper Read


hehedadaq


Columns: Robotics, Machine Vision, DRL. Tags: VAE, model-based, reinforcement learning, DRL, RL

Article link: https://blog.csdn.net/hehedadaq/article/details/115343594


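Preface:

It has been a long time since I read papers carefully. Recently I got curious about a new area and found a dozen or so papers.
Reading them all carefully would take far too long, so this is a quick read.
Let's see whether I can get through a paper I'm fairly interested in within two hours.
The template is borrowed directly from 刘嘉俊 (Liu Jiajun).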

title: Learning Real-World Robot Policies by Dreaming

Paper: http://arxiv.org/abs/1805.07813

Website: https://piergiaj.github.io/robot-dreaming-policy/

Keywords
data efficiency, real-world, dreaming model (world model)

Main Idea

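The authors design a dreaming model in which the robot interacts, instead of interacting directly with the real world.
It can handle unseen scenes, which is the interesting part.
Task settings:
Task 1: navigate to a target.
Task 2: avoid the target.
The whole scene is only one or two meters across, and getting within 0.2 m counts as success; this task is a bit too…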
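To make the Task 1 success criterion concrete, here is a minimal Python sketch. It assumes positions are 2D planar coordinates in meters and reads "within 0.2 m" as a Euclidean distance of at most 0.2 m; `reached_target` and `SUCCESS_RADIUS_M` are hypothetical names, not from the paper. The avoidance task's success criterion isn't given here, so I don't sketch it.

```python
import math
from typing import Tuple

SUCCESS_RADIUS_M = 0.2  # "getting within 0.2 m counts as success"


def reached_target(robot_xy: Tuple[float, float],
                   target_xy: Tuple[float, float],
                   radius: float = SUCCESS_RADIUS_M) -> bool:
    """Task 1 success check: planar Euclidean distance to the target is at most `radius` meters."""
    dx = robot_xy[0] - target_xy[0]
    dy = robot_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= radius


if __name__ == "__main__":
    print(reached_target((1.0, 1.0), (1.1, 1.1)))  # True: about 0.14 m away
    print(reached_target((0.0, 0.0), (0.3, 0.0)))  # False: 0.3 m away
```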

Pretraining:

we collect a dataset consisting of 40,000 images (400 random trajectories)
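The numbers work out to 40,000 / 400 = 100 images per trajectory, if every trajectory has the same length (the quote doesn't say so; that is my assumption). Below is a minimal sketch of this kind of random-policy data collection; the callbacks `reset`, `observe`, `act` and the action set `ACTIONS` are hypothetical stand-ins for the real robot interface, not the paper's code.

```python
import random
from typing import Any, Callable, List, Tuple

NUM_TRAJECTORIES = 400
TOTAL_IMAGES = 40_000
# 40,000 / 400 = 100 frames each, assuming all trajectories have equal length.
STEPS_PER_TRAJECTORY = TOTAL_IMAGES // NUM_TRAJECTORIES

ACTIONS = ["forward", "turn_left", "turn_right"]  # hypothetical discrete action set


def collect_random_dataset(
    reset: Callable[[], None],
    observe: Callable[[], Any],
    act: Callable[[str], None],
    seed: int = 0,
) -> List[List[Tuple[Any, str]]]:
    """Roll out a uniformly random policy and record (image, action) pairs per trajectory.

    `reset`, `observe`, and `act` are hypothetical callbacks standing in for the
    real robot and camera. Consecutive pairs within one trajectory would give the
    (s_t, a_t, s_{t+1}) transitions a dreaming model could be trained on.
    """
    rng = random.Random(seed)
    trajectories = []
    for _ in range(NUM_TRAJECTORIES):
        reset()
        trajectory = []
        for _ in range(STEPS_PER_TRAJECTORY):
            image = observe()
            action = rng.choice(ACTIONS)  # random policy: no learning involved yet
            act(action)
            trajectory.append((image, action))
        trajectories.append(trajectory)
    return trajectories
```

With dummy callbacks (for example `observe=lambda: 0`), this returns 400 trajectories totalling 40,000 frames.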

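Training: apart from the initial random-policy data, no real-world samples are used. In the paper's words:

except initial random action policy samples in all our experiments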

How this differs from model-based RL, in the authors' own words:

We use “dreaming” to refer to far more than just model-based RL. What our “dreaming” model does is learns a state-transition model that we can randomly sample previously unseen trajectories from (i.e. what we call dreaming).

The dreaming model consists of an FCNN, a VAE, and an action-conditioned future regressor (ACFR).

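ACFR: simulates how the state changes after the robot executes a commanded action. This means that, compared with earlier model-based methods, the dreaming model introduces imagined trajectories in place of real trajectories, which is why the authors use the word ‘dreaming’ rather than ‘model-based’. See the debate on Reddit for details.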
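The core of "dreaming" can be sketched as a rollout loop that never touches the real robot: start from a latent state, let the policy pick an action, and let the learned transition model predict the next state. This is a minimal sketch under my own assumptions: states are plain latent vectors, a dream starts from a randomly chosen encoded real frame, and `dream_rollout`, `sample_start`, `transition`, and `reward_fn` are hypothetical names rather than the paper's API.

```python
import random
from typing import Callable, List, Sequence, Tuple

Latent = List[float]
Step = Tuple[Latent, int, float, Latent]  # (z_t, a_t, r_{t+1}, z_{t+1})


def sample_start(real_latents: Sequence[Latent], rng: random.Random) -> Latent:
    """Pick a dream's starting state from encoded real frames (an assumption)."""
    return list(rng.choice(real_latents))


def dream_rollout(
    z0: Latent,
    policy: Callable[[Latent], int],
    transition: Callable[[Latent, int], Latent],
    reward_fn: Callable[[Latent], float],
    horizon: int,
) -> List[Step]:
    """Generate one imagined trajectory entirely inside the learned model.

    No real-world step is taken: every next state comes from `transition`,
    which stands in for the learned action-conditioned future regressor.
    """
    trajectory: List[Step] = []
    z = z0
    for _ in range(horizon):
        a = policy(z)
        z_next = transition(z, a)
        trajectory.append((z, a, reward_fn(z_next), z_next))
        z = z_next
    return trajectory


if __name__ == "__main__":
    # Toy stand-in model: action 0 moves +0.1 along x, action 1 moves -0.1.
    def toy_transition(z: Latent, a: int) -> Latent:
        return [z[0] + (0.1 if a == 0 else -0.1), z[1]]

    rng = random.Random(0)
    z0 = sample_start([[0.0, 0.0], [0.3, 0.2]], rng)
    dream = dream_rollout(z0, policy=lambda z: rng.choice([0, 1]),
                          transition=toy_transition,
                          reward_fn=lambda z: -abs(z[0]), horizon=10)
    print(len(dream), dream[-1][3])
```

The imagined steps have the same (state, action, reward, next state) shape as real ones, so any standard RL algorithm could consume them in place of real transitions.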

Below is a visualization of imagined trajectories generated by dreaming:

[Figure: visualization of imagined trajectories generated by dreaming]

It is really awesome, isn’t it?

Information-flow diagram

Next, let's look at how this marvelous dreaming is implemented!
[Figure: information-flow diagram of the dreaming model]

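A VAE, rather than a plain autoencoder, is used to represent the state images, so the model has some generative ability and can handle unseen scenes. The drawback is that the generated images are very blurry.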
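For reference, the two pieces that distinguish a VAE from a plain autoencoder are the reparameterized sampling of the latent and the KL term that pulls the latent distribution toward a standard normal. A minimal pure-Python sketch, assuming a diagonal Gaussian encoder and squared-error reconstruction (the paper's exact reconstruction loss isn't stated in this post); all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(logvar / 2).

    In an autodiff framework this form lets gradients flow back to mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x: Sequence[float], x_recon: Sequence[float],
             mu: Sequence[float], logvar: Sequence[float]) -> float:
    """L_VAE = squared-error reconstruction + KL term (reconstruction loss is an assumption)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, logvar)
```

A plain autoencoder would keep only the reconstruction term and a deterministic latent; the KL term is what makes sampling new latents meaningful.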

Opinion
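I have always believed that generative networks such as VAEs and GANs can be used in RL to improve data efficiency, and this paper does work in that direction. However, GANs in practice take a long time to train and consume a lot of resources, so whether they help or hurt RL depends on the specific use case.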

Use a VAE to represent the state images;
Build a state-transition model that takes $s_t$ and $a_t$ as input and outputs $s_{t+1}$, i.e., an action-conditioned model: $s_{t+1}=f(s_t, a_t)=F(s_t, G(a_t))$;
Total loss: $L = L_{VAE} + \gamma L_f$
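The transition model and the total loss can be sketched in a few lines. Assumptions, all mine: $s_t$ is the VAE latent, $G$ is a single linear layer on a one-hot action, $F$ is a single linear layer on the concatenation $[s_t, G(a_t)]$, and $L_f$ is squared error against the next latent. The weight names `W_g`, `W_f` and the helper functions are hypothetical; a real implementation would use neural networks and an autodiff framework.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Plain matrix-vector product, one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def G(action: int, W_g: Matrix, num_actions: int) -> Vector:
    """Action encoder: one-hot action -> embedding (a single linear layer here)."""
    one_hot = [1.0 if i == action else 0.0 for i in range(num_actions)]
    return matvec(W_g, one_hot)


def F(s_t: Sequence[float], g_a: Sequence[float], W_f: Matrix) -> Vector:
    """Future regressor: predicts s_{t+1} from the concatenation [s_t, G(a_t)]."""
    return matvec(W_f, list(s_t) + list(g_a))


def f(s_t: Sequence[float], a_t: int, W_g: Matrix, W_f: Matrix, num_actions: int) -> Vector:
    """s_{t+1} = f(s_t, a_t) = F(s_t, G(a_t))."""
    return F(s_t, G(a_t, W_g, num_actions), W_f)


def transition_loss(s_pred: Sequence[float], s_next: Sequence[float]) -> float:
    """L_f: squared error between predicted and actual next state (an assumption)."""
    return sum((p - q) ** 2 for p, q in zip(s_pred, s_next))


def total_loss(l_vae: float, l_f: float, gamma: float) -> float:
    """L = L_VAE + gamma * L_f, as in the post."""
    return l_vae + gamma * l_f
```

`gamma` weights how much the latent dynamics prediction counts relative to the VAE term; the post doesn't give its value.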

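Different experimental settings:

Not much to say here: the idea is decent, but the results don't impress me much.

Results:

I don't even feel like posting the figures~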

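Contact:

P.S. Anyone working on reinforcement learning is welcome to join the group and learn together:

Deep Reinforcement Learning-DRL: 799378128

Feel free to follow my Zhihu account: 未入门的炼丹学徒

CSDN account: https://blog.csdn.net/hehedadaq

Minimal spinup+HER+PER code implementation: https://github.com/kaixindelele/DRLib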
