Visual Reinforcement Learning with Imagined Goals

The core of this paper is to use a Variational Autoencoder (VAE) with a Gaussian prior to map images into another space, and to use the encoder's output as both the state and the goal representation. This learned space, called the latent space, supports a more meaningful distance metric than raw Euclidean (pixel-space) distance. The benefits of using a Variational Autoencoder are as follows:

  • Provides a space where distances are more meaningful, and thus allows use of a well-structured reward function (ex. distance between encodings)
  • Inputs to the reinforcement learning network are structured (this point is unclear to me)
  • New states can be sampled from the decoder output, allowing automated synthetic goal creation during training to allow the goal-conditioned policy to practice diverse policies
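The first benefit above can be made concrete with a small sketch: the reward is the negative Euclidean distance between latent encodings of the current state image and the goal image. The linear map `W` below is a hypothetical stand-in for the paper's trained convolutional encoder, just to show the reward structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained VAE encoder: maps a flattened image to the mean
# of its latent Gaussian. In the paper this is a convolutional encoder;
# the linear map W here is only a hypothetical placeholder.
latent_dim, image_dim = 4, 64
W = rng.normal(size=(latent_dim, image_dim)) / np.sqrt(image_dim)

def encode(image):
    """Return the latent mean e(s) for a flattened image."""
    return W @ image

def latent_reward(state_image, goal_image):
    """Reward = negative Euclidean distance in latent space,
    which is better structured than pixel-space distance."""
    z_s, z_g = encode(state_image), encode(goal_image)
    return -np.linalg.norm(z_s - z_g)

state = rng.normal(size=image_dim)
goal = rng.normal(size=image_dim)
assert latent_reward(goal, goal) == 0.0   # reaching the goal maximizes reward
assert latent_reward(state, goal) < 0.0   # rewards are otherwise negative
```

Because the reward lives entirely in latent space, it never requires a hand-designed pixel-level comparison.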

The algorithm proceeds as follows:

  1. State observations are collected by random exploration of the environment (collect observations with a random policy).
  2. A variational autoencoder is trained on these observations (train the VAE).
  3. Latent encodings for each state are obtained from the variational autoencoder (states and goals expressed in the latent space).
  4. (goal, state) encodings are sampled from the existing set (sample (s, a, r, s', g) tuples).
  5. A reinforcement learning algorithm is trained on the latent encodings (any Q-learning-based method works).
  6. Repeat steps 4-5 with the following conditions:
     6.1) Periodically retrain the autoencoder on newly generated images (retrain the VAE from time to time, since goals vary across states).
     6.2) Generate new goals by feeding goal images through the variational autoencoder (generate new goals).
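The loop above can be sketched as a short training skeleton. The encoder `encode`, the prior sampler `sample_goal`, and the RL update are hypothetical stand-ins (the paper uses a convolutional VAE and an off-policy Q-learning-based algorithm); this only shows how the steps fit together.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, image_dim = 4, 64

# Hypothetical stand-ins for the learned components.
W = rng.normal(size=(latent_dim, image_dim)) / np.sqrt(image_dim)
encode = lambda img: W @ img                       # VAE encoder mean (step 3)
sample_goal = lambda: rng.normal(size=latent_dim)  # imagined goal z_g ~ N(0, I)

def collect_random_observations(n):
    """Step 1: random-policy exploration returns raw image observations."""
    return [rng.normal(size=image_dim) for _ in range(n)]

observations = collect_random_observations(100)    # step 1
# step 2: train the VAE on `observations` (omitted; W stands in for the result)

replay = []
for episode in range(20):                          # repeat steps 4-5
    z_g = sample_goal()                            # 6.2: new imagined goal
    obs = observations[rng.integers(len(observations))]
    z_s = encode(obs)                              # step 3: latent state
    reward = -np.linalg.norm(z_s - z_g)            # latent-distance reward
    replay.append((z_s, z_g, reward))              # step 4: store (goal, state) pairs
    # step 5: off-policy RL update on samples from `replay` (omitted)
    # 6.1: periodically retrain the VAE on newly collected images (omitted)

assert len(replay) == 20
```

The key design point is that goals are sampled in latent space, so the policy can practice on synthetic goals that were never observed as real images.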


https://towardsdatascience.com/ai-research-deep-dive-visual-reinforcement-learning-with-imagined-goals-862115d122a6
