Reinforcement Learning: A Roundup of Papers and GitHub Repositories

Latest recommended article published on 2023-05-07 13:19:34

JY HUA

Category: RL. Tags: deep learning

Article link: https://blog.csdn.net/CallMeYunzi/article/details/107715292

Environments:

Reference GitHub repository: https://github.com/openai/gym

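The v0 and v4 versions of an env differ in repeat_action_probability: v0 sets it to 0.25, so the previous action is sometimes repeated in place of the requested one, while v4 sets it to 0.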
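
To make the effect of repeat_action_probability concrete, here is a minimal sketch of the "sticky action" idea. It is a toy model and not the emulator's actual implementation: the class StickyActionEnv, the step_fn callable, and count_overridden are hypothetical names introduced only for illustration, and the no-op starting action is an assumption.

```python
import random


class StickyActionEnv:
    """Toy wrapper: with probability p, ignore the new action and repeat the previous one."""

    def __init__(self, step_fn, repeat_action_probability=0.0, seed=None):
        self.step_fn = step_fn              # hypothetical callable: action -> result
        self.p = repeat_action_probability
        self.rng = random.Random(seed)
        self.last_action = 0                # assumption: action 0 acts as the initial no-op

    def step(self, action):
        # Sticky action: keep the previously executed action with probability p.
        if self.rng.random() < self.p:
            action = self.last_action
        self.last_action = action
        return self.step_fn(action)


def count_overridden(p, n_steps=10_000, seed=0):
    """Count how often the executed action differs from the requested one."""
    env = StickyActionEnv(step_fn=lambda a: a, repeat_action_probability=p, seed=seed)
    overridden = 0
    for t in range(n_steps):
        requested = 1 + (t % 2)             # alternate between actions 1 and 2
        executed = env.step(requested)
        overridden += int(executed != requested)
    return overridden
```

With p = 0 (the v4 behaviour) the agent's action is always executed as requested; with p = 0.25 (the v0 behaviour) a noticeable share of the requested actions is replaced, which makes the environment stochastic from the agent's point of view.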

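Whether or not the env name contains ram determines the input: without it the observation is a 2D image (pixels), with it the observation is a 1D array holding the emulated console's RAM. For a more detailed explanation, see: https://stackoverflow.com/questions/45207569/how-to-interpret-the-observations-of-ram-environments-in-openai-gym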
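
The naming rules above can be summarised in a small helper. This is only a sketch: describe_atari_env is a hypothetical function and not part of gym, and it assumes the usual Atari observation shapes in gym, a 210x160 RGB screen and 128 bytes of RAM. It handles only the v0 and v4 versions discussed here.

```python
def describe_atari_env(env_id):
    """Summarise an Atari env ID such as 'Breakout-ram-v4' (hypothetical helper, not a gym API)."""
    name, version = env_id.rsplit("-v", 1)
    if version not in ("0", "4"):
        raise ValueError("this sketch only covers the v0 and v4 versions")
    uses_ram = name.endswith("-ram")
    if uses_ram:
        name = name[: -len("-ram")]
    return {
        "name": name,
        # RAM envs expose a flat byte array; the others expose the RGB screen.
        "observation_shape": (128,) if uses_ram else (210, 160, 3),
        # v0 uses sticky actions with probability 0.25, v4 disables them.
        "repeat_action_probability": 0.25 if version == "0" else 0.0,
    }
```

For example, describe_atari_env("Breakout-ram-v4") reports a 1D observation of length 128 with no sticky actions, while describe_atari_env("Breakout-v0") reports an image observation with repeat_action_probability 0.25.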

Algorithms:

ppo

Paper: Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017. (https://arxiv.org/pdf/1707.06347.pdf)

Reference GitHub repository: https://github.com/hill-a/stable-baselines

I have tried ppo2 from this repository.
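
For orientation, the core of the PPO paper is its clipped surrogate objective, the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The following is a minimal pure-Python sketch of that formula only; it is not the stable-baselines implementation, the function name is hypothetical, and eps = 0.2 is just a commonly used value.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch of samples."""
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("need at least one sample")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Clip the probability ratio to the interval [1 - eps, 1 + eps].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

With a positive advantage, a ratio above 1 + eps earns no extra credit, so the policy has no incentive to move further from the old one; with a negative advantage, the unclipped term is kept whenever it is the smaller one, so the penalty for a bad move is never hidden by clipping.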

muzero

Paper: Schrittwieser J, Antonoglou I, Hubert T, et al. Mastering atari, go, chess and shogi by planning with a learned model[J]. arXiv preprint arXiv:1911.08265, 2019. (https://arxiv.org/pdf/1911.08265.pdf)

Reference GitHub repository: https://github.com/werner-duvaud/muzero-general

This open-source project uses Ray, which makes it easy to run experiments on a multi-machine cluster. 👍

go-explore

Original paper: Ecoffet A, Huizinga J, Lehman J, et al. Go-explore: a new approach for hard-exploration problems[J]. arXiv preprint arXiv:1901.10995, 2019. (https://arxiv.org/pdf/1901.10995.pdf)

Latest paper: Ecoffet A, Huizinga J, Lehman J, et al. First return then explore[J]. arXiv preprint arXiv:2004.12919, 2020. (https://arxiv.org/abs/2004.12919)
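
The title "First return then explore" summarises the exploration loop of Go-Explore: keep an archive of visited cells, return to one of them, then explore from there. Below is a heavily simplified sketch of that loop on a toy deterministic 1-D chain. Everything in it is an assumption made for illustration (the environment, the cell definition, the count-based selection weight, the function name); it is not the authors' code.

```python
import random


def go_explore_chain(iterations=200, explore_steps=5, seed=0):
    """Toy Go-Explore exploration loop on a deterministic 1-D chain (illustrative only)."""
    rng = random.Random(seed)
    archive = {0: []}     # cell (here: the position) -> shortest action list found so far
    visits = {0: 0}       # how often each cell was chosen as a starting point

    for _ in range(iterations):
        # 1. Select a cell, preferring rarely chosen ones (assumed count-based weight).
        cells = list(archive)
        weights = [1.0 / (1 + visits[c]) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights)[0]
        visits[cell] += 1

        # 2. Return: replay the stored actions from the start state (env is deterministic).
        position = 0
        trajectory = list(archive[cell])
        for action in trajectory:
            position = max(0, position + action)

        # 3. Explore with random actions, archiving new cells or shorter paths to known ones.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            position = max(0, position + action)
            trajectory.append(action)
            if position not in archive or len(trajectory) < len(archive[position]):
                archive[position] = list(trajectory)
                visits.setdefault(position, 0)

    return archive
```

Because the toy environment is deterministic, replaying the stored actions always leads back to the chosen cell, which is what lets each iteration start exploring from the frontier instead of from the initial state.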
