Paper Reading
hanjialeOK
Nothing in the world is difficult for one who sets their mind to it!
[paper reading] IMPALA: V-trace
Paper: https://arxiv.org/abs/1802.01561 · IMPALA original authors' code: https://github.com/deepmind/scalable_agent. Consider a trajectory $(x_t, a_t, r_t)_{t=s}^{t=s+n}$ generated by a behavior policy $\mu$. The $n$-step V-trace target is defined as $v_s \overset{\mathrm{def}}{=} V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{t-s} \left( \prod_{i=s}^{t-1} c_i \right) \delta_t V$. Original · 2022-07-13 15:53:00
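The target above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name and array layout are my own, and `rho_bar`, `c_bar` stand for the clipping thresholds $\bar{\rho}$, $\bar{c}$ used in the paper's definitions $\rho_t = \min(\bar{\rho}, \pi(a_t|x_t)/\mu(a_t|x_t))$ and $c_i = \min(\bar{c}, \pi(a_i|x_i)/\mu(a_i|x_i))$.

```python
import numpy as np

def vtrace_targets(values, rewards, ratios, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute n-step V-trace targets v_s for one trajectory.

    values:  V(x_s), ..., V(x_{s+n})                     -- length n+1
    rewards: r_s, ..., r_{s+n-1}                          -- length n
    ratios:  importance ratios pi(a_t|x_t) / mu(a_t|x_t)  -- length n
    """
    n = len(rewards)
    rhos = np.minimum(rho_bar, ratios)  # clipped rho_t
    cs = np.minimum(c_bar, ratios)      # clipped c_i ("traces")
    # TD terms: delta_t V = rho_t * (r_t + gamma * V(x_{t+1}) - V(x_t))
    deltas = rhos * (rewards + gamma * values[1:] - values[:-1])
    # Backward recursion equivalent to the summed definition:
    # v_s = V(x_s) + delta_s V + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs = np.zeros(n)
    acc = 0.0
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

When all importance ratios equal 1 (on-policy), the clipping is inactive and the target reduces to the ordinary n-step return.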
[paper reading] Trust Region Policy Optimization
Trust Region Policy Optimization: https://arxiv.org/pdf/1502.05477.pdf. A policy gradient with guaranteed monotonic improvement. Rather than first pointing out a flaw in prior work, the paper directly proposes the monotonically improving policy gradient together with a theoretical proof. Consider an MDP $(\mathcal{S}, \mathcal{A}, P, r, \rho_0, \gamma)$, where $\rho_0$ is the distribution of the initial state $s_0$. The expected discounted return can then be written as $\eta(\pi) = \mathbb{E}_{s_0, a_0, \ldots}\left[\sum_{t=0}^{\infty} \gamma^t r(s_t)\right]$. Original · 2022-05-10 15:24:03
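The expected discounted return $\eta(\pi)$ can be estimated by Monte Carlo sampling of whole episodes. A minimal sketch, assuming a caller-supplied `sample_episode` function that rolls out the policy and returns the reward sequence $r(s_0), r(s_1), \ldots$ (the function name and interface are hypothetical, not from the paper):

```python
import numpy as np

def expected_discounted_return(sample_episode, gamma=0.99, num_episodes=1000):
    """Monte Carlo estimate of eta(pi) = E[ sum_t gamma^t * r(s_t) ]."""
    total = 0.0
    for _ in range(num_episodes):
        rewards = np.asarray(sample_episode(), dtype=float)
        discounts = gamma ** np.arange(len(rewards))  # 1, gamma, gamma^2, ...
        total += float(np.dot(discounts, rewards))
    return total / num_episodes
```

For a deterministic episode with rewards `[1, 1, 1]` and `gamma=0.5`, the estimate is exactly `1 + 0.5 + 0.25 = 1.75`.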
[ICML 2018] RLlib: Abstractions for Distributed Reinforcement Learning
Paper: https://arxiv.org/abs/1712.09381. Introduction: many of the challenges in reinforcement learning stem from the need to scale learning and simulation while also integrating a rapidly increasing range of algorithms and models. Many of the frameworks used by th… Original · 2021-11-07 20:58:28
[ICML 2015] Massively Parallel Methods for Deep Reinforcement Learning
Paper: http://arxiv.org/abs/1507.04296. Introduction: existing work on distributed deep learning has focused exclusively on supervised and unsupervised learning. In this paper we develop a new architecture for the reinforcement learning paradigm, which we called Gorila… Original · 2021-10-31 23:13:39
2015 - Deep recurrent q-learning for partially observable MDPs
Link: https://arxiv.org/abs/1507.06527. The paper introduces the POMDP (partially observable MDP) and, building on it, the Flickering Atari Games: each game frame is obscured and unobservable with probability 0.5, and fully visible with probability 0.5 (the paper contrasts the MDP and POMDP settings in diagrams). The paper proposes replacing the fully connected layer with an LSTM. Training is otherwise the same as DQN; when 4 consecutive stacked frames are used, the process is again an MDP, so its performance does not exceed DQN's. Original · 2021-08-30 15:52:15
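The flickering mechanism described above is simple to reproduce as an environment wrapper. A minimal sketch assuming a Gym-style `step` returning `(obs, reward, done, info)`; the class name and interface are my own, not from the paper:

```python
import random

class FlickeringObservation:
    """Obscure (zero out) each frame with probability p_obscure,
    turning a fully observable environment into a POMDP."""

    def __init__(self, env, p_obscure=0.5, seed=None):
        self.env = env
        self.p_obscure = p_obscure
        self.rng = random.Random(seed)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.rng.random() < self.p_obscure:
            obs = [0] * len(obs)  # fully obscured frame: no information
        return obs, reward, done, info
```

An agent without memory sees only the current (possibly blank) frame, which is why the paper turns to an LSTM to aggregate information across time.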
2017 - Revisiting the Arcade Learning Environment
Link: https://arxiv.org/abs/1709.06009. On page 12, the paper proposes sticky actions, whose purpose is to inject stochasticity into ALE (the Arcade Learning Environment) games and to evaluate agent robustness. With sticky actions, the action actually executed by the environment is, with probability $\zeta$, the previous action $a_{t-1}$, and with probability $1-\zeta$, the action the agent just chose… Original · 2021-08-28 16:13:26
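Sticky actions are easy to express as a wrapper around the environment. A minimal sketch assuming a Gym-style `step` method; the class name and the NOOP initial action are my own conventions, not mandated by the paper:

```python
import random

class StickyActions:
    """With probability zeta, repeat the previously executed action
    instead of the one the agent just chose."""

    def __init__(self, env, zeta=0.25, seed=None):
        self.env = env
        self.zeta = zeta
        self.rng = random.Random(seed)
        self.prev_action = 0  # assume action 0 is NOOP before the first step

    def step(self, action):
        if self.rng.random() < self.zeta:
            action = self.prev_action  # sticky: repeat the last executed action
        self.prev_action = action
        return self.env.step(action)
```

Because the repetition depends on hidden randomness, an agent can no longer rely on a fixed open-loop action sequence, which is exactly the robustness test the paper is after.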
2015 - Human-level control through deep reinforcement learning
Link: https://www.nature.com/articles/nature14236. Original · 2021-08-27 16:59:31