ABSTRACT
Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning is slower and less stable than supervised learning. We explore the possibility that ensemble methods can remedy these shortcomings, and do so by investigating a novel technique which harnesses the wisdom of the crowds by bagging Q-function approximator estimates.
Our results show that the proposed approach improves performance on all three tasks and all reinforcement learning approaches attempted. We are able to demonstrate that this is a direct result of the increased stability of the selected actions.

This paper proposes an ensemble method that harnesses the wisdom of the crowds, improving the stability and performance of deep Q-learning (DQN) by combining the estimates of multiple Q-function approximators. The study finds that this approach significantly improves reinforcement learning performance across several tasks while increasing the stability of the selected actions and reducing instability during training. Drawing on the theory of group decision making, the method lowers training instability without sacrificing the number of interactions with the environment.
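As a concrete illustration of the bagging idea described above, the following minimal Python sketch averages the estimates of several independently initialized Q-function approximators before selecting an action. The toy linear approximators, the ensemble size, and the mean-aggregation rule are illustrative assumptions for the sketch, not the paper's exact architecture or aggregation scheme.

```python
# Minimal sketch of bagging Q-function approximator estimates.
# Assumptions (not from the paper): linear Q-functions, ensemble of 5,
# mean aggregation, epsilon-greedy action selection.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 8
N_ACTIONS = 4
ENSEMBLE_SIZE = 5

# Each "bag" member is a toy linear Q-function: Q_k(s, a) = (s @ W_k)[a].
# In the paper's setting these would be separately trained deep Q-networks.
ensemble = [rng.normal(size=(STATE_DIM, N_ACTIONS)) for _ in range(ENSEMBLE_SIZE)]

def bagged_q_values(state):
    """Average the Q-value estimates of all ensemble members."""
    estimates = np.stack([state @ w for w in ensemble])  # (ENSEMBLE_SIZE, N_ACTIONS)
    return estimates.mean(axis=0)

def select_action(state, epsilon=0.05):
    """Epsilon-greedy action selection over the bagged estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(bagged_q_values(state)))

state = rng.normal(size=STATE_DIM)
print("bagged Q-values:", bagged_q_values(state))
print("selected action:", select_action(state))
```

Averaging across bag members damps the estimation noise of any single approximator, which is the stabilizing mechanism the abstract attributes the improvements to.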