1. High-throughput architectures
Distributed Prioritized Experience Replay (Ape-X)
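Ape-X paper and implementation
The Ape-X variants of DQN, DDPG, and QMIX (APEX_DQN, APEX_DDPG, APEX_QMIX) use a single GPU learner and many CPU workers for experience collection. Experience collection can scale to hundreds of CPU workers because experiences are prioritized in a distributed fashion, on the workers, before they are stored in the replay buffers.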
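The idea of prioritizing experience before it reaches the replay buffer can be illustrated with a small, self-contained sketch. This is not RLlib's implementation: `td_error_priority`, `PrioritizedReplayBuffer`, and `worker_rollout` are hypothetical names, the transitions and Q-values are random placeholders, and a single in-process buffer stands in for the sharded replay actors. It only shows the data flow in which each worker computes an initial priority (here, the absolute TD error) and the buffer samples in proportion to it.

```python
import random


def td_error_priority(reward, q_value, next_q_value, gamma=0.99, eps=1e-6):
    """Initial priority computed on the worker: |TD error| plus a small constant."""
    td_error = reward + gamma * next_q_value - q_value
    return abs(td_error) + eps


class PrioritizedReplayBuffer:
    """Toy proportional-priority buffer (hypothetical, not RLlib's class)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling
        self.items = []
        self.priorities = []

    def add(self, transition, priority):
        # Evict the oldest entry once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, rng=random):
        # Sample with replacement, with probability proportional to priority^alpha.
        weights = [p ** self.alpha for p in self.priorities]
        return rng.choices(self.items, weights=weights, k=batch_size)


def worker_rollout(worker_id, num_steps, rng):
    """Hypothetical CPU worker: collects transitions and prioritizes them locally."""
    batch = []
    for step in range(num_steps):
        reward = rng.uniform(-1.0, 1.0)
        # Placeholder Q-values; a real worker would use its local policy copy.
        q_value = rng.uniform(-1.0, 1.0)
        next_q_value = rng.uniform(-1.0, 1.0)
        transition = {"worker": worker_id, "step": step, "reward": reward}
        batch.append((transition, td_error_priority(reward, q_value, next_q_value)))
    return batch


rng = random.Random(0)
buffer = PrioritizedReplayBuffer(capacity=1000)
for worker_id in range(4):  # stands in for many CPU workers
    for transition, priority in worker_rollout(worker_id, 50, rng):
        buffer.add(transition, priority)

train_batch = buffer.sample(32, rng)  # what the single learner would consume
```

Because priorities arrive with the data, the learner does not have to score new experience before it can be sampled, which is one reason the collection side can grow without the learner becoming the bottleneck.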
Tuned examples: PongNoFrameskip-v4, Pendulum-v0, MountainCarContinuous-v0, {BeamRider,Breakout,Qbert,SpaceInvaders}NoFrameskip-v4.
Atari results @10M steps: more details
Scalability:
Ape-X-specific configuration (see the training API for details):