Asynchronous Methods for Deep Reinforcement Learning (A3C) (ICML, 2016)
1. Introduction
- Online DRL
  - Problems
    - strongly correlated samples lead to unstable training
  - Solution
    - experience replay (ER)
      - only usable with off-policy algorithms
      - high memory and computation cost
- Asynchronous methods (this paper)
  - replace ER: parallel actor-learners decorrelate the data, avoiding ER's problems
4. Asynchronous RL Framework
- Design goal
  - train deep neural network policies reliably and without large resource requirements
- Main ideas
  - asynchronous actor-learners
    - run in parallel (to explore different parts of the environment)
- Techniques (a code sketch follows this list)
  - accumulate gradients over multiple steps before each asynchronous update
  - n-step bootstrapped returns
  - parameter sharing (actor-learners update a shared network; in A3C the policy and value function also share non-output layers)
  - exploration
    - a different ϵ per actor-learner (for the value-based methods)
    - entropy regularization of the policy (for A3C)
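
A minimal single-thread sketch of one actor-learner update, assuming PyTorch; `ActorCritic`, `a3c_update`, the rollout format, the hidden size, and the loss coefficients are illustrative assumptions, not the paper's implementation. It shows the n-step bootstrapped return, the entropy bonus, gradient accumulation over the rollout, and pushing the gradients into a shared network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActorCritic(nn.Module):
    """Policy and value heads on a shared body (the 'parameter sharing'
    between policy and value function; sizes here are illustrative)."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # logits of pi(a|s)
        self.value_head = nn.Linear(hidden, 1)            # V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


def a3c_update(local_net, shared_net, optimizer, rollout, gamma=0.99,
               value_coef=0.5, entropy_coef=0.01):
    """One update from an n-step rollout collected by a single actor-learner.

    rollout["steps"]: list of (obs_tensor, action_int, reward_float)
    rollout["bootstrap_value"]: 0.0 if the episode ended, else V(s_{t+n})
    Gradients are accumulated over the whole rollout and applied to the
    shared parameters in one asynchronous step (hypothetical rollout format).
    """
    obs = torch.stack([s[0] for s in rollout["steps"]])
    actions = torch.tensor([s[1] for s in rollout["steps"]])
    rewards = [s[2] for s in rollout["steps"]]

    logits, values = local_net(obs)
    log_probs = F.log_softmax(logits, dim=-1)

    # n-step bootstrapped returns, computed backwards from the bootstrap value
    R = rollout["bootstrap_value"]
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    advantages = returns - values.detach()
    chosen_log_probs = log_probs[torch.arange(len(actions)), actions]

    policy_loss = -(chosen_log_probs * advantages).sum()
    value_loss = F.mse_loss(values, returns, reduction="sum")
    entropy = -(log_probs.exp() * log_probs).sum()  # entropy bonus -> exploration

    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy

    local_net.zero_grad()
    loss.backward()
    # Copy the accumulated gradients to the shared network and step it,
    # then pull the fresh shared parameters back into the local copy.
    for lp, sp in zip(local_net.parameters(), shared_net.parameters()):
        sp.grad = lp.grad
    optimizer.step()
    local_net.load_state_dict(shared_net.state_dict())
```

In a full implementation, many such actor-learners would run this loop in parallel threads, each with its own environment instance (and, for the value-based variants, its own ϵ), all updating the one shared network.
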
5. Experiments
- Atari
  - n-step methods learn faster than one-step methods
  - A3C outperforms the other methods
- Scalability
  - speedup improves with the number of actor-learners
  - n-step methods show larger speedup ratios
    - perhaps due to a reduction of bias
- Robustness
  - there is usually a broad range of learning rates that give good results