Distributed Distributional DDPG (D4PG), published at ICLR 2018
Paper link: https://arxiv.org/pdf/1804.08617.pdf
Key takeaways
D4PG extends DDPG along two axes:
- Distributed: on the actor side, a single actor is scaled out to multiple parallel actors that collect experience concurrently, as shown in the Actor part of the algorithm
- Distributional: on the critic side, the critic's scalar Q-value is replaced by a distribution over returns
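A minimal sketch of the distributional-critic idea: instead of a scalar, the critic outputs a categorical distribution over a fixed support of return atoms (the C51-style parametrization used in the D4PG experiments). The names `n_atoms`, `v_min`, `v_max` and the softmax-over-random-logits stand-in for the network output are illustrative assumptions, not from the paper's code.

```python
import numpy as np

# Fixed support of return atoms z_i; the critic outputs a probability
# for each atom instead of a single Q-value (values here are made up).
n_atoms, v_min, v_max = 51, -10.0, 10.0
support = np.linspace(v_min, v_max, n_atoms)

def expected_q(probs):
    """Recover a scalar Q-value as the expectation of the return distribution."""
    return float(np.dot(probs, support))

# A softmax over random logits stands in for the critic network's output head.
rng = np.random.default_rng(0)
logits = rng.normal(size=n_atoms)
probs = np.exp(logits) / np.exp(logits).sum()

q = expected_q(probs)
assert v_min <= q <= v_max  # expectation always lies inside the support
```

The scalar `expected_q` is what the deterministic-policy-gradient update still needs; the full distribution is what the distributional Bellman loss is computed against.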
In DDPG:
$$
\begin{aligned}
Q_{\pi}(\mathbf{x}, \mathbf{a})=\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r\left(\mathbf{x}_{t}, \mathbf{a}_{t}\right)\right] \text{ where } & \mathbf{x}_{0}=\mathbf{x},\ \mathbf{a}_{0}=\mathbf{a} \\
& \mathbf{x}_{t} \sim p\left(\cdot \mid \mathbf{x}_{t-1}, \mathbf{a}_{t-1}\right) \\
& \mathbf{a}_{t}=\pi\left(\mathbf{x}_{t}\right)
\end{aligned}
$$
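The definition above can be checked numerically: when the dynamics are deterministic, $Q_\pi(\mathbf{x}, \mathbf{a})$ is just the discounted return of the rollout starting from $(\mathbf{x}, \mathbf{a})$, truncated once $\gamma^t$ is negligible. The tiny three-state MDP below is made up purely for illustration.

```python
gamma = 0.9

def p(x, a):
    """Deterministic transition x_t = p(x_{t-1}, a_{t-1}) on states {0, 1, 2}."""
    return (x + a) % 3

def r(x, a):
    """Reward r(x_t, a_t): pay 1 only in state 0."""
    return 1.0 if x == 0 else 0.0

def pi(x):
    """Deterministic policy a_t = pi(x_t): always step forward."""
    return 1

def q_pi(x, a, horizon=200):
    """Truncated discounted return: sum of gamma^t * r(x_t, a_t)."""
    q, discount = 0.0, 1.0
    for _ in range(horizon):
        q += discount * r(x, a)
        x = p(x, a)
        a = pi(x)
        discount *= gamma
    return q
```

Under this policy the state cycles 0 → 1 → 2 → 0, so starting from state 0 the reward recurs every three steps and the closed form is $Q_\pi = 1/(1-\gamma^3) \approx 3.69$, which the rollout reproduces.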