Reinforcement learning: An introduction | Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT Press, 2018. | Introductory textbook |
Reinforcement Learning | Wiering M A, Van Otterlo M. Reinforcement learning[J]. Adaptation, learning, and optimization, 2012, 12(3): 729. | Introductory textbook |
Q-learning | Watkins C J C H, Dayan P. Q-learning[J]. Machine learning, 1992, 8(3): 279-292. | Convergence of the Q-learning algorithm |
Convergence of Q-learning: A simple proof | Melo F S. Convergence of Q-learning: A simple proof[J]. Institute Of Systems and Robotics, Tech. Rep, 2001: 1-4. | Convergence of the Q-learning algorithm |
Human-level control through deep reinforcement learning | Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. | Proposes the DQN algorithm |
Policy gradient methods for reinforcement learning with function approximation | Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in neural information processing systems. 2000: 1057-1063. | Proposes the policy gradient algorithm |
Deterministic Policy Gradient Algorithms | Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms[C]//International conference on machine learning. PMLR, 2014: 387-395. | Proposes the DPG algorithm |
Continuous control with deep reinforcement learning | Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015. | Proposes the DDPG algorithm |
Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems | Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems[J]. The Knowledge Engineering Review, 2012, 27(1): 1-31. | Surveys the difficulties of multi-agent RL compared with single-agent RL |
Multi-agent actor-critic for mixed cooperative-competitive environments | Lowe R, Wu Y I, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in neural information processing systems, 2017, 30. | Proposes the MADDPG algorithm |
Trust region policy optimization | Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897. | Proposes the TRPO algorithm |
Proximal policy optimization algorithms | Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017. | Proposes the PPO algorithm |
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML. | Proposes the Soft Actor-Critic (SAC) algorithm |
Actor-Attention-Critic for Multi-Agent Reinforcement Learning | Iqbal, S., & Sha, F. (2019). Actor-Attention-Critic for Multi-Agent Reinforcement Learning. ICML. | Explores introducing an attention mechanism into multi-agent reinforcement learning |
Counterfactual Multi-Agent Policy Gradients | Foerster, J.N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual Multi-Agent Policy Gradients. AAAI. | Proposes the COMA algorithm |
Mean Field Multi-Agent Reinforcement Learning | Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., & Wang, J. (2018). Mean Field Multi-Agent Reinforcement Learning. ArXiv, abs/1802.05438. | Proposes the MFRL algorithm |