A Roundup of Key Reinforcement Learning Literature

Theory

| Title | Citation | Notes |
| --- | --- | --- |
| Reinforcement Learning: An Introduction | Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT Press, 2018. | Introductory textbook |
| Reinforcement Learning | Wiering M A, Van Otterlo M. Reinforcement learning[J]. Adaptation, Learning, and Optimization, 2012, 12(3): 729. | Introductory textbook |
| Q-learning | Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292. | Convergence of the Q-learning algorithm |
| Convergence of Q-learning: A Simple Proof | Melo F S. Convergence of Q-learning: A simple proof[J]. Institute of Systems and Robotics, Tech. Rep, 2001: 1-4. | Convergence of the Q-learning algorithm |
| Human-level Control through Deep Reinforcement Learning | Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. | Proposed the DQN algorithm |
| Policy Gradient Methods for Reinforcement Learning with Function Approximation | Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems. 2000: 1057-1063. | Proposed the policy gradient method |
| Deterministic Policy Gradient Algorithms | Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms[C]//International Conference on Machine Learning. PMLR, 2014: 387-395. | Proposed the DPG algorithm |
| Continuous Control with Deep Reinforcement Learning | Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015. | Proposed the DDPG algorithm |
| Independent Reinforcement Learners in Cooperative Markov Games: A Survey Regarding Coordination Problems | Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems[J]. The Knowledge Engineering Review, 2012, 27(1): 1-31. | Surveys the difficulties of multi-agent RL relative to single-agent RL |
| Multi-agent Actor-Critic for Mixed Cooperative-Competitive Environments | Lowe R, Wu Y I, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in Neural Information Processing Systems, 2017, 30. | Proposed the MADDPG algorithm |
| Trust Region Policy Optimization | Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International Conference on Machine Learning. PMLR, 2015: 1889-1897. | Proposed the TRPO algorithm |
| Proximal Policy Optimization Algorithms | Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017. | Proposed the PPO algorithm |
| Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International Conference on Machine Learning. PMLR, 2018. | Proposed the Soft Actor-Critic (SAC) algorithm |
| Actor-Attention-Critic for Multi-Agent Reinforcement Learning | Iqbal S, Sha F. Actor-attention-critic for multi-agent reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2019. | Introduces an attention mechanism into multi-agent RL |
| Counterfactual Multi-Agent Policy Gradients | Foerster J N, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[C]//AAAI, 2018. | Proposed the COMA algorithm |
| Mean Field Multi-Agent Reinforcement Learning | Yang Y, Luo R, Li M, et al. Mean field multi-agent reinforcement learning[J]. arXiv preprint arXiv:1802.05438, 2018. | Proposed the MFRL algorithm |
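The Watkins & Dayan and Melo entries above both concern the convergence of tabular Q-learning. As a minimal illustration of the update rule they analyze, the sketch below runs Q-learning on a hypothetical 5-state deterministic chain MDP (the environment, hyperparameters, and function name are all assumptions for this example, not code from any of the cited papers):

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a deterministic chain MDP.

    Action 1 moves right, action 0 moves left; reaching the last
    state yields reward 1 and ends the episode.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # Epsilon-greedy behavior policy.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Watkins' update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
            # with the bootstrap term dropped at the terminal state.
            bootstrap = 0.0 if s_next == n_states - 1 else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * bootstrap - Q[s][a])
            s = s_next
    return Q

if __name__ == "__main__":
    random.seed(0)
    Q = q_learning()
    # After training, the greedy policy should move right in every
    # non-terminal state.
    print([max(range(2), key=lambda i: Q[s][i]) for s in range(4)])
```

Under the conditions discussed in the cited convergence results (every state-action pair visited infinitely often, suitably decaying step sizes), tabular Q-learning converges to the optimal action-value function; this toy run uses a fixed step size for simplicity.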

Applications

| Title | Citation | Notes |
| --- | --- | --- |
| Deep Reinforcement Learning for Internet of Things: A Comprehensive Survey | Chen W, Qiu X, Cai T, et al. Deep reinforcement learning for Internet of Things: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2021, 23: 1659-1692. | Survey of mainstream RL algorithms and of RL applications in UAVs (unmanned aerial vehicles), MEC (mobile edge computing), packet routing, etc. |
| 3D UAV Trajectory Design and Frequency Band Allocation for Energy-Efficient and Fair Communication: A Deep Reinforcement Learning Approach | Ding R, Gao F, Shen X S. 3D UAV trajectory design and frequency band allocation for energy-efficient and fair communication: A deep reinforcement learning approach[J]. IEEE Transactions on Wireless Communications, 2020, 19(12): 7796-7809, doi: 10.1109/TWC.2020.3016024. | Applies DDPG to joint communication resource allocation and trajectory planning for UAVs |