1. The Pioneer: DQN
- Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
- Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
2. Improved Versions of DQN (Algorithmic Improvements)
- Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.
- Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.
- Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.
- Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.
- Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.
- Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver
- Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.
- State of the Art Control of Atari Games using shallow reinforcement learning
- Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)
- Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)
3. Improved Versions of DQN (Model Improvements)
- Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.
- Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.
- Language Understanding for Text-based Games Using Deep Reinforcement Learning
- Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
4. Policy-Gradient-Based Deep Reinforcement Learning
Deep policy gradients:
Deep actor-critic algorithms:
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
- Terrain-adaptive locomotion skills using deep reinforcement learning
- Sample Efficient Actor-Critic with Experience Replay (updated 11.13)
Search and supervision:
Improved exploration in continuous action spaces:
Combining policy gradients and Q-learning:
Other policy-gradient papers:
- Benchmarking Deep Reinforcement Learning for Continuous Control
- Learning Continuous Control Policies by Stochastic Value Gradients
5. Hierarchical DRL
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks
6. Multi-Task and Transfer Learning in DRL
- ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources
- A Deep Hierarchical Approach to Lifelong Learning in Minecraft
- Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
- Multi-task learning with deep model based reinforcement learning (updated 11.14)
- Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)
7. DRL Models with External Memory Modules
8. The Exploration-Exploitation Problem in DRL
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)
- Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)
9. Multi-Agent DRL
- Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
- Multiagent Cooperation and Competition with Deep Reinforcement Learning
10. Inverse DRL
- Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)
11. Exploration + Supervised Learning
- Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
- Better Computer Go Player with Neural Network and Long-term Prediction
- Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.
12. Asynchronous DRL
13. DRL for More Difficult Game Scenarios
14. A Single Network Playing Multiple Games
15. Texas Hold'em Poker
16. Doom
- ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
- Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
- Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)
17. Large-Scale Action Spaces
18. Parameterized Continuous Action Spaces
19. Deep Model
- Learning Visual Predictive Models of Physics for Playing Billiards
- Learning Continuous Control Policies by Stochastic Value Gradients
- Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
20. DRL Applications
Robotics:
- Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
- Learning Deep Neural Network Policies with Continuous Memory States
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- DeepMPC: Learning Deep Latent Features for Model Predictive Control
- Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
- Learning Continuous Control Policies by Stochastic Value Gradients
Machine translation:
Object localization:
Goal-driven visual navigation:
Automatic parameter tuning:
Human-machine dialogue:
- SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
- Strategic Dialogue Management via Deep Reinforcement Learning
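Video prediction:
Text-to-speech:
Text generation:
Text-based games:
Radio control and signal monitoring:
Using DRL to learn to perform physics experiments:
Using DRL to accelerate convergence:
Using DRL to design neural networks:
- Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)
- Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)
- Neural Architecture Search with Reinforcement Learning (updated 11.14)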