https://zhuanlan.zhihu.com/p/21378532?refer=intelligentunit
General-purpose framework, DQN:
DQN: Playing Atari with Deep Reinforcement Learning
Nature DQN: Human-level Control Through Deep Reinforcement Learning
Introductory reading:
RL: Reinforcement Learning: An Introduction
POMDP direction: Partially Observable Markov Decision Processes
Improvements to the training data:
Prioritized experience replay: Prioritized Experience Replay
Improvements to training:
Asynchronous training (A3C): Asynchronous Methods for Deep Reinforcement Learning
Improvements to network architecture:
Adding an RNN (DRQN, with LSTM): Deep Recurrent Q-Learning for Partially Observable MDPs
Adding transfer learning (TL): Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Evaluating individual action values: Dueling Network Architectures for Deep Reinforcement Learning
Improvements to how the optimal solution is computed:
Improved target Q: Deep Reinforcement Learning with Double Q-learning
Trust region policy optimization (TRPO): Trust Region Policy Optimization
Actor-based policy gradient (PG) direction:
Foundations: Policy Gradient Methods for Reinforcement Learning with Function Approximation
Interpreting the log-likelihood term: Why we consider log likelihood instead of Likelihood in Gaussian Distribution
DPG algorithm: Deterministic Policy Gradient Algorithms
DDPG algorithm: Continuous Control with Deep Reinforcement Learning
Improvements that extend the application domain:
Solving hard games: Unifying Count-Based Exploration and Intrinsic Motivation
Continuous control: Continuous Deep Q-Learning with Model-based Acceleration
Platforms:
SC2: StarCraft II: A New Challenge for Reinforcement Learning
ELF: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games