RL Papers


https://zhuanlan.zhihu.com/p/21378532?refer=intelligentunit

General-purpose framework, DQN:

DQN: Playing Atari with Deep Reinforcement Learning

Nature DQN: Human-level Control Through Deep Reinforcement Learning

 

Introductory reading:

RL: Reinforcement Learning: An Introduction

POMDP direction: Partially Observable Markov Decision Processes

 

Improvements to the training data:

Prioritized experience replay: Prioritized Experience Replay

 

Improvements to training:

Asynchronous training (A3C): Asynchronous Methods for Deep Reinforcement Learning

 

Improvements to network architecture:

Adding an RNN: Deep Recurrent Q-Learning for Partially Observable MDPs

Adding transfer learning (TL): Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Evaluating individual action values: Dueling Network Architectures for Deep Reinforcement Learning

DRQN with an added LSTM: Deep Recurrent Q-Learning for Partially Observable MDPs

 

Improvements to how the optimal solution is computed:

Improved target Q: Deep Reinforcement Learning with Double Q-learning

Trust region policy optimization (TRPO): Trust Region Policy Optimization

 

Actor-based policy gradient (PG) direction:

Foundations: Policy Gradient Methods for Reinforcement Learning with Function Approximation

Interpreting the log-likelihood term: Why we consider log likelihood instead of Likelihood in Gaussian Distribution

DPG algorithm: Deterministic Policy Gradient Algorithms

DDPG algorithm: Continuous Control with Deep Reinforcement Learning

 

Improvements that extend the application domain:

Solving hard games: Unifying Count-Based Exploration and Intrinsic Motivation

Continuous control: Continuous Deep Q-Learning with Model-based Acceleration

 

 

Platforms:

SC2: StarCraft II: A New Challenge for Reinforcement Learning

ELF: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games

