RL Value-Based: off-policy DQN (Deep Q-Learning), on-policy

Value-based methods: the V-value and the Q-value.
The Q-value methods are the useful ones; from here on, "Value-Based" generally refers to Q-value methods.

Q-Learning stands for a whole family of related algorithms.

Q-Learning -> Approximate Q-Learning -> Deep Q-Learning.
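
As a reminder of the tabular starting point of this progression, here is a minimal sketch of the off-policy Q-Learning update; the state/action counts and hyperparameters below are arbitrary assumptions for illustration:

import numpy as np

# Minimal tabular Q-Learning sketch; sizes and hyperparameters are illustrative assumptions.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99                 # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next, done):
    # Off-policy target: bootstrap from max_a' Q(s', a'), regardless of the behavior policy.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition with made-up numbers.
q_learning_update(s=0, a=1, r=1.0, s_next=2, done=False)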

DQN (Deep Q-Learning):

Deep Q-Learning was introduced in 2013, and many improvements have been made since.
Here we'll look at four strategies that dramatically improve the training and results of our DQN agents:

  • raw DQN & fixed Q-targets
  • Double DQN
  • Dueling DQN
  • Prioritized Experience Replay (PER)

DeepMind 2013, DQN & Fixed Q-Targets
Playing Atari with Deep Reinforcement Learning, NIPS 2013
https://arxiv.org/abs/1312.5602

Human Level Control Through Deep Reinforcement Learning, Nature 2015
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
Code:
https://sites.google.com/a/deepmind.com/dqn/
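
To make the fixed-Q-targets idea concrete, here is a minimal PyTorch-style sketch of the DQN loss with a separate, periodically synced target network, roughly in the spirit of the Nature 2015 setup; the layer sizes, variable names, and sync schedule are assumptions for illustration, not the paper's exact configuration.

import copy
import torch
import torch.nn as nn

# Online network and a frozen copy used for targets (sizes are illustrative assumptions).
obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)        # "fixed Q-targets": a delayed copy of q_net
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_loss(s, a, r, s_next, done):
    # s: (B, obs_dim) float, a: (B,) int64, r/done: (B,) float tensors from a replay buffer.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                # targets come from the frozen network
        max_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_next
    return nn.functional.mse_loss(q_sa, target)

# Every C gradient steps, copy the online weights into the target network:
# target_net.load_state_dict(q_net.state_dict())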

DeepMind 2015, Double DQN
Deep Reinforcement Learning with Double Q-learning, AAAI 2016
https://arxiv.org/abs/1509.06461
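
Double DQN changes only how the target is computed: the online network picks the greedy action and the target network evaluates it, which reduces the overestimation bias of plain DQN. A hedged sketch of just that target, assuming the same q_net/target_net naming as the sketch above:

import torch

def double_dqn_target(r, s_next, done, q_net, target_net, gamma=0.99):
    # Action selection by the online net, value estimation by the target net.
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        q_next = target_net(s_next).gather(1, best_a).squeeze(1)
        return r + gamma * (1.0 - done) * q_next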

DeepMind 2015, Prioritized Experience Replay (PER)
Prioritized Experience Replay, ICLR 2016
https://arxiv.org/abs/1511.05952
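
Prioritized Experience Replay samples transitions in proportion to their TD error (instead of uniformly) and corrects the resulting bias with importance-sampling weights. Below is a deliberately simplified proportional-variant sketch; a real implementation would use a sum-tree for O(log N) sampling, and the alpha/beta values are assumptions:

import numpy as np

class SimplePER:
    # Toy proportional PER buffer: O(N) sampling, kept short for clarity.
    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.data, self.priorities = [], []

    def add(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        self.data.append(transition)
        self.priorities.append(max(self.priorities, default=1.0))
        if len(self.data) > self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)

    def sample(self, batch_size):
        p = np.asarray(self.priorities) ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-self.beta)   # importance-sampling weights
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-5):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps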

DeepMind 2015, Dueling DQN
Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016
https://arxiv.org/abs/1511.06581
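
Dueling DQN changes the network architecture rather than the learning rule: the head splits into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'). A minimal PyTorch sketch with illustrative layer sizes:

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Shared trunk, then separate value and advantage heads (sizes are assumptions).
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)              # V(s)
        self.advantage = nn.Linear(64, n_actions)  # A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Subtract the mean advantage so V and A are identifiable: Q = V + (A - mean A).
        return v + a - a.mean(dim=1, keepdim=True)

q_values = DuelingQNet()(torch.randn(1, 4))        # shape: (1, n_actions)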

=================

AlphaGo, 2016-2017
Mastering the game of Go with deep neural networks and tree search, Nature 2016
https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

Mastering the game of Go without human knowledge, Nature 2017
https://www.nature.com/articles/nature24270.epdf
http://augmentingcognition.com/assets/Silver2017a.pdf
https://cs.uwaterloo.ca/~ppoupart/teaching/cs885-spring18/slides/cs885-lecture14a.pdf
https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf

Robotics, 2016
Levine, S., Pastor, P., Krizhevsky, A., and Quillen, D. (2016).
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
https://arxiv.org/abs/1603.02199

Drone control, 2017
Kahn, G., Zhang, T., Levine, S., and Abbeel, P. (2017).
PLATO: policy learning using adaptive trajectory optimization, in Proceedings of IEEE International Conference on Robotics and Automation (Singapore), 3342–3349.
https://arxiv.org/abs/1603.00622

============================================

Ref:
https://zhuanlan.zhihu.com/p/107172115
https://theaisummer.com/Taking_Deep_Q_Networks_a_step_further/

https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/

https://cugtyt.github.io/blog/rl-notes/201807201658.html

An informal survey of deep reinforcement learning (DRL): from DQN to AlphaGo
https://blog.csdn.net/jinzhuojun/article/details/52752561

Reinforcement learning: the three key tricks for advanced DQN (PyTorch)
https://blog.csdn.net/MR_kdcon/article/details/111245496
Three major improvements to DQN
https://cloud.tencent.com/developer/article/1092132
https://cloud.tencent.com/developer/article/1092124
https://cloud.tencent.com/developer/article/1092121
A hands-on introduction to Deep Q-Learning with Python and OpenAI Gym
https://blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/90306841

What are the concrete differences between the two major RL schools behind DeepMind and OpenAI?
https://www.zhihu.com/question/316626294
