RL2_policy_gradients_mainly

https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html   ***Deep Q Network vs Policy Gradients - An Experiment on VizDoom with Keras

http://karpathy.github.io/2016/05/31/rl/     ***Deep Reinforcement Learning: Pong from Pixels

https://www.jianshu.com/p/a3432c0e1ef2   ***DDPG and TORCS (The Open Racing Car Simulator)

https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#a2c   ***Policy Gradient Algorithms

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2   ***Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)

https://towardsdatascience.com/proximal-policy-optimization-ppo-with-sonic-the-hedgehog-2-and-3-c9c21dbed5e   ***Proximal Policy Optimization (PPO) with Sonic the Hedgehog 2 and 3

https://blog.csdn.net/Pony017/article/details/81146374   ***From REINFORCE to PPO: The Past and Present of Policy Gradient
