https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html ***Deep Q Network vs Policy Gradients - An Experiment on VizDoom with Keras
http://karpathy.github.io/2016/05/31/rl/ ***Deep Reinforcement Learning: Pong from Pixels
https://www.jianshu.com/p/a3432c0e1ef2 ***DDPG and TORCS (The Open Racing Car Simulator)
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#a2c ***Policy Gradient Algorithms
https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 ***Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)
https://towardsdatascience.com/proximal-policy-optimization-ppo-with-sonic-the-hedgehog-2-and-3-c9c21dbed5e ***Proximal Policy Optimization (PPO) with Sonic the Hedgehog 2 and 3
https://blog.csdn.net/Pony017/article/details/81146374 ***From REINFORCE to PPO: The Past and Present of Policy Gradient