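Existing reinforcement learning methods are commonly grouped into five categories. The first three are distinguished by how actions are chosen, the last two by when the update happens:
Choosing actions through values: Q-learning, Sarsa, Deep Q Network
Choosing actions directly: Policy Gradients
Imagining the environment and learning from it: Model Based RL
Episode updates (update after a whole episode ends): basic Policy Gradients, Monte-Carlo Learning
Single-step updates (update after every step): Q Learning, Sarsa, upgraded versions of Policy Gradients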
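
To make the difference between episode updates and single-step updates concrete, here is a minimal Python sketch. It is an illustration under assumptions, not code from the referenced articles: the function names `discounted_returns` and `q_learning_step`, the list-of-lists Q table, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are all chosen here for the example.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Episode (Monte-Carlo style) update input.

    The return at each step can only be computed once the whole
    episode's rewards are known, so learning waits for the episode to end.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def q_learning_step(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    alpha: float,
    gamma: float,
    done: bool,
) -> float:
    """Single-step (temporal-difference) update, here the tabular Q-learning rule.

    Only one transition (state, action, reward, next_state) is needed,
    so the table can be updated after every step of the episode.
    """
    # Bootstrap from the best action value in the next state (0 if terminal).
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Episode update: needs the full reward sequence.
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))

    # Single-step update: needs just one transition.
    q_table = [[0.0, 0.0], [1.0, 3.0]]  # hypothetical 2-state, 2-action table
    print(q_learning_step(q_table, 0, 1, 1.0, 1, alpha=0.5, gamma=0.5, done=False))
```

Sarsa would differ from this sketch only in the target: it uses the value of the action actually taken in the next state rather than the maximum over actions.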
1. Q-Learning
See the references:
https://baijiahao.baidu.com/s?id=1597978859962737001&wfr=spider&for=pc
https://www.jianshu.com/p/29db50000e3f?utm_medium=hao.caibaojian.com&utm_source=hao.caibaojian.com