Preface
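Earlier posts covered some introductory reinforcement learning material. Here I have organized more reinforcement learning notes, based mainly on the reinforcement learning lecture series by Prof. Hung-yi Lee (李宏毅) of National Taiwan University. This post introduces the policy-based reinforcement learning approach and shows how gradient ascent is used to find the best actor.
1 Machine Learning ≈ Looking for a Function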
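In the reinforcement learning setting, the function we are looking for is the actor $\pi_\theta(s)$: it takes an observation and outputs a choice of action. The sketch below makes that concrete in plain Python under an assumption of our own: a linear softmax actor instead of the neural network used in the lecture. `actor`, `softmax`, `sample_action` and the weight values are hypothetical illustrations.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def actor(theta, observation):
    """Hypothetical linear actor pi_theta(s).

    theta holds one weight vector per action; observation is a list of floats.
    Returns the probability of taking each action.
    """
    scores = [sum(w * x for w, x in zip(weights, observation)) for weights in theta]
    return softmax(scores)


def sample_action(probs, rng):
    """Draw one action index according to the actor's output distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: 3 actions (e.g. left / right / fire) and a 2-dimensional observation.
theta = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
probs = actor(theta, [1.0, 2.0])
action = sample_action(probs, random.Random(0))
```

Sampling from the output distribution, rather than always taking the top-scoring action, keeps the actor stochastic. This is why the same actor can produce different episodes.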
2 Three Steps for Deep Learning
Deep Learning is so simple ...
3 Goodness of Actor
Total loss: $L = \sum_{n=1}^{N} l_n$, where $l_n$ is the loss on the $n$-th training example.
Find the network parameters $\theta^*$ that minimize the total loss $L$.
Training Example:
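In the reinforcement learning version of this picture, the lecture scores an actor by its expected total reward $\bar{R}_\theta$ instead of a loss. You play the game with the actor, add up each episode's rewards $R_\theta = \sum_t r_t$, and average over many episodes, because randomness in both the actor and the game makes $R_\theta$ differ from run to run. The sketch below estimates $\bar{R}_\theta$ by Monte Carlo sampling in plain Python. The two-action sigmoid actor, the toy game's payout probabilities, and every function name are hypothetical.

```python
import math
import random


def policy(theta):
    """Hypothetical two-action actor: P(action = 1) = sigmoid(theta)."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return [1.0 - p1, p1]


def play_episode(theta, rng, horizon=5):
    """Play one episode of a toy game and return its total reward R = sum_t r_t.

    Assumed toy game: action 1 pays reward 1 with probability 0.8,
    action 0 pays reward 1 with probability 0.2, otherwise reward 0.
    """
    probs = policy(theta)
    total = 0.0
    for _ in range(horizon):
        action = rng.choices([0, 1], weights=probs, k=1)[0]  # randomness in the actor
        win_prob = 0.8 if action == 1 else 0.2
        total += 1.0 if rng.random() < win_prob else 0.0  # randomness in the game
    return total


def estimate_goodness(theta, episodes=20000, seed=0):
    """Monte Carlo estimate of R_bar_theta: average total reward over sampled episodes."""
    rng = random.Random(seed)
    return sum(play_episode(theta, rng) for _ in range(episodes)) / episodes


def exact_goodness(theta, horizon=5):
    """Closed-form expected total reward for this toy game, used only as a check."""
    p0, p1 = policy(theta)
    return horizon * (p0 * 0.2 + p1 * 0.8)


estimate = estimate_goodness(2.0)
exact = exact_goodness(2.0)
```

The supervised loss is minimized. This goodness is maximized, which is why the next step uses gradient ascent rather than descent.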
4 Gradient Ascent
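• Problem statement: $\theta^* = \arg\max_\theta \bar{R}_\theta$, where $\bar{R}_\theta$ is the expected total reward of the actor $\pi_\theta$
• Gradient ascent: start with $\theta^0$, then repeat $\theta^{i+1} \leftarrow \theta^i + \eta \nabla \bar{R}_{\theta^i}$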
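The update rule is easiest to check on a function whose maximum is known. This minimal Python sketch runs gradient ascent on a hypothetical objective $f(\theta) = -(\theta_0 - 3)^2 - (\theta_1 + 1)^2$, whose maximum is at $(3, -1)$. In reinforcement learning, `grad_fn` would instead return an estimate of $\nabla \bar{R}_\theta$.

```python
def gradient_ascent(grad_fn, theta0, learning_rate=0.1, steps=100):
    """Repeat theta <- theta + eta * grad(theta) and return the final theta.

    The plus sign is what makes this ascent (maximization) rather than descent.
    """
    theta = list(theta0)
    for _ in range(steps):
        grad = grad_fn(theta)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta


def grad_f(theta):
    """Gradient of the hypothetical objective f(theta) = -(theta0 - 3)^2 - (theta1 + 1)^2."""
    return [-2.0 * (theta[0] - 3.0), -2.0 * (theta[1] + 1.0)]


theta_star = gradient_ascent(grad_f, [0.0, 0.0])
```

With a learning rate of 0.1, each step shrinks the distance to the maximum by a factor of 0.8, so 100 steps are more than enough here.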
$R(\tau)$ does not have to be differentiable; it can even be a black box.
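The reason shows up in the gradient itself. Following the lecture, write the expected total reward as a sum over trajectories $\tau$: $\bar{R}_\theta = \sum_\tau R(\tau)\, P(\tau \mid \theta)$. Only $P(\tau \mid \theta)$ depends on $\theta$. Multiplying and dividing by $P(\tau \mid \theta)$ then turns the gradient into an expectation that can be estimated from $N$ sampled trajectories $\tau^1, \dots, \tau^N$:

$$
\begin{aligned}
\nabla \bar{R}_\theta &= \sum_\tau R(\tau)\, \nabla P(\tau \mid \theta) \\
&= \sum_\tau R(\tau)\, P(\tau \mid \theta)\, \frac{\nabla P(\tau \mid \theta)}{P(\tau \mid \theta)} \\
&= \sum_\tau R(\tau)\, P(\tau \mid \theta)\, \nabla \log P(\tau \mid \theta) \\
&\approx \frac{1}{N} \sum_{n=1}^{N} R(\tau^n)\, \nabla \log P(\tau^n \mid \theta)
\end{aligned}
$$

$R(\tau^n)$ appears only as a weight that is observed by playing the game, so it is never differentiated.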
$\nabla \log P(\tau \mid \theta) = ?$
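The standard answer, given here as background, comes from factorizing the trajectory probability: $P(\tau \mid \theta) = p(s_1) \prod_{t=1}^{T} p(a_t \mid s_t, \theta)\, p(r_t, s_{t+1} \mid s_t, a_t)$. The environment terms do not depend on $\theta$, so $\nabla \log P(\tau \mid \theta) = \sum_{t=1}^{T} \nabla \log p(a_t \mid s_t, \theta)$. The sketch below checks this idea on a hypothetical one-step game (a three-armed bandit). There a trajectory is a single action, so $\nabla \log P(\tau \mid \theta)$ reduces to $\nabla \log \pi(a \mid \theta)$, which for a softmax actor with one logit per action equals $\text{onehot}(a) - \pi$. The reward is treated as a black box that is only sampled. All names and win probabilities are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Actor output: one logit per action turned into probabilities."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(theta, action):
    """grad_theta log pi(action | theta) for a softmax actor: onehot(action) - pi."""
    probs = softmax(theta)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def black_box_reward(action, rng):
    """Hypothetical one-step game: we can sample rewards but never differentiate them."""
    win_probs = [0.2, 0.5, 0.9]
    return 1.0 if rng.random() < win_probs[action] else 0.0


def estimate_gradient(theta, rng, n_episodes=100):
    """(1/N) * sum_n R(tau^n) * grad log P(tau^n | theta) with one-step trajectories."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(n_episodes):
        action = rng.choices(range(len(theta)), weights=probs, k=1)[0]
        reward = black_box_reward(action, rng)  # only evaluated, never differentiated
        for k, g in enumerate(grad_log_prob(theta, action)):
            grad[k] += reward * g / n_episodes
    return grad


def train(iterations=200, learning_rate=0.5, seed=0):
    """Gradient ascent on R_bar_theta using the sampled gradient estimate."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = estimate_gradient(theta, rng)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return softmax(theta)


final_probs = train()
```

After training, the actor should put most of its probability on the third action, the one with the highest win probability, even though `black_box_reward` is never differentiated.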