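A new teammate has joined our group and I need to walk him through the basics of RL, so we are starting with Silver's course.
For myself, I am adding one requirement: read "Reinforcement Learning: An Introduction" carefully.
I did not read it very attentively the first time, so this time I hope to be more thorough and to write a short summary of the key points as I go.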
Reinforcement learning problems involve learning what to do - how to map situations to actions - so as to maximize a numerical reward signal.
RL differs from both supervised and unsupervised learning in several ways:
There is no supervisor to tell the agent which action is best, only a reward signal; the agent must discover which actions yield the most reward by trying them out.
Actions influence the environment and the subsequent data, so the data are not i.i.d.
Feedback is (sometimes) delayed, not instantaneous
There is a trade-off between exploration and exploitation.
For a stochastic task, each action must be tried many times to obtain a reliable estimate of its expected reward.
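To make the last two points concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. It is only an illustration under my own assumptions, not something taken from the course or the book verbatim: the function name run_bandit, the Gaussian rewards with unit variance, and the values of epsilon and steps are all hypothetical choices. Each arm's expected reward is estimated by an incrementally updated sample average, which is why an arm has to be pulled many times before its estimate can be trusted.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian bandit (illustrative assumptions only).

    true_means: hypothetical expected reward of each arm, unknown to the agent.
    Returns the per-arm reward estimates and pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was tried
    estimates = [0.0] * n_arms   # sample-average estimate of each arm's reward

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to improve its estimate.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Noisy reward: a single sample says little about the true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample average: Q <- Q + (r - Q) / n
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.5, 1.0], steps=20000)
    print("estimates:", [round(q, 2) for q in estimates])
    print("counts:   ", counts)
```

With epsilon = 0 the agent never explores and can lock onto whichever arm happened to look good early; with a larger epsilon it learns every arm's value well but wastes more pulls on poor arms. That tension is the exploration/exploitation trade-off in its simplest form.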