RL: An Introduction (2nd edition) Reading Notes

lj2jq

Published 2020-10-12 10:26:30

Category: Reinforcement learning. Tags: reinforcement learning

Article link: https://blog.csdn.net/qq_34897331/article/details/109022018

Chapter 2: Multi-armed Bandits

Action-value methods = estimating the values of actions + using the estimates to make action selection decisions.
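Sample-average method -> estimates the value of an action by averaging the rewards actually received when that action was selected:

$$Q_t(a) \doteq \frac{\text{sum of rewards when } a \text{ taken prior to } t}{\text{number of times } a \text{ taken prior to } t} = \frac{\sum_{i=1}^{t-1} R_i \cdot \mathbb{1}_{A_i=a}}{\sum_{i=1}^{t-1} \mathbb{1}_{A_i=a}}$$

where $\mathbb{1}_{predicate}$ denotes the random variable that is 1 if predicate is true and 0 if it is not. If the denominator is 0, $Q_t(a)$ is instead defined as some default value, such as 0.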
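A minimal Python sketch of the sample-average estimate, assuming the history is stored as two parallel lists (`actions[i]` and `rewards[i]` for each step before t). The function name `sample_average` and its `default` parameter are illustrative choices, not taken from the book. The indicator $\mathbb{1}_{A_i=a}$ is written out explicitly so each term maps onto the formula.

```python
from typing import Sequence


def sample_average(actions: Sequence[int], rewards: Sequence[float],
                   a: int, default: float = 0.0) -> float:
    """Q_t(a): mean reward over the past steps on which action a was taken.

    actions[i] / rewards[i] hold A_i / R_i for the steps before t
    (hypothetical storage layout chosen for this sketch).
    """
    # Numerator: sum_i R_i * 1[A_i == a]
    total = sum(r * (1 if act == a else 0) for act, r in zip(actions, rewards))
    # Denominator: sum_i 1[A_i == a]
    count = sum(1 if act == a else 0 for act in actions)
    # Action never taken: fall back to the default value (e.g. 0)
    return total / count if count > 0 else default
```

For example, with `actions = [0, 1, 0, 2, 0]` and `rewards = [1.0, 0.5, 3.0, -1.0, 2.0]`, the estimate for action 0 is (1.0 + 3.0 + 2.0) / 3 = 2.0, while an action that was never tried (say 3) gets the default 0.0.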
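Greedy action selection -> the simplest action-selection rule: always select an action with the highest estimated value,

$$A_t \doteq \arg\max_a Q_t(a)$$

with ties broken arbitrarily.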
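A small sketch of greedy selection over a list of estimates `q` (index = action). The book only says ties are broken arbitrarily; breaking them uniformly at random, as done here, is one common choice and an assumption of this sketch. The name `greedy_action` is hypothetical.

```python
import random
from typing import Optional, Sequence


def greedy_action(q: Sequence[float], rng: Optional[random.Random] = None) -> int:
    """Return A_t = argmax_a Q_t(a) for estimates q[a]."""
    if rng is None:
        rng = random.Random()
    best = max(q)
    # All actions that share the maximal estimate
    candidates = [a for a, value in enumerate(q) if value == best]
    # Tie-breaking rule assumed here: uniform random among the maximizers
    return rng.choice(candidates)
```

Breaking ties randomly rather than always taking the lowest index avoids systematically favoring one action when several estimates are equal, e.g. at the start when every estimate equals the default value.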
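ε-greedy method: with probability ε, select an action at random from among all actions with equal probability; otherwise (with probability 1-ε) select greedily. For a k-armed bandit, as the number of steps increases every action is sampled infinitely often, so all $Q_t(a)$ converge to $q_*(a)$, and (assuming a single optimal action) the probability of selecting the optimal action converges to:

$$1 - \varepsilon + \frac{\varepsilon}{k}$$

(Simple derivation: with probability 1-ε the greedy action is selected, and once the estimates have converged the greedy action is the optimal one. In addition, when exploring (probability ε), each of the k actions is selected with equal probability 1/k, so the optimal action is still selected with probability ε/k. The total probability is the sum of these two terms, as the equation above shows.)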
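The limit can be checked numerically with a short simulation. This sketch assumes the estimates have already converged, modeled by a hypothetical estimate list in which one action has a strictly higher value than all others; `epsilon_greedy` and `optimal_action_prob` are illustrative names, not from the book.

```python
import random
from typing import Sequence


def epsilon_greedy(q: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick uniformly among all k actions, else act greedily."""
    if rng.random() < epsilon:
        # Explore: uniform over all actions, the greedy one included
        return rng.randrange(len(q))
    best = max(q)
    # Exploit: greedy action, ties broken uniformly at random
    return rng.choice([a for a, v in enumerate(q) if v == best])


def optimal_action_prob(epsilon: float, k: int) -> float:
    """Limiting probability of choosing the (unique) optimal action: 1 - eps + eps / k."""
    return (1 - epsilon) + epsilon / k


if __name__ == "__main__":
    rng = random.Random(0)
    k, epsilon = 10, 0.1
    # Hypothetical converged estimates: action 3 is the unique best action
    q = [0.0] * k
    q[3] = 1.0
    n = 100_000
    hits = sum(epsilon_greedy(q, epsilon, rng) == 3 for _ in range(n))
    print(f"empirical: {hits / n:.3f}, formula: {optimal_action_prob(epsilon, k):.3f}")
```

With k = 10 and ε = 0.1, the formula gives 0.9 + 0.01 = 0.91, and the empirical frequency should land close to that value.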