RL An introduction 2nd edition Reading Notes

Chap 2

S 2.1

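  1. Action value = the expected (mean) reward given that the action is selected, where A_t is the action selected at time step t and R_t is the corresponding reward:
    $$q_*(a) \doteq \mathbb{E}[R_t \mid A_t = a]$$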
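To make the definition concrete, here is a minimal Python sketch. It assumes a hypothetical 3-armed bandit whose reward distributions are discrete and fully known, which is never the case in a real bandit problem. The probabilities are invented for illustration. Under these assumptions, q*(a) is just the probability-weighted mean of each arm's possible rewards.

```python
# Hypothetical reward distributions: arm -> {reward value: probability}.
# These numbers are made up purely for illustration.
REWARD_DISTS = {
    0: {0.0: 0.7, 1.0: 0.3},
    1: {0.0: 0.5, 2.0: 0.5},
    2: {-1.0: 0.2, 1.0: 0.8},
}


def true_action_value(dist):
    """q*(a) = E[R_t | A_t = a] = sum over r of r * P(R_t = r | A_t = a)."""
    total_p = sum(dist.values())
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(r * p for r, p in dist.items())


q_star = {a: true_action_value(d) for a, d in REWARD_DISTS.items()}
optimal_arm = max(q_star, key=q_star.get)  # arm with the largest true value
print(q_star, "optimal arm:", optimal_arm)
```

Arm 1 has the largest true value here, so it is the optimal action. The methods in S 2.2 are needed because an agent normally cannot compute these expectations directly and must estimate them from observed rewards.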

S 2.2

  1. Action-value methods = estimating the values of actions + using the estimates to make action selection decisions.
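  2. Sample-average method -> estimates an action's value by averaging the rewards actually received when that action was taken:
    $$Q_t(a) \doteq \frac{\text{sum of rewards when } a \text{ taken prior to } t}{\text{number of times } a \text{ taken prior to } t} = \frac{\sum_{i=1}^{t-1} R_i \cdot \mathbb{1}_{A_i = a}}{\sum_{i=1}^{t-1} \mathbb{1}_{A_i = a}}$$
    where $\mathbb{1}_{\text{predicate}}$ denotes the random variable that is 1 if the predicate is true and 0 if it is not.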
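  3. Greedy action selection -> an action selection rule that always exploits current knowledge by picking an action with the highest estimated value (ties broken arbitrarily, e.g. randomly):
    $$A_t \doteq \arg\max_a Q_t(a)$$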
  4. ε-greedy method: with probability ε, select randomly from among all actions with equal probability; otherwise act greedily. For a k-armed bandit with fixed ε, the probability of selecting the optimal action converges to:
    $$1 - \varepsilon + \frac{\varepsilon}{k}$$
    (Simple derivation: with probability 1 - ε the action is chosen greedily, and once the estimates have converged the greedy action is the optimal one. The random choice, made with probability ε, picks each of the k actions with probability 1/k, so it still selects the optimal action with probability ε/k. Adding the two terms gives the expression above.)
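The four items above fit together as in the following minimal Python sketch. It makes several assumptions that are not in the notes:

- a hypothetical 5-armed Gaussian bandit, where each reward is drawn from N(q*(a), 1) and the true values are made up;
- ε = 0.1;
- a default estimate of 0 for any action that has not yet been taken.

`sample_average` transcribes the indicator-function formula literally. The simulation loop keeps running sums and counts instead, which yields the same averages without rescanning the history at every step. Once the estimates settle, the fraction of optimal-action choices should land close to 1 - ε + ε/k.

```python
import random


def sample_average(actions, rewards, a, default=0.0):
    """Q_t(a) = sum_i R_i * 1[A_i = a] / sum_i 1[A_i = a], over steps i < t.

    Falls back to `default` when action a has never been taken (zero denominator).
    """
    indicators = [1 if act == a else 0 for act in actions]
    num = sum(r * ind for r, ind in zip(rewards, indicators))
    den = sum(indicators)
    return num / den if den > 0 else default


def greedy(q_est, rng):
    """A_t = argmax_a Q_t(a), with ties broken uniformly at random."""
    best = max(q_est)
    return rng.choice([a for a, v in enumerate(q_est) if v == best])


def epsilon_greedy(q_est, epsilon, rng):
    """With probability epsilon choose uniformly among all k actions; otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_est))
    return greedy(q_est, rng)


def limit_optimal_prob(epsilon, k):
    """Asymptotic probability of choosing the optimal action: (1 - eps) + eps / k."""
    return (1 - epsilon) + epsilon / k


def run_bandit(q_true, epsilon, steps, seed=0):
    """Simulate a hypothetical Gaussian bandit where R ~ N(q_true[a], 1).

    Running reward sums and counts give the same sample averages as
    sample_average() without rescanning the whole history every step.
    """
    rng = random.Random(seed)
    k = len(q_true)
    sums, counts = [0.0] * k, [0] * k
    actions, rewards = [], []
    for _ in range(steps):
        q_est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        a = epsilon_greedy(q_est, epsilon, rng)
        r = rng.gauss(q_true[a], 1.0)
        sums[a] += r
        counts[a] += 1
        actions.append(a)
        rewards.append(r)
    return actions, rewards


q_true = [0.2, 1.0, -0.5, 0.4, 0.0]  # made-up true values; arm 1 is optimal
eps, steps = 0.1, 10_000
actions, rewards = run_bandit(q_true, eps, steps)
late = actions[steps // 2:]  # second half, after the estimates have settled
print("final Q estimates:", [round(sample_average(actions, rewards, a), 2) for a in range(len(q_true))])
print("empirical P(optimal):", late.count(1) / len(late))
print("predicted limit:", limit_optimal_prob(eps, len(q_true)))  # (1 - 0.1) + 0.1/5, about 0.92
```

`sample_average` is used only at the end to report the final estimates. Calling it inside the loop would cost O(t) per estimate, which is why the loop keeps running totals instead.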