Resources for this book are available here: Reinforcement Learning: An Introduction resources
Chapter 2 Multi-armed Bandits
Definition of the k-armed bandit problem
You are faced repeatedly with a choice among k different options, or
actions. After each choice you receive a numerical reward chosen from
a stationary probability distribution that depends on the action you
selected. Your objective is to maximize the expected total reward over
some time period, for example, over 1000 action selections, or time
steps.
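The setup above can be sketched as a tiny simulator. This is a minimal sketch, not the book's code: it follows the 10-armed testbed convention where each true action value q*(a) is drawn once from a standard normal distribution, and each reward adds unit-variance Gaussian noise. The class and variable names are my own.

```python
import random

class KArmedBandit:
    """A stationary k-armed bandit: each arm's reward distribution is fixed."""

    def __init__(self, k=10, seed=0):
        rng = random.Random(seed)
        # True action values q*(a), drawn once and then held fixed (stationary).
        self.q_star = [rng.gauss(0, 1) for _ in range(k)]
        self._rng = rng

    def pull(self, action):
        # Reward = true value of the chosen arm plus unit-variance Gaussian noise.
        return self._rng.gauss(self.q_star[action], 1)

bandit = KArmedBandit(k=10)
reward = bandit.pull(3)  # receive a numerical reward for action 3
```

The objective is then to maximize the total reward accumulated over, say, 1000 calls to `pull`.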
Exploitation and exploration
If you maintain estimates of the action values, then at any time step there is at least
one action whose estimated value is greatest. We call these the greedy actions. When you
select one of these actions, we say that you are exploiting your current knowledge of the
values of the actions. If instead you select one of the nongreedy actions, then we say you
are exploring, because this enables you to improve your estimate of the nongreedy action’s
value.
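A common way to balance the two is ε-greedy selection: exploit the greedy action most of the time, but explore uniformly at random with small probability ε. Below is a minimal sketch, assuming sample-average value estimates updated incrementally as Q_{n+1} = Q_n + (1/n)(R_n - Q_n); the function and variable names are my own.

```python
import random

def epsilon_greedy(Q, epsilon, rng=random):
    """Select a greedy action with probability 1 - epsilon, else explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q))  # explore: any action, uniformly
    best = max(Q)
    # Break ties among greedy actions at random.
    return rng.choice([a for a, q in enumerate(Q) if q == best])

Q = [0.0] * 10  # estimated value of each action
N = [0] * 10    # number of times each action has been selected

def update(action, reward):
    """Incremental sample-average update of the action-value estimate."""
    N[action] += 1
    Q[action] += (reward - Q[action]) / N[action]
```

With ε = 0 this reduces to pure exploitation (always a greedy action); a small ε such as 0.1 keeps improving the estimates of the nongreedy actions.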
Notation: