强化学习系列（1.2）：强化学习示例

最新推荐文章于 2024-04-01 11:52:43 发布

汉秋_

最新推荐文章于 2024-04-01 11:52:43 发布

阅读量482

点赞数

分类专栏：强化学习文章标签：强化学习机器学习人工智能

本文链接：https://blog.csdn.net/weixin_42278880/article/details/114928980

版权

强化学习专栏收录该内容

3 篇文章 0 订阅

订阅专栏

参考书目

An Introduction to Reinforcement Learning, Sutton and Barto 提取码8qvg

正文

理解强化学习的一种好方法是考虑一些指导其发展的示例和可能的应用。

A master chess player makes a move. The choice is informed both by planning— anticipating possible replies and counterreplies—and by immediate, intuitive judgments of the desirability of particular positions and moves.

下象棋时走步决策。

An adaptive controller adjusts parameters of a petroleum refinery’s operation in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the set points originally suggested by engineers.

自适应控制器可实时调整炼油厂的运行参数。

A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on the current charge level of its battery and how quickly and easily it has been able to find the recharger in the past.

移动机器人决定是否应该进入一个新房间，以寻找更多的垃圾来收集，或者开始尝试返回电池充电站。它根据电池的当前电量以及过去能够多快速便捷地找到充电器来做出决定。

Phil prepares his breakfast. Closely examined, even this apparently mundane activity reveals a complex web of conditional behavior and interlocking goal–subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned, interactive sequences of behavior are required to obtain a bowl, spoon, and milk carton. Each step involves a series of eye movements to obtain information and to guide reaching and locomotion. Rapid judgments are continually made about how to carry the objects or whether it is better to ferry some of them to the dining table before obtaining others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and is in service of other goals, such as having the spoon to eat with once the cereal is prepared and ultimately obtaining nourishment. Whether he is aware of it or not, Phil is accessing information about the state of his body that determines his nutritional needs, level of hunger, and food preferences.

菲尔准备早餐时候的行动制定。

Correct choice requires taking into account indirect, delayed consequences of actions, and thus may require foresight or planning.

正确的选择需要考虑到行动的间接的，延迟的后果，因此可能需要预见或计划。

At the same time, in all of these examples the e↵ects of actions cannot be fully predicted; thus the agent must monitor its environment frequently and react appropriately.
All these examples involve goals that are explicit in the sense that the agent can judge progress toward its goal based on what it can sense directly.