Reinforcement Learning Series (1.2): Examples of Reinforcement Learning

Reference

Main Text

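A good way to understand reinforcement learning is to consider some of the examples and possible applications that have guided its development.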

A master chess player makes a move. The choice is informed both by planning (anticipating possible replies and counterreplies) and by immediate, intuitive judgments of the desirability of particular positions and moves.

  • Choosing which move to make in a game of chess.

An adaptive controller adjusts parameters of a petroleum refinery’s operation in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the set points originally suggested by engineers.

  • An adaptive controller adjusts the operating parameters of a petroleum refinery in real time.

A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on the current charge level of its battery and how quickly and easily it has been able to find the recharger in the past.

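  • A mobile robot decides whether to enter a new room in search of more trash to collect or to start heading back to its battery recharging station. It bases the decision on its battery's current charge level and on how quickly and easily it has found the charger in the past.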

Phil prepares his breakfast. Closely examined, even this apparently mundane activity reveals a complex web of conditional behavior and interlocking goal–subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned, interactive sequences of behavior are required to obtain a bowl, spoon, and milk carton. Each step involves a series of eye movements to obtain information and to guide reaching and locomotion. Rapid judgments are continually made about how to carry the objects or whether it is better to ferry some of them to the dining table before obtaining others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and is in service of other goals, such as having the spoon to eat with once the cereal is prepared and ultimately obtaining nourishment. Whether he is aware of it or not, Phil is accessing information about the state of his body that determines his nutritional needs, level of hunger, and food preferences.

  • Phil deciding on his sequence of actions while preparing breakfast.

Correct choice requires taking into account indirect, delayed consequences of actions, and thus may require foresight or planning.

  • Choosing correctly means accounting for the indirect, delayed consequences of actions, and so may require foresight or planning.
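
The point about delayed consequences can be made concrete with the discounted return, a standard reinforcement learning quantity that this section does not itself define. The following is a minimal sketch under stated assumptions: the discount factor `gamma = 0.9`, the action names `grab_now` and `set_up`, and their reward sequences are all hypothetical, chosen only to show that judging an action by its immediate reward can pick a different action than weighing what happens later.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, with a reward delayed by k steps weighted by gamma**k."""
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences that follow each of two actions.
outcomes = {
    "grab_now": [1.0, 0.0, 0.0, 0.0],  # small reward immediately, nothing after
    "set_up": [0.0, 0.0, 0.0, 5.0],    # no reward now, a larger one three steps later
}

# Judging only by the first reward picks "grab_now" ...
myopic_choice = max(outcomes, key=lambda a: outcomes[a][0])
# ... while accounting for delayed consequences picks "set_up".
farsighted_choice = max(outcomes, key=lambda a: discounted_return(outcomes[a]))

print(myopic_choice, farsighted_choice)  # grab_now set_up
```

Here the delayed reward of 5.0 is discounted to about 3.645, which still beats the immediate 1.0. A smaller `gamma` makes the agent more short-sighted and can flip this choice.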

At the same time, in all of these examples the effects of actions cannot be fully predicted; thus the agent must monitor its environment frequently and react appropriately.
All these examples involve goals that are explicit in the sense that the agent can judge progress toward its goal based on what it can sense directly.

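  • The consequences of actions cannot be fully predicted, so the agent must monitor its environment frequently and react appropriately. All of these examples involve explicit goals, in the sense that the agent can judge its progress toward a goal from what it senses directly.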

In all of these examples the agent can use its experience to improve its performance over time.

  • In all of these examples, the agent can use its experience to improve its performance over time.
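
One simple way an agent can turn experience into better decisions is to keep a running estimate that is updated after every observation. The following is a minimal sketch based on the mobile-robot example. The class `RunningEstimate`, the function `should_return`, its safety `margin`, and the trip lengths are all hypothetical illustrations, not something the source specifies. The update uses the incremental sample-average rule, NewEstimate = OldEstimate + (1/n)(Target - OldEstimate).

```python
class RunningEstimate:
    """Sample-average estimate, updated incrementally from experience."""

    def __init__(self, initial=0.0):
        self.value = initial
        self.count = 0

    def update(self, observation):
        # Move the estimate a fraction 1/n of the way toward the new observation.
        self.count += 1
        self.value += (observation - self.value) / self.count
        return self.value


def should_return(battery_steps_left, estimate, margin=2.0):
    # Hypothetical rule: head back once the remaining charge barely covers
    # the expected trip to the charger plus a safety margin.
    return battery_steps_left <= estimate.value + margin


# Hypothetical experience: steps the robot needed to reach its charger on past trips.
steps_to_charger = RunningEstimate()
for steps in [12, 8, 10, 6]:
    steps_to_charger.update(steps)

print(steps_to_charger.value)                 # 9.0
print(should_return(10, steps_to_charger))    # True
```

The incremental form needs only constant memory and constant time per update, because it stores a single value and a count rather than every past trip. After four trips, its value equals the plain average of the observations.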