Reinforcement Learning Series (1.1): Introduction to Reinforcement Learning

Reference book

Main text

The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.

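  • Reinforcement learning is a computational approach to learning from interaction. Compared with other machine learning approaches, it focuses more on goal-directed learning from interaction.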

Reinforcement learning

These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.

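  • Reinforcement learning is learning how to map situations to actions so as to maximize a numerical reward signal. Actions affect not only the immediate reward but also the next situation and, through it, all subsequent rewards. Trial-and-error search and delayed reward are the two most important distinguishing features of reinforcement learning.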
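
To make the idea of delayed reward concrete, here is a minimal sketch. The two-step task, the action names `grab_now` and `wait`, and the reward values are all hypothetical and chosen only for illustration. It contrasts a choice based on immediate reward alone with a choice based on the total reward collected over the episode.

```python
# Hypothetical two-step task: each first action leads to a fixed reward sequence.
REWARDS = {
    "grab_now": [1.0, 0.0],  # small immediate reward, nothing afterwards
    "wait": [0.0, 5.0],      # nothing now, a larger reward one step later
}


def total_reward(action):
    """Sum of all rewards that follow from taking `action` first."""
    return sum(REWARDS[action])


def myopic_choice():
    """Pick the action with the largest immediate reward only."""
    return max(REWARDS, key=lambda a: REWARDS[a][0])


def far_sighted_choice():
    """Pick the action with the largest total reward over the episode."""
    return max(REWARDS, key=total_reward)


if __name__ == "__main__":
    for name in (myopic_choice(), far_sighted_choice()):
        print(name, total_reward(name))
```

In this toy setting the two choices disagree, which is the situation that makes delayed reward a distinguishing difficulty: the action that looks best right now is not the one that maximizes reward in the long run.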

A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method.

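  • Optimal control of incompletely known Markov decision processes: the basic idea is simply to capture the most important aspects of the real problem facing a learning agent that interacts with its environment over time to achieve a goal. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect that state. The agent must also have one or more goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects (sensation, action, and goal) in their simplest possible forms. Any method that is well suited to solving such problems is considered a reinforcement learning method.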
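
The three aspects named above (sensation, action, and goal) can be sketched as a small interaction loop. This is only an illustrative sketch under stated assumptions: the `CorridorEnv` class, its `reset`/`step` interface, and the two policies are hypothetical and do not come from the book. The agent senses a state, chooses an action that changes the state, and the goal is expressed through the reward it receives.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the left end and return the sensed state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action -1 (left) or +1 (right); the position is clipped to the corridor."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # the goal is expressed through reward
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Sense the state, act, observe the reward, and repeat until done."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                  # action chosen from the sensed state
        state, reward, done = env.step(action)  # the action affects the state
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def always_right(state):
    """Fixed policy that always moves toward the goal."""
    return +1


def random_policy(state):
    """Policy that ignores the state and moves left or right at random."""
    return random.choice([-1, +1])


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), always_right))
    print(run_episode(CorridorEnv(), random_policy))
```

Neither policy learns anything here. The sketch only shows the interface between agent and environment that a reinforcement learning method would operate on.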

Reinforcement learning is different from supervised learning, the kind of learning studied in most current research in the field of machine learning.
In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory—where one would expect learning to be most beneficial—an agent must be able to learn from its own experience.

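  • Reinforcement learning differs from supervised learning. The aim of supervised learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. In interactive problems, it is often impractical to obtain examples of desired behavior that are both correct and representative of all situations.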

Reinforcement learning is also different from what machine learning researchers call unsupervised learning, which is typically about finding structure hidden in collections of unlabeled data.
Uncovering structure in an agent’s experience can certainly be useful in reinforcement learning, but by itself does not address the reinforcement learning problem of maximizing a reward signal. We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning and unsupervised learning and perhaps other paradigms.

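  • Reinforcement learning differs from unsupervised learning. Reinforcement learning tries to maximize a reward signal rather than to find hidden structure. Uncovering structure in an agent's experience can be useful in reinforcement learning, but by itself it does not solve the reinforcement learning problem of maximizing a reward signal. Reinforcement learning is therefore considered a third machine learning paradigm, alongside supervised learning, unsupervised learning, and perhaps others.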

The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.

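  • Balancing exploitation and exploration is a challenge specific to reinforcement learning; it does not arise in supervised or unsupervised learning, at least in their purest forms.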
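
One common way to illustrate the trade-off between exploitation and exploration is an epsilon-greedy rule on a multi-armed bandit. The sketch below is an assumption-laden illustration, not a method taken from the passage above: the function name `epsilon_greedy_bandit`, the Gaussian reward model with unit standard deviation, and all parameter values are hypothetical. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest current estimate.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run epsilon-greedy on a bandit whose arm rewards are Gaussian (assumed)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running average reward observed for each arm
    counts = [0] * n       # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try a random arm
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample average for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


if __name__ == "__main__":
    estimates, counts, total = epsilon_greedy_bandit([0.0, 5.0, 10.0])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
    print("average reward:", round(total / sum(counts), 2))
```

Setting `epsilon` to 0 removes exploration entirely, so the agent can lock onto whichever arm happened to look good first; setting it to 1 removes exploitation, so the agent never uses what it has learned. Values in between trade one against the other.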

Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.

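  • Another key feature of reinforcement learning: it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.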

Reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent.
Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces.
When reinforcement learning involves planning, it has to address the interplay between planning and real-time action selection, as well as the question of how environment models are acquired and improved.

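  • In contrast to approaches that study isolated subproblems, reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. The agent has to operate from the outset despite uncertainty about the environment it faces. When reinforcement learning involves planning, it has to address the interplay between planning and real-time action selection, as well as the question of how environment models are acquired and improved.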

A complete, interactive, goal-seeking agent can also be a component of a larger behaving system.
It is important to look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

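  • A complete, interactive, goal-seeking agent can also be a component of a larger behaving system. In that case it interacts directly with the rest of the larger system and only indirectly with the larger system's environment. To appreciate the generality of the reinforcement learning framework, it is important to look beyond the most obvious examples of agents and their environments.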

Reinforcement learning is part of a decades-long trend within artificial intelligence and machine learning toward greater integration with statistics, optimization, and other mathematical subjects.

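  • Reinforcement learning is part of a decades-long trend in artificial intelligence and machine learning toward closer integration with statistics, optimization, and other mathematical subjects.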

Finally, reinforcement learning is also part of a larger trend in artificial intelligence back toward simple general principles.

  • Reinforcement learning is also part of a larger trend in artificial intelligence back toward simple general principles.