Reinforcement Learning: An Introduction, Reading Notes (1)

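This post introduces the basic concepts of reinforcement learning and contrasts it with supervised and unsupervised learning. It highlights the trade-off between exploration and exploitation in RL and the importance of being goal-directed. It also lists the elements of reinforcement learning (policy, reward signal, value function, and model) and discusses how flexibly the state can be defined.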

Chapter 1 Introduction

Chapter 1.1 Introduction-Reinforcement Learning

What the term RL covers

Reinforcement learning, like many topics whose names end with “ing” such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.

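Like many topics whose names end in "ing", reinforcement learning refers to three things at once:

  1. A problem
  2. A class of solution methods that work well on the problem
  3. The field that studies the problem and its solution methods

Keeping these three apart is essential. In particular, much of the confusion that comes up later stems from not distinguishing the problem from the methods used to solve it.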
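The separation between a problem and its solution methods can be sketched in code. The example below is only an illustration, not something taken from the book: `TwoArmedBandit`, its payout probabilities, `random_method`, and `epsilon_greedy_method` are all hypothetical names and values chosen for this sketch. The point is that the problem is defined once, and two different methods are then run against the same problem.

```python
import random


class TwoArmedBandit:
    """The problem: a hypothetical bandit whose arms pay 1.0 with fixed probabilities."""

    def __init__(self, payout_probs=(0.2, 0.8), seed=0):
        self.payout_probs = payout_probs  # assumed values, for illustration only
        self.n_arms = len(payout_probs)
        self._rng = random.Random(seed)

    def pull(self, arm):
        # Reward is 1.0 with the arm's payout probability, otherwise 0.0.
        return 1.0 if self._rng.random() < self.payout_probs[arm] else 0.0


def random_method(problem, steps, seed=1):
    """Solution method A: pick an arm uniformly at random at every step."""
    rng = random.Random(seed)
    return sum(problem.pull(rng.randrange(problem.n_arms)) for _ in range(steps))


def epsilon_greedy_method(problem, steps, epsilon=0.1, seed=1):
    """Solution method B: usually pick the arm with the best average reward so far."""
    rng = random.Random(seed)
    counts = [0] * problem.n_arms
    averages = [0.0] * problem.n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(problem.n_arms)
        else:
            # Exploit: use the arm that currently looks best.
            arm = max(range(problem.n_arms), key=lambda a: averages[a])
        reward = problem.pull(arm)
        counts[arm] += 1
        # Incremental update of the sample-average reward for this arm.
        averages[arm] += (reward - averages[arm]) / counts[arm]
        total += reward
    return total


if __name__ == "__main__":
    steps = 1000
    print("random:", random_method(TwoArmedBandit(), steps))
    print("epsilon-greedy:", epsilon_greedy_method(TwoArmedBandit(), steps))
```

Neither function knows the payout probabilities. Both interact with the problem only through `pull`, which is what makes them interchangeable methods for the same problem.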

How RL differs from supervised learning in ML

In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.

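In interactive problems, it is usually impractical to obtain examples of desired behavior that have both of the following properties: they are correct, and they are representative of all the situations in which the agent has to act.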
