Introduction to Reinforcement Learning

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
The feature that most clearly distinguishes reinforcement learning from other kinds of learning is that training information is used to evaluate how good states and actions are, rather than to instruct the learner what the correct policy should be.
Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.
These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.

One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation.
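A common way to handle this trade-off is an ε-greedy rule: with small probability ε the agent tries a random action (exploration), and otherwise it picks the action it currently estimates to be best (exploitation). A minimal sketch, where the function name and the choice of ε are illustrative, not from the text:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Return an action index: explore with probability epsilon,
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return random.randrange(len(q_values))
    # Exploit: pick the action with the highest current value estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε = 0 the rule always exploits; raising ε trades short-term reward for information about less-tried actions.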
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.

One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

Features shared by cases that can use reinforcement learning:
All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment.

Elements of reinforcement learning:

  • agent

  • policy
    A policy is a mapping from perceived states of the environment to actions to be taken when in those states.

  • reward signal
    A reward signal defines the goal in a reinforcement learning problem.

  • value function
    Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.

  • model of the environment (optional)
    A model predicts what the environment will do next. There are two kinds of model: a transition model and a reward model. The transition model predicts the next state (i.e. the dynamics): P^a_{ss'} = P[S' = s' | S = s, A = a]. The reward model predicts the next (immediate) reward: R^a_s = E[R | S = s, A = a].
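The two model components above can be sketched as plain lookup tables for a toy two-state MDP. The state names, action name, and numbers here are made up for illustration; the structure mirrors P^a_{ss'} and R^a_s:

```python
# Transition model P^a_{ss'}: for each (state, action), a distribution
# over next states.
transition_model = {
    ("s0", "go"): {"s1": 0.9, "s0": 0.1},
    ("s1", "go"): {"s1": 1.0},
}

# Reward model R^a_s: expected immediate reward for taking action a
# in state s.
reward_model = {
    ("s0", "go"): 0.0,
    ("s1", "go"): 1.0,
}

def most_likely_next_state(state, action):
    """Return the next state with the highest probability under the model."""
    dist = transition_model[(state, action)]
    return max(dist, key=dist.get)
```

An agent that learns (or is given) such a model can plan by simulating transitions and rewards without acting in the real environment; model-free methods skip this component entirely.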
