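Reinforcement learning notes:
Reinforcement learning is a subfield of machine learning. Machine learning comprises supervised learning, unsupervised learning, and reinforcement learning.
Definition of reinforcement learning: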
Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective.
Reinforcement learning is learning what to do—how to map situations to actions—so
as to maximize a numerical reward signal.
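In short, the goal is to learn to control a system so that a numerical measure of long-term performance is maximized. Reinforcement learning has attracted great interest because it can be applied to a wide range of practical problems, from artificial intelligence to operations research and control engineering.
Put differently, the learner must discover a mapping from situations to actions that maximizes a numerical reward signal.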
A controller receives the controlled system’s state and a reward associated with the last state transition. It then calculates an action which is sent back to the system. In response, the system makes a transition to a new state and the cycle is repeated. The problem is to learn a way of controlling the system so as to maximize the total reward.
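That is, a controller receives the state of the controlled system together with the immediate reward for the last state transition. It then computes an action and sends it back to the system. The system responds by moving to a new state, and the cycle repeats. The objective is to maximize the total reward over the long run, not just the immediate reward.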
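The cycle described above can be sketched in a few lines of Python. This is only an illustration: `ToySystem`, its two states, its transition probabilities, and the two fixed policies are all made up for this note and do not come from any library.

```python
import random


class ToySystem:
    """Hypothetical two-state controlled system, for illustration only."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Assumed dynamics: action 1 usually leads to state 1, action 0 to state 0.
        p_to_one = 0.9 if action == 1 else 0.1
        next_state = 1 if self.rng.random() < p_to_one else 0
        # The reward is attached to the transition that just happened.
        reward = 1.0 if next_state == 1 else 0.0
        self.state = next_state
        return next_state, reward


def run_episode(policy, steps=100, seed=0):
    """Run the controller/system cycle and return the total reward."""
    system = ToySystem(seed)
    state, total = system.state, 0.0
    for _ in range(steps):
        action = policy(state)               # controller computes an action
        state, reward = system.step(action)  # system transitions and emits a reward
        total += reward
    return total


def always_one(state):
    return 1


def always_zero(state):
    return 0


print(run_episode(always_one), run_episode(always_zero))
```

Under these assumed dynamics, the policy that always picks action 1 collects more total reward than the one that always picks action 0. Learning consists of finding such a policy without being told the dynamics in advance.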
Problems with these characteristics are best described in the framework of Markovian Decision Processes (MDPs). The standard approach to ‘solve’ MDPs is to use dynamic programming, which transforms the problem of finding a good controller into the problem of finding a good value function.
Markov decision processes are the foundation of reinforcement learning.
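To make the dynamic programming remark concrete, here is a minimal value iteration sketch. Value iteration is one standard dynamic programming method; the notes do not say which method they have in mind, so this is just one possible choice. The two-state MDP in `P`, the discount factor `GAMMA`, and all function names are assumptions made up for this example.

```python
# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """Turn a value function into a controller by acting greedily."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
print(V, greedy_policy(V))
```

The point of the sketch is the last step: once a good value function is available, a good controller follows by choosing, in each state, the action with the highest expected return.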
MDP (Markov Decision Process):
Markov Property:
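Let S_t denote the current state. Once S_t is known, the history before it can be disregarded: S_t captures all relevant information from the history and is sufficient to determine the distribution of future states. Formally, P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t].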
Markov process:
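For a Markov state s and a successor state s', the state transition probability is defined as:
$$P_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s]$$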
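In matrix form, with the states numbered 1 to n and entry P_{ij} giving the probability of moving from state i to state j:
$$P = \begin{bmatrix} P_{11} & \cdots & P_{1n} \\ \vdots & \ddots & \vdots \\ P_{n1} & \cdots & P_{nn} \end{bmatrix}$$
Each row of the matrix sums to 1.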
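As a small illustration of the matrix form, the sketch below stores a transition matrix as a list of rows, checks that each row is a probability distribution, and samples a trajectory from it. The 3-state matrix and the helper names `is_stochastic` and `sample_chain` are assumptions invented for this example; states are numbered from 0 here to match Python indexing.

```python
import random

# Hypothetical 3-state chain; row i holds the probabilities of leaving state i.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.0, 1.0],  # state 2 is absorbing
]


def is_stochastic(matrix, tol=1e-9):
    """Check that every row is non-negative and sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )


def sample_chain(matrix, start, steps, seed=0):
    """Sample a state sequence; each step depends only on the current state."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(steps):
        row = matrix[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


print(is_stochastic(P))
print(sample_chain(P, start=0, steps=10))
```

Note that `sample_chain` looks only at `states[-1]` when drawing the next state, which is exactly the Markov property in code form.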