A Q-Learning algorithm for the Grid World game, from an open-source GitHub project
GitHub repository: https://github.com/rlcode/reinforcement-learning/tree/master/1-grid-world/5-q-learning
Q-Learning
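Q-Learning is a model-free reinforcement learning technique that can find an optimal action-selection policy for a (finite) MDP. It learns an action-value function which estimates the expected utility of taking a given action in a given state and following the optimal policy afterwards, so that in the end the best action for the current state can be read off from those values. One advantage is that it can compare the expected values of the available actions without knowing a model of the environment (its transition probabilities and reward function), which is why it is called model-free.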
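To make the idea of learning an action-value function concrete, here is a minimal sketch of a tabular Q-Learning agent. It is not the code from the repository above: the class name `QLearningAgent`, the method names `learn` and `get_action`, the integer action encoding, and the default values for the learning rate, discount factor and epsilon are all assumptions chosen for illustration. The update it applies is the standard one, Q(s, a) += alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)), and it never consults a model of the environment, only observed transitions.

```python
import random
from collections import defaultdict

# Hypothetical action encoding for a grid world: indices 0..3.
ACTIONS = [0, 1, 2, 3]  # up, down, left, right


class QLearningAgent:
    """Illustrative tabular Q-Learning agent (names and defaults are assumptions)."""

    def __init__(self, actions, learning_rate=0.1, discount=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)        # assumed to be 0..n-1, used as list indices
        self.learning_rate = learning_rate  # alpha: step size of each update
        self.discount = discount            # gamma: weight of future reward
        self.epsilon = epsilon              # probability of taking a random action
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, lazily initialised to zeros.
        self.q_table = defaultdict(lambda: [0.0] * len(self.actions))

    def learn(self, state, action, reward, next_state, done=False):
        """Apply one Q-Learning update from an observed transition."""
        if done:
            # No future reward after a terminal transition.
            target = reward
        else:
            # Bootstrap from the best action value in the next state.
            target = reward + self.discount * max(self.q_table[next_state])
        current = self.q_table[state][action]
        self.q_table[state][action] = current + self.learning_rate * (target - current)

    def get_action(self, state):
        """Epsilon-greedy selection; ties between best actions are broken at random."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        values = self.q_table[state]
        best = max(values)
        return self.rng.choice([a for a, v in zip(self.actions, values) if v == best])
```

As a quick check of the update rule: with `learning_rate=0.5`, a single call `learn("s0", 3, 1.0, "s1", done=True)` moves the estimate for action 3 in state `"s0"` from 0 to 0.5, and with `epsilon=0.0` the agent then picks action 3 in that state. States only need to be hashable (for example a string or a tuple of grid coordinates), which is why a dictionary-backed table is used here.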
The original Wikipedia text reads:
Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows in selecting actions, given the s