Machine Learning: Analyzing the Q-Learning Algorithm in Grid World

番茄大圣

Category: Machine Learning
Tags: machine learning, reinforcement learning, q-learning, grid-world, dqn

Article link: https://blog.csdn.net/tomatomas/article/details/77341114

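This article introduces the application of Q-Learning to the Grid World environment. Q-Learning is a model-free reinforcement learning technique that can find an optimal policy for a Markov decision process. Because it learns an action-value function, it can compare the expected value of different actions without a model of the environment. The article also explains how Q-Learning differs from the SARSA algorithm, and mentions DQN, which combines Q-Learning with deep learning and reached expert-level performance on Atari 2600 games.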

(Summary generated automatically by CSDN.)

A Q-Learning implementation for the Grid World game, taken from an open-source GitHub project
GitHub URL: https://github.com/rlcode/reinforcement-learning/tree/master/1-grid-world/5-q-learning

Q-Learning

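Q-Learning is a model-free reinforcement learning technique that can find an optimal action-selection policy for a Markov decision process (MDP). It learns an action-value function which, for the current state, gives the expected utility of taking each action and then following the optimal policy; the agent then picks the action with the highest value. One of its advantages is that it can compare the expected values of the available actions without knowing a model of the environment, which is why it is called model-free.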
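To make this concrete, below is a minimal tabular sketch. It is not the code from the linked rlcode repository: the 4x4 grid, the start, goal and trap cells, the rewards (+1 at the goal, -1 at the trap, 0 otherwise) and the hyperparameters (alpha = 0.5, gamma = 0.9, epsilon = 0.1) are all hypothetical choices made for illustration. The agent explores with an epsilon-greedy policy and applies the standard Q-learning update Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)], which needs only the observed transition (s, a, r, s'), never a model of the environment.

```python
import random

# Hypothetical 4x4 grid world (NOT the environment from the linked repo):
# start at (0, 0), goal at (3, 3) with reward +1, trap at (1, 1) with reward -1.
# All other moves give reward 0; reaching the goal or the trap ends the episode.
N_ROWS, N_COLS = 4, 4
START, GOAL, TRAP = (0, 0), (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; moves that would leave the grid keep the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (min(max(state[0] + dr, 0), N_ROWS - 1),
           min(max(state[1] + dc, 0), N_COLS - 1))
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == TRAP:
        return nxt, -1.0, True
    return nxt, 0.0, False


def greedy(q, state):
    """Index of the highest-valued action in `state`, breaking ties at random."""
    best = max(q[state])
    return random.choice([a for a, v in enumerate(q[state]) if v == best])


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # Tabular action-value function: one list of 4 action values per cell.
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(N_ROWS) for c in range(N_COLS)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Epsilon-greedy behaviour policy: explore with probability epsilon.
            if random.random() < epsilon:
                action = random.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action (no bootstrap at terminal cells).
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from START and return the visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=q[state].__getitem__)
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

After training, following the greedy action in each cell should lead from the start to the goal while avoiding the trap. Replacing `max(q[nxt])` in the target with the value of the action the epsilon-greedy policy actually takes next would turn this into SARSA, the on-policy algorithm that the summary contrasts with Q-learning.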

The original Wikipedia text is quoted below:

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows in selecting actions, given the s