ai人工智能编程_从人工智能动态编程：Q学习

最新推荐文章于 2024-01-30 08:51:55 发布

weixin_26752765

最新推荐文章于 2024-01-30 08:51:55 发布

阅读量555

点赞数

文章标签：人工智能 python 机器学习算法大数据

原文链接：https://medium.com/swlh/dynamic-programming-to-artificial-intelligence-q-learning-51a189fc0441

版权

ai人工智能编程

A failure is not always a mistake, it may simply be the best one can do under the circumstances. The real mistake is to stop trying. — B. F. Skinner

失败并不总是错误，它可能只是在这种情况下可以做的最好的事情。真正的错误是停止尝试。 — BF斯金纳

Reinforcement learning models are beating human players in games around the world. Huge international companies are investing millions in reinforcement learning. Reinforcement learning in today’s world is so powerful because it requires neither data nor labels. It could be a technique that leads to general artificial intelligence.

强化学习模型正在全球游戏中击败人类玩家。庞大的国际公司在强化学习上投入了数百万美元。当今世界，强化学习是如此强大，因为它既不需要数据也不需要标签。它可能是导致通用人工智能的技术。

有监督和无监督学习 (Supervised and Unsupervised Learning)

As a summary, in supervised learning, a model learns to map input to outputs using predefined and labeled data. An unsupervised learning approach teaches a model to cluster and group data using predefined data.

总而言之，在监督学习中，模型学习使用预定义和标记的数据 将输入映射到输出 。一种无监督的学习方法，教一个模型使用预定义的数据对数据进行聚类和分组 。

强化学习 (Reinforcement Learning)

However, in reinforcement learning, the model receives no data set and guidance, using a trial and error approach.

但是，在强化学习中，该模型没有使用试错法接收任何数据集和指导。

Reinforcement learning is an area of machine learning defined by how some model (called agent in reinforcement learning) behaves in an environment to maximize a given reward. The most similar real-world example is of a wild animal trying to find food in its ecosystem. In this example, the animal is the agent, the ecosystem is the environment, and the food is the reward.

强化学习是机器学习的一个领域，它是由某种模型(在强化学习中称为主体)在环境中的行为来定义的，以最大化给定的奖励。现实世界中最相似的例子是野生动物试图在其生态系统中寻找食物。在此示例中，动物是主体，生态系统是环境，食物是奖励。

Reinforcement learning is frequently used in the domain of game playing, where there is no immediate way to label how “good” an action was, since we would need to consider all future outcomes.

强化学习经常用于游戏领域，因为我们需要考虑所有未来的结果，因此无法立即标记动作的“良好”程度。

马尔可夫决策过程 (Markov Decision Processes)

The Markov Decision Process is the most fundamental concept of reinforcement learning. There are a few components in an MDP that interact with each other:

马尔可夫决策过程是强化学习的最基本概念。 MDP中有一些相互影响的组件： </

最低0.47元/天解锁文章

weixin_26752765

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
ai人工智能编程_从人工智能动态编程：Q学习

ai人工智能编程A failure is not always a mistake, it may simply be the best one can do under the circumstances. The real mistake is to stop trying. — B. F. Skinner 失败并不总是错误，它可能仅仅是在这种情况下可以做的最好的事情。真正的错误是停止尝试。...
复制链接

扫一扫