ai人工智能编程_从人工智能动态编程:Q学习

ai人工智能编程

A failure is not always a mistake, it may simply be the best one can do under the circumstances. The real mistake is to stop trying. — B. F. Skinner

失败并不总是错误,它可能只是在这种情况下可以做的最好的事情。 真正的错误是停止尝试。 — BF斯金纳

Reinforcement learning models are beating human players in games around the world. Huge international companies are investing millions in reinforcement learning. Reinforcement learning in today’s world is so powerful because it requires neither data nor labels. It could be a technique that leads to general artificial intelligence.

强化学习模型正在全球游戏中击败人类玩家。 庞大的国际公司在强化学习上投入了数百万美元。 当今世界,强化学习是如此强大,因为它既不需要数据也不需要标签。 它可能是导致通用人工智能的技术。

有监督和无监督学习 (Supervised and Unsupervised Learning)

As a summary, in supervised learning, a model learns to map input to outputs using predefined and labeled data. An unsupervised learning approach teaches a model to cluster and group data using predefined data.

总而言之,在监督学习中,模型学习使用预定义和标记的数据 将输入映射到输出 。 一种无监督的学习方法,教一个模型使用预定义的数据对数据进行聚类和分组

强化学习 (Reinforcement Learning)

However, in reinforcement learning, the model receives no data set and guidance, using a trial and error approach.

但是,在强化学习中,该模型没有使用试错法接收任何数据集和指导。

Reinforcement learning is an area of machine learning defined by how some model (called agent in reinforcement learning) behaves in an environment to maximize a given reward. The most similar real-world example is of a wild animal trying to find food in its ecosystem. In this example, the animal is the agent, the ecosystem is the environment, and the food is the reward.

强化学习是机器学习的一个领域,它是由某种模型(在强化学习中称为主体)在环境中的行为来定义的,以最大化给定的奖励。 现实世界中最相似的例子是野生动物试图在其生态系统中寻找食物。 在此示例中,动物是主体,生态系统是环境,食物是奖励。

Reinforcement learning is frequently used in the domain of game playing, where there is no immediate way to label how “good” an action was, since we would need to consider all future outcomes.

强化学习经常用于游戏领域,因为我们需要考虑所有未来的结果,因此无法立即标记动作的“良好”程度。

马尔可夫决策过程 (Markov Decision Processes)

The Markov Decision Process is the most fundamental concept of reinforcement learning. There are a few components in an MDP that interact with each other:

马尔可夫决策过程是强化学习的最基本概念。 MDP中有一些相互影响的组件: </

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值