Learning to act by predicting the future

deye1979

Tags: artificial intelligence

Original link: http://www.cnblogs.com/huangshiyu13/p/7063838.html

Dosovitskiy, Alexey, and Vladlen Koltun. "Learning to act by predicting the future." arXiv preprint arXiv:1611.01779 (2016).

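This work won Track 2 of the ViZDoom competition.

Key points:

1. It uses supervised learning rather than reinforcement learning.

2. It overcomes the sparse-reward problem.

3. It generalizes well to different goals at test time. A longer-term benefit is that it reduces the need for hand-designed rewards.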
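
The three points above can be illustrated with a minimal sketch, assuming the paper's setup in which the agent regresses future changes in a measurement vector (for example health, ammo, frags) and scores each action by a goal-weighted sum of the predicted changes. The function names, the temporal offsets, and the toy numbers are illustrative assumptions, not the authors' code, and the trained network is replaced here by a plain array of predictions.

```python
import numpy as np

# Temporal offsets (in steps) at which future measurements are predicted.
# These particular values are an illustrative assumption.
OFFSETS = (1, 2, 4)


def future_targets(measurements, t, offsets=OFFSETS):
    """Build the supervised regression target for time step t.

    measurements: array of shape (T, M), one measurement vector per
    time step of a recorded episode.
    Returns the concatenated changes m[t + k] - m[t] for every offset k,
    with shape (len(offsets) * M,). No reward signal is involved.
    """
    m = np.asarray(measurements, dtype=float)
    if t + max(offsets) >= len(m):
        raise ValueError("episode too short for the largest offset")
    return np.concatenate([m[t + k] - m[t] for k in offsets])


def choose_action(predictions, goal):
    """Pick the action whose predicted future best matches the goal.

    predictions: array of shape (A, D), one predicted target vector per
    action (in practice the output of a trained predictor).
    goal: weight vector of shape (D,) expressing what to optimise.
    """
    scores = np.asarray(predictions, dtype=float) @ np.asarray(goal, dtype=float)
    return int(np.argmax(scores))


# Toy episode: columns are (health, ammo).
episode = [[100, 10], [90, 10], [90, 9], [80, 9], [80, 8], [70, 8]]
target = future_targets(episode, t=0)

# Toy predictions for two actions over two target dimensions.
preds = [[5.0, -1.0], [-2.0, 3.0]]
action_for_first_goal = choose_action(preds, goal=[1.0, 0.0])
action_for_second_goal = choose_action(preds, goal=[0.0, 1.0])
```

Because the targets are dense future measurements, every time step yields a training signal even when rewards are sparse. Changing `goal` changes the chosen action without retraining the predictor, which is the test-time flexibility described in point 3.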

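Experimental analysis:

1. The results from training on D4 and testing on D3-tx and D4-tx show that the method generalizes poorly across maps. Improving generalization across maps requires, first, a larger amount of data and, second, a stronger perception component.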

Outlook:

1. Unify RL under the supervised learning framework.

Reposted from: https://www.cnblogs.com/huangshiyu13/p/7063838.html
