[Computer Science] [2016.11] Deep Learning Methods for Reinforcement Learning

This is a master's thesis from the Technical University of Lisbon, Portugal (author: Daniel Luis Simões Marta), 95 pages in total.

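This thesis studies the challenge of decoupling state perception from function approximation when deep learning methods are applied within reinforcement learning. Its starting point is the high-dimensional state, regarded as the fundamental limitation when applying reinforcement learning to real-world tasks. To address the curse of dimensionality, we propose reducing the dimensionality of the data to obtain succinct codes (internal representations of the environment) that serve as alternative states in a reinforcement learning framework. Different approaches have been taken over the past few decades, including kernel machines with hand-crafted features, where the choice of appropriate filters was task dependent and required a considerable amount of research. In this work, various deep learning methods with unsupervised learning mechanisms are considered.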
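As a rough illustration of replacing a raw observation with a succinct code, here is a minimal sketch of a linear autoencoder trained by plain gradient descent. It is not the architecture used in the thesis, which this summary does not describe; the toy data, the names `encode`, `decode`, and `train`, and the hyperparameters are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Toy 2-D observations that all lie on one line, so a 1-D code can describe them.
# (Hypothetical data, not taken from the thesis.)
DATA: List[Tuple[float, float]] = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def encode(w: Sequence[float], x: Sequence[float]) -> float:
    """Map an observation to a 1-D code: z = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(v: Sequence[float], z: float) -> List[float]:
    """Map a 1-D code back to observation space: x_hat = v * z"""
    return [vi * z for vi in v]


def reconstruction_loss(w, v, data) -> float:
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)


def train(data, lr: float = 0.05, steps: int = 200):
    """Fit encoder weights w and decoder weights v by batch gradient descent."""
    w = [0.1, 0.1]  # small fixed initial values keep the run deterministic
    v = [0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            err = [xh - xi for xh, xi in zip(decode(v, z), x)]
            err_dot_v = sum(e * vi for e, vi in zip(err, v))
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / n         # d(loss)/d(v_i)
                grad_w[i] += 2.0 * err_dot_v * x[i] / n   # d(loss)/d(w_i)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


if __name__ == "__main__":
    w, v = train(DATA)
    print("final reconstruction loss:", reconstruction_loss(w, v, DATA))
    print("1-D code for (1.0, 2.0):", encode(w, (1.0, 2.0)))
```

After training, `encode(w, x)` yields a one-number code that could stand in for the two-number observation as the state seen by a reinforcement learning agent. A deep, nonlinear autoencoder follows the same encode/decode/reconstruct pattern with more layers.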

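Another key topic is estimating Q-values for large state spaces, where tabular methods are no longer feasible. As a means of approximating the Q-function, we look to supervised learning methods within deep learning. The objectives of the thesis include a detailed exploration and understanding of the proposed methods, together with the implementation of a neural controller. Several simulations were performed, taking into account a variety of optimization procedures and increased parameters, and conclusions were drawn from them. Several architectures were used to approximate the Q-value function. To identify better approaches and hint at larger-scale applications, a trial was conducted between two similar types of Q-networks. Implementations of state-of-the-art techniques were tested on classic control problems.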
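To make the idea of Q-function approximation concrete, the sketch below replaces a Q-table with a parametric function and applies one semi-gradient Q-learning update. A linear approximator stands in for the neural Q-networks mentioned above, purely to keep the example short; the function names, feature vectors, and the values of `alpha` and `gamma` are assumptions, not details from the thesis.

```python
from typing import List, Sequence


def q_value(weights: List[List[float]], features: Sequence[float], action: int) -> float:
    """Approximate Q(s, a) as a dot product between per-action weights and state features."""
    return sum(w * f for w, f in zip(weights[action], features))


def q_learning_step(
    weights: List[List[float]],
    features: Sequence[float],
    action: int,
    reward: float,
    next_features: Sequence[float],
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one semi-gradient Q-learning update in place and return the TD error."""
    if done:
        # No bootstrapping from a terminal state.
        target = reward
    else:
        best_next = max(q_value(weights, next_features, a) for a in range(len(weights)))
        target = reward + gamma * best_next
    td_error = target - q_value(weights, features, action)
    # For a linear approximator, the gradient of Q(s, a) w.r.t. weights[action] is the feature vector.
    weights[action] = [w + alpha * td_error * f for w, f in zip(weights[action], features)]
    return td_error


if __name__ == "__main__":
    # Two actions, two state features (hypothetical values).
    weights = [[0.0, 0.0], [1.0, 0.0]]
    delta = q_learning_step(weights, [1.0, 0.5], 0, 0.0, [1.0, 0.0], False)
    print("TD error:", delta)
    print("updated weights:", weights)
```

Only the weights of the action actually taken are adjusted, and the number of parameters depends on the feature dimension rather than on the number of states, which is what makes this approach usable when a table is not.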

This thesis focuses on the challenge of decoupling state perception and function approximation when applying Deep Learning methods within Reinforcement Learning. As a starting point, high-dimensional states were considered, this being the fundamental limitation when applying Reinforcement Learning to real-world tasks. Addressing the Curse of Dimensionality issue, we propose to reduce the dimensionality of data in order to obtain succinct codes (internal representations of the environment), to be used as alternative states in a Reinforcement Learning framework. Different approaches have been taken over the last few decades, including Kernel Machines with hand-crafted features, where the choice of appropriate filters was task dependent and consumed a considerable amount of research. In this work, various Deep Learning methods with unsupervised learning mechanisms were considered. Another key theme relates to estimating Q-values for large state spaces, where tabular approaches are no longer feasible. As a means to perform Q-function approximation, we search for supervised learning methods within Deep Learning. The objectives of this thesis include a detailed exploration and understanding of the proposed methods with the implementation of a neural controller. Several simulations were performed, taking into account a variety of optimization procedures and increased parameters, to draw several conclusions. Several architectures were used as Q-value function approximators. To infer better approaches and hint at higher-scale applications, a trial between two similar types of Q-networks was conducted. Implementations of state-of-the-art techniques were tested on classic control problems.

  1. Introduction
  2. Deep Learning Concepts
  3. Reinforcement Learning
  4. Experimental Architecture
  5. Experimental Results
  6. Conclusions
