Paper Reading -- ECCV 2018 -- Reinforcement Learning

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

Skimmed; motivation
  1. Two categories of RL (a toy contrast is sketched right after this list)
    Following ppp8300885's blog:
    Model-free RL: does not model the environment; it directly learns the relationship between the state S, the action a, and the reward R. Q-learning, actor-critic, and similar methods are model-free RL.
    Model-based RL: models the environment by learning a model that estimates the new state S_2 and the reward obtained after executing action a in state S_1.
  2. This work combines model-free and model-based reinforcement learning and proposes a planned-ahead hybrid reinforcement learning framework for a real-world vision-and-language navigation task. The introduced environment model predicts the next state and reward, improving performance and making the framework more generalizable.
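
Since the model-free/model-based split recurs below, here is a minimal tabular sketch of the contrast (my own illustration with toy interfaces, not code from any of the papers): the model-free Q-learning update works directly from a sampled transition, while the model-based variant first fits an environment model that predicts the next state and reward, then backs up values through that model.

```python
import numpy as np

# Model-free: tabular Q-learning updates Q(s, a) directly from one
# sampled transition, without ever modeling the environment.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Model-based: fit a model that predicts the next state and reward,
# then plan with it (here: a one-step expected backup).
class LearnedModel:
    def __init__(self, n_states, n_actions):
        # Toy count-based estimates of P(s'|s,a) and E[r|s,a].
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions))

    def observe(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r

    def predict(self, s, a):
        n = self.counts[s, a].sum()
        p_next = self.counts[s, a] / max(n, 1.0)     # estimated P(s'|s,a)
        r_hat = self.reward_sum[s, a] / max(n, 1.0)  # estimated E[r|s,a]
        return p_next, r_hat

def model_based_backup(Q, model, s, a, gamma=0.99):
    p_next, r_hat = model.predict(s, a)
    # Back up through the learned model instead of a sampled transition.
    Q[s, a] = r_hat + gamma * np.sum(p_next * np.max(Q, axis=1))
```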
Takeaways

I currently work only on model-free RL; once I go deeper, model-based RL is worth considering.

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

Research background

This work starts from the autonomous urban driving navigation problem and steps further to study multi-agent driving dynamics. The conventional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, and such supervised-learning-based methods are usually limited by the accessibility of extensive human experience.
Transcription: Learning an optimal driving policy that mimics human drivers is less explored but key to navigating in complex environments that require understanding of multi-agent dynamics, prescriptive traffic rules, negotiation skills for taking left and right turns, and unstructured roadways.

Motivation and proposed approach
  1. The widespread supervised learning methods, e.g. CNNs, cannot perform well in self-driving scenarios for two major reasons. Firstly, human driving data, which consists of both sensor inputs and vehicle commands, is expensive to collect and shows limited coverage. Secondly, since autonomous vehicles need to interact with the environment, including other vehicles, pedestrians, and roadways, it is difficult to pose autonomous driving as a supervised learning problem.
  2. This work resolves the challenging planning task with the novel Controllable Imitative Reinforcement Learning (CIRL), which facilitates continuous deep RL by exploiting the knowledge learned from human experts.
  3. Hindered by too many failed explorations in the large action space, conventional DDPG often falls into local optima. To alleviate this problem, this work provides better exploration seeds for the search over the action space of the actor network. Precisely, it first uses human demonstrations to teach the actor network via imitation learning, so as to initialize action exploration in a reasonable space. Then CIRL incorporates DDPG to gradually boost the generalization capability of the learned policy. Furthermore, to support goal-oriented navigation, this work introduces a controllable gating mechanism to selectively activate different branches for four distinct control signals (sketched below).
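
To make the gating idea concrete, below is a hedged PyTorch sketch of a command-gated actor (the branch names, dimensions, and four-command set are my assumptions, not the authors' code): one action head per high-level control signal, with the command selecting which head is active.

```python
import torch
import torch.nn as nn

# Hedged sketch of a command-gated actor; COMMANDS is an assumed set.
COMMANDS = ["follow", "straight", "turn_left", "turn_right"]

class GatedActor(nn.Module):
    def __init__(self, feat_dim=512, action_dim=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # One action head per high-level control signal.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                          nn.Linear(128, action_dim), nn.Tanh())
            for _ in COMMANDS
        )

    def forward(self, features, command_idx):
        h = self.shared(features)
        # Gating: only the branch matching the command produces the action.
        return self.branches[command_idx](h)

# Usage: pretrain the actor by behavior cloning on human demonstrations,
# then fine-tune the same weights with DDPG so exploration starts from a
# reasonable region of the action space.
actor = GatedActor()
features = torch.randn(1, 512)            # e.g. CNN image features
action = actor(features, command_idx=2)   # e.g. steer/throttle/brake
```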
Takeaways

To get DDPG to train at all, supervised data is first used to reach a good initialization; the controllable gating mechanism then executes each possible command branch, while the reward signal gradually improves the DDPG policy.

Potential limitations
  1. Could a more powerful continuous-control RL algorithm further improve performance?

Collaborative Deep Reinforcement Learning for Multi-Object Tracking

Dual-Agent Deep Reinforcement Learning for Deformable Face Tracking

Skimmed; motivation

This work studies the deformable face tracking task, which aims at generating bounding boxes and detecting facial landmarks in face videos. The researchers point out that the intrinsic connections between these two tasks should not be ignored. They define two agents that explore these relationships and pass messages via an adaptive sequence of actions under a deep reinforcement learning framework.

Reinforced Temporal Attention and Split-Rate Transfer for Depth-Based Person Re-Identification

Skimmed; motivation

This work utilizes commodity depth sensors for the person re-identification task. Faced with scarce depth data, it proposes a split-rate RGB-to-Depth transfer scheme that effectively leverages RGB datasets. This is motivated by the observation that the model parameters of the bottom layers of a CNN can be shared directly between RGB and depth data, while the remaining layers need to be fine-tuned at a faster rate. Besides, this work implements temporal attention as a Bernoulli-Sigmoid unit acting upon frame-level features; since this attention process is stochastic, its parameters are trained with reinforcement learning (see the sketch below).
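
Below is a minimal sketch of how such a Bernoulli-Sigmoid temporal attention unit could look (my own toy version under assumed shapes, not the authors' implementation): a sigmoid scores each frame, a Bernoulli sample decides which frames contribute, and because the sampling is non-differentiable the scorer is trained with a REINFORCE-style policy gradient.

```python
import torch
import torch.nn as nn

class BernoulliTemporalAttention(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):  # (T, feat_dim)
        probs = torch.sigmoid(self.scorer(frame_feats)).squeeze(-1)  # (T,)
        dist = torch.distributions.Bernoulli(probs)
        mask = dist.sample()                 # stochastic frame selection
        log_prob = dist.log_prob(mask).sum() # for the REINFORCE gradient
        # Average the features of the selected frames.
        pooled = (mask.unsqueeze(-1) * frame_feats).sum(0) / mask.sum().clamp(min=1)
        return pooled, log_prob

# REINFORCE step: the reward would come from the re-id objective
# (e.g. verification accuracy); here it is a placeholder scalar.
attn = BernoulliTemporalAttention()
frame_feats = torch.randn(10, 128)           # 10 frame-level features
pooled, log_prob = attn(frame_feats)
reward = torch.tensor(1.0)                   # placeholder task reward
loss = -reward * log_prob                    # policy-gradient surrogate
loss.backward()
```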

  1. Possibly worth borrowing: the temporal attention part.

Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning

Skimmed; motivation

Self-supervised learning focuses on learning powerful feature representations from large amounts of unlabeled data. Generally, training exemplars are randomly selected from a fixed pre-selected set, and the algorithm is expected to order the data in the spatial or temporal domain. This work instead learns a sampling policy that adapts to the state of the network during training, optimized with reinforcement learning so that new permutations are sampled according to their expected utility for updating the network's feature representation (a minimal sketch follows).
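
A minimal sketch of such an RL-driven sampling policy (my own illustration with assumed names and dimensions, not the authors' code): a small network reads a summary of the training state, samples one permutation from a fixed candidate pool, and is updated with REINFORCE using the permutation's utility as the reward.

```python
import torch
import torch.nn as nn

class PermutationPolicy(nn.Module):
    def __init__(self, state_dim=64, n_permutations=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_permutations))

    def forward(self, net_state):
        logits = self.net(net_state)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()  # which permutation to train on next
        return idx, dist.log_prob(idx)

policy = PermutationPolicy()
state = torch.randn(64)           # assumed summary of the network's state
perm_idx, log_prob = policy(state)
utility = torch.tensor(0.5)       # placeholder expected utility as reward
(-utility * log_prob).backward()  # REINFORCE update of the sampling policy
```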

  1. Possibly worth borrowing: learning an RL policy directly from the network's parameters. How can the relationship between the network's parameters and its feature representation be perceived, and can an agent actually be trained this way?