An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep RL

fuxin607

Category column: Cross-media. Article tags: object retrieval by text

Article link: https://blog.csdn.net/fuxin607/article/details/80036670

This post covers a paper that uses reinforcement learning (RL) for Natural Language Object Retrieval. The paper is at https://arxiv.org/abs/1703.07579. I could not find the authors' homepage, but the code has been released at https://github.com/jxwufan/NLOR_A3C.
What the paper does:
Input: text + image dataset. Output: object.
results example

method

The paper gives a schematic of natural language object retrieval via context-aware deep reinforcement learning.
framework example
The context-aware policy and value network framework is shown below.

The training pipeline is shown below.
training pipeline
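A key point of this paper is that it produces the bbox end-to-end through reinforcement learning, instead of extracting candidates with a pre-trained proposal network (such methods "rely heavily on the training data of object proposals and are restricted to the predefined object categories").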
image features: concatenate the global feature (ResNet152 + global average pooling) and the local feature (ResNet152 + RoI pooling + global average pooling), 2048 + 2048 = 4096 dim.
sentence features: skip-thought vectors [ http://papers.nips.cc/paper/5950-skip-thought-vectors ] trained on the BookCorpus dataset, 4096 dim.
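The image feature described above can be illustrated with a small sketch. It is not taken from the released code: the function names, the plain-list representation, and the toy 2x2 spatial grid are assumptions made here for illustration. Only the 2048-channel width and the pool-then-concatenate step come from the post.

```python
from typing import List

# A feature map is indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average every channel over its spatial grid: one value per channel."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def image_feature(global_map: FeatureMap, roi_map: FeatureMap) -> List[float]:
    """Concatenate the pooled whole-image feature with the pooled RoI feature."""
    return global_average_pool(global_map) + global_average_pool(roi_map)


# Toy stand-ins for ResNet152 outputs: 2048 channels on a 2x2 grid.
# (Assumption: real feature maps have a larger spatial extent.)
global_map = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2048)]
roi_map = [[[0.0, 2.0], [2.0, 4.0]] for _ in range(2048)]

feat = image_feature(global_map, roi_map)
print(len(feat))  # 4096
```

The first 2048 entries describe the whole image and the last 2048 describe the current box, so the agent sees both the scene context and the region it is currently looking at.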
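The image feature and the sentence feature are then combined with a dot product (element-wise, since the result stays at 4096 dim) and L2 normalization. The result is concatenated with the previous 50 action vectors (50 x 9 = 450 dim) and one bbox vector (5 dim), giving a 4096 + 450 + 5 = 4551 dim vector. Two FC layers map this vector to a 1024-dim feature, which then goes through an LSTM with Layer Normalization (for subsequent decision making based on temporal context). The network finally outputs the policy (which decides the action to take) and the value (which estimates the reward).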
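The dimension bookkeeping above can be checked with a short sketch of how the 4551-dim input might be assembled. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hot encoding of actions, the zero-padding of a short history, the layout of the 5-dim bbox vector, and all function names are hypothetical. The FC layers and the LSTM are omitted.

```python
import math
from typing import List, Sequence

NUM_ACTIONS = 9    # length of each action vector (from the post)
HISTORY_LEN = 50   # number of previous actions kept (from the post)


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def fuse(image_feat: Sequence[float], sentence_feat: Sequence[float]) -> List[float]:
    """Element-wise product of the two features, followed by L2 normalization."""
    if len(image_feat) != len(sentence_feat):
        raise ValueError("features must have the same length")
    return l2_normalize([a * b for a, b in zip(image_feat, sentence_feat)])


def encode_history(actions: Sequence[int]) -> List[float]:
    """One-hot encode the last HISTORY_LEN actions (assumed encoding).

    Histories shorter than HISTORY_LEN are zero-padded at the front.
    """
    recent = list(actions)[-HISTORY_LEN:]
    vec = [0.0] * (NUM_ACTIONS * (HISTORY_LEN - len(recent)))
    for a in recent:
        one_hot = [0.0] * NUM_ACTIONS
        one_hot[a] = 1.0
        vec.extend(one_hot)
    return vec


def build_state(image_feat, sentence_feat, actions, bbox) -> List[float]:
    """Concatenate fused feature (4096), action history (450) and bbox (5)."""
    return fuse(image_feat, sentence_feat) + encode_history(actions) + list(bbox)


# Hypothetical bbox layout: [x0, y0, x1, y1, relative area].
state = build_state([1.0] * 4096, [0.5] * 4096, [0, 3, 8], [0.0, 0.0, 1.0, 1.0, 1.0])
print(len(state))  # 4551
```

Keeping the action history and the current box in the input gives the policy some memory of where it has already moved, in addition to what the LSTM carries across steps.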
training: multiple agents, each associated with its own environment, collect data in parallel and update the policy asynchronously using the asynchronous advantage actor-critic (A3C) method [ https://arxiv.org/abs/1602.01783 ].
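To make the "advantage" part of A3C concrete, here is a minimal sketch of the n-step return and advantage that each worker computes on its own rollout before sending gradients to the shared parameters. The function names, the discount value, and the toy numbers are assumptions for illustration. The asynchronous workers and the gradient step itself are left out.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns for one rollout, computed backwards in time.

    bootstrap_value is the critic's estimate for the state after the last
    step (0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards: Sequence[float], values: Sequence[float],
               bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Advantage = n-step return minus the critic's value at each step."""
    rets = n_step_returns(rewards, bootstrap_value, gamma)
    return [ret - v for ret, v in zip(rets, values)]


# Toy rollout: reward only at the final step, episode terminates afterwards.
rets = n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
adv = advantages([0.0, 0.0, 1.0], [0.25, 0.25, 0.5], bootstrap_value=0.0, gamma=0.5)
print(rets)  # [0.25, 0.5, 1.0]
print(adv)   # [0.0, 0.25, 0.5]
```

A positive advantage raises the probability of the action taken at that step, while the returns serve as regression targets for the value output.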
