Deep Reinforcement Learning-based Image Captioning with Embedding Reward

最新推荐文章于 2024-02-19 11:53:20 发布

算法学习者

最新推荐文章于 2024-02-19 11:53:20 发布

阅读量7.6k

点赞数

分类专栏： paper reading

本文链接：https://blog.csdn.net/AMDS123/article/details/70170274

版权

paper reading 专栏收录该内容

85 篇文章 0 订阅

订阅专栏

https://arxiv.org/abs/1704.03899

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.

算法学习者

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Deep Reinforcement Learning-based Image Captioning with Embedding Reward

https://arxiv.org/abs/1704.03899Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent a
复制链接

扫一扫