[Deep Learning Paper Notes][Attention] Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention

This post discusses the importance of introducing an attention mechanism into image captioning models. A traditional RNN decoder looks at the whole image only once, whereas the new approach uses attention to focus on the key objects in the image, reducing the computational burden. The attention model also lets us visualize the model's attention distribution, which is especially useful when an image contains multiple distracting elements. The paper studies hard, soft, and doubly stochastic attention variants, and its experiments show that the model can also attend to salient "non-object" regions, producing richer, more descriptive captions.
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." arXiv preprint arXiv:1502.03044 (2015). (Citations: 401)


1 Motivation

In previous image captioning models, the RNN decoder looks at the whole image only once. Moreover, the CNN encoder encodes fc7 representations, which distill the information in the image down to the most salient objects.


However, this has a potential drawback: information that could be useful for richer, more descriptive captions is lost. Using lower-level representations (conv4/conv5 features) can help preserve this information. Working with these features, however, necessitates an attention mechanism that learns to fix its gaze on salient objects while generating the corresponding words of the output sequence, which also reduces the computational burden. Another benefit of the attention model is the ability to visualize what the model "sees".
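To make the idea concrete, here is a minimal numpy sketch of soft attention over conv-feature annotation vectors, in the spirit of the paper: each of the L spatial locations gets a score from the current decoder state, the scores are softmax-normalized into weights, and the context vector is the weighted average of the annotations. The sizes and parameter names (`W_a`, `W_h`, `w`) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: L = 14*14 spatial locations of a conv feature map,
# D = feature depth, H = decoder hidden size.
L, D, H = 196, 512, 256

a = rng.standard_normal((L, D))   # annotation vectors a_i, one per location
h = rng.standard_normal(H)        # current decoder hidden state

# Illustrative attention parameters (names are assumptions).
W_a = rng.standard_normal((D, 128)) * 0.01
W_h = rng.standard_normal((H, 128)) * 0.01
w = rng.standard_normal(128) * 0.01

# Soft attention: score each location against the decoder state,
# normalize with a softmax, and take the expected annotation vector.
e = np.tanh(a @ W_a + h @ W_h) @ w   # (L,) unnormalized scores
alpha = np.exp(e - e.max())
alpha /= alpha.sum()                 # attention weights over locations, sum to 1
z = alpha @ a                        # (D,) context vector fed to the decoder

print(alpha.shape, z.shape)
```

Hard attention would instead sample a single location from `alpha` and use that annotation vector as the context, which makes the objective non-differentiable and requires REINFORCE-style training.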

