Show and Tell: A Neural Image Caption Generator
Year: 2015
Target
Automatically describe the content of an image
Difficulty
Image captioning requires not only identifying the objects in an image, but also expressing the relationships between them
Inspiration
- In machine translation with Recurrent Neural Networks (RNNs), an "encoder" RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which is in turn used as the initial hidden state of a "decoder" RNN that generates the target sentence
Contribution
- An end-to-end system for image captioning
Idea
- Replace the encoder from machine translation with a pre-trained CNN to extract image features, and use an LSTM as the decoder
- Word embeddings are used to represent the input words
- The loss function updates the encoder, decoder, and word embeddings jointly
- Beam search is used at inference time
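Beam search keeps the top-k partial captions ranked by cumulative log-probability, instead of greedily taking the most likely word at every step. A minimal sketch, assuming a hypothetical `step(seq)` function that stands in for one decoder step and returns `(token, probability)` continuations of a partial sequence:

```python
import math

def beam_search(step, start_token, end_token, beam_size=3, max_len=10):
    """Keep the `beam_size` highest-scoring partial captions.

    `step(seq)` is a stand-in for one LSTM decoding step: it returns
    a list of (token, probability) continuations for `seq`.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                finished.append((seq, score))  # beam is complete
            else:
                for tok, p in step(seq):
                    candidates.append((seq + [tok], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # prune to the k best
    # Collect beams that ended exactly at the length limit.
    for seq, score in beams:
        if seq[-1] == end_token:
            finished.append((seq, score))
    if not finished:
        finished = beams
    return max(finished, key=lambda c: c[1])[0]
```

With `beam_size=1` this reduces to greedy decoding; the paper reports using a beam of size 20.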
Model
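Shape-wise, the image feature from the CNN is fed once as the first LSTM input, subsequent inputs are embedded words, and at each step the hidden state is projected to a softmax over the vocabulary. A minimal NumPy sketch of one decoding step with random stand-in weights (the 4096-d feature size, the start-token id, and all parameter values here are assumptions for illustration, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, F = 1000, 512, 4096   # vocab size, embedding/LSTM dim, CNN feature dim (assumed)

# Random stand-in parameters; a real model learns these.
W_img = rng.normal(0, 0.01, (F, D))      # projects the CNN feature into the LSTM input space
E = rng.normal(0, 0.01, (V, D))          # word embedding table
W = rng.normal(0, 0.01, (2 * D, 4 * D))  # LSTM gates: [input, hidden] -> i, f, o, g
W_out = rng.normal(0, 0.01, (D, V))      # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One LSTM step (biases omitted for brevity)."""
    z = np.concatenate([x, h]) @ W
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

feat = rng.normal(size=F)                 # pretend CNN output for one image
h = c = np.zeros(D)
h, c = lstm_step(feat @ W_img, h, c)      # t = -1: the image is the first input
h, c = lstm_step(E[1], h, c)              # t = 0: <start> token (id 1, assumed)
logits = h @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over the vocabulary
```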
Loss function
In effect this is just the negative log-likelihood loss of classification over the vocabulary, summed over time steps
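Written out, for an image $I$ and caption $S = (S_1, \dots, S_N)$, the model minimizes the sum of negative log-probabilities of the correct word at each step:

```latex
L(I, S) = -\sum_{t=1}^{N} \log p_t\!\left(S_t \mid I, S_1, \dots, S_{t-1}\right)
```

where $p_t$ is the softmax over the vocabulary produced by the LSTM at step $t$.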
Evaluation Metrics
- subjective score
- BLEU
- perplexity
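Of these, perplexity is the simplest to compute: the exponential of the average per-word negative log-probability the model assigns to a reference caption. A small sketch (the probabilities below are made-up inputs, and the function name is my own):

```python
import math

def perplexity(token_probs):
    """Perplexity of one caption given the model's per-word
    probabilities: exp of the mean negative log-probability."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Lower is better; a model that assigned probability 1 to every word would score a perplexity of 1.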
Training Details
- Use an ImageNet-pretrained CNN to reduce overfitting
- Initializing the word embeddings from a large text corpus had no clear effect on overfitting, so this was not done
- Dropout and ensembling are used
- SGD with a fixed learning rate and no momentum
- The CNN weights are kept fixed
- Embedding dimension is 512, and so is the LSTM memory size
- Caption preprocessing: keep only words that occur more than five times
Terminology
- NIC: Neural Image Caption
- BLEU: an n-gram precision metric comparing generated captions against reference captions
- perplexity: the geometric mean of the inverse per-word probability under the model