Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge: Paper and TensorFlow Source Code Walkthrough

zhoujunr1

Published 2017-08-10 19:32:40


Article link: https://blog.csdn.net/zhoujunr1/article/details/77072599


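This post walks through the paper "Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge" and the CNN + LSTM image captioning model it describes. In the paper, a CNN extracts image features and an LSTM generates the sentence, using a sequence-to-sequence learning approach to maximize p(S|I). The source-code part covers data preprocessing, model construction, and evaluation, with the focus on model construction: the image decoder, the LSTM-based sentence generator, and the two inference strategies, sampling and beam search.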
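Beam search, one of the inference strategies mentioned above, can be illustrated without any framework. The sketch below is a simplified variant written for this post, not the im2txt code: at every step it keeps the `beam_size` highest-scoring partial captions, scoring each by its summed log-probability. `next_word_probs`, `toy_model`, and the integer word ids are hypothetical stand-ins for the LSTM's softmax output over the vocabulary.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the captioning model: given a partial caption
# (tuple of word ids), return {next_word_id: probability}.
NextWordFn = Callable[[Tuple[int, ...]], Dict[int, float]]


def beam_search(next_word_probs: NextWordFn, start_id: int, end_id: int,
                beam_size: int = 3,
                max_len: int = 20) -> List[Tuple[Tuple[int, ...], float]]:
    """Return captions sorted by total log-probability, best first.

    Captions that emit end_id are moved to `completed` and no longer expanded.
    If none finishes within max_len steps, the unfinished beams are returned.
    """
    beams = [((start_id,), 0.0)]  # (partial caption, sum of log-probs)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, p in next_word_probs(seq).items():
                if p > 0.0:
                    candidates.append((seq + (word,), score + math.log(p)))
        # Keep only the beam_size best extensions across all beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (completed if seq[-1] == end_id else beams).append((seq, score))
        if not beams:
            break
    if not completed:
        completed = beams
    return sorted(completed, key=lambda c: c[1], reverse=True)


# Hypothetical toy vocabulary: 0=<S>, 1="a", 2="dog", 3="cat", 4=</S>
def toy_model(seq: Tuple[int, ...]) -> Dict[int, float]:
    last = seq[-1]
    if last == 0:
        return {1: 0.9, 2: 0.1}
    if last == 1:
        return {2: 0.6, 3: 0.4}
    return {4: 1.0}


best = beam_search(toy_model, start_id=0, end_id=4, beam_size=3)
print(best[0])  # the highest-scoring caption is (0, 1, 2, 4)
```

With `beam_size=1` the same function reduces to greedy decoding, which is a convenient way to compare the two strategies on the same model.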


- Paper
  - Model Overview
    - Goal of the paper
    - Model
- Source code
  - Data preprocessing
  - build_model
    - build_inputs


1 Paper

“Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge.”

Full text available at: http://arxiv.org/abs/1609.06647

1.1 Model Overview

Goal of the paper

Describe the content of an image in English.

Indeed, a description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in. Moreover, the above semantic knowledge has to be expressed in a natural language like English, which means that a language model is needed in addition to visual understanding.

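The model maximizes $p(S|I)$, the probability of the correct sentence $S$ given the image $I$, where $S_i$ denotes the $i$-th word of the sentence.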
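The paper factors this probability with the chain rule over the words $S_0, \dots, S_N$: $\log p(S|I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \dots, S_{t-1})$, and training maximizes the sum of $\log p(S|I)$ over all (image, sentence) pairs, which is the same as minimizing its negative. A minimal sketch of that arithmetic, assuming the per-step softmax distributions are already available as plain lists (the probabilities and the helper names `caption_log_prob` and `mean_nll` are hypothetical and only for illustration):

```python
import math
from typing import Sequence


def caption_log_prob(step_probs: Sequence[Sequence[float]],
                     word_ids: Sequence[int]) -> float:
    """log p(S|I) = sum_t log p(S_t | I, S_0..S_{t-1}).

    step_probs[t] is the model's distribution over the vocabulary at step t
    (already conditioned on the image and the previous words);
    word_ids[t] is the ground-truth word S_t.
    """
    if len(step_probs) != len(word_ids):
        raise ValueError("need one distribution per target word")
    return sum(math.log(dist[w]) for dist, w in zip(step_probs, word_ids))


def mean_nll(step_probs: Sequence[Sequence[float]],
             word_ids: Sequence[int]) -> float:
    """Per-word negative log-likelihood, a typical training loss."""
    return -caption_log_prob(step_probs, word_ids) / len(word_ids)


# Hypothetical 3-word vocabulary, two decoding steps.
probs = [[0.5, 0.25, 0.25],
         [0.1, 0.8, 0.1]]
targets = [0, 1]
print(caption_log_prob(probs, targets))  # equals log(0.5 * 0.8) = log(0.4)
print(mean_nll(probs, targets))
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow for long sentences; averaging per word keeps the loss scale independent of sentence length.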