Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge (paper and TensorFlow source code walkthrough)
1 Paper
“Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge.”
Full text available at: http://arxiv.org/abs/1609.06647
1.1 Model Overview
Goal of the paper
Describe the content of an image in English.
Indeed, a description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in. Moreover, the above semantic knowledge has to be expressed in a natural language like English, which means that a language model is needed in addition to visual understanding.
Maximize p(S|I), where S_i denotes the i-th word of the sentence S and I denotes the image.
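Because S is a sequence of words, the probability p(S|I) can be broken down word by word with the chain rule. The following is a sketch of that factorization; N (the index of the last word) is notation assumed here for illustration.

```latex
\log p(S \mid I) = \sum_{i=0}^{N} \log p\left(S_i \mid I, S_0, \ldots, S_{i-1}\right)
```

Each term is the log-probability of one word given the image and all preceding words, which is what a recurrent decoder can model one step at a time.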
Essence:
sequence to sequence learning
Model
Overview: a CNN extracts the image features, and an LSTM generates one word at each time step.
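The overview above can be sketched in a few lines of plain Python. This is a minimal illustration, not the TensorFlow implementation: the CNN is replaced by a precomputed feature vector, the weights are random and untrained (so the generated word ids are meaningless), decoding is greedy, and the sketch assumes the image feature is fed to the LSTM once, before the first word. All function and parameter names (`lstm_step`, `greedy_caption`, `make_random_params`, `start_id`, `end_id`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps the concatenation [x, h] to four stacked gates."""
    H = len(h)
    z = [zi + bi for zi, bi in zip(matvec(W, x + h), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]    # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate cell update
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def greedy_caption(image_feature, params, start_id, end_id, max_len=20):
    """Generate word ids one per time step, always picking the most likely word."""
    W_img, W_embed, W_lstm, b_lstm, W_out = params
    H = len(W_out[0])
    h, c = [0.0] * H, [0.0] * H
    # Assumption: the projected image feature is the first (and only) image input.
    h, c = lstm_step(matvec(W_img, image_feature), h, c, W_lstm, b_lstm)
    word, caption = start_id, []
    for _ in range(max_len):
        # Feed the embedding of the previous word, then score the vocabulary.
        h, c = lstm_step(list(W_embed[word]), h, c, W_lstm, b_lstm)
        logits = matvec(W_out, h)
        word = max(range(len(logits)), key=logits.__getitem__)
        if word == end_id:
            break
        caption.append(word)
    return caption


def make_random_params(feat_dim, embed_dim, hidden_dim, vocab_size, seed=0):
    """Untrained random weights, for shape illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    W_img = mat(embed_dim, feat_dim)                     # image feature -> embedding space
    W_embed = mat(vocab_size, embed_dim)                 # one embedding row per word id
    W_lstm = mat(4 * hidden_dim, embed_dim + hidden_dim)
    b_lstm = [0.0] * (4 * hidden_dim)
    W_out = mat(vocab_size, hidden_dim)                  # hidden state -> word scores
    return W_img, W_embed, W_lstm, b_lstm, W_out
```

For example, `greedy_caption([1.0] * 8, make_random_params(8, 4, 5, 10), start_id=0, end_id=1)` returns a list of word ids; with trained weights and a real vocabulary these ids would be mapped back to words.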
Objective function: