by Cole Murray
Building an image caption generator with Deep Learning in Tensorflow
In my last tutorial, you learned how to create a facial recognition pipeline in Tensorflow with convolutional neural networks. In this tutorial, you’ll learn how a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) network can be combined to create an image caption generator, and how to generate captions for your own images.
Overview
- Introduction to Image Captioning Model Architecture
- Captions as a Search Problem
- Creating Captions in Tensorflow
Prerequisites
- Basic understanding of Convolutional Neural Networks
- Basic understanding of LSTM
- Basic understanding of Tensorflow
Introduction to image captioning model architecture
Combining a CNN and LSTM
In 2014, researchers from Google released a paper, Show And Tell: A Neural Image Caption Generator. At the time, this architecture was state-of-the-art on the MSCOCO dataset. It utilized a CNN + LSTM to take an image as input and output a caption.
Using a CNN for image embedding
A convolutional neural network can be used to create a dense feature vector. This dense vector, also called an embedding, can be used as feature input into other algorithms or networks.
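To make the idea of a dense feature vector concrete, here is a minimal sketch in plain NumPy rather than Tensorflow, so that it stays self-contained. It assumes the CNN's last convolutional layer has already produced feature maps of shape (height, width, channels). The function name `image_embedding`, the shapes, and the random stand-in weights are all hypothetical and chosen only for illustration; a real model would use trained weights.

```python
import numpy as np

def image_embedding(feature_maps, projection):
    """Collapse CNN feature maps of shape (H, W, C) into a dense vector of size D."""
    # Global average pooling: average each channel over all spatial positions.
    pooled = feature_maps.mean(axis=(0, 1))  # shape (C,)
    # Linear projection of the pooled channels into the embedding space.
    return pooled @ projection  # shape (D,)

rng = np.random.default_rng(0)
# Random stand-ins (assumptions): a 7x7x64 conv output and a 64 -> 32 projection.
feature_maps = rng.standard_normal((7, 7, 64))
projection = rng.standard_normal((64, 32)) * 0.1

embedding = image_embedding(feature_maps, projection)
print(embedding.shape)  # (32,)
```

The resulting vector has a fixed size regardless of where objects appear in the image, which is what makes it convenient as input to another network.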
For an image caption model, this embedding becomes a dense representation of the image and will be used as the initial state of the LSTM.
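The sketch below illustrates one way the image embedding could seed the LSTM, again in plain NumPy. It assumes the embedding has the same size as the LSTM hidden state, that it is placed in the hidden state while the cell state starts at zero, and that the first input is the vector for a start-of-caption token. These choices, the helper `lstm_step`, and the random weights are illustrative assumptions, not necessarily how the tutorial's model or the original paper wires things up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, X), U: (4H, H), b: (4H,), where H is the hidden size."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # pre-activations for all four gates
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell values
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(0)
hidden, word_dim = 32, 16

# Random stand-ins (assumptions) for the image embedding and the LSTM weights.
image_vec = rng.standard_normal(hidden)
W = rng.standard_normal((4 * hidden, word_dim)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

# The image embedding becomes the initial hidden state; the cell state starts at zero.
h, c = image_vec, np.zeros(hidden)

# Feed a (hypothetical) start-token vector as the first input.
start_word = rng.standard_normal(word_dim)
h, c = lstm_step(start_word, h, c, W, U, b)
print(h.shape, c.shape)  # (32,) (32,)
```

Because the image information enters through the state, every later step can condition on it while it consumes one word vector at a time.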
LSTM
An LSTM is a recurrent neural network architecture that is commonly used in problems with temporal dependences. It succeeds in being able to capture i