Building an image caption generator with Deep Learning in Tensorflow


Original article: https://www.freecodecamp.org/news/building-an-image-caption-generator-with-deep-learning-in-tensorflow-a142722e9b1f/

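Summary: This article shows how to combine a CNN and an LSTM in Tensorflow to build an image caption generator. The CNN produces an image embedding, which is used as the initial state of the LSTM language model, and beam search is used to improve the quality of the generated captions. The tutorial covers environment setup, downloading a pretrained model, vocabulary handling, and generating captions in Tensorflow.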

by Cole Murray


In my last tutorial, you learned how to create a facial recognition pipeline in Tensorflow with convolutional neural networks. In this tutorial, you’ll learn how a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) network can be combined to create an image caption generator and generate captions for your own images.

Overview

Introduction to Image Captioning Model Architecture
Captions as a Search Problem
Creating Captions in Tensorflow

Prerequisites

Basic understanding of Convolutional Neural Networks
Basic understanding of LSTM
Basic understanding of Tensorflow

Introduction to image captioning model architecture

Combining a CNN and LSTM

In 2014, researchers from Google released a paper, Show and Tell: A Neural Image Caption Generator. At the time, this architecture was state-of-the-art on the MSCOCO dataset. It used a CNN + LSTM to take an image as input and output a caption.

Using a CNN for image embedding

A convolutional neural network can be used to create a dense feature vector. This dense vector, also called an embedding, can be used as feature input into other algorithms or networks.
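To make the idea of an embedding concrete, here is a minimal sketch in plain Python. It is not the network used in this tutorial: it applies a single convolution layer with ReLU followed by global average pooling to a tiny made-up image, so that each filter contributes one number to the output vector. The function names (`conv2d_valid`, `embed`), the filters, and the image are all hypothetical and chosen only for illustration; a real CNN stacks many learned layers.

```python
# Toy illustration only: one convolution layer + ReLU + global average pooling.
# The image and filter values are made up; a real model learns its filters.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def embed(image, kernels):
    """Return one number per filter: the mean ReLU activation of its feature map."""
    embedding = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in feature_map for v in row]
        embedding.append(sum(activations) / len(activations))
    return embedding


# 4x4 grayscale image with a vertical edge between columns 1 and 2.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
kernels = [
    [[-1.0, 1.0], [-1.0, 1.0]],    # responds to dark-to-light vertical edges
    [[-1.0, -1.0], [1.0, 1.0]],    # responds to dark-to-light horizontal edges
    [[0.25, 0.25], [0.25, 0.25]],  # local average brightness
]

embedding = embed(image, kernels)
print(embedding)  # a 3-element dense vector, one entry per filter
```

The vertical-edge filter produces a clearly positive entry, while the horizontal-edge filter produces zero because the image has no horizontal edges. That is the sense in which the vector summarizes what the image contains.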

For an image caption model, this embedding becomes a dense representation of the image and will be used as the initial state of the LSTM.
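The sketch below shows one way an image embedding can become the LSTM's initial state. It assumes the embedding is fed through the cell once, starting from an all-zero state, and that the resulting state is then carried into the word-by-word steps. This mechanism is an assumption made for illustration, not something the text above specifies. The `TinyLSTMCell` class, its random untrained weights, and the example vectors are all hypothetical; in practice you would use Tensorflow's own LSTM implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Single LSTM cell with randomly initialised (untrained) weights."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and one bias vector per gate.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.GATES}

    def zero_state(self):
        """Return an all-zero (cell, hidden) state pair."""
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def _linear(self, gate, vec):
        return [
            sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def step(self, x, state):
        """Run one time step and return the new (cell, hidden) state pair."""
        c_prev, h_prev = state
        vec = list(x) + list(h_prev)
        i = [sigmoid(v) for v in self._linear("i", vec)]
        f = [sigmoid(v) for v in self._linear("f", vec)]
        o = [sigmoid(v) for v in self._linear("o", vec)]
        g = [math.tanh(v) for v in self._linear("g", vec)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return c, h


embedding_size, hidden_size = 3, 4
cell = TinyLSTMCell(embedding_size, hidden_size)

image_embedding = [0.67, 0.0, 0.5]  # made-up CNN output
start_word = [0.1, -0.2, 0.3]       # made-up embedding of a start-of-sentence token

# Feed the image once so the all-zero state becomes an image-conditioned state.
initial_state = cell.step(image_embedding, cell.zero_state())

# Each later step consumes a word embedding and carries the state forward.
state_after_first_word = cell.step(start_word, initial_state)
print(initial_state[1])
```

Because the state now depends on the image, every word the LSTM goes on to predict is conditioned on what the CNN saw, even though the image itself is only presented once.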

LSTM

An LSTM is a recurrent neural network architecture that is commonly used in problems with temporal dependences. It succeeds in being able to capture i
