Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge: paper and TensorFlow source code walkthrough (2)


zhoujunr1



Column: 读读论文 (Reading Papers). Tags: source code, tensorflow, inception, deep-learning

Article link: https://blog.csdn.net/zhoujunr1/article/details/77131416


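This article looks at how to use a pretrained Inception V3 model to extract image features and combine them with an LSTM to generate image captions. It walks through the TensorFlow training process, including model construction, learning-rate setup, and the training op.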


Source code

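Once the image and caption inputs have been built, this part of the code converts each image into a fixed-size tensor. As the paper describes, it takes a deep network that has already been trained on a very large dataset and uses it directly for feature extraction, without changing its parameters.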

First, the images are fed into the Inception v3 network to obtain its output. The code is as follows:

inception_output = image_embedding.inception_v3(
        self.images,
        trainable=self.train_inception,
        is_training=self.is_training())
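To make "without changing its parameters" concrete, here is a minimal sketch in plain Python of what a non-trainable flag amounts to during an update step. It is not the im2txt implementation: the function `sgd_step`, the parameter names, and the scalar "weights" are all hypothetical, chosen only to show that frozen parameters are skipped while the rest are updated.

```python
def sgd_step(params, grads, trainable, learning_rate):
    """Return updated parameters, leaving frozen ones untouched.

    params, grads: dicts mapping a (hypothetical) variable name to a float.
    trainable: dict mapping a variable name to True/False.
    """
    updated = {}
    for name, value in params.items():
        if trainable.get(name, True):
            # Ordinary gradient-descent update for trainable variables.
            updated[name] = value - learning_rate * grads[name]
        else:
            # Frozen variable: keep the pretrained value as-is.
            updated[name] = value
    return updated


# Hypothetical names: one pretrained (frozen) weight, one new (trainable) weight.
params = {"inception/conv_w": 1.0, "image_embedding/w": 1.0}
grads = {"inception/conv_w": 1.0, "image_embedding/w": 1.0}
trainable = {"inception/conv_w": False, "image_embedding/w": True}

new_params = sgd_step(params, grads, trainable, learning_rate=0.5)
print(new_params)
```

In the snippet above, the `trainable=self.train_inception` argument plays the role of the `trainable` flags here: when it is false, only the layers added on top of Inception are expected to change during training.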

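Let's first take a look at the Inception v3 model, which is described in the paper "Rethinking the Inception Architecture for Computer Vision". The inception_v3 function provided in the slim package directly returns the model described in that paper.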

Map inception output into embedding space.
Here the Inception output is used directly as the image feature, and it is passed through a fully connected layer to produce the embedding.

with tf.variable_scope("image_embedding") as scope:
  image_embeddings = tf.contrib
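As a rough illustration of what "passing the Inception output through a fully connected layer" computes, the sketch below applies a plain linear projection in pure Python. It assumes a layer with no bias and no activation, which may not match the settings in the original source; the helpers `make_dense_weights` and `project` and the tiny dimensions are hypothetical and are used only to keep the example readable.

```python
import random


def make_dense_weights(in_dim, out_dim, scale=0.08, seed=0):
    """Create an in_dim x out_dim weight matrix with small uniform random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(out_dim)]
            for _ in range(in_dim)]


def project(features, weights):
    """Map a batch of feature vectors into the embedding space.

    features: list of rows, each of length in_dim (e.g. CNN outputs).
    weights:  in_dim x out_dim matrix.
    Returns a list of rows, each of length out_dim (assumed: no bias, no activation).
    """
    in_dim, out_dim = len(weights), len(weights[0])
    embeddings = []
    for row in features:
        if len(row) != in_dim:
            raise ValueError("feature length does not match weight matrix")
        embeddings.append(
            [sum(x * weights[i][j] for i, x in enumerate(row))
             for j in range(out_dim)])
    return embeddings


# Toy sizes: a batch of 2 images, 4-dimensional features, 3-dimensional embedding.
toy_features = [[1.0, 0.0, 2.0, 0.5], [0.0, 1.0, 0.0, 1.0]]
toy_weights = make_dense_weights(in_dim=4, out_dim=3)
toy_embeddings = project(toy_features, toy_weights)
print(len(toy_embeddings), len(toy_embeddings[0]))  # batch size, embedding size
```

Whatever the batch size, each image ends up as one vector of the embedding size, so that it can be fed into the LSTM in the same space as the word embeddings.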
