Modifying the Show, Attend and Tell Code to Generate Captions for the Whole Test Set and Evaluate Them with coco-caption

1 Background

After computing the evaluation results for show and tell, the scores seemed suspiciously high, and I wondered whether the metric computation was at fault. I then remembered that a Show, Attend and Tell implementation I had run earlier reported very low scores, so I decided to compute its metrics the same way, by generating captions for the whole test set, and see whether that Show, Attend and Tell model's scores really are that low.

Show, Attend and Tell code (not the paper authors' implementation): https://github.com/yunjey/show-attend-and-tell

Looking at the evaluation part of the code, I found that the metrics are not computed over captions for every test image; instead a subset is sampled (with random sampling the resulting metrics should be fairly close to the full-set numbers anyway). So I decided to take the show and tell test split and generate captions for it with Show, Attend and Tell (the model was trained earlier, though only for a small number of steps; it is good enough to produce usable captions).

The Show, Attend and Tell code works as follows:

  • Run all images through a pretrained VGG19 model once, extract their feature maps, and save them to an hkl file
  • Only the downstream network is trained; it takes the saved feature maps as input and generates the image captions

So to adapt it, I need to:

  • Using the final_test.json obtained from show and tell (it contains the paths of all test-set images), feed every test image into the pretrained VGG19 model, extract the feature maps, and store them in a new hkl file
  • Modify the test code to feed the feature maps stored in that hkl file into the model, generate a caption for every image, and save the captions in the JSON format that coco-caption requires (see the example right after this list)
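For reference, the results file that coco-caption consumes is a flat JSON array with one image_id/caption pair per test image (the ids and captions below are hypothetical):

# expected layout of attend_test_captions.json (hypothetical values):
# [
#   {"image_id": 391895, "caption": "a man riding a motorcycle on a dirt road"},
#   {"image_id": 522418, "caption": "a woman cutting a cake with a knife"}
# ]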

2 Code Modifications

2.1 Generating VGG19 feature maps for the test-set images

Load final_test.json to get the paths of all test-set images:

from scipy import ndimage, misc
from collections import Counter
from core.vggnet import Vgg19
from core.utils import *
import matplotlib.pyplot as plt
from PIL import Image

import tensorflow as tf
import numpy as np
import pandas as pd
import hickle
import os
import json

final_test = './final_test.json'
final_test_paths = []
with tf.gfile.FastGFile(final_test, 'r') as f:
    final_test_paths = json.load(f)
    
print('path by curya')
for final_test_path in final_test_paths:
    # prepend the show-and-tell repo directory to get the full image path
    final_test_path['file_path'] = os.path.join('../show-and-tell', final_test_path['file_path'])
    print(final_test_path['id'], final_test_path['file_path'])
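For reference, final_test.json is assumed here to be a list of records, each holding an image id and a file path relative to the show-and-tell repo (the values below are hypothetical):

# assumed structure of final_test.json (hypothetical values):
# [
#   {"id": 391895, "file_path": "data/images/COCO_val2014_000000391895.jpg"},
#   ...
# ]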

Build the VGG19 model and load the pretrained parameters:

# pretrained VGG19 parameters
vgg_model_path = './data/imagenet-vgg-verydeep-19.mat'
vggnet = Vgg19(vgg_model_path)
vggnet.build()
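As a quick sanity check (assuming the repo's Vgg19 wrapper, whose features tensor is the conv5_3 feature map with its 14x14 spatial grid flattened), the shapes should match the (196, 512) layout used below:

# each image yields a 14x14x512 conv5_3 map, flattened to (196, 512)
print(vggnet.images.get_shape())    # (?, 224, 224, 3)
print(vggnet.features.get_shape())  # (?, 196, 512)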

Read the image data, feed it through the pretrained VGG19 model, and save the feature maps:

This part follows show-attend-and-tell/prepro.py: images are processed in batches of batch_size = 90, each image is resized to 224x224, and the batch is stacked into a (batch_size, 224, 224, 3) array. Finally, the feature maps of all test images are saved to final_test.features.hkl.

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    save_path = './final_test.features.hkl'
    n_example = len(final_test_paths)
    batch_size = 90
    # preallocate storage for the feature maps of all test images
    all_features = np.ndarray([n_example, 196, 512], dtype=np.float32)
    for start in range(0, n_example, batch_size):
        # the last batch simply runs up to n_example, so no image is dropped
        end = min(start + batch_size, n_example)
        # paths of the images in this batch
        image_batch_path = final_test_paths[start:end]
        # open the batch's images, resize them to 224x224, and stack them
        # into a (batch_size, 224, 224, 3) array
        image_batch = np.array(list(map(lambda x: misc.imresize(ndimage.imread(x['file_path'], mode='RGB'), size=(224, 224)),
                                        image_batch_path))).astype(np.float32)
        # run the batch through VGG19 to get its feature maps
        features = sess.run(vggnet.features, feed_dict={vggnet.images: image_batch})
        # copy the batch's feature maps into their slice of all_features
        all_features[start:end, :] = features
        print("Processed %d final test features.." % end)
    # dump all feature maps to the hkl file
    hickle.dump(all_features, save_path)
    print('Saved %s..' % save_path)
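Before moving on, it is worth loading the dump back once to confirm it round-trips; a minimal check:

# reload the dump and confirm it covers every test image
feats = hickle.load('./final_test.features.hkl')
print(feats.shape)    # should be (n_example, 196, 512)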

2.2 Generating and saving captions for all images from final_test.features.hkl

This part is based on show-attend-and-tell/evaluate_model.ipynb.

Import the required modules and load the data:

import matplotlib.pyplot as plt
import cPickle as pickle
import tensorflow as tf
import numpy as np
import hickle
import time
import json
import os

from core.solver import CaptioningSolver
from core.model import CaptionGenerator

# load the feature maps generated earlier for all test-set images
feature_path = './final_test.features.hkl'
data = {}
start_t = time.time()
data['features'] = hickle.load(feature_path)
end_t = time.time()
print('time: %.2f' % (end_t-start_t))
# load word_to_idx (open in binary mode for pickle)
with tf.gfile.FastGFile('./data/train/word_to_idx.pkl', 'rb') as f:
    word_to_idx = pickle.load(f)

# load the test-set image path file; it also contains every image's id,
# which is needed when saving the generated captions to a JSON file
image_path_file = './final_test.json'
image_paths_file = []
with tf.gfile.FastGFile(image_path_file, 'r') as f:
    image_paths_file = json.load(f)

image_ids = []
image_paths = []
for image_path in image_paths_file:
    image_path['file_path'] = os.path.join('../show-and-tell', image_path['file_path'])
    image_ids.append(image_path['id'])
    image_paths.append(image_path['file_path'])

# organize the data the model will use
data['image_ids'] = image_ids
data['image_paths'] = image_paths
print(len(data['features']))
print(len(data['image_ids']))
print(len(data['image_paths']))
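Since captions are matched back to image ids purely by position, the three lists must stay aligned one-to-one; a simple assertion makes that explicit:

# features, image_ids and image_paths are parallel arrays: index i must
# refer to the same image in all three
assert len(data['features']) == len(data['image_ids']) == len(data['image_paths'])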

Build the Show, Attend and Tell model:

This part can be kept the same as in show-attend-and-tell/evaluate_model.ipynb. Since the model is only used for inference here, many of the parameters are actually unused:

'''
model = CaptionGenerator(word_to_idx, dim_feature=[196, 512], dim_embed=512,
                         dim_hidden=1024, n_time_step=16, prev2out=True, 
                         ctx2out=True, alpha_c=1.0, selector=True, dropout=True)
solver = CaptioningSolver(model, data, data, n_epochs=15, batch_size=128, update_rule='adam',
                          learning_rate=0.0025, print_every=2000, save_every=1, image_path='./image/val2014_resized',
                          pretrained_model=None, model_path='./model/lstm', test_model='./model/lstm/model-200',
                          print_bleu=False, log_path='./log/')
'''
# build the captioning network; dim_feature specifies the shape of each image feature map
model = CaptionGenerator(word_to_idx, dim_feature=[196, 512], dim_embed=512, dim_hidden=1024)
# used only for caption generation here
solver = CaptioningSolver(model, data, data, test_model='./model/lstm/model-200', log_path='./log/')

Generate the image captions and save them:

solver.curyaEvaluation(data)

I defined a new method, curyaEvaluation(), in the CaptioningSolver class in show-attend-and-tell/core/solver.py. It generates a caption for every image and saves the results to attend_test_captions.json; the code is based on the class's test method.

# note: curyaEvaluation calls json.dump, so solver.py additionally needs
# `import json` at the top (the repo's solver.py already imports numpy,
# tensorflow, matplotlib, scipy.ndimage, skimage.transform and the utils helpers)
class CaptioningSolver(object):
    def __init__(self, model, data, val_data, **kwargs):
        ...

    def train(self):
        ...

    def test(self, data, split='train', attention_visualization=True, save_sampled_captions=True):
        ...

    # code by curya, 2019-04-23
    def curyaEvaluation(self, data, show=False):
        features = data['features']

        # build a graph to sample captions
        # tf.reset_default_graph()
        alphas, betas, sampled_captions = self.model.build_sampler(max_len=20)    # (N, max_len, L), (N, max_len)

        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        with tf.Session(config=config) as sess:
            saver = tf.train.Saver()
            saver.restore(sess, self.test_model)
            
            # feed in all test features 
            n_example = data['features'].shape[0]
            batch_size_curya = 90
            attend_test_captions = []
            
            for start in range(0, n_example, batch_size_curya):
                # the last batch simply runs up to n_example, so no image is dropped
                end = min(start + batch_size_curya, n_example)
                # feature maps of this batch of test images
                features_batch = features[start:end]
                print(features_batch.shape)
                # slice the plain list so the ids stay JSON-serializable Python ints
                image_ids = data['image_ids'][start:end]
                # the model input is just the batch of feature maps
                feed_dict = {self.model.features: features_batch}
                # run the sampler to obtain this batch's captions
                alps, bts, sam_cap = sess.run([alphas, betas, sampled_captions], feed_dict)
                decoded = decode_captions(sam_cap, self.model.idx_to_word)
                # pair each caption with its image id and append it to attend_test_captions
                for i in range(len(decoded)):
                    tmp_captions = {'image_id': image_ids[i], 'caption': decoded[i]}
                    attend_test_captions.append(tmp_captions)
                print('generated captions from %d to %d' % (start, end))
            # save all image ids and captions to a JSON file
            with tf.gfile.FastGFile('./attend_test_captions.json', 'w') as f:
                json.dump(attend_test_captions, f, ensure_ascii=False)
                print('saved to attend_test_captions.json')
            # adapted from the test method: randomly pick batch_size images
            # and fetch their feature maps and file paths
            def curya_minibatch(data, batch_size):
                data_size = data['features'].shape[0]
                mask = np.random.choice(data_size, batch_size)
                features = np.array(data['features'])[mask]
                image_paths = np.array(data['image_paths'])[mask]
                return features, image_paths            
            # randomly pick a few of the test images for visualization
            if show:
                #features_batch, image_files = sample_coco_minibatch(data, self.batch_size)
                features_batch, image_files = curya_minibatch(data, self.batch_size)
                feed_dict = { self.model.features: features_batch }
                alps, bts, sam_cap = sess.run([alphas, betas, sampled_captions], feed_dict) 
                decoded = decode_captions(sam_cap, self.model.idx_to_word)
                print('Curya Test!')
                for n in range(10):
                    print "Sampled Caption: %s" %decoded[n]

                    # Plot original image
                    img = ndimage.imread(image_files[n])
                    plt.subplot(4, 5, 1)
                    plt.imshow(img)
                    plt.axis('off')

                    # Plot images with attention weights
                    words = decoded[n].split(" ")
                    print(decoded[n])
                    for t in range(len(words)):
                        if t > 18:
                            break
                        plt.subplot(4, 5, t+2)
                        plt.text(0, 1, '%s(%.2f)' % (words[t], bts[n, t]),
                                 color='black', backgroundcolor='white', fontsize=8)
                        plt.imshow(img)
                        alp_curr = alps[n,t,:].reshape(14,14)
                        alp_img = skimage.transform.pyramid_expand(alp_curr, upscale=16, sigma=20)
                        plt.imshow(alp_img, alpha=0.85)
                        plt.axis('off')
                    plt.show()
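For reference, decode_captions (from the repo's core/utils.py) maps the sampled word indices back to strings, stopping at the <END> token and skipping <NULL> padding. Roughly, for a (N, max_len) index array it behaves like this sketch:

# rough sketch of decode_captions for a 2-D array of word indices
def decode_captions_sketch(captions, idx_to_word):
    decoded = []
    for row in captions:
        words = []
        for idx in row:
            word = idx_to_word[idx]
            if word == '<END>':
                break
            if word != '<NULL>':
                words.append(word)
        decoded.append(' '.join(words))
    return decoded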

3 Computing caption metrics with coco-caption

The previous step produced attend_test_captions.json; as with show-and-tell, the metrics can be computed directly with coco-caption's example code.
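A minimal sketch of that scoring step, assuming the standard coco-caption API (the ground-truth annotation path below is a placeholder for your own split's file):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# ground-truth COCO-format annotations (path is hypothetical)
coco = COCO('./annotations/captions_val2014.json')
# load the captions generated in section 2.2
coco_res = coco.loadRes('./attend_test_captions.json')
coco_eval = COCOEvalCap(coco, coco_res)
# score only the images we actually captioned
coco_eval.params['image_id'] = coco_res.getImgIds()
coco_eval.evaluate()
for metric, score in coco_eval.eval.items():
    print('%s: %.3f' % (metric, score))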
The results are as follows (close to the metrics the code computes on its own sampled subset, indicating that the low scores really do reflect model quality):

CIDEr: 0.436
Bleu_4: 0.135
Bleu_3: 0.213
Bleu_2: 0.345
Bleu_1: 0.544
ROUGE_L: 0.405
METEOR: 0.172
SPICE: 0.101

Example captions with low CIDEr scores and the distribution of CIDEr scores are shown below:
[Figures: example captions with low CIDEr scores, and the distribution of CIDEr scores]

4 Miscellaneous

When generating the image feature maps with the VGG19 model, the images were read and resized with scipy.

To be continued...
