Testing image captioning on TensorFlow 1.0 (model/im2txt)

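Automatic image captioning is a classic multimodal task and has long been a cutting-edge technology, drawing on both computer vision and natural language processing.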


Fortunately, Google has open-sourced an implementation built on TensorFlow: https://github.com/tensorflow/models/tree/master/im2txt#generating-captions


The project describes the model as follows:

The Show and Tell model is a deep neural network that learns how to describe the content of images.


Its architecture is described as follows:

The Show and Tell model is an example of an encoder-decoder neural network. It works by first "encoding" an image into a fixed-length vector representation, and then "decoding" the representation into a natural language description.

The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state-of-the-art for object recognition and detection. Our particular choice of network is the Inception v3 image recognition model pretrained on the ILSVRC-2012-CLS image classification dataset.

The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.

Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed-length vector representation that is learned during training.

The following diagram illustrates the model architecture.

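In short, the architecture combines Inception v3 with an LSTM: the image's embedding vector and the word embeddings of its caption are fed into the model together. (See the GitHub page above for the full details of the model; it is a classic design.)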
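To make the data flow concrete, here is a toy sketch of the decoding side in plain Python. It is not the im2txt code: the weights are random, the sizes are tiny, the names (`lstm_step`, `greedy_caption`, `START_ID`, `END_ID`) are made up for illustration, and it uses greedy decoding where a real captioner would normally use beam search. It only shows the order of operations: the image embedding goes through the LSTM first, then word embeddings are fed one at a time.

```python
import math
import random

random.seed(0)
EMBED, HIDDEN, VOCAB = 4, 6, 5   # toy sizes, far smaller than a real model
START_ID, END_ID = 0, 1          # hypothetical ids for the start/end tokens

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Random stand-ins for trained parameters: one matrix per LSTM gate
# (i = input, g = candidate, f = forget, o = output).
GATES = {name: rand_matrix(HIDDEN, EMBED + HIDDEN) for name in "igfo"}
WORD_EMBEDDINGS = rand_matrix(VOCAB, EMBED)
W_OUT = rand_matrix(VOCAB, HIDDEN)

def lstm_step(x, h, c):
    """One LSTM step on input x with previous hidden state h and cell state c."""
    xh = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(v) for v in matvec(GATES["i"], xh)]
    g = [math.tanh(v) for v in matvec(GATES["g"], xh)]
    f = [sigmoid(v) for v in matvec(GATES["f"], xh)]
    o = [sigmoid(v) for v in matvec(GATES["o"], xh)]
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def greedy_caption(image_embedding, max_len=10):
    """Feed the image embedding first, then pick the best word at each step."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    h, c = lstm_step(image_embedding, h, c)   # "encode" step: image goes in first
    word, caption = START_ID, []
    for _ in range(max_len):
        h, c = lstm_step(WORD_EMBEDDINGS[word], h, c)
        logits = matvec(W_OUT, h)
        word = max(range(VOCAB), key=logits.__getitem__)
        if word == END_ID:
            break
        caption.append(word)
    return caption
```

In the real model the image embedding would come from Inception v3 and the word ids would be mapped back to words through the vocabulary file.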


2. Experimental test

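For the experiment I used a model that had already been trained. Because the experiment runs on TensorFlow 1.0, however, a few pitfalls have to be dealt with first: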


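(1) Clean up word_counts.txt: every entry of the form b' str' has to become plain str, i.e. the b prefix and the surrounding quotes must be removed from each word.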
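The clean-up in item (1) can be scripted. The sketch below assumes each line of the file holds one word written as a bytes literal followed by its count (for example b'a' 969108); that layout is an assumption, and the helper names `clean_line` and `clean_file` are made up for this example.

```python
import re

# A word written as a Python bytes literal (b'...' or b"..."), then its count.
# The exact line layout is an assumption about the file.
_BYTES_LITERAL = re.compile(r"""^b(['"])(.*)\1\s+(\d+)$""")

def clean_line(line):
    """Turn "b'word' 123" into "word 123"; leave other lines unchanged."""
    stripped = line.strip()
    match = _BYTES_LITERAL.match(stripped)
    if match is None:
        return stripped
    return "{} {}".format(match.group(2), match.group(3))

def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, one entry per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line) + "\n")
```

Writing to a separate output file keeps the original word_counts.txt intact in case the pattern misses some entries.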


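(2) Rename the variables in the pretrained model. The variable names stored in the pretrained checkpoint do not match the names TensorFlow 1.0 expects, so they have to be changed.

In the code, add a function that renames the variables and saves the checkpoint again: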

# The LSTM variable names differ between TensorFlow versions, so the
# pretrained checkpoint is re-saved under the names TensorFlow 1.0 expects.
import tensorflow as tf

def RenameCkpt():
    # Old variable name -> new variable name.
    vars_to_rename = {
        "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
        "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
    }
    new_checkpoint_vars = {}
    # FLAGS is assumed to be the flags object of the surrounding inference script.
    reader = tf.train.NewCheckpointReader(FLAGS.checkpoint_path)
    for old_name in reader.get_variable_to_shape_map():
        # Variables that are not listed above keep their original name.
        new_name = vars_to_rename.get(old_name, old_name)
        new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

    init = tf.global_variables_initializer()
    saver = tf.train.Saver(new_checkpoint_vars)

    with tf.Session() as sess:
        sess.run(init)
        saver.save(sess, "/home/ndscbigdata/work/change/tf/gan/im2txt/ckpt/newmodel.ckpt-2000000")
    print("checkpoint file rename successful... ")


The experiment itself:

(1) Set a few parameters by hand

FLAGS.checkpoint_path = "/home/ndscbigdata/work/change/tf/gan/im2txt/ckpt/newmodel.ckpt-2000000"
FLAGS.vocab_file = "./data/volab.txt"
FLAGS.input_files = "./data/COCO_val2014_000000224477.jpg,./data/ep271.jpg,./data/dog.jpg"
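FLAGS.input_files above is a single comma-separated string. As a rough illustration of how such a value can be turned into a list of image paths, the sketch below splits on commas and expands each piece as a glob pattern using the standard library. The function name `expand_input_files` is made up here, and whether the inference script expands the flag in exactly this way is an assumption.

```python
import glob

def expand_input_files(input_files):
    """Split a comma-separated string of file patterns and expand each one."""
    filenames = []
    for pattern in input_files.split(","):
        # Each piece may be a plain path or a shell-style pattern like *.jpg.
        filenames.extend(sorted(glob.glob(pattern.strip())))
    return filenames
```

Paths that do not exist simply expand to nothing, so an empty result is a quick hint that a file name in the flag is misspelled.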


(2) Test images and results

Captions for image COCO_val2014_000000224477.jpg:
  0) a man riding a wave on top of a surfboard . (p=0.035672)
  1) a person riding a surf board on a wave (p=0.016238)
  2) a man on a surfboard riding a wave . (p=0.010146)



Captions for image ep271.jpg:
  0) a woman is standing next to a horse . (p=0.000759)
  1) a woman is standing next to a horse (p=0.000647)
  2) a woman is standing next to a brown horse . (p=0.000384)



Captions for image dog.jpg:
  0) a dog is eating a slice of pizza . (p=0.000138)
  1) a dog is eating a slice of pizza on a plate . (p=0.000047)
  2) a dog is sitting at a table with a pizza on it . (p=0.000039)


Note: the last image is one of Google's classic demo pictures, and the captions generated for it are quite satisfying.
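The probabilities in the output above look very small, which is expected if each value is the probability of the whole word sequence, that is, the product of the per-word probabilities. The sketch below illustrates that reading with made-up per-word probabilities. The function name `caption_probability` is hypothetical, and the interpretation of the printed value is an assumption rather than something taken from the script.

```python
import math

def caption_probability(word_probs):
    """Joint probability of a caption from its per-word probabilities.

    Summing logs and exponentiating once avoids underflow on long captions.
    """
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(log_prob)

# Made-up example: ten tokens, each predicted with probability 0.5.
# Even with such confident predictions the joint probability is below 0.001.
example = caption_probability([0.5] * 10)
```

This also explains why longer captions in the same list tend to get lower scores than shorter ones with similar wording.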


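Unfortunately my hardware is too limited; otherwise the model could be trained with Inception v4 as the encoder, which would probably give better results. Generating Chinese captions is another direction worth trying.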


The modified source code will be published on my GitHub; feel free to download it. https://github.com/ndscigdata
