TensorFlow: Batch Training of Images at Different Scales (Without Resize)

Part 1: Why I'm Writing This

I have often wanted to write something to record my learning journey and the problems I ran into along the way. But once I had actually solved a problem, I would tell myself that since I already knew the answer, writing it down was pointless; add laziness to that, and I never managed to start. I now intend to gradually overcome these habits and record my thoughts as I go, both so I can look things up later and so we can discuss them together.

Part 2: What This Article Covers

Many people working on deep learning use TensorFlow, and its popularity and strengths need no introduction here. Most people doing computer vision (CV) have trained convolutional neural networks (CNNs), and the usual practice is to resize all images to the same height and width and batch them into tensors of shape (batch_size, height, width, channel): the mini-batch size, the image height (number of rows), the image width (number of columns), and the number of channels. With TensorFlow's data-reading mechanisms, such as the built-in queue mechanism or the Dataset API, reading images in batches requires first resizing every image to a common height and width; otherwise a whole batch cannot be fed into the network in one pass. Figure 2-1 shows this usual setup.

Figure 2-1: The typical classification network setup
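
For reference, a minimal sketch of that usual fixed-size pipeline, written with the Dataset API, might look like the following. The helper name, the 224x224 target size, and the PNG decoding are illustrative assumptions, not code from this article:

import tensorflow as tf

def fixed_size_dataset(filenames, labels, batch_size, height=224, width=224):
    # Every image is resized to the same height and width before batching;
    # this resize step is exactly what the rest of this article tries to avoid.
    def _parse(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_png(image_string, channels=3)
        image_resized = tf.image.resize_images(image_decoded, [height, width])
        return tf.cast(image_resized, tf.float32), label

    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    return dataset.map(_parse).shuffle(1000).batch(batch_size).repeat()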

But what if we face a situation where images at different scales have to be fed into the network for training? What then?

The simplest option is to read just one image at a time, and plenty of methods do exactly that, for example the original Caffe version of Faster R-CNN. But if, like me, you cannot leave it at that, batch training of images with different heights and widths is certainly not impossible. Below I describe one idea of my own (verified to train normally).

Part 3: Batch Training of Images with Different Heights and Widths (Without Resize)

There are quite a few workable approaches; since my time (and, honestly, my diligence) is limited, this article presents just one, and a fairly simple one. The whole process is shown in Figure 3-1. The idea is to open several input pipelines, each of which reads a single image at a time; the number of pipelines equals the batch size. Figure 3-1 shows two pipelines, each independent of the other and reading its own images. The upper branch and the lower branch have identical network structures and share parameters (this is the key point; otherwise you would have two separate models instead of one. Siamese networks may come to mind.) Because the convolutional and fully connected layers share parameters, the two branches are two subgraphs of the same TensorFlow graph that together represent a single network model; each subgraph receives images from its own input pipeline, so the one model takes two image inputs at a time (batch_size = 2). The upper branch computes one loss and the lower branch computes another; averaging the two gives the final loss for a batch size of 2, and optimizing the network parameters against this averaged loss achieves batch training of images at different scales.

Figure 3-1: The idea behind batch training of images at different scales
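
Before the full code, here is a minimal, self-contained sketch of the core idea, assuming nothing beyond a TF 1.x environment. For brevity it shares weights through tf.variable_scope with reuse, whereas the actual code in Part 4 shares them by looking tensors up by name; the tiny toy network and all names below are placeholders, not the article's model:

import tensorflow as tf

def tiny_net(x, reuse):
    # Toy stand-in for the real model: the second call reuses the variables
    # created by the first call, so both branches share parameters.
    with tf.variable_scope('shared_net', reuse=reuse):
        conv = tf.layers.conv2d(x, 16, 3, padding='same', activation=tf.nn.relu, name='conv')
        feat = tf.reduce_mean(conv, axis=[1, 2])  # global average pooling -> fixed-length feature
        return tf.layers.dense(feat, 10, name='fc')

# Each branch accepts images of arbitrary height and width (one image per branch).
images1 = tf.placeholder(tf.float32, [None, None, None, 3])
images2 = tf.placeholder(tf.float32, [None, None, None, 3])
labels1 = tf.placeholder(tf.int64, [None])
labels2 = tf.placeholder(tf.int64, [None])

logits1 = tiny_net(images1, reuse=False)  # creates the shared variables
logits2 = tiny_net(images2, reuse=True)   # reuses them for the second branch

loss1 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels1, logits=logits1))
loss2 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels2, logits=logits2))
loss = (loss1 + loss2) / 2.0              # averaged loss, i.e. an effective batch size of 2
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)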

Part 4: Code Implementation

This article uses CIFAR-10 classification as the example. First, extract the CIFAR-10 images, randomly pick two numbers from the list [32, 64, 128], and resize each original image accordingly, producing training images of different sizes. This is simple image processing, just a few lines of Python, so I will not dwell on it.
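
That preprocessing script is not included in this article; a rough sketch of what it could look like is below. The cifar10_png directory layout, the output paths, and the file names are assumptions; the label file simply matches the "image_name label" format that the data-reading code in step 2 expects:

import os
import random
import cv2

# Hypothetical layout: cifar10_png/<class_id>/<image>.png, where <class_id> is 0..9.
sizes = [32, 64, 128]
src_root, dst_root = 'cifar10_png', 'data_different_size'
if not os.path.exists(dst_root):
    os.makedirs(dst_root)
with open('train_labels.txt', 'w') as f:
    for class_id in os.listdir(src_root):
        for name in os.listdir(os.path.join(src_root, class_id)):
            img = cv2.imread(os.path.join(src_root, class_id, name))
            # Pick two numbers from [32, 64, 128] as the new height and width.
            h, w = random.choice(sizes), random.choice(sizes)
            out_name = '%s_%s' % (class_id, name)
            cv2.imwrite(os.path.join(dst_root, out_name), cv2.resize(img, (w, h)))
            f.write('%s %s\n' % (out_name, class_id))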

1) First, the network model (the code has not been carefully tidied up, so it is a bit messy):

#coding:utf-8
import tensorflow as tf
import numpy as np
from functools import reduce
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.training import moving_averages

UPDATE_OPS_COLLECTION = 'resnet_update_ops'
VGG_MEAN = [103.939, 116.779, 123.68]


class vgg16:
    def __init__(self, vgg16_npy_path=None, trainable=True, dropout=0.5, count=1):
        if vgg16_npy_path is not None:
            self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item()
        else:
            self.data_dict = None
        self.names = globals()
        self.var_dict = {}
        self.trainable = trainable
        self.dropout = dropout
        self.NUM_CLASSES = 10
        self.count = count

    def build(self, rgb, train_mode=None):
        print('\nbuild network...')
        rgb_scaled = rgb * 1.0
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
        '''
        assert red.get_shape().as_list()[1:] == [224, 224, 1]
        assert green.get_shape().as_list()[1:] == [224, 224, 1]
        assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        '''
        bgr = tf.concat(axis=3, values=[blue - VGG_MEAN[0], green - VGG_MEAN[1], red - VGG_MEAN[2]])
        '''
        assert bgr.get_shape().as_list()[1:] == [224, 224, 3]
        '''
        self.conv1_1 = self.conv_layer(bgr, 3, 64, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, 64, 64, "conv1_2")
        self.pool1 = self.max_pool(self.conv1_2, 'pool1')
        self.conv2_1 = self.conv_layer(self.pool1, 64, 128, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, 128, 128, "conv2_2")
        self.pool2 = self.max_pool(self.conv2_2, 'pool2')
        self.conv3_1 = self.conv_layer(self.pool2, 128, 256, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, 256, 256, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, 256, 256, "conv3_3")
        self.pool3 = self.max_pool(self.conv3_3, 'pool3')
        self.conv4_1 = self.conv_layer(self.pool3, 256, 512, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, 512, 512, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, 512, 512, "conv4_3")
        self.pool4 = self.max_pool(self.conv4_3, 'pool4')
        self.conv5_1 = self.conv_layer(self.pool4, 512, 512, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, 512, 512, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, 512, 512, "conv5_3")
        self.pool5 = self.max_pool(self.conv5_3, 'pool5')
        # Global average pooling: collapses the spatial dimensions, so inputs of
        # any height and width end up as a fixed-length 512-d feature vector.
        self.pool5 = tf.reduce_mean(self.pool5, [1, 2])
        self.fc6 = self.fc_layer(self.pool5, 512, 512, "fc_6")
        self.relu6 = tf.nn.relu(self.fc6)
        if train_mode is not None:
            self.relu6 = tf.cond(train_mode, lambda: tf.nn.dropout(self.relu6, self.dropout), lambda: self.relu6)
        elif self.trainable:
            self.relu6 = tf.nn.dropout(self.relu6, self.dropout)
        self.fc7 = self.fc_layer(self.relu6, 512, 512, "fc_7")
        self.relu7 = tf.nn.relu(self.fc7)
        if train_mode is not None:
            self.relu7 = tf.cond(train_mode, lambda: tf.nn.dropout(self.relu7, self.dropout), lambda: self.relu7)
        elif self.trainable:
            self.relu7 = tf.nn.dropout(self.relu7, self.dropout)
        self.fc8 = self.fc_layer(self.relu7, 512, self.NUM_CLASSES, 'fc8_pred')
        self.pred = tf.nn.softmax(self.fc8, name='pred')

    def avg_pool(self, bottom, name):
        return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def conv_layer(self, bottom, in_channels, out_channels, name):
        if self.count == 1:
            # First branch: create the variables.
            with tf.variable_scope(name):
                filt, conv_biases = self.get_conv_var(3, in_channels, out_channels, name)
                conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
                bias = tf.nn.bias_add(conv, conv_biases)
                relu = tf.nn.relu(bias)
                return relu
        else:
            # Subsequent branches: reuse the variables created by the first branch.
            filt = tf.get_default_graph().get_tensor_by_name(name + '/' + name + '_filters:0')
            conv_biases = tf.get_default_graph().get_tensor_by_name(name + '/' + name + '_biases:0')
            conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
            bias = tf.nn.bias_add(conv, conv_biases)
            relu = tf.nn.relu(bias)
            return relu

    def fc_layer(self, bottom, in_size, out_size, name):
        if self.count == 1:
            with tf.variable_scope(name):
                weights, biases = self.get_fc_var(in_size, out_size, name)
                x = tf.reshape(bottom, [-1, in_size])
                fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
                return fc
        else:
            weights = tf.get_default_graph().get_tensor_by_name(name + '/' + name + '_weights:0')
            biases = tf.get_default_graph().get_tensor_by_name(name + '/' + name + '_biases:0')
            x = tf.reshape(bottom, [-1, in_size])
            fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
            return fc

    def get_conv_var(self, filter_size, in_channels, out_channels, name):
        initial_value = tf.truncated_normal([filter_size, filter_size, in_channels, out_channels], 0.0, 0.001)
        filters = self.get_var(initial_value, name, 0, name + "_filters")
        initial_value = tf.truncated_normal([out_channels], .0, .001)
        biases = self.get_var(initial_value, name, 1, name + "_biases")
        return filters, biases

    def get_fc_var(self, in_size, out_size, name):
        initial_value = tf.truncated_normal([in_size, out_size], 0.0, 0.001)
        weights = self.get_var(initial_value, name, 0, name + "_weights")
        initial_value = tf.truncated_normal([out_size], .0, .001)
        biases = self.get_var(initial_value, name, 1, name + "_biases")
        return weights, biases

    def get_var(self, initial_value, name, idx, var_name):
        if self.data_dict is not None and name in self.data_dict:
            value = self.data_dict[name][idx]
        else:
            value = initial_value
        if self.trainable:
            var = tf.Variable(value, name=var_name)
        else:
            var = tf.constant(value, dtype=tf.float32, name=var_name)
        self.var_dict[(name, idx)] = var
        assert var.get_shape() == initial_value.get_shape()
        return var

    def save_npy(self, sess, npy_path="./vgg16-save.npy"):
        assert isinstance(sess, tf.Session)
        data_dict = {}
        for (name, idx), var in list(self.var_dict.items()):
            var_out = sess.run(var)
            if name not in data_dict:
                data_dict[name] = {}
            data_dict[name][idx] = var_out
        np.save(npy_path, data_dict)
        print(("file saved", npy_path))
        return npy_path

    def get_var_count(self):
        count = 0
        for v in list(self.var_dict.values()):
            count += reduce(lambda x, y: x * y, v.get_shape().as_list())
        return count
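
Two details make this model work with inputs of any size and with shared branches. First, tf.reduce_mean(self.pool5, [1, 2]) is a global average pooling step: it collapses the spatial dimensions, so fc_6 always receives a 512-dimensional vector regardless of the input height and width. Second, the count argument controls sharing: the first instance (count=1) creates the variables inside tf.variable_scope(name), while any later instance fetches those same tensors with get_tensor_by_name. A minimal usage sketch, assuming this file is saved as vgg16_trainable_dataset.py (the module name used by the training script below):

import tensorflow as tf
import vgg16_trainable_dataset as vgg16

x1 = tf.placeholder(tf.float32, [None, None, None, 3])
x2 = tf.placeholder(tf.float32, [None, None, None, 3])
model1 = vgg16.vgg16(count=1)  # count=1: conv_layer / fc_layer create the variables
model2 = vgg16.vgg16(count=2)  # count>1: the same tensors are fetched by name,
                               # e.g. 'conv1_1/conv1_1_filters:0', so the weights are shared
model1.build(x1)
model2.build(x2)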

2) The data-reading code:

#coding:utf-8
import tensorflow as tf
import numpy as np
import cv2


def mydataset(image_root, filename, batch_size, hight, width):
    def get_batches(filename, label):
        # Note: this helper is not used below; images are read at their
        # original size by _read_py_function instead of being resized here.
        '''
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_png(image_string)
        '''
        image_resized = tf.image.resize_images(filename, [hight, width])
        image_resized = tf.cast(image_resized, tf.float32)
        return image_resized, label

    def get_files(image_root, filename):
        # Build shuffled lists of image paths and integer labels from a
        # "name label" text file.
        image_name = []
        labels0 = []
        image_root = image_root + '/'
        for line in open(filename):
            items = line.strip().split(' ')
            image_name.append(image_root + items[0])
            labels0.append(int(items[1]))
        temp = np.array([image_name, labels0])
        temp = temp.transpose()
        np.random.shuffle(temp)
        name_list = list(temp[:, 0])
        labels = list(temp[:, 1])
        labels_list = [int(i) for i in labels]
        return name_list, labels_list

    def _read_py_function(filename, label):
        # Read each image at its original size; no resize is performed.
        filename = filename.decode(encoding='utf-8')
        image_decoded = cv2.imread(filename)
        return image_decoded.astype('float32'), label

    filenames, labels = get_files(image_root, filename)
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(lambda filename, label: tf.py_func(_read_py_function, [filename, label], [tf.float32, tf.int32]))
    dataset = dataset.shuffle(buffer_size=1000).batch(batch_size).repeat()
    return dataset

3) The training code:

# -*- coding: utf-8 -*-
import tensorflow as tf
import os
from datagenerator_dataset import mydataset
import vgg16_trainable_dataset as vgg16

BATCH_SIZE_TRAIN = 1
BATCH_SIZE_VAL = 1
TRAINING_STEPS = 25001
IMAGE_SIZE_HEIGHT = 224
IMAGE_SIZE_WIDTH = 224
NUM_CHANNELS = 3
save_model_interval = 4000
val_interval = 20005
the_num_of_train_images = 50000
the_num_of_val_images = 10000
image_root = 'data_different_size'
filename = 'train_labels.txt'
valtxt = 'test_labels.txt'
inf_log = './log'
save_model_path = './trainedmodel'
lr_steps = [4000, 8000, 12000, 16000, 20000]
lr_value = [0.01, 0.001, 0.001, 0.0005, 0.0002, 0.00004]


def training(loss, lr, global_step):
    train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step=global_step)
    return train_op


def Loss(true_labels, pred):
    # pred should be the pre-softmax logits; the op applies softmax internally.
    return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_labels, logits=pred))


def get_accuracy(true_labels, pred):
    correct_prediction = tf.equal(true_labels, tf.argmax(pred, 1))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def train():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    global_step = tf.get_variable('global_step', [], dtype=tf.int32, initializer=tf.constant_initializer(0), trainable=False)
    learning_rate = tf.train.piecewise_constant(global_step, boundaries=lr_steps, values=lr_value, name='lr_schedule')
    # Two independent input pipelines, one per image in the batch.
    dataset1 = mydataset(image_root, filename, BATCH_SIZE_TRAIN, IMAGE_SIZE_HEIGHT, IMAGE_SIZE_WIDTH)
    iterator1 = dataset1.make_one_shot_iterator()
    batch_data1 = iterator1.get_next()
    dataset2 = mydataset(image_root, filename, BATCH_SIZE_TRAIN, IMAGE_SIZE_HEIGHT, IMAGE_SIZE_WIDTH)
    iterator2 = dataset2.make_one_shot_iterator()
    batch_data2 = iterator2.get_next()
    # Height and width are left as None so each branch accepts images of any size.
    images1 = tf.placeholder(tf.float32, [None, None, None, NUM_CHANNELS], name='x-input1')
    images2 = tf.placeholder(tf.float32, [None, None, None, NUM_CHANNELS], name='x-input2')
    labels1 = tf.placeholder(tf.int64, [None, ], name='y-output1')
    labels2 = tf.placeholder(tf.int64, [None, ], name='y-output2')
    train_mode1 = tf.placeholder(tf.bool, name='trainmode1')
    train_mode2 = tf.placeholder(tf.bool, name='trainmode2')
    # count=1 creates the variables, count=2 reuses them, so both branches share parameters.
    model1 = vgg16.vgg16('vgg16.npy', count=1)
    model2 = vgg16.vgg16('vgg16.npy', count=2)
    model1.build(images1, train_mode1)
    model2.build(images2, train_mode2)
    acc1 = get_accuracy(labels1, model1.pred)
    loss1 = Loss(labels1, model1.fc8)  # pass the pre-softmax logits, not the softmax output
    acc2 = get_accuracy(labels2, model2.pred)
    loss2 = Loss(labels2, model2.fc8)
    # Average the per-branch losses to get the loss for an effective batch size of 2.
    loss = (loss1 + loss2) / 2
    acc = (acc1 + acc2) / 2
    train_op = training((loss1 + loss2) / 2, learning_rate, global_step)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        tf.global_variables_initializer().run()
        print("begin training...")
        for i in range(TRAINING_STEPS):
            name_list_bathch_train1, labels_list_bathch_train1 = sess.run(batch_data1)
            name_list_bathch_train2, labels_list_bathch_train2 = sess.run(batch_data2)
            my_feed_dict = {}
            my_feed_dict[train_mode1] = True
            my_feed_dict[train_mode2] = True
            my_feed_dict[images1] = name_list_bathch_train1
            my_feed_dict[labels1] = labels_list_bathch_train1
            my_feed_dict[images2] = name_list_bathch_train2
            my_feed_dict[labels2] = labels_list_bathch_train2
            _, LOSS, ACC, num_step, lr = sess.run([train_op, loss, acc, global_step, learning_rate], feed_dict=my_feed_dict)
            print("After %dth training step(s), loss is %f, acc is %f, lr is %f." % (num_step, LOSS, ACC, lr))


def main(argv=None):
    train()


if __name__ == '__main__':
    tf.app.run()

A final note: this article does not concern itself with how well the model trains. Anyone doing machine learning knows that whether a given dataset and model produce good results requires careful experimentation and analysis. The goal here is simply to let the network read and train on batches of images at different scales, and experiments confirm that this code runs correctly. Questions and discussion are welcome.
