TensorFlow Dataset Handling (with slim and TFRecord)

CHEN666CONG

Article link: https://blog.csdn.net/CHEN666CONG/article/details/86586112

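In convolutional neural networks the input image datasets are usually very large, and unlike most other data each image has to be represented as a three-dimensional tensor (height, width, channel), which makes feeding data to the network cumbersome. A widely used way to handle datasets in TensorFlow is the TFRecord file format, which pairs conveniently with slim, TensorFlow's mid-level wrapper library (tf.contrib.slim in TensorFlow 1.x). Below I use the data-processing code from DeepLabv3+ as an example to explain the approach.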

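The data-processing pipeline consists of three steps:

(1) Read the image and label data and convert them to TFRecord format.
(2) Decode the TFRecord files with slim and build a dataset object that describes the data.
(3) Use the dataset to fetch the corresponding images and labels, group them into batches, and feed them into a queue.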

1. Read the image and label data and convert them to TFRecord format

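DeepLabv3+ uses PASCAL VOC2012, a dataset for both segmentation and detection; only its segmentation part is used here. The VOC root directory contains 5 folders, as shown in Figure 1. The JPEGImages folder holds all the original images, and the SegmentationClass folder holds the label image for each of them. The ImageSets folder is shown in Figure 2; its Segmentation subfolder contains 4 index files listing the names of the input images, and their content is shown in Figure 4.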
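To make the directory layout concrete, here is a minimal sketch of how the names in an index file map to image and label paths. The function name list_image_label_pairs, the '.jpg' and '.png' extensions, and the two sample entries are assumptions made for illustration; they are not taken from the DeepLab source.

```python
import os
import tempfile

# Assumed layout, following the folder names described above.
VOC_ROOT = './VOCdevkit/VOC2012'


def list_image_label_pairs(index_file, voc_root=VOC_ROOT):
    """Return (image_path, label_path) pairs for every name in an index file."""
    with open(index_file, 'r') as f:
        names = [line.strip() for line in f if line.strip()]
    return [
        (os.path.join(voc_root, 'JPEGImages', name + '.jpg'),         # assumed extension
         os.path.join(voc_root, 'SegmentationClass', name + '.png'))  # assumed extension
        for name in names
    ]


# Usage with a throwaway index file holding two illustrative entries.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2007_000032\n2007_000039\n')
pairs = list_image_label_pairs(tmp.name)
os.remove(tmp.name)
print(pairs[0])
```

Each line of an index file is just a bare file name without extension, so the same name is reused to locate both the image and its label.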

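The code below generates TFRecord files from the images and labels described above. It is a simplified version of the original program that leaves out, among other things, the computation of the image height and width. The main steps are: (1) read the index file of image names to build the index list; (2) split the images referenced by the index list across several TFRecord files; (3) read the image and label files; (4) store the encoded image and label data, the file formats, the number of channels and other important information in the TFRecord data structure.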

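import math
import os.path
import tensorflow as tf  # TensorFlow 1.x API (tf.gfile, tf.python_io)

# The original script reads these settings from command-line FLAGS; they are fixed here to keep the example short.
LIST_FOLDER = './VOCdevkit/VOC2012/ImageSets/Segmentation'
IMAGE_FOLDER = './VOCdevkit/VOC2012/JPEGImages'
LABEL_FOLDER = './VOCdevkit/VOC2012/SegmentationClass'
OUTPUT_DIR = './tfrecords'
IMAGE_FORMAT = 'jpg'  # VOC images are stored as .jpg files
LABEL_FORMAT = 'png'
_NUM_SHARDS = 4  # number of TFRecord files written per dataset split
_IMAGE_FORMAT_MAP = {'jpg': 'jpeg', 'png': 'png'}  # file extension -> format name stored in the record


def _bytes_list_feature(value):
    # Wrap a string or bytes value in a bytes feature (text is encoded first).
    if not isinstance(value, bytes):
        value = value.encode()
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_list_feature(value):
    # Wrap an integer in an int64 feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


tf.gfile.MakeDirs(OUTPUT_DIR)  # make sure the output folder exists
dataset_splits = tf.gfile.Glob(os.path.join(LIST_FOLDER, '*.txt'))  # find the files matching this path (the 4 index files mentioned above)
for dataset_split in dataset_splits:
    dataset = os.path.basename(dataset_split)[:-4]  # file name of the index file, without the .txt extension
    filenames = [x.strip('\n') for x in open(dataset_split, 'r')]  # open the index file and build the list of image names
    num_images = len(filenames)  # number of images in the split = length of the index list
    num_per_shard = int(math.ceil(num_images / float(_NUM_SHARDS)))  # split the images evenly across the TFRecord files
    for shard_id in range(_NUM_SHARDS):
        output_filename = os.path.join(
            OUTPUT_DIR,
            '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))  # output path of this TFRecord file
        with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:  # write records into the file output_filename
            start_idx = shard_id * num_per_shard  # index of the first image stored in this file
            end_idx = min((shard_id + 1) * num_per_shard, num_images)  # end index (exclusive); the last file may hold only the remaining images
            for i in range(start_idx, end_idx):
                # Read the image file.
                image_filename = os.path.join(IMAGE_FOLDER, filenames[i] + '.' + IMAGE_FORMAT)
                image_data = tf.gfile.FastGFile(image_filename, 'rb').read()  # raw encoded bytes, not decoded pixels
                # Read the label file.
                seg_filename = os.path.join(LABEL_FOLDER, filenames[i] + '.' + LABEL_FORMAT)
                seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
                # Pack everything into a tf.train.Example: encoded image, file name, image format,
                # number of channels (int64), encoded label and label format.
                example = tf.train.Example(features=tf.train.Features(feature={
                    'image/encoded': _bytes_list_feature(image_data),
                    'image/filename': _bytes_list_feature(filenames[i]),
                    'image/format': _bytes_list_feature(_IMAGE_FORMAT_MAP[IMAGE_FORMAT]),
                    'image/channels': _int64_list_feature(3),
                    'image/segmentation/class/encoded': _bytes_list_feature(seg_data),
                    'image/segmentation/class/format': _bytes_list_feature(LABEL_FORMAT)}))
                # Serialize the example to a string and write one record per image.
                tfrecord_writer.write(example.SerializeToString())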
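The sharding arithmetic used above can be checked on its own, without TensorFlow. The sketch below repeats the start_idx / end_idx computation in a small helper; the name shard_ranges is made up for this illustration, while 1464 is the number of training samples quoted later in this article.

```python
import math


def shard_ranges(num_images, num_shards):
    """Return one (start_idx, end_idx) pair per shard; end_idx is exclusive."""
    num_per_shard = int(math.ceil(num_images / float(num_shards)))
    return [
        (shard_id * num_per_shard,
         min((shard_id + 1) * num_per_shard, num_images))
        for shard_id in range(num_shards)
    ]


print(shard_ranges(1464, 4))  # [(0, 366), (366, 732), (732, 1098), (1098, 1464)]
print(shard_ranges(10, 4))    # [(0, 3), (3, 6), (6, 9), (9, 10)]

# The file name of a shard follows the same format string as the writer code.
print('%s-%05d-of-%05d.tfrecord' % ('train', 0, 4))  # train-00000-of-00004.tfrecord
```

When the image count is not a multiple of the shard count, the last shard simply receives fewer images, which is what the min(...) in the end index takes care of.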

2. Decode the TFRecord files with slim and build a dataset object that describes the data

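Once the TFRecord files have been generated, we use slim to interpret them and build a Dataset, one of slim's data structures. Note that this dataset does not actually contain the images and labels; it is a structure that stores the important information about the data, and we can use it to decode images and labels one by one. The code below shows how the dataset is built. It defines how the TFRecord content is to be converted (no conversion happens at this point, only its definition), and the conversion rules are held by a decoder object that performs the actual decoding later.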

import os.path
import tensorflow as tf  # TensorFlow 1.x; tf.contrib is not available in TensorFlow 2

slim = tf.contrib.slim
dataset = slim.dataset
tfexample_decoder = slim.tfexample_decoder

dataset_dir = './tfrecords'  # folder holding the TFRecord files written in step 1
split_name = 'train'         # dataset split to read
ignore_label = 255           # label value to ignore; 255 marks void pixels in PASCAL VOC

file_pattern = '%s-*'  # the TFRecord files saved earlier have names like "train-00001-of-00004.tfrecord"
file_pattern = os.path.join(dataset_dir, file_pattern % split_name)

# tf.FixedLenFeature (a core TensorFlow function, not part of slim) describes how each field of a
# serialized example is parsed back. The default value is '' for string fields and 0 for integer
# fields; fields with a known value default to it, e.g. 'jpeg' or 'png'.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/filename': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature(
        (), tf.string, default_value='jpeg'),
    'image/height': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/width': tf.FixedLenFeature(
        (), tf.int64, default_value=0),
    'image/segmentation/class/encoded': tf.FixedLenFeature(
        (), tf.string, default_value=''),
    'image/segmentation/class/format': tf.FixedLenFeature(
        (), tf.string, default_value='png')}
# Regroup the parsed fields into items that are easier for the network to consume.
items_to_handlers = {
    'image': tfexample_decoder.Image(
        image_key='image/encoded',
        format_key='image/format',
        channels=3),
    'image_name': tfexample_decoder.Tensor('image/filename'),
    'height': tfexample_decoder.Tensor('image/height'),
    'width': tfexample_decoder.Tensor('image/width'),
    'labels_class': tfexample_decoder.Image(
        image_key='image/segmentation/class/encoded',
        format_key='image/segmentation/class/format',
        channels=1)}
# Define the decoder object; it is stored in the dataset and does the decoding later.
decoder = tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
# Build the dataset from the TFRecord information. The Dataset object only holds metadata such as
# the file locations and the decoding rules.
dataset = dataset.Dataset(
    data_sources=file_pattern,  # path pattern of the TFRecord files
    reader=tf.TFRecordReader,   # how the TFRecord files are read
    decoder=decoder,            # how the TFRecord files are decoded
    num_samples=1464,           # number of training samples in PASCAL VOC2012
    items_to_descriptions={     # descriptions of the image and label items
        'image': 'A color image of varying height and width.',
        'labels_class': ('A semantic segmentation label whose size matches image. '
                         'Its values range from 0 (background) to num_classes.')},
    ignore_label=ignore_label,  # label value to ignore
    num_classes=21,             # number of classes (20 foreground classes and 1 background class)
    multi_label=True)           # extra keyword arguments such as this one are kept as plain attributes on the Dataset object
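The file pattern passed as data_sources is expanded into the list of matching shard files when the data is read. As a rough, TensorFlow-free illustration of which names a pattern such as 'train-*' selects, the sketch below uses Python's fnmatch module as a stand-in for that expansion. The split names 'train', 'val' and 'trainval' and the helper shards_for_split are assumptions made for this example.

```python
import fnmatch

# Hypothetical shard names, built with the same naming scheme as the writer
# code in step 1 (4 shards per split).
shard_names = [
    '%s-%05d-of-%05d.tfrecord' % (split, shard_id, 4)
    for split in ('train', 'val', 'trainval')
    for shard_id in range(4)
]


def shards_for_split(split_name, names=shard_names):
    """Return the shard names matched by the pattern '<split_name>-*'."""
    return sorted(fnmatch.filter(names, '%s-*' % split_name))


train_shards = shards_for_split('train')
print(train_shards)
```

Because the pattern ends the split name with a hyphen, 'train-*' picks up only the four 'train' shards and not the 'trainval' ones, even though both names start with the same letters.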

3. Use the dataset to fetch the images and labels, group them into batches, and feed them into a queue

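The code below takes the dataset, groups the data into batches and feeds them into a queue. The main steps are: (1) Create a DatasetDataProvider object, data_provider, which holds the details needed to parse the TFRecord files; calling its get function returns the decoded image, label, image file name, image width and height, and so on. (2) Preprocess the image. The preprocessing applied before an image enters the network usually happens here, including cropping, brightness adjustment, mirror flipping and so on. (3) Group the processed images into batches with tf.train.batch, then push them into a queue with slim's prefetch_queue function, after which they can serve as the input of the whole network. Note that each time the data provider's get function parses the TFRecord files it returns a single example, i.e. one image and its label.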

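import tensorflow as tf  # TensorFlow 1.x
from deeplab import input_preprocess

slim = tf.contrib.slim
dataset_data_provider = slim.dataset_data_provider
prefetch_queue = slim.prefetch_queue

# Example settings (assumed values; the DeepLab training script takes them from command-line flags).
crop_size = [513, 513]
min_resize_value = None
max_resize_value = None
resize_factor = None
min_scale_factor = 0.5
max_scale_factor = 2.0
scale_factor_step_size = 0.25
is_training = True
model_variant = 'xception_65'
num_clones = 1

# Create a DatasetDataProvider object that reads data according to the dataset built in step 2 and the settings below.
data_provider = dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=1,
    num_epochs=None,
    shuffle=True)
# data_provider.get parses the TFRecord files and returns the requested items as a list.
image, height, width = data_provider.get(['image', 'height', 'width'])
image_name, = data_provider.get(['image_name'])
label, = data_provider.get(['labels_class'])  # the key must match the one defined in items_to_handlers
# Image preprocessing; the details are unrelated to the topic of this article and are therefore omitted.
original_image, image, label = input_preprocess.preprocess_image_and_label(
    image,
    label,
    crop_height=crop_size[0],                        # image height after cropping
    crop_width=crop_size[1],                         # image width after cropping
    min_resize_value=min_resize_value,               # desired size of the smaller image side after resizing
    max_resize_value=max_resize_value,               # maximum allowed size of the larger image side after resizing
    resize_factor=resize_factor,                     # resized dimensions are a multiple of this factor plus one
    min_scale_factor=min_scale_factor,               # minimum random scale factor
    max_scale_factor=max_scale_factor,               # maximum random scale factor
    scale_factor_step_size=scale_factor_step_size,   # step between candidate scale factors from minimum to maximum
    is_training=is_training,                         # whether we are in the training phase
    model_variant=model_variant)                     # which network model is used
# Collect the image, label, image name, height and width in a dictionary that suits the network input.
sample = {'image': image, 'label': label, 'image_name': image_name, 'height': height, 'width': width}
# Pack the images and labels of one batch together.
samples = tf.train.batch(
    sample,
    batch_size=8,
    num_threads=1,
    capacity=32 * 8,
    allow_smaller_final_batch=False,
    dynamic_pad=True)
# Put the batched data into a queue; this completes the data preparation.
inputs_queue = prefetch_queue.prefetch_queue(samples, capacity=128 * num_clones)  # num_clones is the number of GPUs used for training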
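To give a feel for what batching followed by a prefetch queue amounts to, here is a small TensorFlow-free analogy built on Python's standard queue and threading modules. It is only a conceptual sketch, not how tf.train.batch or slim's prefetch_queue are implemented; the helper names batches and prefetch are made up for this example.

```python
import queue
import threading


def batches(samples, batch_size):
    """Group an iterable of samples into full batches, dropping any remainder."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []


def prefetch(batch_iter, capacity):
    """Fill a bounded queue from a background thread; None marks the end."""
    q = queue.Queue(maxsize=capacity)

    def producer():
        for batch in batch_iter:
            q.put(batch)  # blocks while the queue is full
        q.put(None)

    threading.Thread(target=producer, daemon=True).start()
    return q


inputs = prefetch(batches(range(20), batch_size=8), capacity=2)
received = []
while True:
    batch = inputs.get()  # blocks until a batch is ready
    if batch is None:
        break
    received.append(batch)
print(received)
```

With 20 samples and a batch size of 8, two full batches come out and the last 4 samples are dropped, which loosely mirrors allow_smaller_final_batch=False. In the pipeline above num_epochs=None, so the stream of examples never ends and no end marker is needed.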

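In the process described here num_threads is set to 1, i.e. only a single thread is used. Actual DeepLab training uses multiple threads, which requires the tf.train.Coordinator and tf.train.QueueRunner classes to coordinate them; that is beyond the scope of this article.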