Recently I have been working with TFRecord for storing and reading data, and along the way I fell into more pitfalls than I can count. This post records the problems I ran into while debugging, in the hope that it saves others the same trouble.
TFRecord is a binary file format recommended by Google; in principle it can store information of any format. Using TFRecord means first reading the raw data, converting it to the TFRecord format, and storing it on disk; later, the data can be decoded and read back from the TFRecord file.
A TFRecords file contains protocol buffer blocks of type tf.train.Example, and each block contains a features field (tf.train.Features). features in turn contains a number of feature entries. Each feature is a map, i.e. a key-value pair: the key is a String, and the value is a Feature message which comes in three kinds, BytesList, FloatList and Int64List, all of them lists. Compare the functions int64_feature and int64_list_feature below: the crucial difference is that the former uses value=[value] while the latter uses value=value, where [] denotes a list.
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
example = tf.train.Example(features=tf.train.Features(feature={
    'label': int64_list_feature(image_label),
    'image': bytes_feature(image),
    'h': int64_feature(shape[0]),
    'w': int64_feature(shape[1]),
    'c': int64_feature(shape[2])
}))
The Example message defined above holds one image's data (image), its label (label), and its shape (height, width, channel). What differs from most blog posts is the 'label' field. Typically a data label is a single integer, e.g. for cat/dog images '0' means cat and '1' means dog, and even multi-class labels can be encoded as 0 to N. Our image labels, however, are strings of Chinese or English characters of varying length: each character is looked up in a dictionary and its index appended to a list. For handling this kind of label, two solutions are given here (both of them pitfalls I stepped into):
- Solution one converts the label list into a one-hot-style form. Instead of using TensorFlow's tf.one_hot, define the function yourself, so that each label becomes a vector of vocabulary size. When reading, declare the feature as 'label': tf.FixedLenFeature([VOCUBLARY_SIZE], tf.int64); omitting the size VOCUBLARY_SIZE raises an error.
- Solution two targets the requirement of tf.nn.ctc_loss that its labels argument be a SparseTensor. The vector from solution one is only one-hot-like, not a sparse tensor, so passing it directly as labels still raises an error. To obtain a sparse tensor, store the raw index list and, when reading, declare the feature as 'label': tf.VarLenFeature(dtype=tf.int64). This variable-length read yields a SparseTensor directly.
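Solution one's hand-rolled one-hot step can be sketched in plain Python (a minimal sketch; the function name and the tiny vocabulary are illustrative, not from the original code):

```python
def label_to_multi_hot(label, vocabulary):
    """Convert a string label into a vocabulary-sized 0/1 vector.

    Each character is looked up in `vocabulary` (char -> index) and the
    corresponding slot is set to 1, giving a fixed-length vector that can
    be stored with int64_list_feature and read back with
    tf.FixedLenFeature([VOCUBLARY_SIZE], tf.int64).
    """
    vector = [0] * len(vocabulary)
    for char in label:
        vector[vocabulary[char]] = 1
    return vector

vocab = {c: i for i, c in enumerate('abcde')}
print(label_to_multi_hot('ace', vocab))  # [1, 0, 1, 0, 1]
```

Note that this encoding discards character order and repetition, which is one more reason the sparse index list of solution two is the better fit for CTC.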
Another special point is how the image shape is stored. Notice that shape is not stored as a whole but split into components, because the images differ in size, and storing shape directly as a fixed-size feature likewise raises an error. Hence the three key-value pairs h, w, c. When reading the image, these three values are used to reshape it. If the images were all the same size, shape could simply be hard-coded, e.g. shape=[224, 224, 3].
h = tf.cast(image_features['h'], tf.int32)
w = tf.cast(image_features['w'], tf.int32)
c = tf.cast(image_features['c'], tf.int32)
image = tf.decode_raw(image_features['image'], tf.uint8)
image = tf.cast(image, tf.float32)
image = tf.reshape(image, shape=[h, w, c])
Finally, on the image side: the images come in different sizes, and the goal is to resize every image to the same height while leaving the width variable. Because the data set is large, the program reads it in batches, and every image within a batch must have the same size, so leaving the images unprocessed also raises an error.
- In the first case, the images I encountered share the same height but vary in width. No resize is needed when storing; when reading, after reshaping each image, pad it with resized_image = tf.image.resize_image_with_crop_or_pad(image, target_height=32, target_width=max_width). Although this op can also crop, cropping would hurt the results, so max_width should be set large enough to cover the widest image. After a final resized_image = tf.reshape(resized_image, shape=[32, max_width, 3]) the data can be batched.
- In the second case, the images differ in both height and width, so before storing they must be resized proportionally; the rest is handled as in the first case.
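The size arithmetic behind both cases can be sketched in plain Python (the function names are illustrative; target_height=32 and max_width match the parameters used above):

```python
def scaled_width(orig_w, orig_h, target_height=32):
    """Width after the proportional resize that fixes the height at
    target_height (the same ratio = 32.0 / height used in resize_image)."""
    return int(orig_w * (target_height / float(orig_h)))

def total_padding(scaled_w, max_width=250):
    """Total width padding added by resize_image_with_crop_or_pad
    (split between the two sides); a negative value would mean cropping,
    so max_width must cover the widest scaled image."""
    return max_width - scaled_w

print(scaled_width(200, 64))   # 100
print(total_padding(100))      # 150
```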
As for the step-by-step procedure for writing and reading TFRecord files, many blogs already describe it in detail, so I will not repeat it; the complete code is pasted below. One caveat: the TFRecord file this code produces is about 10 times larger than the original image files, roughly 22G generated from about 2.2G of originals. The explanation found online: an image has h*w*c pixels, and converting it to bytes stores those pixels in sequence in a binary string, one byte per pixel, so the stored file grows (the compression of the original image file is lost). As a remedy, TensorFlow provides the tf.gfile.FastGFile class, which reads the image file's bytes directly:
tf.gfile.FastGFile(filename, 'rb').read()
'r' means reading from the file and 'b' means reading binary data. Given the complexity of our data, however, I did not pursue this approach further.
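The blow-up is easy to confirm with back-of-the-envelope arithmetic (a plain-Python sketch; the 32x250 size is the padded shape used above, and the JPEG comparison is an illustrative assumption):

```python
def raw_image_bytes(h, w, c=3):
    """Bytes needed to store an image as the uint8 pixel string produced
    by tobytes() and read back with tf.decode_raw: one byte per pixel
    per channel, with no compression."""
    return h * w * c

# A 32x250 RGB image stored raw:
print(raw_image_bytes(32, 250))   # 24000
```

A JPEG of the same image is typically only a few kilobytes, so a roughly 10x growth, 2.2G to 22G, is about what the uncompressed encoding predicts.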
import tensorflow as tf
from PIL import Image
import numpy as np
import os
import random
from config import CHAR_VECTOR
from config import NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
from config import NUM_EXAMPLES_PER_EPOCH_FOR_TEST
VOCUBLARY_SIZE = len(CHAR_VECTOR)
def resize_image(image):
    '''resize the size of image'''
    width, height = image.size
    ratio = 32.0 / float(height)
    image = image.resize((int(width * ratio), 32))
    return image
def generation_vocublary(CHAR_VECTOR):
    # map each character to its index in CHAR_VECTOR
    vocublary = {}
    for index, char in enumerate(CHAR_VECTOR):
        vocublary[char] = index
    return vocublary
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def generation_TFRecord(data_dir):
    vocublary = generation_vocublary(CHAR_VECTOR)

    image_name_list = []
    for file in os.listdir(data_dir):
        if file.endswith('.jpg'):
            image_name_list.append(file)
    random.shuffle(image_name_list)
    capacity = len(image_name_list)

    # write the train tfrecord file
    train_writer = tf.python_io.TFRecordWriter('./dataset/train_dataset.tfrecords')
    train_image_name_list = image_name_list[0:int(capacity * 0.9)]
    for train_name in train_image_name_list:
        # the label is the file name without its extension; note that
        # strip('.jpg') would be wrong here, since str.strip removes any
        # of the characters '.', 'j', 'p', 'g' from both ends
        train_image_label = []
        for s in os.path.splitext(train_name)[0]:
            train_image_label.append(vocublary[s])

        train_image = Image.open(os.path.join(data_dir, train_name))
        train_image = resize_image(train_image)
        train_image_array = np.asarray(train_image, np.uint8)
        train_shape = np.array(train_image_array.shape, np.int32)
        train_image = train_image.tobytes()

        train_example = tf.train.Example(features=tf.train.Features(feature={
            'label': int64_list_feature(train_image_label),
            'image': bytes_feature(train_image),
            'h': int64_feature(train_shape[0]),
            'w': int64_feature(train_shape[1]),
            'c': int64_feature(train_shape[2])
        }))
        train_writer.write(train_example.SerializeToString())
    train_writer.close()

    # write the test tfrecord file
    test_writer = tf.python_io.TFRecordWriter('./dataset/test_dataset.tfrecords')
    test_image_name_list = image_name_list[int(capacity * 0.9):capacity]
    for test_name in test_image_name_list:
        test_image_label = []
        for s in os.path.splitext(test_name)[0]:
            test_image_label.append(vocublary[s])

        test_image = Image.open(os.path.join(data_dir, test_name))
        test_image = resize_image(test_image)
        test_image_array = np.asarray(test_image, np.uint8)
        test_shape = np.array(test_image_array.shape, np.int32)
        test_image = test_image.tobytes()

        test_example = tf.train.Example(features=tf.train.Features(feature={
            'label': int64_list_feature(test_image_label),
            'image': bytes_feature(test_image),
            'h': int64_feature(test_shape[0]),
            'w': int64_feature(test_shape[1]),
            'c': int64_feature(test_shape[2])
        }))
        test_writer.write(test_example.SerializeToString())
    test_writer.close()
def read_tfrecord(filename, max_width, batch_size, train=True):
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialize_example = reader.read(filename_queue)
    image_features = tf.parse_single_example(serialized=serialize_example,
                                             features={
                                                 # 'label': tf.FixedLenFeature([VOCUBLARY_SIZE], tf.int64),
                                                 'label': tf.VarLenFeature(dtype=tf.int64),
                                                 'image': tf.FixedLenFeature([], tf.string),
                                                 'h': tf.FixedLenFeature([], tf.int64),
                                                 'w': tf.FixedLenFeature([], tf.int64),
                                                 'c': tf.FixedLenFeature([], tf.int64)
                                             })

    h = tf.cast(image_features['h'], tf.int32)
    w = tf.cast(image_features['w'], tf.int32)
    c = tf.cast(image_features['c'], tf.int32)
    image = tf.decode_raw(image_features['image'], tf.uint8)
    image = tf.cast(image, tf.float32)
    image = tf.reshape(image, shape=[h, w, c])
    resized_image = tf.image.resize_image_with_crop_or_pad(image, target_height=32, target_width=max_width)
    resized_image = tf.reshape(resized_image, shape=[32, max_width, 3])
    label = tf.cast(image_features['label'], tf.int32)

    min_fraction_of_example_in_queue = 0.4
    if train is True:
        min_queue_examples = int(min_fraction_of_example_in_queue * NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN)
        train_image_batch, train_label_batch = tf.train.shuffle_batch([resized_image, label],
                                                                      batch_size=batch_size,
                                                                      capacity=min_queue_examples + 3 * batch_size,
                                                                      min_after_dequeue=min_queue_examples,
                                                                      num_threads=32)
        return train_image_batch, train_label_batch
    else:
        min_queue_examples = int(min_fraction_of_example_in_queue * NUM_EXAMPLES_PER_EPOCH_FOR_TEST)
        test_image_batch, test_label_batch = tf.train.batch([resized_image, label],
                                                            batch_size=batch_size,
                                                            capacity=min_queue_examples + 3 * batch_size,
                                                            num_threads=32)
        return test_image_batch, test_label_batch
def index_to_word(result):
    return ''.join([CHAR_VECTOR[i] for i in result])
def main(argv):
    generation_TFRecord('./dataset/images')
    train_image, train_label = read_tfrecord('./dataset/train_dataset.tfrecords', 250, 32)
    # read the test set with train=False so tf.train.batch is used instead of shuffle_batch
    test_image, test_label = read_tfrecord('./dataset/test_dataset.tfrecords', 250, 32, train=False)

    with tf.Session() as session:
        session.run(tf.group(tf.global_variables_initializer(),
                             tf.local_variables_initializer()))
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        image_train, label_train = session.run([train_image, train_label])
        print(image_train.shape)

        image_test, label_test = session.run([test_image, test_label])
        print(image_test.shape)
        # label_test is a SparseTensorValue; gather each example's label
        # indices by filtering on the batch dimension of its indices array
        for i, image in enumerate(image_test):
            label = label_test.values[label_test.indices[:, 0] == i]
            # convert the float array back to uint8 before building the image
            img = Image.fromarray(image.astype(np.uint8), 'RGB')
            img.save(index_to_word(label) + '.jpg')
            print(index_to_word(label))

        coord.request_stop()
        coord.join(threads=threads)

if __name__ == '__main__':
    tf.app.run()