Reading Data Efficiently in TensorFlow

yiqingyang2012

Reposted from: http://blog.csdn.net/u012759136/article/details/52232266

Definition of TFRecords

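A TFRecords file is a binary file that stores tf.train.Example protocol buffers. Each Example contains a Features message, and Features holds a dictionary named feature made of (key, value) pairs. Each value is a Feature wrapping one of FloatList, BytesList, or Int64List. The sections below show how to write and read TFRecords.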
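
As a minimal sketch of that structure, the snippet below builds one Example that holds all three value types, serializes it, and parses it back. The feature keys (label, score, name) and their values are made up for illustration and do not come from the original post.

```python
import tensorflow as tf

# Hypothetical feature keys and values, chosen only to show the three list types.
example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    "score": tf.train.Feature(float_list=tf.train.FloatList(value=[0.5, 1.5])),
    "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"img1.jpg"])),
}))

# An Example is a protocol buffer, so it round-trips through a byte string.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)

# Every list type exposes its contents through the repeated field `value`.
label = restored.features.feature["label"].int64_list.value[0]
scores = list(restored.features.feature["score"].float_list.value)
name = restored.features.feature["name"].bytes_list.value[0]
print(label, scores, name)
```

Note that even a single scalar such as the label is stored as a one-element list, which is why the reading code indexes into value.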

Writing Data to TFRecords

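To write, we load the data in code, fill it into an Example protocol buffer, serialize the protocol buffer to a string, and write that string to a TFRecords file with tf.python_io.TFRecordWriter. (tf.python_io is the TensorFlow 1.x name; TensorFlow 2.x exposes the same writer as tf.io.TFRecordWriter.)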

import os
import tensorflow as tf
from PIL import Image

cwd = os.getcwd()

'''
The data directory I load here is laid out as follows:
0 -- img1.jpg
     img2.jpg
     img3.jpg
     ...
1 -- img1.jpg
     img2.jpg
     ...
2 -- ...
Here 0, 1, 2... are the categories, i.e. the classes used below.
classes is a list I defined for my own data; adapt it to your own data.
...
'''
# Example value matching the layout above; replace it with your own class directories.
classes = ["0", "1", "2"]

writer = tf.python_io.TFRecordWriter("train.tfrecords")  # TensorFlow 1.x API
for index, name in enumerate(classes):
    class_path = os.path.join(cwd, name)  # directory holding this class's images
    for img_name in os.listdir(class_path):
        img_path = os.path.join(class_path, img_name)
        # Force 3 channels so the reader can reshape to [224, 224, 3].
        img = Image.open(img_path).convert("RGB")
        img = img.resize((224, 224))
        img_raw = img.tobytes()  # convert the image to raw bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            "img_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))
        writer.write(example.SerializeToString())  # serialize to a string
writer.close()
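
For readers on TensorFlow 2.x, where tf.python_io is not available, the same write steps can be sketched with tf.io.TFRecordWriter. This is an illustration under assumptions, not the original author's code: the helper name write_synthetic_tfrecords, the random images that stand in for files on disk, and the temporary output path are all hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def write_synthetic_tfrecords(path, classes, images_per_class=2, size=224):
    """Write random RGB images as Examples and return the number of records."""
    rng = np.random.RandomState(0)
    count = 0
    # Used as a context manager, the writer is closed even if an error occurs.
    with tf.io.TFRecordWriter(path) as writer:
        for index, _name in enumerate(classes):
            for _ in range(images_per_class):
                # Stand-in for Image.open(...).resize(...).tobytes(): raw uint8 pixels.
                pixels = rng.randint(0, 256, size=(size, size, 3), dtype=np.uint8)
                img_raw = pixels.tobytes()
                example = tf.train.Example(features=tf.train.Features(feature={
                    "label": tf.train.Feature(
                        int64_list=tf.train.Int64List(value=[index])),
                    "img_raw": tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[img_raw])),
                }))
                writer.write(example.SerializeToString())
                count += 1
    return count


# Hypothetical output location and class names, for demonstration only.
out_path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")
num_written = write_synthetic_tfrecords(out_path, classes=["0", "1", "2"])
print(num_written, os.path.getsize(out_path))
```

The feature keys label and img_raw match the ones used above, so a file written this way has the same record layout as the one produced by the original loop.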

Reading Data from TFRecords Files

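Records can be read with tf.TFRecordReader together with the tf.parse_single_example parser. For a quick inspection, you can also iterate over the file directly with tf.python_io.tf_record_iterator and parse each record as a tf.train.Example: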

for serialized_example in tf.python_io.tf_record_iterator("train.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # The keys must match the ones used when writing: 'img_raw' and 'label'.
    image = example.features.feature['img_raw'].bytes_list.value[0]
    label = example.features.feature['label'].int64_list.value[0]
    # Any preprocessing can be done here.
    print(len(image), label)
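
The bytes stored under img_raw are the raw pixels produced by img.tobytes(), which for a 224x224 RGB image is a row-major buffer of 224 * 224 * 3 uint8 values. A minimal sketch of turning such a buffer back into an array with NumPy follows; the helper name decode_image_bytes and the synthetic pixel data are assumptions made for illustration.

```python
import numpy as np


def decode_image_bytes(raw, height=224, width=224, channels=3):
    """Turn raw uint8 pixel bytes back into a (height, width, channels) array."""
    expected = height * width * channels
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    # np.frombuffer returns a read-only view; copy it if pixels must be modified.
    return np.frombuffer(raw, dtype=np.uint8).reshape(height, width, channels)


# Stand-in for one entry of example.features.feature['img_raw'].bytes_list.value.
pixels = np.zeros((224, 224, 3), dtype=np.uint8)
pixels[0, 0] = (10, 20, 30)
raw = pixels.tobytes()

image = decode_image_bytes(raw)
# The same scaling that read_and_decode applies later in this post.
normalized = image.astype(np.float32) * (1.0 / 255) - 0.5
print(image.shape, image[0, 0], normalized.min(), normalized.max())
```

The length check matters because the record stores no shape information: a grayscale or differently sized image would otherwise surface only as a reshape error.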

Below is one way to read TFRecords through a queue.

def read_and_decode(filename):
    # Create a queue from the file name
    filename_queue = tf.train.string_input_producer([filename])

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)   # returns a (key, serialized record) pair
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw' : tf.FixedLenFeature([], tf.string),
                                       })

    img = tf.decode_raw(features['img_raw'], tf.uint8)    # raw bytes -> flat uint8 tensor
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5     # scale pixels to [-0.5, 0.5]
    label = tf.cast(features['label'], tf.int32)

    return img, label
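
read_and_decode only builds graph nodes; nothing is read until queue runners are started inside a session. The sketch below shows one way to consume it, assuming TensorFlow 2.x with the tf.compat.v1 graph-mode API (under TensorFlow 1.x the same calls live directly under tf). The helper write_dummy_records, the all-zero images, the temporary path, and the batch parameters are hypothetical and exist only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # queue runners only work in graph mode


def write_dummy_records(path, num_records=8):
    # Synthetic all-zero 224x224 RGB images with labels 0..num_records-1.
    with tf.io.TFRecordWriter(path) as writer:
        for label in range(num_records):
            img_raw = np.zeros((224, 224, 3), dtype=np.uint8).tobytes()
            example = tf.train.Example(features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                "img_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
            }))
            writer.write(example.SerializeToString())


def read_and_decode(filename):
    # Same logic as the function above, spelled with tf.compat.v1 names.
    filename_queue = tf1.train.string_input_producer([filename])
    reader = tf1.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf1.parse_single_example(serialized_example, features={
        "label": tf.io.FixedLenFeature([], tf.int64),
        "img_raw": tf.io.FixedLenFeature([], tf.string),
    })
    img = tf.io.decode_raw(features["img_raw"], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features["label"], tf.int32)
    return img, label


path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")
write_dummy_records(path)

img, label = read_and_decode(path)
# Batch parameters are illustrative; capacity must exceed min_after_dequeue.
img_batch, label_batch = tf1.train.shuffle_batch(
    [img, label], batch_size=4, capacity=16, min_after_dequeue=4)

with tf1.Session() as sess:
    coord = tf1.train.Coordinator()
    # Start the background threads that fill the file-name and example queues.
    threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
    try:
        imgs, labels = sess.run([img_batch, label_batch])
        print(imgs.shape, labels.shape)
    finally:
        # Ask the threads to stop and wait for them before closing the session.
        coord.request_stop()
        coord.join(threads)
```

Queue-based input is deprecated in TensorFlow 2.x in favor of the tf.data API, so this pattern is mainly useful for maintaining existing 1.x-style code.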
