Data Reading: The TFRecord Format

换种方式生活

Category and tag: Tensorflow深度学习算法原理与编程实战 (TensorFlow Deep Learning: Algorithm Principles and Programming Practice)

Article link: https://blog.csdn.net/u010094573/article/details/102820458

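A TensorFlow program can read data in three ways:
( 1 ) Preloaded data: when the dataset is small, all of it is held in constants or variables defined in the program.
( 2 ) Feeding: data is injected into placeholders through the feed_dict argument of run(), and the computation is then started.
( 3 ) Reading from files: an input pipeline at the start of the TensorFlow graph reads the data from files. Commonly used file formats are TFRecord and CSV.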
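Generating TFRecord data
Protocol Buffer is a tool for serializing structured data. To store data as TFRecord, the data must first be serialized. The tf.train.Example protocol buffer message defines the format used for that serialization.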

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

# mnist = input_data.read_data_sets("/home/jiangziyang/MNIST_data",
#                                   dtype=tf.uint8, one_hot=True)
mnist = input_data.read_data_sets("/mnt/downloads/tf/tf_book_source/11/11.1/MNIST_data",
                                  dtype=tf.uint8, one_hot=True)


# Helpers that build integer and bytes features. This is the first step in
# filling an Example protocol buffer; both helpers are called below.
def Int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def Bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# Read the MNIST training data.
images = mnist.train.images
labels = mnist.train.labels
pixels = images.shape[1]
num_examples = mnist.train.num_examples

# Output path of the TFRecord file (an absolute path from the filesystem root).
# filename = "/home/jiangziyang/TFRecord/MNIST_tfrecords"
filename = "/mnt/downloads/tf/tf_book_source/11/11.1/MNIST_tfrecords"

# Create an instance of the python_io.TFRecordWriter class
writer = tf.python_io.TFRecordWriter(filename)

# The for loop does the main work of filling Example protocol buffers
for i in range(num_examples):
    # Convert the image array to a byte string
    # (tobytes() replaces the deprecated tostring())
    image_to_string = images[i].tobytes()

    feature = {
        "pixels": Int64_feature(pixels),
        "label": Int64_feature(np.argmax(labels[i])),
        "image_raw": Bytes_feature(image_to_string)
    }
    features = tf.train.Features(feature=feature)

    # Define an Example and store the features in it
    example = tf.train.Example(features=features)

    # Write one Example to the TFRecord file
    # Prototype: write(self, record)
    writer.write(example.SerializeToString())

# It is good practice to call close() once writing is finished
writer.close()

SerializeToString() serializes the protocol buffer into a byte string.
Running the script downloads the MNIST dataset locally and generates the data in TFRecord format; the output is a binary file.
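To make "binary file" more concrete: on disk, a TFRecord file is a sequence of length-prefixed records, and each record carries two masked CRC-32C checksums. The following is a minimal standard-library sketch of that framing, not TensorFlow's own implementation. The helper names (`crc32c`, `masked_crc`, `frame_record`, `iter_records`) are made up for this illustration, and the payloads are placeholder bytes rather than real serialized Example messages.

```python
import struct


def _make_table():
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    # Plain table-driven CRC-32C over a bytes object.
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    # The record format stores a "masked" CRC: rotate right by 15 bits,
    # then add a constant, all in 32-bit arithmetic.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload):
    # Layout: uint64 length, uint32 CRC of length, payload, uint32 CRC of payload.
    length = struct.pack("<Q", len(payload))
    return (length + struct.pack("<I", masked_crc(length))
            + payload + struct.pack("<I", masked_crc(payload)))


def iter_records(buf):
    # Walk the buffer record by record, verifying both checksums.
    offset = 0
    while offset < len(buf):
        length_bytes = buf[offset:offset + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (length_crc,) = struct.unpack("<I", buf[offset + 8:offset + 12])
        if length_crc != masked_crc(length_bytes):
            raise ValueError("corrupted length field")
        start = offset + 12
        payload = buf[start:start + length]
        (data_crc,) = struct.unpack("<I", buf[start + length:start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError("corrupted payload")
        yield payload
        offset = start + length + 4


blob = frame_record(b"first") + frame_record(b"second record")
records = list(iter_records(blob))
print(records)
```

Each record therefore costs 16 bytes of framing on top of the serialized Example, which is why the file cannot be read as text.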
When the dataset is large, the data can be written to multiple TFRecord files.
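One possible way to split the data is to assign examples to shards in round-robin order and encode the shard index in each file name. The sketch below only computes names and assignments using the standard library. The `-00000-of-00004` suffix is a common naming convention rather than a TensorFlow requirement, and `shard_filename`, `shard_for_example`, and the shard count of 4 are hypothetical choices for this illustration.

```python
def shard_filename(base, shard_index, num_shards):
    # e.g. "MNIST_tfrecords-00001-of-00004"
    return "%s-%05d-of-%05d" % (base, shard_index, num_shards)


def shard_for_example(example_index, num_shards):
    # Round-robin assignment keeps the shards roughly equal in size.
    return example_index % num_shards


num_shards = 4  # assumed value, chosen only for the demonstration
filenames = [shard_filename("MNIST_tfrecords", s, num_shards)
             for s in range(num_shards)]

# Record which example indices would land in which file.
assignment = {name: [] for name in filenames}
for i in range(10):
    assignment[filenames[shard_for_example(i, num_shards)]].append(i)

for name in filenames:
    print(name, assignment[name])
```

In a real script each shard would get its own TFRecordWriter, and the resulting list of file names is the kind of list that string_input_producer() accepts as its first argument.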
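Reading TFRecord data
To read data from a TFRecord file, use the TFRecordReader class to read the serialized records and the parse_single_example() function as the parser. parse_single_example() parses an Example protocol buffer into tensors.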

import tensorflow as tf

# Create an instance of the TFRecordReader class
reader = tf.TFRecordReader()

# Create a queue that maintains the list of input files; queues are covered
# later in this chapter.
# Prototype: string_input_producer(string_tensor, num_epochs, shuffle, seed,
#                                  capacity, shared_name, name, cancel_op)
filename_queue = tf.train.string_input_producer(
                       ["/mnt/downloads/tf/tf_book_source/11/11.1/MNIST_tfrecords"])

# Read one example from the file with TFRecordReader.read().
# Prototype: read(self, queue, name)
# read_up_to() can be used instead to read several examples at once.
# Prototype: read_up_to(self, queue, num_records, name)
_,serialized_example = reader.read(filename_queue)

# Parse the example that was read with parse_single_example().
# Prototype: parse_single_example(serialized, features, name, example_names)
features = tf.parse_single_example(
    serialized_example,
    features={
        # FixedLenFeature describes how each feature should be parsed
        "image_raw":tf.FixedLenFeature([],tf.string),
        "pixels":tf.FixedLenFeature([],tf.int64),
        "label":tf.FixedLenFeature([],tf.int64)
    })

# decode_raw() decodes the byte string into the pixel array of the image
# Prototype: decode_raw(bytes, out_type, little_endian, name)
images = tf.decode_raw(features["image_raw"],tf.uint8)
# Use cast() for type conversion
labels = tf.cast(features["label"],tf.int32)
pixels = tf.cast(features["pixels"],tf.int32)


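with tf.Session() as sess:
    # Start the threads that feed the input queue; multithreaded input
    # processing is also covered later in this chapter
    coordinator = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coordinator)

    for i in range(10):
        image, label, pixel = sess.run([images, labels, pixels])
        print(label)
        # Output: 7 3 4 6 1 8 1 0 9 8

    # Ask the queue-runner threads to stop and wait for them to finish
    coordinator.request_stop()
    coordinator.join(threads)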
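
The relationship between the bytes stored under "image_raw" and the array produced by decode_raw() can be illustrated without TensorFlow. For uint8 data every byte is one pixel, so decoding amounts to reading the bytes back as integers. This is a standard-library sketch only: the tiny 2x3 "image" and the helpers `encode_image` and `decode_image` are invented for the example, whereas a real MNIST record holds a flat vector whose length is stored under "pixels".

```python
def encode_image(pixel_values):
    # Each value in 0..255 becomes exactly one byte, as with a uint8 array.
    return bytes(pixel_values)


def decode_image(raw, width):
    # Read every byte back as an integer, then cut the flat list into rows.
    flat = list(raw)
    return [flat[i:i + width] for i in range(0, len(flat), width)]


image = [0, 128, 255, 7, 42, 99]   # hypothetical 2x3 image, flattened
raw = encode_image(image)
rows = decode_image(raw, width=3)

print(len(raw))   # number of bytes equals number of pixels
print(rows)
```

The same reasoning explains why the reader must decode with the dtype that was used when writing: a different out_type would group the bytes differently and yield a different number of values.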

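When parsing the features, the FixedLenFeature class is used; with it, the parsed result is a dense Tensor. TensorFlow also provides other feature classes for parsing, such as VarLenFeature, whose parsed result is a sparse tensor (SparseTensor).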
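The difference between the two results can be pictured in plain Python. A fixed-length feature has the same number of values in every example, so a batch stacks into a dense array; a variable-length feature does not, so it is kept as coordinates plus values. The sketch below builds that coordinate form by hand. `to_sparse` is a hypothetical helper, and its three return values only mirror the `indices`, `values`, and `dense_shape` components of a SparseTensor; it does not call TensorFlow.

```python
def to_sparse(rows):
    # Collect (row, column) coordinates and the value stored at each one.
    indices, values = [], []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            indices.append((r, c))
            values.append(value)
    # The dense shape is as wide as the longest row.
    width = max((len(row) for row in rows), default=0)
    return indices, values, (len(rows), width)


# Three examples whose feature has 2, 0 and 3 values respectively.
batch = [[3, 1], [], [4, 1, 5]]
indices, values, dense_shape = to_sparse(batch)

print(indices)
print(values)
print(dense_shape)
```

Only five values are stored here although the dense shape has nine slots, which is the reason a sparse representation suits features whose length varies between examples.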
