Python Deep Learning Notes 03: Thread Queues and TensorFlow I/O Operations (Part 3)

是故里吖

Article link: https://blog.csdn.net/Liii_NN/article/details/108194605

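Contents

1) Converting binary files to a TFRecords file, then storing and reading it

2) Exercise: converting a CSV file to a TFRecords file

3) Converting image files to a TFRecords file

Summary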

Preface

Study notes 03 on Python deep learning (TensorFlow 1.x): thread queues and I/O operations (Part 3).

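I. Reading binary files

1. Reading analysis

1) File-reading API: the file reader

Reading binary files: tf.FixedLengthRecordReader(record_bytes)

Reads binary files in which every record has a fixed number of bytes.

record_bytes: integer, the number of bytes read per record

return: a reader instance

When reading a binary file, the number of bytes per sample must be specified,

that is: bytes per sample in the binary file = label (target) bytes + feature bytes. For the CIFAR-10 data used below this is 1 + 32 × 32 × 3 = 3073.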

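2) File-reading API: the content decoders

What is read from a file is a string, so a function is needed to parse these strings into tensors.

tf.decode_csv(records, record_defaults=None, field_delim=None, name=None)

Converts CSV records to tensors; used together with tf.TextLineReader.

records: a string tensor in which each string is one record (row) of the CSV file

field_delim: the field delimiter, "," by default

record_defaults: determines the type of each resulting tensor and supplies the default value used when a field is missing from the input string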
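To make the role of record_defaults more concrete, here is a plain-Python sketch (standard library only, not TensorFlow) that imitates the behaviour described above for a single line: each default fixes the type of its column and fills in a missing field. The helper name `decode_csv_line` is hypothetical, and the defaults are written as plain scalars for brevity, whereas tf.decode_csv expects them as a list of one-element lists.

```python
import csv


def decode_csv_line(line, record_defaults, field_delim=","):
    """Hypothetical helper: parse one CSV line, typing each field by its default."""
    fields = next(csv.reader([line], delimiter=field_delim))
    decoded = []
    for raw, default in zip(fields, record_defaults):
        if raw == "":
            # Missing field: fall back to the default value
            decoded.append(default)
        else:
            # Present field: convert to the same Python type as the default
            decoded.append(type(default)(raw))
    return decoded


# The second field is empty, so its default (-1) is used
row = decode_csv_line("5.1,,setosa", [0.0, -1, "unknown"])
print(row)
```

The float default makes the first column a float, the integer default fills the empty second column, and the string default leaves the third column as text.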

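tf.decode_raw(bytes, out_type, little_endian=True, name=None)

Converts bytes into a vector of numbers. The input is a tensor of type string. It is used together with tf.FixedLengthRecordReader to read binary data, represented as a string, as uint8 values.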
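What "decoding raw bytes" means can be shown without TensorFlow. The sketch below uses only the standard struct module (an illustration of the idea, not the tf.decode_raw implementation): the same byte string gives different numbers depending on the output type, so the type used for decoding has to match the type the data was written with.

```python
import struct

# Four int64 values written as raw little-endian bytes (8 bytes each)
raw = struct.pack("<4q", 1, 2, 3, 4)

# Decoding with the type that was written recovers the original values
as_int64 = list(struct.unpack("<4q", raw))

# Decoding the same bytes as uint8 yields one number per byte instead
as_uint8 = list(raw)

print(as_int64)       # [1, 2, 3, 4]
print(len(as_uint8))  # 32
print(as_uint8[:8])   # [1, 0, 0, 0, 0, 0, 0, 0]
```

CIFAR-10 pixels and labels are single bytes, which is why uint8 is the right output type in the listings that follow.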

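3) Reading workflow

Build the file queue
Build the binary file reader and read the content, one sample's worth of bytes at a time
Decode the content (binary decoding)
Split the image and label data, that is, slice out the features and the target
Reshape the image feature data: [3072] --> [32, 32, 3]
Batch the data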
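The middle steps of this workflow (fixed-length records, label/image split, reshape) can be sketched in plain Python on synthetic data. This is an illustration under stated assumptions rather than the TensorFlow pipeline: the helper names `make_record`, `split_records` and `reshape` are hypothetical, and the records are generated in memory instead of being read from the CIFAR-10 files.

```python
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
LABEL_BYTES = 1
IMAGE_BYTES = HEIGHT * WIDTH * CHANNELS   # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073


def make_record(label):
    """Build one synthetic record: a label byte followed by 3072 pixel bytes."""
    pixels = bytes(i % 256 for i in range(IMAGE_BYTES))
    return bytes([label]) + pixels


def split_records(data):
    """Yield (label, flat_pixels) for each fixed-length record in data."""
    if len(data) % RECORD_BYTES != 0:
        raise ValueError("data length is not a multiple of the record size")
    for start in range(0, len(data), RECORD_BYTES):
        record = data[start:start + RECORD_BYTES]
        yield record[0], list(record[LABEL_BYTES:])


def reshape(flat, height, width, channels):
    """Reshape a flat list into nested lists of shape [height][width][channels]."""
    rows = []
    for r in range(height):
        row = []
        for c in range(width):
            base = (r * width + c) * channels
            row.append(flat[base:base + channels])
        rows.append(row)
    return rows


data = make_record(3) + make_record(7)
samples = [(label, reshape(flat, HEIGHT, WIDTH, CHANNELS))
           for label, flat in split_records(data)]
labels = [label for label, _ in samples]
first_image = samples[0][1]
print(labels, len(first_image), len(first_image[0]), len(first_image[0][0]))
```

Note that the real CIFAR-10 binary files store the red, green and blue planes one after another, so for correctly displayed images one would reshape to [3, 32, 32] and then transpose. The sketch follows the [32, 32, 3] reshape used in these notes.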

2. Reading example

import os

import tensorflow as tf

# Command-line flags such as the CIFAR data directory
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("cifar_dir", "./data/cifar10/cifar-10-batches-bin", "directory containing the data files")


class CifarRead(object):
    """Read the CIFAR-10 binary files and decode them into image and label batches."""
    def __init__(self, filelist):
        # List of files to read
        self.file_list = filelist
        # Image properties (height, width, number of channels)
        self.height = 32
        self.width = 32
        self.channel = 3
        # Number of bytes per sample in the binary file
        self.label_bytes = 1
        self.image_bytes = self.width * self.height * self.channel
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self):
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer(self.file_list)
        # 2. Build the binary file reader; each read returns one sample's bytes
        reader = tf.FixedLengthRecordReader(self.bytes)
        key, value = reader.read(file_queue)
        # 3. Decode the content as uint8
        label_image = tf.decode_raw(value, tf.uint8)
        print(label_image)
        # 4. Slice out the label (target) and the image (features).
        #    tf.slice takes no dtype argument, so cast the label separately.
        label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)
        image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])
        # 5. Reshape the image features: [3072] --> [32, 32, 3]
        image_reshape = tf.reshape(image, [self.height, self.width, self.channel])
        # 6. Batch the data
        image_batch, label_batch = tf.train.batch([image_reshape, label], batch_size=10, num_threads=1, capacity=10)
        return image_batch, label_batch


if __name__ == "__main__":
    # Find the files and build a list of full paths
    file_name = os.listdir(FLAGS.cifar_dir)
    filelist = [os.path.join(FLAGS.cifar_dir, file) for file in file_name if file[-3:] == "bin"]
    # print(file_name)
    cf = CifarRead(filelist)
    image_batch, label_batch = cf.read_and_decode()
    # Open a session and run
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Print what was read
        print(sess.run([image_batch, label_batch]))
        # Stop and join the worker threads
        coord.request_stop()
        coord.join(threads)

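II. Reading and storing TFRecords files

1. TFRecords basics

1) About TFRecords files

TFRecords is TensorFlow's built-in file format (a binary format) that is convenient to read and to move around.

File format: *.tfrecords

Content written to the file: Example protocol blocks (a dictionary-like format)

2) Storing TFRecords

1. Create a TFRecord writer

tf.python_io.TFRecordWriter(path)

Writes a TFRecords file

path: path of the TFRecords file

return: a file writer

Methods:

write(record): write one string record to the file

close(): close the file writer

(Note: the string is a serialized Example, obtained with Example.SerializeToString(). An Example protocol block has to be built for every sample.)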

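2. Build the Example protocol block for each sample

(1) tf.train.Example(features=None)

The protocol block that is written to the TFRecords file

features: a feature instance of type tf.train.Features

return: a protocol block in Example format

(2) tf.train.Features(feature=None)

Builds the key-value pairs that describe one sample

feature: a dictionary; each key is the name to save under, each value is a tf.train.Feature instance

return: an object of type Features

(3) tf.train.Feature(**options)

**options can be one of the following three data formats:

tf.train.Int64List(value=[Value])
tf.train.BytesList(value=[Bytes])
tf.train.FloatList(value=[value])

Example of converting image data to TFRecords:

An Example protocol block has to be built for every sample

example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),  # image: bytes
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),  # label: int
}))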
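As a quick check of the dictionary-like structure, the following sketch builds one Example, serializes it, and parses the bytes back with the protocol buffer method FromString. It assumes TensorFlow is installed; the 12-byte image and the label 7 are made-up placeholder values, and no session or file is involved.

```python
import tensorflow as tf

# Placeholder data standing in for one sample (assumed values, not CIFAR data)
image_bytes = bytes(range(12))
label = 7

# Build the Example protocol block for this sample
example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
}))

# This byte string is what TFRecordWriter.write() expects
serialized = example.SerializeToString()

# Parse the bytes back into an Example and read the fields by key
restored = tf.train.Example.FromString(serialized)
restored_label = restored.features.feature["label"].int64_list.value[0]
restored_image = restored.features.feature["image"].bytes_list.value[0]
print(restored_label, restored_image == image_bytes)
```

Reading the fields back by key shows why the same names ("image", "label") must be used again when the file is parsed later.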

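3) Reading TFRecords

1. Workflow: the same as the file reader workflow, with an extra parsing step in the middle

2. Parsing the Example protocol block of a TFRecords file:

tf.parse_single_example(serialized, features=None, name=None)

Parses a single Example proto

serialized: a scalar string Tensor holding one serialized Example, that is, the value returned by the file reader

features: a dict; each key is the name to read, each value is a FixedLenFeature

return: a dictionary of key-value pairs whose keys are the names that were read

tf.FixedLenFeature(shape, dtype)

shape: shape of the input data; usually left unspecified, as an empty list

dtype: type of the input data, which must match the type stored in the file; only float32, int64 and string are allowed

return: a Tensor (dense, so zero entries are stored as well)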

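2. Code implementation

1) Converting binary files to a TFRecords file, then storing and reading it

Storing: workflow for writing a CIFAR-10 batch to TFRecords

① Build the writer

② Build an Example for every sample

③ Write the serialized Example

Reading: workflow for reading TFRecords

① Build the TFRecords reader

② Parse the Example

③ Convert the format (decode the bytes)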

import os

import tensorflow as tf

# Command-line flags such as the CIFAR data directory
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("cifar_dir", "./data/cifar10/cifar-10-batches-bin", "directory containing the data files")
tf.app.flags.DEFINE_string("cifar_tfrecords", "./studyTF/cifar.tfrecords", "TFRecords file to write to")


class CifarRead(object):
    """Read the binary files, write them to TFRecords, and read the TFRecords back."""
    def __init__(self, filelist):
        # List of files to read
        self.file_list = filelist
        # Image properties (height, width, number of channels)
        self.height = 32
        self.width = 32
        self.channel = 3
        # Number of bytes per sample in the binary file
        self.label_bytes = 1
        self.image_bytes = self.width * self.height * self.channel
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self):
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer(self.file_list)
        # 2. Build the binary file reader; each read returns one sample's bytes
        reader = tf.FixedLengthRecordReader(self.bytes)
        key, value = reader.read(file_queue)
        # 3. Decode the content as uint8
        label_image = tf.decode_raw(value, tf.uint8)
        print(label_image)
        # 4. Slice out the label (target) and the image (features).
        #    tf.slice takes no dtype argument, so cast the label separately.
        label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)
        image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])
        # 5. Reshape the image features: [3072] --> [32, 32, 3]
        image_reshape = tf.reshape(image, [self.height, self.width, self.channel])
        print(label, image_reshape)
        # 6. Batch the data
        image_batch, label_batch = tf.train.batch([image_reshape, label], batch_size=10, num_threads=1, capacity=10)
        print(image_batch, label_batch)
        return image_batch, label_batch

    def write_to_tfrecords(self, image_batch, label_batch):
        """
        Store image features and labels in a TFRecords file.
        :param image_batch: features of 10 images
        :param label_batch: labels of 10 images
        :return: None
        """
        # 1. Create a TFRecords file writer
        writer = tf.python_io.TFRecordWriter(FLAGS.cifar_tfrecords)
        # Evaluate both tensors in one run so images and labels come from the
        # same batch (separate eval() calls would each dequeue a new batch).
        # Must be called inside a `with tf.Session()` block.
        images, labels = tf.get_default_session().run([image_batch, label_batch])
        # 2. Write every sample; each image needs its own Example
        for i in range(10):
            # Features and label of the i-th image
            image = images[i].tobytes()
            label = int(labels[i][0])
            # Build the Example for this sample
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            # Write the single sample
            writer.write(example.SerializeToString())
        # Close the writer
        writer.close()
        return None

    def read_from_tfrecords(self):
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer([FLAGS.cifar_tfrecords])
        # 2. Build the file reader; value is one sample's serialized Example
        reader = tf.TFRecordReader()
        key, value = reader.read(file_queue)
        # 3. Parse the Example
        features = tf.parse_single_example(value, features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
        # 4. Decode: string features need decoding, int64 and float32 do not
        image = tf.decode_raw(features['image'], tf.uint8)
        # 5. Fix the image shape so it can be batched
        image_reshape = tf.reshape(image, [self.height, self.width, self.channel])
        label = tf.cast(features['label'], tf.int32)
        # 6. Batch the data
        image_batch, label_batch = tf.train.batch([image_reshape, label], batch_size=10, num_threads=1, capacity=10)
        return image_batch, label_batch


if __name__ == "__main__":
    # 1. Find the files and build a list of full paths
    file_name = os.listdir(FLAGS.cifar_dir)
    filelist = [os.path.join(FLAGS.cifar_dir, file) for file in file_name if file[-3:] == "bin"]
    # print(file_name)
    cf = CifarRead(filelist)
    image_batch, label_batch = cf.read_and_decode()
    # Open a session and run
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Store into the TFRecords file
        print("start writing")
        cf.write_to_tfrecords(image_batch, label_batch)
        print("finished writing")
        # Print what was read
        print(sess.run([image_batch, label_batch]))
        # Stop and join the worker threads
        coord.request_stop()
        coord.join(threads)

2) Exercise: converting a CSV file to a TFRecords file

import tensorflow as tf
import pandas as pd

train_frame = pd.read_csv("train.csv")
print(train_frame.head())
# Separate the label column from the feature columns
train_labels_frame = train_frame.pop("label")
train_values = train_frame.values
train_labels = train_labels_frame.values
print("values shape: ", train_values.shape)
print("labels shape:", train_labels.shape)

writer = tf.python_io.TFRecordWriter("csv_train.tfrecords")

for i in range(train_values.shape[0]):
    # The raw bytes use the array's dtype (train_values.dtype);
    # decode with the same type when reading the file back
    image_raw = train_values[i].tobytes()
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(train_labels[i])]))
            }
        )
    )
    writer.write(record=example.SerializeToString())

writer.close()

3) Converting image files to a TFRecords file

import glob
import os

import matplotlib.image as mpimg
import tensorflow as tf

def get_label_from_filename(filename):
    # Placeholder: every image gets label 1
    return 1

# tf.train.match_filenames_once returns a tensor that cannot be iterated
# in plain Python, so collect the file paths with glob instead
filenames = glob.glob(os.path.join("data", "*.jpg"))

writer = tf.python_io.TFRecordWriter('jpg_train.tfrecords')

for filename in filenames:
    img = mpimg.imread(filename)
    print("{} shape is {}".format(filename, img.shape))
    img_raw = img.tobytes()
    label = get_label_from_filename(filename)
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
            }
        )
    )
    writer.write(record=example.SerializeToString())

writer.close()

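Summary

These are study notes compiled from the video course "Python Deep Learning (TensorFlow)" (course video: https://www.bilibili.com/video/BV1Wt411C75s/). This part covers reading binary files with TensorFlow, converting several file types to TFRecords, and storing and reading TFRecords files.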
