TensorFlow Review (Part 3)

墨水兰亭

Article link: https://blog.csdn.net/moshuilangting/article/details/86705484

Data Reading

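TensorFlow programs can read data in three ways; this post mainly covers the first two:

Feeding: at every step of the TensorFlow program, Python code supplies the data.
Reading from files: at the start of the TensorFlow graph, an input pipeline reads the data from files.
Preloading: constants or variables in the TensorFlow graph hold all of the data (only suitable for small datasets).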

1. Feeding data

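The core call is sess.run(train, feed_dict={x: x_data, y: y_data}), where train is the op returned by the optimizer's minimize(); each time sess.run evaluates train, it applies one weight update according to that optimizer. Before every step, NumPy generates the inputs x_data (200 evenly spaced points) and the noisy targets y_data, and Python feeds them into the placeholders.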

import tensorflow as tf  # TensorFlow 1.x API (in TF 2.x these calls live under tf.compat.v1)
import numpy as np

# Placeholders through which feed_dict supplies the data and the labels
x = tf.placeholder(tf.float32,[None,1])
y = tf.placeholder(tf.float32,[None,1])

#model
layer1 = tf.layers.dense(inputs=x, units=16, activation=tf.nn.relu,name='layer1')
net = tf.layers.dense(inputs=layer1, units=1, activation=tf.nn.tanh,name='layer2')


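# Loss: y is the placeholder defined above; the labels are fed into it at sess.run time
loss = tf.reduce_mean(tf.square(y - net))
# Optimizer: gradient descent with learning rate 0.1, minimizing loss
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)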

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("logs/", sess.graph)
    for step in range(1000):
        x_data = np.linspace(0.5,0.8,200)[:,np.newaxis]
        b = np.random.normal(0.,0.02,x_data.shape)
        y_data = np.square(x_data) + b
        # Feed the data from Python
        sess.run(train,feed_dict={x:x_data,y:y_data})

        if step % 50 == 0:
            l = sess.run(loss,feed_dict={x:x_data,y:y_data})
            print('loss: ',l)
            print(step)
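
For readers without a TF 1.x installation, the feeding pattern can be mimicked in plain Python. This is a minimal sketch, not the original model: the two-layer network is swapped for a linear model w·x + b trained by hand-written gradient descent, and make_batch, train_step and run are hypothetical names. What carries over is the control flow: at every step Python builds a fresh batch and passes it in a dict, the role feed_dict plays for sess.run.

```python
import random


def make_batch(rng, n=200, noise=0.02):
    # Same data as the TF example: x evenly spaced in [0.5, 0.8], y = x^2 + Gaussian noise
    xs = [0.5 + 0.3 * i / (n - 1) for i in range(n)]
    ys = [x * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def mse(params, feed):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(feed["x"], feed["y"])) / len(feed["x"])


def train_step(params, feed, lr=0.1):
    # One gradient-descent update of the (hypothetical) linear model on the fed batch
    w, b = params
    n = len(feed["x"])
    errs = [w * x + b - y for x, y in zip(feed["x"], feed["y"])]
    grad_w = sum(2 * e * x for e, x in zip(errs, feed["x"])) / n
    grad_b = sum(2 * e for e in errs) / n
    return (w - lr * grad_w, b - lr * grad_b)


def run(steps=1000, seed=0):
    rng = random.Random(seed)
    params = (0.0, 0.0)
    history = []
    for step in range(steps):
        xs, ys = make_batch(rng)
        feed = {"x": xs, "y": ys}  # plays the role of feed_dict={x: x_data, y: y_data}
        params = train_step(params, feed)
        if step % 50 == 0:
            history.append(mse(params, feed))
    return params, history


if __name__ == "__main__":
    params, history = run()
    print("first loss:", history[0], "last loss:", history[-1])
```

As in the TF loop, the loss is recorded every 50 steps, so run() returns 20 loss values that should shrink over training.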

2. Creating and reading TFRecords

To read data from files, TensorFlow manages data input through TFRecords files, its binary record format.

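Creating a TFRecords file takes three steps: 1. Obtain the raw data and do any necessary preprocessing (for example, resizing images). 2. Define the TFRecords file. 3. Write the data one record at a time.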
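
On disk, the file written in step 3 is a sequence of length-prefixed, checksummed records, each payload being one serialized example. The sketch below writes and reads that framing in plain Python, assuming the layout described in the TensorFlow TFRecord documentation: a little-endian uint64 length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload. write_records and read_records are hypothetical helpers, and the payloads here are arbitrary bytes rather than real tf.train.Example protos.

```python
import os
import struct
import tempfile


def _crc32c_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    # Rotate right by 15 bits and add a constant, per the TFRecord format description
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    # Hypothetical helper: writes one framed record per payload, one at a time
    with open(path, "wb") as f:
        for data in payloads:
            header = struct.pack("<Q", len(data))
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc(data)))


def read_records(path):
    # Hypothetical helper: yields payloads in order, verifying both checksums
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return
            if len(header) != 8:
                raise ValueError("truncated length field")
            (length,) = struct.unpack("<Q", header)
            (len_crc,) = struct.unpack("<I", f.read(4))
            if len_crc != masked_crc(header):
                raise ValueError("corrupted length field")
            data = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(data):
                raise ValueError("corrupted payload")
            yield data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
    write_records(path, [b"first", b"second"])
    print(list(read_records(path)))
```

The checksums are why a TFRecords file cannot simply be edited in place: any change to a payload must be accompanied by a recomputed CRC.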

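Reading a TFRecords file: 1. Define a reader object and an input queue of file names (the TFRecords queue). 2. Let the reader read from the TFRecords queue. 3. Parse the serialized data. 4. Read out the fields and do any necessary processing (for image data, the bytes must be decoded and reshaped back to the image dimensions, because the TFRecord stores the image as a raw byte string). 5. Pack the examples into small batches; tf.train.batch turns single examples into batches. 6. Start threads so that the QueueRunner objects prefetch the data.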
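
Steps 5 and 6 form a producer/consumer pipeline: background threads keep a bounded queue filled while the training loop dequeues whole batches. The sketch below is a standard-library analogue, not TensorFlow's implementation: queue.Queue(maxsize=capacity) stands in for the TF queue, a threading.Event stands in for tf.train.Coordinator, and start_prefetch, dequeue_batch and demo are hypothetical names.

```python
import itertools
import queue
import threading


def start_prefetch(make_example, num_threads, capacity, stop_event):
    # Bounded queue: producers block once `capacity` examples are waiting
    q = queue.Queue(maxsize=capacity)

    def worker():
        while not stop_event.is_set():
            example = make_example()
            # Retry with a timeout so a stop request is noticed even when the queue is full
            while not stop_event.is_set():
                try:
                    q.put(example, timeout=0.1)
                    break
                except queue.Full:
                    pass

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()
    return q, threads


def dequeue_batch(q, batch_size):
    # Consumer side: blocks until batch_size examples are available
    return [q.get() for _ in range(batch_size)]


def demo(num_batches=10, batch_size=16, num_threads=4, capacity=32):
    counter = itertools.count()
    lock = threading.Lock()

    def make_example():
        # Stand-in for "read and parse one record"; the lock keeps ids unique across threads
        with lock:
            return next(counter)

    stop = threading.Event()  # plays the role of tf.train.Coordinator
    q, threads = start_prefetch(make_example, num_threads, capacity, stop)
    batches = [dequeue_batch(q, batch_size) for _ in range(num_batches)]
    stop.set()  # like coord.request_stop()
    for t in threads:  # like coord.join(threads)
        t.join(timeout=2)
    return batches


if __name__ == "__main__":
    for b in demo(num_batches=2):
        print(b)
```

With several producer threads the examples inside a batch are not necessarily in increasing order, which mirrors the non-determinism of num_threads > 1 in tf.train.batch.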

The signature of tf.train.batch is:


tf.train.batch(
    tensors,
    batch_size,
    num_threads=1,
    capacity=32,
    enqueue_many=False,
    shapes=None,
    dynamic_pad=False,
    allow_smaller_final_batch=False,
    shared_name=None,
    name=None
)
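# tensors: a list or dictionary of tensors to enqueue
# batch_size: the batch size
# num_threads: the number of threads that enqueue tensors; with num_threads > 1 the batching is non-deterministic and examples may come out in a different order
# capacity: an integer, the maximum number of elements in the queue
# enqueue_many: if False (default), each tensor in tensors is a single example; if True, each tensor is a batch of examples whose first dimension indexes the examples
# shapes: optional, the shape of each example; defaults to the shapes inferred from tensors
# dynamic_pad: Boolean; allows inputs with variable shapes, which are padded on dequeue so that all tensors in a batch have the same shape
# allow_smaller_final_batch: optional Boolean; if True, the final batch may contain fewer than batch_size examples when not enough remain in the queue; if False, the leftover examples are never dequeued
# shared_name: optional; if set, the queue is shared under this name across multiple sessions
# name: optional, the name of the operation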
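
The effect of enqueue_many and allow_smaller_final_batch can be seen on a finite input whose queue is eventually closed. This is a minimal sketch assuming a single enqueue thread; batch is a hypothetical function, and it returns Python lists rather than stacked tensors.

```python
def batch(examples, batch_size, enqueue_many=False, allow_smaller_final_batch=False):
    # Hypothetical model of tf.train.batch over a finite input (queue closed at the end)
    if enqueue_many:
        # Each input item is itself a batch: split it into single examples along its first dimension
        examples = [ex for chunk in examples for ex in chunk]
    batches, current = [], []
    for ex in examples:
        current.append(ex)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    # Leftovers form a final, smaller batch only when allow_smaller_final_batch is True
    if current and allow_smaller_final_batch:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(batch(range(10), 4))
    print(batch(range(10), 4, allow_smaller_final_batch=True))
    print(batch([[0, 1, 2], [3, 4, 5]], 2, enqueue_many=True))
```

With 10 examples and batch_size=4, the default drops the last two examples, giving [[0, 1, 2, 3], [4, 5, 6, 7]]; with allow_smaller_final_batch=True a third batch [8, 9] is emitted.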


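tf.train.shuffle_batch takes similar arguments, plus min_after_dequeue and seed, and returns the examples in shuffled order:

tf.train.shuffle_batch(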
    tensors,
    batch_size,
    capacity,
    min_after_dequeue,
    num_threads=1,
    seed=None,
    enqueue_many=False,
    shapes=None,
    allow_smaller_final_batch=False,
    shared_name=None,
    name=None
)

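#min_after_dequeue: the minimum number of elements left in the queue after a dequeue; keeping this many elements buffered ensures enough mixing, so the output batches are drawn in random order rather than in the order the examples were enqueued.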
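
How min_after_dequeue controls the mixing can be modeled with a small shuffle buffer. This is a hypothetical single-threaded sketch, not the RandomShuffleQueue implementation: before each dequeue it keeps min_after_dequeue + 1 examples buffered (while input remains) and removes one uniformly at random. capacity is omitted because in this model the producer never runs ahead of the consumer; draining the buffer once the input is exhausted is also an assumption of the sketch.

```python
import random


def shuffle_batch(examples, batch_size, min_after_dequeue, seed=None):
    # Hypothetical model: buffer min_after_dequeue + 1 examples, then dequeue one at random
    rng = random.Random(seed)
    it = iter(examples)
    buffer, batches, current = [], [], []
    while True:
        for ex in it:
            buffer.append(ex)
            if len(buffer) > min_after_dequeue:
                break
        if not buffer:
            break
        # Once the input is exhausted, the remaining buffered examples are drained
        i = rng.randrange(len(buffer))
        buffer[i], buffer[-1] = buffer[-1], buffer[i]
        current.append(buffer.pop())
        if len(current) == batch_size:
            batches.append(current)
            current = []
    return batches  # leftovers are dropped, as with allow_smaller_final_batch=False


if __name__ == "__main__":
    print(shuffle_batch(range(10), 5, min_after_dequeue=0))
    print(shuffle_batch(range(20), 5, min_after_dequeue=8, seed=1))
```

With min_after_dequeue=0 the buffer holds a single example, so the output keeps the input order; larger values draw each example from a bigger pool and mix the order more.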

The first example reads the CSV file from the Titanic example:

import tensorflow as tf
import pandas as pd


# Step 1: load the raw data
data = pd.read_csv('train.csv')
print(data.shape)

# Step 2: define the TFRecord file
tfrecord_file = 'titanic_train.tfrecords'
writer = tf.python_io.TFRecordWriter(tfrecord_file)

# Step 3: write one example record at a time
for i in range(len(data)):
    features = tf.train.Features(
    feature={'Age': tf.train.Feature(float_list=tf.train.FloatList(value=[data['Age'][i]])),
             'Sex': tf.train.Feature(int64_list=tf.train.Int64List(value=[1 if data['Sex'][i] == 'male' else 0])),
             'Pclass':tf.train.Feature(int64_list=tf.train.Int64List(value=[data['Pclass'][i]])),
             'Parch': tf.train.Feature(int64_list=tf.train.Int64List(value=[data['Parch'][i]])),
             'Sibsp': tf.train.Feature(int64_list=tf.train.Int64List(value=[data['SibSp'][i]])),
             'Fare': tf.train.Feature(float_list=tf.train.FloatList(value=[data['Fare'][i]])),
             'Survived': tf.train.Feature(int64_list=tf.train.Int64List(value=[data['Survived'][i]]))
             })
    # Wrap the features into a single example (one record)
    example = tf.train.Example(features=features)
    # Serialize the example and write it to the TFRecord file
    writer.write(example.SerializeToString())

# Step 4: close the file after writing
writer.close()
print('Finished writing the TFRecords file!')



# Step 1: define the reader object and the input queue of TFRecords files
filename_queue = tf.train.string_input_producer(['titanic_train.tfrecords'])
reader = tf.TFRecordReader()

# Step 2: use the reader to read from the TFRecords queue; it returns (key, value)
_, serialized_example = reader.read(filename_queue)

# print(serialized_example.shape)

# Step 3: parse the data
features = tf.parse_single_example(serialized_example,features = {'Age': tf.FixedLenFeature([], tf.float32),
                                                                  'Sex': tf.FixedLenFeature([], tf.int64),
                                                                  'Pclass': tf.FixedLenFeature([], tf.int64),
                                                                  'Parch': tf.FixedLenFeature([], tf.int64),
                                                                  'Sibsp': tf.FixedLenFeature([], tf.int64),
                                                                  'Fare': tf.FixedLenFeature([], tf.float32),
                                                                  'Survived': tf.FixedLenFeature([], tf.int64)
                                                                  })

# Step 4: read out the fields
age = features['Age']
sex = features['Sex']
pclass = features['Pclass']
parch = features['Parch']
sibsp = features['Sibsp']
fare = features['Fare']
label = features['Survived']

# Step 5: pack the examples into batches
age, sex, pclass, parch, sibsp, fare, label = tf.train.batch([age, sex, pclass, parch, sibsp, fare, label],num_threads=20,
                                                             batch_size=16, capacity=500)

print(age.shape)  # The feature shape can already be checked here: (16,), because batch_size is 16

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # Step 6: start the threads so the QueueRunner objects prefetch data
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord = coord)
    for step in range(10):
        age_,sex_ = sess.run([age,sex])
        print(age_,sex_)
    # Step 7: stop the threads
    coord.request_stop()
    coord.join(threads=threads)
    print('Done!')
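
The key step in the writer above is turning one CSV row into typed feature lists: floats go into FloatList, integers (including the 0/1 encoding of Sex) into Int64List. The sketch below shows only that mapping, using the standard csv module instead of pandas and TensorFlow; the sample rows are made up, and the (list type, values) tuples are a hypothetical stand-in for tf.train.Feature. It also surfaces a detail the original code leaves implicit: a missing Age becomes NaN and would be written into the record unchanged unless it is filled in first.

```python
import csv
import io
import math

# Hypothetical sample rows using a subset of the Titanic train.csv column names;
# the values are made up for illustration. The second row has a missing Age.
SAMPLE_CSV = """Survived,Pclass,Sex,Age,SibSp,Parch,Fare
0,3,male,22,1,0,7.25
1,1,female,,0,0,71.28
"""


def row_to_features(row):
    # Mirrors the feature dict in the TF example: (list type, values) per key
    age = float(row["Age"]) if row["Age"] else math.nan  # pandas would also read an empty cell as NaN
    return {
        "Age": ("float_list", [age]),
        "Sex": ("int64_list", [1 if row["Sex"] == "male" else 0]),
        "Pclass": ("int64_list", [int(row["Pclass"])]),
        "Parch": ("int64_list", [int(row["Parch"])]),
        "Sibsp": ("int64_list", [int(row["SibSp"])]),
        "Fare": ("float_list", [float(row["Fare"])]),
        "Survived": ("int64_list", [int(row["Survived"])]),
    }


def convert(csv_text):
    return [row_to_features(row) for row in csv.DictReader(io.StringIO(csv_text))]


if __name__ == "__main__":
    for features in convert(SAMPLE_CSV):
        print(features)
```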

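The second example reads images through TFRecords. Input images may differ in size, so they must be resized when the TFRecords file is created; otherwise the records cannot all be parsed with the same fixed shape when reading. The basic flow is: raw image → resized to a fixed size (here 500 pixels wide by 400 high, as in the code below) → converted to a raw byte string and stored in the TFRecords file → parsed back out of the TFRecords file → byte string reshaped to a 400×500×3 array → read in batches for training.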

import tensorflow as tf
from PIL import Image
import os


# Creating TFRecords in three steps: obtain the raw data, define the TFRecords file, write one example at a time

image_path='TFrecord_data/img/'
tfrecord_file = 'TFrecord_data/data.tfrecords'
def TFrecords_w():
    # Step 1: define the TFRecords file

    writer = tf.python_io.TFRecordWriter(tfrecord_file)


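    # Step 2: read the raw data
    # Iterate over the files in the folder
    pathDir = os.listdir(image_path)
    label = 0
    for filename in pathDir:
        # convert('RGB') guarantees 3 channels, so the byte count matches the [400, 500, 3] reshape used when reading
        image = Image.open(image_path + filename).convert('RGB')
        # PIL's resize takes (width, height): 500 wide, 400 high
        image = image.resize((500, 400))
        image_raw = image.tobytes()
        # If no resizing is needed, the encoded file bytes can be stored directly;
        # they must then be decoded with tf.image.decode_jpeg / tf.image.decode_png instead of tf.decode_raw
        #image = tf.gfile.FastGFile(image_path+'/'+filename, 'rb').read()
        # Each image simply gets its own label: 1, 2, 3, ...
        label += 1

        # Step 3: write one example record at a time
        example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                }))
        writer.write(example.SerializeToString())

    # Step 4: close the file
    writer.close()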


# Read the TFRecords
def TFrecords_r():
    # Step 1: define the reader object and the input queue of TFRecords files
    filename_queue = tf.train.string_input_producer([tfrecord_file])
    reader = tf.TFRecordReader()

    # Step 2: use the reader to read from the TFRecords queue; it returns (key, value)
    _, serialized_example = reader.read(filename_queue)


    # Step 3: parse the data
    features = tf.parse_single_example(serialized_example,
                                       features = {'image': tf.FixedLenFeature([], tf.string),
                                                   'label': tf.FixedLenFeature([], tf.int64)
                                                   })
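    # Step 4: any required preprocessing
    images = tf.decode_raw(features['image'], tf.uint8)  # decoding is required: the image was written as a raw byte string, not as numeric values
    labels = tf.cast(features['label'], tf.int64)

    # The order is the reverse of resize((500, 400)): PIL takes (width, height), while the tensor shape is [height, width, channels]
    images = tf.reshape(images, [400, 500, 3])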


    # Step 5: pack the examples into small shuffled batches
    img,lab = tf.train.shuffle_batch([images,labels], batch_size=3,capacity=32, min_after_dequeue=8)

    #print(img.shape)  # shape (3, 400, 500, 3)
    #print(lab.shape)  # shape (3,)

    import matplotlib.pyplot as plt
    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        # Start the threads
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess,coord=coord)

        # for step in range(3):
        #     lab_ = sess.run([lab])
        #     print(lab_)
        for step in range(3):
            img_,lab_ = sess.run([img,lab])
            print(img_)
            plt.imshow(img_[1])
            plt.show()

        coord.request_stop()
        coord.join(threads=threads)



if __name__=='__main__':
    TFrecords_w()
    TFrecords_r()

    # for serialized_example in tf.python_io.tf_record_iterator("TFrecord_data/data.tfrecords"):
    #     example = tf.train.Example()
    #     example.ParseFromString(serialized_example)
    #
    #     image = example.features.feature['image'].bytes_list.value
    #     #label = example.features.feature['label'].int64_list.value
    #     # preprocessing could be done here
    #     print(image)
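
Why the reshape uses [400, 500, 3] although the image was resized with (500, 400): for an RGB image, PIL's tobytes() returns the pixels row by row from the top, three bytes (R, G, B) per pixel, so the raw string is laid out as height × width × channels. The pure-Python sketch below rebuilds that nested layout from the flat bytes, which is what tf.reshape does on the decoded uint8 tensor; the tiny 5 × 4 image standing in for 500 × 400 and the helper name to_hwc are hypothetical.

```python
def to_hwc(raw, height, width, channels=3):
    # Pixel (row y, column x) starts at byte offset (y * width + x) * channels
    return [
        [list(raw[(y * width + x) * channels:(y * width + x + 1) * channels]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    width, height = 5, 4  # same order as PIL's resize((width, height))
    raw = bytes(range(height * width * 3))  # 60 bytes, like image.tobytes() for a 5x4 RGB image
    img = to_hwc(raw, height, width)  # [height][width][3], like tf.reshape(images, [400, 500, 3])
    print(len(img), len(img[0]), len(img[0][0]))  # 4 5 3
    print(img[1][2])  # [21, 22, 23]
```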

If the size of the data read from the TFRecords file does not match the shape expected for the batch output, the program fails with:

 RandomShuffleQueue '_1_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 3, current size 0)

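For example, if two 500×500×3 images are stored in the TFRecords file but parsed as 400×400×3, the byte count no longer matches the target shape. The reshape then fails inside the prefetching thread, the queue is closed before it receives a single element, and the dequeue for the batch reports the error above.

So when processing image data, make sure the shape used when parsing exactly matches the shape used when writing.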
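
The mismatch can be caught before any queue is involved by comparing the number of stored bytes with the product of the shape used for parsing. In the example from the text, 500 × 500 × 3 = 750000 bytes are stored per image, but 400 × 400 × 3 = 480000 are expected. A minimal sketch; expected_bytes and check_raw_size are hypothetical helpers, and in a TF pipeline the same arithmetic applies to the length of the tensor returned by tf.decode_raw.

```python
import math


def expected_bytes(shape):
    # A raw uint8 image occupies exactly one byte per element of its shape
    return math.prod(shape)


def check_raw_size(num_bytes, shape):
    # Raise early instead of letting the reshape fail inside a prefetching thread
    expected = expected_bytes(shape)
    if num_bytes != expected:
        raise ValueError(f"stored {num_bytes} bytes, but shape {tuple(shape)} needs {expected}")
    return tuple(shape)


if __name__ == "__main__":
    stored = 500 * 500 * 3  # one 500x500x3 image per record: 750000 bytes
    try:
        check_raw_size(stored, (400, 400, 3))
    except ValueError as err:
        print(err)  # stored 750000 bytes, but shape (400, 400, 3) needs 480000
```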
