How to Load Data in TensorFlow

to do 1+1

Category: tensorflow. Tags: tensorflow, data loading, TFRecords

Article link: https://blog.csdn.net/lidichengfo0412/article/details/100017542

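TensorFlow is a symbolic programming framework: you first build the dataflow graph, then read the data, and then train the model. The official TensorFlow site describes the following three ways to load data.
– Preloaded data: define constants or variables in the TensorFlow graph that hold all of the data.
– Feeding: Python code produces the data and feeds it to the backend.
– Reading from files: a queue manager reads the data directly from files.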

I. Preloaded data

import tensorflow as tf  # TensorFlow 1.x API

x1 = tf.constant([[2, 3, 4]])
x2 = tf.constant([4, 0, 1])
y = tf.add(x1, x2)

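The drawback of this approach is that the data is embedded directly in the dataflow graph, so a large training set consumes a lot of memory.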
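To get a feel for that cost, a back-of-the-envelope estimate can help. The sketch below assumes the MNIST training set used later in this article (60,000 images of 28×28 pixels) and computes how many bytes it would occupy as a graph constant. The figures are plain arithmetic, not measurements, and `constant_size_bytes` is an illustrative helper, not a TensorFlow function. Constants are serialized into the GraphDef protocol buffer, which is limited to 2 GB.

```python
# Rough size estimate for preloading MNIST into a graph constant.
NUM_IMAGES = 60_000            # MNIST training images
PIXELS_PER_IMAGE = 28 * 28     # one grayscale 28x28 image
GRAPHDEF_LIMIT = 2 * 1024**3   # protobuf size limit for a GraphDef, in bytes


def constant_size_bytes(num_images, pixels_per_image, bytes_per_value):
    """Bytes needed to hold every pixel of every image in one constant."""
    return num_images * pixels_per_image * bytes_per_value


uint8_bytes = constant_size_bytes(NUM_IMAGES, PIXELS_PER_IMAGE, 1)
float32_bytes = constant_size_bytes(NUM_IMAGES, PIXELS_PER_IMAGE, 4)

print("uint8 constant:   %.2f MB" % (uint8_bytes / 1e6))
print("float32 constant: %.2f MB" % (float32_bytes / 1e6))
print("fits in a GraphDef:", float32_bytes < GRAPHDEF_LIMIT)
```

MNIST still fits, but the same arithmetic for a larger image dataset quickly approaches the limit, and every copy of the graph carries the data with it.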

II. Feeding data

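import tensorflow as tf  # TensorFlow 1.x API

# Placeholders are filled from Python at run time through feed_dict.
a1 = tf.placeholder(tf.int64)
a2 = tf.placeholder(tf.int64)
b = tf.add(a1, a2)
li1 = [2, 3, 4]
li2 = [4, 0, 1]
with tf.Session() as sess:
    print(sess.run(b, feed_dict={a1: li1, a2: li2}))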

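Feeding also consumes a lot of memory when the data set is large, and intermediate steps such as data type conversion add noticeable overhead. In that case the third approach is the better choice: define the file-reading operations in the graph and let TensorFlow read the data from files itself and decode it into usable samples.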
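When feeding is used anyway, one common way to bound how much data is copied per step is to feed mini-batches instead of the whole data set. The sketch below is plain Python with no TensorFlow dependency; `iter_minibatches` is a hypothetical helper name, not a TensorFlow API. It does not remove the conversion overhead mentioned above, it only limits the size of each feed.

```python
def iter_minibatches(features, labels, batch_size):
    """Yield (features, labels) slices of at most batch_size items each."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]


li1 = [2, 3, 4, 5, 6]
li2 = [4, 0, 1, 7, 8]
batches = list(iter_minibatches(li1, li2, batch_size=2))
for batch1, batch2 in batches:
    # In the feeding example, each pair would be passed as
    # feed_dict={a1: batch1, a2: batch2} to sess.run.
    print(batch1, batch2)
```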

III. Reading data from files

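Reading data from files takes two steps:
(1) Write the sample data to a TFRecords binary file.
(2) Read the samples back through a queue.
TFRecords is a binary file format. It makes better use of memory, is easier to copy and move, and does not need a separate label file.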
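To make "binary file" concrete, the following sketch reproduces the record framing of a TFRecords file in pure Python: each record is stored as a little-endian 64-bit length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The function names are illustrative and are not TensorFlow APIs; in practice the payload is a serialized `tf.train.Example`, while plain byte strings stand in for it here.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial
_MASK_DELTA = 0xA282EAD8   # constant used by the TFRecord CRC mask


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli) of a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (_CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    """Rotate the CRC right by 15 bits and add the mask constant."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + _MASK_DELTA) & 0xFFFFFFFF


def frame_record(payload):
    """Wrap one payload in length and checksum fields."""
    length = struct.pack("<Q", len(payload))
    return (length
            + struct.pack("<I", masked_crc32c(length))
            + payload
            + struct.pack("<I", masked_crc32c(payload)))


def iter_records(blob):
    """Yield the payloads of a byte string of framed records, verifying CRCs."""
    offset = 0
    while offset < len(blob):
        length_bytes = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (length_crc,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if length_crc != masked_crc32c(length_bytes):
            raise ValueError("corrupted length field")
        payload = blob[offset + 12:offset + 12 + length]
        (data_crc,) = struct.unpack(
            "<I", blob[offset + 12 + length:offset + 16 + length])
        if data_crc != masked_crc32c(payload):
            raise ValueError("corrupted payload")
        yield payload
        offset += 16 + length


blob = frame_record(b"first") + frame_record(b"second sample")
records = list(iter_records(blob))
print(records)
```

Because every record carries its own length, a reader can stream through the file one sample at a time without loading it whole, which is what the queue-based reader relies on.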

1. Generating the TFRecords files

import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API


def main(unused_argv):
    # Load MNIST and add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1).
    with np.load("./MNIST_data/mnist.npz") as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_train = np.expand_dims(x_train, axis=-1)
        x_test, y_test = f['x_test'], f['y_test']
        x_test = np.expand_dims(x_test, axis=-1)
    convert_to(x_train, y_train, "train")
    convert_to(x_test, y_test, "test")


def convert_to(images, labels, name):
    num_examples = labels.shape[0]
    if images.shape[0] != num_examples:
        raise ValueError("Images size %d does not match label size %d." %
                         (images.shape[0], num_examples))
    # Make sure the output directory exists before opening the writer.
    os.makedirs("./data", exist_ok=True)
    filename = os.path.join("./data", name + ".tfrecords")
    print("Writing: ", filename)
    writer = tf.python_io.TFRecordWriter(filename)
    for index in range(num_examples):
        # tobytes() replaces the deprecated ndarray.tostring().
        image_raw = images[index].tobytes()
        example = tf.train.Example(
            # Note: this is tf.train.Features (plural), not tf.train.Feature.
            features=tf.train.Features(
                feature={
                    "label": _int64_feature(int(labels[index])),
                    "image_raw": _bytes_feature(image_raw)
                }))
        writer.write(example.SerializeToString())
    writer.close()


# Wrap an int value in a Feature.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


# Wrap a bytes value (the raw image string) in a Feature.
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


if __name__ == "__main__":
    main(None)  # the argument is unused

2. Reading data from the queue

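Once the TFRecords files exist, the data can be read through a queue. There are three main steps:
(1) Create tensors that read a single sample from the binary file.
(2) Create tensors that draw a random mini-batch from the binary file.
(3) Pass each batch of tensors to the network as its input nodes.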
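Step (1) comes down to turning the stored bytes back into numbers and rescaling them. The pure-Python sketch below mirrors, for a single sample, what `tf.decode_raw(..., out_type=tf.uint8)` followed by `tf.cast(image, tf.float32) * (1. / 255) - 0.5` computes in `read_and_decode`, mapping pixel values from [0, 255] to [-0.5, 0.5]. `decode_and_normalize` is an illustrative name, not part of TensorFlow, and Python floats are used in place of float32.

```python
def decode_and_normalize(image_raw):
    """Decode raw uint8 bytes and rescale each pixel to [-0.5, 0.5]."""
    pixels = list(image_raw)  # iterating over bytes yields ints in 0..255
    return [pixel * (1.0 / 255) - 0.5 for pixel in pixels]


sample = bytes([0, 128, 255])  # darkest, mid-gray, and brightest pixel
normalized = decode_and_normalize(sample)
print(normalized)
```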

import os
import time

import tensorflow as tf  # TensorFlow 1.x API


# Read and parse a single sample from the file queue.
def read_and_decode(filename_queue):
    reader = tf.TFRecordReader()
    # read() returns a (key, value) pair; the value is one serialized Example.
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            "image_raw": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64)
        })
    image = tf.decode_raw(features["image_raw"], out_type=tf.uint8)
    image.set_shape([784])
    # Rescale pixel values from [0, 255] to [-0.5, 0.5].
    image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features["label"], tf.int32)
    return image, label


def inputs(train, batch_size, num_epochs):
    if not num_epochs:
        num_epochs = None
    filename = os.path.join("./data", "train.tfrecords" if train else "test.tfrecords")
    with tf.name_scope("input"):
        filename_queue = tf.train.string_input_producer(
            [filename], num_epochs=num_epochs
        )
        image, label = read_and_decode(filename_queue)
        # Draw shuffled mini-batches from a queue filled by two threads.
        images, sparse_labels = tf.train.shuffle_batch(
            [image, label], batch_size=batch_size, num_threads=2,
            capacity=1000 + 3 * batch_size, min_after_dequeue=1000
        )
        return images, sparse_labels


mnist_model = tf.keras.Sequential([
    # The kernel size is passed as a list: [3, 3].
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu'),
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu'),
    # Author's observation: replacing the next layer with
    # "tf.keras.layers.MaxPooling2D(), tf.keras.layers.Flatten()"
    # raised accuracy considerably.
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])


def run_training():
    with tf.Graph().as_default():
        # images and labels must be created inside this graph; otherwise
        # mnist_model(images, training=True) fails with an error saying the
        # tensors do not belong to the same Graph.
        images, labels = inputs(train=True, batch_size=32, num_epochs=2)
        # Reshape images: [-1, 784] ---> [-1, 28, 28, 1]
        images = tf.reshape(images, shape=[-1, 28, 28, 1])
        # Logits
        logits = mnist_model(images, training=True)
        # Loss function
        loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
        # Training op
        train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
        # Initialization op; it must be defined after loss and train_op.
        init_op = tf.group(tf.global_variables_initializer(),
                           tf.local_variables_initializer())
        sess = tf.Session()
        # Initialize variables
        sess.run(init_op)
        # Start the queue-runner threads
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            step = 0
            while not coord.should_stop():
                start_time = time.time()
                _, loss_value = sess.run([train_op, loss])
                # Elapsed time for this step (the original stored the raw timestamp).
                duration = time.time() - start_time
                if step % 100 == 0:
                    print("Step %d: loss = %.2f (%.3f sec)" % (step, loss_value, duration))
                step += 1
        except tf.errors.OutOfRangeError:
            # Raised once the filename queue has served num_epochs epochs.
            print("Done training for %d epochs, %d steps." % (2, step))
        finally:
            coord.request_stop()
        coord.join(threads)
        sess.close()


if __name__ == "__main__":
    run_training()
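The `capacity` and `min_after_dequeue` arguments passed to `tf.train.shuffle_batch` control how well the samples are mixed. The sketch below is a simplified, single-threaded model of that buffering in plain Python, written only to illustrate the two parameters: the real op uses a RandomShuffleQueue filled by background threads, and `shuffle_batches` here is a hypothetical stand-in, not the TensorFlow implementation.

```python
import random


def shuffle_batches(samples, batch_size, capacity, min_after_dequeue,
                    allow_smaller_final_batch=False, seed=None):
    """Yield shuffled batches drawn from a bounded buffer of samples."""
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be larger than min_after_dequeue")
    rng = random.Random(seed)
    source = iter(samples)
    buffer, batch = [], []
    exhausted = False
    while True:
        # Refill to capacity whenever the buffer has drained to the minimum,
        # so a sample is always drawn from more than min_after_dequeue items
        # while input remains.
        if not exhausted and len(buffer) <= min_after_dequeue:
            while len(buffer) < capacity:
                try:
                    buffer.append(next(source))
                except StopIteration:
                    exhausted = True
                    break
        if not buffer:
            break
        # Remove one uniformly chosen element from the buffer.
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        batch.append(buffer.pop())
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Like the real op, drop a final partial batch unless asked to keep it.
    if batch and allow_smaller_final_batch:
        yield batch


batches = list(shuffle_batches(range(10), batch_size=4, capacity=6,
                               min_after_dequeue=3,
                               allow_smaller_final_batch=True, seed=0))
for b in batches:
    print(b)
```

A larger `min_after_dequeue` gives better mixing at the price of more memory and a longer warm-up before the first batch; `capacity` bounds the memory the buffer may use.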
