How to use TFRecord?

tianzhiya121

Article link: https://blog.csdn.net/tianzhiya121/article/details/88805440

Purpose

Understand the TFRecord file format and learn how to write and read such files.

A quick look inside a TFRecord file

I. The traditional approach

This approach fits data where every feature is a list whose values all have the same type, such as images.

1. Create lists holding the feature values with tf.train.BytesList, tf.train.FloatList or tf.train.Int64List.

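The features field holds one or more entries of the form feature={"key": tf.train.Feature()}.
A feature is stored as a key-value pair: the key is a string, and the value holds one of three data types:
1. BytesList: a list of byte strings, tf.train.BytesList(value=[value])
2. FloatList: a list of 32-bit floats, tf.train.FloatList(value=[value])
3. Int64List: a list of 64-bit integers, tf.train.Int64List(value=[value])
For an image held as a numpy array, you can call .tostring() (.tobytes() in newer NumPy versions) and store the bytes in a BytesList, read the image file as raw bytes with tf.gfile.FastGFile and store those in a BytesList, or .flatten() the array and store it in a FloatList.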
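Storing an image as raw bytes keeps the pixel values but drops the array's shape, so the shape usually has to be stored as extra Int64List features and used to rebuild the image when reading. The sketch below shows that round trip in plain Python, using the standard array module instead of NumPy; it assumes an 8-bit grayscale image given as a list of rows, and the feature names image_raw, height and width are hypothetical.

```python
from array import array

def image_to_fields(pixels):
    """Flatten an 8-bit grayscale image (list of rows) into the values one would
    store: raw bytes for a BytesList plus height/width for Int64Lists.
    The key names are hypothetical."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    flat = array('B', (p for row in pixels for p in row))  # 'B' = unsigned 8-bit
    return {'image_raw': flat.tobytes(), 'height': height, 'width': width}

def fields_to_image(fields):
    """Rebuild the list-of-rows image from the raw bytes and the stored shape."""
    flat = array('B')
    flat.frombytes(fields['image_raw'])
    h, w = fields['height'], fields['width']
    if len(flat) != h * w:
        raise ValueError('byte count does not match height * width')
    return [flat[r * w:(r + 1) * w].tolist() for r in range(h)]

img = [[0, 128, 255], [10, 20, 30]]
fields = image_to_fields(img)
print(fields['image_raw'])              # b'\x00\x80\xff\n\x14\x1e'
print(fields_to_image(fields) == img)   # True
```

Without the stored height and width, the six bytes could just as well be a 3x2 or 1x6 image, which is why the shape travels with the pixels.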
Source: https://blog.csdn.net/weiweixiao3/article/details/82352062

import tensorflow as tf

movie_name_list = tf.train.BytesList(value=[b'The Shawshank Redemption', b'Fight Club'])
movie_rating_list = tf.train.FloatList(value=[9.0, 9.7])

2. Wrap each list in tf.train.Feature so that TensorFlow can understand it.

For example:

movie_names = tf.train.Feature(bytes_list=movie_name_list)
movie_ratings = tf.train.Feature(float_list=movie_rating_list)

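3. Build a dictionary with the feature names as keys and the wrapped features as values, and pass it as the feature argument of tf.train.Features to create a tf.train.Features object.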

movie_dict = {
  'Movie Names': movie_names,
  'Movie Ratings': movie_ratings
}
movies = tf.train.Features(feature=movie_dict)

4. Pass the tf.train.Features object as the features argument of tf.train.Example.

example = tf.train.Example(features=movies)
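To make the nesting of steps 1 to 4 concrete, here is a plain-Python sketch of the bytes SerializeToString() produces for an Example. It hand-encodes the protobuf wire format, assuming the field numbers from TensorFlow's example.proto and feature.proto (Example.features = 1; Features.feature = 1, a map<string, Feature> whose entries use key = 1 and value = 2; Feature.bytes_list/float_list/int64_list = 1/2/3; each list's value = 1, packed for floats and ints). It is an illustration only, not a replacement for the protobuf library: it rejects negative int64 values and writes map entries in dict order, whereas real serializers do not promise any map order.

```python
import struct

def varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError('this sketch does not encode negative int64 values')
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def ld(field, payload):
    """Length-delimited field: tag (field_number << 3 | 2), length, payload."""
    return varint(field << 3 | 2) + varint(len(payload)) + payload

def encode_feature(values):
    """Encode one Feature; the list type is picked from the Python type of the values."""
    if all(isinstance(v, bytes) for v in values):
        return ld(1, b''.join(ld(1, v) for v in values))         # bytes_list = 1
    if all(isinstance(v, float) for v in values):
        packed = b''.join(struct.pack('<f', v) for v in values)  # float32, little-endian
        return ld(2, ld(1, packed))                               # float_list = 2
    if all(isinstance(v, int) for v in values):
        packed = b''.join(varint(v) for v in values)
        return ld(3, ld(1, packed))                               # int64_list = 3
    raise TypeError('values must be all bytes, all float, or all int')

def encode_example(features):
    """Serialize {name: [values]} into tf.train.Example wire format."""
    entries = b''
    for name, values in features.items():
        # Each map entry is a small message: key = 1, value = 2
        entry = ld(1, name.encode('utf-8')) + ld(2, encode_feature(values))
        entries += ld(1, entry)                                   # Features.feature = 1
    return ld(1, entries)                                         # Example.features = 1

movie = encode_example({'Movie Names': [b'The Shawshank Redemption', b'Fight Club'],
                        'Movie Ratings': [9.0, 9.7]})
print(movie)
```

Fed with feature0=1, feature1=0, feature2=b'cat' and the float32 stored as feature3, this sketch reproduces the first raw record printed in section II byte for byte. The other records there list the same entries in different orders, which is why the map-order caveat matters.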

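You can parse a serialized Example back into a message with tf.train.Example.FromString(). The output below was produced from an Example holding the four features feature0 to feature3 (the data used in section II), not from the movie example above:

serialized_example = example.SerializeToString()
example_proto = tf.train.Example.FromString(serialized_example)
example_proto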

features {
  feature {
    key: "feature0"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "feature1"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "feature2"
    value {
      bytes_list {
        value: "goat"
      }
    }
  }
  feature {
    key: "feature3"
    value {
      float_list {
        value: 0.9876000285148621
      }
    }
  }
}
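feature3 above prints as 0.9876000285148621 rather than a round number because FloatList stores 32-bit floats: a Python float (64-bit) is rounded to the nearest float32 when it is stored. The standard struct module can reproduce that round trip; the input value 0.9876 is an assumption inferred from the printed output.

```python
import struct

def as_float32(x):
    """Round a Python float to the nearest IEEE-754 float32 and back,
    mimicking what storing it in a FloatList does to the value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

stored = as_float32(0.9876)
print(stored)                 # close to, but not exactly, 0.9876
print(abs(stored - 0.9876))   # rounding error; below 3e-8 for values in [0.5, 1)
```

Values that are exactly representable in float32, such as the 9.0 rating in the movie example, come back unchanged; 9.7 does not.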

5. Pass the file path to tf.python_io.TFRecordWriter to create a writer object. Call SerializeToString() on the tf.train.Example object to serialize the structured data, then call writer.write() to write the serialized data to disk.

with tf.python_io.TFRecordWriter('customer_1.tfrecord') as writer:
    writer.write(example.SerializeToString())
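TFRecordWriter adds only a thin frame around each serialized Example. According to the TensorFlow TFRecord tutorial, every record in the file is a uint64 length, a uint32 masked CRC32C of the length, the data bytes, and a uint32 masked CRC32C of the data, with masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8 (mod 2^32). The sketch below implements that layout in pure Python, assuming little-endian integers, to show what a .tfrecord file contains; it is for illustration only, uses a slow bitwise CRC, and does not handle GZIP/ZLIB-compressed files.

```python
import io
import struct

def crc32c(data):
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def masked_crc(data):
    """Rotate the CRC right by 15 bits and add a constant, kept to 32 bits."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_record(f, data):
    length = struct.pack('<Q', len(data))             # uint64 length
    f.write(length)
    f.write(struct.pack('<I', masked_crc(length)))    # uint32 masked CRC of length
    f.write(data)                                     # the serialized Example
    f.write(struct.pack('<I', masked_crc(data)))      # uint32 masked CRC of data

def read_records(f):
    """Yield each record's data bytes, checking both CRCs."""
    while True:
        header = f.read(8)
        if not header:
            return                                    # clean end of file
        (length,) = struct.unpack('<Q', header)
        (len_crc,) = struct.unpack('<I', f.read(4))
        if len_crc != masked_crc(header):
            raise ValueError('corrupted length field')
        data = f.read(length)
        (data_crc,) = struct.unpack('<I', f.read(4))
        if data_crc != masked_crc(data):
            raise ValueError('corrupted record data')
        yield data

buf = io.BytesIO()
for payload in [b'first record', b'second']:
    write_record(buf, payload)
buf.seek(0)
print(list(read_records(buf)))  # [b'first record', b'second']
```

The format carries no schema, which is why the reader has to be told the feature names and types separately when parsing.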

How to read TFRecord data

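Create a tf.TFRecordReader object, reader.
Use reader to read the serialized data, serialized_example, from the .tfrecord file.
Build a features dictionary that maps each key you want to read from the TFRecord to the type of its value, then pass the dictionary and the serialized data to tf.parse_single_example(). The result is a dictionary holding the requested data.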

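(1) tf.parse_single_example(serialized, features=None, name=None)

Parses a single Example proto.
serialized: a scalar string Tensor holding one serialized Example, e.g. the value returned by the file reader
features: a dictionary whose keys are the feature names to read and whose values are FixedLenFeature (or VarLenFeature) objects
return: a dictionary mapping each feature name to its parsed Tensor (a SparseTensor for a VarLenFeature)
(2) tf.FixedLenFeature(shape, dtype)

shape: the shape of the input data; for a single value this is usually the empty list []
dtype: the data type of the input; it must match the type written to the file and can only be tf.float32, tf.int64 or tf.string
result after parsing: a dense Tensor (every entry is stored, including zeros)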
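The practical difference between FixedLenFeature and VarLenFeature is how many values each accepts and what comes back. The plain-Python toy model below mimics those rules for one already-decoded Example; FixedLen, VarLen and toy_parse are hypothetical names, not TensorFlow APIs, and only shape [] is modeled. A FixedLenFeature with shape [] needs exactly one value and falls back to default_value when the key is missing (missing with no default is an error), while a VarLenFeature accepts any number of values, which TensorFlow returns as a SparseTensor (a plain list here).

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-ins for tf.FixedLenFeature([]) and tf.VarLenFeature
@dataclass
class FixedLen:
    dtype: type
    default_value: Optional[Any] = None

@dataclass
class VarLen:
    dtype: type

def toy_parse(example, spec):
    """example: {name: list of values}; spec: {name: FixedLen or VarLen}."""
    out = {}
    for name, feat in spec.items():
        values = example.get(name)
        if values is not None and not all(isinstance(v, feat.dtype) for v in values):
            raise TypeError(f'{name}: stored type does not match {feat.dtype.__name__}')
        if isinstance(feat, VarLen):
            out[name] = list(values or [])  # TensorFlow would return a SparseTensor
        elif values is None:
            if feat.default_value is None:
                raise KeyError(f'{name} is required but could not be found')
            out[name] = feat.default_value
        elif len(values) != 1:
            raise ValueError(f'{name}: shape [] needs exactly 1 value, got {len(values)}')
        else:
            out[name] = values[0]
    return out

movie = {'Movie Names': [b'The Shawshank Redemption', b'Fight Club'],
         'Movie Ratings': [9.0, 9.7]}
print(toy_parse(movie, {'Movie Ratings': VarLen(float),
                        'Age': FixedLen(int, default_value=0)}))
# {'Movie Ratings': [9.0, 9.7], 'Age': 0}
```

Under these rules, reading the two-element 'Movie Names' list with a scalar FixedLenFeature fails, which is why list-valued features like the movie names are read with VarLenFeature.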

Source: https://blog.csdn.net/chengshuhao1991/article/details/78656724

import tensorflow as tf  # TensorFlow 1.x API

# Read and print data:
sess = tf.InteractiveSession()

# Read TFRecord file
reader = tf.TFRecordReader()
filename_queue = tf.train.string_input_producer(['customer_1.tfrecord'])

_, serialized_example = reader.read(filename_queue)

# Define features: keys and dtypes must match what was written above
read_features = {
    'Movie Names': tf.VarLenFeature(dtype=tf.string),
    'Movie Ratings': tf.VarLenFeature(dtype=tf.float32)}

# Extract features from serialized data
read_data = tf.parse_single_example(serialized=serialized_example,
                                    features=read_features)

# Many tf.train functions use tf.train.QueueRunner,
# so we need to start it before we read
tf.train.start_queue_runners(sess)

# Print features (VarLenFeature results are SparseTensors)
for name, tensor in read_data.items():
    print('{}: {}'.format(name, tensor.eval()))

II. Reading and writing TFRecord files with tf.data

After creating the feature data feature0, feature1, feature2 and feature3 (arrays of equal length, one entry per observation), read and write the data with tf.data:

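Create a Dataset object:

# Pass the four arrays as one tuple; each is sliced along its first dimension
features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))

# Use `take(1)` to only pull one example from the dataset.
# (Looping over a dataset in Python requires eager execution.)
for f0, f1, f2, f3 in features_dataset.take(1):
  print(f0)
  print(f1)
  print(f2)
  print(f3)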
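Passing the arrays as a single tuple is what makes each dataset element a row: from_tensor_slices slices every component along its first dimension, so element i is (feature0[i], feature1[i], feature2[i], feature3[i]). In plain Python that pairing is zip; the sample values below are made up for illustration.

```python
# Made-up sample columns, one value per observation
feature0 = [False, True, True]
feature1 = [4, 1, 3]
feature2 = [b'goat', b'dog', b'horse']
feature3 = [0.98, -0.25, 1.5]

# Row i pairs the i-th entry of every column, like from_tensor_slices((f0, f1, f2, f3))
rows = list(zip(feature0, feature1, feature2, feature3))

# Counterpart of .take(1): only the first element
for f0, f1, f2, f3 in rows[:1]:
    print(f0, f1, f2, f3)  # False 4 b'goat' 0.98
```

One difference worth noting: zip silently stops at the shortest list, while from_tensor_slices requires all components to have the same first dimension.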

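Use tf.data.Dataset.map to apply a function to every element of the Dataset.

The mapped function must operate on and return tf.Tensors. A non-tensor Python function, such as a serialize_example that builds a tf.train.Example and returns its serialized bytes, has to be wrapped with tf.py_func first, so serialize_example in the line below stands for such a wrapped function (its definition is not shown in this post).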

serialized_features_dataset = features_dataset.map(serialize_example)

Write the data:

filename = 'test.tfrecord'
writer = tf.data.experimental.TFRecordWriter(filename)
writer.write(serialized_features_dataset)

Read the data:

filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset

The dataset contains serialized tf.train.Example messages. Iterating over it yields scalar string tensors:

for raw_record in raw_dataset.take(10):
  print(repr(raw_record))

<tf.Tensor: id=43, shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x14\xc4\xd0?'>
<tf.Tensor: id=45, shape=(), dtype=string, numpy=b'\nS\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x03\n\x15\n\x08feature2\x12\t\n\x07\n\x05horse\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x0f\x91W\xbf\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00'>
<tf.Tensor: id=47, shape=(), dtype=string, numpy=b'\nS\n\x15\n\x08feature2\x12\t\n\x07\n\x05horse\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x12H\xa2\xbc\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x03'>
<tf.Tensor: id=49, shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x04\xc4\x07@\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01'>
<tf.Tensor: id=51, shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xd2\x81\x96>'>
<tf.Tensor: id=53, shape=(), dtype=string, numpy=b'\nU\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xd9\xa8\x9a\xbe'>
<tf.Tensor: id=55, shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x97\r\xd3>'>
<tf.Tensor: id=57, shape=(), dtype=string, numpy=b'\nU\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xd7>\xe4?'>
<tf.Tensor: id=59, shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xfa\xc9\x0c@'>
<tf.Tensor: id=61, shape=(), dtype=string, numpy=b'\nU\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xd6\xe1g>\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02'>
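Each byte string above is one serialized tf.train.Example in protobuf wire format, and parse_single_example is what turns it back into typed values. As an illustration of what that decoding involves, here is a small pure-Python decoder for just the Example, Features and Feature messages, run on the first record printed above. It assumes the field numbers from TensorFlow's example.proto and feature.proto, handles only length-delimited fields with packed float and int64 lists (which is what these records contain), and does not sign-convert negative int64 values; real code should use tf.train.Example.FromString or parse_single_example instead.

```python
import struct

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def fields(buf):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag & 7 != 2:
            raise ValueError('this sketch only handles length-delimited fields')
        length, pos = read_varint(buf, pos)
        yield tag >> 3, buf[pos:pos + length]
        pos += length

def decode_example(serialized):
    """Return {feature name: list of values} for a serialized tf.train.Example."""
    result = {}
    for _, features in fields(serialized):             # Example.features = 1
        for _, entry in fields(features):               # Features.feature map entries
            entry_fields = dict(fields(entry))          # key = 1, value = 2
            name = entry_fields[1].decode('utf-8')
            values = result.setdefault(name, [])
            for kind, lst in fields(entry_fields[2]):   # Feature oneof: 1/2/3
                for _, payload in fields(lst):          # each list's value = 1
                    if kind == 1:                       # bytes_list
                        values.append(payload)
                    elif kind == 2:                     # float_list, packed float32
                        values.extend(struct.unpack(f'<{len(payload) // 4}f', payload))
                    elif kind == 3:                     # int64_list, packed varints
                        p = 0
                        while p < len(payload):
                            v, p = read_varint(payload, p)
                            values.append(v)
    return result

record = (b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1'
          b'\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat'
          b'\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x14\xc4\xd0?')
print(decode_example(record))
# feature0 -> [1], feature1 -> [0], feature2 -> [b'cat'], feature3 -> one float of about 1.631
```

Because the decoder goes by names rather than positions, it handles records such as the second one above, where the entries appear in a different order.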

# Create a description of the features.  
feature_description = {
    'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),
}

def _parse_function(example_proto):
  # Parse the input tf.Example proto using the dictionary above.
  return tf.parse_single_example(example_proto, feature_description)

parsed_dataset = raw_dataset.map(_parse_function)
parsed_dataset

<MapDataset shapes: {feature3: (), feature0: (), feature1: (), feature2: ()}, types: {feature3: tf.float32, feature0: tf.int64, feature1: tf.int64, feature2: tf.string}>

References
1. https://www.tensorflow.org/tutorials/load_data/tf_records
2. https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/18_TFRecords_Dataset_API.ipynb
