1. reader = tf.TextLineReader(), which reads one line at a time.
The reader's read method outputs a key identifying the input file and the record within it (useful for debugging), along with a scalar string value. This string can then be decoded by one or more decoder or conversion ops into tensors that make up an example.
Contents of file1.csv:
100,10,11,12,0
101,10,11,12,0
102,10,11,12,0
103,10,11,12,0
104,10,11,12,0
105,10,11,12,0
106,10,11,12,0
Contents of file0.csv:
1,10,11,12,0
2,10,11,12,0
3,10,11,12,0
4,10,11,12,0
5,10,11,12,0
6,10,11,12,0
7,10,11,12,0
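The reader example below assumes file0.csv and file1.csv already exist. One way to create them (this helper is not part of the original tutorial) is:

```python
# Helper that writes the two CSV files used by the reader example below.
rows_file0 = [[i, 10, 11, 12, 0] for i in range(1, 8)]      # rows 1..7
rows_file1 = [[i, 10, 11, 12, 0] for i in range(100, 107)]  # rows 100..106

for name, rows in [("file0.csv", rows_file0), ("file1.csv", rows_file1)]:
    with open(name, "w") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
```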
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
#features = tf.concat(0, [col1, col2, col3, col4])
features = [col1, col2, col3, col4]

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])
        print(example, label)

    coord.request_stop()
    coord.join(threads)
Output:
[1, 10, 11, 12] 0
[2, 10, 11, 12] 0
[3, 10, 11, 12] 0
[4, 10, 11, 12] 0
[5, 10, 11, 12] 0
[6, 10, 11, 12] 0
[7, 10, 11, 12] 0
[100, 10, 11, 12] 0
[101, 10, 11, 12] 0
[102, 10, 11, 12] 0
[103, 10, 11, 12] 0
[104, 10, 11, 12] 0
[105, 10, 11, 12] 0
[106, 10, 11, 12] 0
[100, 10, 11, 12] 0
[101, 10, 11, 12] 0
[102, 10, 11, 12] 0
[103, 10, 11, 12] 0
[104, 10, 11, 12] 0
[105, 10, 11, 12] 0
[106, 10, 11, 12] 0
[1, 10, 11, 12] 0
[2, 10, 11, 12] 0
As the output shows, the contents of each file are read sequentially; the records are not shuffled.
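To make the role of record_defaults concrete, here is a small pure-Python analogue of what tf.decode_csv does for a single line (an illustration, not the TF implementation): an empty field falls back to its default, and the type of the default determines how the field is parsed.

```python
def decode_csv_line(line, record_defaults):
    """Parse one CSV line; empty fields take the corresponding default.
    The type of each default decides how a non-empty field is converted."""
    fields = line.rstrip("\n").split(",")
    out = []
    for raw, (default,) in zip(fields, record_defaults):
        if raw == "":
            out.append(default)            # empty column -> default value
        else:
            out.append(type(default)(raw)) # convert to the default's type
    return out

record_defaults = [[1], [1], [1], [1], [1]]
print(decode_csv_line("1,10,11,12,0", record_defaults))  # [1, 10, 11, 12, 0]
print(decode_csv_line("1,,11,,0", record_defaults))      # empty fields -> 1
```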
2. Fixed-length records
To read binary files in which every record has a fixed number of bytes, use tf.FixedLengthRecordReader together with the tf.decode_raw op. decode_raw converts a string into a uint8 tensor.
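As a rough pure-Python sketch of the same idea (not the TF API): fixed-length records are just consecutive byte chunks of a known size, and decoding "raw" bytes amounts to viewing each chunk as a vector of unsigned byte values. The 5-byte layout below is made up for illustration.

```python
RECORD_BYTES = 5  # e.g. 1 label byte + 4 feature bytes (hypothetical layout)

# Write a tiny binary file of three fixed-length records.
with open("records.bin", "wb") as f:
    f.write(bytes([0, 10, 11, 12, 13]))
    f.write(bytes([1, 20, 21, 22, 23]))
    f.write(bytes([0, 30, 31, 32, 33]))

# Read it back record by record, mimicking FixedLengthRecordReader +
# decode_raw: each record becomes a vector of uint8 values.
records = []
with open("records.bin", "rb") as f:
    while True:
        chunk = f.read(RECORD_BYTES)
        if len(chunk) < RECORD_BYTES:
            break
        records.append(list(chunk))  # bytes -> list of uint8 values

print(records)  # [[0, 10, 11, 12, 13], [1, 20, 21, 22, 23], [0, 30, 31, 32, 33]]
```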