python - Attach a queue to a numpy array in TensorFlow for fetching data instead of files?

weixin_39740419

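I have read the CNN Tutorial on the TensorFlow site and I am trying to use the same model for my project.

The problem now is reading the data. I have around 25,000 images for training and around 5,000 for testing and validation. The files are in PNG format, and I can read them and convert them into numpy.ndarray.

The CNN example in the tutorial uses a queue to fetch records from the list of files provided. I tried to create my own binary file by reshaping each image into a 1-D array and prepending the label value to it. So my data looks like this:

[[1,12,34,24,53,...,105,234,102],
 [12,112,43,24,52,...,115,244,98],
 ....
]

A single row of the above array has length 22501, where the first element is the label.

I dumped the array to a file using pickle and tried to read from the file using tf.FixedLengthRecordReader, as demonstrated in the example.

I am doing the same things as given in cifar10_input.py to read the binary file and put the records into the record object.

Now when I read from the file, the labels and image values are different from what I wrote. I understand the reason to be that pickle also dumps extra information (such as braces and brackets) into the binary file, which changes the fixed-length record size.

The example above takes the filenames, passes them to a queue to fetch the files, and then uses the queue to read a single record at a time from a file.

I want to know whether I can pass the numpy array defined above, instead of the filenames, to some reader that fetches records one by one from that array rather than from files.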

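Solution:

Probably the easiest way to make your data work with the CNN example code is to write a modified version of read_cifar10() and use it instead:

1. Write out a binary file containing the contents of your numpy array.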

import numpy as np

images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                  #  [12,112,43,24,52,...,98],
                                                  #  ...]
                                   dtype=np.uint8)

images_and_labels_array.tofile("/tmp/images.bin")

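This file is similar to the format used in the CIFAR10 data files. You might want to generate multiple files in order to get read parallelism. Note that ndarray.tofile() writes binary data in row-major order with no other metadata; pickling the array adds Python-specific metadata that TensorFlow's parsing routines do not understand.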
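To make the difference between the two file formats concrete, here is a minimal sketch that writes the same toy array with ndarray.tofile() and with pickle, compares the file sizes, and then splits the records across several files. The dimensions, label values, temporary directory, and the images_N.bin shard names are assumptions chosen for illustration; they are not taken from the question.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical toy sizes; the records in the question are 1 + 22500 bytes.
NUM_RECORDS = 4
IMAGE_BYTES = 6
RECORD_BYTES = 1 + IMAGE_BYTES  # one label byte followed by the pixels

labels = np.array([[1], [12], [3], [7]], dtype=np.uint8)
pixels = np.arange(NUM_RECORDS * IMAGE_BYTES, dtype=np.uint8).reshape(
    NUM_RECORDS, IMAGE_BYTES)

# Prepend each label to its flattened image, as described in the question.
records = np.hstack([labels, pixels])

tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "images.bin")
pickle_path = os.path.join(tmp_dir, "images.pkl")

# Raw dump: row-major bytes only, no header.
records.tofile(raw_path)

# Pickle dump: the same data plus Python serialization metadata.
with open(pickle_path, "wb") as f:
    pickle.dump(records, f)

raw_size = os.path.getsize(raw_path)
pickle_size = os.path.getsize(pickle_path)
print("raw bytes:", raw_size, "expected:", NUM_RECORDS * RECORD_BYTES)
print("pickle bytes:", pickle_size)

# Reading the raw bytes back and reshaping recovers the original records.
restored = np.fromfile(raw_path, dtype=np.uint8).reshape(-1, RECORD_BYTES)

# Optional: split along the record axis so every shard holds whole records.
shard_paths = []
for i, shard in enumerate(np.array_split(records, 2, axis=0)):
    path = os.path.join(tmp_dir, "images_%d.bin" % i)
    shard.tofile(path)
    shard_paths.append(path)
shard_sizes = [os.path.getsize(p) for p in shard_paths]
```

Because the raw file is an exact multiple of the record size, a fixed-length reader can step through it record by record, whereas the extra bytes in the pickle file shift every record boundary.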

2. Write a modified version of read_cifar10() that handles your record format.

# Note: this uses the queue-based input API of TensorFlow 1.x and earlier.
import os

import tensorflow as tf


def read_my_data(filename_queue):
    """Reads and parses one fixed-length record from the filename queue."""

    class ImageRecord(object):
        pass

    result = ImageRecord()

    # Dimensions of the images in the dataset.
    label_bytes = 1
    # Set the following constants as appropriate.
    result.height = IMAGE_HEIGHT
    result.width = IMAGE_WIDTH
    result.depth = IMAGE_DEPTH
    image_bytes = result.height * result.width * result.depth

    # Every record consists of a label followed by the image, with a
    # fixed number of bytes for each.
    record_bytes = label_bytes + image_bytes
    assert record_bytes == 22501  # Based on your question.

    # Read a record, getting filenames from the filename_queue. No
    # header or footer in the binary, so we leave header_bytes
    # and footer_bytes at their default of 0.
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    result.key, value = reader.read(filename_queue)

    # Convert from a string to a vector of uint8 that is record_bytes long.
    # (Named `record` so that it does not shadow the integer record_bytes.)
    record = tf.decode_raw(value, tf.uint8)

    # The first bytes represent the label, which we convert from uint8->int32.
    result.label = tf.cast(
        tf.slice(record, [0], [label_bytes]), tf.int32)

    # The remaining bytes after the label represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width]. This assumes the
    # channel-major layout of the CIFAR10 files; if your images were flattened
    # from [height, width, depth] arrays, reshape directly to that shape and
    # skip the transpose below.
    depth_major = tf.reshape(tf.slice(record, [label_bytes], [image_bytes]),
                             [result.depth, result.height, result.width])

    # Convert from [depth, height, width] to [height, width, depth].
    result.uint8image = tf.transpose(depth_major, [1, 2, 0])

    return result


def distorted_inputs(data_dir, batch_size):
    """[...]"""
    filenames = ["/tmp/images.bin"]  # Or a list of filenames if you
                                     # generated multiple files in step 1.
    for f in filenames:
        # os.path.exists is used here in place of the tutorial's gfile.Exists,
        # which was never imported in this snippet.
        if not os.path.exists(f):
            raise ValueError('Failed to find file: ' + f)

    # Create a queue that produces the filenames to read.
    filename_queue = tf.train.string_input_producer(filenames)

    # Read examples from files in the filename queue.
    read_input = read_my_data(filename_queue)
    reshaped_image = tf.cast(read_input.uint8image, tf.float32)

    # [...] (Maybe modify other parameters in here depending on your problem.)
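The slicing, reshaping, and transposing in read_my_data() can be checked without TensorFlow. The following NumPy-only sketch mirrors those steps on a single toy record and shows why the byte layout matters. The decode_record helper, its depth_major flag, and the tiny dimensions are hypothetical names and values introduced here for illustration only.

```python
import numpy as np

# Hypothetical toy dimensions, chosen only to keep the example small.
HEIGHT, WIDTH, DEPTH = 2, 3, 2
LABEL_BYTES = 1
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH


def decode_record(record, depth_major=True):
    """Split one uint8 record into (label, image as [height, width, depth])."""
    label = int(record[0])
    flat = record[LABEL_BYTES:LABEL_BYTES + IMAGE_BYTES]
    if depth_major:
        # CIFAR10-style layout: all of channel 0, then channel 1, and so on.
        return label, flat.reshape(DEPTH, HEIGHT, WIDTH).transpose(1, 2, 0)
    # Layout produced by flattening a [height, width, depth] array directly.
    return label, flat.reshape(HEIGHT, WIDTH, DEPTH)


# Build records from a known image so that the round trip can be checked.
image = np.arange(IMAGE_BYTES, dtype=np.uint8).reshape(HEIGHT, WIDTH, DEPTH)
record_hwc = np.concatenate([[7], image.ravel()]).astype(np.uint8)
record_chw = np.concatenate(
    [[7], image.transpose(2, 0, 1).ravel()]).astype(np.uint8)

# Decoding with the matching layout recovers the image in both cases.
label_a, image_a = decode_record(record_hwc, depth_major=False)
label_b, image_b = decode_record(record_chw, depth_major=True)

# Decoding with the wrong layout keeps the shape but scrambles the pixels.
_, image_wrong = decode_record(record_hwc, depth_major=True)
print(np.array_equal(image_a, image), np.array_equal(image_b, image))
print(np.array_equal(image_wrong, image))
```

A layout mismatch raises no error because the shapes still agree, so it is worth decoding one known record this way before training on the full file.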

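Given your starting point, this is a minimal set of steps. It might be more efficient to do the PNG decoding with TensorFlow ops, but that would be a larger change.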

Tags: python, tensorflow, machine-learning
