I want to create a pipeline that feeds non-standard files (for example, files with the extension *.xxx) to a neural network.
Currently my code is structured as follows:
1) I define a list of paths where the training files are located
2) I define an instance of the tf.data.Dataset object containing these paths
3) I map onto the Dataset a Python function that takes each path and returns the associated NumPy array (loaded from a folder on the PC); this array has shape [256, 256, 192].
4) I define an initializable iterator and then use it during network training.
My doubt concerns the size of the batches fed to the network. I would like to feed the network batches of size 64. How can I do that?
For example, if I use train_data.batch(b_size) with b_size = 1, the result is that, when iterated, the iterator yields one element of shape [256, 256, 192]; what if I wanted to feed the network just 64 slices of this array?
This is an extract of my code:
with tf.name_scope('data'):
    train_filenames = tf.constant(list_of_files_train)
    train_data = tf.data.Dataset.from_tensor_slices(train_filenames)
    # parse each custom *.xxx file into a float32 array via a Python function
    train_data = train_data.map(lambda filename: tf.py_func(
        self._parse_xxx_data, [filename], [tf.float32]))
    # reassign each result: Dataset transformations return new datasets
    train_data = train_data.shuffle(buffer_size=len(list_of_files_train))
    train_data = train_data.batch(b_size)
    iterator = tf.data.Iterator.from_structure(train_data.output_types, train_data.output_shapes)
    input_data = iterator.get_next()
    train_init = iterator.make_initializer(train_data)

[...]

with tf.Session() as sess:
    sess.run(train_init)
    _ = sess.run([self.train_op])
Thanks in advance
----------
I posted a solution to my problem below. I would still be happy to receive any comments or suggestions on possible improvements. Thank you ;)
Solution
It's been a long time, but I'll post a possible solution for batching a dataset with a custom shape in TensorFlow, in case someone needs it.
The tf.data module offers the method unbatch() to unwrap the content of each dataset element. One can first unbatch the dataset and then batch it again in the desired way. It is often also a good idea to shuffle the unbatched dataset before re-batching it, so that each batch contains random slices from random elements (a self-contained toy example follows the snippet below):
with tf.name_scope('data'):
    train_filenames = tf.constant(list_of_files_train)
    train_data = tf.data.Dataset.from_tensor_slices(train_filenames)
    train_data = train_data.map(lambda filename: tf.py_func(
        self._parse_xxx_data, [filename], [tf.float32]))
    # un-batch first, then shuffle and batch the data again
    # (reassign each result: Dataset transformations return new datasets)
    train_data = train_data.apply(tf.data.experimental.unbatch())
    train_data = train_data.shuffle(buffer_size=BSIZE)
    train_data = train_data.batch(b_size)
    # [...]
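For completeness, here is a minimal, self-contained sketch (TF 1.x style, with synthetic data in place of the *.xxx parser and made-up toy dimensions) showing how the unbatch/shuffle/batch chain changes the element shapes:

import numpy as np
import tensorflow as tf

# 4 fake volumes standing in for the parsed files, shape [4, 8, 8, 3]
volumes = np.random.rand(4, 8, 8, 3).astype(np.float32)

data = tf.data.Dataset.from_tensor_slices(volumes)  # elements of shape [8, 8, 3]
data = data.apply(tf.data.experimental.unbatch())   # elements of shape [8, 3] (single slices)
data = data.shuffle(buffer_size=4 * 8)              # shuffle the individual slices
data = data.batch(2)                                # elements of shape [2, 8, 3]

iterator = data.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_batch).shape)  # -> (2, 8, 3)

In the question's setting, the same chain should turn the [256, 256, 192] volumes into batches of shape [64, 256, 192] when b_size = 64.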