Python function mapping tutorial: how to correctly map a Python function and then batch a dataset in TensorFlow

I want to create a pipeline that feeds non-standard files (for example, with extension *.xxx) to a neural network.

Currently I have structured my code as follows:

1) I define a list of paths to the training files.

2) I create a tf.data.Dataset instance containing these paths.

3) I map onto the Dataset a Python function that takes each path and returns the associated NumPy array (loaded from disk); this array is a matrix of shape [256, 256, 192]. (A sketch of such a parse function follows this list.)

4) I define an initializable iterator and use it during network training.
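A loader of this kind might look roughly like the sketch below. This is an illustration only: np.fromfile, the dtype, and the reshape are assumptions standing in for the real *.xxx decoding logic, which is not shown in this post.

    import numpy as np

    def _parse_xxx_data(filename):
        # tf.py_func passes string tensors to the Python function as
        # byte strings, so decode the path first.
        path = filename.decode('utf-8')
        # Stand-in loader: replace with the real *.xxx decoding logic,
        # then reshape to the known volume dimensions.
        volume = np.fromfile(path, dtype=np.float32)
        return volume.reshape(256, 256, 192)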

My doubt concerns the size of the batches I feed to the network. I would like to supply the network with batches of size 64. How can I do this?

For example, if I use train_data.batch(b_size) with b_size = 1, the iterator yields one element of shape [256, 256, 192] per iteration; what if I instead wanted to feed the network just 64 slices of this array?
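To see the shape issue in isolation, here is a toy sketch with synthetic data (zero-filled tensors standing in for the real volumes, using the same TF 1.x API as the code below):

    import tensorflow as tf

    # Two synthetic volume-shaped elements stand in for the real dataset.
    ds = tf.data.Dataset.from_tensor_slices(tf.zeros([2, 256, 256, 192]))
    batched = ds.batch(1, drop_remainder=True)
    # Prints (1, 256, 256, 192): each "batch" is one whole volume,
    # not a batch of 2-D slices.
    print(batched.output_shapes)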

This is an extract of my code:

with tf.name_scope('data'):
    train_filenames = tf.constant(list_of_files_train)
    train_data = tf.data.Dataset.from_tensor_slices(train_filenames)
    # Load each file with a Python function wrapped in tf.py_func.
    train_data = train_data.map(lambda filename: tf.py_func(
        self._parse_xxx_data, [filename], [tf.float32]))
    # Transformations like shuffle() and batch() return new datasets;
    # assign the result, or they have no effect.
    train_data = train_data.shuffle(buffer_size=len(list_of_files_train))
    train_data = train_data.batch(b_size)

    iterator = tf.data.Iterator.from_structure(train_data.output_types,
                                               train_data.output_shapes)
    input_data = iterator.get_next()
    train_init = iterator.make_initializer(train_data)

[...]

with tf.Session() as sess:
    sess.run(train_init)
    _ = sess.run([self.train_op])

Thanks in advance

----------

I posted a solution to my problem below. I would still be happy to receive comments or suggestions on possible improvements. Thank you ;)

Solution

It's been a long time, but I'll post a possible solution for batching a dataset with a custom shape in TensorFlow, in case someone needs it.

The tf.data module offers the method unbatch() to unwrap the content of each dataset element. One can first unbatch and then re-batch the dataset in the desired way. It is often also a good idea to shuffle the unbatched dataset before batching it again, so that each batch contains random slices from random elements:

with tf.name_scope('data'):
    train_filenames = tf.constant(list_of_files_train)
    train_data = tf.data.Dataset.from_tensor_slices(train_filenames)
    train_data = train_data.map(lambda filename: tf.py_func(
        self._parse_xxx_data, [filename], [tf.float32]))
    # Un-batch first, then shuffle and re-batch the data. Each
    # transformation returns a new dataset, so assign the results.
    train_data = train_data.apply(tf.data.experimental.unbatch())
    train_data = train_data.shuffle(buffer_size=BSIZE)
    train_data = train_data.batch(b_size)

    # [...]
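As a quick sanity check on the resulting shapes, here is the same toy setup as before (synthetic data; the shuffle buffer of 512 and batch size of 64 are just example values):

    import tensorflow as tf

    ds = tf.data.Dataset.from_tensor_slices(tf.zeros([2, 256, 256, 192]))
    # unbatch() splits each [256, 256, 192] volume into 256 slices
    # of shape [256, 192].
    ds = ds.apply(tf.data.experimental.unbatch())
    ds = ds.shuffle(buffer_size=512)  # mix slices across volumes
    ds = ds.batch(64, drop_remainder=True)
    # Prints (64, 256, 192): batches of 64 slices, as desired.
    print(ds.output_shapes)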
