Understanding masking and padding in TensorFlow

Latest recommended article published on 2024-07-30 11:00:40

weixin_39898854


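TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) passed between them. Its flexible architecture lets you run computations on a variety of platforms, such as one or more CPUs or GPUs in a desktop machine, servers, and mobile devices. TensorFlow was originally developed by researchers and engineers on the Google Brain team (part of Google's Machine Intelligence research organization) for machine learning and deep neural network research, but the system is general enough to be used in many other computational domains as well.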

Note:

Readers are expected to have some familiarity with TensorFlow and deep learning.

tf.boolean_mask: NumPy-style mask operations

A NumPy array in Python can be indexed with a boolean array, which returns the items at the positions where the boolean value is True. For example:

# Boolean mask on a NumPy array
import numpy as np

target_arr = np.arange(5)
print("numpy array before being masked:")
print(target_arr)

# Indexing with a boolean array keeps the positions where the mask is True
mask_arr = np.array([True, False, True, False, False])
masked_arr = target_arr[mask_arr]
print("numpy array after being masked:")
print(masked_arr)

The output is:

numpy array before being masked:
[0 1 2 3 4]
numpy array after being masked:
[0 2]

tf.boolean_mask applies the same kind of mask operation to a target tensor as the NumPy example above. Its parameters are straightforward:

tf.boolean_mask(
    tensor,  # target tensor
    mask,    # mask tensor
    axis=None,
    name='boolean_mask'
)

Now let's try tf.boolean_mask:

import tensorflow as tf

# Boolean mask in TensorFlow (TensorFlow 1.x session API)
target_tensor = tf.constant([[1, 2], [3, 4], [5, 6]])
mask_tensor = tf.constant([True, False, True])
masked_tensor = tf.boolean_mask(target_tensor, mask_tensor, axis=0)

# InteractiveSession installs itself as the default session, so .eval() works
sess = tf.InteractiveSession()
print(masked_tensor.eval())

Elements 0 and 2 of the mask tensor are True and the mask axis is dimension 0, so only rows 0 and 2 of the target tensor are selected:

[[1 2]
 [5 6]]
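The axis argument decides which dimension the mask is applied to. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution (an assumption; the article's own examples use the 1.x session API). The variable names are illustrative only.

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4], [5, 6]])

# axis=0: the mask has one entry per row and selects whole rows
rows = tf.boolean_mask(t, [True, False, True], axis=0)

# axis=1: the mask has one entry per column and selects whole columns
cols = tf.boolean_mask(t, [True, False], axis=1)

print(rows.numpy().tolist())  # [[1, 2], [5, 6]]
print(cols.numpy().tolist())  # [[1], [3], [5]]
```

In both cases the mask length must match the size of the dimension being masked.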

What happens if the mask tensor is also a 2-D tensor?

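mask_tensor2 = tf.constant([[True, False], [False, False], [True, False]])
# Pass the 2-D mask (mask_tensor2), not the 1-D mask_tensor used earlier
masked_tensor2 = tf.boolean_mask(target_tensor, mask_tensor2, axis=0)
print(masked_tensor2.eval())

[1 5]

The result is not [[1], [5]]. When the mask has the same rank as the tensor, tf.boolean_mask does pick individual elements, but it returns them as a flat 1-D tensor, so the row structure is lost. To mask at the element level while keeping one row per input row, TensorFlow provides tf.ragged.boolean_mask.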

tf.ragged.boolean_mask

tf.ragged.boolean_mask(
    data,
    mask,
    name=None
)
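To make the difference concrete, here is a minimal sketch, assuming TensorFlow 2.x with eager execution (an assumption; it is not the session-based setup used elsewhere in this article). It applies the same 2-D mask with both functions. The names data, flat and ragged are illustrative.

```python
import tensorflow as tf

data = tf.constant([[1, 2], [3, 4], [5, 6]])
mask = tf.constant([[True, False], [False, False], [True, False]])

# Same-rank mask with tf.boolean_mask: selected elements come back flattened
flat = tf.boolean_mask(data, mask)

# tf.ragged.boolean_mask keeps one row per input row; a row whose mask is
# all False becomes an empty row in the resulting RaggedTensor
ragged = tf.ragged.boolean_mask(data, mask)

print(flat.numpy().tolist())  # [1, 5]
print(ragged.to_list())       # [[1], [], [5]]
```

The ragged result has rows of different lengths, which is why it cannot be returned as an ordinary dense tensor.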

Sparse tensors and sparse masks in TensorFlow

A sparse tensor in TensorFlow consists of three parts: indices, values, and dense_shape. For the sparse tensor SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4]), the equivalent dense tensor is:

[[1, 0, 0, 0]
 [0, 0, 2, 0]
 [0, 0, 0, 0]]
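The conversion can be checked directly. This is a minimal sketch, assuming TensorFlow 2.x with eager execution (an assumption on my part); sp and dense are illustrative names.

```python
import tensorflow as tf

# indices: positions of the non-zero entries
# values: the entries stored at those positions
# dense_shape: shape of the equivalent dense tensor
sp = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=[1, 2],
    dense_shape=[3, 4],
)

# Positions not listed in indices are filled with 0
dense = tf.sparse.to_dense(sp)
print(dense.numpy().tolist())
# [[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]]
```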

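tf.sparse.mask removes the entries at the given indices. According to its documentation, it operates on a tf.IndexedSlices (a set of slices along the first dimension) rather than on a SparseTensor, and mask_indices lists the indices of the slices to drop.

tf.sparse.mask(
    a,
    mask_indices,
    name=None
)

The same idea applies to the sparse tensor defined above, which holds the two values 1 and 2 at indices [[0, 0], [1, 2]]. Masking out the entry at [1, 2] leaves a tensor whose dense form is:

[[1, 0, 0, 0]
 [0, 0, 0, 0]
 [0, 0, 0, 0]]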

Most of the functions under tf.sparse are only available in TensorFlow 2.0, so no TensorFlow 1.x demonstration is given for them here.
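For readers on TensorFlow 2.x, the following is a minimal sketch of dropping one entry from a SparseTensor. It uses tf.sparse.retain, which is a substitute chosen for this illustration and not the tf.sparse.mask call discussed above. It assumes eager execution, and the names sp, kept and dense_kept are illustrative.

```python
import tensorflow as tf

sp = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=[1, 2],
    dense_shape=[3, 4],
)

# to_retain holds one boolean per stored value, in the same order as indices:
# keep the entry at [0, 0], drop the entry at [1, 2]
kept = tf.sparse.retain(sp, [True, False])

dense_kept = tf.sparse.to_dense(kept).numpy().tolist()
print(dense_kept)
# [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Note that tf.sparse.retain takes a keep-mask over the stored values, whereas a mask-by-index call names the positions to remove.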

padded_batch

The padded_batch method of tf.data.Dataset automatically pads the sequences in a batch to the length of the longest sequence in that batch.

padded_batch(
    batch_size,
    padded_shapes,
    padding_values=None,
    drop_remainder=False
)

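This method is the counterpart of batch in tf.data.Dataset. Both build batches from a dataset, but batch requires every sample in the dataset to have the same shape, whereas padded_batch can pad samples of different shapes to a common shape when it builds each batch.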

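elements = [[1, 2],
            [3, 4, 5],
            [6, 7],
            [8]]

# Each item produced by the generator is a variable-length int32 vector
A = tf.data.Dataset.from_generator(lambda: iter(elements), tf.int32)
# padded_shapes=[None] pads each batch to the longest sequence it contains
B = A.padded_batch(2, padded_shapes=[None])
B_iter = B.make_one_shot_iterator()
# Prints the first batch only (uses the default session created earlier)
print(B_iter.get_next().eval())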

[[1 2 0]
 [3 4 5]]
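The output above covers only the first batch. The following is a minimal sketch of the same pipeline, assuming TensorFlow 2.4 or later with eager execution (an assumption; output_signature is not used in the original example). It iterates over every batch and sets an explicit padding value of -1, chosen here only so that padding is easy to tell apart from real data.

```python
import tensorflow as tf

elements = [[1, 2], [3, 4, 5], [6, 7], [8]]

# Declare each item as a 1-D int32 tensor of unknown length
ds = tf.data.Dataset.from_generator(
    lambda: iter(elements),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

# Pad each batch of 2 to the longest sequence in that batch, using -1
padded = ds.padded_batch(
    2,
    padded_shapes=[None],
    padding_values=tf.constant(-1, dtype=tf.int32),
)

batches = [batch.numpy().tolist() for batch in padded]
print(batches)
# [[[1, 2, -1], [3, 4, 5]], [[6, 7], [8, -1]]]
```

The two batches have different widths (3 and 2), which shows that padding is computed per batch and not over the whole dataset.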

Summary

This concludes the overview of masking and padding in TensorFlow: tf.boolean_mask, tf.ragged.boolean_mask, sparse tensors and tf.sparse.mask, and padded_batch.
