TensorFlow: commonly used functions and points to watch out for (2)

Latest recommended article published on 2022-05-06 09:00:00

不一样的等待12305

Category: Deep Learning

Article link: https://blog.csdn.net/qq_39068872/article/details/101036158

11 tf.nn.dropout

tf.nn.dropout(
    x,
    keep_prob=None,
    noise_shape=None,
    seed=None,
    name=None,
    rate=None
)

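There is only one point to stress here: dropout takes effect during training, and at test time keep_prob should be set to 1 so that nothing is dropped.
Some people put it this way, while others say that dropout should also be applied at test time; I have not fully settled this question yet. As far as I can tell the first view is the usual practice: tf.nn.dropout scales the kept elements by 1/keep_prob during training, so the expected activation already matches the test-time one.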
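
To make the train/test difference concrete, here is a minimal NumPy sketch of the "inverted dropout" scaling described for tf.nn.dropout. The helper `inverted_dropout` is a hypothetical illustration written for this note, not a TensorFlow API, and the array sizes are arbitrary assumptions.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Hypothetical helper: keep each element with probability keep_prob
    and scale the kept elements by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))

# Training: roughly half of the elements are zeroed, the rest become 2.0
train_out = inverted_dropout(x, keep_prob=0.5, rng=rng)
# Testing: keep_prob=1 keeps every element unchanged
test_out = inverted_dropout(x, keep_prob=1.0, rng=rng)

print(train_out.mean())  # close to 1.0, the mean of x
print(np.array_equal(test_out, x))
```

Because of the 1/keep_prob scaling, the average activation seen in training stays close to the one seen at test time, which is why no extra rescaling is needed when keep_prob is set to 1.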

12 tf.rank

Returns the rank of a tensor, that is, its number of dimensions.

import tensorflow as tf

#  tf.rank(
#      input,
#      name=None
#      )
#
#  Returns the rank of a tensor.

#  tf.rank returns the dimension of a tensor, not the number of elements. For
#  instance, the output from tf.rank called for the 2x2 matrix would be 2.

x = tf.constant([[1, 2, 4]]) # 1x3 matrix, rank 2
x2 = tf.constant([[1, 2, 4], [8, 16, 32]]) # 2x3 matrix, rank 2
x3 = tf.constant([1, 2, 4]) # vector of length 3, rank 1

with tf.Session() as sess:
    print(sess.run(tf.rank(x))) # 2
    print(sess.run(tf.rank(x2))) # 2
    print(sess.run(tf.rank(x3))) # 1
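
The distinction between rank, shape and number of elements can be seen without a session. The following is a small NumPy sketch (NumPy's `ndim` plays the role of tf.rank here; this is an analogy, not the TensorFlow implementation).

```python
import numpy as np

x = np.array([[1, 2, 4]])                  # shape (1, 3)
x2 = np.array([[1, 2, 4], [8, 16, 32]])    # shape (2, 3)
x3 = np.array([1, 2, 4])                   # shape (3,)

for name, arr in [("x", x), ("x2", x2), ("x3", x3)]:
    # ndim is the rank; size is the total number of elements
    print(name, "rank:", arr.ndim, "shape:", arr.shape, "elements:", arr.size)
```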

13 tf.gradients

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)
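TensorFlow has a function for computing gradients, tf.gradients(ys, xs). Note that
**every x in xs must be connected to ys; for an x that ys does not depend on, tf.gradients returns None, and trying to evaluate that None with sess.run raises an error**

ys: a tensor or a list of tensors; it plays the role of the objective function, i.e. the function to be differentiated.
xs: a tensor or a list of tensors; the variables to differentiate with respect to. (The result is d(ys)/d(xs), one gradient per x in xs.)
stop_gradients: optional; a tensor or a list of tensors that should not be differentiated through, i.e. that are treated as constants.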

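a = tf.constant(0.)
b = 2 * a
# Total derivatives: d(a + b)/da = 1 + 2 = 3 (because b = 2 * a), d(a + b)/db = 1
g = tf.gradients(a + b, [a, b])
with tf.Session() as sess:
    print(sess.run(g))
Result: [3.0, 1.0]

a = tf.constant(0.)
b = 2 * a
# Stopping at a changes nothing: the path a -> b -> (a + b) is still followed
g = tf.gradients(a + b, [a, b], stop_gradients=[a])
with tf.Session() as sess:
    print(sess.run(g))
Result: [3.0, 1.0]

a = tf.constant(0.)
b = 2 * a
# b is treated as a constant, so the contribution of a through b is dropped
g = tf.gradients(a + b, [a, b], stop_gradients=[b])
with tf.Session() as sess:
    print(sess.run(g))
Result: [1.0, 1.0]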
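
The gap between [3.0, 1.0] and [1.0, 1.0] is the gap between a total derivative and a partial derivative with b held fixed. A minimal sketch in plain Python, using central finite differences instead of TensorFlow (the helper `numeric_grad` is hypothetical and only for illustration):

```python
def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

a0 = 0.0

# b follows a (b = 2 * a), as in the default tf.gradients call
total = numeric_grad(lambda a: a + 2 * a, a0)

# b is frozen at its current value, as with stop_gradients=[b]
b_frozen = 2 * a0
partial = numeric_grad(lambda a: a + b_frozen, a0)

print(total, partial)  # approximately 3.0 and 1.0
```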

14 tf.stop_gradient

tf.stop_gradient(
input,
name=None
)
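Blocks the backpropagation of gradients. As I understand it, this operation makes an op in the graph behave like a constant tensor during backpropagation. An example:

import tensorflow as tf
w1 = tf.Variable(2.0)
w2 = tf.Variable(88.)
a = w1 + w2  # 2.0 + 88.0 = 90.0
# a_stop has the same value as a, but backpropagation treats it as a constant tensor
a_stop = tf.stop_gradient(a)
gradient = tf.gradients(a_stop, [w1, w2])
# [None, None]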

15 tf.reduce_prod

'''
tf.reduce_prod(
    input_tensor,
    axis=None,
    keepdims=None,
    name=None,
    reduction_indices=None,
    keep_dims=None
)
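Computes the product of elements across dimensions of a tensor, i.e. the product is taken along the given dimension(s). (deprecated arguments)
The "prod" in the name stands for product.
input_tensor is reduced along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis;
if keepdims is true, the reduced dimensions are retained with length 1.
If axis has no entries, all dimensions are reduced and a tensor with a single element is returned.
Args:
input_tensor: the tensor to reduce. Should have a numeric type.
axis: the dimensions to reduce. If None (the default), all dimensions are reduced. Must be in the range [-rank(input_tensor), rank(input_tensor)). The dimension that is reduced along
no longer exists in the result.
keepdims: if true, retains the reduced dimensions with length 1.
name: a name for the operation (optional).
reduction_indices: the old (deprecated) name for axis.
keep_dims: deprecated alias for keepdims.
Returns:
The reduced tensor.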
'''

import tensorflow as tf
a = tf.constant([i+1 for i in range(6)], shape=[2, 3])
sess = tf.Session()
b = tf.reduce_prod(a)
c = tf.reduce_prod(a, 0)
d = tf.reduce_prod(a, 1)
e = tf.reduce_prod(a, 1, keep_dims=True)
f = tf.reduce_prod(a, [0, 1])
print("b: ", sess.run(b))
print("c: ", sess.run(c))
print("d: ", sess.run(d))
print("e: ", sess.run(e))
print("f: ", sess.run(f))
sess.close()

output:

b:  720
c:  [ 4 10 18]
d:  [  6 120]
e:  [[  6]
 [120]]
f:  720
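
For readers without a TensorFlow 1.x session at hand, the same reductions can be reproduced with NumPy. This is a sketch of the equivalent `np.prod` calls, on the same 2x3 input, and is only meant to show how axis and keepdims affect the result shape.

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)      # [[1 2 3], [4 5 6]]

total = np.prod(a)                                  # all dimensions reduced
per_column = np.prod(a, axis=0)                     # axis 0 disappears, shape (3,)
per_row = np.prod(a, axis=1)                        # axis 1 disappears, shape (2,)
per_row_kept = np.prod(a, axis=1, keepdims=True)    # axis 1 kept with length 1, shape (2, 1)

print(total, per_column, per_row, per_row_kept.shape)
```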


'''
tf.placeholder(
    dtype,
    shape=None,
    name=None
)
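A placeholder. It can loosely be thought of as a macro definition in C; more precisely, it is like a function parameter whose value is supplied through feed_dict at run time.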
'''
import tensorflow as tf
import numpy as np

s_1_flex = (None, None, None)
a = tf.placeholder(dtype=tf.int32, shape=s_1_flex, name="my_input")
b = tf.reduce_prod(a, name="prod_b")
c = tf.reduce_sum(a, name="sum_c")
d = tf.add(b, c, name="add_d")
sess = tf.Session()

#input_array = np.array([i+1 for i in range(24)]).reshape([2, 2, 3])
input_array = np.arange(1, 25, 1).reshape([3, 2, 4])  # reshape([a, b, c]): a, b, c are the number of elements along each dimension
input_dict = {a: input_array}
print(sess.run(c, feed_dict=input_dict))

output:

300

16 tf.unique

tf.unique(
    x,
    out_idx=tf.int32,
    name=None
)

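Finds the unique elements in a 1-D tensor.

This operation returns a tensor y containing all of the unique elements of x, in the order in which they first occur in x. It also returns a tensor idx of the same size as x, which holds, for each value of x, its index in the unique output y.
An example: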

# tensor 'x' is [1, 1, 2, 4, 4, 4, 7, 8, 8]
y, idx = unique(x)
y ==> [1, 2, 4, 7, 8]
idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]

Args:

x: a Tensor; must be 1-D.
out_idx: an optional tf.DType, one of tf.int32 or tf.int64; defaults to tf.int32.
name: a name for the operation (optional).

Returns:

A tuple of Tensor objects (y, idx).

y: a Tensor with the same type as x.
idx: a Tensor of type out_idx.
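
The relationship between y and idx can be sketched in plain Python. The function `unique_with_idx` below is a hypothetical stand-in written for illustration; it keeps first-occurrence order as described above and is not how TensorFlow implements the op.

```python
def unique_with_idx(x):
    """Return (y, idx): y holds the unique values of x in first-occurrence
    order, idx[i] is the position of x[i] inside y."""
    y, idx, position = [], [], {}
    for value in x:
        if value not in position:
            position[value] = len(y)
            y.append(value)
        idx.append(position[value])
    return y, idx

x = [1, 1, 2, 4, 4, 4, 7, 8, 8]
y, idx = unique_with_idx(x)
print(y)    # [1, 2, 4, 7, 8]
print(idx)  # [0, 0, 1, 2, 2, 2, 3, 4, 4]
```

A useful property to remember is that indexing y with idx reconstructs x.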

17 The tf.SparseTensor class

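TensorFlow represents a sparse tensor as three separate dense tensors: indices, values and dense_shape. In Python, the three tensors are collected into a SparseTensor class for ease of use. If you have separate indices, values and dense_shape tensors, wrap them in a SparseTensor object before passing them to the ops below.

Concretely, the sparse tensor SparseTensor(indices, values, dense_shape) comprises the following components, where N and ndims are the number of values and the number of dimensions in the SparseTensor, respectively:

indices: a 2-D int64 tensor of shape [N, ndims], which specifies the indices of the elements in the sparse tensor that contain nonzero values (indices are zero-based). For example, indices=[[1,3], [2,4]] specifies that the elements at indices [1,3] and [2,4] have nonzero values.
values: a 1-D tensor of any type and of shape [N], which supplies the value for each element in indices. For example, given indices=[[1,3], [2,4]], the argument values=[18, 3.6] specifies that element [1,3] of the sparse tensor has the value 18 and element [2,4] has the value 3.6.
dense_shape: a 1-D int64 tensor of shape [ndims], which specifies the dense shape of the sparse tensor. It is a list giving the number of elements in each dimension. For example, dense_shape=[3,6] specifies a two-dimensional 3x6 tensor, dense_shape=[2,3,4] specifies a three-dimensional 2x3x4 tensor, and dense_shape=[9] specifies a one-dimensional tensor with 9 elements.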

import tensorflow as tf

b = tf.sparse.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
dense_b = tf.sparse.to_dense(b)
with tf.Session() as sess:
    print(b.eval())
    # SparseTensorValue(indices=array([[0, 0],[1, 2]]), values=array([1, 2], dtype=int32), dense_shape=array([3, 4]))
    print(dense_b.eval())
    # [[1 0 0 0]
    #  [0 0 2 0]
    #  [0 0 0 0]]
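
How the three components determine the dense tensor can be sketched in NumPy. The function `to_dense` below is a hypothetical helper for illustration, not the implementation of tf.sparse.to_dense; it simply scatters each value to its index in a zero-filled array.

```python
import numpy as np

def to_dense(indices, values, dense_shape):
    """Scatter values into a zero array of shape dense_shape."""
    dense = np.zeros(dense_shape, dtype=np.asarray(values).dtype)
    for index, value in zip(indices, values):
        dense[tuple(index)] = value
    return dense

dense_b = to_dense(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
print(dense_b)
```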

18 tf.boolean_mask

tf.boolean_mask(
    tensor,
    mask,
    name='boolean_mask'
)

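An example:

import numpy as np
import tensorflow as tf

tensor = np.array([[1, 2], [3, 4], [5, 6]])
mask1 = np.array([True, False, True])
result1 = tf.boolean_mask(tensor, mask1)
mask2 = tensor > 3
result2 = tf.boolean_mask(tensor, mask2)
with tf.Session() as sess:
    print(mask1)
    # [ True False  True]
    print(result1.eval())
    # [[1 2]
    #  [5 6]]
    print(mask2)
    # [[False False]
    #  [False  True]
    #  [ True  True]]
    print(result2.eval())
    # [4 5 6]

Pay attention to the dimensions of the mask and of the output in the example above: the 1-D mask1 selects whole rows, so result1 is still 2-D, while mask2 has the same shape as tensor and selects individual elements, so result2 is flattened to 1-D.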
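
The same dimension behaviour shows up in NumPy boolean indexing, which makes for a quick check without a session. This is an analogy with the example above, not TensorFlow code.

```python
import numpy as np

tensor = np.array([[1, 2], [3, 4], [5, 6]])

row_mask = np.array([True, False, True])   # 1-D mask over the first dimension
rows = tensor[row_mask]                    # keeps rows 0 and 2, result stays 2-D

element_mask = tensor > 3                  # mask with the same shape as tensor
elements = tensor[element_mask]            # picks single elements, result is 1-D

print(rows.shape, elements.shape)
```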

19 About the BN layer

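The principle of the BN layer has been analyzed before; here I add a few points to watch out for in code. When using a BN layer in TensorFlow you will always run into the following lines:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

This code means that before the training step train_op (which minimizes loss) runs, the operations in update_ops must run first; in other words, train_op depends on update_ops.
One more point: I knew that when using a BN layer you set training=True during training and training=False during testing, but I did not really understand why. Looking at a BN layer implemented with tf.nn.batch_normalization below makes it clear.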

import tensorflow as tf

# Implement Batch Normalization
def bn_layer(x,is_training,name='BatchNorm',moving_decay=0.9,eps=1e-5):
    # Get the input shape and check that it matches a convolutional layer (4-D) or a fully connected layer (2-D)
    shape = x.shape
    assert len(shape) in [2,4]

    param_shape = shape[-1]
    with tf.variable_scope(name):
        # Declare the only two learnable parameters of BN, y=gamma*x+beta
        gamma = tf.get_variable('gamma',param_shape,initializer=tf.constant_initializer(1))
        beta  = tf.get_variable('beta', param_shape,initializer=tf.constant_initializer(0))

        # Compute the mean and variance of the current batch
        # Note: with len(shape)=n, the mean and variance are taken over the first n-1 dimensions, and only the last (channel) dimension is kept
        # That is, batch_mean.shape = batch_var.shape = [c]
        # Each channel has its own gamma and beta, so there are 2×c parameters to learn in total
        axes = list(range(len(shape)-1))
        batch_mean, batch_var = tf.nn.moments(x,axes,name='moments')

        # Track the mean and variance with a moving average
        ema = tf.train.ExponentialMovingAverage(moving_decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean,batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # During training, use the batch statistics and update the moving averages; during testing, use the most recently saved moving mean and variance
        mean, var = tf.cond(tf.equal(is_training,True),mean_var_with_update,
                lambda:(ema.average(batch_mean),ema.average(batch_var)))

        # Finally apply batch normalization
        return tf.nn.batch_normalization(x,mean,var,beta,gamma,eps)
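
The training/testing switch in bn_layer can be mimicked in plain NumPy, which makes it easier to see what the "update ops" actually do. The class `BatchNormSketch` below is a hypothetical illustration under simple assumptions (momentum 0.9 as in moving_decay above, moving statistics initialised to 0 and 1); it is not the TensorFlow implementation.

```python
import numpy as np

class BatchNormSketch:
    """Hypothetical NumPy batch norm with separate training/inference paths."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Reduce over every axis except the last (channel) axis
            axes = tuple(range(x.ndim - 1))
            mean, var = x.mean(axis=axes), x.var(axis=axes)
            # These two assignments play the role of the update ops
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference uses the stored moving statistics and updates nothing
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
bn = BatchNormSketch(num_features=3)

train_out = bn(x, training=True)
stats_after_training = (bn.moving_mean.copy(), bn.moving_var.copy())
test_out = bn(x, training=False)
```

If the update step is skipped during training, the moving statistics stay at their initial values and the inference path normalizes with the wrong mean and variance, which is the failure the control dependency on update_ops is meant to prevent.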

Using tf.layers.batch_normalization

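* axis: an int or a list of ints, the axis or axes that should be normalized, typically the features axis. For example, after a Conv2D layer with data_format="channels_first", set axis=1. If a list of axes is provided, each axis in axis is normalized simultaneously. The default is -1, which uses the last axis. Note: when using multi-axis batch norm, the beta, gamma, moving_mean and moving_variance variables have the same rank as the input tensor, with dimension size 1 in all reduced (non-axis) dimensions.
* momentum: momentum for the moving average.
* epsilon: a small float added to the variance to avoid dividing by zero.
* center: if True, add the offset beta to the normalized tensor; if False, beta is ignored.
* scale: if True, multiply by gamma; if False, gamma is not used. When the next layer is linear (also, for example, nn.relu), this can be disabled, since the scaling can be done by the next layer.
* beta_initializer: initializer for the beta weight.
* gamma_initializer: initializer for the gamma weight.
* moving_mean_initializer: initializer for the moving mean.
* moving_variance_initializer: initializer for the moving variance.
* beta_regularizer: optional regularizer for the beta weight.
* gamma_regularizer: optional regularizer for the gamma weight.
* beta_constraint: an optional projection function applied to the beta weight after it has been updated by an Optimizer (for example, to implement norm constraints or value constraints for layer weights). The function must take the unprojected variable as input and must return the projected variable (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
* gamma_constraint: an optional projection function applied to the gamma weight after it has been updated by an Optimizer.
* renorm: whether to use Batch Renormalization (https://arxiv.org/abs/1702.03275). This adds extra variables during training; inference is the same for either value of this parameter.
* renorm_clipping: a dictionary that may map the keys 'rmax', 'rmin', 'dmax' to scalar Tensors used to clip the renorm correction. The correction (r, d) is used as corrected_value = normalized_value * r + d, with r clipped to [rmin, rmax] and d to [-dmax, dmax]; missing rmax, rmin, dmax are set to inf, 0, inf respectively.
* renorm_momentum: the momentum used to update the moving means and standard deviations with renorm. Unlike momentum, this affects training, and it should be neither too small (which would add noise) nor too large (which would give stale estimates). Note that momentum is still applied to get the means and variances for inference.
* fused: if None or True, use a faster, fused implementation if possible; if False, use the system recommended implementation.
* trainable: Boolean; if True, also add the variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
* virtual_batch_size: an int. By default virtual_batch_size is None, which means batch normalization is performed across the whole batch. When virtual_batch_size is not None, "Ghost Batch Normalization" is performed instead, which creates virtual sub-batches that are each normalized separately (with shared gamma, beta and moving statistics). It must divide the actual batch size during execution.
* adjustment: a function taking the Tensor containing the (dynamic) shape of the input tensor and returning a pair (scale, bias) to apply to the normalized values (before gamma and beta), only during training. For example, if axis == -1, adjustment = lambda shape: ( tf.random_uniform(shape[-1:], 0.93, 1.07), tf.random_uniform(shape[-1:], -0.1, 0.1)) scales the normalized value by up to 7% up or down, then shifts the result by up to 0.1 (with independent scaling and bias for each feature but shared across all examples), and finally applies gamma and/or beta. If None, no adjustment is applied. It cannot be specified if virtual_batch_size is specified.
* name: a string, the name of the layer.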

**Note: run the following code when training**
```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
```

20 tf.GraphKeys.UPDATE_OPS

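tf.GraphKeys.UPDATE_OPS is a collection built into the TensorFlow computation graph. It holds operations that need to be completed before the training operation, and it is used together with tf.control_dependencies.
For batch_norm, these are the operations that update the moving mean and variance. The following example shows how this is done in tf.layers.batch_normalization.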

import tensorflow as tf

is_training = tf.placeholder(dtype=tf.bool)
input = tf.ones([1, 2, 2, 3])
output = tf.layers.batch_normalization(input, training=is_training)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
print(update_ops)
# with tf.control_dependencies(update_ops):
    # train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.save(sess, "batch_norm_layer/Model")

Output:
 [<tf.Tensor 'batch_normalization/AssignMovingAvg:0' shape=(3,) dtype=float32_ref>, <tf.Tensor 'batch_normalization/AssignMovingAvg_1:0' shape=(3,) dtype=float32_ref>]

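As you can see, the output consists of the two ops in batch_normalization that update the moving mean and variance; they must be guaranteed to complete before train_op.
These two ops are added to the tf.GraphKeys.UPDATE_OPS collection automatically by TensorFlow's internal implementation. Among the parameters of tf.contrib.layers.batch_norm there is updates_collections, whose default value is tf.GraphKeys.UPDATE_OPS, whereas tf.layers.batch_normalization puts the two update ops directly into that collection.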

21 tf.nn.l2_normalize

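The second kind of normalization scales each sample to unit norm (the norm of every sample becomes 1). The main variants are L1 normalization (L1 norm) and L2 normalization (L2 norm).

The main idea is to compute the p-norm of each sample and then divide every element of that sample by the norm, so that the p-norm (for example the l1-norm or l2-norm) of each processed sample equals 1.
The formula for the p-norm is:
$\|X\|_p=(|x_1|^p+|x_2|^p+...+|x_n|^p)^{1/p}$
The TensorFlow function that implements this is: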

tf.nn.l2_normalize(x,
                  dim,
                  epsilon=1e-12,
                  name=None)

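In the signature above:
x is the input tensor;
dim is the dimension (or dimensions) along which to L2-normalize; in the example below it takes the values 0, 1 and [0, 1];
epsilon is a lower bound for the squared norm: the output is x / sqrt(max(sum(x**2), epsilon)), which avoids dividing by zero;
Here is an example: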

#-*-coding:utf-8-*-
import tensorflow as tf
input_data = tf.constant([[1.0,2,3],[4.0,5,6],[7.0,8,9]])
output_1 = tf.nn.l2_normalize(input_data, dim=0, epsilon=1e-10, name='nn_l2_norm')
output_2 = tf.nn.l2_normalize(input_data, dim=1, epsilon=1e-10, name='nn_l2_norm')
output_3 = tf.nn.l2_normalize(input_data, dim=[0, 1], epsilon=1e-10, name='nn_l2_norm')


with tf.Session() as sess:
    print(output_1.eval())
    print(output_2.eval())
    print(output_3.eval())

'''output:
  [[0.12309149 0.20739034 0.26726127]
 [0.49236596 0.51847583 0.53452253]
 [0.86164045 0.82956135 0.80178374]]


[[0.26726124 0.5345225  0.8017837 ]
 [0.45584232 0.5698029  0.6837635 ]
 [0.5025707  0.5743665  0.64616233]]


[[0.05923489 0.11846977 0.17770466]
 [0.23693955 0.29617444 0.35540932]
 [0.4146442  0.4738791  0.53311396]]
'''

dim = 0: L2-normalize each column, where norm(j) is the L2 norm of column j
$\sqrt{1^2+4^2+7^2}=\sqrt{66}$
$\sqrt{2^2+5^2+8^2}=\sqrt{93}$
$\sqrt{3^2+6^2+9^2}=\sqrt{126}$

[[1./norm(1), 2./norm(2), 3./norm(3)]
 [4./norm(1), 5./norm(2), 6./norm(3)]
 [7./norm(1), 8./norm(2), 9./norm(3)]]
=
[[0.12309149 0.20739034 0.26726127]
 [0.49236596 0.51847583 0.53452253]
 [0.86164045 0.82956135 0.80178374]]

dim = 1: L2-normalize each row, where norm(i) is the L2 norm of row i
$\sqrt{1^2+2^2+3^2}=\sqrt{14}$
$\sqrt{4^2+5^2+6^2}=\sqrt{77}$
$\sqrt{7^2+8^2+9^2}=\sqrt{194}$

[[1./norm(1), 2./norm(1), 3./norm(1)]
 [4./norm(2), 5./norm(2), 6./norm(2)]
 [7./norm(3), 8./norm(3), 9./norm(3)]]
=
[[0.26726124 0.5345225  0.8017837 ]
 [0.45584232 0.5698029  0.6837635 ]
 [0.5025707  0.5743665  0.64616233]]

dim = [0, 1]: L2-normalize over rows and columns together (the whole matrix)

$norm=\sqrt{1^2+2^2+3^2+4^2+5^2+6^2+7^2+8^2+9^2}=\sqrt{285}\approx 16.8819$

[[1./norm, 2./norm, 3./norm]
 [4./norm, 5./norm, 6./norm]
 [7./norm, 8./norm, 9./norm]]
=
[[0.05923489 0.11846977 0.17770466]
 [0.23693955 0.29617444 0.35540932]
 [0.4146442  0.4738791  0.53311396]]
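
The three cases can be reproduced with a few lines of NumPy. The function `l2_normalize` below is a hypothetical re-implementation written for this note, assuming the formula x / sqrt(max(sum(x**2), epsilon)); it is a sketch, not TensorFlow's own code.

```python
import numpy as np

def l2_normalize(x, axis, epsilon=1e-12):
    """Divide x by its L2 norm taken along axis (an int or a tuple of ints)."""
    square_sum = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(square_sum, epsilon))

data = np.array([[1.0, 2, 3], [4.0, 5, 6], [7.0, 8, 9]])

by_column = l2_normalize(data, axis=0)       # every column has unit norm
by_row = l2_normalize(data, axis=1)          # every row has unit norm
overall = l2_normalize(data, axis=(0, 1))    # the whole matrix has unit norm

print(by_column)
print(by_row)
print(overall)
```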

22 Weight decay and L2 regularization

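The purpose of L2 regularization is to make the weights decay towards smaller values, which reduces overfitting to some extent; this is why weight decay is also called L2 regularization.
L2 regularization adds a regularization term to the cost function:
$C=C_0+\frac{\lambda}{2n}\sum_w w^2$
Here $C_0$ is the original cost function, and the second term is the L2 regularization term: the sum of the squares of all weights w, divided by the training set size n. λ is the regularization coefficient, which balances the regularization term against $C_0$. There is also a factor of 1/2, which shows up often; it is there mainly to make the derivative convenient: differentiating the squared term produces a factor of 2, which cancels with the 1/2. The coefficient λ is the weight decay coefficient.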
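
Why the L2 term is called "weight decay" becomes visible in a single gradient step: the penalty's gradient is $\frac{\lambda}{n}w$, so each step multiplies the weights by $(1-\eta\lambda/n)$ before applying the usual update. A minimal NumPy sketch, assuming plain gradient descent with learning rate `lr` (the function names and all numeric values are hypothetical):

```python
import numpy as np

def step_with_l2_penalty(w, grad_c0, lr, lam, n):
    """Gradient step on C = C0 + lam / (2 n) * sum(w ** 2)."""
    grad = grad_c0 + (lam / n) * w      # the 2 from w**2 cancels the 1/2
    return w - lr * grad

def step_with_decay(w, grad_c0, lr, lam, n):
    """Same step written as: shrink the weights, then follow the C0 gradient."""
    return (1 - lr * lam / n) * w - lr * grad_c0

w = np.array([1.0, -2.0, 0.5])
grad_c0 = np.array([0.1, 0.2, -0.3])    # made-up gradient of C0
lr, lam, n = 0.1, 0.5, 10

explicit = step_with_l2_penalty(w, grad_c0, lr, lam, n)
decayed = step_with_decay(w, grad_c0, lr, lam, n)
print(np.allclose(explicit, decayed))
```

This equivalence holds for plain gradient descent; for optimizers that rescale gradients it is an assumption that may not carry over.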

23 slim.conv2d parameters

convolution(inputs,
          num_outputs,
          kernel_size,
          stride=1,
          padding='SAME',
          data_format=None,
          rate=1,
          activation_fn=nn.relu,
          normalizer_fn=None,
          normalizer_params=None,
          weights_initializer=initializers.xavier_initializer(),
          weights_regularizer=None,
          biases_initializer=init_ops.zeros_initializer(),
          biases_regularizer=None,
          reuse=None,
          variables_collections=None,
          outputs_collections=None,
          trainable=True,
          scope=None)

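inputs: the input images to be convolved
num_outputs: the number of convolution kernels (i.e. the number of filters)
kernel_size: the size of the convolution kernel, [kernel_height, kernel_width]
stride: the stride along each spatial dimension of the image
padding: the padding mode, VALID or SAME
data_format: the layout of inputs (for example NHWC or NCHW)
rate: the dilation rate for atrous (dilated) convolution. tf.nn.conv2d has no parameter of this name. With rate > 1 the kernel taps are spaced rate pixels apart, which enlarges the receptive field without adding weights (I do not fully understand atrous convolution yet)
activation_fn: the activation function; the default is ReLU
normalizer_fn: a normalization function (for example slim.batch_norm) to use instead of biases; note that this is normalization, not regularization
normalizer_params: the parameters of the normalization function
weights_initializer: the initializer for the weights
weights_regularizer: an optional regularizer for the weights
biases_initializer: the initializer for the biases
biases_regularizer: an optional regularizer for the biases
reuse: whether the layer and its variables should be reused
variables_collections: a list of collections for all the variables, or a dictionary of collections per variable
outputs_collections: the collection to which the outputs are added
trainable: whether the parameters of the convolution layer can be trained
scope: the scope used for variable_scope (relevant for variable sharing)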
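
Since the rate parameter is the least obvious entry in the list, here is a small 1-D sketch of what a dilation rate does. The function `dilated_conv1d` is a hypothetical NumPy illustration with VALID padding and cross-correlation (no kernel flip); it is not slim's implementation.

```python
import numpy as np

def dilated_conv1d(signal, kernel, rate=1):
    """1-D cross-correlation where kernel taps are spaced `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective kernel size
    out_len = len(signal) - span + 1     # VALID padding
    return np.array([
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(out_len)
    ])

signal = np.arange(10.0)
kernel = np.array([1.0, 0.0, -1.0])

dense = dilated_conv1d(signal, kernel, rate=1)    # taps at i, i+1, i+2
atrous = dilated_conv1d(signal, kernel, rate=2)   # taps at i, i+2, i+4

print(dense)
print(atrous)
```

With rate=2 the same three weights cover five input samples, which is the sense in which atrous convolution enlarges the receptive field without adding parameters.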

24 slim.max_pool2d parameters

def max_pool2d(inputs,
               kernel_size,
               stride=2,
               padding='VALID',
               data_format=DATA_FORMAT_NHWC,
               outputs_collections=None,
               scope=None):
