Understanding TensorFlow's clip_by_norm function

clip_by_norm

Here, clip_by_norm refers to gradient clipping: by capping the maximum norm of a gradient it guards against the exploding-gradient problem, and it is one of the most common ways of constraining gradients. Concretely, a tensor t whose L2-norm exceeds clip_norm is rescaled to t * clip_norm / l2norm(t), so the result's L2-norm never exceeds clip_norm.

clip_by_norm in TensorFlow

Example

optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.5)
grads = optimizer.compute_gradients(cost)  # list of (gradient, variable) pairs
for i, (g, v) in enumerate(grads):
    if g is not None:  # some variables may have no gradient w.r.t. cost
        grads[i] = (tf.clip_by_norm(g, 5), v)  # clip each gradient to L2-norm <= 5
train_op = optimizer.apply_gradients(grads)
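
The snippet above uses the TF 1.x tf.train API. For reference, here is a minimal sketch of the same per-gradient clipping written against the TensorFlow 2.x API with tf.GradientTape; the names model, loss_fn, x, and y are hypothetical placeholders, not part of the original example:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.5)

def train_step(model, loss_fn, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    # Clip each gradient tensor to an L2-norm of at most 5; keep None gradients as-is
    grads = [tf.clip_by_norm(g, 5) if g is not None else None for g in grads]
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss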

Both snippets above follow a common pattern for defining the gradient-update step, and both rely on tf.clip_by_norm. Here is the source code of that function:

def clip_by_norm(t, clip_norm, axes=None, name=None):
  """Clips tensor values to a maximum L2-norm.

  Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
  normalizes `t` so that its L2-norm is less than or equal to `clip_norm`,
  along the dimensions given in `axes`. Specifically, in the default case
  where all dimensions are used for calculation, if the L2-norm of `t` is
  already less than or equal to `clip_norm`, then `t` is not modified. If
  the L2-norm is greater than `clip_norm`, then this operation returns a
  tensor of the same type and shape as `t` with its values set to:

  `t * clip_norm / l2norm(t)`

  In this case, the L2-norm of the output tensor is `clip_norm`.

  As another example, if `t` is a matrix and `axes == [1]`, then each row
  of the output will have L2-norm equal to `clip_norm`. If `axes == [0]`
  instead, each column of the output will be clipped.

  This operation is typically used to clip gradients before applying them with
  an optimizer.

  Args:
    t: A `Tensor`.
    clip_norm: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
    axes: A 1-D (vector) `Tensor` of type int32 containing the dimensions
      to use for computing the L2-norm. If `None` (the default), uses all
      dimensions.
    name: A name for the operation (optional).

  Returns:
    A clipped `Tensor`.
  """
  with ops.name_scope(name, "clip_by_norm", [t, clip_norm]) as name:
    t = ops.convert_to_tensor(t, name="t")

    # Calculate L2-norm, clip elements by ratio of clip_norm to L2-norm
    l2norm_inv = math_ops.rsqrt(
        math_ops.reduce_sum(t * t, axes, keep_dims=True))
    tclip = array_ops.identity(t * clip_norm * math_ops.minimum(
        l2norm_inv, constant_op.constant(1.0, dtype=t.dtype) / clip_norm),
                               name=name)

  return tclip
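
Note how the implementation avoids an explicit branch: rsqrt gives 1 / l2norm(t), and taking the minimum with 1 / clip_norm means the overall factor t * clip_norm * min(...) leaves t unchanged when l2norm(t) <= clip_norm and scales it by clip_norm / l2norm(t) otherwise. A minimal NumPy sketch of the default axes=None case (my own re-derivation, not TensorFlow code):

import numpy as np

def clip_by_norm_np(t, clip_norm):
    # 1 / l2norm(t), mirroring math_ops.rsqrt(reduce_sum(t * t))
    l2norm_inv = 1.0 / np.sqrt(np.sum(t * t))
    # clip_norm * min(1/l2norm, 1/clip_norm) equals 1 when l2norm <= clip_norm,
    # and clip_norm / l2norm otherwise
    return t * clip_norm * np.minimum(l2norm_inv, 1.0 / clip_norm)

t = np.array([3.0, 4.0])          # l2norm(t) == 5
print(clip_by_norm_np(t, 5.0))    # [3. 4.]  -- norm already <= 5, unchanged
print(clip_by_norm_np(t, 2.5))    # [1.5 2. ] -- rescaled to norm 2.5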

The docstring makes the behaviour clear: the L2-norm of the incoming gradient tensor t is capped at clip_norm. If the L2-norm of t exceeds clip_norm, t is rescaled to t * clip_norm / l2norm(t); after the transformation the L2-norm is therefore at most clip_norm, and exactly clip_norm whenever clipping actually occurs.

Example

The following code demonstrates the effect of this function directly.

Generate random numbers

import numpy as np
t = np.random.randint(low=0, high=5, size=10)
t
array([1, 1, 3, 4, 2, 2, 1, 4, 2, 3])

Compute the L2-norm

l2norm4t = np.linalg.norm(t)
l2norm4t
8.0622577482985491

Rescale the random numbers

clip_norm = 5
transformed_t = t * clip_norm / l2norm4t
transformed_t
array([ 0.62017367,  0.62017367,  1.86052102,  2.48069469,  1.24034735,
        1.24034735,  0.62017367,  2.48069469,  1.24034735,  1.86052102])

Verification

np.linalg.norm(transformed_t)
5.0

As shown, the L2-norm of the random sequence has been reduced to exactly the value of clip_norm. A per-row variant using the axes argument is sketched below.
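
The docstring also describes an axes argument that the example above does not exercise. As a minimal sketch, assuming a TensorFlow 2.x environment (for eager output), axes=[1] caps the L2-norm of each row of a matrix independently:

import tensorflow as tf

m = tf.constant([[3.0, 4.0],      # row norm 5
                 [0.6, 0.8]])     # row norm 1
clipped = tf.clip_by_norm(m, clip_norm=2.0, axes=[1])
print(clipped.numpy())
# First row is rescaled to norm 2: [1.2, 1.6]; second row (norm 1 <= 2) is unchanged
print(tf.norm(clipped, axis=1).numpy())  # [2., 1.]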
