Understanding tf.clip_by_global_norm

help(tf.clip_by_global_norm)
 

Help on function clip_by_global_norm in module tensorflow.python.ops.clip_ops:

clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)
    Clips values of multiple tensors by the ratio of the sum of their norms.
    
    Given a tuple or list of tensors `t_list`, and a clipping ratio `clip_norm`,
    this operation returns a list of clipped tensors `list_clipped`
    and the global norm (`global_norm`) of all tensors in `t_list`. Optionally,
    if you've already computed the global norm for `t_list`, you can specify
    the global norm with `use_norm`.
    
    To perform the clipping, the values `t_list[i]` are set to:
    
        t_list[i] * clip_norm / max(global_norm, clip_norm)
    
    where:
    
        global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
    
    If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
    otherwise they're all shrunk by the global ratio.

    
    Any of the entries of `t_list` that are of type `None` are ignored.
    
    This is the correct way to perform gradient clipping (for example, see
    [Pascanu et al., 2012](http://arxiv.org/abs/1211.5063)
    ([pdf](http://arxiv.org/pdf/1211.5063.pdf))).
    
    However, it is slower than `clip_by_norm()` because all the parameters must be
    ready before the clipping operation can be performed.
    
    Args:
      t_list: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
      clip_norm: A 0-D (scalar) `Tensor` > 0. The clipping ratio.
      use_norm: A 0-D (scalar) `Tensor` of type `float` (optional). The global
        norm to use. If not provided, `global_norm()` is used to compute the norm.
      name: A name for the operation (optional).
    
    Returns:
      list_clipped: A list of `Tensors` of the same type as `list_t`.
      global_norm: A 0-D (scalar) `Tensor` representing the global norm.
    
    Raises:
      TypeError: If `t_list` is not a sequence.

The key to understanding this function is really these few lines:
To perform the clipping, the values `t_list[i]` are set to:
    
        t_list[i] * clip_norm / max(global_norm, clip_norm)
    
    where:
    
        global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
    
    If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
    otherwise they're all shrunk by the global ratio.

    In other words, it boils down to a single formula:

        y = x * clip_norm / max(sqrt(sum([l2norm(t)**2 for t in t_list])), clip_norm)

In plain terms: if the global L2 norm of the gradients is at most the specified maximum (`clip_norm`), the gradients are returned unchanged; if it exceeds `clip_norm`, every gradient is scaled down by the same ratio. This caps the overall gradient magnitude and guards against exploding gradients.
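The formula above can be checked with a small NumPy sketch. This is only an illustration of the math, not TensorFlow's actual implementation; the function name mirrors the TF API for readability:

```python
import numpy as np

def clip_by_global_norm(t_list, clip_norm):
    """NumPy sketch of the tf.clip_by_global_norm formula (illustrative only)."""
    # global_norm = sqrt(sum of squared L2 norms of every tensor)
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in t_list))
    # Every tensor is scaled by the same ratio: clip_norm / max(global_norm, clip_norm)
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list], global_norm

# Two "gradients" with L2 norms 5 and 12, so global_norm = sqrt(25 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, gn = clip_by_global_norm(grads, clip_norm=5.0)
# After clipping, the global norm of `clipped` equals clip_norm = 5.0;
# with clip_norm=20.0 > 13, the gradients would come back unchanged.
```

Note that because all tensors share one scale factor, clipping preserves the *direction* of the overall gradient vector, which is the point made in Pascanu et al. above and why this is preferred over clipping each tensor independently with `clip_by_norm()`.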
