Effect of TensorFlow clipping on NaN and inf

HackerTom

Last modified: 2022-07-16 18:12:43

First published: 2022-07-16 18:10:50

Article link: https://blog.csdn.net/HackerTom/article/details/125822982

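When training an unstable model such as [1], the gradients may become NaN. For debugging I previously added check_numerics [2], but that raises an error and aborts the run. Clipping can stabilize training, but the NaNs have to be removed first.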

This note records how TensorFlow's two clipping ops [3,4] behave when the input contains NaN and inf. Conclusions:

clip_by_value clamps inf to a finite value, but has no effect on NaN;
with NaN present, clip_by_norm turns the whole vector into NaN (the norm itself is NaN, and dividing by NaN gives NaN);
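with inf but no NaN, clip_by_norm appeared to have no effect? Note that the test in the code prints the input g rather than the clipped h, so the recorded output does not actually confirm this. Since the norm of such a vector is inf, the clipped result would more plausibly contain 0 and NaN.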
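
To make the reasoning behind these conclusions concrete, here is a minimal pure-Python model of the two operations. It is only a sketch of the arithmetic, not TensorFlow's implementation: the names `clip_by_value_model` and `clip_by_norm_model` are hypothetical, and the norm formula `v * clip_norm / max(norm, clip_norm)` is an assumption based on the documented behaviour of clip_by_norm.

```python
import math

def clip_by_value_model(values, lo, hi):
    """Element-wise clamp to [lo, hi]; NaN is passed through unchanged."""
    return [v if math.isnan(v) else min(max(v, lo), hi) for v in values]

def clip_by_norm_model(values, clip_norm):
    """Rescale so the L2 norm is at most clip_norm (assumed formula)."""
    norm = math.sqrt(sum(v * v for v in values))
    # Written as a comparison so that a NaN norm propagates into denom;
    # Python's built-in max() is order-dependent when NaN is involved.
    denom = clip_norm if norm <= clip_norm else norm
    return [v * clip_norm / denom for v in values]

a = [10.0, math.nan, math.inf, -math.inf]
print(clip_by_value_model(a, -1.0, 1.0))   # inf is clamped, NaN survives
print(clip_by_norm_model(a, 5.0))          # NaN norm poisons every entry
print(clip_by_norm_model([20.0, math.inf, -math.inf], 5.0))
```

Under this model the inf-only vector does not come back unchanged: the norm is inf, so the finite entry is scaled to 0 and each inf entry becomes inf/inf, which is NaN. Whether real TensorFlow output agrees is worth checking by running the clipped tensor itself.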

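One option is to set inf and NaN to zero first. That may distort the optimization direction, though, so clip once more afterwards, to avoid a single oversized step that makes training progressively worse?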
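
The zero-then-clip order can be sketched without any framework. The helper names `zero_non_finite` and `sanitize_then_clip` are hypothetical, and the norm clipping is a simplified stand-in for clip_by_norm rather than TensorFlow's own code.

```python
import math

def zero_non_finite(values):
    """Replace NaN, inf and -inf with 0.0."""
    return [v if math.isfinite(v) else 0.0 for v in values]

def sanitize_then_clip(values, clip_norm):
    """Zero the non-finite entries first, then cap the L2 norm at clip_norm."""
    cleaned = zero_non_finite(values)
    norm = math.sqrt(sum(v * v for v in cleaned))
    if norm <= clip_norm:
        return cleaned          # already within the limit, leave untouched
    return [v * clip_norm / norm for v in cleaned]

grad = [10.0, math.nan, math.inf, -math.inf]
print(sanitize_then_clip(grad, 5.0))  # [5.0, 0.0, 0.0, 0.0]
```

The order matters: clipping by norm before zeroing would compute a NaN or inf norm and spoil the entries that were still usable.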

Code

import math
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.is_nan, tf.is_inf)

def zero_inf_nan(grad):
    """Set NaN and inf entries to zero."""
    if grad is None:
        return grad
    _cond = tf.is_nan(grad) | tf.is_inf(grad)
    return tf.where(_cond, tf.zeros_like(grad), grad)

with tf.Session() as sess:
    # both NaN and inf present
    a = tf.constant([10, math.nan, math.inf, - math.inf], tf.float32)
    b = tf.clip_by_value(a, -1, 1)
    c = tf.clip_by_norm(a, 5)
    print(sess.run([a, b, c]))

    # after zeroing them out
    d = zero_inf_nan(a)
    e = tf.clip_by_value(d, -1, 1)
    f = tf.clip_by_norm(d, 5)
    print(sess.run([d, e, f]))

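    # inf but no NaN, with clip_by_norm
    g = tf.constant([20, math.inf, - math.inf], tf.float32)
    h = tf.clip_by_norm(g, 5)
    # NOTE: this prints the input g, not the clipped h;
    # run sess.run(h) to see what clip_by_norm actually returns
    print(sess.run(g))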

Output

[array([ 10.,  nan,  inf, -inf], dtype=float32), array([ 1., nan,  1., -1.], dtype=float32), array([nan, nan, nan, nan], dtype=float32)]
[array([10.,  0.,  0.,  0.], dtype=float32), array([1., 0., 0., 0.], dtype=float32), array([5., 0., 0., 0.], dtype=float32)]
[ 20.  inf -inf]

References