TensorFlow2 Advanced Operations (6): Tensor Clipping [by value: clip_by_value, relu] [by norm: clip_by_norm (per-axis), clip_by_global_norm (global)]

I. clip_by_value and relu: clipping by value

1. clip_by_value

Clip a tensor element-wise by value.

  • tf.maximum(tensor, threshold): lower clipping — every value must be at least the threshold: if data < threshold, data = threshold
  • tf.minimum(tensor, threshold): upper clipping — every value must be at most the threshold: if data > threshold, data = threshold
  • tf.clip_by_value(tensor, lower_threshold, upper_threshold): values must lie between the two thresholds (equivalent to composing the two functions above — see the sketch after the example output below)
import tensorflow as tf

a = tf.range(10)
print("a = ", a)
print("-" * 100)

b = tf.maximum(a, 2)
print("b = tf.maximum(a, 2) = ", b)
print("-" * 50)

c = tf.minimum(a, 7)
print("c = tf.minimum(a, 7) = ", c)
print("-" * 50)

d = tf.clip_by_value(a, 2, 7)
print("d = tf.clip_by_value(a, 2, 7) = ", d)
print("-" * 100)

Output:

a =  tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
----------------------------------------------------------------------------------------------------
b = tf.maximum(a, 2) =  tf.Tensor([2 2 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
--------------------------------------------------
c = tf.minimum(a, 7) =  tf.Tensor([0 1 2 3 4 5 6 7 7 7], shape=(10,), dtype=int32)
--------------------------------------------------
d = tf.clip_by_value(a, 2, 7) =  tf.Tensor([2 2 2 3 4 5 6 7 7 7], shape=(10,), dtype=int32)
----------------------------------------------------------------------------------------------------

Process finished with exit code 0
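
Incidentally, tf.clip_by_value can be viewed as the composition of the two one-sided clips above. A minimal sketch (reusing the tensor a from the example):

import tensorflow as tf

a = tf.range(10)

# Clipping to [2, 7] with clip_by_value ...
d1 = tf.clip_by_value(a, 2, 7)
# ... is equivalent to applying the lower clip and then the upper clip.
d2 = tf.minimum(tf.maximum(a, 2), 7)

print(d1.numpy())  # [2 2 2 3 4 5 6 7 7 7]
print(d2.numpy())  # [2 2 2 3 4 5 6 7 7 7]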

2. relu

ReLU: values less than 0 are set to 0; values greater than or equal to 0 are kept unchanged.

[Figure: the ReLU activation function]

  • tf.nn.relu(a): applies ReLU to a;
  • tf.maximum(a, 0): has the same effect as tf.nn.relu(a);
import tensorflow as tf

a = tf.range(10)
print("a = ", a)
print("-" * 100)

b = a - 5
print("b = a - 5 = ", b)
print("-" * 50)

# Implement the ReLU function using the value-clipping function
c = tf.maximum(a, 0)
print("c = tf.maximum(a, 0) = ", c)
print("-" * 50)

d = tf.nn.relu(a)
print("d = tf.nn.relu(a) = ", d)
print("-" * 100)

Output:

a =  tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
----------------------------------------------------------------------------------------------------
b = a - 5 =  tf.Tensor([-5 -4 -3 -2 -1  0  1  2  3  4], shape=(10,), dtype=int32)
--------------------------------------------------
c = tf.maximum(a, 0) =  tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
--------------------------------------------------
d = tf.nn.relu(a) =  tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
----------------------------------------------------------------------------------------------------

Process finished with exit code 0
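
Note that the example above applies ReLU to a, which contains no negative values, so the zeroing behaviour is not visible in the output. A minimal sketch applying it to b (which does contain negatives) makes the effect clear:

import tensorflow as tf

b = tf.range(10) - 5             # [-5 -4 -3 -2 -1  0  1  2  3  4]
print(tf.nn.relu(b).numpy())     # [0 0 0 0 0 0 1 2 3 4]
print(tf.maximum(b, 0).numpy())  # same result: all negatives become 0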

II. clip_by_norm: clipping by norm — all elements are scaled by the same factor, so only the magnitude changes, not the direction

If we simply clip individual values into a desired range, the gradient can be distorted, which is not what we want; this is where clip_by_norm() comes in. The idea of clip_by_norm is to compute the L2 norm of the tensor and, when that norm exceeds the limit, rescale the whole tensor so that its norm equals the limit. Because every element is multiplied by the same factor, clipping this way does not change the direction of the tensor (or of the gradient).

clip_by_norm(t, clip_norm, axes=None, name=None)

  • t: the input tensor (a nested Python list is also accepted and converted to a tensor)
  • clip_norm: a scalar. If $L2norm(t) \le clip\_norm$, t is left unchanged; otherwise $t = t \times \cfrac{clip\_norm}{L2norm(t)}$ (see the sketch after this list)
  • axes: the dimensions over which the L2 norm is computed. If not specified, all elements of t are used; this makes no difference for a 1-D tensor but does matter for a 2-D tensor
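
The rescaling formula above can be verified by hand. A minimal sketch (the tensor values are arbitrary, chosen only for illustration):

import tensorflow as tf

t = tf.constant([[6.0, 4.0], [4.0, 6.0]])
clip_norm = 2.0

# tf.clip_by_norm rescales t so that its L2 norm equals clip_norm
# whenever the original norm exceeds clip_norm.
clipped = tf.clip_by_norm(t, clip_norm)

# Manual equivalent of the formula: t * clip_norm / L2norm(t)
manual = t * clip_norm / tf.norm(t)

print(clipped.numpy())           # same values as manual
print(manual.numpy())
print(tf.norm(clipped).numpy())  # 2.0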

1. Example 01

import tensorflow as tf

a = tf.clip_by_norm(t=[[6.0, 4.0], [4.0, 6.0]], clip_norm=2)
b = tf.clip_by_norm(t=[[6.0, 4.0], [4.0, 6.0]], clip_norm=2, axes=1)

print("a = \n", a)
print("-" * 100)
print("b = \n", b)
print("-" * 200)

aa = tf.norm(a)
print("aa = \n", aa)
print("-" * 100)
bb = tf.norm(b, axis=1)
print("bb = \n", bb)
print("-" * 200)

Output:

a = 
 tf.Tensor(
[[1.1766968  0.78446454]
 [0.78446454 1.1766968 ]], shape=(2, 2), dtype=float32)
----------------------------------------------------------------------------------------------------
b = 
 tf.Tensor(
[[1.6641006 1.1094004]
 [1.1094004 1.6641006]], shape=(2, 2), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
aa = 
 tf.Tensor(2.0, shape=(), dtype=float32)
----------------------------------------------------------------------------------------------------
bb = 
 tf.Tensor([2. 2.], shape=(2,), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Process finished with exit code 0

2. Example 02

tf.norm(a)

  • Computes the L2 norm of a; the result is a scalar: $\sqrt{\sum{x_i^2}}$
  • aa = tf.clip_by_norm(a, 15): limits the L2 norm of a to 15 without changing its direction; 15 is the new norm
import tensorflow as tf

a = tf.random.normal([2, 2], mean=10)
print("a = \n", a)
print("-" * 100)

b = tf.norm(a)
print("b = tf.norm(a) = ", b)
print("-" * 200)

c = tf.clip_by_norm(a, 15)
print("c = tf.clip_by_norm(a, 15) = \n", c)
print("-" * 100)

d = tf.norm(c)
print("d = tf.norm(c) = ", d)
print("-" * 200)

Output:

a = 
 tf.Tensor(
[[ 9.107755   9.7168665]
 [10.5676365  9.9393   ]], shape=(2, 2), dtype=float32)
----------------------------------------------------------------------------------------------------
b = tf.norm(a) =  tf.Tensor(19.693483, shape=(), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c = tf.clip_by_norm(a, 15) = 
 tf.Tensor(
[[6.9371333 7.4010773]
 [8.049086  7.570499 ]], shape=(2, 2), dtype=float32)
----------------------------------------------------------------------------------------------------
d = tf.norm(c) =  tf.Tensor(15.0, shape=(), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Process finished with exit code 0

III. Gradient clipping: clip_by_global_norm [all gradients are rescaled by the same factor]

Two major obstacles arise when running gradient descent: exploding gradients and vanishing gradients.

  • Exploding gradients: the gradient values are too large, so every update step is too long and the loss oscillates back and forth.
  • Vanishing gradients: the gradient values are too small, so every step makes almost no progress and the loss stays flat for a long time.

Gradient clipping rescales all gradients by one common factor while leaving their direction unchanged, which to some extent suppresses the exploding-gradient problem.

tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None):

  • t_list: a list of input tensors
  • clip_norm: a scalar. If $global\_norm(t\_list) \le clip\_norm$, the tensors are unchanged; otherwise every t in t_list becomes $t = t \times \cfrac{clip\_norm}{global\_norm(t\_list)}$, where $global\_norm(t\_list)$ is the L2 norm computed over all elements of all tensors in the list

Once the axes argument of clip_by_norm is understood, clip_by_global_norm is easy: it behaves as if no axes were specified — a single global norm is computed from all elements of every tensor in the list. Also note that clip_by_global_norm returns two values: the list of clipped tensors and the global norm before clipping (see the sketch below).
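
To make the two return values concrete, here is a minimal sketch (the two tensors stand in for a hypothetical gradient list) that computes the global norm by hand and compares it with the value returned by tf.clip_by_global_norm; tf.linalg.global_norm computes the same quantity:

import tensorflow as tf

grads = [tf.constant([3.0, 4.0]), tf.constant([1.0, 2.0])]

# The global norm is the L2 norm over every element of every tensor:
# sqrt(3^2 + 4^2 + 1^2 + 2^2) = sqrt(30) ≈ 5.477226
manual_norm = tf.sqrt(tf.add_n([tf.reduce_sum(tf.square(g)) for g in grads]))
print(manual_norm.numpy())                   # ≈ 5.477226
print(tf.linalg.global_norm(grads).numpy())  # same value

clipped, norm_before = tf.clip_by_global_norm(grads, clip_norm=2.0)
print(norm_before.numpy())           # global norm before clipping, ≈ 5.477226
print([g.numpy() for g in clipped])  # every tensor scaled by 2 / 5.477226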

For example: new_grads, total_norm = tf.clip_by_global_norm(grads, 25)

  • The goal is to keep the direction of the overall parameter gradient unchanged.
  • For example, if the original gradients are $[w_1, w_2, w_3] = [2, 4, 8]$, clip_by_global_norm shrinks $w_1, w_2, w_3$ by the same factor n; shrinking them all by a factor of 2 gives $[w_1, w_2, w_3] = [1, 2, 4]$, so the direction of the gradient does not change.
  • Here 25 means the global norm of the gradients will not exceed 25.
import tensorflow as tf

a = tf.clip_by_norm([[3.0, 4.0], [1.0, 2.0]], clip_norm=2)
b, c = tf.clip_by_global_norm([[3.0, 4.0], [1.0, 2.0]], clip_norm=2)

print("a = \n", a)
print("-" * 200)
print("b = \n", b)
print("-" * 200)
print("c = \n", c)
print("-" * 200)

$\sqrt{3^2+4^2+1^2+2^2}=5.477226$

$3 \times \cfrac{2}{5.477226}=1.0954452$
$4 \times \cfrac{2}{5.477226}=1.4605935$
$1 \times \cfrac{2}{5.477226}=0.36514837$
$2 \times \cfrac{2}{5.477226}=0.73029673$

Output:

a = 
 tf.Tensor(
[[1.095445   1.4605935 ]
 [0.36514837 0.73029673]], shape=(2, 2), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b = 
 [<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1.0954452, 1.4605935], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.36514837, 0.73029673], dtype=float32)>]
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c = 
 tf.Tensor(5.477226, shape=(), dtype=float32)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Process finished with exit code 0

1. Without clip_by_global_norm

To demonstrate gradient explosion we deliberately set the learning rate high, lr=1; with such a learning rate even the simple MNIST dataset runs into exploding gradients.

import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, optimizers
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Load the dataset
# x: [60k, 28, 28], y: [60k]
mnist_dataset = datasets.mnist.load_data()
dataset_train, datasset_val = mnist_dataset[0], mnist_dataset[1]
(x, y) = dataset_train
(x_val, y_val) = datasset_val

# Convert the ndarrays to tensors
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.  # x: [0~255] => [0~1.]
print("x.shape = {0}, x[0] = \n{1}".format(x.shape, pd.DataFrame(x[0].numpy())))  # x.shape =  (60000, 28, 28)
y = tf.convert_to_tensor(y, dtype=tf.int32)
print("y.shape = {0}, y = {1}".format(y.shape, y))  # y.shape = (60000,), y = [5 0 4 ... 5 6 8]

print("\n数据集数据特征:")
print("x.shape = {0}, y.shape = {1}, x.dtype = {2}, y.dtype = {3}".format(x.shape, y.shape, x.dtype, y.dtype))
print("tf.reduce_min(x) = {0}, tf.reduce_max(x) = {1}".format(tf.reduce_min(x), tf.reduce_max(x)))
print("tf.reduce_min(y) = {0}, tf.reduce_max(y) = {1}".format(tf.reduce_min(y), tf.reduce_max(y)))

# Build the training dataset from (x, y), batch_size=128
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
train_iter = iter(train_db)
sample = next(train_iter)
print('\nShape of each batch: x.shape = {0}, y.shape = {1}'.format(sample[0].shape, sample[1].shape))

# Initialize the parameters
# [b, 784] => [b, 256] => [b, 128] => [b, 10]
# [dim_in, dim_out], [dim_out]
w1 = tf.Variable(tf.random.truncated_normal([784, 256], mean=0, stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], mean=0, stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], mean=0, stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

# Initialize the optimizer
optimizer = optimizers.SGD(lr=1)

# Train for 10 epochs
for epoch in range(10):  # iterate db for 10
    # Iterate over all batches
    for batch_idx, (x, y) in enumerate(train_db):
        # x: [128, 28, 28]; y: [128]
        # Flatten the feature dimensions
        x = tf.reshape(x, [-1, 28 * 28])  # [b, 28, 28] => [b, 28*28]
        # TensorFlow uses tf.GradientTape to record the forward computation; backpropagation then yields the gradients automatically.
        with tf.GradientTape() as tape:  # tf.Variable
            # Step 1: run the forward pass with the current parameters to get the outputs (sub-steps ① + ②)
            # ①: first layer
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)

            # ②: second layer
            # [b, 256] => [b, 128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2 @ w3 + b3

            # Step 2: compute the loss
            # out: [b, 10]
            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)
            loss = tf.reduce_mean(tf.square(y_onehot - out))  # mse = mean(sum(y-out)^2)

        # Step 3: compute the gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print("grads = ", grads)
        # grads, _ = tf.clip_by_global_norm(grads, 15)

        # Step 4: update the parameters with gradient descent
        # w1 = w1 - lr * w1_grad
        optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))  # perform the gradient-descent update with the optimizer

        if batch_idx % 100 == 0:
            print("epoch = {0}, batch_idx = {1}, loss = {2}".format(epoch, batch_idx, float(loss)))

Output:

x.shape = (60000, 28, 28), x[0] = 
     0    1    2    3         4   ...        23   24   25   26   27
0   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
1   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
2   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
3   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
4   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
5   0.0  0.0  0.0  0.0  0.000000  ...  0.498039  0.0  0.0  0.0  0.0
6   0.0  0.0  0.0  0.0  0.000000  ...  0.250980  0.0  0.0  0.0  0.0
7   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
8   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
9   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
10  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
11  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
12  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
13  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
14  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
15  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
16  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
17  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
18  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
19  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
20  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
21  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
22  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
23  0.0  0.0  0.0  0.0  0.215686  ...  0.000000  0.0  0.0  0.0  0.0
24  0.0  0.0  0.0  0.0  0.533333  ...  0.000000  0.0  0.0  0.0  0.0
25  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
26  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
27  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0

[28 rows x 28 columns]
y.shape = (60000,), y = [5 0 4 ... 5 6 8]

Dataset characteristics:
x.shape = (60000, 28, 28), y.shape = (60000,), x.dtype = <dtype: 'float32'>, y.dtype = <dtype: 'int32'>
tf.reduce_min(x) = 0.0, tf.reduce_max(x) = 1.0
tf.reduce_min(y) = 0, tf.reduce_max(y) = 9

Shape of each batch: x.shape = (128, 28, 28), y.shape = (128,)

epoch = 0, batch_idx = 0, loss = 0.45434314012527466
epoch = 0, batch_idx = 100, loss = nan
epoch = 0, batch_idx = 200, loss = nan
epoch = 0, batch_idx = 300, loss = nan
epoch = 0, batch_idx = 400, loss = nan
epoch = 1, batch_idx = 0, loss = nan
epoch = 1, batch_idx = 100, loss = nan
epoch = 1, batch_idx = 200, loss = nan

Process finished with exit code -1

2. With clip_by_global_norm

import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, optimizers
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Load the dataset
# x: [60k, 28, 28], y: [60k]
mnist_dataset = datasets.mnist.load_data()
dataset_train, datasset_val = mnist_dataset[0], mnist_dataset[1]
(x, y) = dataset_train
(x_val, y_val) = datasset_val

# Convert the ndarrays to tensors
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.  # x: [0~255] => [0~1.]
print("x.shape = {0}, x[0] = \n{1}".format(x.shape, pd.DataFrame(x[0].numpy())))  # x.shape =  (60000, 28, 28)
y = tf.convert_to_tensor(y, dtype=tf.int32)
print("y.shape = {0}, y = {1}".format(y.shape, y))  # y.shape = (60000,), y = [5 0 4 ... 5 6 8]

print("\n数据集数据特征:")
print("x.shape = {0}, y.shape = {1}, x.dtype = {2}, y.dtype = {3}".format(x.shape, y.shape, x.dtype, y.dtype))
print("tf.reduce_min(x) = {0}, tf.reduce_max(x) = {1}".format(tf.reduce_min(x), tf.reduce_max(x)))
print("tf.reduce_min(y) = {0}, tf.reduce_max(y) = {1}".format(tf.reduce_min(y), tf.reduce_max(y)))

# Build the training dataset from (x, y), batch_size=128
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
train_iter = iter(train_db)
sample = next(train_iter)
print('\nShape of each batch: x.shape = {0}, y.shape = {1}'.format(sample[0].shape, sample[1].shape))

# Initialize the parameters
# [b, 784] => [b, 256] => [b, 128] => [b, 10]
# [dim_in, dim_out], [dim_out]
w1 = tf.Variable(tf.random.truncated_normal([784, 256], mean=0, stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], mean=0, stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], mean=0, stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

# Initialize the optimizer
optimizer = optimizers.SGD(lr=1)

# Train for 10 epochs
for epoch in range(10):  # iterate db for 10
    # Iterate over all batches
    for batch_idx, (x, y) in enumerate(train_db):
        # x: [128, 28, 28]; y: [128]
        # Flatten the feature dimensions
        x = tf.reshape(x, [-1, 28 * 28])  # [b, 28, 28] => [b, 28*28]
        # TensorFlow uses tf.GradientTape to record the forward computation; backpropagation then yields the gradients automatically.
        with tf.GradientTape() as tape:  # tf.Variable
            # Step 1: run the forward pass with the current parameters to get the outputs (sub-steps ① + ②)
            # ①: first layer
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            h1 = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)

            # ②: second layer
            # [b, 256] => [b, 128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2 @ w3 + b3

            # Step 2: compute the loss
            # out: [b, 10]
            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)
            loss = tf.reduce_mean(tf.square(y_onehot - out))  # mse = mean(sum(y-out)^2)

        # Step 3: compute the gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print("grads = ", grads)
        grads, _ = tf.clip_by_global_norm(grads, 15)

        # Step 4: update the parameters with gradient descent
        # w1 = w1 - lr * w1_grad
        optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))  # perform the gradient-descent update with the optimizer

        if batch_idx % 100 == 0:
            print("epoch = {0}, batch_idx = {1}, loss = {2}".format(epoch, batch_idx, float(loss)))

Output:

x.shape = (60000, 28, 28), x[0] = 
     0    1    2    3         4   ...        23   24   25   26   27
0   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
1   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
2   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
3   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
4   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
5   0.0  0.0  0.0  0.0  0.000000  ...  0.498039  0.0  0.0  0.0  0.0
6   0.0  0.0  0.0  0.0  0.000000  ...  0.250980  0.0  0.0  0.0  0.0
7   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
8   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
9   0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
10  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
11  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
12  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
13  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
14  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
15  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
16  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
17  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
18  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
19  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
20  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
21  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
22  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
23  0.0  0.0  0.0  0.0  0.215686  ...  0.000000  0.0  0.0  0.0  0.0
24  0.0  0.0  0.0  0.0  0.533333  ...  0.000000  0.0  0.0  0.0  0.0
25  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
26  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0
27  0.0  0.0  0.0  0.0  0.000000  ...  0.000000  0.0  0.0  0.0  0.0

[28 rows x 28 columns]
y.shape = (60000,), y = [5 0 4 ... 5 6 8]

Dataset characteristics:
x.shape = (60000, 28, 28), y.shape = (60000,), x.dtype = <dtype: 'float32'>, y.dtype = <dtype: 'int32'>
tf.reduce_min(x) = 0.0, tf.reduce_max(x) = 1.0
tf.reduce_min(y) = 0, tf.reduce_max(y) = 9

Shape of each batch: x.shape = (128, 28, 28), y.shape = (128,)
2022-04-18 21:44:36.988060: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2022-04-18 21:44:37.251719: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
epoch = 0, batch_idx = 0, loss = 0.49875059723854065
epoch = 0, batch_idx = 100, loss = 0.04440992325544357
epoch = 0, batch_idx = 200, loss = 0.02600466087460518
epoch = 0, batch_idx = 300, loss = 0.022756315767765045
epoch = 0, batch_idx = 400, loss = 0.026590073481202126
epoch = 1, batch_idx = 0, loss = 0.017639024183154106
epoch = 1, batch_idx = 100, loss = 0.018795546144247055
epoch = 1, batch_idx = 200, loss = 0.01446774136275053
epoch = 1, batch_idx = 300, loss = 0.016330739483237267
epoch = 1, batch_idx = 400, loss = 0.020526951178908348
epoch = 2, batch_idx = 0, loss = 0.012861981987953186
epoch = 2, batch_idx = 100, loss = 0.015346418134868145
epoch = 2, batch_idx = 200, loss = 0.012077028863132
epoch = 2, batch_idx = 300, loss = 0.013777618296444416
epoch = 2, batch_idx = 400, loss = 0.01685297302901745
epoch = 3, batch_idx = 0, loss = 0.01044696569442749
epoch = 3, batch_idx = 100, loss = 0.01293286494910717
epoch = 3, batch_idx = 200, loss = 0.010283653624355793
epoch = 3, batch_idx = 300, loss = 0.011961964890360832
epoch = 3, batch_idx = 400, loss = 0.014623480848968029
epoch = 4, batch_idx = 0, loss = 0.009075941517949104
epoch = 4, batch_idx = 100, loss = 0.011382261291146278
epoch = 4, batch_idx = 200, loss = 0.009031490422785282
epoch = 4, batch_idx = 300, loss = 0.010693713091313839
epoch = 4, batch_idx = 400, loss = 0.013129209168255329
epoch = 5, batch_idx = 0, loss = 0.008182947523891926
epoch = 5, batch_idx = 100, loss = 0.010114772245287895
epoch = 5, batch_idx = 200, loss = 0.008154838345944881
epoch = 5, batch_idx = 300, loss = 0.009680045768618584
epoch = 5, batch_idx = 400, loss = 0.011868517845869064
epoch = 6, batch_idx = 0, loss = 0.007426575757563114
epoch = 6, batch_idx = 100, loss = 0.009111211635172367
epoch = 6, batch_idx = 200, loss = 0.007471185177564621
epoch = 6, batch_idx = 300, loss = 0.008953024633228779
epoch = 6, batch_idx = 400, loss = 0.010908501222729683
epoch = 7, batch_idx = 0, loss = 0.006791996769607067
epoch = 7, batch_idx = 100, loss = 0.008285066112875938
epoch = 7, batch_idx = 200, loss = 0.00694951880723238
epoch = 7, batch_idx = 300, loss = 0.008228149265050888
epoch = 7, batch_idx = 400, loss = 0.010189652442932129
epoch = 8, batch_idx = 0, loss = 0.006223582196980715
epoch = 8, batch_idx = 100, loss = 0.007774400059133768
epoch = 8, batch_idx = 200, loss = 0.006595847196877003
epoch = 8, batch_idx = 300, loss = 0.007571699563413858
epoch = 8, batch_idx = 400, loss = 0.00969000905752182
epoch = 9, batch_idx = 0, loss = 0.005765336565673351
epoch = 9, batch_idx = 100, loss = 0.007371656596660614
epoch = 9, batch_idx = 200, loss = 0.006356566213071346
epoch = 9, batch_idx = 300, loss = 0.00712556904181838
epoch = 9, batch_idx = 400, loss = 0.009176169522106647

Process finished with exit code 0



