1. Official Documentation
https://www.tensorflow.org/versions
2. Creating a tf.Tensor
In TensorFlow, a tensor (Tensor) is a multi-dimensional array and can be thought of as a data structure. It can represent many kinds of data, such as scalars, vectors, and matrices. All data in TensorFlow is passed around and processed as tensors.
A tensor has the following important attributes:
1. Rank (or number of dims): the number of dimensions, i.e. how many axes the tensor has. For example, a scalar has rank 0, a vector rank 1, and a matrix rank 2.
2. Shape: the size of each axis, written as a tuple. For example, a tensor of shape (3, 4) is a matrix with 3 rows and 4 columns.
3. Data type (dtype): the type of the data it holds, such as float32 or int32.
a = tf.constant([3, 4])
type(a)
# tensorflow.python.framework.ops.EagerTensor
a.device
# '/job:localhost/replica:0/task:0/device:CPU:0'
a.dtype
# tf.int32
a.numpy()
# array([3, 4], dtype=int32)
tf.is_tensor(a)
# True
a.ndim
# 1
tf.rank(a)
# <tf.Tensor: shape=(), dtype=int32, numpy=1>
a.shape
# TensorShape([2])
tf.shape(a)
# <tf.Tensor: shape=(1,), dtype=int32, numpy=array([2])>
TensorFlow supports eager execution and graph execution. In eager execution, operations are evaluated immediately. In graph execution, a computational graph is constructed for later evaluation. TensorFlow defaults to eager execution.
import tensorflow as tf
import numpy as np
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
c = tf.matmul(a, b)
print(type(c))
# <class 'tensorflow.python.framework.ops.EagerTensor'>
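A minimal sketch of graph execution under the same setup: wrapping a Python function in tf.function traces it into a graph on the first call and reuses that graph afterwards.
@tf.function
def matmul_fn(x, y):
    # traced into a graph on the first call; later calls reuse the graph
    return tf.matmul(x, y)

c_graph = matmul_fn(a, b)
print(c_graph)
# same values as c above; the matmul itself ran inside the traced graph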
2.1 tf.constant
tf.constant(1)
# <tf.Tensor: shape=(), dtype=int32, numpy=1>
tf.constant(1.)
# <tf.Tensor: shape=(), dtype=float32, numpy=1.0>
tf.constant(2.2, dtype=tf.double)
# <tf.Tensor: shape=(), dtype=float64, numpy=2.2>
tf.constant(True)
# <tf.Tensor: shape=(), dtype=bool, numpy=True>
tf.constant('hello world')
# <tf.Tensor: shape=(), dtype=string, numpy=b'hello world'>
tf.constant([[1, 2], [3, 4]])
# <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
# array([[1, 2],
# [3, 4]])>
2.2 tf.convert_to_tensor: create a Tensor from a NumPy array or a Python list
tf.convert_to_tensor(np.ones([2, 3]))
tf.convert_to_tensor(np.zeros([2, 3]))
tf.convert_to_tensor([1, 2])
tf.convert_to_tensor([1, 2.])
tf.convert_to_tensor([1, 2], dtype=tf.float32)
2.3 tf.cast: convert a tensor's data type
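For example (a short sketch; tf.cast keeps the values but changes the dtype, truncating toward zero when converting float to int):
a = tf.constant([1.8, 2.2], dtype=tf.float32)
tf.cast(a, dtype=tf.int32)
# <tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2])>
b = tf.constant([0, 1, 2])
tf.cast(b, dtype=tf.bool)
# <tf.Tensor: shape=(3,), dtype=bool, numpy=array([False,  True,  True])>
tf.cast(b, dtype=tf.float32)
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>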
2.4 tf.zeros
tf.zeros([]) # scalar, a single value
tf.zeros([1])
tf.zeros([2, 2])
tf.zeros([2, 3, 3])
2.5 tf.zeros_like
a = tf.zeros([3, 4])
tf.zeros_like(a)
tf.zeros(a.shape)
2.6 tf.ones 与 tf.ones_like
tf.ones([])
tf.ones([1])
tf.ones([2, 3])
a = tf.zeros([3, 4])
tf.ones_like(a)
2.7 tf.fill
tf.fill([2, 3], 1)
tf.fill([2, 3], 1.)
tf.fill([2, 3], 9)
2.8 tf.random.normal、tf.random.truncated_normal
tf.random.normal and tf.random.truncated_normal are both TensorFlow functions for generating normally distributed random numbers.
tf.random.normal([2, 2], mean=10, stddev=1)
tf.random.normal([2, 2])
tf.random.truncated_normal([2, 2], mean=1, stddev=1)
tf.random.truncated_normal differs from tf.random.normal in that any sample falling more than two standard deviations from the mean is discarded and re-drawn, so the generated values never stray far from the mean and extreme values do not appear.
tf.random.truncated_normal: Outputs random values from a truncated normal distribution. The values are drawn from a normal distribution with the specified mean and standard deviation, discarding and re-drawing any samples that are more than two standard deviations from the mean.
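A quick sanity check of the truncation (a sketch; the sampled values change every run, but the minimum and maximum should stay within mean ± 2 * stddev):
t = tf.random.truncated_normal([10000], mean=0., stddev=1.)
tf.reduce_min(t), tf.reduce_max(t)
# both values lie inside (-2, 2), since samples beyond two standard deviations are re-drawn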
2.9 tf.random.uniform
tf.random.uniform([2, 2], minval=0, maxval=10)
a = tf.random.normal([10, 28])
b = tf.random.uniform([10], maxval=10, dtype=tf.int32)
# b
# <tf.Tensor: shape=(10,), dtype=int32, numpy=array([9, 0, 7, 4, 3, 1, 9, 3, 7, 8])>
idx = tf.range(10)
idx = tf.random.shuffle(idx)
# idx
# <tf.Tensor: shape=(10,), dtype=int32, numpy=array([7, 1, 3, 4, 8, 5, 2, 9, 0, 6])>
a = tf.gather(a, idx)
b = tf.gather(b, idx)
# b
# <tf.Tensor: shape=(10,), dtype=int32, numpy=array([3, 0, 4, 3, 7, 1, 7, 8, 9, 9])>
2.10 Typical dims and data (summary of common use cases; example tensors below)
Dim | Typical data
0 (scalar []) | loss or accuracy
1 (vector) | bias [d]
2 (matrix) | weight [input_dim, output_dim]
3 | sentence [b, seq_len, word_dim]
4 | image [b, h, w, c]
5 | meta-learning [task_b, b, h, w, c]
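For reference, a sketch that creates one tensor per row of the table (the concrete sizes such as batch size 4 or image size 28 are made-up example numbers):
loss = tf.constant(0.83)                      # dim 0: scalar, e.g. a loss value
bias = tf.zeros([256])                        # dim 1: bias [d]
weight = tf.random.normal([784, 256])         # dim 2: weight [input_dim, output_dim]
sentences = tf.random.normal([4, 80, 100])    # dim 3: [b, seq_len, word_dim]
images = tf.random.normal([4, 28, 28, 3])     # dim 4: [b, h, w, c]
tasks = tf.random.normal([5, 4, 28, 28, 3])   # dim 5: [task_b, b, h, w, c]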
3. Indexing and Slicing
3.1 [idx][idx][idx]
a = tf.constant(range(24), shape=[2, 3, 4])
a
a[0]
a[0][1]
a[0][0][3]
3.2 [idx, idx, idx, ...]
a = tf.constant(range(24), shape=[2, 3, 4])
a
a[0, 1]
a[0, 0, 3]
3.3 Single colon & double colon slicing
start:end
a = tf.range(10)
a
# <tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>
a[:4]
# <tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3])>
a[4:8]
# <tf.Tensor: shape=(4,), dtype=int32, numpy=array([4, 5, 6, 7])>
a[4:]
# <tf.Tensor: shape=(6,), dtype=int32, numpy=array([4, 5, 6, 7, 8, 9])>
a[-2:]
# <tf.Tensor: shape=(2,), dtype=int32, numpy=array([8, 9])>
a[:-4]
# <tf.Tensor: shape=(6,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5])>
a = tf.ones([4, 28, 28, 3])
a[:, :14, 14:, :].shape
# TensorShape([4, 14, 14, 3])
a[:, 14:, 14:, :].shape
# TensorShape([4, 14, 14, 3])
a = tf.constant(range(240), shape=[4, 10, 3, 2])
a[0, 1, :, :].shape
# TensorShape([3, 2])
a[:, :, :, 1].shape
# TensorShape([4, 10, 3])
a[:, 3, :, 1].shape
# TensorShape([4, 3])
a[0, 3:10, :, :].shape
# TensorShape([7, 3, 2])
start:end:step
a = tf.ones([4, 28, 28, 3])
a[:, 0:28:2, 0:28:2, :].shape
# TensorShape([4, 14, 14, 3])
a[:, ::2, ::2, :].shape
# TensorShape([4, 14, 14, 3])
Reverse order ::-1
a = tf.range(15)
a
# <tf.Tensor: shape=(15,), dtype=int32, numpy=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])>
a[::-2]
# <tf.Tensor: shape=(8,), dtype=int32, numpy=array([14, 12, 10, 8, 6, 4, 2, 0])>
a[8:2:-2]
# <tf.Tensor: shape=(3,), dtype=int32, numpy=array([8, 6, 4])>
...
a = tf.ones([2, 4, 28, 28, 3])
a[0, ...].shape
# TensorShape([4, 28, 28, 3])
a[..., 1].shape
# TensorShape([2, 4, 28, 28])
a[0, ..., 1].shape
# TensorShape([4, 28, 28])
3.4 selective indexing
tf.gather
tf.gather collects, along one specified axis of the input tensor, the elements selected by the given indices and returns them as a new tensor. Its main parameters are:
- params: the source tensor
- axis: the axis to gather along
- indices: the indices of the elements to collect
a = tf.random.truncated_normal([4, 35, 8], mean=80, stddev=10)
tf.gather(a, axis=0, indices=[2, 3]).shape
# TensorShape([2, 35, 8])
tf.gather(a, axis=0, indices=[3, 0, 2, 1]).shape
# TensorShape([4, 35, 8])
tf.gather(a, axis=1, indices=[3, 5, 2, 6, 7, 0]).shape
# TensorShape([4, 6, 8])
tf.gather(a, axis=2, indices=[2, 3, 5]).shape
# TensorShape([4, 35, 3])
aa = tf.gather(a, axis=1, indices=[2, 4, 6])
aaa = tf.gather(aa, axis=2, indices=[1])
aaa.shape
# TensorShape([4, 3, 1])
tf.gather_nd
tf.gather_nd collects elements (or slices) from the input tensor according to multi-dimensional indices and returns them as a new tensor. Its main parameters are:
- params: the source tensor
- indices: the (possibly multi-dimensional) indices of the elements to collect
a = tf.random.truncated_normal([4, 35, 8], mean=80, stddev=10)
tf.gather_nd(a, [[0, 1, 1], [0, 2, 2], [0, 3, 3]]).shape
# TensorShape([3])
tf.gather_nd(a, [[0, 0], [1, 1], [2, 2]]).shape
# TensorShape([3, 8])
tf.boolean_mask
a = tf.random.truncated_normal([4, 2, 3], mean=80, stddev=10)
tf.boolean_mask(a, mask=[True, True, False, False]).shape
# TensorShape([2, 2, 3])
tf.boolean_mask(a, mask=[True, False], axis=1).shape
# TensorShape([4, 1, 3])
a = tf.constant(range(24), shape=[4, 2, 3])
a
tf.boolean_mask(a, mask=[[True, True], [True, False], [False, False], [False, True]])
4. Dimension Transformations
4.1 tf.reshape (View)
a = tf.random.normal([4, 28, 28, 3])
a.shape # TensorShape([4, 28, 28, 3])
a.ndim # 4
tf.reshape(a, [4, 784, 3]).shape
# TensorShape([4, 784, 3])
tf.reshape(a, [4, -1, 3]).shape
# TensorShape([4, 784, 3])
tf.reshape(a, [4, 2, 14, 28, 3]).shape
# TensorShape([4, 2, 14, 28, 3])
tf.reshape(a, [4, -1]).shape
# TensorShape([4, 2352])
tf.reshape(tf.reshape(a, [4, -1]), [4, 14, 56, 3]).shape
# TensorShape([4, 14, 56, 3])
4.2 tf.transpose (Content)
a = tf.random.normal([1, 2, 3, 4])
a.shape
# TensorShape([1, 2, 3, 4])
tf.transpose(a).shape
# TensorShape([4, 3, 2, 1])
tf.transpose(a, perm=[0, 2, 1, 3]).shape
# TensorShape([1, 3, 2, 4])
# b h w c
a = tf.random.normal([4, 28, 28, 3])
a.shape
# TensorShape([4, 28, 28, 3])
# b c h w
tf.transpose(a, [0, 3, 1, 2]).shape
# TensorShape([4, 3, 28, 28])
4.3 expand/squeeze dims
tf.expand_dims
When axis is non-negative, the new dimension is inserted before that position; when axis is negative, it is counted from the end of the resulting shape. For example, axis=0 inserts a dimension before the current axis 0; axis=3 inserts a dimension between the current axis 2 and axis 3; axis=-1 appends a dimension at the end.
a = tf.random.normal([4, 35, 8])
tf.expand_dims(a, axis=0).shape
# TensorShape([1, 4, 35, 8])
tf.expand_dims(a, axis=1).shape
# TensorShape([4, 1, 35, 8])
tf.expand_dims(a, axis=3).shape
# TensorShape([4, 35, 8, 1])
tf.expand_dims(a, axis=-1).shape
# TensorShape([4, 35, 8, 1])
tf.expand_dims(a, axis=-4).shape
# TensorShape([1, 4, 35, 8])
tf.squeeze (only dimensions of size 1 can be squeezed)
a = tf.zeros([1, 2, 1, 3])
tf.squeeze(a).shape
# TensorShape([2, 3])
tf.squeeze(a, axis=0).shape
# TensorShape([2, 1, 3])
tf.squeeze(a, axis=2).shape
# TensorShape([1, 2, 3])
tf.squeeze(a, axis=-2).shape
# TensorShape([1, 2, 3])
tf.squeeze(a, axis=-4).shape
# TensorShape([2, 1, 3])
4.4 broadcast
TensorFlow broadcasting is a mechanism that, without explicitly copying data, automatically expands tensor shapes so that element-wise operations can be applied to tensors of different shapes. Conceptually it works as follows:
1. Expand dims: if the two tensors have different numbers of dimensions, the one with fewer dimensions gets new dimensions until the ranks match. This step corresponds to tf.expand_dims.
2. Replicate: where the shapes still differ, any dimension of size 1 in the smaller tensor is (conceptually) replicated to the matching size. This step corresponds to tf.broadcast_to.
3. Compute: the element-wise operation is applied to the aligned tensors.
Summary: align the shapes from the right -> when the number of dims differs, insert new leading dimensions of size 1 -> expand every size-1 dimension to the target size (see the worked sketch below).
Broadcasting is a runtime optimization: when two tensors in an operation do not have identical shapes but the shapes are compatible in every dimension, TensorFlow automatically expands them to a common shape before computing.
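A sketch of the same alignment done by hand, assuming a [32, 3] tensor plus a [3] tensor: the smaller shape is right-aligned, padded to [1, 3], then expanded to [32, 3].
x = tf.random.normal([32, 3])
y = tf.random.normal([3])
r1 = x + y                            # automatic broadcasting
y1 = tf.expand_dims(y, axis=0)        # step 1: [3] -> [1, 3]
y2 = tf.broadcast_to(y1, [32, 3])     # step 2: [1, 3] -> [32, 3]
r2 = x + y2                           # step 3: element-wise op on aligned shapes
tf.reduce_all(tf.equal(r1, r2))
# <tf.Tensor: shape=(), dtype=bool, numpy=True>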
x = tf.random.normal([4, 32, 32, 3])
(x + tf.random.normal([3])).shape
# TensorShape([4, 32, 32, 3])
(x + tf.random.normal([32, 1])).shape
# TensorShape([4, 32, 32, 3])
(x + tf.random.normal([4, 1, 1, 1])).shape
# TensorShape([4, 32, 32, 3])
# (x + tf.random.normal([1, 4, 1, 1])).shape
# InvalidArgumentError: Incompatible shapes: [4,32,32,3] vs. [1,4,1,1] [Op:AddV2]
Explicitly calling tf.broadcast_to
x = tf.random.normal([4, 32, 32, 3])
b = tf.broadcast_to(tf.random.normal([4, 1, 1, 1]), x.shape)
b.shape
# TensorShape([4, 32, 32, 3])
(x+b).shape
# TensorShape([4, 32, 32, 3])
tf.tile (allocates memory for the copies)
a = tf.ones([3, 4])
# [3, 4] -> [2, 3, 4]
# way1 tf.broadcast_to
a1 = tf.broadcast_to(a, [2, 3, 4])
a1.shape
# TensorShape([2, 3, 4])
# way2 tf.expand_dims + tf.tile
a2 = tf.expand_dims(a, axis=0)
a2 = tf.tile(a2, [2, 1, 1])
a2.shape
# TensorShape([2, 3, 4])
5. Advanced Operations
5.1 Math operations
+ - * / | element-wise
// % | element-wise
**, tf.pow, tf.square, tf.sqrt | element-wise
tf.exp, tf.math.log | element-wise
@, tf.matmul (matrix multiplication) | matrix-wise
tf.reduce_mean, tf.reduce_max, tf.reduce_min, tf.reduce_sum | dim-wise
+ - * / // %
a = tf.ones([2, 2])
b = tf.fill([2, 2], 2.)
a + b, a - b, a * b, a / b
a // b, a % b
tf.math.log(a)
tf.exp(a)
tf.exp, tf.math.log
# log base 2 of 8
tf.math.log(8.) / tf.math.log(2.)
# <tf.Tensor: shape=(), dtype=float32, numpy=3.0>
# log base 10 of 100
tf.math.log(100.) / tf.math.log(10.)
# <tf.Tensor: shape=(), dtype=float32, numpy=2.0>
**, tf.pow, tf.square, tf.sqrt
b = tf.fill([2, 2], 2.)
tf.pow(b, 3)
tf.square(b)
b ** 3
tf.sqrt(b)
@, matmul
a = tf.ones([2, 2])
b = tf.fill([2, 2], 2.)
a@b
tf.matmul(a, b)
a = tf.ones([4, 2, 3])
b = tf.fill([4, 3, 5], 2.)
(a@b).shape
# TensorShape([4, 2, 5])
tf.matmul(a, b).shape
# TensorShape([4, 2, 5])
c = tf.fill([3, 5], 2.)
(a@c).shape
# TensorShape([4, 2, 5])
5.2 Statistics
tf.reduce_min, tf.reduce_max, tf.reduce_mean, tf.reduce_sum
a = tf.random.uniform([3, 5], minval=1, maxval=10, dtype=tf.int32)
a
tf.reduce_min(a), tf.reduce_max(a), tf.reduce_mean(a), tf.reduce_sum(a)
tf.reduce_min(a, axis=0)
tf.reduce_min(a, axis=1)
tf.norm (the ord parameter defaults to 2, i.e. the L2 norm; ord=1 gives the L1 norm)
a = tf.random.normal([3,5])
# L2 norm of the matrix
tf.norm(a)
tf.sqrt(tf.reduce_sum(tf.square(a)))
# vector L2 norm (per column, along axis=0)
tf.norm(a, ord=2, axis=0)
# L1 norm of the matrix
tf.norm(a, ord=1)
tf.norm(a, ord=1, axis=0)
tf.norm(a, ord=1, axis=1)
# a higher-dimensional tensor
b = tf.random.normal([4, 35, 8])
tf.norm(b)
tf.sqrt(tf.reduce_sum(tf.square(b)))
tf.argmax, tf.argmin
a = tf.random.uniform([4, 10], minval=1, maxval=10, dtype=tf.int32)
a
tf.argmax(a)
tf.argmin(a)
tf.argmin(a, axis=1)
tf.equal
a = tf.constant([1, 2, 3, 1, 5], dtype=tf.float32)
b = tf.ones(a.shape)
b.dtype
tf.equal(a, b)
# count the equal entries, i.e. the number of True values
tf.reduce_sum(tf.cast(tf.equal(a, b), dtype=tf.int32))
Application: classification, computing prediction accuracy
out = tf.constant([
[0.1, 0.2, 0.7],
[0.9, 0.05, 0.05],
[0.8, 0.1, 0.1],
[0.3, 0.6, 0.1]
])
pred = tf.cast(tf.argmax(out, axis=1), dtype=tf.int32)
y = tf.constant([2, 0, 1, 1], dtype=tf.int32)
bool_res = tf.equal(y, pred)
correct = tf.reduce_sum(tf.cast(bool_res, dtype=tf.int32))
accuracy = correct / y.shape[0]
print(accuracy.numpy())
# 0.75
tf.unique
a = tf.constant([1, 2, 3, 1, 5])
tf.unique(a)
tf.unique(a)[0]
tf.unique(a)[1]
tf.gather(*tf.unique(a))
5.3 Padding and Tiling
tf.pad
a = tf.reshape(tf.range(9), [3,3])
tf.pad(a, [[0, 0], [1, 0]])
image padding
a = tf.random.normal([4, 28, 28, 3])
b = tf.pad(a, [[0, 0], [2, 2], [2, 2], [0, 0]])
b.shape
# TensorShape([4, 32, 32, 3])
tf.tile (repeat n times along each axis)
a = tf.reshape(tf.range(9), [3, 3])
tf.tile(a, [1, 2])
tile vs. broadcast_to
a = tf.reshape(tf.range(9), [3, 3])
aa = tf.expand_dims(a, axis=0)
aa = tf.tile(aa, [2, 1, 1])
aa
bb = tf.broadcast_to(a, [2, 3, 3])
bb
5.4 Concatenation and Splitting
tf.concat
a1 = tf.ones([4, 35, 8])
b1 = tf.ones([2, 35, 8])
c1 = tf.concat([a1, b1], axis=0)
c1.shape
# TensorShape([6, 35, 8])
a2 = tf.ones([4, 35, 8])
b2 = tf.ones([4, 3, 8])
c2 = tf.concat([a2, b2], axis=1)
c2.shape
# TensorShape([4, 38, 8])
tf.concat requires the shapes of all dimensions other than the concat axis to match; otherwise it raises an error.
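A minimal sketch of the failure case (the exact exception type and message may vary across TF versions):
a3 = tf.ones([4, 35, 8])
b3 = tf.ones([4, 3, 7])               # last dim 7 != 8; only the concat axis may differ
try:
    tf.concat([a3, b3], axis=1)
except (tf.errors.InvalidArgumentError, ValueError) as e:
    print('concat failed:', type(e).__name__)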
tf.stack (creates a new dim). Note: stack requires the input shapes to match in every dimension.
a1 = tf.ones([4, 35, 8])
b1 = tf.ones([4, 35, 8])
c1 = tf.stack([a1, b1], axis=0)
c1.shape
# TensorShape([2, 4, 35, 8])
a2 = tf.ones([4, 35, 8])
b2 = tf.ones([4, 35, 8])
c2 = tf.stack([a2, b2], axis=3)
c2.shape
# TensorShape([4, 35, 8, 2])
stack raises an error if the shapes do not match in every dimension.
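Again a minimal sketch of the failure case (exception type may vary):
a3 = tf.ones([4, 35, 8])
b3 = tf.ones([4, 35, 7])              # shapes differ in the last dimension
try:
    tf.stack([a3, b3], axis=0)
except (tf.errors.InvalidArgumentError, ValueError) as e:
    print('stack failed:', type(e).__name__)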
tf.unstack (splits the tensor into as many pieces as the size of the given axis; that axis disappears)
a = tf.ones([4, 35, 8])
b = tf.ones([4, 35, 8])
c = tf.stack([a, b], axis=0)
c.shape
# TensorShape([2, 4, 35, 8])
res = tf.unstack(c, axis=0)
print(type(res), len(res))
# <class 'list'> 2
res = tf.unstack(c, axis=3)
print(type(res), len(res))
# <class 'list'> 8
res[0].shape
# TensorShape([2, 4, 35])
res[6].shape
# TensorShape([2, 4, 35])
tf.split
a = tf.ones([4, 35, 8])
res = tf.split(a, axis=2, num_or_size_splits=2)
print([item.shape for item in res])
# [TensorShape([4, 35, 4]), TensorShape([4, 35, 4])]
res = tf.split(a, axis=2, num_or_size_splits=[2, 2, 4])
print([item.shape for item in res])
# [TensorShape([4, 35, 2]), TensorShape([4, 35, 2]), TensorShape([4, 35, 4])]
5.5 Sorting
tf.sort, tf.argsort
1-D vector
a = tf.random.shuffle(tf.range(5))
a
tf.sort(a, direction='ASCENDING')
tf.sort(a, direction='DESCENDING')
idx = tf.argsort(a, direction='DESCENDING')
idx
tf.gather(a, idx)
2-D matrix
a = tf.random.uniform([3, 3], minval=1, maxval=10, dtype=tf.int32)
a
tf.sort(a)
tf.argsort(a)
tf.sort(a, direction='DESCENDING')
tf.math.top_k
a = tf.random.uniform([5, 5], minval=1, maxval=10, dtype=tf.int32)
a
tf.math.top_k(a, 2)
TopK Accuracy
out = tf.constant([
[0.8, 0.1, 0.1],
[0.4, 0.5, 0.1],
[0.2, 0.5, 0.3],
])
pred = tf.math.top_k(out, 2).indices # [3, 2]
y = tf.constant([0, 0, 2]) # [3]
# the last dims of y and pred differ, so transpose pred before broadcasting y
pred = tf.transpose(pred, [1, 0])
y = tf.broadcast_to(y, pred.shape)
bool_res = tf.equal(pred, y)
tf.reshape(bool_res, [-1])
tf.reduce_sum(tf.cast(tf.reshape(bool_res, [-1]), dtype=tf.int32), axis=0)
def accuracy(output, target, topk=(1,)):
    maxk = max(topk)
    n = target.shape[0]
    pred = tf.math.top_k(output, maxk).indices
    pred = tf.transpose(pred, [1, 0])             # [n, maxk] -> [maxk, n]
    target_ = tf.broadcast_to(target, pred.shape)
    correct = tf.equal(pred, target_)
    res = []
    for k in topk:
        correct_k = tf.cast(tf.reshape(correct[:k], [-1]), dtype=tf.int32)
        acc = tf.reduce_sum(correct_k) / n
        res.append(acc)
    return res
output = tf.constant([
[0.7, 0.06, 0.01, 0.15, 0.08],
[0.05, 0.5, 0.1, 0.2, 0.15],
[0.3, 0.4, 0.2, 0.05, 0.05],
])
# [3, 5]
target = tf.constant([1, 3, 2])
# [3]
accuracy(output, target, topk=(1, 2, 3, 4, 5))
# [<tf.Tensor: shape=(), dtype=float64, numpy=0.0>,
# <tf.Tensor: shape=(), dtype=float64, numpy=0.3333333333333333>,
# <tf.Tensor: shape=(), dtype=float64, numpy=0.6666666666666666>,
# <tf.Tensor: shape=(), dtype=float64, numpy=1.0>,
# <tf.Tensor: shape=(), dtype=float64, numpy=1.0>]
5.6 Clipping
tf.clip_by_value
a = tf.range(9)
tf.maximum(a, 2)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([2, 2, 2, 3, 4, 5, 6, 7, 8])>
tf.minimum(a, 5)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 5, 5, 5])>
tf.minimum(tf.maximum(a, 2), 5)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([2, 2, 2, 3, 4, 5, 5, 5, 5])>
tf.clip_by_value(a, 2, 5)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([2, 2, 2, 3, 4, 5, 5, 5, 5])>
tf.clip_by_norm
a = tf.random.normal([2, 5], mean=10)
tf.norm(a)
aa = tf.clip_by_norm(a, 15)
tf.norm(aa)
tf.clip_by_global_norm
For details, see <6. Application>.
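A minimal standalone sketch of tf.clip_by_global_norm (g1 and g2 below are made-up tensors standing in for a list of gradients): it rescales the whole list so that their combined (global) norm does not exceed the given bound, preserving the ratios between tensors.
g1 = tf.random.normal([3, 3], mean=10)
g2 = tf.random.normal([3], mean=10)
clipped, global_norm = tf.clip_by_global_norm([g1, g2], 15)
global_norm                                    # norm of the original list: sqrt(sum of squared norms)
tf.norm([tf.norm(clipped[0]), tf.norm(clipped[1])])
# <= 15 after clipping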
tf.nn.relu
a = tf.range(9)-5
a
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([-5, -4, -3, -2, -1, 0, 1, 2, 3])>
tf.nn.relu(a)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 0, 1, 2, 3])>
tf.maximum(a, 0)
# <tf.Tensor: shape=(9,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 0, 1, 2, 3])>
5.7 Miscellaneous
tf.where
tf.where(mask)
Returns the indices of the positions where mask is True.
a = tf.random.normal([3, 3])
mask = a > 0
mask
tf.where(mask)
print('=====')
tf.boolean_mask(a, mask)
tf.gather_nd(a, tf.where(mask))
tf.where(cond, x, y)
cond is a boolean tensor; x and y are tensors of the same shape. Wherever cond is True the element is taken from x, otherwise from y.
mask = tf.random.normal([3, 3]) > 0
mask
a = tf.ones([3, 3])
b = tf.zeros([3, 3])
tf.where(mask, a, b)
tf.scatter_nd
tf.scatter_nd(indices, updates, shape)
tf.scatter_nd scatters the given values into a new all-zeros tensor at the specified indices.
indices is an N x M tensor giving the positions to update, where N is the number of updates and M is the depth of each index; updates holds the N values (each possibly a tensor itself) to write; shape is a 1-D tensor describing the shape of the output tensor.
indices = tf.constant([[4], [2], [1], [7]])
updates = tf.constant([-1, -2, -3, -4])
shape = tf.constant([8])
tf.scatter_nd(indices, updates, shape)
# <tf.Tensor: shape=(8,), dtype=int32, numpy=array([ 0, -3, -2, 0, -1, 0, 0, -4])>
indices = tf.constant([[0], [2]])
updates = tf.constant([[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
[[2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2]]])
shape = tf.constant([4, 4, 4])
tf.scatter_nd(indices, updates, shape)
tf.meshgrid
x = tf.linspace(-2, 2, 5)
y = tf.linspace(-2, 2, 5)
points_x, points_y = tf.meshgrid(x, y)
points_x.shape
# TensorShape([5, 5])
tf.stack([points_x, points_y], axis=2)
Application of tf.meshgrid
import matplotlib.pyplot as plt
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
def func(points):
    # points: [..., 2]; z = sin(x) + sin(y)
    z = tf.math.sin(points[..., 0]) + tf.math.sin(points[..., 1])
    return z
x = tf.linspace(0., 3.14*2, 100)
y = tf.linspace(0., 3.14*2, 100)
points_x, points_y = tf.meshgrid(x, y)
points = tf.stack([points_x, points_y], axis=2)
z = func(points)
print(z.shape)
plt.contour(points_x, points_y, z)
plt.colorbar()
plt.show()
plt.imshow(z, origin='lower', interpolation='none')
plt.colorbar()
plt.show()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(points_x, points_y, z, cmap='rainbow')
plt.show()
6. Application
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
(x, y), _ = datasets.mnist.load_data()
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(y, dtype=tf.int32)
print(x.shape, y.shape)
print(tf.reduce_min(x), tf.reduce_max(x))
print(tf.reduce_min(y), tf.reduce_max(y))
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
train_iter = iter(train_db)
sample = next(train_iter)
print(sample[0].shape, sample[1].shape)
# [b, 784] -> [b, 256] -> [b, 128] -> [b, 10]
# w: [dim_in, dim_out] b: [dim_out]
w1 = tf.Variable(tf.random.truncated_normal([784, 256]))
# w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128]))
# w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10]))
# w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
lr = 1e-3
for epoch in range(10):
    # iterate over the dataset
    for step, (x, y) in enumerate(train_db):
        # each batch: x [b, 28, 28], y [b]; flatten the images
        x = tf.reshape(x, [-1, 28*28])
        with tf.GradientTape() as tape:
            # h1: [b, 784]@[784, 256] + [256] => [b, 256]
            h1 = tf.nn.relu(x@w1 + b1)
            # h2: [b, 256] => [b, 128]
            h2 = tf.nn.relu(h1@w2 + b2)
            # out: [b, 128] => [b, 10]
            out = h2@w3 + b3
            # compute loss
            # out: [b, 10]
            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)
            loss = tf.reduce_mean(tf.square(y_onehot - out))
        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print('===== before =====')
        # for grad in grads:
        #     print(tf.norm(grad))
        grads, _ = tf.clip_by_global_norm(grads, 15)
        # print('===== after =====')
        # for grad in grads:
        #     print(tf.norm(grad))
        # update parameters: w = w - lr * grad
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])
        if step % 100 == 0:
            print(epoch, step, 'loss:', float(loss))
Output:
If the tf.clip_by_global_norm line is commented out and the script is rerun, the printed loss soon turns into nan: the gradients explode (gradient exploding).