Learning TF2 + Keras

Most recent recommended article published 2022-09-05 19:50:11

Joey9898

Tags: keras, learning, tensorflow

Article link: https://blog.csdn.net/Joey9898/article/details/126391339

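Static and dynamic graphs

Building a static graph

TensorFlow 1.x mainly builds static graphs: you construct a graph by hand and then execute it inside a session.

Building the graph means defining every data node (the tensors that hold intermediate and final results) and every computation node (the ops) up front. Once the graph exists, a session uses those data and computation nodes to carry out the actual computation.

Only after a complete network structure (the Graph) has been defined can execution start (by creating a session), and the graph cannot be modified while it runs (no adding or removing nodes, for example). The process resembles compiling a C program: once the graph is built, the network structure cannot change during training.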

import tensorflow as tf
import numpy as np

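# This is TensorFlow 1.x code. Under TF 2.x the same calls live in tf.compat.v1
# and need tf.compat.v1.disable_eager_execution() first.

# Get the default graph. Unless a `with` statement names another graph,
# every tensor and Operation is defined in the default graph,
# which tf.get_default_graph() returns.
g = tf.get_default_graph()

# tf.matmul needs operands of rank >= 2, so x is a 2x2 matrix
x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
result = tf.matmul(x, x)

print(result)
# Without sess.run only the tensor's metadata is printed

# Create a session on graph=g so that the nodes of g are used for the computation
with tf.Session(graph=g) as sess:
    print(sess.run(result))

# Option 2 (more common): tf.Session() without graph=g also works,
# because every node created below lands in the default graph
x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
# x has been added to the default graph
# assert tf.get_default_graph() == x.graph
result = tf.matmul(x, x)

with tf.Session() as sess:
    print(sess.run(result))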
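The define-first, run-later pattern can be imitated in plain Python, which may help separate the idea from the TensorFlow API. The sketch below is only an illustration under that assumption: Node, constant, matmul and run are made-up names, not TensorFlow functions, and run stands in loosely for sess.run.

```python
# Pure-Python stand-in for "define the graph first, run it later".
# Node, constant, matmul and run are illustrative names, not TensorFlow APIs.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op            # name of the operation this node performs
        self.inputs = inputs    # upstream nodes
        self.value = value      # only set for constants


def constant(value):
    return Node("const", value=value)


def matmul(a, b):
    # Building the node only records the operation; nothing is computed yet
    return Node("matmul", inputs=(a, b))


def run(node):
    # Plays the role of sess.run: evaluate the inputs first, then the node itself
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]


x = constant([[1.0, 2.0], [3.0, 4.0]])
result = matmul(x, x)
print(result.op)      # only metadata is available before running
print(run(result))    # [[7.0, 10.0], [15.0, 22.0]]
```

Printing result before run shows only what kind of node it is, which mirrors the metadata-only print in the static-graph example.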

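Building a dynamic graph

PyTorch is the best-known dynamic-graph framework. TensorFlow 2.0 made Eager Execution its default execution mode, so TensorFlow, like PyTorch, now defaults to dynamic computation graphs instead of hand-written static ones, which makes prototyping simpler and faster. You can still opt out of eager mode and build computation graphs yourself.

In TensorFlow 1.x, however, dynamic graphs have to be switched on at the top of the script: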

import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

import tensorflow as tf
import numpy as np
import tensorflow.contrib.eager as tfe

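# Enable eager (dynamic graph) execution
tfe.enable_eager_execution()

# tf.matmul needs operands of rank >= 2, so x is a 2x2 matrix
x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
result = tf.matmul(x, x)

print("{}".format(result))

# Eager mode does not use a user-created session:
# each op runs as soon as it is called, so there is nothing to hand to sess.run.
# with tf.Session() as sess:
#     print(sess.run(result))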

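The dynamic-graph code and its output show two things.

First, result is printed right where the model is defined, and the output is its actual value. In a static graph the same print shows only a symbolic Tensor object. In other words, a dynamic-graph operation runs as soon as the Python code calls it, and the same holds for tensor assignment.

Second, tf.Session() is no longer needed: in eager mode TensorFlow executes each operation immediately, so no session has to be created by the user and none can be passed in.

That is why the result is available without sess.run(). It also means that session management from static graphs, such as closing a session, has no counterpart here. This is a limitation of dynamic graphs: the multi-session pattern is unavailable, and that pattern is how static-graph code keeps several models apart in one process (each session gets its own graph, tensors and ops are not shared across graphs, so the models cannot affect each other). If the code needs only a single session, prefer eager mode.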

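Structural operations on tensors

Many tf functions share their names and usage with numpy and torch, or come very close.

The main difference between tf and numpy or torch is:

most tf operations are called only as tf.method(tensor, params), not as tensor.method(params)

in torch, tensor.method(params) and torch.method(tensor, params) are equivalent

in numpy, array.method(params) and np.method(array, params) are equivalent

Construction

Creating an ordinary tensor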

import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

### Create an ordinary tensor
print(tf.constant([1,2,3],dtype = tf.float32))
# tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)

Creating special constant tensors

### Create special constant tensors
print(tf.range(0,10))
# tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)

print(tf.zeros(3))
# tf.Tensor([0. 0. 0.], shape=(3,), dtype=float32)

print(tf.zeros([3,3]))
# tf.Tensor(
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 0.]], shape=(3, 3), dtype=float32)

print(tf.ones(2))
# tf.Tensor([1. 1.], shape=(2,), dtype=float32)

print(tf.fill([2,2],5))
# tf.Tensor(
# [[5 5]
#  [5 5]], shape=(2, 2), dtype=int32)

# Create a boolean tensor
print(tf.constant([True,True,False],dtype=tf.bool))
# tf.Tensor([ True  True False], shape=(3,), dtype=bool)

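Creating random tensors

# Set the random seed (an integer); in TF 2.x the call is tf.random.set_seed(1)
tf.random.set_random_seed(1)

### Create random tensors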
print(tf.random.uniform([5]))
# tf.Tensor([0.16513085 0.9014813  0.6309742  0.4345461  0.29193902], shape=(5,), dtype=float32)
print(tf.random.uniform([1,5]))
# tf.Tensor([[0.51010704 0.44353175 0.4085331  0.9924923  0.68866396]], shape=(1, 5), dtype=float32)
print(tf.random.normal([2,2]))
# tf.Tensor(
# [[-0.45701224 -0.40686727]
#  [ 0.72857773 -0.8929778 ]], shape=(2, 2), dtype=float32)

Indexing

Regular indexing (single element, contiguous range/slice)

a = tf.random.uniform([5,5])
print(a)
# tf.Tensor(
# [[0.01714313 0.55956316 0.11379957 0.4944502  0.97687316]
#  [0.44926536 0.46887696 0.6345625  0.04377449 0.5565767 ]
#  [0.7070466  0.32708418 0.01742852 0.8636614  0.27090502]
#  [0.23116112 0.48299325 0.12780559 0.8452195  0.19541776]
#  [0.8800169  0.6616645  0.15237486 0.9441302  0.5447223 ]], shape=(5, 5), dtype=float32)

# Index: take the last row
print(a[-1])
# tf.Tensor([0.8800169  0.6616645  0.15237486 0.9441302  0.5447223 ], shape=(5,), dtype=float32)

# Index: take columns 0, 1 and 2
print(a[:,0:3])
# tf.Tensor(
# [[0.01714313 0.55956316 0.11379957]
#  [0.44926536 0.46887696 0.6345625 ]
#  [0.7070466  0.32708418 0.01742852]
#  [0.23116112 0.48299325 0.12780559]
#  [0.8800169  0.6616645  0.15237486]], shape=(5, 3), dtype=float32)

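Irregular indexing

`tf.gather`

"gather" means collecting elements together.

tf.gather(tensor,indices,axis = 0)

axis defaults to 0 when omitted.

`tf.gather_nd`

tf.gather_nd(tensor,indices)

Note that tf.gather_nd has no axis argument.

The main difference between tf.gather and tf.gather_nd:

tf.gather indexes along one single dimension.

Because only one dimension is involved, indices is usually a 1-D list such as [0, 3, 5] (multi-dimensional indices are also accepted, but the result is less intuitive).

It is needed because a TensorFlow tensor cannot be sliced with a list of non-contiguous indices, whereas a torch tensor can.

For a 3-D tensor, for example:

tensor[:,[0, 3, 5],:] in torch is equivalent to tf.gather(tensor,[0, 3, 5],axis = 1)

so the dimension has to be named through the axis argument (0 if omitted).

In tf.gather_nd, "nd" stands for n-dimensional: it indexes across several dimensions at once, i.e. by coordinates.

a = tf.constant([[[94, 20, 44], [13, 35, 99], [55, 13, 40]],
                 [[61, 90, 25], [28, 71, 63], [75, 13, 50]],
                 [[61, 90, 62], [87, 73, 65], [72,  0, 89]]])
print(a)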
# tf.Tensor(
# [[[94 20 44]
#   [13 35 99]
#   [55 13 40]]

#  [[61 90 25]
#   [28 71 63]
#   [75 13 50]]

#  [[61 90 62]
#   [87 73 65]
#   [72  0 89]]], shape=(3, 3, 3), dtype=int32)

# axis defaults to 0: take slices 0 and 2 along axis 0
# afterwards axis 0 has size 2
print(tf.gather(a,[0,2]))
# tf.Tensor(
# [[[94 20 44]
#   [13 35 99]
#   [55 13 40]]

#  [[61 90 62]
#   [87 73 65]
#   [72  0 89]]], shape=(2, 3, 3), dtype=int32)

# Take slices 0 and 2 along axis 1, i.e. rows 0 and 2 of every axis-0 slice
# afterwards axis 1 has size 2
print(tf.gather(a,[0,2],axis = 1))
# tf.Tensor(
# [[[94 20 44]
#   [55 13 40]]

#  [[61 90 25]
#   [75 13 50]]

#  [[61 90 62]
#   [72  0 89]]], shape=(3, 2, 3), dtype=int32)

# gather accepts multi-dimensional indices; it simply repeats the 1-D lookup
# tf.gather(a,[0,0]) and tf.gather(a,[1,1]) are taken along axis 0 and then stacked
print(tf.gather(a,[[0,0],[1,1]]))
# tf.Tensor(
# [[[[94 20 44]
#    [13 35 99]
#    [55 13 40]]

#   [[94 20 44]
#    [13 35 99]
#    [55 13 40]]]


#  [[[61 90 25]
#    [28 71 63]
#    [75 13 50]]

#   [[61 90 25]
#    [28 71 63]
#    [75 13 50]]]], shape=(2, 2, 3, 3), dtype=int32)

# gather_nd reads each row of indices as a coordinate: a[0,0] and a[1,1]
print(tf.gather_nd(a,[[0,0],[1,1]]))
# tf.Tensor(
# [[94 20 44]
#  [28 71 63]], shape=(2, 3), dtype=int32)
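What tf.gather and tf.gather_nd select can be spelled out on nested Python lists. This is a rough sketch of the selection rule only; gather_axis0, gather_axis1 and gather_nd are hypothetical helpers written for this note (not TensorFlow functions), and the data is the 3x3x3 tensor from the example above.

```python
a = [[[94, 20, 44], [13, 35, 99], [55, 13, 40]],
     [[61, 90, 25], [28, 71, 63], [75, 13, 50]],
     [[61, 90, 62], [87, 73, 65], [72, 0, 89]]]


def gather_axis0(t, indices):
    # One index per picked slice along axis 0
    return [t[i] for i in indices]


def gather_axis1(t, indices):
    # The same picks, applied inside every axis-0 slice
    return [[block[i] for i in indices] for block in t]


def gather_nd(t, coords):
    # Each entry of coords is a coordinate that walks into t level by level
    out = []
    for coord in coords:
        item = t
        for i in coord:
            item = item[i]
        out.append(item)
    return out


print(gather_axis0(a, [0, 2])[1])       # [[61, 90, 62], [87, 73, 65], [72, 0, 89]]
print(gather_axis1(a, [0, 2])[0])       # [[94, 20, 44], [55, 13, 40]]
print(gather_nd(a, [[0, 0], [1, 1]]))   # [[94, 20, 44], [28, 71, 63]]
```

The first two helpers take a flat list of positions on one axis, while gather_nd consumes one coordinate per output item, which is the distinction drawn above.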

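`tf.boolean_mask`

tf.boolean_mask is the most flexible of the three: it can reproduce selections made with tf.gather and tf.gather_nd (as long as no index is repeated or reordered), and it also supports boolean indexing.

Reproducing tf.gather with tf.boolean_mask

a = tf.random.normal([2,3])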
print(a)
# tf.Tensor(
# [[ 0.22584529  0.41727218  0.39251724]
#  [ 0.7011393   0.05133274 -1.9534125 ]], shape=(2, 3), dtype=float32)

print(tf.boolean_mask(a, [True,False]))
print(tf.boolean_mask(a, [1,0])) # 1 and 0 can stand in for True and False; same result as above
# tf.Tensor([[0.22584529 0.41727218 0.39251724]], shape=(1, 3), dtype=float32)

print(tf.boolean_mask(a, [1,0,1], axis=1))
# tf.Tensor(
# [[ 0.22584529  0.39251724]
#  [ 0.7011393  -1.9534125 ]], shape=(2, 2), dtype=float32)

Reproducing tf.gather_nd with tf.boolean_mask

print(tf.boolean_mask(a,[[1,0,1],[0,1,0]]))
# tf.Tensor([ 0.22584529  0.39251724 0.05133274 ], shape=(3,), dtype=float32)

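The selection is more direct: positions marked 1 are taken and positions marked 0 are skipped, which is easier to read than the coordinate form of tf.gather_nd.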

tf.boolean_mask also provides boolean indexing

a = tf.constant([[-1,1,-1],[2,2,-2],[3,-3,3]],dtype=tf.float32)
print(a)
# tf.Tensor(
# [[-1.  1. -1.]
#  [ 2.  2. -2.]
#  [ 3. -3.  3.]], shape=(3, 3), dtype=float32)

print(tf.boolean_mask(a, a<0))
print(a[a<0]) # equivalent to the call above; this syntactic-sugar form is recommended because it is easier to read
# tf.Tensor([-1. -1. -2. -3.], shape=(4,), dtype=float32)

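`tf.where`

tf.where can be read as the tensor version of if.

Usage:

tf.where(condition, a , b)

condition, a and b are three tensors. In TF 1.x they are expected to share one shape; TF 2.x also accepts shapes that broadcast against each other.

condition is a boolean tensor that carries the test.

It is usually written as a comparison, such as padding_mask == 0 (syntactic sugar for tf.equal(padding_mask, 0)), but a boolean tensor such as tf.constant([True,True,False],dtype=tf.bool) can be passed directly.

a and b are the two tensors that values are taken from.

For each boolean in condition, the result takes the value from a where it is True (the condition holds) and from b otherwise.

This makes tf.where a way to filter out values that fail a condition and replace them with something else.

The example below replaces every negative value in a with NaN.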

a = tf.constant([[-1,1,-1],[2,2,-2],[3,-3,3]],dtype=tf.float32)

print(a)
# tf.Tensor(
# [[-1.  1. -1.]
#  [ 2.  2. -2.]
#  [ 3. -3.  3.]], shape=(3, 3), dtype=float32)

print(tf.where(a<0,tf.fill(a.shape,np.nan),a))
# tf.Tensor(
# [[nan  1. nan]
#  [ 2.  2. nan]
#  [ 3. nan  3.]], shape=(3, 3), dtype=float32)
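The element-by-element selection performed by tf.where can be written out by hand on nested lists. The helper below is a hypothetical pure-Python stand-in, assuming three equally shaped 2-D inputs; it is not how TensorFlow implements the op.

```python
import math


def where(condition, a, b):
    # Element-wise select over equally shaped 2-D nested lists
    return [[x if c else y for c, x, y in zip(c_row, a_row, b_row)]
            for c_row, a_row, b_row in zip(condition, a, b)]


a = [[-1.0, 1.0, -1.0], [2.0, 2.0, -2.0], [3.0, -3.0, 3.0]]
negative = [[v < 0 for v in row] for row in a]    # plays the role of a < 0
nans = [[math.nan] * len(row) for row in a]       # plays the role of tf.fill(a.shape, np.nan)
replaced = where(negative, nans, a)
print(replaced)
```

Every position where negative is True receives the NaN from the first value tensor, and every other position keeps the entry of a.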

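Dimension transformations

Dimension transformations in TensorFlow come at two levels: view level and content level.

To understand them thoroughly, start from how a tensor is stored.

Memory has no notion of dimensions; data can only be written out flat, in order. A high-dimensional tensor is therefore stored as a 1-D array whose elements sit at adjacent memory addresses.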

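View-level transformations:

These leave the storage order of the underlying elements unchanged. For example, the vector tf.range(24) is stored as

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

Changing its view to [8,3] simply cuts that array into 8 pieces of 3 elements each:

[[ 0 1 2][ 3 4 5][ 6 7 8][ 9 10 11][12 13 14][15 16 17][18 19 20][21 22 23]]

When printed, TensorFlow just shows it with one more dimension:

[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]
[15 16 17]
[18 19 20]
[21 22 23]]

Changing the view to [2,3,4] cuts the array into 2 pieces first and each of those into 3, giving vectors of 4 elements each:

[[[ 0 1 2 3][ 4 5 6 7][ 8 9 10 11]][[12 13 14 15][16 17 18 19][20 21 22 23]]]

Again TensorFlow just prints it with more dimensions:

[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]

So whether the shape goes from [24] to [2,3,4] or from [8,3] to [2,3,4], the result is a view obtained by cutting up the same 1-D [24] array; the underlying storage stays one-dimensional and never changes.

View-level transformations are therefore very fast (nothing happens to the underlying data; the same 1-D tensor is just looked at differently), and they are reversible (reversible here means that flattening back to 1-D gives the original tensor unchanged).

Content-level transformations:

These change how the data is laid out, i.e. the order of the underlying elements changes, and they are not reversible (not reversible here means that flattening back to 1-D does not give the original tensor; it does not refer to transposing twice, since applying the same two-dimension swap twice does restore the original tensor).

The following example shows the difference between view-level and content-level transformations: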

a = tf.reshape(tf.range(24),[2,3,4])
print(tf.reshape(a, [-1]))
# tf.Tensor([ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23], shape=(24,), dtype=int32)

# Reshape that swaps the sizes of dims 0 and 1; the underlying data is unchanged, so it is reversible
b = tf.reshape(a, [3,2,4])
print(tf.reshape(b, [-1]))
# tf.Tensor([ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23], shape=(24,), dtype=int32)

# Transpose of dims 0 and 1; the underlying data is reordered, so it is not reversible
c = tf.transpose(a,[1,0,2])
print(tf.reshape(c, [-1]))
# tf.Tensor([ 0  1  2  3 12 13 14 15  4  5  6  7 16 17 18 19  8  9 10 11 20 21 22 23], shape=(24,), dtype=int32)
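The view/content distinction can also be traced on a plain Python list acting as the flat buffer. The sketch assumes row-major storage, as described above; element is a hypothetical helper that turns a multi-dimensional index into a buffer offset.

```python
flat = list(range(24))          # the 1-D buffer behind a [2, 3, 4] tensor


def element(buffer, shape, index):
    # Row-major lookup: the offset comes from the shape, the buffer is untouched
    offset = 0
    for size, i in zip(shape, index):
        offset = offset * size + i
    return buffer[offset]


# View level: the same buffer read as [2, 3, 4] or as [3, 2, 4]
print(element(flat, [2, 3, 4], (1, 0, 0)))   # 12
print(element(flat, [3, 2, 4], (1, 0, 0)))   # 8

# Content level: a transpose with perm=[1, 0, 2] writes a new buffer in a new order
transposed = [element(flat, [2, 3, 4], (i, j, k))
              for j in range(3) for i in range(2) for k in range(4)]
print(transposed)
# [0, 1, 2, 3, 12, 13, 14, 15, 4, 5, 6, 7, 16, 17, 18, 19, 8, 9, 10, 11, 20, 21, 22, 23]
```

Reshaping only changes the shape passed to element, while the transpose has to produce a differently ordered buffer, which matches the flattened output of c in the example.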

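The main functions for dimension transformations are tf.reshape, tf.squeeze, tf.expand_dims and tf.transpose.

tf.reshape changes the shape of a tensor.

tf.squeeze removes dimensions.

tf.expand_dims adds a dimension.

tf.transpose permutes dimensions.

tf.reshape, tf.squeeze and tf.expand_dims are view-level transformations.

tf.transpose is a content-level transformation.

`tf.reshape`

It works as in numpy and torch, except that tensor.reshape() is not supported; use tf.reshape(tensor, shape).

As in numpy and torch, -1 tells TensorFlow to infer that dimension from the size of the original tensor.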

a = tf.range(1,7)

print(a)
# tf.Tensor([1 2 3 4 5 6], shape=(6,), dtype=int32)

b = tf.reshape(a,[3,2])
print(b)
# tf.Tensor(
# [[1 2]
#  [3 4]
#  [5 6]], shape=(3, 2), dtype=int32)

c = tf.reshape(b,[2,-1])
print(c)
# tf.Tensor(
# [[1 2 3]
#  [4 5 6]], shape=(2, 3), dtype=int32)

d = tf.reshape(c, [-1])
print(d)
# tf.Tensor([1 2 3 4 5 6], shape=(6,), dtype=int32)
# Note: the shape argument is expected to be a 1-D tensor, so write [-1] rather than a bare -1

`tf.squeeze`

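If a tensor has size 1 along some dimension, tf.squeeze can remove that dimension.

Usage matches torch.squeeze().

It deletes dimensions of size 1 from the shape, either the ones you name or all of them.

tf.squeeze(tensor, axis = [])

If axis is omitted, every dimension of size 1 is removed.

Note that naming a dimension whose size is not 1 raises an error immediately, because that dimension cannot be squeezed.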

a = tf.ones([1,3,2,1,1])

print(a.shape)
# (1, 3, 2, 1, 1)

# No axis given: remove every dimension of size 1
print(tf.squeeze(a).shape)
# (3, 2)

# Remove the second-to-last dimension
print(tf.squeeze(a,axis = -2).shape)
# (1, 3, 2, 1)

# Remove the first and the last dimension
print(tf.squeeze(a,axis = [0,-1]).shape)
# (3, 2, 1)

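`tf.expand_dims` or `[:, tf.newaxis]`

tf.expand_dims is the inverse of tf.squeeze, except that it adds only one dimension per call: axis must be a single int.

Usage matches torch.unsqueeze().

tf.expand_dims(tensor, axis)

The new dimension is added at position axis, so after the call the dimension at index axis has size 1.

Equivalently, a dimension is inserted in front of the given one.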

a = tf.ones([4,3,2])

print(a.shape)
# (4, 3, 2)

# Add a dimension at axis 0; axis 0 now has size 1
print(tf.expand_dims(a,0).shape)
# (1, 4, 3, 2)

# Add a dimension at axis 3; axis 3 now has size 1
print(tf.expand_dims(a,3).shape)
# (4, 3, 2, 1)

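Another, more readable way to add a dimension is to place tf.newaxis at the desired position.

Besides being easier to read, it can add several dimensions at once.

a = tf.ones([4,3,2])

# Add a dimension at axis 0; axis 0 now has size 1
b = a[tf.newaxis,:,:,:]
print(b.shape)
# (1, 4, 3, 2)

# Add dimensions at axes 1 and 3; both now have size 1
c = a[:,tf.newaxis,:,tf.newaxis,:]
print(c.shape)
# (4, 1, 3, 1, 2)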
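Because squeezing and expanding only touch the shape, their rules can be checked on shape lists alone. The two functions below are hypothetical helpers sketched for this note; they mimic the shape arithmetic of tf.squeeze and tf.expand_dims described above, under the assumption that negative axes count from the end.

```python
def squeeze_shape(shape, axis=None):
    shape = list(shape)
    if axis is None:
        # No axis given: drop every dimension of size 1
        return [d for d in shape if d != 1]
    axes = [axis] if isinstance(axis, int) else list(axis)
    axes = {ax % len(shape) for ax in axes}      # normalise negative axes
    for ax in axes:
        if shape[ax] != 1:
            raise ValueError(f"cannot squeeze axis {ax} of size {shape[ax]}")
    return [d for i, d in enumerate(shape) if i not in axes]


def expand_dims_shape(shape, axis):
    shape = list(shape)
    if axis < 0:
        axis += len(shape) + 1                   # -1 appends a trailing dimension
    return shape[:axis] + [1] + shape[axis:]


print(squeeze_shape([1, 3, 2, 1, 1]))                  # [3, 2]
print(squeeze_shape([1, 3, 2, 1, 1], axis=-2))         # [1, 3, 2, 1]
print(squeeze_shape([1, 3, 2, 1, 1], axis=[0, -1]))    # [3, 2, 1]
print(expand_dims_shape([4, 3, 2], 0))                 # [1, 4, 3, 2]
print(expand_dims_shape([4, 3, 2], 3))                 # [4, 3, 2, 1]
```

The printed shapes agree with the tf.squeeze and tf.expand_dims examples in this section.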

`tf.transpose`

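tf.transpose permutes the dimensions of a tensor. Unlike tf.reshape, it changes the storage order of the elements.

tf.transpose(a,perm)

perm gives the order of the original dimension indices after the transpose, and it must list every dimension.

For input x and output y, the shapes satisfy:

y.shape[i] == x.shape[perm[i]] for i in [0, 1, ..., rank(x)-1]

tf.transpose is often used to convert between image storage layouts.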

# Batch,Height,Width,Channel
a = tf.random.uniform(shape=[100,600,600,4],minval=0,maxval=255,dtype=tf.int32)
tf.print(a.shape)
# TensorShape([100, 600, 600, 4])

# Swap the Batch and Channel dimensions
# giving Channel,Height,Width,Batch
s = tf.transpose(a,perm=[3,1,2,0])
tf.print(s.shape)
# TensorShape([4, 600, 600, 100])

On a 2-D tensor tf.transpose is easy to understand: it is the matrix transpose

a = tf.reshape(tf.range(12),[3,4])
print(a)
# tf.Tensor(
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]], shape=(3, 4), dtype=int32)

print(tf.transpose(a)) 
# equivalent to tf.transpose(a,[1,0])
# tf.Tensor(
# [[ 0  4  8]
#  [ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]], shape=(4, 3), dtype=int32)

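Transposing higher-dimensional tensors is more abstract.

In practice, a transpose of two adjacent dimensions is the easiest to interpret; swapping two non-adjacent dimensions is much harder to picture.

Transposing the last two dimensions of a higher-dimensional tensor is also easy to follow:

keep the split along the leading dimensions as it is, treat the last two dimensions as matrices, and transpose every matrix.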

a = tf.reshape(tf.range(24),[2,3,4])
print(a)
# tf.Tensor(
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]

#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]], shape=(2, 3, 4), dtype=int32)


print(tf.transpose(a, [0,2,1]))
# tf.Tensor(
# [[[ 0  4  8]
#   [ 1  5  9]
#   [ 2  6 10]
#   [ 3  7 11]]

#  [[12 16 20]
#   [13 17 21]
#   [14 18 22]
#   [15 19 23]]], shape=(2, 4, 3), dtype=int32)

The hardest case to picture is a transpose of two dimensions that are not the last two

a = tf.reshape(tf.range(24),[2,3,4])
print(a)
# tf.Tensor(
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]

#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]], shape=(2, 3, 4), dtype=int32)

# Transpose the first two dimensions
print(tf.transpose(a, [1,0,2]))
# tf.Tensor(
# [[[ 0  1  2  3]
#   [12 13 14 15]]

#  [[ 4  5  6  7]
#   [16 17 18 19]]

#  [[ 8  9 10 11]
#   [20 21 22 23]]], shape=(3, 2, 4), dtype=int32)

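How should this be read?

Since the last dimension does not change, each 1-D vector along the last dimension can be packed and treated as a single item.

The original [2,3,4] tensor

[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]

is packed as:

A = [ 0 1 2 3], B = [ 4 5 6 7], C = [ 8 9 10 11], D = [12 13 14 15], E = [16 17 18 19], F = [20 21 22 23]

After packing, it can be read as a [2,3] matrix whose elements are vectors:

[[A, B, C],
[D, E, F]]

Transposing the [2,3] matrix gives a [3,2] matrix:

[[A, D],
[B, E],
[C, F]]

Unpacking the vectors again gives

[[[ 0 1 2 3]
[12 13 14 15]]

[[ 4 5 6 7]
[16 17 18 19]]

[[ 8 9 10 11]
[20 21 22 23]]]

So the trick for transposing two adjacent dimensions that are not the last two is:

pack all dimensions after the two being transposed into one item. The two dimensions then become the last two, and the ordinary matrix transpose applies.

For example, transposing the middle two dimensions of [bs, seq_len, n_head, n_dim] with perm [0, 2, 1, 3] to get [bs, n_head, seq_len, n_dim] works like this:

the bs dimension does not change and can be ignored; pack each n_dim vector into one item (each token's embedding is treated as a whole), then turn [seq_len, n_head] into [n_head, seq_len].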
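The packing trick maps directly onto nested Python lists, where each innermost list already behaves like one packed item. The snippet is a small sketch of that reading; the strings in the second part are made-up placeholders that stand for packed n_dim vectors.

```python
# Innermost lists stand for the packed last-dimension vectors A..F
a = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
     [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]

# zip(*a) transposes the 2x3 matrix of packed items into 3x2 without opening them
transposed = [list(column) for column in zip(*a)]
print(transposed[0])   # [[0, 1, 2, 3], [12, 13, 14, 15]]

# Same idea for [bs, seq_len, n_head, n_dim] -> [bs, n_head, seq_len, n_dim]:
# here bs = 1, seq_len = 3, n_head = 2, and each string is one packed n_dim vector
batch = [[["t0h0", "t0h1"], ["t1h0", "t1h1"], ["t2h0", "t2h1"]]]
swapped = [[list(col) for col in zip(*sample)] for sample in batch]
print(swapped[0])      # [['t0h0', 't1h0', 't2h0'], ['t0h1', 't1h1', 't2h1']]
```

In both cases only the two outer levels are rearranged; the packed items travel as whole units.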

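`tf.concat` and `tf.stack`

As in numpy and torch, tf.concat and tf.stack merge several tensors, and tf.split cuts one tensor into several. They behave like their numpy and torch counterparts, so the details are not repeated here.

Note the difference between the two: tf.concat joins tensors and does not add a dimension, while tf.stack stacks them and adds one.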

a = tf.constant([[1.0,2.0],[3.0,4.0]])
b = tf.constant([[5.0,6.0],[7.0,8.0]])
c = tf.constant([[9.0,10.0],[11.0,12.0]])

# Horizontal concatenation (axis 1); concat adds no dimension. This is the most common way to join vectors
print(tf.concat([a,b,c],axis = 1))
# tf.Tensor(
# [[ 1.  2.  5.  6.  9. 10.]
#  [ 3.  4.  7.  8. 11. 12.]], shape=(2, 6), dtype=float32)

# Vertical concatenation (axis 0); concat adds no dimension
print(tf.concat([a,b,c],axis = 0))
# tf.Tensor(
# [[ 1.  2.]
#  [ 3.  4.]
#  [ 5.  6.]
#  [ 7.  8.]
#  [ 9. 10.]
#  [11. 12.]], shape=(6, 2), dtype=float32)

# Stack along axis 1; stack adds one dimension
print(tf.stack([a,b,c],axis = 1))
# tf.Tensor(
# [[[ 1.  2.]
#   [ 5.  6.]
#   [ 9. 10.]]

#  [[ 3.  4.]
#   [ 7.  8.]
#   [11. 12.]]], shape=(2, 3, 2), dtype=float32)

# Stack along axis 0; stack adds one dimension
print(tf.stack([a,b,c],axis = 0))
# tf.Tensor(
# [[[ 1.  2.]
#   [ 3.  4.]]

#  [[ 5.  6.]
#   [ 7.  8.]]

#  [[ 9. 10.]
#   [11. 12.]]], shape=(3, 2, 2), dtype=float32)

`tf.tile`

tf.tile(input, multiples, name=None)

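Builds a new tensor by tiling a given tensor. In plain words: the input tensor is copied N times along the chosen dimensions (like laying tiles) to create a new tensor.

It takes 3 arguments:
input: the input tensor
multiples: how many times the tensor appears along each dimension
name: the name of the operation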

a = tf.constant([[15, 16], [17, 18]])
b = tf.tile(a, [1, 3])
c = tf.tile(a, [3, 2])
print(a)
# tf.Tensor(
# [[15 16]
#  [17 18]], shape=(2, 2), dtype=int32)

print(b)
# tf.Tensor(
# [[15 16 15 16 15 16]
#  [17 18 17 18 17 18]], shape=(2, 6), dtype=int32)

print(c)
# tf.Tensor(
# [[15 16 15 16]
#  [17 18 17 18]
#  [15 16 15 16]
#  [17 18 17 18]
#  [15 16 15 16]
#  [17 18 17 18]], shape=(6, 4), dtype=int32)

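The input a is a 2x2 matrix. In tf.tile(a, [1, 3]), [1, 3] means the input appears once along the first dimension (no extra copies) and three times along the second. Here the first dimension is rows and the second is columns, so b becomes a 2x6 matrix.

Note: the second argument of tf.tile(), for example [1, 3], has two elements, and its length must equal the rank of the input tensor (2 here). For a 3-D input it must have 3 elements, for example [2, 3, 5]; otherwise an error like the following is raised: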

ValueError: Shape must be rank 3 but is rank 1 for 'Tile_1' (op: 'Tile') with input shapes
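For the 2-D case, the tiling rule can be reproduced with list repetition. tile_2d is a hypothetical helper written for illustration only; it assumes a rectangular nested list and exactly two multiples, mirroring the requirement that multiples has one entry per dimension.

```python
def tile_2d(matrix, multiples):
    if len(multiples) != 2:
        # One multiple per dimension, as tf.tile requires
        raise ValueError("multiples must have one entry per dimension (2 here)")
    row_reps, col_reps = multiples
    widened = [row * col_reps for row in matrix]            # repeat along axis 1
    return [list(row) for _ in range(row_reps) for row in widened]  # repeat along axis 0


a = [[15, 16], [17, 18]]
print(tile_2d(a, [1, 3]))   # [[15, 16, 15, 16, 15, 16], [17, 18, 17, 18, 17, 18]]
print(tile_2d(a, [3, 2]))
```

The second call produces six rows of four values, the same layout as c in the example.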

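Mathematical operations on tensors

Scalar operations

Mathematical operators on tensors fall into scalar operators, vector operators and matrix operators.

Addition, subtraction, multiplication, division and powers, as well as common functions such as trigonometric functions, exponentials and logarithms, and the logical comparison operators, are all scalar operators.

A scalar operator is applied to a tensor element by element.

Some scalar operators overload the usual Python operators: +, -, *, /, **, // and % are supported.

They also support numpy-style broadcasting.

Scalar operations behave as in torch and numpy, including broadcasting. Two shapes broadcast when, compared dimension by dimension from the last one backwards, each pair of sizes satisfies one of:

1. the two sizes are equal
2. the two sizes differ, but one of them is 1

Examples:

[4,3] and [4] cannot broadcast: starting from the end, 3 and 4 differ and neither is 1.

[4,1] and [4] can broadcast: starting from the end, 1 and 4 differ but one of them is 1, so the first tensor is copied to size 4 along that dimension; the second is then expanded from [4] to [4,4]. The result has shape [4,4].

[4,1,1,3] and [4,3] can broadcast: the trailing [1,3] of the first is copied to [4,3], and the second is expanded from [4,3] to [4,1,4,3]. The result has shape [4,1,4,3].
(The result is not [4,1,1,3]. Shapes are always matched from the last dimension backwards, which newcomers often get wrong.)

[4,3,3] and [4,2,3,3] cannot broadcast: starting from the end, the third pair is 4 and 2, which differ and neither is 1.
(Newcomers may expect [4,3,3] to broadcast to [4,2,3,3] automatically. It does not; first use tf.expand_dims(x, 1) to turn [4,3,3] into [4,1,3,3], which then broadcasts with [4,2,3,3].)

The broadcasting rules in full:

1. If the tensors have different ranks, the lower-rank tensor is expanded with leading dimensions (equivalent to tf.expand_dims followed by tf.tile) until both have the same rank.
2. If two tensors have the same length along a dimension, or one of them has length 1 along it, they are compatible along that dimension.
3. If two tensors are compatible along every dimension, they can be broadcast.
4. After broadcasting, the length of each dimension is the larger of the two tensors' lengths along it.
5. Along any dimension where one tensor has length 1 and the other has a length greater than 1, the first tensor behaves as if it had been copied along that dimension, equivalent to tf.tile.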
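The rules can be turned into a short shape calculator, which makes the examples above easy to verify. broadcast_shape is a hypothetical helper written for this note (it works on plain shape lists and does not call TensorFlow); it pads the shorter shape with leading 1s and then compares sizes pairwise.

```python
def broadcast_shape(shape_a, shape_b):
    a, b = list(shape_a), list(shape_b)
    # Rule 1: left-pad the shorter shape with 1s until the ranks match
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + a
    b = [1] * (rank - len(b)) + b
    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: a pair is compatible if the sizes are equal or one is 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {list(shape_a)} with {list(shape_b)}")
        # Rules 4 and 5: the size-1 side is stretched to the other size
        result.append(db if da == 1 else da)
    return result


print(broadcast_shape([4, 1], [4]))            # [4, 4]
print(broadcast_shape([4, 1, 1, 3], [4, 3]))   # [4, 1, 4, 3]
print(broadcast_shape([4, 1, 3, 3], [4, 2, 3, 3]))   # [4, 2, 3, 3]
```

Calling it with [4, 3] and [4], or with [4, 3, 3] and [4, 2, 3, 3], raises ValueError, matching the two failing cases above.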

a = tf.constant([[1,2],[3,4]])
b = tf.constant([[2,0],[0,2]])
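a*b  # equivalent to tf.multiply(a,b)

# Broadcasting example: multiply one user embedding element-wise with several item embeddings
# (summing the result over the last axis gives the user-item inner products)
user_embedding = tf.random.normal([3])
# tf.Tensor([0.53647536 0.2574643  1.6876464 ], shape=(3,), dtype=float32)
item_embedding = tf.random.normal([3,3])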
# tf.Tensor(
# [[ 0.18329209 -0.33839703 -0.9801966 ]
#  [-0.5128904   0.39472547 -1.079079  ]
#  [-1.2373055   0.9422188   0.52374583]], shape=(3, 3), dtype=float32)

print(user_embedding*item_embedding)
# tf.Tensor(
# [[ 0.09833169 -0.08712515 -1.6542252 ]
#  [-0.27515307  0.10162771 -1.8211038 ]
#  [-0.6637839   0.24258769  0.8838978 ]], shape=(3, 3), dtype=float32)


# Clip values to a range
x = tf.constant([0.9,-0.8,100.0,-20.0,0.7])
y = tf.clip_by_value(x,clip_value_min=-1,clip_value_max=1)
# [0.9 -0.8 1 -1 0.7]

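Vector operations

A vector operator works along one particular axis, mapping a vector to a scalar or to another vector.

Unlike numpy and torch, most tf vector (reduction) operators carry the prefix reduce_.

Otherwise they are used as in numpy and torch.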

a = tf.range(1,10)

print(tf.reduce_sum(a))
print(tf.reduce_mean(a))
print(tf.reduce_max(a))
print(tf.reduce_min(a))
print(tf.reduce_prod(a))
# 45
# 5
# 9
# 1
# 362880

# Reduce along a chosen dimension
b = tf.reshape(a,(3,3))
# tf.Tensor(
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]], shape=(3, 3), dtype=int32)

# Sum along the second dimension (axis=1), keeping the number of dimensions
print(tf.reduce_sum(b, axis=1, keepdims=True))
# tf.Tensor(
# [[ 6]
#  [15]
#  [24]], shape=(3, 1), dtype=int32)

# Sum along the first dimension (axis=0), keeping the number of dimensions
print(tf.reduce_sum(b, axis=0, keepdims=True))
# tf.Tensor([[12 15 18]], shape=(1, 3), dtype=int32)

Matrix operations

# Matrix multiplication
a = tf.constant([[1,2],[3,4]])
b = tf.constant([[2,0],[0,2]])
a@b  # equivalent to tf.matmul(a,b)
# for 2-D tensors, tf.matmul(a,b) is equivalent to tf.tensordot(a,b,axes=1)

# Matrix transpose
a = tf.constant([[1,2],[3,4]])
tf.transpose(a)

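Modules, layers and models

The module tf.Module, the layer tf.keras.layers.Layer and the model tf.keras.Model

They are related as follows:

tf.keras.layers.Layer inherits from tf.Module, and tf.keras.Model inherits from tf.keras.layers.Layer.
So tf.Module is the base class of both tf.keras.layers.Layer and tf.keras.Model.

tf.Module is the most basic base class in tf2. In essence a tf.Module (including its subclasses tf.keras.layers.Layer and tf.keras.Model) is a function: it takes an input X, transforms it with the network parameters (matrices) and computation logic held by the module, and returns the module's output y.

A tf.Module is not fundamentally different from an ordinary Python class. Its core feature is that an instance can be called by name (through the __call__() method), and __call__() simply wraps a function from input to output.

Put in deep-learning terms, using a tf.Module takes two steps: create an instance, then call it.

Creating the instance fixes the sizes of the matrices the transformation needs.

Calling the instance applies those matrices to turn the input into the module's output.

Creating a tf.Module instance:

simple_module =SimpleModule(in_features=3, out_features=3)

Calling the tf.Module instance (assuming the input x = tf.constant([[2.0, 2.0, 2.0]])):

x = simple_module(x)

This calls simple_module.__call__() and runs the computation logic defined there.

The input x has thus been turned into the output x.

So the real reason to inherit from tf.Module is simply that the TensorFlow team has packaged useful basic methods into it, which make it easier to save and restore a model and its parameters.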

`tf.Module`

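TensorFlow provides the base class tf.Module. Subclassing it gives the call-like behaviour described above and also makes it easy to manage variables and any other Modules the module refers to. Most importantly, the model can be saved with tf.saved_model and deployed across platforms.

Usage

A subclass of tf.Module implements two methods, __init__() and __call__().

__init__(): runs when the instance is created and creates the tf.Variable objects (usually weight matrices and biases)

__call__(): runs when the instance is called and implements the computation logic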

class SimpleModule(tf.Module):
  def __init__(self, name=None):
    super().__init__(name=name)
    # Create `tf.Variable`s in the initializer (network parameters such as weight matrices and biases go here)
    self.a_variable = tf.Variable(5.0, name="train_me")
    self.non_trainable_variable = tf.Variable(5.0, trainable=False, name="do_not_train_me")
  def __call__(self, x):
    # The computation logic goes in `__call__(self, input)`
    return self.a_variable * x + self.non_trainable_variable

simple_module = SimpleModule(name="simple")

simple_module(tf.constant(5.0)) # equivalent to simple_module.__call__(tf.constant(5.0))
# <tf.Tensor: shape=(), dtype=float32, numpy=30.0>

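__call__() lets an instance be called like a function,

that is, a() is equivalent to a.__call__().

So simple_module(tf.constant(5.0)) above is equivalent to simple_module.__call__(tf.constant(5.0)).

Subclassing tf.Module automatically collects every tf.Variable assigned to the module. module.trainable_variables and module.variables return the trainable parameters and all parameters of the Module, and module.submodules returns all the Modules it contains. This makes saving and loading variables convenient.

# All trainable variables
print("trainable variables:", simple_module.trainable_variables)
# Every variable
print("all variables:", simple_module.variables)
# All sub-Modules (an empty tuple here, since simple_module contains no other Module)
print("Submodules:", simple_module.submodules)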
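One way to picture the automatic collection is a base class that walks its own attributes. The sketch below is an assumption-laden toy, not the real tf.Module implementation: Variable, MiniModule, Scale and Wrapper are invented names, and only direct attributes are inspected (no lists or dicts of variables).

```python
class Variable:
    def __init__(self, value, trainable=True, name=None):
        self.value, self.trainable, self.name = value, trainable, name


class MiniModule:
    """Illustrative stand-in for tf.Module's attribute walking."""

    def _children(self):
        return [a for a in vars(self).values() if isinstance(a, MiniModule)]

    @property
    def submodules(self):
        found = []
        for child in self._children():
            found.append(child)
            found.extend(child.submodules)
        return tuple(found)

    @property
    def variables(self):
        # Own variables first, then those of nested modules
        found = [a for a in vars(self).values() if isinstance(a, Variable)]
        for child in self._children():
            found.extend(child.variables)
        return tuple(found)

    @property
    def trainable_variables(self):
        return tuple(v for v in self.variables if v.trainable)


class Scale(MiniModule):
    def __init__(self):
        self.a_variable = Variable(5.0, name="train_me")
        self.frozen = Variable(5.0, trainable=False, name="do_not_train_me")

    def __call__(self, x):
        return self.a_variable.value * x + self.frozen.value


class Wrapper(MiniModule):
    def __init__(self):
        self.inner = Scale()

    def __call__(self, x):
        return self.inner(x)


model = Wrapper()
print(model(5.0))                                    # 30.0
print([v.name for v in model.variables])             # ['train_me', 'do_not_train_me']
print([v.name for v in model.trainable_variables])   # ['train_me']
```

Wrapper never lists its variables explicitly; they are found through the inner module, which is the convenience the paragraph above describes.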

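Example

By referring to and managing other tf.Module instances, we can build collections of tf.Modules (like stacking building blocks).

Below is SequentialModule, a model of two linear layers built from a basic dense module.

First, a dense (linear) layer: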

class MyDense(tf.Module):
  def __init__(self, in_features, out_features, name=None):
    super().__init__(name=name)
    self.w = tf.Variable(
      tf.random.normal([in_features, out_features]), name='w')
    self.b = tf.Variable(tf.zeros([out_features]), name='b')
  def __call__(self, x):
    y = tf.matmul(x, self.w) + self.b
    return tf.nn.relu(y)

Next comes the full model, which creates and applies two layer instances:

class SequentialModule(tf.Module):
  def __init__(self, name=None):
    super().__init__(name=name)
    # Initialize the dense modules, fixing the shapes of their parameters
    self.dense_1 = MyDense(in_features=3, out_features=3)
    self.dense_2 = MyDense(in_features=3, out_features=2)

  def __call__(self, x):
    x = self.dense_1(x) # calls the __call__ method of self.dense_1
    return self.dense_2(x)

# You have made a model!
my_model = SequentialModule(name="the_model")

# Call it, with random results
print("Model results:", my_model(tf.constant([[2.0, 2.0, 2.0]])))
# Model results: tf.Tensor([[8.111373 0.      ]], shape=(1, 2), dtype=float32)

print("variables:", my_model.variables)
# variables: (<tf.Variable 'b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>, <tf.Variable 'w:0' shape=(3, 3) dtype=float32, numpy=
# array([[ 0.871796  ,  0.04100253,  1.6504226 ],
#        [-0.4237731 , -2.6332445 ,  1.8764867 ],
#        [ 0.8134965 ,  2.0158744 , -1.4425671 ]], dtype=float32)>, <tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>, <tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
# array([[-0.84247786,  0.19298597],
#        [-1.1047211 ,  0.7861384 ],
#        [-1.013045  ,  1.0758061 ]], dtype=float32)>)

print("sub modules:", my_model.submodules)
# sub modules: (<__main__.MyDense object at 0x000001961DF29F10>, <__main__.MyDense object at 0x0000019627F8C0D0>)

`tf.keras.layers.Layer`

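Usage

A subclass of tf.keras.layers.Layer implements three methods: __init__(), build() and call().

Intuitively, __init__() and build() both initialize the Layer by setting up member variables, while call() runs when the layer is called.

It differs from tf.Module in two ways:

1. The __call__() method of tf.Module becomes call() in a Layer.

The reason is that Keras layers have their own __call__(), which performs some preparatory steps (such as calling build()) and then calls call().

The two are therefore nearly equivalent and behave the same in everyday use: the computation logic that went into __call__() of a tf.Module moves unchanged into call() of the Layer.

2. There is an extra build() method. It is not meant to be called by us directly; Keras calls it to obtain input_shape dynamically and use it.

Because Keras makes the call, build() has a fixed signature: build(self, input_shape)

It accepts exactly one positional argument, input_shape; no argument or several arguments raise an error.

input_shape is obtained dynamically: on the first call(input), input_shape is input.shape.

In one sentence:

__init__() of a tf.Module has to fix every parameter the computation needs, independently of the input,

whereas __init__() of a Layer fixes only part of the arguments, and build() reads the rest from input.shape before creating the matrices.

Note: build() runs only once, when the layer is called for the first time, so a later input whose shape is incompatible with the first input raises an error.

Example

Given these differences between tf.Module and Layer, we now rewrite the dense layer as a Layer.

It can adapt to the size of its input: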

class FlexibleDense(tf.keras.layers.Layer):
  # Note the added `**kwargs`, as Keras supports many arguments
  def __init__(self, out_features, **kwargs):
    super().__init__(**kwargs)
    # Arguments that are fixed up front are set in __init__; here that is the output dimension
    self.out_features = out_features

  def build(self, input_shape):  # Create the state of the layer (weights)
    # Parameters derived from the input are created in build; here the last dimension of the input is read
    print(f"input_shape received by build: {input_shape}")
    self.w = tf.Variable(
      tf.random.normal([input_shape[-1], self.out_features]), name='w')
    self.b = tf.Variable(tf.zeros([self.out_features]), name='b')
    super().build(input_shape) # add as the last line; it effectively sets self.built = True
    # Not strictly needed, since build() is not normally called directly, but it guards against build() running again if someone does call it

  def call(self, inputs):  # Defines the computation from inputs to outputs
    return tf.matmul(inputs, self.w) + self.b

  def get_config(self):
    # get_config returns a dict of arguments; only the entries in this dict are saved with the h5 model
    # To store custom hyperparameters in the model config, override this method and add them to the dict
    config = super().get_config()
    config.update({"out_features": self.out_features})  # custom hyperparameters
    return config

# Create the instance of the layer
flexible_dense = FlexibleDense(out_features=1)
# The layer has not been built yet, so it has no variables: flexible_dense.variables is []

# Calling the layer creates variables of the right size.
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])))
# input_shape received by build: (2, 3)
# Model results: tf.Tensor(
# [[-3.4673862]
#  [-5.2010794]], shape=(2, 1), dtype=float32)

# build runs only once, so an input whose shape (input_shape[-1]) is incompatible with the layer's variables is rejected.
# Here w has shape[0] = 3 while the input has input_shape[-1] = 2, so the call fails
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0], [3.0, 3.0]])))
# Failed: Exception encountered when calling layer "flexible_dense" (type FlexibleDense).
# Matrix size-incompatible: In[0]: [1,4], In[1]: [3,3] [Op:MatMul]

# A different input_shape[0] is fine as long as input_shape[-1] stays the same
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [3.0, 3.0, 3.0], [3.0, 3.0, 3.0]])))
# Model results: tf.Tensor(
# [[6.261455]
#  [9.392181]
#  [9.392181]
#  [9.392181]], shape=(4, 1), dtype=float32)
# (values from a separate run with different random weights)
# input_shape is not printed again, which shows that self.build() was not called
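The point of overriding get_config is that a layer can later be rebuilt from the returned dict. The following pure-Python sketch imitates that round trip under simplifying assumptions: ConfigurableLayer and DenseSketch are invented stand-ins, and the base config holds only a name, whereas a real Keras layer stores more entries.

```python
class ConfigurableLayer:
    """Illustrative stand-in for the config round trip; not the Keras base class."""

    def __init__(self, name=None):
        self.name = name or type(self).__name__.lower()

    def get_config(self):
        return {"name": self.name}

    @classmethod
    def from_config(cls, config):
        # Rebuild the layer by passing the saved arguments back to __init__
        return cls(**config)


class DenseSketch(ConfigurableLayer):
    def __init__(self, out_features, **kwargs):
        super().__init__(**kwargs)
        self.out_features = out_features

    def get_config(self):
        config = super().get_config()
        config.update({"out_features": self.out_features})  # custom hyperparameter
        return config


layer = DenseSketch(out_features=3, name="dense_a")
config = layer.get_config()
clone = DenseSketch.from_config(config)
print(config)               # {'name': 'dense_a', 'out_features': 3}
print(clone.out_features)   # 3
```

If out_features were left out of the dict, from_config would fail because the constructor argument would be missing, which is why custom hyperparameters need to be added.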

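How it works

Running flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])) above really calls __call__(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])), which performs three main steps:

1. Obtain the input_shape argument for build(self, input_shape): input_shape = input.shape
2. Call self.build(input_shape) to create and initialize the parameter matrices
3. Call self.call(input) and return the result

The custom class My_Layer below is a simplified stand-in that mimics this behaviour of tf.keras.layers.Layer.

How is build guaranteed to run only once?

With a flag, self.built, that records whether self.build() has already been called.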

class My_Layer():
    def __init__(self):
        # The `self.built` flag records whether `self.build()` has run, so that it runs only once
        self.built = False
    
    def __call__(self, input):
        input_shape = input.shape
        
        # Skip building if the layer is already built
        if not self.built:
            self.build(input_shape)
            self.built = True
            
        return self.call(input)
    
    def build(self, input_shape):
        # Only record the build input shapes of overridden build methods.
        # The base build method only records the input shape and sets self.built = True
        self._build_input_shape = input_shape
        self.built = True
    
    def call(self, input):
        pass
    

class My_Flexible_Dense(My_Layer):
    def __init__(self, out_features):
        super().__init__()
        self.out_features = out_features
        
    def build(self, input_shape):
        print(f"input_shape received by build: {input_shape}")
        self.w = tf.Variable(
        tf.random.normal([input_shape[-1], self.out_features]), name='w')
        self.b = tf.Variable(tf.zeros([self.out_features]), name='b')
        super().build(input_shape)
    
    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b
    
flexible_dense = My_Flexible_Dense(out_features=1)
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])))
# Same result as above
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0], [3.0, 3.0]])))
# Raises an error
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [3.0, 3.0, 3.0]])))
# Same result as above

`tf.keras.Model`

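The Layers above can be nested to define a model as a hierarchy of Keras layers, and that is already enough to do everything.

Keras also provides a full-featured model class, tf.keras.Model. It inherits from tf.keras.layers.Layer, so a tf.keras.Model can be used, nested and saved in the same way.

tf.keras.Model adds further functionality and a powerful API on top, which makes the following easy:

View a model summary: model.summary() (it prints by itself and returns None)
Compile the model: model.compile(optimizer = .., loss = .., metrics = ..)
Train the model: model.fit(x_train,y_train, batch_size, epochs, validation_split = 0.2)
Predict: model.predict(x)
Evaluate: model.evaluate(x = x_test,y = y_test), which returns the loss between the model's predictions and y_test (the loss chosen in compile)
Save: model.save('model_name.h5')
Load: model = tf.keras.models.load_model('model_name.h5')

…

and even training on multiple machines.

It also has many other attributes, such as model.layers, which lists the layer objects in the model.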

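Usage

A subclass of tf.keras.Model implements two methods, __init__() and call().

Its computation is expressed by referring to or nesting other tf.Module or tf.keras.layers.Layer objects.

__init__() does the initialization and call() implements the computation logic.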

class MyModel(tf.keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)

  def call(self, inputs):
    x = self.dense1(inputs)
    return self.dense2(x)

model = MyModel()

Example

The SequentialModule above can be defined as a Keras model with almost the same code; again __call__() becomes call().

class MySequentialModel(tf.keras.Model):
  def __init__(self, name=None, **kwargs):
    super().__init__(**kwargs)

    self.dense_1 = FlexibleDense(out_features=3)
    self.dense_2 = FlexibleDense(out_features=2)
  def call(self, x):
    x = self.dense_1(x)
    return self.dense_2(x)

# You have made a Keras model!
my_sequential_model = MySequentialModel(name="the_model")

# Call it on a tensor, with random results
print("Model results:", my_sequential_model(tf.constant([[2.0, 2.0, 2.0]])))
# Model results: tf.Tensor([[-1.4071871 -1.8095387]], shape=(1, 2), dtype=float32)

# All tf.Module attributes remain available
my_sequential_model.variables
my_sequential_model.submodules

# Model-specific API
my_sequential_model.summary()
# Model: "my_sequential_model"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #   
# =================================================================
#  flexible_dense (FlexibleDen  multiple                 12        
#  se)                                                             
                                                                 
#  flexible_dense_1 (FlexibleD  multiple                 8         
#  ense)                                                           
                                                                 
# =================================================================
# Total params: 20
# Trainable params: 20
# Non-trainable params: 0
# _________________________________________________________________

# Model-specific attribute
print(my_sequential_model.layers)
# [<tf_Layer.FlexibleDense object at 0x000001CC59E97FA0>, <tf_Layer.FlexibleDense object at 0x000001CC66321460>]

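Note that in this setup the model must be built before model.summary() can run, i.e. model.summary() only works after data has been fed through model(x).

Until then the input shape is unknown, and Keras does not create a None placeholder for the input dimensions on its own.

Calling my_sequential_model.summary() before my_sequential_model(tf.constant([[2.0, 2.0, 2.0]])) raises: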

ValueError: This model has not yet been built. Build the model first by calling build() or by calling the model on a batch of data.

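Other forms of `tf.keras.Model`

The tf.keras.Model class above follows the ordinary Python way of building a class, which is clean and explicit.

Keras also wraps tf.keras.Model in more convenient construction styles, such as the Sequential API and the functional API. They behave the same as the class-based approach above.

The Sequential model

A Sequential model is built from layers in order. It fits the case where each layer has exactly one input tensor and one output tensor.

tf.keras.Sequential(layers=None, name=None)

layers: a list of instances of tf.Module subclasses (including tf.keras.layers.Layer and tf.keras.Model)

name: the name of the model, shown by model.summary()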
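Conceptually, a sequential model just feeds the output of each layer into the next one. The class below is a minimal sketch of that idea with plain callables; SequentialSketch is an invented name and leaves out everything Keras adds on top (building, summaries, training).

```python
class SequentialSketch:
    def __init__(self, layers=None, name=None):
        self.layers = list(layers or [])
        self.name = name

    def add(self, layer):
        # Append one more layer to the end of the chain
        self.layers.append(layer)

    def __call__(self, x):
        # Each layer receives the previous layer's output
        for layer in self.layers:
            x = layer(x)
        return x


model = SequentialSketch([lambda v: v + 1, lambda v: v * 2], name="my_sequential_model")
print(model(3))    # 8
model.add(lambda v: v - 1)
print(model(3))    # 7
```

Any callable with one input and one output fits into the chain, which mirrors the one-input, one-output restriction stated above.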

# Every item in the list is a tf.keras.layers.Layer
model = tf.keras.Sequential(
    [ FlexibleDense(out_features=3), # FlexibleDense is a subclass of `tf.keras.layers.Layer`
     FlexibleDense(out_features=2)
    ], 
    name = "my_sequential_model"
)


# Call model on a test input
print("Model results:", model(tf.constant([[2.0, 2.0, 2.0]])))
model.summary()
# Model: "my_sequential_model"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #   
# =================================================================
#  flexible_dense (FlexibleDen  (1, 3)                   12        
#  se)                                                             
                                                                 
#  flexible_dense_1 (FlexibleD  (1, 2)                   8         
#  ense)                                                           
                                                                 
# =================================================================
# Total params: 20
# Trainable params: 20
# Non-trainable params: 0
# _________________________________________________________________

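Note that here too, model.summary() can only be called after the model has been built, i.e. after data has been fed through model(x).

The input shape is not known before that point, and Keras does not automatically create a None placeholder for the input dimension in this case.

Calling model.summary() before model(tf.constant([[2.0, 2.0, 2.0]])) raises: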

ValueError: This model has not yet been built. Build the model first by calling build() or by calling the model on a batch of data.
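The deferred build can be observed through the model's built flag. This is a minimal sketch that assumes the standard keras.layers.Dense layer in place of the FlexibleDense layer used in this article. On the TensorFlow 2.x versions this article targets, summary() raises the ValueError above on an unbuilt model; other Keras versions may print a placeholder summary instead, so the call is wrapped in try/except.

```python
import tensorflow as tf
from tensorflow import keras

# Dense stands in for FlexibleDense (assumption for a self-contained example)
model = keras.Sequential(
    [keras.layers.Dense(3), keras.layers.Dense(2)],
    name="deferred_build_demo",
)

built_before = model.built  # no input shape is known yet
try:
    model.summary()
except ValueError as err:
    print("summary() failed:", err)

# Calling the model on a batch fixes the input shape and creates the weights
outputs = model(tf.constant([[2.0, 2.0, 2.0]]))
built_after = model.built
model.summary()
print(built_before, built_after, outputs.shape)
```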

`model.add()`

Layers can also be appended to the layer list with model.add().

model.add() likewise only accepts instances of tf.Module subclasses (including tf.keras.layers.Layer and tf.keras.Model).

model = keras.Sequential() # start with an empty layer list
model.add(MyDense(in_features = 3, out_features=3))
model.add(FlexibleDense(out_features = 2))

# Equivalent to the result above
print("Model results:", model(tf.constant([[2.0, 2.0, 2.0]])))
model.summary()

# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #   
# =================================================================
#  module_wrapper (ModuleWrapp  (1, 3)                   12        
#  er)                                                             
                                                                 
#  flexible_dense (FlexibleDen  (1, 2)                   8         
#  se)                                                             
                                                                 
# =================================================================
# Total params: 20
# Trainable params: 20
# Non-trainable params: 0
# _________________________________________________________________

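As before, model.summary() can only be called after the model has been built, i.e. after data has been fed through model(x).

The input shape is not known before that point, and Keras does not automatically create a None placeholder for the input dimension in this case.

Calling model.summary() before model(tf.constant([[2.0, 2.0, 2.0]])) raises: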

ValueError: This model has not yet been built. Build the model first by calling build() or by calling the model on a batch of data.

`tf.keras.Input(shape)`

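However, when a Sequential model is built incrementally with add(), it is very useful to be able to print a summary of the model so far, including the current output shape, without feeding any data.

To do that, create an input node with tf.keras.Input(shape) and add it to the model first, so that the model knows its input shape from the start.

The summary can then be viewed without calling the model.

model = keras.Sequential() # start with an empty layer list
model.add(keras.Input(shape=(3,)))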
model.add(MyDense(in_features = 3, out_features=3))
model.add(FlexibleDense(out_features = 2))
model.summary()
# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #   
# =================================================================
#  module_wrapper (ModuleWrapp  (None, 3)                12        
#  er)                                                             
                                                                 
#  flexible_dense (FlexibleDen  (None, 2)                8         
#  se)                                                             
                                                                 
# =================================================================
# Total params: 20
# Trainable params: 20
# Non-trainable params: 0
# _________________________________________________________________

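Note that the input node is not shown as part of the model, because it is not a layer.

print(keras.Input(shape=(3,)))

prints KerasTensor(type_spec=TensorSpec(shape=(None, 3), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'")

The result shows that the shape argument leaves out the first dimension, i.e. the batch size.

For input data of shape (1, 3), pass shape=(3,); the resulting placeholder has shape (None, 3).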
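The effect of the input node can be checked without any data. The sketch below assumes keras.layers.Dense as a stand-in for the MyDense and FlexibleDense layers of this article; the layer sizes match the example above, so the parameter count is the same.

```python
from tensorflow import keras

inp = keras.Input(shape=(3,))      # the batch dimension is left out
print(inp.shape)                   # (None, 3)

model = keras.Sequential()
model.add(keras.Input(shape=(3,)))
model.add(keras.layers.Dense(3))   # Dense stands in for MyDense / FlexibleDense
model.add(keras.layers.Dense(2))

model.summary()                    # works before any data is fed in
n_params = model.count_params()    # (3*3 + 3) + (3*2 + 2) = 20
```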

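Simplification

A further simplification is to pass the input shape through the input_shape argument of the first layer. This only works when the first layer is a subclass of tf.keras.layers.Layer, because other classes do not implement the input_shape argument that creates the input node.

This is the most common style in Keras.

model = keras.Sequential() # start with an empty layer list
# Equivalent to model.add(keras.Input(shape=(3,)))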
model.add(FlexibleDense(out_features = 2, input_shape = [3])) 
model.add(MyDense(in_features = 2, out_features=1))
model.summary()

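Functional API

The Keras functional API is a more flexible way to create models than the tf.keras.Sequential API.

It can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs.

Steps:

1. Create an input node with tf.keras.Input(shape). This is mandatory. The shape leaves out the first dimension (the batch size), so for input data of shape (1, 3) pass shape=(3,).
2. Build the model by calling layers on the input and chaining their outputs.
3. Finally, create the Model by specifying its inputs and outputs in the graph of layers.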

# The functional API requires an input node created with tf.keras.Input
inputs = tf.keras.Input(shape=[3])

# Build the model
x = FlexibleDense(3)(inputs)
x = FlexibleDense(2)(x)

# Create the Model by specifying its inputs and outputs
model = tf.keras.Model(inputs=inputs, outputs=x)

model.summary()
# Same result as above
# Model: "model"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #   
# =================================================================
#  input_1 (InputLayer)        [(None, 3)]               0         
                                                                 
#  flexible_dense_2 (FlexibleD  (None, 3)                12        
#  ense)                                                           
                                                                 
#  flexible_dense_3 (FlexibleD  (None, 2)                8         
#  ense)                                                           
                                                                 
# =================================================================
# Total params: 20
# Trainable params: 20
# Non-trainable params: 0
# _________________________________________________________________

print("Model results:", model(tf.constant([[2.0, 2.0, 2.0]])))

Because the input node (placeholder) is created up front, model.summary() can be viewed before any data is fed in.
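The cases a Sequential model cannot express (several inputs, a shared layer) can be sketched as follows. This is an illustrative example, not code from the article: the layer names and sizes are assumptions, and keras Dense and Concatenate layers are used instead of FlexibleDense.

```python
import tensorflow as tf

# Two inputs that pass through the same (shared) Dense layer
left = tf.keras.Input(shape=(3,), name="left")
right = tf.keras.Input(shape=(3,), name="right")

shared = tf.keras.layers.Dense(4, name="shared_dense")  # one set of weights
merged = tf.keras.layers.Concatenate()([shared(left), shared(right)])
outputs = tf.keras.layers.Dense(1, name="head")(merged)

model = tf.keras.Model(inputs=[left, right], outputs=outputs)
model.summary()

batch = tf.ones((5, 3))
result = model([batch, batch])
print(result.shape)  # (5, 1)

# shared_dense: 3*4 + 4 = 16 parameters, counted once; head: 8*1 + 1 = 9
n_params = model.count_params()
```

Because shared_dense is applied to both inputs, its weights appear only once in the parameter count.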

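Custom loss functions and metrics

Sometimes a model needs a custom loss function or a custom metric.

For example:

The data format can require it. In a Transformer translation task the sentences are padded, and the padded positions in the label must not contribute to the loss or to the accuracy. The loss and the metric therefore have to be customised around the padding.

The examples below come from a Transformer.

Custom loss functions

A custom loss can be defined in two ways: as a function or as a class.

Function-based definition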

def sparse_loss_with_padding(y_true, y_pred):
    '''
    :param y_true: shape: (batch_size, target_seq_len), token id 0 marks padding
    :param y_pred: shape: (batch_size, target_seq_len, target_vocab_size)
    :return: scalar loss in which padded positions contribute 0
    '''
    mask = 1-tf.cast(tf.math.equal(y_true, 0),tf.float32) # shape: (batch_size, target_seq_len)
    loss_function = tf.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='none')
    loss = loss_function(y_true, y_pred) # shape: (batch_size, target_seq_len)
    return tf.reduce_mean(tf.multiply(loss, mask))

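The function-based definition is simple: it only takes y_true and y_pred, where y_true is the ground-truth label and y_pred is the model output.

Simple as it is, it has one serious drawback: it cannot take any extra hyperparameters, only the two arguments y_true and y_pred.

If extra hyperparameters are needed, use the class-based definition.

Class-based definition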

from tensorflow.keras.losses import Loss

class SparseLossWithPadding(Loss):
    # Declare extra hyperparameters before **kwargs, e.g. threshold = 1.0
    def __init__(self, **kwargs): # def __init__(self, threshold = 1.0, **kwargs):
        super().__init__(**kwargs)
        # self.threshold = threshold

    def call(self, y_true, y_pred):
        mask = 1-tf.cast(tf.math.equal(y_true, 0),tf.float32) # shape: (batch_size, target_seq_len)
        loss_function = tf.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='none')
        loss = loss_function(y_true, y_pred) # shape: (batch_size, target_seq_len)
        return tf.reduce_mean(tf.multiply(loss, mask))

    def get_config(self):
        # get_config returns a dict of parameters; only the parameters in this dict are saved with the h5 model.
        # To store custom hyperparameters in the model config, override this method and add them to the dict.
        config = super().get_config()
        # config.update({...custom parameters})  # e.g. {'threshold': self.threshold}
        return config

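As with the custom layers and models earlier, hyperparameters are set in __init__ and the computation is implemented in call.

Hyperparameters that need to be saved are added to the parameter dict in get_config().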
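A complete class-based loss with one real hyperparameter might look like the sketch below. The class name MaskedSparseCE and the pad_id hyperparameter are hypothetical, introduced only for illustration; the masking and the reduce_mean follow the loss shown above.

```python
import math
import tensorflow as tf

class MaskedSparseCE(tf.keras.losses.Loss):
    def __init__(self, pad_id=0, **kwargs):
        super().__init__(**kwargs)
        self.pad_id = pad_id  # hypothetical hyperparameter: label id to ignore

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, tf.int32)
        mask = tf.cast(tf.not_equal(y_true, self.pad_id), tf.float32)
        # Per-token loss, shape (batch_size, target_seq_len)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
        return tf.reduce_mean(loss * mask)

    def get_config(self):
        # Add the hyperparameter so that it is stored with the model config
        config = super().get_config()
        config.update({"pad_id": self.pad_id})
        return config

loss_fn = MaskedSparseCE(pad_id=0)
y_true = tf.constant([[1, 0]])                      # second token is padding
y_pred = tf.constant([[[0.1, 0.9], [0.5, 0.5]]])    # class probabilities
value = float(loss_fn(y_true, y_pred))              # -log(0.9) / 2

# The hyperparameter survives a config round trip
restored = MaskedSparseCE.from_config(loss_fn.get_config())
print(value, restored.pad_id)
```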

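Note:

In the built-in tf.keras APIs, snake_case names are usually function implementations (e.g. tf.keras.metrics.binary_accuracy) and CamelCase names are class implementations (e.g. tf.keras.metrics.BinaryAccuracy). Losses follow the same convention (tf.keras.losses.binary_crossentropy and tf.keras.losses.BinaryCrossentropy).

The two forms are also passed to model.compile() differently:

Function form: model.compile(loss = binary_crossentropy), i.e. pass the function itself

Class form: model.compile(loss = BinaryCrossentropy()), i.e. the class must be instantiated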
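The two forms can be compared on a tiny example. This sketch uses the built-in binary cross-entropy loss; the toy values and the one-layer model are assumptions made for illustration.

```python
import tensorflow as tf

y_true = tf.constant([[0.0], [1.0], [1.0]])
y_pred = tf.constant([[0.1], [0.8], [0.6]])

# Function form: returns one loss value per sample
per_sample = tf.keras.losses.binary_crossentropy(y_true, y_pred)
# Class form: instantiate first, then call; the default reduction averages
mean_loss = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)

fn_mean = float(tf.reduce_mean(per_sample))
cls_mean = float(mean_loss)
print(fn_mean, cls_mean)

model = tf.keras.Sequential([tf.keras.Input(shape=(2,)),
                             tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(loss=tf.keras.losses.binary_crossentropy)    # pass the function
model.compile(loss=tf.keras.losses.BinaryCrossentropy())   # pass an instance
```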

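Custom metrics

The custom loss above could be reused directly as a custom metric, but there is one core problem.

Whether we compute a loss or a metric, we want the average over all samples, so that the value does not depend on how the data is split into batches.

The custom loss is not computed as a per-sample average (total_loss / total_sample_num).

It is computed in two steps:

1. Average the loss within each batch: batch_mean_loss = batch_total_loss / batch_total_sample_num
2. Average the per-batch means to get the loss over all samples: (batch1_mean_loss + batch2_mean_loss + ... + batchn_mean_loss) / n

When every batch has the same batch_total_sample_num, this equals the per-sample average:

(batch1_total_loss / batch_total_sample_num + batch2_total_loss / batch_total_sample_num + ... + batchn_total_loss / batch_total_sample_num) / n

= (batch1_total_loss + batch2_total_loss + ... + batchn_total_loss) / (batch_total_sample_num * n)

= total_loss / total_sample_num

batch_mean_loss is computed with tf.reduce_mean(batch_loss), and batch_total_sample_num is usually the same for every batch, so for the loss the mean of the batch means can stand in for the overall loss.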

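In many cases, however, batch_total_sample_num differs from batch to batch.

Sometimes the data is padded and the padded positions should not count towards the accuracy. The number of positions that count then differs between batches, because every sample has a different seq_len.

Sometimes we need precision, the fraction of labels predicted as 1 that really are 1: TP / (TP + FP).

Its denominator is no longer batch_total_sample_num but the number of samples predicted as 1, which certainly differs between batches.

In these cases the mean of the per-batch metrics does not equal the overall metric.

The reason is simple. Take two batches with different counts:

Mean of the per-batch metrics: (0/5 + 3/3) / 2 = 0.5

Overall metric: (0+3)/(5+3) = 0.375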
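The gap between the two averages can be reproduced with plain Python. The sketch below uses the same two batches as the example above, written as (correct, counted) pairs.

```python
# (correct, counted) per batch, as in the example above
batches = [(0, 5), (3, 3)]

# Mean of the per-batch ratios
mean_of_batch_means = sum(c / n for c, n in batches) / len(batches)

# Ratio of the running totals
total_correct = sum(c for c, _ in batches)
total_counted = sum(n for _, n in batches)
overall = total_correct / total_counted

print(mean_of_batch_means)  # 0.5
print(overall)              # 0.375
```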

What is needed is a running (streaming) computation that keeps track of **total_metric (the numerator) and count (the denominator)** over all samples seen so far.

from tensorflow.keras.metrics import Metric

class AccWithPadding(Metric):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # State that is updated batch by batch; keep all dtypes consistent
        self.total = self.add_weight(name="total", initializer= "zeros", dtype=tf.int32)
        self.count = self.add_weight(name="count", initializer= "zeros", dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight = None):
        # The sample_weight = None parameter is required here, otherwise an error is raised
        y_true = tf.cast(y_true, tf.int32)
        mask = 1 - tf.cast(tf.equal(y_true, 0), tf.int32)  # 1 for real tokens, 0 for padding
        y_pred = tf.argmax(y_pred, axis = -1, output_type=tf.int32)
        value = tf.cast(tf.equal(y_pred, y_true), tf.int32)
        self.total.assign_add(tf.reduce_sum(value  * mask))
        self.count.assign_add(tf.reduce_sum(mask))

    def result(self):
        # Correct non-padding tokens / all non-padding tokens seen so far
        return self.total/self.count

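where

__init__: initialises the state that has to be tracked (usually self.total and self.count)

update_state(y_true, y_pred): updates self.total and self.count from the y_true and y_pred of each batch

result: computes the final metric value from the state (usually self.total/self.count)

When an instance of the class is called directly (i.e. through __call__):

update_state(y_true, y_pred) runs first, and then result()

This is because the Metric base class behaves roughly like the following simplified sketch: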

class Metric:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
       
    def update_state(self, y_true, y_pred, sample_weight = None):
        # The sample_weight = None parameter is required here, otherwise an error is raised
        pass
    
    def result(self):
        pass
    
    def __call__(self, y_true, y_pred, sample_weight = None):
        # This is why sample_weight = None is required: the call passes sample_weight through
        self.update_state(y_true, y_pred, sample_weight = sample_weight)
        return self.result()

So calling an instance once performs one metric update, and the value returned is the overall metric over everything seen so far.

from tensorflow.keras.metrics import Precision

precision = Precision()
# Precision of this batch: 4 of the 5 predicted positives are correct
print(precision([0,1,1,1,0,1,0,1], 
                [1,1,0,1,0,1,0,1])) 
# tf.Tensor(0.8, shape=(), dtype=float32)

# Precision of this batch alone: 0 of the 3 predicted positives are correct
print(precision([0,1,0,0,1,0,1,1], 
                [1,0,1,1,0,0,0,0]))

# tf.Tensor(0.5, shape=(), dtype=float32)
# Running precision over both calls: (4+0)/(5+3) = 0.5

Usage:

Class form: model.compile(metrics = [Precision()]), i.e. the class must be instantiated
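The same running behaviour can be checked for a padding-aware accuracy. The sketch below is a float-valued variant of the idea behind AccWithPadding; the class name MaskedAccuracy, the pad_id argument and the toy batches are assumptions made for illustration. One-hot vectors stand in for model predictions.

```python
import tensorflow as tf

class MaskedAccuracy(tf.keras.metrics.Metric):
    def __init__(self, pad_id=0, name="masked_accuracy", **kwargs):
        super().__init__(name=name, **kwargs)
        self.pad_id = pad_id
        # Running numerator and denominator over all batches seen so far
        self.total = self.add_weight(name="total", shape=(), initializer="zeros")
        self.count = self.add_weight(name="count", shape=(), initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.int32)
        pred_ids = tf.argmax(y_pred, axis=-1, output_type=tf.int32)
        mask = tf.cast(tf.not_equal(y_true, self.pad_id), tf.float32)
        hits = tf.cast(tf.equal(pred_ids, y_true), tf.float32) * mask
        self.total.assign_add(tf.reduce_sum(hits))
        self.count.assign_add(tf.reduce_sum(mask))

    def result(self):
        # divide_no_nan returns 0 while no token has been counted yet
        return tf.math.divide_no_nan(tf.convert_to_tensor(self.total),
                                     tf.convert_to_tensor(self.count))

metric = MaskedAccuracy()
# Batch 1: 2 real tokens, 1 predicted correctly
r1 = float(metric(tf.constant([[1, 2, 0, 0]]), tf.one_hot([[1, 1, 0, 0]], depth=4)))
# Batch 2: 4 real tokens, 3 predicted correctly
r2 = float(metric(tf.constant([[3, 1, 2, 1]]), tf.one_hot([[3, 1, 2, 0]], depth=4)))
print(r1, r2)  # 0.5, then (1+3)/(2+4)
```

The second value is the ratio of the running totals, not the mean of the two batch accuracies (0.5 and 0.75), which would be 0.625.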

padding

pad_sequences(sequences, 
              maxlen=None,
              dtype='int32',
              padding='pre',
              truncating='pre', 
              value=0.)

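sequences: a two-level nested list of floats or integers, e.g. [[1,3,2],[1],[4,2]]; the inner lists may have different lengths
maxlen: None or an integer, the maximum sequence length. Longer sequences are truncated and shorter ones are padded; with None, sequences are padded to the length of the longest one
dtype: data type of the returned numpy array
padding: 'pre' or 'post', whether to pad at the start or at the end of a sequence

truncating: 'pre' or 'post', whether to truncate from the start or from the end of a sequence

value: float, the value used for padding (0 by default)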

from tensorflow.keras.preprocessing.sequence import pad_sequences

sequence = [[3,2,6],
            [9,2,3,4,6],
            [1,2],
            [7,3,5]]

sequence_padding = pad_sequences(sequence, maxlen = 4, padding = 'post', truncating= 'post')
# [[3 2 6 0]
#  [9 2 3 4]
#  [1 2 0 0]
#  [7 3 5 0]]

# Can then be combined with tf.where to mask padded positions in the attention computation
import tensorflow as tf
tf.where(sequence_padding == 0, -1e10 * tf.ones_like(sequence_padding,dtype=tf.float32), tf.zeros_like(sequence_padding, dtype=tf.float32))
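How such an additive mask acts inside attention can be sketched as follows. The scores here are dummy zeros rather than real query-key products, and the already-padded id array is an assumption made to keep the example self-contained.

```python
import numpy as np
import tensorflow as tf

padded_ids = np.array([[3, 2, 6, 0],
                       [1, 2, 0, 0]])   # 0 marks padding

# Additive mask: large negative value at padded positions, 0 elsewhere
additive_mask = tf.where(padded_ids == 0, -1e9, 0.0)

scores = tf.zeros(padded_ids.shape, dtype=tf.float32)   # dummy attention scores
weights = tf.nn.softmax(scores + additive_mask, axis=-1)
print(weights.numpy())
```

After the softmax the padded positions receive a weight of 0, and the remaining weight is shared among the real tokens.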

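Suppressing log messages

TensorFlow prints a long stream of log messages at run time.
The setting below hides everything except errors (INFO and WARNING messages are filtered out).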

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# 0 : all messages are logged (default behavior)
# 1 : INFO messages are not printed
# 2 : INFO and WARNING messages are not printed
# 3 : INFO, WARNING, and ERROR messages are not printed

import tensorflow as tf
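TF_CPP_MIN_LOG_LEVEL filters the messages of the C++ backend. Messages emitted from the Python side go through the standard logging module, and their level can be raised separately with tf.get_logger(). A minimal sketch combining both, assuming only errors should remain visible:

```python
import os
import logging

# Must be set before TensorFlow is imported to affect the C++ log output
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf

# Python-side messages go through the standard `logging` module
tf.get_logger().setLevel(logging.ERROR)
print(tf.get_logger().level == logging.ERROR)
```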