[Learning TensorFlow] Custom layer classes and composed layers in TensorFlow

往阳光走

Published 2022-09-27 11:53:29

Article link: https://blog.csdn.net/weixin_52530284/article/details/127061402

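Custom layers in TensorFlow

A machine learning model can usually be expressed as a composition and stack of simple layers, and TensorFlow provides tf.keras for building models this way.

To implement a custom layer, extend the tf.keras.layers.Layer class and implement:

__init__: performs all initialization that does not depend on the input
build: receives the shape of the input tensor and performs the remaining initialization
call: performs the forward computation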

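import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):  # custom layer
  def __init__(self, num_outputs):
    super().__init__()
    self.num_outputs = num_outputs  # number of output features

  def build(self, input_shape):
    # Create the kernel once the last input dimension is known.
    # Keyword arguments are used because the positional order of
    # add_weight differs between Keras versions.
    self.kernel = self.add_weight(name="kernel",
                                  shape=[int(input_shape[-1]),
                                         self.num_outputs])

  def call(self, inputs):
    # Forward pass: (batch, in_features) x (in_features, num_outputs)
    return tf.matmul(inputs, self.kernel)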

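__init__()

Creating variables in __init__ would require specifying their shapes explicitly up front, whereas creating them in build lets the shapes be derived from the input the layer actually operates on. The constructor __init__ is a special instance method that the Python interpreter calls automatically whenever an instance of the class is created, so num_outputs has to be supplied when the custom layer is instantiated.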

layer = MyDenseLayer(10)

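build()

After creating the layer object, define a [10, 5] matrix of ones as the input. Calling the layer on it for the first time invokes build() with the input's shape, and then call() runs the forward computation.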

input = tf.ones([10, 5])
test = layer(input)  # Calling the layer for the first time builds it.

As shown, build() receives an input_shape argument and uses it to compute the shape passed to add_weight.
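The division of work between __init__, build, and call can be sketched without TensorFlow. The class below is a simplified pure-Python stand-in, not Keras's actual implementation: the name LazyDense, the built flag handling, and the uniform random initialization are all assumptions made for illustration.

```python
import random


class LazyDense:
    """Pure-Python stand-in for a Keras-style layer (hypothetical, for illustration)."""

    def __init__(self, num_outputs):
        # Input-independent setup only: the input width is not known yet.
        self.num_outputs = num_outputs
        self.built = False
        self.kernel = None

    def build(self, input_shape):
        # The last input dimension fixes the number of kernel rows.
        rows, cols = input_shape[-1], self.num_outputs
        self.kernel = [[random.uniform(-0.5, 0.5) for _ in range(cols)]
                       for _ in range(rows)]
        self.built = True

    def call(self, inputs):
        # Plain matrix product: (n, k) x (k, m) -> (n, m).
        return [[sum(x * w for x, w in zip(row, col))
                 for col in zip(*self.kernel)]
                for row in inputs]

    def __call__(self, inputs):
        # Build lazily on the first call, then run the forward pass.
        if not self.built:
            self.build((len(inputs), len(inputs[0])))
        return self.call(inputs)


lazy_layer = LazyDense(10)
print(lazy_layer.built)                  # False: no weights exist yet
outputs = lazy_layer([[1.0] * 5 for _ in range(10)])
print(lazy_layer.built)                  # True: the first call built the layer
print(len(lazy_layer.kernel), len(lazy_layer.kernel[0]))  # 5 10
print(len(outputs), len(outputs[0]))     # 10 10
```

The point of the pattern is that the constructor never needs to know the input width; the same LazyDense(10) object would build a different kernel if its first input had a different number of columns.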

add_weight()

def add_weight(self,
               name,              # String, the name for the weight variable.
               shape,             # The shape tuple of the weight.
               dtype=None,        # The dtype of the weight.
               initializer=None,  # An Initializer instance (callable).
               regularizer=None,  # An optional Regularizer instance.
               trainable=True,    # A boolean, whether the weight should
                                  # be trained via backprop or not (assuming
                                  # that the layer itself is also trainable).
               constraint=None):  # An optional Constraint instance.
    """Adds a weight variable to the layer.

    Returns:
        The created weight variable.
    """

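add_weight() is a method inherited from Layer that adds a weight variable to the layer. The trainable argument controls whether that weight is updated during training; when trainable is True, the weight is registered as trainable (in the Keras source the author refers to, via self._trainable_weights.append(weight)). The exact signature differs between Keras versions, so passing arguments by keyword is the safest choice. The build() in this example passes only the name and the shape.
For a matrix product, the number of columns of the first matrix must equal the number of rows of the second: (n, k) * (k, m) gives (n, m). Here input_shape[-1] is the number of columns of the input, which becomes the number of rows of the kernel, and num_outputs becomes its number of columns.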
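The shape rule can be checked with a few lines of plain Python. This is only a sketch of the bookkeeping; matmul_shape and kernel_shape are hypothetical helper names, not TensorFlow functions.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B for 2-D shapes, or raise if they do not match."""
    n, k_a = a_shape
    k_b, m = b_shape
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} != {k_b}")
    return (n, m)


def kernel_shape(input_shape, num_outputs):
    """Kernel shape chosen in build(): rows come from the input's last dimension."""
    return (input_shape[-1], num_outputs)


input_shape = (10, 5)
k_shape = kernel_shape(input_shape, num_outputs=10)
print(k_shape)                             # (5, 10)
print(matmul_shape(input_shape, k_shape))  # (10, 10)
```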

print(layer.kernel)

<tf.Variable 'my_dense_layer_13/kernel:0' shape=(5, 10) dtype=float32, numpy=
array([[-0.16188782, -0.21436638, -0.23779735, -0.5654684 ,  0.3930928 ,
        -0.4126814 ,  0.33268565,  0.18108195, -0.48489177, -0.08284235],
       [-0.02564168,  0.549053  ,  0.42339212,  0.3485728 , -0.0736267 ,
         0.5685448 ,  0.27400726,  0.59572273, -0.4207679 ,  0.19071192],
       [-0.41549557, -0.15215197, -0.07686222, -0.16538733,  0.1426844 ,
         0.26849395,  0.03620464,  0.07866323,  0.32265216, -0.15471825],
       [-0.01316679, -0.44710308,  0.2655834 ,  0.21193522,  0.5465316 ,
        -0.1434204 , -0.35253885, -0.43908924, -0.5106529 ,  0.2494039 ],
       [ 0.52400905, -0.5664908 ,  0.37424153,  0.507786  ,  0.5197385 ,
        -0.00330818,  0.03005803, -0.62411946,  0.2804129 , -0.2383785 ]],
      dtype=float32)>

Finally, calling the layer object returns the matrix product of the input and the kernel.

test = layer(input)  # The layer is already built, so this only runs call().
test

<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329],
       [-0.09218282, -0.8310592 ,  0.74855745,  0.3374383 ,  1.5284207 ,
         0.27762878,  0.32041672, -0.20774078, -0.8132475 , -0.03582329]],
      dtype=float32)>
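All ten output rows are identical because every input row is all ones, so each output entry is simply the sum of one kernel column. The check below reproduces the first output row from the kernel values printed earlier, using only the standard library; the tolerance is an assumption that allows for float32 rounding in the printed numbers.

```python
import math

# Kernel values copied from the printed tf.Variable above (shape 5 x 10).
kernel = [
    [-0.16188782, -0.21436638, -0.23779735, -0.5654684, 0.3930928,
     -0.4126814, 0.33268565, 0.18108195, -0.48489177, -0.08284235],
    [-0.02564168, 0.549053, 0.42339212, 0.3485728, -0.0736267,
     0.5685448, 0.27400726, 0.59572273, -0.4207679, 0.19071192],
    [-0.41549557, -0.15215197, -0.07686222, -0.16538733, 0.1426844,
     0.26849395, 0.03620464, 0.07866323, 0.32265216, -0.15471825],
    [-0.01316679, -0.44710308, 0.2655834, 0.21193522, 0.5465316,
     -0.1434204, -0.35253885, -0.43908924, -0.5106529, 0.2494039],
    [0.52400905, -0.5664908, 0.37424153, 0.507786, 0.5197385,
     -0.00330818, 0.03005803, -0.62411946, 0.2804129, -0.2383785],
]

# First row of the printed output tensor.
printed_row = [-0.09218282, -0.8310592, 0.74855745, 0.3374383, 1.5284207,
               0.27762878, 0.32041672, -0.20774078, -0.8132475, -0.03582329]

# With an all-ones input row, the matrix product reduces to column sums.
column_sums = [sum(col) for col in zip(*kernel)]

matches = all(math.isclose(s, p, abs_tol=1e-5)
              for s, p in zip(column_sums, printed_row))
print(matches)  # True
```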

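Composing layers

Defining your own network mostly means defining each layer it needs. A group of layers combined together is called a block; in ResNet, for example, each residual block combines convolution and batch normalization layers. Layers and blocks can be nested and combined freely. The ResNet residual block defined in the official documentation is as follows: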

class ResnetIdentityBlock(tf.keras.Model):
  def __init__(self, kernel_size, filters):
    super(ResnetIdentityBlock, self).__init__(name='')
    filters1, filters2, filters3 = filters

    self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
    self.bn2a = tf.keras.layers.BatchNormalization()

    self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
    self.bn2b = tf.keras.layers.BatchNormalization()

    self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
    self.bn2c = tf.keras.layers.BatchNormalization()

  def call(self, input_tensor, training=False):
    x = self.conv2a(input_tensor)
    x = self.bn2a(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2b(x)
    x = self.bn2b(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2c(x)
    x = self.bn2c(x, training=training)

    x += input_tensor
    return tf.nn.relu(x)

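Because the block inherits from tf.keras.Model, it also gets Model.fit, Model.evaluate, and Model.save (see "Custom Keras layers and models" in the official guide for details).

Create an instance of the ResNet residual block: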

block = ResnetIdentityBlock(1, [1, 2, 3])

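The arguments are the kernel size and the list of filter counts. kernel_size is used only by the middle convolution, which has padding='same', so with the default stride it does not change the spatial size of the feature map. Each entry of filters sets the number of output channels of the corresponding convolution, and the last entry must equal the number of input channels (3 in the call further down) so that x += input_tensor is valid.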
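How the kernel size affects the spatial output size depends on the padding mode. The helper below is a small sketch of the usual output-size formulas for an undilated convolution along one dimension; conv_output_size is a hypothetical name, not a TensorFlow function.

```python
import math


def conv_output_size(input_size, kernel_size, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# A 1x1 convolution never changes the spatial size (stride 1).
print(conv_output_size(3, 1, "valid"))  # 3
# With padding='same', a larger kernel still preserves the size.
print(conv_output_size(3, 3, "same"))   # 3
# With padding='valid', the same kernel shrinks the feature map.
print(conv_output_size(3, 3, "valid"))  # 1
```

Since the block's 1x1 convolutions and its 'same'-padded middle convolution all preserve height and width, the shortcut addition only needs the channel counts to agree.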

print(block.layers)

[<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x000002837E6700A0>, <tensorflow.python.keras.layers.normalization_v2.BatchNormalization object at 0x000002837E6B5160>, <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x000002837E682820>, <tensorflow.python.keras.layers.normalization_v2.BatchNormalization object at 0x000002837E682EE0>, <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x000002837E6CE910>, <tensorflow.python.keras.layers.normalization_v2.BatchNormalization object at 0x000002837E6CED90>]

Calling the block (which triggers build())

input = tf.random.normal([1, 2, 3, 3],mean=2,stddev=0.5)
resnet = block(input)

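The printed input and resnet can look identical, which seems odd, since passing through a residual block should change the values. The behavior is tied to training defaulting to False in call. In inference mode each BatchNormalization layer uses its moving statistics, which start at mean 0 and variance 1, so it leaves its input almost unchanged. A likely explanation is that, with this few filters, a ReLU zeroes out the whole convolutional branch, in which case the block returns relu(input), which equals the input when all its values are positive. Calling the block with training=True normalizes with batch statistics instead, and the output then changes.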
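The effect of the training flag on batch normalization can be illustrated with a pure-Python sketch for a single feature column. It assumes the layer's initial state (gamma = 1, beta = 0, moving mean 0, moving variance 1) and an epsilon of 1e-3; batch_norm is a hypothetical helper and the batch values are made up.

```python
import math


def batch_norm(values, training, moving_mean=0.0, moving_var=1.0, eps=1e-3):
    """Normalize one feature column; gamma=1 and beta=0 are assumed."""
    if training:
        # Training mode: normalize with the statistics of the current batch.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    else:
        # Inference mode: use the stored moving statistics.
        mean, var = moving_mean, moving_var
    return [(v - mean) / math.sqrt(var + eps) for v in values]


batch = [1.5, 2.0, 2.5, 3.0]  # made-up values with a mean well above zero

inference_out = batch_norm(batch, training=False)
training_out = batch_norm(batch, training=True)

print(inference_out)  # close to the input, scaled by 1 / sqrt(1 + eps)
print(training_out)   # centered on zero, so some values are negative
```

In inference mode the freshly initialized layer is close to an identity mapping, whereas in training mode the output is re-centered, which changes what the following ReLU lets through.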

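Similarly, tf.keras.Sequential needs much less code when a model simply calls one layer after another. Unless you need to design your own, more complex model, you can just stack layers one by one with Sequential. The code below stacks the same convolution and batch normalization layers as the custom class above; note that it has no ReLU activations and no shortcut addition, so it is not an exact equivalent of the residual block.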

my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),
                                                    input_shape=(
                                                        None, None, 3)),
                             tf.keras.layers.BatchNormalization(),
                             tf.keras.layers.Conv2D(2, 1,
                                                    padding='same'),
                             tf.keras.layers.BatchNormalization(),
                             tf.keras.layers.Conv2D(3, (1, 1)),
                             tf.keras.layers.BatchNormalization()])
my_seq(tf.zeros([1, 2, 3, 3]))

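Inspecting the whole model

Either of the following statements prints a summary with the same layers and parameter counts; the model and layer names differ, and the output shapes may be reported differently. The output shown below is from my_seq.summary().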

block.summary()
my_seq.summary()

Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_54 (Conv2D)           (None, None, None, 1)     4         
_________________________________________________________________
batch_normalization_54 (Batc (None, None, None, 1)     4         
_________________________________________________________________
conv2d_55 (Conv2D)           (None, None, None, 2)     4         
_________________________________________________________________
batch_normalization_55 (Batc (None, None, None, 2)     8         
_________________________________________________________________
conv2d_56 (Conv2D)           (None, None, None, 3)     9         
_________________________________________________________________
batch_normalization_56 (Batc (None, None, None, 3)     12        
=================================================================
Total params: 41
Trainable params: 29
Non-trainable params: 12
_________________________________________________________________
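The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard accounting: a convolution has one weight per kernel position, input channel, and filter, plus one bias per filter, and a batch normalization layer has four values per channel, of which gamma and beta are trainable and the moving mean and variance are not. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter; all trainable."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
    """Return (trainable, non_trainable) counts for one BatchNormalization layer."""
    # gamma and beta are trainable; moving mean and variance are not.
    return 2 * channels, 2 * channels


trainable = non_trainable = 0
in_channels = 3  # channels of the input
for filters in (1, 2, 3):  # all three convolutions use 1x1 kernels
    trainable += conv2d_params(1, 1, in_channels, filters)
    bn_train, bn_frozen = batch_norm_params(filters)
    trainable += bn_train
    non_trainable += bn_frozen
    in_channels = filters  # the next layer sees this many channels

print(trainable + non_trainable, trainable, non_trainable)  # 41 29 12
```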