Writing layers and models from scratch

Setup

!pip3 install tensorflow==2.0.0a0
%matplotlib inline
import tensorflow as tf
from tensorflow import keras

What a layer contains

A layer encapsulates state (variables, i.e. weights) and computation (the logic that transforms inputs into outputs).

class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        # State: a weight matrix and a bias vector, created eagerly from input_dim
        self.w = tf.Variable(initial_value=tf.random_normal_initializer()(shape=(input_dim, units), dtype='float32'))
        self.b = tf.Variable(initial_value=tf.zeros_initializer()(shape=(units, ), dtype='float32'))
        
    def call(self, inputs):
        # Computation: an affine transform of the inputs
        return tf.matmul(inputs, self.w) + self.b
    
x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
print(linear_layer(x))
tf.Tensor(
[[ 0.0960293   0.09410477 -0.01649074  0.14715078]
 [ 0.0960293   0.09410477 -0.01649074  0.14715078]], shape=(2, 4), dtype=float32)

Note that once w and b are set as layer attributes, they are automatically tracked and exposed through the layer's weights property.

assert linear_layer.weights == [linear_layer.w, linear_layer.b]
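Since tf.Variable instances are trainable by default, both variables also appear in the layer's trainable_weights collection. A quick check:

assert linear_layer.trainable_weights == [linear_layer.w, linear_layer.b]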

You can use the layer's built-in add_weight method as a shortcut for adding weights and biases.

class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(shape=(input_dim, units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(units, ), initializer='zeros', trainable=True)
        
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
print(linear_layer(x))
tf.Tensor(
[[ 0.03054511  0.01756009 -0.04622959 -0.10992575]
 [ 0.03054511  0.01756009 -0.04622959 -0.10992575]], shape=(2, 4), dtype=float32)

Layers can have non-trainable weights

Weights added to a layer can also be excluded from training, meaning they will not take part in the backpropagation computation during training.
Here is an example that adds a non-trainable weight.

class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = self.add_weight(shape=(input_dim, ), initializer='zeros', trainable=False)
    
    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total
    
x = tf.ones((2, 2))
my_sum = ComputeSum(2)
y = my_sum(x)
print(y.numpy())
y = my_sum(x)
print(y.numpy())
[2. 2.]
[4. 4.]

Non-trainable weights are still part of weights, but they are flagged as non-trainable.

print('weights: ', len(my_sum.weights))
print('non-trainable_weights: ', len(my_sum.non_trainable_weights))

# This layer has no trainable weights
print('trainable_weights: ', len(my_sum.trainable_weights))
weights:  1
non-trainable_weights:  1
trainable_weights:  0

Deferring weight creation until the input shape is known

In the Linear example above, the __init__ method took an input_dim argument that was used to compute the shapes of the weights.
In many cases, however, the input size cannot be known in advance, so weight creation has to be deferred, possibly until after the layer has been instantiated.

class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units
    
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units, ), initializer='zeros', trainable=True)
        
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

The __call__ method invokes build the first time the layer is called, completing the deferred creation of the weights. This makes lazy weight creation easy to implement.

linear_layer = Linear(32)
y = linear_layer(x)
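You can verify the deferral directly: the layer owns no weights until the first call triggers build. A minimal check:

linear_layer = Linear(32)
assert len(linear_layer.weights) == 0   # build has not run yet
_ = linear_layer(tf.ones((2, 8)))
assert len(linear_layer.weights) == 2   # w and b were created on the first call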

Layers can be nested

Sometimes you need to call other layers from within a layer; in that case the outer layer automatically tracks the weights of the inner layers.
It is recommended to instantiate the inner layers in the outer layer's __init__ method, so that if an inner layer implements build, its weight initialization is deferred until the outer layer receives its input.

class MLPBlock(keras.layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(1)
        
    def call(self, inputs):
        x = self.linear_1(inputs)
        x = keras.activations.relu(x)
        x = self.linear_2(x)
        x = keras.activations.relu(x)
        return self.linear_3(x)
    
x = tf.ones(shape=(3, 64))
mlp = MLPBlock()
y = mlp(x)
print('weights: ', len(mlp.weights))
print('trainable_weights: ', len(mlp.trainable_weights))
weights:  6
trainable_weights:  6

Recursively collecting added losses

When implementing a layer's call method, you can use self.add_loss(value) to record a tensor as a loss for later use.

class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate
    
    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs

Losses added this way (including those added by inner layers) can be retrieved from the layer's losses collection. The collection is reset on every call to __call__, so it only contains the values computed during the last forward pass.

class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer()
    
    def call(self, inputs):
        return self.activity_reg(inputs)
    
layer = OuterLayer()
assert len(layer.losses) == 0   # __call__ has not been invoked yet, so losses is empty
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1   # __call__ ran once above, so one loss is now in the collection
# the losses collection is automatically reset at the start of each __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1   # __call__ ran again, but after a reset, so losses still holds just one value

In addition, the losses collection also includes regularization losses created for the weights or biases of inner layers.

class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.dense = keras.layers.Dense(32, kernel_regularizer=keras.regularizers.l2(1e-3))
    
    def call(self, inputs):
        return self.dense(inputs)
    
layer = OuterLayer()
_ = layer(tf.zeros((1, 1)))

print(layer.losses)
[<tf.Tensor: id=234, shape=(), dtype=float32, numpy=0.0021141893>]

When writing a training loop, you should retrieve the losses in this collection and add them to the total loss.

optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for x_train_batch, y_train_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_train_batch)
        loss_value = loss_fn(y_train_batch, logits)
        # add the losses collected during the forward pass (e.g. regularization)
        loss_value += sum(model.losses)

    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

For more details on training, see the section on training and validation.

Serialization

If you want to be able to serialize your custom layer later, implement a get_config method.

class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units
    
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units, ), initializer='zeros', trainable=True)
        
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
    def get_config(self):
        return {'units': self.units}
    
layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)
{'units': 64}

The Layer base class's __init__ accepts keyword arguments such as name and dtype. It is good practice to pass these through to the parent class in your subclass and to include them in get_config.

class Linear(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
    
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units, ), initializer='zeros', trainable=True)
        
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
    def get_config(self):
        config = super(Linear, self).get_config()
        config.update({'units': self.units})
        return config
    
layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)
{'name': 'linear_10', 'trainable': True, 'dtype': None, 'units': 64}

If you need more flexibility when restoring a layer from its config, you can override from_config yourself. Here is the default from_config implementation:

@classmethod
def from_config(cls, config):
    return cls(**config)
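For example, a custom from_config might normalize the config before instantiating the layer. A hypothetical sketch for the Linear layer above (the int coercion is purely illustrative):

@classmethod
def from_config(cls, config):
    # Hypothetical: copy the config and coerce 'units' to int,
    # e.g. in case it came back as a string from a JSON round trip
    config = dict(config)
    config['units'] = int(config['units'])
    return cls(**config)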

For more on serialization and deserialization, see the chapter "Saving and serializing models".

The special training argument in the call method

Some layers, such as batch normalization and dropout, behave differently during training and inference. For layers of this kind, the call method takes a training argument that controls the behavior.
With this argument, you can make the layer behave correctly and produce the right outputs in both training and inference.

class CustomDropout(keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
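A short usage sketch showing the two behaviors: in training mode roughly a rate fraction of the entries is zeroed and the survivors are scaled by 1 / (1 - rate), while in inference mode the inputs pass through unchanged.

layer = CustomDropout(rate=0.5)
x = tf.ones((2, 4))
print(layer(x, training=True))    # some entries zeroed, the rest scaled to 2.0
print(layer(x, training=False))   # identical to the input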