Preparation
!pip3 install tensorflow==2.0.0a0
%matplotlib inline
import tensorflow as tf
from tensorflow import keras
Layers
What a layer contains
A layer encapsulates state (weights) and computation (the logic that transforms inputs into outputs).
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = tf.Variable(initial_value=tf.random_normal_initializer()(shape=(input_dim, units), dtype='float32'))
        self.b = tf.Variable(initial_value=tf.zeros_initializer()(shape=(units,), dtype='float32'))

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
print(linear_layer(x))
tf.Tensor(
[[ 0.0960293 0.09410477 -0.01649074 0.14715078]
[ 0.0960293 0.09410477 -0.01649074 0.14715078]], shape=(2, 4), dtype=float32)
Note that w and b are set as weights attributes of the layer and tracked automatically.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]
You can use the layer's built-in add_weight method as a shortcut for creating weights and biases.
class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(shape=(input_dim, units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
x = tf.ones((2, 2))
linear_layer = Linear(4, 2)
print(linear_layer(x))
tf.Tensor(
[[ 0.03054511 0.01756009 -0.04622959 -0.10992575]
[ 0.03054511 0.01756009 -0.04622959 -0.10992575]], shape=(2, 4), dtype=float32)
Layers can have non-trainable weights
A weight added to a layer can also be excluded from training; in other words, it will not take part in the backpropagation computation while you train.
Here is an example of adding a non-trainable weight.
class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        self.total = self.add_weight(shape=(input_dim,), initializer='zeros', trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total
x = tf.ones((2, 2))
my_sum = ComputeSum(2)
y = my_sum(x)
print(y.numpy())
y = my_sum(x)
print(y.numpy())
[2. 2.]
[4. 4.]
Non-trainable weights are still part of weights, but they are flagged as non-trainable.
print('weights: ', len(my_sum.weights))
print('non-trainable_weights: ', len(my_sum.non_trainable_weights))
# This layer has no trainable weights
print('trainable_weights: ', len(my_sum.trainable_weights))
weights: 1
non-trainable_weights: 1
trainable_weights: 0
Deferring weight creation
In the Linear example above, the __init__ method took an input_dim argument, which was used to compute the shapes of the weights.
In many cases, however, the input shape is not known in advance, so weight creation must be deferred, possibly until after the layer has been instantiated.
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
The __call__ method invokes build on the first call, completing the deferred creation of the weights. This makes lazy weight creation straightforward.
linear_layer = Linear(32)
y = linear_layer(x)
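As a quick check (a minimal sketch; the (2, 4) input shape below is an arbitrary choice), the layer holds no weights at all until it is first called:

```python
import tensorflow as tf
from tensorflow import keras

# The same deferred-creation Linear layer as above.
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        # Weight shapes are derived from the actual input on the first call.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(32)
assert len(layer.weights) == 0   # build() has not run yet
y = layer(tf.ones((2, 4)))       # the first call triggers build()
assert len(layer.weights) == 2   # w and b now exist
assert layer.w.shape == (4, 32)  # the kernel shape matches the input
```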
Layers can be nested
Sometimes a layer needs to call other layers; in that case the outer layer automatically tracks the weight attributes of the inner layers.
It is recommended to instantiate inner layers in the outer layer's __init__ method; if the inner layers implement build, their weight creation is then deferred until the outer layer receives its input.
class MLPBlock(keras.layers.Layer):
    def __init__(self):
        super(MLPBlock, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(1)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = keras.activations.relu(x)
        x = self.linear_2(x)
        x = keras.activations.relu(x)
        return self.linear_3(x)
x = tf.ones(shape=(3, 64))
mlp = MLPBlock()
y = mlp(x)
print('weights: ', len(mlp.weights))
print('trainable_weights: ', len(mlp.trainable_weights))
weights: 6
trainable_weights: 6
Recursively collecting added losses
When implementing a layer's call method, you can use self.add_loss(value) to record a tensor as a loss for later use.
class ActivityRegularizationLayer(keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super(ActivityRegularizationLayer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs
The added losses (including those added by inner layers) can be retrieved from the layer's losses collection. This collection is reset on every call to __call__, so it only contains the values computed during the last forward pass.
class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.activity_reg = ActivityRegularizationLayer()

    def call(self, inputs):
        return self.activity_reg(inputs)
layer = OuterLayer()
assert len(layer.losses) == 0  # __call__ has not been invoked yet, so losses is empty
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # __call__ was invoked once, so one loss has been collected
# The losses collection is automatically reset on each call to __call__
_ = layer(tf.zeros((1, 1)))
assert len(layer.losses) == 1  # __call__ ran again, but the collection was reset first, so it still holds only one loss
In addition, the losses collection also contains regularization losses created for the weights and biases of inner layers.
class OuterLayer(keras.layers.Layer):
    def __init__(self):
        super(OuterLayer, self).__init__()
        self.dense = keras.layers.Dense(32, kernel_regularizer=keras.regularizers.l2(1e-3))

    def call(self, inputs):
        return self.dense(inputs)
layer = OuterLayer()
_ = layer(tf.zeros((1, 1)))
print(layer.losses)
[<tf.Tensor: id=234, shape=(), dtype=float32, numpy=0.0021141893>]
When writing a training loop, you need to gather these losses and add them to the total loss.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for x_train_batch, y_train_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_train_batch)
        loss_value = loss_fn(y_train_batch, logits)
        loss_value += sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
For more training details, see the section on training and evaluation.
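The loop above assumes that a model and a train_dataset already exist. A runnable miniature, using a hypothetical toy model with one l2-regularized Dense layer (so that model.losses is non-empty) and random data, might look like this:

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy setup: a 3-class classifier over 4 features.
model = keras.Sequential([
    keras.layers.Dense(8, kernel_regularizer=keras.regularizers.l2(1e-3)),
    keras.layers.Dense(3),
])
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Random stand-in data, batched into a tf.data pipeline.
x = tf.random.normal((16, 4))
y = tf.random.uniform((16,), maxval=3, dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

for x_train_batch, y_train_batch in train_dataset:
    with tf.GradientTape() as tape:
        logits = model(x_train_batch)
        # Main loss plus the regularization losses collected by the layers.
        loss_value = loss_fn(y_train_batch, logits) + sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```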
Serialization
If you want to be able to serialize a custom layer later, you can support this via get_config.
class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        return {'units': self.units}
layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)
{'units': 64}
The base Layer class's __init__ accepts some keyword arguments, such as a name and a dtype. It is good practice to pass these through to the parent class in your subclass and to include them in get_config.
class Linear(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super(Linear, self).get_config()
        config.update({'units': self.units})
        return config
layer = Linear(64)
config = layer.get_config()
print(config)
new_layer = Linear.from_config(config)
{'name': 'linear_10', 'trainable': True, 'dtype': None, 'units': 64}
If you need more flexibility when restoring a layer from its config, you can override from_config yourself. Here is the default from_config implementation:
@classmethod
def from_config(cls, config):
    return cls(**config)
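The get_config / from_config contract is plain Python, so it is easy to see in isolation. A framework-free sketch (the Box class here is hypothetical, purely to illustrate the round trip):

```python
class Box:
    """A minimal object following the get_config / from_config contract."""

    def __init__(self, units=32):
        self.units = units

    def get_config(self):
        # Everything needed to rebuild this object.
        return {'units': self.units}

    @classmethod
    def from_config(cls, config):
        # The default behavior: pass the config back to the constructor.
        return cls(**config)

box = Box(64)
clone = Box.from_config(box.get_config())
assert clone.units == 64  # the round trip preserves the configuration
```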
For more on serialization and deserialization, see the chapter on saving and serializing models.
A privileged training argument in the call method
Some layers, such as batch normalization and dropout, behave differently during training and inference. For layers of this kind, expose a training argument in the call method to control their behavior.
With this argument, the layer behaves correctly, and produces the right outputs, in both training and inference.
class CustomDropout(keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs
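A quick sanity check of the layer above (a sketch; the all-ones input is an arbitrary choice): with training=False the layer is the identity, while with training=True some activations are zeroed and the survivors are scaled by 1 / (1 - rate):

```python
import tensorflow as tf
from tensorflow import keras

# Same CustomDropout layer as above.
class CustomDropout(keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(CustomDropout, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs

drop = CustomDropout(0.5)
x = tf.ones((2, 4))

out_infer = drop(x, training=False)  # identity at inference time
out_train = drop(x, training=True)   # randomly zeroed / rescaled

assert bool(tf.reduce_all(out_infer == x))  # inference leaves inputs untouched
assert out_train.shape == x.shape           # training changes values, not shape
```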