7.2.6 Testing the Model
Now run the model. First, check the shape of the output:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
Running this prints:
(64, 100, 65) # (batch_size, sequence_length, vocab_size)
In the code above, the input sequence length is 100, but the model can be run on inputs of any length. Inspect the model's basic structure with:
model.summary()
This prints:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (64, None, 256) 16640
_________________________________________________________________
gru (GRU) (64, None, 1024) 3938304
_________________________________________________________________
dense (Dense) (64, None, 65) 66625
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
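The parameter counts in the summary can be verified by hand. The sketch below assumes vocab_size=65, embedding_dim=256, rnn_units=1024, and a GRU with reset_after=True (the Keras default, which adds a second bias vector per gate):

```python
# Hand-check of the parameter counts reported by model.summary()
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

# Embedding: one embedding_dim-dimensional vector per vocabulary entry
embedding_params = vocab_size * embedding_dim

# GRU with reset_after=True: 3 gates, each with input weights,
# recurrent weights, and two bias vectors
gru_params = 3 * rnn_units * (embedding_dim + rnn_units + 2)

# Dense: weight matrix plus bias vector
dense_params = rnn_units * vocab_size + vocab_size

print(embedding_params, gru_params, dense_params,
      embedding_params + gru_params + dense_params)
# 16640 3938304 66625 4021569
```

These match the summary line by line, which is a quick way to confirm the model was built with the intended hyperparameters.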
To get actual predictions from the model, we need to sample from the output distribution to obtain actual character indices. This distribution is defined by the logits over the character vocabulary. Note that it is important to sample from this distribution, because taking its argmax can easily get the model stuck in a loop.
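The difference can be sketched in plain Python with toy logits (hypothetical values, standing in for one row of the model's predictions): argmax always picks the same index, while categorical sampling varies.

```python
import math
import random

random.seed(0)

# Toy logits over a 4-character "vocabulary" (hypothetical values)
logits = [2.0, 1.0, 0.5, 0.1]

# Softmax turns the logits into a probability distribution
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# argmax is deterministic: it picks index 0 every time, which is
# how a model can get stuck repeating the same characters
argmax_picks = [probs.index(max(probs)) for _ in range(5)]

# Sampling follows the distribution, so the picks vary
sampled_picks = [random.choices(range(4), weights=probs)[0] for _ in range(5)]

print(argmax_picks)  # [0, 0, 0, 0, 0]
print(sampled_picks)
```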
Take the first example in the batch and get the index of the predicted next character at each time step:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
sampled_indices
This prints:
array([ 3, 19, 11, 8, 17, 50, 14, 5, 16, 57, 51, 53, 17, 54, 9, 11, 22,
13, 36, 57, 57, 50, 47, 22, 5, 7, 1, 59, 3, 26, 52, 2, 62, 30,
54, 18, 62, 9, 63, 2, 22, 11, 18, 12, 63, 0, 13, 16, 38, 49, 21,
25, 22, 53, 39, 63, 3, 26, 39, 15, 21, 56, 49, 39, 20, 55, 5, 39,
61, 29, 21, 39, 39, 63, 48, 11, 27, 42, 59, 0, 19, 58, 57, 27, 40,
13, 53, 13, 7, 4, 21, 32, 10, 57, 18, 30, 54, 36, 12, 3])
Next, decode these indices to see the text predicted by the untrained model:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices])))
This prints:
Input:
'e, I say! madam! sweet-heart! why, bride!\nWhat, not a word? you take your pennyworths now;\nSleep for'
Next Char Predictions:
"$G;.ElB'DsmoEp3;JAXssliJ'- u$Nn!xRpFx3y!J;F?y\nADZkIMJoay$NaCIrkaHq'awQIaayj;Odu\nGtsObAoA-&IT:sFRpX?$"
7.2.7 Training the Model
At this point the problem can be treated as a standard classification problem: given the previous RNN state and the input at this time step, predict the class of the next character. First, attach an optimizer and a loss function. The standard tf.keras.losses.sparse_categorical_crossentropy() loss works here because it is applied across the last dimension of the predictions. Because our model returns logits, we need to set the from_logits flag:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
This prints:
Prediction shape: (64, 100, 65) # (batch_size, sequence_length, vocab_size)
scalar_loss: 4.1736827
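This initial loss is easy to sanity-check: an untrained model's outputs are roughly uniform over the 65 characters, so the expected cross-entropy is about ln(vocab_size):

```python
import math

# Cross-entropy of a uniform distribution over vocab_size classes
vocab_size = 65
expected_initial_loss = math.log(vocab_size)
print(round(expected_initial_loss, 4))  # 4.1744
```

The observed scalar_loss of about 4.17 agrees, confirming the loss is wired up correctly before any training.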
Next, configure the training procedure with the tf.keras.Model.compile() method, using tf.keras.optimizers.Adam with default arguments and the loss function defined above:
model.compile(optimizer='adam', loss=loss)
Use a tf.keras.callbacks.ModelCheckpoint to ensure that checkpoints are saved during training:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
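The {epoch} placeholder in the prefix is an ordinary str.format field that ModelCheckpoint fills in at save time; a quick illustration:

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# The callback substitutes the epoch number via str.format,
# so the checkpoint for epoch 10 is saved under this prefix
print(checkpoint_prefix.format(epoch=10))
```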
Now start training. To keep the training time reasonable, train the model for 10 epochs. In Colab, set the runtime to GPU to speed up training:
EPOCHS=10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
Running this prints:
Epoch 1/10
172/172 [==============================] - 5s 27ms/step - loss: 2.6663
Epoch 2/10
172/172 [==============================] - 5s 27ms/step - loss: 1.9452
Epoch 3/10
172/172 [==============================] - 5s 27ms/step - loss: 1.6797
Epoch 4/10
172/172 [==============================] - 5s 27ms/step - loss: 1.5355
Epoch 5/10
172/172 [==============================] - 5s 27ms/step - loss: 1.4493
Epoch 6/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3900
Epoch 7/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3457
Epoch 8/10
172/172 [==============================] - 5s 26ms/step - loss: 1.3076
Epoch 9/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2732
Epoch 10/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2412
7.2.8 Generating Text
Restore the latest checkpoint. To keep this prediction step simple, use a batch size of 1. Because of the way the RNN state is passed from time step to time step, the model only accepts a fixed batch size once built. To run the model with a different batch_size, we need to rebuild the model and restore the weights from the checkpoint:
tf.train.latest_checkpoint(checkpoint_dir)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
This prints:
'./training_checkpoints/ckpt_10'
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (1, None, 256) 16640
_________________________________________________________________
gru_1 (GRU) (1, None, 1024) 3938304
_________________________________________________________________
dense_1 (Dense) (1, None, 65) 66625
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
7.2.9 The Prediction Loop
First, set a start string, initialize the RNN state, and set the number of characters to generate. Using the start string and the RNN state, get the prediction distribution of the next character. Then use a categorical distribution to compute the index of the predicted character, and feed this predicted character back in as the model's next input. The RNN state returned by the model is passed back into the model, so the model now has more context to work with than a single character. After each character is predicted, the updated RNN state is again fed back into the model, which is how the model learns: it gains more and more context from the previously predicted characters.
Looking at the generated text, you will see that the model knows when to capitalize and when to start a new paragraph, and it imitates a Shakespeare-like vocabulary. Because it has been trained for only a few epochs, it has not yet learned to form coherent sentences. The code is as follows:
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 1000

    # Convert the start string to numbers (vectorize)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to store the results
    text_generated = []

    # A low temperature yields more predictable text;
    # a higher temperature yields more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Here the batch size is 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # Remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Use a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Pass the predicted character, together with the previous
        # hidden state, to the model as its next input
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))
Running this prints:
ROMEO: in't, Romeo rather
say, bid me not say, the adden, and you man for all.
Now, good Cart, or do held. Well, leaving her son,
Some stomacame, brother, Edommen.
PROSPERO:
My lord Hastings, for death,
Or as believell you be accoment.
TRANIO:
Mistraising? come, get abseng house:
The that was a life upon none of the equard sud,
Great Aufidius any joy;
For well a fool, and loveth one stay,
To whom Gare his moved me of Marcius shoulded.
Pite o'erposens to him.
KING RICHARD II:
Come, civil and live, if wet to help and raisen fellow.
CORIOLANUS:
Mark, here, sir. But the palace-hate will be at him in
some wondering danger, my bestilent.
DUKE OF AUMERLE:
You, my lord? my dearly uncles for,
If't be fown'd for truth enough not him,
He talk of youngest young princely sake.
ROMEO:
This let me have a still before the queen
First worthy angel. Would yes, by return.
BAPTISTA:
You have dan,
Dies, renown awrifes; I'll say you.
Provost:
And, come, make it out.
LEONTES:
They call thee, hangions,
Not
The easiest way to improve the results is to train for longer (for example, set EPOCHS to 30). You can also experiment with different start strings, add another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.
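The effect of the temperature parameter can be illustrated without the model: dividing the logits by a temperature below 1 sharpens the softmax distribution (more predictable picks), while a temperature above 1 flattens it (more surprising picks). A minimal sketch with hypothetical logits:

```python
import math

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-character vocabulary
logits = [2.0, 1.0, 0.5]

# Low temperature concentrates probability on the top character;
# high temperature spreads it out
for temperature in (0.5, 1.0, 2.0):
    probs = softmax([x / temperature for x in logits])
    print(temperature, [round(p, 3) for p in probs])
```

The maximum probability shrinks as the temperature rises, which is why higher temperatures produce more varied (and more error-prone) text.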
7.2.10 Custom Training
To obtain a more stable model and output, implement custom training as follows:
- First initialize the RNN state with the tf.keras.Model.reset_states() method, then iterate over the dataset batch by batch and compute the predictions for each batch.
- Open a tf.GradientTape and compute the predictions and the loss in that context.
- Compute the gradients of the loss with respect to the model variables using tf.GradientTape.gradient().
- Finally, take a step with the optimizer's apply_gradients() method.
The implementation of custom training is as follows:
model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
    with tf.GradientTape() as tape:
        predictions = model(inp)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                target, predictions, from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Training loop
EPOCHS = 10
for epoch in range(EPOCHS):
    start = time.time()
    # At the start of every epoch, initialize the hidden state;
    # it is None at first
    hidden = model.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        loss = train_step(inp, target)
        if batch_n % 100 == 0:
            template = 'Epoch {} Batch {} Loss {}'
            print(template.format(epoch + 1, batch_n, loss))
    # Save (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))
    print('Epoch {} Loss {:.4f}'.format(epoch + 1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
model.save_weights(checkpoint_prefix.format(epoch=epoch))
Running this prints:
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
Epoch 1 Batch 0 Loss 4.173541069030762
Epoch 1 Batch 100 Loss 2.3451342582702637
Epoch 1 Loss 2.1603
Time taken for 1 epoch 6.5293896198272705 sec
Epoch 2 Batch 0 Loss 2.1137943267822266
Epoch 2 Batch 100 Loss 1.9266924858093262
Epoch 2 Loss 1.7417
Time taken for 1 epoch 5.6192779541015625 sec
Epoch 3 Batch 0 Loss 1.775771975517273
Epoch 3 Batch 100 Loss 1.657868504524231
Epoch 3 Loss 1.5520
Time taken for 1 epoch 5.231291770935059 sec