(7-2-02) TensorFlow Natural Language Processing in Practice: Generating Text with an RNN (2)

7.2.6  Testing the Model

Now run the model. First, check the shape of the output. The code is as follows:

for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

The output is:

 (64, 100, 65) # (batch_size, sequence_length, vocab_size)

In the code above the input sequence length is 100, but the model can be run on inputs of any length, as the quick check after the summary below illustrates. View the model's basic information with the following code:

model.summary()

The output is:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           16640     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 65)            66625     
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
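Only the batch dimension is fixed when the model is built, which is why sequences of lengths other than 100 can also be pushed through. A quick shape check (an illustrative snippet, not part of the original listing; dummy_input is a made-up name and it reuses the model and tf import from above):

import tensorflow as tf

# Feed a batch of 64 all-zero sequences of length 50 through the same model;
# only the sequence dimension of the output changes.
dummy_input = tf.zeros([64, 50], dtype=tf.int32)   # (batch_size, sequence_length)
print(model(dummy_input).shape)                     # expected: (64, 50, 65)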

To get actual predictions from the model, we need to sample from the output distribution in order to obtain concrete character indices. This distribution is defined by the logits over the character vocabulary. Note that it is important to sample from this distribution, because taking its argmax can easily get the model stuck in a loop.
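For contrast, greedy decoding would take the argmax of the logits at every time step. A one-line sketch of what that would look like (illustrative only; greedy_indices is a made-up name and is not used in the rest of the example):

# Greedy alternative (not recommended here): always pick the highest-scoring character,
# which easily traps the model in repetitive loops.
greedy_indices = tf.argmax(example_batch_predictions[0], axis=-1).numpy()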

Take the first example in the batch and, for each time step, obtain the index of the predicted next character. The code is as follows:

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

sampled_indices

The output is:

array([ 3, 19, 11,  8, 17, 50, 14,  5, 16, 57, 51, 53, 17, 54,  9, 11, 22,
       13, 36, 57, 57, 50, 47, 22,  5,  7,  1, 59,  3, 26, 52,  2, 62, 30,
       54, 18, 62,  9, 63,  2, 22, 11, 18, 12, 63,  0, 13, 16, 38, 49, 21,
       25, 22, 53, 39, 63,  3, 26, 39, 15, 21, 56, 49, 39, 20, 55,  5, 39,
       61, 29, 21, 39, 39, 63, 48, 11, 27, 42, 59,  0, 19, 58, 57, 27, 40,
       13, 53, 13,  7,  4, 21, 32, 10, 57, 18, 30, 54, 36, 12,  3])

Next, decode these indices to see the text predicted by the untrained model. The code is as follows:

print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

The output is:

Input: 
 'e, I say! madam! sweet-heart! why, bride!\nWhat, not a word? you take your pennyworths now;\nSleep for'

Next Char Predictions: 
 "$G;.ElB'DsmoEp3;JAXssliJ'- u$Nn!xRpFx3y!J;F?y\nADZkIMJoay$NaCIrkaHq'awQIaayj;Odu\nGtsObAoA-&IT:sFRpX?$"

7.2.7  Training the Model

At this point the whole problem can be treated as a standard classification problem: given the previous RNN state and the input at this time step, predict the class of the next character. First attach an optimizer and a loss function. Here we use the standard tf.keras.losses.sparse_categorical_crossentropy() loss, because it is applied across the last dimension of the predictions. Because the model returns logits, we need to set the from_logits flag. The code is as follows:

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

The output is:

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.1736827
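Since the exponential of the mean cross-entropy loss is roughly the effective number of equally likely choices, a freshly initialized model should give a value close to the vocabulary size (65). A quick sanity check (illustrative; it reuses example_batch_loss from above):

# exp(mean loss) should be close to vocab_size (65) for an untrained model
print(tf.exp(example_batch_loss.numpy().mean()).numpy())   # roughly 65 here, since exp(4.17) ≈ 65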

Then configure the training procedure with tf.keras.Model.compile(), using tf.keras.optimizers.Adam with default arguments together with the loss function defined above. The code is as follows:

model.compile(optimizer='adam', loss=loss)

Use tf.keras.callbacks.ModelCheckpoint to ensure that checkpoints are saved during training. The code is as follows:

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

Now start training. To keep the training time reasonable, train the model for 10 epochs. In Colab, set the runtime to GPU to speed up training. The code is as follows:

EPOCHS=10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

The output is:

Epoch 1/10
172/172 [==============================] - 5s 27ms/step - loss: 2.6663
Epoch 2/10
172/172 [==============================] - 5s 27ms/step - loss: 1.9452
Epoch 3/10
172/172 [==============================] - 5s 27ms/step - loss: 1.6797
Epoch 4/10
172/172 [==============================] - 5s 27ms/step - loss: 1.5355
Epoch 5/10
172/172 [==============================] - 5s 27ms/step - loss: 1.4493
Epoch 6/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3900
Epoch 7/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3457
Epoch 8/10
172/172 [==============================] - 5s 26ms/step - loss: 1.3076
Epoch 9/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2732
Epoch 10/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2412

7.2.8  Generating Text

Restore the latest checkpoint. To keep this prediction step simple, set the batch size to 1. Because of the way the RNN state is passed from time step to time step, the model only accepts a fixed batch size once it is built. To run the model with a different batch_size, we need to rebuild the model (using the build_model helper defined earlier; see the reminder after the summary below) and restore the weights from the checkpoint. The code is as follows:

tf.train.latest_checkpoint(checkpoint_dir)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

model.summary()

The output is:

'./training_checkpoints/ckpt_10'

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_1 (Dense)              (1, None, 65)             66625     
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
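For reference, build_model used above is the helper defined when the network was first assembled in the previous part of this example. A typical definition consistent with the summaries shown here (reproduced only as a reminder; the exact earlier listing may differ slightly) looks like this:

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  # Embedding -> GRU -> Dense, with the batch size fixed via batch_input_shape,
  # which is why the model must be rebuilt to change batch_size.
  return tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                batch_input_shape=[batch_size, None]),
      tf.keras.layers.GRU(rnn_units,
                          return_sequences=True,
                          stateful=True,
                          recurrent_initializer='glorot_uniform'),
      tf.keras.layers.Dense(vocab_size)
  ])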

7.2.9  The Prediction Loop

First choose a start string, initialize the RNN state, and set the number of characters to generate. Using the start string and the RNN state, obtain the prediction distribution of the next character. Then use a categorical distribution to compute the index of the predicted character, and feed that predicted character back into the model as its next input. The RNN state returned by the model is fed back in as well, so the model now has more context to work with than just a single character. After the next character is predicted, the updated RNN state is again fed back into the model; in this way the model keeps learning from the growing context of previously predicted characters.

Looking at the generated text, you will see that the model knows when to capitalize and when to start a new paragraph, and it imitates Shakespeare-like vocabulary. Because it was trained for only a few epochs, it has not yet learned to produce coherent sentences. The code is as follows:

def generate_text(model, start_string):
  # Evaluation step (generating text with the trained model)

  # Number of characters to generate
  num_generate = 1000

  # Convert the start string to numbers (vectorize)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # List used to store the generated characters
  text_generated = []

  # A low temperature produces more predictable text
  # A higher temperature produces more surprising text
  # Experiment to find the best setting
  temperature = 1.0

  # Here the batch size is 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # Remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # Use a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # Pass the predicted character, together with the previous hidden state, to the model as its next input
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))

The output is:

ROMEO: in't, Romeo rather
say, bid me not say, the adden, and you man for all.
Now, good Cart, or do held. Well, leaving her son,
Some stomacame, brother, Edommen.

PROSPERO:
My lord Hastings, for death,
Or as believell you be accoment.

TRANIO:
Mistraising? come, get abseng house:
The that was a life upon none of the equard sud,
Great Aufidius any joy;
For well a fool, and loveth one stay,
To whom Gare his moved me of Marcius shoulded.
Pite o'erposens to him.

KING RICHARD II:
Come, civil and live, if wet to help and raisen fellow.

CORIOLANUS:
Mark, here, sir. But the palace-hate will be at him in
some wondering danger, my bestilent.

DUKE OF AUMERLE:
You, my lord? my dearly uncles for,
If't be fown'd for truth enough not him,
He talk of youngest young princely sake.

ROMEO:
This let me have a still before the queen
First worthy angel. Would yes, by return.

BAPTISTA:
You have dan,
Dies, renown awrifes; I'll say you.

Provost:
And, come, make it out.

LEONTES:
They call thee, hangions,
Not

The easiest way to improve the results is to train for longer (for example, set EPOCHS to 30). You can also experiment with different start strings, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.
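As an illustration of the "add another RNN layer" suggestion, a deeper variant of build_model might look like the sketch below (a hypothetical modification, not part of the original example; build_deeper_model is a made-up name):

# Hypothetical two-GRU variant: the second GRU also returns sequences,
# so the Dense layer still predicts one character per time step.
def build_deeper_model(vocab_size, embedding_dim, rnn_units, batch_size):
  return tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                batch_input_shape=[batch_size, None]),
      tf.keras.layers.GRU(rnn_units, return_sequences=True,
                          stateful=True, recurrent_initializer='glorot_uniform'),
      tf.keras.layers.GRU(rnn_units, return_sequences=True,
                          stateful=True, recurrent_initializer='glorot_uniform'),
      tf.keras.layers.Dense(vocab_size)
  ])

Similarly, the temperature in generate_text is hard-coded to 1.0; turning it into a parameter and trying smaller values such as 0.5 makes the output more predictable, while larger values make it more random.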

7.2.10  Custom Training

To obtain a more stable model and output, implement a custom training loop with the following steps:

    1. First initialize the RNN state with tf.keras.Model.reset_states(), then iterate over the dataset batch by batch and compute the predictions for each batch.
    2. Open a tf.GradientTape and compute the predictions and the loss within that context.
    3. Compute the gradients of the loss with respect to the model variables using tf.GradientTape.gradient().
    4. Finally, take a step downhill by applying the gradients with the optimizer's apply_gradients() method.

The implementation of custom training is as follows:

model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

# Training loop
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # At the start of each epoch, initialize the hidden state
  # The hidden state is initially None
  hidden = model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # Save (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

The output is:

WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

Epoch 1 Batch 0 Loss 4.173541069030762
Epoch 1 Batch 100 Loss 2.3451342582702637
Epoch 1 Loss 2.1603
Time taken for 1 epoch 6.5293896198272705 sec

Epoch 2 Batch 0 Loss 2.1137943267822266
Epoch 2 Batch 100 Loss 1.9266924858093262
Epoch 2 Loss 1.7417
Time taken for 1 epoch 5.6192779541015625 sec

Epoch 3 Batch 0 Loss 1.775771975517273
Epoch 3 Batch 100 Loss 1.657868504524231
Epoch 3 Loss 1.5520
Time taken for 1 epoch 5.231291770935059 sec
