TensorFlow 2.0 Guide Official Tutorial Study Notes 8: Keras custom callbacks

These notes follow the official TensorFlow tutorial; they are mainly a translation and structural reorganization of the 'Keras custom callbacks' guide. Original link: Keras custom callbacks



A custom callback is a powerful tool to customize the behavior of a Keras model during training, evaluation, or inference, including reading or changing the Keras model.
Examples include tf.keras.callbacks.TensorBoard, where the training progress and results can be exported and visualized with TensorBoard, and tf.keras.callbacks.ModelCheckpoint, which automatically saves the model during training, among others. In this guide, we will learn what a Keras callback is, when it is called, what it can do, and how to build our own. Toward the end of the guide, we will demonstrate how to create a couple of simple callback applications to get started with custom callbacks.

Setup

from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

1. Introduction to Keras callbacks

In Keras, 'Callback' is a Python class meant to be subclassed to provide specific functionality, with a set of methods called at various stages of training (including the beginning and end of each batch/epoch), testing, and predicting. Callbacks are useful for observing the internal state and statistics of the model during training. We can pass a list of callbacks (as the keyword argument callbacks) to any of the tf.keras.Model.fit(), tf.keras.Model.evaluate(), and tf.keras.Model.predict() methods. The callback methods are then invoked at the relevant stages of training/evaluation/inference. First, let's define a simple sequential Keras model:

# Define the Keras model to add callbacks to
def get_model():
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Dense(1, activation = 'linear', input_dim = 784))
  model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=0.1), loss='mean_squared_error', metrics=['mae'])
  return model

Then, load the MNIST data from the Keras datasets API for training and testing:

# Load example MNIST data and pre-process it
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

Now, define a simple custom callback to track the beginning and end of every batch of data. During those calls, it prints the index of the current batch.

import datetime

class MyCustomCallback(tf.keras.callbacks.Callback):

  def on_train_batch_begin(self, batch, logs=None):
    print('Training: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

  def on_train_batch_end(self, batch, logs=None):
    print('Training: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))

  def on_test_batch_begin(self, batch, logs=None):
    print('Evaluating: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

  def on_test_batch_end(self, batch, logs=None):
    print('Evaluating: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))

Providing a callback to model methods such as tf.keras.Model.fit() ensures the callback methods are called at those stages:

model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          epochs=1,
          steps_per_epoch=5,
          verbose=0,
          callbacks=[MyCustomCallback()])
Training: batch 0 begins at 03:11:47.415648
Training: batch 0 ends at 03:11:47.668073
Training: batch 1 begins at 03:11:47.668488
Training: batch 1 ends at 03:11:47.670361
Training: batch 2 begins at 03:11:47.670654
Training: batch 2 ends at 03:11:47.672033
Training: batch 3 begins at 03:11:47.672268
Training: batch 3 ends at 03:11:47.673521
Training: batch 4 begins at 03:11:47.674124
Training: batch 4 ends at 03:11:47.675451

2. Model methods that accept callbacks

We can provide a list of callbacks to the following tf.keras.Model methods:

  • fit(), fit_generator(): Trains the model for a fixed number of epochs (iterations over a dataset, or data yielded batch-by-batch by a Python generator).
  • evaluate(), evaluate_generator(): Evaluates the model for given data or a data generator. Outputs the loss and metric values from the evaluation.
  • predict(), predict_generator(): Generates output predictions for the input data or a data generator (a predict() sketch follows the evaluation output below).
_ = model.evaluate(x_test, y_test, batch_size=128, verbose=0, steps=5,
          callbacks=[MyCustomCallback()])
Evaluating: batch 0 begins at 02:22:20.009467
Evaluating: batch 0 ends at 02:22:20.064032
Evaluating: batch 1 begins at 02:22:20.064414
Evaluating: batch 1 ends at 02:22:20.066675
Evaluating: batch 2 begins at 02:22:20.067058
Evaluating: batch 2 ends at 02:22:20.068866
Evaluating: batch 3 begins at 02:22:20.069116
Evaluating: batch 3 ends at 02:22:20.071007
Evaluating: batch 4 begins at 02:22:20.071241
Evaluating: batch 4 ends at 02:22:20.073058
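
For completeness, here is a minimal sketch of attaching a callback to predict() as well. MyPredictCallback is a hypothetical name, not from the original guide; it simply follows the same pattern as MyCustomCallback above:

class MyPredictCallback(tf.keras.callbacks.Callback):

  def on_predict_batch_begin(self, batch, logs=None):
    print('Predicting: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

  def on_predict_batch_end(self, batch, logs=None):
    print('Predicting: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))

_ = model.predict(x_test, batch_size=128, steps=5,
          callbacks=[MyPredictCallback()])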

3. An overview of callback methods

3.1 Common methods for training/testing/predicting

  • For training, testing, and predicting, the following methods are provided to be overridden (a skeleton sketch follows this list):
    on_(train|test|predict)_begin(self, logs=None): Called at the beginning of fit/evaluate/predict.
    on_(train|test|predict)_end(self, logs=None): Called at the end of fit/evaluate/predict.
    on_(train|test|predict)_batch_begin(self, batch, logs=None): Called right before processing a batch during training/testing/predicting. Within this method, logs is a dict with batch and size available keys, representing the current batch number and the size of the batch.
    on_(train|test|predict)_batch_end(self, batch, logs=None): Called at the end of training/testing/predicting a batch. Within this method, logs is a dict containing the stateful metric results.
  • In addition, the following training-specific methods are provided:
    on_epoch_begin(self, epoch, logs=None): Called at the beginning of an epoch during training.
    on_epoch_end(self, epoch, logs=None): Called at the end of an epoch during training.
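
As a quick reference, here is a minimal skeleton (the class name is hypothetical, not from the original guide) that overrides the training-level hooks listed above; the test/predict variants follow the same pattern:

class HookOverviewCallback(tf.keras.callbacks.Callback):

  def on_train_begin(self, logs=None):
    print('fit() begins')

  def on_epoch_begin(self, epoch, logs=None):
    print('epoch {} begins'.format(epoch))

  def on_train_batch_begin(self, batch, logs=None):
    # Here logs holds the 'batch' and 'size' keys.
    print('batch {} begins'.format(batch))

  def on_train_batch_end(self, batch, logs=None):
    # Here logs holds the stateful metric results, e.g. logs['loss'].
    print('batch {} ends'.format(batch))

  def on_epoch_end(self, epoch, logs=None):
    print('epoch {} ends'.format(epoch))

  def on_train_end(self, logs=None):
    print('fit() ends')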

3.2 Usage of the logs dict

The logs dict contains the loss value, along with all the metrics, at the end of a batch or epoch. The example below prints the loss and mean absolute error; a small sketch for inspecting the available keys follows the evaluation output further down:

class LossAndErrorPrintingCallback(tf.keras.callbacks.Callback):

  def on_train_batch_end(self, batch, logs=None):
    print('For batch {}, loss is {:7.2f}.'.format(batch, logs['loss']))

  def on_test_batch_end(self, batch, logs=None):
    print('For batch {}, loss is {:7.2f}.'.format(batch, logs['loss']))

  def on_epoch_end(self, epoch, logs=None):
    print('The average loss for epoch {} is {:7.2f} and mean absolute error is {:7.2f}.'.format(epoch, logs['loss'], logs['mae']))

model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          steps_per_epoch=5,
          epochs=3,
          verbose=0,
          callbacks=[LossAndErrorPrintingCallback()])
For batch 0, loss is   30.73.
For batch 1, loss is  939.75.
For batch 2, loss is   15.50.
For batch 3, loss is    6.81.
For batch 4, loss is    8.70.
The average loss for epoch 0 is  200.30 and mean absolute error is    8.28.
For batch 0, loss is    6.61.
For batch 1, loss is    8.99.
For batch 2, loss is    5.07.
For batch 3, loss is    5.24.
For batch 4, loss is    5.22.
The average loss for epoch 1 is    6.23 and mean absolute error is    2.02.
For batch 0, loss is    4.12.
For batch 1, loss is    4.68.
For batch 2, loss is    4.23.
For batch 3, loss is    6.00.
For batch 4, loss is    3.84.
The average loss for epoch 2 is    4.57 and mean absolute error is    1.74.

Similarly, we can provide a callback in the evaluate() call:

_ = model.evaluate(x_test, y_test, batch_size=128, verbose=0, steps=20,
          callbacks=[LossAndErrorPrintingCallback()])
For batch 0, loss is    4.78.
For batch 1, loss is    3.78.
For batch 2, loss is    4.75.
For batch 3, loss is    4.55.
For batch 4, loss is    5.20.
For batch 5, loss is    4.22.
For batch 6, loss is    4.17.
For batch 7, loss is    4.30.
For batch 8, loss is    4.65.
For batch 9, loss is    5.68.
For batch 10, loss is    4.91.
For batch 11, loss is    5.17.
For batch 12, loss is    5.48.
For batch 13, loss is    6.97.
For batch 14, loss is    4.72.
For batch 15, loss is    4.45.
For batch 16, loss is    5.85.
For batch 17, loss is    6.13.
For batch 18, loss is    5.88.
For batch 19, loss is    4.24.
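
To check exactly which keys the logs dict carries, a throwaway sketch (hypothetical, not part of the original guide) can simply print them:

class LogKeysCallback(tf.keras.callbacks.Callback):

  def on_epoch_end(self, epoch, logs=None):
    # For the model above this typically prints ['loss', 'mae'].
    print('Epoch {} log keys: {}'.format(epoch, sorted((logs or {}).keys())))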

4. Examples of Keras callback applications

4.1 Early stopping at minimum loss

The first example shows how to create a Callback that stops Keras training when the minimum of the loss has been reached, by mutating the attribute model.stop_training (boolean). Optionally, a patience argument can be provided to specify how many epochs the training should wait before it eventually stops.
tf.keras.callbacks.EarlyStopping provides a more complete and general implementation; a minimal usage sketch follows, and the custom implementation after it builds the same behavior from scratch.
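
A minimal sketch of the built-in callback (monitoring the training loss here, since this example uses no validation data; the parameter values are illustrative):

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='loss',             # watch the training loss
    patience=2,                 # epochs with no improvement before stopping
    restore_best_weights=True)  # roll back to the best weights seen
# Usage: model.fit(..., callbacks=[early_stop])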

import numpy as np

class EarlyStoppingAtMinLoss(tf.keras.callbacks.Callback):
  """Stop training when the loss is at its min, i.e. the loss stops decreasing.

  Arguments:
      patience: Number of epochs to wait after min has been hit. After this
      number of epochs with no improvement, training stops.
  """

  def __init__(self, patience=0):
    super(EarlyStoppingAtMinLoss, self).__init__()

    self.patience = patience

    # best_weights to store the weights at which the minimum loss occurs.
    self.best_weights = None

  def on_train_begin(self, logs=None):
    # The number of epochs it has waited while the loss is no longer at a minimum.
    self.wait = 0
    # The epoch the training stops at.
    self.stopped_epoch = 0
    # Initialize the best as infinity.
    self.best = np.Inf

  def on_epoch_end(self, epoch, logs=None):
    current = logs.get('loss')
    if np.less(current, self.best):
      self.best = current
      self.wait = 0
      # Record the best weights if the current result is better (lower).
      self.best_weights = self.model.get_weights()
    else:
      self.wait += 1
      if self.wait >= self.patience:
        self.stopped_epoch = epoch
        self.model.stop_training = True
        print('Restoring model weights from the end of the best epoch.')
        self.model.set_weights(self.best_weights)

  def on_train_end(self, logs=None):
    if self.stopped_epoch > 0:
      print('Epoch %05d: early stopping' % (self.stopped_epoch + 1))

model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          steps_per_epoch=5,
          epochs=30,
          verbose=0,
          callbacks=[LossAndErrorPrintingCallback(), EarlyStoppingAtMinLoss()])
For batch 0, loss is   28.80.
For batch 1, loss is 1001.38.
For batch 2, loss is   22.58.
For batch 3, loss is    9.72.
For batch 4, loss is    6.21.
The average loss for epoch 0 is  213.74 and mean absolute error is    8.47.
For batch 0, loss is    7.74.
For batch 1, loss is    6.69.
For batch 2, loss is    7.12.
For batch 3, loss is    7.07.
For batch 4, loss is    5.87.
The average loss for epoch 1 is    6.90 and mean absolute error is    2.19.
For batch 0, loss is    3.92.
For batch 1, loss is    6.67.
For batch 2, loss is    6.48.
For batch 3, loss is    4.78.
For batch 4, loss is    4.78.
The average loss for epoch 2 is    5.33 and mean absolute error is    1.90.
For batch 0, loss is    6.69.
For batch 1, loss is    5.20.
For batch 2, loss is    9.49.
For batch 3, loss is   11.50.
For batch 4, loss is   22.47.
The average loss for epoch 3 is   11.07 and mean absolute error is    2.67.
Restoring model weights from the end of the best epoch.
Epoch 00004: early stopping

4.2 Learning rate scheduling

One thing that is often done during model training is changing the learning rate as more epochs have passed. The Keras backend exposes the get_value and set_value APIs, which can be used to read and update variables such as the optimizer's learning rate. In this example, we show how a custom callback can be used to dynamically change the learning rate.

	Note: this is just one example implementation; see tf.keras.callbacks.LearningRateScheduler and tf.keras.optimizers.schedules for more general implementations (a minimal sketch of these built-in alternatives appears at the end of this subsection).

class LearningRateScheduler(tf.keras.callbacks.Callback):
  """Learning rate scheduler which sets the learning rate according to schedule.

  Arguments:
      schedule: a function that takes an epoch index
          (integer, indexed from 0) and current learning rate
          as inputs and returns a new learning rate as output (float).
  """

  def __init__(self, schedule):
    super(LearningRateScheduler, self).__init__()
    self.schedule = schedule

  def on_epoch_begin(self, epoch, logs=None):
    if not hasattr(self.model.optimizer, 'lr'):
      raise ValueError('Optimizer must have a "lr" attribute.')
    # Get the current learning rate from model's optimizer.
    lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))
    # Call schedule function to get the scheduled learning rate.
    scheduled_lr = self.schedule(epoch, lr)
    # Set the value back to the optimizer before this epoch starts
    tf.keras.backend.set_value(self.model.optimizer.lr, scheduled_lr)
    print('\nEpoch %05d: Learning rate is %6.4f.' % (epoch, scheduled_lr))

LR_SCHEDULE = [
    # (epoch to start, learning rate) tuples
    (3, 0.05), (6, 0.01), (9, 0.005), (12, 0.001)
]

def lr_schedule(epoch, lr):
  """Helper function to retrieve the scheduled learning rate based on epoch."""
  if epoch < LR_SCHEDULE[0][0] or epoch > LR_SCHEDULE[-1][0]:
    return lr
  for i in range(len(LR_SCHEDULE)):
    if epoch == LR_SCHEDULE[i][0]:
      return LR_SCHEDULE[i][1]
  return lr

model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          steps_per_epoch=5,
          epochs=15,
          verbose=0,
          callbacks=[LossAndErrorPrintingCallback(), LearningRateScheduler(lr_schedule)])
Epoch 00000: Learning rate is 0.1000.
For batch 0, loss is   39.11.
For batch 1, loss is  966.38.
For batch 2, loss is   27.87.
For batch 3, loss is    8.64.
For batch 4, loss is    6.31.
The average loss for epoch 0 is  209.66 and mean absolute error is    8.70.

Epoch 00001: Learning rate is 0.1000.
For batch 0, loss is    7.46.
For batch 1, loss is    5.93.
For batch 2, loss is    5.63.
For batch 3, loss is    4.83.
For batch 4, loss is    5.30.
The average loss for epoch 1 is    5.83 and mean absolute error is    2.00.

Epoch 00002: Learning rate is 0.1000.
For batch 0, loss is    4.68.
For batch 1, loss is    4.47.
For batch 2, loss is    3.88.
For batch 3, loss is    4.74.
For batch 4, loss is    4.48.
The average loss for epoch 2 is    4.45 and mean absolute error is    1.77.

Epoch 00003: Learning rate is 0.0500.
For batch 0, loss is    4.65.
For batch 1, loss is    3.85.
For batch 2, loss is    4.80.
For batch 3, loss is    5.23.
For batch 4, loss is    3.99.
The average loss for epoch 3 is    4.50 and mean absolute error is    1.74.

Epoch 00004: Learning rate is 0.0500.
For batch 0, loss is    4.68.
For batch 1, loss is    4.03.
For batch 2, loss is    4.64.
For batch 3, loss is    4.05.
For batch 4, loss is    4.36.
The average loss for epoch 4 is    4.35 and mean absolute error is    1.70.

Epoch 00005: Learning rate is 0.0500.
For batch 0, loss is    3.50.
For batch 1, loss is    3.80.
For batch 2, loss is    3.28.
For batch 3, loss is    4.86.
For batch 4, loss is    5.18.
The average loss for epoch 5 is    4.12 and mean absolute error is    1.56.

Epoch 00006: Learning rate is 0.0100.
For batch 0, loss is    5.42.
For batch 1, loss is    4.01.
For batch 2, loss is    3.34.
For batch 3, loss is    4.99.
For batch 4, loss is    5.75.
The average loss for epoch 6 is    4.70 and mean absolute error is    1.67.

Epoch 00007: Learning rate is 0.0100.
For batch 0, loss is    4.00.
For batch 1, loss is    4.75.
For batch 2, loss is    4.30.
For batch 3, loss is    5.19.
For batch 4, loss is    4.09.
The average loss for epoch 7 is    4.47 and mean absolute error is    1.71.

Epoch 00008: Learning rate is 0.0100.
For batch 0, loss is    3.30.
For batch 1, loss is    4.52.
For batch 2, loss is    3.56.
For batch 3, loss is    5.18.
For batch 4, loss is    4.35.
The average loss for epoch 8 is    4.18 and mean absolute error is    1.62.

Epoch 00009: Learning rate is 0.0050.
For batch 0, loss is    4.80.
For batch 1, loss is    3.53.
For batch 2, loss is    5.62.
For batch 3, loss is    4.65.
For batch 4, loss is    4.41.
The average loss for epoch 9 is    4.60 and mean absolute error is    1.70.

Epoch 00010: Learning rate is 0.0050.
For batch 0, loss is    3.98.
For batch 1, loss is    4.85.
For batch 2, loss is    4.38.
For batch 3, loss is    4.33.
For batch 4, loss is    5.79.
The average loss for epoch 10 is    4.67 and mean absolute error is    1.72.

Epoch 00011: Learning rate is 0.0050.
For batch 0, loss is    2.78.
For batch 1, loss is    3.62.
For batch 2, loss is    3.90.
For batch 3, loss is    5.34.
For batch 4, loss is    3.60.
The average loss for epoch 11 is    3.85 and mean absolute error is    1.56.

Epoch 00012: Learning rate is 0.0010.
For batch 0, loss is    4.60.
For batch 1, loss is    4.38.
For batch 2, loss is    4.02.
For batch 3, loss is    4.27.
For batch 4, loss is    3.36.
The average loss for epoch 12 is    4.13 and mean absolute error is    1.62.

Epoch 00013: Learning rate is 0.0010.
For batch 0, loss is    4.19.
For batch 1, loss is    3.89.
For batch 2, loss is    3.31.
For batch 3, loss is    3.50.
For batch 4, loss is    3.14.
The average loss for epoch 13 is    3.60 and mean absolute error is    1.50.

Epoch 00014: Learning rate is 0.0010.
For batch 0, loss is    4.57.
For batch 1, loss is    4.86.
For batch 2, loss is    5.10.
For batch 3, loss is    3.95.
For batch 4, loss is    3.71.
The average loss for epoch 14 is    4.44 and mean absolute error is    1.66.
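
As promised in the note above, here is a minimal sketch of the built-in alternatives (the decay parameters are illustrative assumptions, not values from the original guide):

# Option 1: the built-in callback, reusing the lr_schedule helper above.
model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          steps_per_epoch=5,
          epochs=15,
          verbose=0,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])

# Option 2: a schedule object attached directly to the optimizer,
# so no callback is needed at all.
decay_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=5, decay_rate=0.5)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', input_dim=784))
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=decay_schedule),
              loss='mean_squared_error', metrics=['mae'])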

4.3 Standard Keras callbacks

Be sure to check out the existing Keras callbacks by reading the API documentation. Applications include logging to CSV, saving the model, visualizing on TensorBoard, and much more; a short sketch combining several of them follows.
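
A minimal sketch wiring a few of the standard callbacks together (the file paths are illustrative assumptions):

callbacks = [
    # Append the loss/metrics to a CSV file after every epoch.
    tf.keras.callbacks.CSVLogger('training_log.csv'),
    # Save the model whenever the monitored loss improves.
    tf.keras.callbacks.ModelCheckpoint('model_checkpoint.h5',
                                       monitor='loss', save_best_only=True),
    # Write event files that TensorBoard can visualize.
    tf.keras.callbacks.TensorBoard(log_dir='./logs'),
]

model = get_model()
_ = model.fit(x_train, y_train,
          batch_size=64,
          steps_per_epoch=5,
          epochs=3,
          verbose=0,
          callbacks=callbacks)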
