TensorFlow Getting Started Notes (3): Saving & Loading Models

PS: 1. These notes are a memo of my own TensorFlow learning process. I am still a beginner, so mistakes are likely; corrections are welcome, thanks.
2. The code in this post follows Google's official TensorFlow getting-started tutorial.
3. tf.keras is used to build and train the models.

1. Saving model parameters during training (Checkpoints)

This section uses the ModelCheckpoint callback provided by Keras to save the model's weights during training. A fresh, untrained model is then created; on the test set its accuracy is only about 10.5%. After the saved weights are loaded, the model is evaluated on the test set again and reaches an accuracy of about 87.2%.

A callback is a set of functions applied at given stages of the training procedure. You can use callbacks to inspect the internal state and statistics of the model during training, and to save the model both during training and when training finishes.

Reference: https://keras.io/zh/callbacks/#_1u
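As a concrete illustration of the callback mechanism (a minimal sketch, not part of the tutorial code), a custom callback that prints the validation loss at the end of each epoch could look like this:

import tensorflow as tf

# Illustrative custom callback: logs val_loss after every epoch
class ValLossLogger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print("epoch {:d}: val_loss = {:.4f}".format(
            epoch + 1, logs.get("val_loss", float("nan"))))

# It would be passed to fit() in the same way as ModelCheckpoint below:
# model.fit(..., callbacks=[ValLossLogger()])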

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training and test sets)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten the images and normalize pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
      # Fully connected layer with ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
      # Dropout layer to reduce overfitting and improve generalization;
      # each input unit is dropped with probability 0.2
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()




# Save the model during training (as checkpoints)
# Path and name of the checkpoint file
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
# ModelCheckpoint: saves the model after every epoch
# filepath: path of the checkpoint file
# save_weights_only=True: save only the weights, not the full model
# verbose=1: print a message each time a checkpoint is written
# period: interval (in epochs) between checkpoints (not used here; see section 2)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images,test_labels),
          callbacks=[cp_callback])  # pass the checkpoint callback


# Create a fresh, untrained model instance
model = create_model()
# Evaluate the untrained model
loss, acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
# Load the saved weights
model.load_weights(checkpoint_path)
# Re-evaluate the model
loss,acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
 1/32 [..............................] - ETA: 0s - loss: 2.3353 - accuracy: 0.1562
Epoch 00001: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 18ms/step - loss: 1.2178 - accuracy: 0.6480 - val_loss: 0.7362 - val_accuracy: 0.7850
Epoch 2/10
 1/32 [..............................] - ETA: 0s - loss: 0.3494 - accuracy: 0.9375
Epoch 00002: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.4409 - accuracy: 0.8740 - val_loss: 0.5288 - val_accuracy: 0.8410
Epoch 3/10
14/32 [============>.................] - ETA: 0s - loss: 0.3190 - accuracy: 0.9219
Epoch 00003: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.2917 - accuracy: 0.9300 - val_loss: 0.4958 - val_accuracy: 0.8490
Epoch 4/10
 1/32 [..............................] - ETA: 0s - loss: 0.1519 - accuracy: 0.9688
Epoch 00004: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.2089 - accuracy: 0.9540 - val_loss: 0.4435 - val_accuracy: 0.8530
Epoch 5/10
 1/32 [..............................] - ETA: 0s - loss: 0.0919 - accuracy: 1.0000
Epoch 00005: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.1563 - accuracy: 0.9630 - val_loss: 0.4257 - val_accuracy: 0.8560
Epoch 6/10
30/32 [===========================>..] - ETA: 0s - loss: 0.1299 - accuracy: 0.9760
Epoch 00006: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 17ms/step - loss: 0.1316 - accuracy: 0.9760 - val_loss: 0.4221 - val_accuracy: 0.8630
Epoch 7/10
31/32 [============================>.] - ETA: 0s - loss: 0.0900 - accuracy: 0.9829
Epoch 00007: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 14ms/step - loss: 0.0896 - accuracy: 0.9830 - val_loss: 0.4172 - val_accuracy: 0.8740
Epoch 8/10
30/32 [===========================>..] - ETA: 0s - loss: 0.0660 - accuracy: 0.9917
Epoch 00008: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 18ms/step - loss: 0.0658 - accuracy: 0.9920 - val_loss: 0.4227 - val_accuracy: 0.8680
Epoch 9/10
31/32 [============================>.] - ETA: 0s - loss: 0.0495 - accuracy: 0.9980
Epoch 00009: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.0494 - accuracy: 0.9980 - val_loss: 0.4176 - val_accuracy: 0.8650
Epoch 10/10
 1/32 [..............................] - ETA: 0s - loss: 0.0210 - accuracy: 1.0000
Epoch 00010: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 17ms/step - loss: 0.0382 - accuracy: 0.9970 - val_loss: 0.4103 - val_accuracy: 0.8720
Evaluating the untrained model:
32/32 - 0s - loss: 2.3249 - accuracy: 0.1050
Untrained model, accuracy: 10.50%
Re-evaluating after loading the weights:
32/32 - 0s - loss: 0.4103 - accuracy: 0.8720
Restored model, accuracy: 87.20%

2. Saving checkpoints at a fixed frequency

In addition, you can save multiple checkpoints with unique file names at a fixed epoch interval.
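The example below uses the ModelCheckpoint argument period=5 to save every 5 epochs. Note that period is deprecated in recent TensorFlow releases in favor of save_freq, which counts batches rather than epochs; a rough equivalent (a sketch, assuming the default batch size of 32 and the 1000-sample training set used here) would be:

import math
import tensorflow as tf

batch_size = 32
n_batches_per_epoch = math.ceil(1000 / batch_size)  # 32 batches per epoch for 1000 samples

# Save the weights every 5 epochs, expressed as a number of batches
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="training_2/cp-{epoch:04d}.ckpt",
    verbose=1,
    save_weights_only=True,
    save_freq=5 * n_batches_per_epoch)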

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training and test sets)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten the images and normalize pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
      # Fully connected layer with ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
      # Dropout layer to reduce overfitting and improve generalization;
      # each input unit is dropped with probability 0.2
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()


# Save checkpoints during training, including the epoch number
# in the file name (using `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights every 5 epochs
# filepath: path template for the checkpoint files
# save_weights_only=True: save only the weights, not the full model
# verbose=1: print a message each time a checkpoint is written
# period: interval (in epochs) between checkpoints
#         (deprecated in newer TensorFlow releases in favor of save_freq)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)


# Save the weights once before training, using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=50,
          callbacks=[cp_callback],
          validation_data=(test_images,test_labels),
          verbose=0)
# Now look at the generated checkpoints and pick the latest one;
# this is done with tf.train.latest_checkpoint()
latest = tf.train.latest_checkpoint(checkpoint_dir)
print('latest:',latest)
# To use a checkpoint saved at a different epoch, refer to its file name directly, e.g.:
FirstCheckPoint='training_2/cp-0000.ckpt'

# Verify the checkpoint by loading it into a new model

# Create a new model instance
model = create_model()
# Load the previously saved weights
model.load_weights(latest)
# Re-evaluate the model
loss, acc = model.evaluate(test_images,  test_labels, verbose=2)  # show the result
print("Restored model, accuracy: {:5.2f}%".format(100*acc))  # print the accuracy

Output:

Epoch 00005: saving model to training_2/cp-0005.ckpt

Epoch 00010: saving model to training_2/cp-0010.ckpt

Epoch 00015: saving model to training_2/cp-0015.ckpt

Epoch 00020: saving model to training_2/cp-0020.ckpt

Epoch 00025: saving model to training_2/cp-0025.ckpt

Epoch 00030: saving model to training_2/cp-0030.ckpt

Epoch 00035: saving model to training_2/cp-0035.ckpt

Epoch 00040: saving model to training_2/cp-0040.ckpt

Epoch 00045: saving model to training_2/cp-0045.ckpt

Epoch 00050: saving model to training_2/cp-0050.ckpt
latest: training_2\cp-0050.ckpt
32/32 - 0s - loss: 0.4845 - accuracy: 0.8740
Restored model, accuracy: 87.40%

Saved files (screenshot of the training_2/ directory omitted)

The code above stores the weights in a collection of checkpoint-formatted files that contain only the trained weights in a binary format. A checkpoint consists of:

One or more shards that contain the model's weights.
An index file that indicates which weights are stored in which shard.

If you are only training a model on a single machine, you will have one shard with the suffix .data-00000-of-00001.
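To see what these files look like, you can list the checkpoint directory and inspect the variables stored in a checkpoint (a small sketch; tf.train.list_variables is part of the public TensorFlow API):

import os
import tensorflow as tf

checkpoint_dir = "training_2"
# Files written by the ModelCheckpoint callback, e.g.
# 'checkpoint', 'cp-0005.ckpt.index', 'cp-0005.ckpt.data-00000-of-00001', ...
print(sorted(os.listdir(checkpoint_dir)))

# Names and shapes of the variables stored in the latest checkpoint
latest = tf.train.latest_checkpoint(checkpoint_dir)
for name, shape in tf.train.list_variables(latest):
    print(name, shape)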

3. Saving weights manually

You have seen how to load saved weights into a model; saving them manually is just as simple with the Model.save_weights method. By default, tf.keras, and save_weights in particular, uses the TensorFlow checkpoint format with a .ckpt extension (saving in HDF5 with a .h5 extension is covered in the guide on saving and serializing models):
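As a small sketch of the two formats (the file names here are just examples): the file extension, or an explicit save_format argument, decides whether save_weights writes a TensorFlow checkpoint or an HDF5 file:

# TensorFlow checkpoint format (the default): writes my_checkpoint.index
# and my_checkpoint.data-00000-of-00001 files
model.save_weights('./checkpoints/my_checkpoint')

# HDF5 format: a single file, selected by the .h5 extension
# (requires the h5py package)
model.save_weights('./checkpoints/my_weights.h5')
# or equivalently, with an explicit format argument
model.save_weights('./checkpoints/my_weights', save_format='h5')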

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training and test sets)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten the images and normalize pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
      # Fully connected layer with ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
      # Dropout layer to reduce overfitting and improve generalization;
      # each input unit is dropped with probability 0.2
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()


# Save the model during training (as checkpoints)
# Path and name of the checkpoint file
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
# ModelCheckpoint: saves the model after every epoch
# filepath: path of the checkpoint file
# save_weights_only=True: save only the weights, not the full model
# verbose=1: print a message each time a checkpoint is written
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images,test_labels),
          callbacks=[cp_callback])  # pass the checkpoint callback


# Save the weights manually
# save_weights saves all layer weights
model.save_weights('./checkpoints/my_checkpoint')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')

# Evaluate the model
loss,acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Output:

Restored model, accuracy: 86.30%

4. Saving the entire model in HDF5 format

The code here also uses early stopping to prevent overfitting.
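EarlyStopping also accepts a restore_best_weights argument; setting it to True rolls the model back to the weights of the best epoch instead of keeping the weights from the epoch at which training stopped. A variant of the callback used below (a sketch, assuming the same imports as the code that follows):

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',         # quantity to watch
    patience=10,                # stop after 10 epochs without improvement
    restore_best_weights=True)  # roll back to the best epoch's weights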

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training and test sets)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten the images and normalize pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
      # Fully connected layer with ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
      # Dropout layer to reduce overfitting and improve generalization;
      # each input unit is dropped with probability 0.2
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model

# Save in HDF5 format
# Create and train a new model instance
model = create_model()

# The patience parameter is the number of epochs to wait for an improvement.
# The EarlyStopping callback checks the training condition after every epoch
# and stops training automatically if val_loss has not improved for `patience` epochs.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
# Train the model (verbose=1 prints the progress log)
model.fit(train_images, train_labels, epochs=50,
          validation_split = 0.2, verbose=1, callbacks=[early_stop])

#model.fit(train_images, train_labels, epochs=20)

# Save the entire model to an HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')

# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('my_model.h5')

# Show the model architecture
new_model.summary()
# Check the accuracy of the restored model
loss, acc = new_model.evaluate(test_images,  test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))

Output:

Epoch 16/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0124 - accuracy: 1.0000 - val_loss: 0.5265 - val_accuracy: 0.8650
Epoch 17/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0124 - accuracy: 1.0000 - val_loss: 0.5186 - val_accuracy: 0.8750
Epoch 18/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0110 - accuracy: 1.0000 - val_loss: 0.5418 - val_accuracy: 0.8750
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
32/32 - 0s - loss: 0.4602 - accuracy: 0.8680
Restored model, accuracy: 86.80%
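Because the .h5 file stores the architecture, the weights, and the optimizer state, the restored model comes back already compiled and can continue training where it left off (a minimal sketch reusing the variables from the code above):

# Reload the full model (architecture + weights + optimizer state)
new_model = tf.keras.models.load_model('my_model.h5')

# Training can resume directly, without calling compile() again
new_model.fit(train_images, train_labels, epochs=2, validation_split=0.2)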

Reference: Introduction to HDF5 data files

5. Saving the entire model in the SavedModel format

The SavedModel format is another way to serialize models. Models saved in this format can be restored with tf.keras.models.load_model and are compatible with TensorFlow Serving. The SavedModel guide goes into detail about how to serve and inspect a SavedModel. The section below illustrates the steps to save and restore the model.
A SavedModel is a directory containing a protobuf binary and a TensorFlow checkpoint.
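Besides tf.keras.models.load_model, the directory can also be loaded with the lower-level tf.saved_model.load API, for example to inspect the serving signatures exported for TensorFlow Serving (a brief sketch; 'serving_default' is the signature Keras normally exports):

import tensorflow as tf

# Load the SavedModel directory without rebuilding the Keras object
loaded = tf.saved_model.load('saved_model/my_model')

# Signatures exported for serving; typically ['serving_default']
print(list(loaded.signatures.keys()))

# The signature is a callable that expects tensors matching the model's input spec
infer = loaded.signatures['serving_default']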

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training and test sets)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten the images and normalize pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
      # Fully connected layer with ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
      # Dropout layer to reduce overfitting and improve generalization;
      # each input unit is dropped with probability 0.2
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
  
# Save in the SavedModel format
# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)

# Save the entire model as a SavedModel.
model.save('saved_model/my_model')
# Reload a fresh Keras model from the saved model:
new_model = tf.keras.models.load_model('saved_model/my_model')
# Check its architecture
new_model.summary()
# Evaluate the restored model
loss, acc = new_model.evaluate(test_images,  test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
print(new_model.predict(test_images).shape)

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
32/32 - 0s - loss: 0.4319 - accuracy: 0.8610
Restored model, accuracy: 86.10%
(1000, 10)

Generated file structure (screenshot omitted): the saved_model/my_model directory contains the saved_model.pb protobuf file plus assets/ and variables/ subdirectories.
