[TensorFlow Basics] Keras Machine Learning Fundamentals

Contents

1. Basic image classification: classifying clothing images
   1.1 Import the dataset
   1.2 Explore the data
   1.3 Preprocess the data
   1.4 Build the model
   1.5 Train the model
   1.6 Make predictions with the model
2. Basic text classification: classifying movie reviews
   2.1 Import the dataset
   2.2 Explore the data
   2.3 Preprocess the data
   2.4 Build the model
   2.5 Train the model
3. Text classification with TensorFlow Hub
4. Regression: predicting fuel efficiency
   4.1 Import the dataset
   4.2 Explore the data
   4.3 Preprocess the data
   4.4 Build the model
   4.5 Train the model
   4.6 Make predictions with the model
5. Overfitting and underfitting
   5.1 Setup
   5.2 The Higgs dataset
   5.3 Demonstrate overfitting
   5.4 Strategies to prevent overfitting
6. Save and load models
   1. Save the model during training (using checkpoints)
   2. Manually save weights with model.save_weights
   3. Save the entire model with model.save
7. Hyperparameter tuning with the Keras Tuner


1. Basic image classification: classifying clothing images

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

1.1 Import the dataset

We use the Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28x28 pixels). Import and load the Fashion MNIST data directly from TensorFlow:

fashion_mnist = keras.datasets.fashion_mnist
# load_data returns four NumPy arrays
# train_images, train_labels: the training set, the data the model learns from
# test_images, test_labels: the test set, the data used to evaluate the model
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

The images are 28x28 NumPy arrays with pixel values ranging from 0 to 255. The labels are an array of integers ranging from 0 to 9.

# Class names corresponding to labels 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

1.2 Explore the data

# The training set contains 60,000 examples
train_images.shape
# Output: (60000, 28, 28)

len(train_labels)
# Output: 60000

train_labels
# Output: array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)

# The test set contains 10,000 examples
test_images.shape
# Output: (10000, 28, 28)

len(test_labels)
# Output: 10000

1.3 Preprocess the data

# Inspect the first image in the training set
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()

# Scale the pixel values to the 0~1 range
train_images = train_images / 255.0
test_images = test_images / 255.0

Check the data by displaying the first 25 images from the training set:

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

1.4 Build the model

# Set up the layers by chaining simple layers together
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])

tf.keras.layers.Flatten transforms the image format from a 2D array (28 x 28 pixels) to a 1D array (28 x 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. It has no parameters to learn; it only reformats the data.

tf.keras.layers.Dense is a densely connected (fully connected) neural layer. The first Dense layer has 128 nodes (neurons). The second (and last) layer returns a logits array of length 10. Each node contains a score indicating how strongly the current image belongs to one of the 10 classes.

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

  • Loss function (loss): measures how far the model's predictions are from the true labels during training. Training minimizes this value to steer the model in the right direction.
  • Optimizer (optimizer): determines how the model is updated based on the data it sees and its loss function.
  • Metrics (metrics): used to monitor and measure model performance during the training and testing steps.

1.5 Train the model

# Train the model
model.fit(train_images, train_labels, epochs=10)

# Evaluate accuracy
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

1.6 Make predictions with the model

The model has linear (logit) outputs. A softmax layer can be attached to convert the logits into a normalized probability vector.

# Prediction model (append a softmax layer to the trained model)
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

# Make predictions
predictions = probability_model.predict(test_images)

# Inspect the prediction result
predictions[0]
# array([7.7300720e-06, 3.1858748e-11, 3.0451045e-07, 2.7817364e-09,
#       1.3059016e-09, 3.1923674e-04, 3.9461247e-06, 1.5980251e-02,
#       5.8933104e-08, 9.8368847e-01], dtype=float32)

# The class with the highest confidence in the prediction
np.argmax(predictions[0])
# 9

You can visualize the predictions (the plot_image and plot_value_array helpers used below are sketched next):
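The original post uses plot_image and plot_value_array without defining them; a minimal sketch of these helpers, following the official Fashion MNIST tutorial, is given below (class_names, test_labels, and the imports come from the code above):

def plot_image(i, predictions_array, true_label, img):
  # Show the i-th image with its predicted label, confidence, and true label
  true_label, img = true_label[i], img[i]
  plt.grid(False); plt.xticks([]); plt.yticks([])
  plt.imshow(img, cmap=plt.cm.binary)
  predicted_label = np.argmax(predictions_array)
  color = 'blue' if predicted_label == true_label else 'red'
  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                       100*np.max(predictions_array),
                                       class_names[true_label]), color=color)

def plot_value_array(i, predictions_array, true_label):
  # Bar chart of the 10 class probabilities for the i-th example
  true_label = true_label[i]
  plt.grid(False); plt.xticks(range(10)); plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)
  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')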

# Plot the first X test images, their predicted labels, and the true labels.
# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions[i], test_labels, test_images)
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

# Use the trained model
# Grab an image from the test dataset.
img = test_images[1]

# Add the image to a batch where it's the only member.
img = (np.expand_dims(img,0))
print(img.shape)
# (1,28,28)

predictions_single = probability_model.predict(img)
print(predictions_single)
# [[3.0789899e-05 4.1240561e-12 9.9947554e-01 1.6958888e-09 3.4095356e-04 8.6128709e-14 1.5278466e-04 5.1959396e-17 5.5406429e-11 1.4665751e-13]]

np.argmax(predictions_single[0])
# 2

tf.keras models are optimized to make predictions on a batch (collection) of examples at once, so even when using a single image you need to add it to a batch first.


2. Basic text classification: classifying movie reviews

import tensorflow as tf
from tensorflow import keras

import numpy as np

2.1 Import the dataset

This is a binary classification problem using the IMDB dataset, which contains 50,000 movie review texts. The dataset is split into 25,000 reviews for training and 25,000 for testing. The training and test sets are balanced: each contains an equal number of positive and negative reviews.

# The num_words argument keeps the num_words most frequently occurring words in the training data
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

2.2 Explore the data

# Each example is an array of integers representing the words of a movie review
# Each label is 0 or 1: 0 means a negative review, 1 means a positive review
print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))
# Training entries: 25000, labels: 25000

# The review text has been converted to integers, each representing a word
print(train_data[0])
# [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

# Movie reviews have different lengths, so each example has a different length
len(train_data[0]), len(train_data[1])
# (218, 189)

# How to convert the integers back to words
# A dictionary mapping words to integer indices
word_index = imdb.get_word_index()

# The first indices are reserved
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

# Show the text of the first review
decode_review(train_data[0])
# "<START> this film ... <UNK> ... <UNK> ... <UNK> ... <UNK> ..."

2.3 Preprocess the data

The reviews (integer arrays / features) must be converted to tensors before being fed into the neural network. There are two ways to do this:

  • Multi-hot (one-hot style) encode the arrays: use 0/1 to mark whether each word appears, and feed the result into a Dense layer (which handles floating-point vector data) as the first layer of the network. The drawback is that this is memory intensive, requiring a num_words * num_reviews matrix. A small sketch of this approach follows the list.
  • Pad the arrays so all inputs have the same length, creating an integer tensor of shape max_length * num_reviews, and use an Embedding layer capable of handling this shape as the first layer of the network (the approach used here).
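For comparison only, a minimal sketch of the multi-hot approach (not used in the rest of this section); dimension=10000 matches the num_words value in the load_data call above:

import numpy as np

def multi_hot_encode(sequences, dimension=10000):
    # Build an all-zeros matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0   # set the positions of words that appear to 1
    return results

# Example: each review becomes a 10,000-dimensional 0/1 vector
# x_train = multi_hot_encode(train_data)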
# pad_sequences standardizes the review lengths
# Pad with "<PAD>", which is 0 in the integer arrays
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)

test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)

2.4 Build the model

# The input shape is the vocabulary size used for the movie reviews (10,000 words)
vocab_size = 10000

model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))

model.summary()
  1. Embedding layer: takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. They add a dimension to the output array, giving a shape of (batch, sequence, embedding), i.e. (reviews per batch, review length 256, 16).
  2. GlobalAveragePooling1D: returns a fixed-length output vector for each example by averaging over the sequence dimension. This lets the model handle variable-length input in the simplest possible way.
  3. Dense layer: the fixed-length output vector is piped through a fully connected layer with 16 hidden units.
  4. Dense layer: a single output node with a sigmoid activation outputs a float between 0 and 1, representing a probability or confidence level.

The Embedding layer represents each token as a numeric vector, which can reduce or increase dimensionality. Here it converts each input batch of shape batch * 256 (max_length) into data of shape batch * 256 * 16.

 * Pooling in GlobalAveragePooling1D means aggregating multiple values into one; common aggregations include AveragePooling and MaxPooling. GlobalAveragePooling1D returns a fixed-length output vector for each example by taking the mean over the sequence dimension; the sketch below illustrates the resulting shapes.
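A small sketch (not from the original post) showing the shapes produced by Embedding and GlobalAveragePooling1D on a toy batch:

import tensorflow as tf

# Two toy "reviews", each padded to length 4 (0 is the <PAD> index)
x = tf.constant([[1, 2, 3, 0],
                 [4, 5, 0, 0]])

emb = tf.keras.layers.Embedding(input_dim=10, output_dim=16)(x)   # shape (2, 4, 16)
pooled = tf.keras.layers.GlobalAveragePooling1D()(emb)            # shape (2, 16)
print(emb.shape, pooled.shape)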

# Configure the loss function and optimizer
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

'binary_crossentropy': since this is a binary classification problem and the model outputs a probability, the 'binary_crossentropy' loss function, which is well suited to probabilities, is used. A toy calculation is shown below.
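A quick numeric check (made-up numbers, not from the dataset) of what binary cross-entropy computes, -[y*log(p) + (1-y)*log(1-p)] averaged over examples:

import numpy as np

y_true = np.array([1.0, 0.0])      # true labels
y_pred = np.array([0.9, 0.2])      # predicted probabilities
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce)   # ~0.164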

2.5 Train the model

First create a validation set:

# Split a validation set off the training data
x_val = train_data[:10000]
partial_x_train = train_data[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]
# Train the model
# The history object holds a dictionary of everything that happened during training
history = model.fit(partial_x_train,                  # training examples
                    partial_y_train,                  # training labels
                    epochs=40,                        # number of training epochs
                    batch_size=512,                   # batch size
                    validation_data=(x_val, y_val),   # validation set
                    verbose=1)

# Evaluate the model
results = model.evaluate(test_data,  test_labels, verbose=2)

print(results)
# [0.32977813482284546, 0.8728799819946289]

The history object holds a dictionary with four entries: the training loss (loss) and accuracy (accuracy), and the validation loss (val_loss) and accuracy (val_accuracy).

# Inspect the history
history_dict = history.history
history_dict.keys()
# dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

# Plot how accuracy and loss change over time
import matplotlib.pyplot as plt

acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

# 'bo' stands for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# 'b' stands for a solid blue line
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

plt.clf()   # clear the figure

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()


3. Text classification with TensorFlow Hub

Train using a pre-trained text embedding model from TensorFlow Hub (tfhub.dev).

# Install the libraries
pip install tensorflow-hub
pip install tensorflow-datasets

import numpy as np

import tensorflow as tf

!pip install tensorflow-hub
!pip install tfds-nightly
import tensorflow_hub as hub
import tensorflow_datasets as tfds

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")


# Import the dataset
# Split the training set into 60% and 40% to end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews", 
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)


# Load the pre-trained embedding layer
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)

# Take a small batch of training examples to try the layer on (this line was missing in the original)
train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
hub_layer(train_examples_batch[:3])


# Build the model
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()  # view the model architecture


# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])


# Train the model
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)


# Evaluate the model
results = model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(model.metrics_names, results):
  print("%s: %.3f" % (name, value))



4. Regression: predicting fuel efficiency

In a regression problem, the goal is to predict a continuous value, such as a price or a probability, whereas a classification problem selects one class from a list of classes. This example uses the Auto MPG dataset and builds a model that predicts a car's fuel efficiency from its number of cylinders, displacement, horsepower, and weight.

# seaborn is used for the pairplot
pip install -q seaborn

import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

4.1 Import the dataset

# Download the dataset
dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path

# Import the dataset with pandas
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

4.2 Explore the data

dataset = raw_dataset.copy()
dataset.tail()

4.3 Preprocess the data

The dataset contains some unknown values:

# Check for missing values
dataset.isna().sum()

"""
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
"""

# Drop those rows
dataset = dataset.dropna()

"Origin" is a categorical column, so convert it to one-hot encoding:

origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()

# Split into a training set and a test set
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)

# Look at the joint distribution of a few columns from the training set
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")

# Look at the overall statistics
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

 

# Separate the target label (MPG) from the features
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

# Normalize the features
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

4.4 Build the model

def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model

model = build_model()

model.summary()

'''
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                640       
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
'''
  • Mean squared error (MSE) is a common loss function for regression problems (classification problems use different loss functions).
  • Likewise, the evaluation metrics used for regression differ from classification. A common regression metric is mean absolute error (MAE); see the short worked example below.

The model ultimately outputs an array with dtype=float32.
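A toy calculation (made-up numbers, not from the dataset) of what MAE and MSE measure:

import numpy as np

y_true = np.array([21.0, 30.0, 18.0])   # true MPG values
y_pred = np.array([23.0, 27.0, 18.5])   # model predictions
mae = np.mean(np.abs(y_true - y_pred))  # ~1.83: average absolute error in MPG
mse = np.mean((y_true - y_pred) ** 2)   # ~4.42: average squared error
print(mae, mse)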

4.5 Train the model

# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')

EPOCHS = 1000

# Train the model
history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0,
  callbacks=[PrintDot()])

# Visualize training progress using the statistics stored in the history object
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

def plot_history(history):
  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [MPG]')
  plt.plot(hist['epoch'], hist['mae'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mae'],
           label = 'Val Error')
  plt.ylim([0,5])
  plt.legend()

  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Square Error [$MPG^2$]')
  plt.plot(hist['epoch'], hist['mse'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mse'],
           label = 'Val Error')
  plt.ylim([0,20])
  plt.legend()
  plt.show()


plot_history(history)

Since the validation error stops improving (and starts rising) after about 100 epochs, update the model.fit call to stop training automatically when the validation score no longer improves.

model = build_model()

# The patience parameter is the number of epochs to wait for an improvement
# Use the EarlyStopping callback to test the training condition at every epoch
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])

plot_history(history)

 

4.6 Make predictions with the model

# Check the model's performance on the test set
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=2)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

'''
3/3 - 0s - loss: 5.9941 - mae: 1.8809 - mse: 5.9941
Testing set Mean Abs Error:  1.88 MPG
'''
# Make predictions on the test set
test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])

# Look at the error distribution
error = test_predictions - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")

  • When numeric input features have values with different ranges, each feature should be scaled independently to the same range.
  • If there is not much training data, one approach is to choose a small network with few hidden layers to avoid overfitting.
  • Early stopping is an effective technique to prevent overfitting.

5. Overfitting and underfitting

To prevent overfitting: use more complete training data, and apply techniques such as regularization.

5.1 Setup

import tensorflow as tf

from tensorflow.keras import layers
from tensorflow.keras import regularizers

!pip install git+https://github.com/tensorflow/docs

import tensorflow_docs as tfdocs
import tensorflow_docs.modeling
import tensorflow_docs.plots

from  IPython import display
from matplotlib import pyplot as plt

import numpy as np

import pathlib
import shutil
import tempfile

logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)

5.2 The Higgs dataset

The dataset contains 11,000,000 examples, each with 28 features and a binary class label.

# Download the dataset
gz = tf.keras.utils.get_file('HIGGS.csv.gz', 'http://mlphysics.ics.uci.edu/data/higgs/HIGGS.csv.gz')

FEATURES = 28

# The tf.data.experimental.CsvDataset class can read csv records directly from a gzip file, with no intermediate decompression step
# It returns a list of scalars for each record
ds = tf.data.experimental.CsvDataset(gz,[float(),]*(FEATURES+1), compression_type="GZIP")

# Repack each record into a (features, label) pair
def pack_row(*row):
  label = row[0]
  features = tf.stack(row[1:],1)
  return features, label

# Instead of repacking each record individually, apply pack_row to batches of 10,000 examples, then unbatch
packed_ds = ds.batch(10000).map(pack_row).unbatch()

# Inspect the resulting packed_ds
for features,label in packed_ds.batch(1000).take(1):
  print(features[0])
  plt.hist(features.numpy().flatten(), bins = 101)
'''
packed_ds
tf.Tensor(
[ 0.8692932  -0.6350818   0.22569026  0.32747006 -0.6899932   0.75420225
 -0.24857314 -1.0920639   0.          1.3749921  -0.6536742   0.9303491
  1.1074361   1.1389043  -1.5781983  -1.0469854   0.          0.65792954
 -0.01045457 -0.04576717  3.1019614   1.35376     0.9795631   0.97807616
  0.92000484  0.72165745  0.98875093  0.87667835], shape=(28,), dtype=float32)
'''

# Use the first 1,000 samples for validation and the next 10,000 for training
N_VALIDATION = int(1e3)
N_TRAIN = int(1e4)
BUFFER_SIZE = int(1e4)
BATCH_SIZE = 500
STEPS_PER_EPOCH = N_TRAIN//BATCH_SIZE

# Dataset.take takes the given number of samples
# Dataset.skip skips over the given number of samples
# cache ensures the loader doesn't need to re-read the data from the file on each epoch
validate_ds = packed_ds.take(N_VALIDATION).cache()
train_ds = packed_ds.skip(N_VALIDATION).take(N_TRAIN).cache()

train_ds
# <CacheDataset element_spec=(TensorSpec(shape=(28,), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.float32, name=None))>

# Use the Dataset.batch method to create batches of an appropriate size for training
validate_ds = validate_ds.batch(BATCH_SIZE)
train_ds = train_ds.shuffle(BUFFER_SIZE).repeat().batch(BATCH_SIZE)

5.3 Demonstrate overfitting

Gradually reducing the learning rate during training often gives better results; use tf.keras.optimizers.schedules to reduce the learning rate over time:

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.001,
  decay_steps=STEPS_PER_EPOCH*1000,
  decay_rate=1,
  staircase=False)

def get_optimizer():
  return tf.keras.optimizers.Adam(lr_schedule)

# Plot the learning rate as a function of epoch
step = np.linspace(0,100000)
lr = lr_schedule(step)
plt.figure(figsize = (8,6))
plt.plot(step/STEPS_PER_EPOCH, lr)
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Epoch')
_ = plt.ylabel('Learning Rate')
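As a rough check (a sketch, not in the original post): with decay_rate=1, InverseTimeDecay computes lr(step) = 0.001 / (1 + step / (STEPS_PER_EPOCH * 1000)), so the rate roughly halves after about 1000 epochs:

# Evaluate the schedule at a couple of steps to confirm the shape of the curve
print(float(lr_schedule(0)))                        # 0.001 at the start
print(float(lr_schedule(STEPS_PER_EPOCH * 1000)))   # ~0.0005 after ~1000 epochs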

Set up the tf.keras.callbacks.EarlyStopping callback to avoid unnecessary training time.

def get_callbacks(name):
  return [
    tfdocs.modeling.EpochDots(),
    tf.keras.callbacks.EarlyStopping(monitor='val_binary_crossentropy', patience=200),
    tf.keras.callbacks.TensorBoard(logdir/name),
  ]

Use the same model.compile and model.fit setup for all the models:

def compile_and_fit(model, name, optimizer=None, max_epochs=10000):
  if optimizer is None:
    optimizer = get_optimizer()
  model.compile(optimizer=optimizer,
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=[
                  tf.keras.losses.BinaryCrossentropy(
                      from_logits=True, name='binary_crossentropy'),
                  'accuracy'])

  model.summary()

  history = model.fit(
    train_ds,
    steps_per_epoch = STEPS_PER_EPOCH,
    epochs=max_epochs,
    validation_data=validate_ds,
    callbacks=get_callbacks(name),
    verbose=0)
  return history

Training the tiny_model:

tiny_model = tf.keras.Sequential([
    layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
    layers.Dense(1)
])

size_histories = {}

size_histories['Tiny'] = compile_and_fit(tiny_model, 'sizes/Tiny')

'''
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                464       
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
                                                                 
=================================================================
Total params: 481
Trainable params: 481
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.4915,  binary_crossentropy:0.8589,  loss:0.8589,  val_accuracy:0.4730,  val_binary_crossentropy:0.8619,  val_loss:0.8619,  
...
'''

# Plot the training curves
plotter = tfdocs.plots.HistoryPlotter(metric = 'binary_crossentropy', smoothing_std=10)
plotter.plot(size_histories)
plt.ylim([0.5, 0.7])

Training the small_model:

small_model = tf.keras.Sequential([
    # `input_shape` is only required here so that `.summary` works.
    layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
    layers.Dense(16, activation='elu'),
    layers.Dense(1)
])

size_histories['Small'] = compile_and_fit(small_model, 'sizes/Small')

'''
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_2 (Dense)             (None, 16)                464       
                                                                 
 dense_3 (Dense)             (None, 16)                272       
                                                                 
 dense_4 (Dense)             (None, 1)                 17        
                                                                 
=================================================================
Total params: 753
Trainable params: 753
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.4831,  binary_crossentropy:0.7411,  loss:0.7411,  val_accuracy:0.4670,  val_binary_crossentropy:0.7131,  val_loss:0.7131,  
...
'''

Training the medium_model:

medium_model = tf.keras.Sequential([
    layers.Dense(64, activation='elu', input_shape=(FEATURES,)),
    layers.Dense(64, activation='elu'),
    layers.Dense(64, activation='elu'),
    layers.Dense(1)
])

size_histories['Medium']  = compile_and_fit(medium_model, "sizes/Medium")

'''
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_5 (Dense)             (None, 64)                1856      
                                                                 
 dense_6 (Dense)             (None, 64)                4160      
                                                                 
 dense_7 (Dense)             (None, 64)                4160      
                                                                 
 dense_8 (Dense)             (None, 1)                 65        
                                                                 
=================================================================
Total params: 10,241
Trainable params: 10,241
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.4828,  binary_crossentropy:0.7027,  loss:0.7027,  val_accuracy:0.5230,  val_binary_crossentropy:0.6887,  val_loss:0.6887,  
...
'''

Training the large_model:

large_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.Dense(512, activation='elu'),
    layers.Dense(512, activation='elu'),
    layers.Dense(512, activation='elu'),
    layers.Dense(1)
])

size_histories['large'] = compile_and_fit(large_model, "sizes/large")

'''
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_9 (Dense)             (None, 512)               14848     
                                                                 
 dense_10 (Dense)            (None, 512)               262656    
                                                                 
 dense_11 (Dense)            (None, 512)               262656    
                                                                 
 dense_12 (Dense)            (None, 512)               262656    
                                                                 
 dense_13 (Dense)            (None, 1)                 513       
                                                                 
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.5092,  binary_crossentropy:0.8567,  loss:0.8567,  val_accuracy:0.5720,  val_binary_crossentropy:0.6973,  val_loss:0.6973,  
...
'''

Compare the training and validation losses of the four model sizes:

plotter.plot(size_histories)
a = plt.xscale('log')
plt.xlim([5, max(plt.xlim())])
plt.ylim([0.5, 0.7])
plt.xlabel("Epochs [Log Scale]")

5.4 Strategies to prevent overfitting

During training, each model's metrics were written to TensorBoard logs. Copy the Tiny model's training log above to use as a baseline for comparison:

shutil.rmtree(logdir/'regularizers/Tiny', ignore_errors=True)
shutil.copytree(logdir/'sizes/Tiny', logdir/'regularizers/Tiny')

regularizer_histories = {}
regularizer_histories['Tiny'] = size_histories['Tiny']

Method 1: add weight regularization

Weight regularization constrains the complexity of a network by forcing its weights to take small values, which makes the weight distribution more regular. It works by adding a cost associated with large weights to the network's loss function:

  • L1 regularization: the added cost is proportional to the absolute value of the weight coefficients; it pushes weights toward 0 and encourages sparse models.
  • L2 regularization: the added cost is proportional to the square of the weight coefficients; it shrinks the weights but does not make the model sparse.
# Add L2 weight regularization
l2_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001),
                 input_shape=(FEATURES,)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(512, activation='elu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1)
])

regularizer_histories['l2'] = compile_and_fit(l2_model, "regularizers/l2")

'''
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_14 (Dense)            (None, 512)               14848     
                                                                 
 dense_15 (Dense)            (None, 512)               262656    
                                                                 
 dense_16 (Dense)            (None, 512)               262656    
                                                                 
 dense_17 (Dense)            (None, 512)               262656    
                                                                 
 dense_18 (Dense)            (None, 1)                 513       
                                                                 
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.5052,  binary_crossentropy:0.8148,  loss:2.3360,  val_accuracy:0.4760,  val_binary_crossentropy:0.6928,  val_loss:2.1343,  
...
'''

l2(0.001) means every coefficient in the layer's weight matrix adds 0.001 * weight_coefficient_value**2 to the network's total loss. A small sketch of this contribution follows.
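A minimal sketch (not from the original post) showing where the L2 penalty appears: once a regularized layer has been built, its losses attribute holds the 0.001 * sum(w**2) term that gets added to the total loss:

import tensorflow as tf

layer = tf.keras.layers.Dense(4, kernel_regularizer=tf.keras.regularizers.l2(0.001))
_ = layer(tf.zeros((1, 3)))            # call once so the kernel gets created
penalty = tf.add_n(layer.losses)       # 0.001 * sum of squared kernel entries
print(float(penalty))                  # a small positive number added to the total loss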

Method 2: add Dropout layers

A Dropout layer randomly "drops out" (sets to zero) a fraction of the layer's output activations during training; nothing is dropped at inference time. A quick sketch of this behavior follows.
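A minimal sketch (not from the original post) of how Dropout behaves in training versus inference mode:

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))
print(drop(x, training=True).numpy())   # roughly half the entries zeroed, the rest scaled by 1/(1-0.5)=2
print(drop(x, training=False).numpy())  # unchanged: dropout is a no-op at inference time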

dropout_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(1)
])

regularizer_histories['dropout'] = compile_and_fit(dropout_model, "regularizers/dropout")

'''
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_19 (Dense)            (None, 512)               14848     
                                                                 
 dropout (Dropout)           (None, 512)               0         
                                                                 
 dense_20 (Dense)            (None, 512)               262656    
                                                                 
 dropout_1 (Dropout)         (None, 512)               0         
                                                                 
 dense_21 (Dense)            (None, 512)               262656    
                                                                 
 dropout_2 (Dropout)         (None, 512)               0         
                                                                 
 dense_22 (Dense)            (None, 512)               262656    
                                                                 
 dropout_3 (Dropout)         (None, 512)               0         
                                                                 
 dense_23 (Dense)            (None, 1)                 513       
                                                                 
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.5034,  binary_crossentropy:0.8014,  loss:0.8014,  val_accuracy:0.5600,  val_binary_crossentropy:0.7065,  val_loss:0.7065,  
...
'''

Method 3: combine L2 regularization with Dropout

combined_model = tf.keras.Sequential([
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu', input_shape=(FEATURES,)),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
                 activation='elu'),
    layers.Dropout(0.5),
    layers.Dense(1)
])

regularizer_histories['combined'] = compile_and_fit(combined_model, "regularizers/combined")

'''
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_24 (Dense)            (None, 512)               14848     
                                                                 
 dropout_4 (Dropout)         (None, 512)               0         
                                                                 
 dense_25 (Dense)            (None, 512)               262656    
                                                                 
 dropout_5 (Dropout)         (None, 512)               0         
                                                                 
 dense_26 (Dense)            (None, 512)               262656    
                                                                 
 dropout_6 (Dropout)         (None, 512)               0         
                                                                 
 dense_27 (Dense)            (None, 512)               262656    
                                                                 
 dropout_7 (Dropout)         (None, 512)               0         
                                                                 
 dense_28 (Dense)            (None, 1)                 513       
                                                                 
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________

Epoch: 0, accuracy:0.5102,  binary_crossentropy:0.7920,  loss:0.9501,  val_accuracy:0.5270,  val_binary_crossentropy:0.6840,  val_loss:0.8413,  
...
'''

In addition, data augmentation (commonly used for images) and batch normalization can also be applied; a small batch-normalization sketch follows the links below.

See Data augmentation | TensorFlow Core (google.cn) and tf.keras.layers.BatchNormalization | TensorFlow Core v2.9.1 (google.cn).
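A minimal sketch (not part of the original post) of what adding batch normalization to the models above might look like; FEATURES and layers come from the earlier setup code:

bn_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.BatchNormalization(),   # normalize activations over each batch
    layers.Dense(512, activation='elu'),
    layers.BatchNormalization(),
    layers.Dense(1)
])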


6. Save and load models

Install the packages needed to read and write HDF5 files:

pip install pyyaml h5py  # Required to save models in HDF5 format

import os

import tensorflow as tf
from tensorflow import keras

Get a trained model:

# Import the dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Build the model
# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=[tf.metrics.SparseCategoricalAccuracy()])

  return model

# Create a basic model instance
model = create_model()

# Display the model's architecture
model.summary()

'''
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 512)               401920    
                                                                 
 dropout (Dropout)           (None, 512)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                5130      
                                                                 
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
'''

1. Save the model during training (using checkpoints)

The tf.keras.callbacks.ModelCheckpoint callback allows the model to be saved during and at the end of training.

# Create a tf.keras.callbacks.ModelCheckpoint callback that saves only the weights during training
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create the callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

# Train the model with the new callback
model.fit(train_images, 
          train_labels,  
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])

# This creates a collection of TensorFlow checkpoint files that are updated at the end of each epoch
os.listdir(checkpoint_dir)
# ['cp.ckpt.data-00000-of-00001', 'cp.ckpt.index', 'checkpoint']

# Create a fresh, untrained model and evaluate it, then evaluate again after loading the weights from the checkpoint
model = create_model()  # Create a basic model instance
loss, acc = model.evaluate(test_images, test_labels, verbose=2)  # Evaluate the model
print("Untrained model, accuracy: {:5.2f}%".format(100 * acc))

'''
32/32 - 0s - loss: 2.4002 - sparse_categorical_accuracy: 0.0930 - 261ms/epoch - 8ms/step
Untrained model, accuracy:  9.30%
'''

model.load_weights(checkpoint_path)  # Loads the weights
loss, acc = model.evaluate(test_images, test_labels, verbose=2)  # Re-evaluate the model
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

'''
32/32 - 0s - loss: 0.3860 - sparse_categorical_accuracy: 0.8750 - 75ms/epoch - 2ms/step
Restored model, accuracy: 87.50%
'''

Checkpoint callback options:

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

batch_size = 32

# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    save_freq=5*batch_size)

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images, 
          train_labels,
          epochs=50, 
          batch_size=batch_size, 
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)

'''
Epoch 5: saving model to training_2/cp-0005.ckpt

Epoch 10: saving model to training_2/cp-0010.ckpt
......
Epoch 50: saving model to training_2/cp-0050.ckpt
<keras.callbacks.History at 0x7fd55c465e80>
'''

# Review the resulting checkpoints and pick the latest one
# By default, the TensorFlow format only keeps the 5 most recent checkpoints
os.listdir(checkpoint_dir)
'''
['cp-0015.ckpt.index',
 'cp-0050.ckpt.index',
 'cp-0025.ckpt.data-00000-of-00001',
 'cp-0035.ckpt.data-00000-of-00001',
 'cp-0045.ckpt.index',
 'cp-0010.ckpt.data-00000-of-00001',
 'cp-0045.ckpt.data-00000-of-00001',
 'cp-0005.ckpt.index',
 'cp-0040.ckpt.data-00000-of-00001',
 'cp-0015.ckpt.data-00000-of-00001',
 'cp-0000.ckpt.data-00000-of-00001',
 'cp-0010.ckpt.index',
 'cp-0025.ckpt.index',
 'cp-0030.ckpt.index',
 'cp-0000.ckpt.index',
 'cp-0050.ckpt.data-00000-of-00001',
 'cp-0020.ckpt.index',
 'checkpoint',
 'cp-0040.ckpt.index',
 'cp-0020.ckpt.data-00000-of-00001',
 'cp-0035.ckpt.index',
 'cp-0030.ckpt.data-00000-of-00001',
 'cp-0005.ckpt.data-00000-of-00001']
'''

latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
'''
'training_2/cp-0050.ckpt'
'''

# Reset the model and load the latest checkpoint
# Create a new model instance
model = create_model()

# Load the previously saved weights
model.load_weights(latest)

# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

The checkpoint files store the model weights in a binary format. A checkpoint contains:

  • One or more shards that contain the model's weights
  • An index file that indicates which weights are stored in which shard

When training a model on a single machine, you get one shard with a suffix like .data-00000-of-00001
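A small sketch (not in the original post) of how to peek inside a checkpoint, using checkpoint_dir from above:

# List the variables stored in the latest checkpoint and their shapes
reader = tf.train.load_checkpoint(tf.train.latest_checkpoint(checkpoint_dir))
shape_map = reader.get_variable_to_shape_map()
for key in sorted(shape_map):
    print(key, shape_map[key])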

2. Manually save weights with model.save_weights

By default, model.save_weights saves weights in the TensorFlow checkpoint format (the .ckpt-style files shown above); to save the weights in HDF5 instead, use a filename ending in .h5.

# Save the weights
model.save_weights('./checkpoints/my_checkpoint')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')

# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

3. Save the entire model with model.save

model.save saves the model's architecture, weights, and training configuration in a single file/folder. The model can be saved in two different file formats: SavedModel and HDF5.

SavedModel format

The SavedModel format is another way to serialize models. It contains a protobuf (.pb) binary file and a directory of TensorFlow checkpoints, and models saved in this format can be restored with tf.keras.models.load_model.

# Create and train a model
model = create_model()
model.fit(train_images, train_labels, epochs=5)

# Save the entire model in the SavedModel format
!mkdir -p saved_model
model.save('saved_model/my_model')

# Load the saved model
new_model = tf.keras.models.load_model('saved_model/my_model')

new_model.summary()

'''
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 512)               401920    
                                                                 
 dropout_5 (Dropout)         (None, 512)               0         
                                                                 
 dense_11 (Dense)            (None, 10)                5130      
                                                                 
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
'''

# Evaluate the model
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))

print(new_model.predict(test_images).shape)

'''
32/32 - 0s - loss: 0.4311 - sparse_categorical_accuracy: 0.8660 - 171ms/epoch - 5ms/step
Restored model, accuracy: 86.60%
32/32 [==============================] - 0s 1ms/step
(1000, 10)
'''

HDF5 format

# Create and train a new model
model = create_model()
model.fit(train_images, train_labels, epochs=5)

# Save the entire model in HDF5 format
model.save('my_model.h5')

# Recreate the model, including its weights and optimizer
new_model = tf.keras.models.load_model('my_model.h5')

new_model.summary()

'''
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_12 (Dense)            (None, 512)               401920    
                                                                 
 dropout_6 (Dropout)         (None, 512)               0         
                                                                 
 dense_13 (Dense)            (None, 10)                5130      
                                                                 
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
'''

# Evaluate the model
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))

'''
32/32 - 0s - loss: 0.4390 - sparse_categorical_accuracy: 0.8540 - 167ms/epoch - 5ms/step
Restored model, accuracy: 85.40%
'''

7. Hyperparameter tuning with the Keras Tuner

The Keras Tuner library helps pick the optimal set of hyperparameters for a TensorFlow program, a process called hyperparameter tuning. There are two types of hyperparameters: model hyperparameters, which influence model selection (e.g. the number and width of hidden layers), and algorithm hyperparameters, which influence the speed and quality of the learning algorithm (e.g. the learning rate for stochastic gradient descent (SGD), or the number of neighbors for a k-nearest-neighbors (KNN) classifier).

Using the clothing image classification task as an example:

import tensorflow as tf
from tensorflow import keras

# Install and import the Keras Tuner
pip install -q -U keras-tuner
import keras_tuner as kt

# 1. Download and prepare the dataset
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

# Normalize
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

# 2. Build the model
# When building a model for hyperparameter tuning, you define the model architecture along with the hyperparameter search space
# The model can be built with a model-builder function or by subclassing the Keras Tuner API's HyperModel class
# A model-builder function returns a compiled model
def model_builder(hp):
  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))

# Tune the number of units in the first Dense layer, choosing a value between 32 and 512
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(keras.layers.Dense(10))

# Tune the optimizer's learning rate: 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model

# 3. Instantiate the tuner
# The Keras Tuner provides four tuners: RandomSearch, Hyperband, BayesianOptimization, and Sklearn
# To use the Hyperband tuner, specify the hypermodel, the objective to optimize, and the maximum number of epochs to train (max_epochs)
tuner = kt.Hyperband(model_builder,               # the hypermodel
                     objective='val_accuracy',    # the objective to optimize
                     max_epochs=10,               # maximum number of training epochs
                     factor=3,
                     directory='my_dir',
                     project_name='intro_to_kt')

# Early stopping: create a callback that stops training once the validation loss stops improving
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

# Search for the hyperparameters
tuner.search(img_train, label_train, epochs=50, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. 
The optimal number of units in the first densely-connected layer is {best_hps.get('units')} 
and the optimal learning rate for the optimizer is {best_hps.get('learning_rate')}.
""")

'''
Trial 30 Complete [00h 00m 39s]
val_accuracy: 0.8665833473205566

Best val_accuracy So Far: 0.8912500143051147
Total elapsed time: 00h 08m 13s
INFO:tensorflow:Oracle triggered exit

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 448 and the optimal learning rate for the optimizer
is 0.001.
'''

# 4. Train the model
# Use the hyperparameters found by the search to find the optimal number of epochs to train for
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

# Re-instantiate the hypermodel and train it for the optimal number of epochs
hypermodel = tuner.hypermodel.build(best_hps)

hypermodel.fit(img_train, label_train, epochs=best_epoch, validation_split=0.2)

# 5. Evaluate the model
eval_result = hypermodel.evaluate(img_test, label_test)
print("[test loss, test accuracy]:", eval_result)

Further reading:

pandas tutorials:
  • Pandas教程(非常详细) (biancheng.net)
  • User Guide — pandas 1.4.3 documentation (pydata.org)

TensorFlow documentation (Chinese): TensorFlow官方文档_w3cschool

TensorFlow official documentation: Dataset.

Keras: Making new layers and models via subclassing | TensorFlow Core (google.cn)
