Training Error and Generalization Error
The former is the error a model exhibits on the training dataset; the latter is the expected error of the model on an arbitrary test sample, and it is usually approximated by the error on a test dataset. Both can be computed with the loss functions introduced earlier, such as the squared loss used in linear regression and the cross-entropy loss used in softmax regression.
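As a concrete sketch, both errors are estimated with the same loss functions; below is a minimal NumPy version of the squared loss and the cross-entropy loss (the function names and sample values are illustrative, not from the original text):

```python
import numpy as np

def squared_loss(y_hat, y):
    # Squared loss averaged over the batch (with the conventional 1/2 factor).
    return np.mean((y_hat - y) ** 2) / 2

def cross_entropy(probs, labels):
    # Average negative log-probability assigned to the true class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

y_hat, y = np.array([2.5, 0.0]), np.array([3.0, -0.5])
print(squared_loss(y_hat, y))  # 0.125

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(cross_entropy(probs, labels))
```

Evaluating the same loss on the training set gives the training error; evaluating it on held-out data approximates the generalization error.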
Cross-Validation
Since the validation dataset does not take part in model training, setting aside a large amount of validation data is wasteful when training data are scarce. One improvement is K-fold cross-validation. In K-fold cross-validation, the original training dataset is split into K non-overlapping sub-datasets, and we then run K rounds of model training and validation. In each round, one sub-dataset is used to validate the model and the remaining K−1 sub-datasets are used to train it; a different sub-dataset serves as the validation set in each round. Finally, we average the K training errors and the K validation errors separately.
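The splitting scheme described above can be sketched as follows; `k_fold_indices` is a hypothetical helper name, and `np.array_split` handles the case where K does not divide the sample count evenly:

```python
import numpy as np

def k_fold_indices(n, k):
    # Split indices 0..n-1 into k non-overlapping folds; in round i,
    # fold i is the validation set and the other k-1 folds form the training set.
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx

for train_idx, valid_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(valid_idx))  # 8 2 in every round
```

Each example appears in exactly one validation fold, so the K validation errors together cover the whole training set once.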
Underfitting and Overfitting
Underfitting means the model cannot even achieve a low training error; overfitting means the training error is much lower than the error on the test dataset.
Fitting plots:
Code Implementation
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data: y = 1.2x - 3.4x^2 + 5.6x^3 + 5 + noise
n_train, n_test, true_w, true_b = 100, 100, [1.2, -3.4, 5.6], 5
features = tf.random.normal(shape=(n_train + n_test, 1))
poly_features = tf.concat([features, tf.pow(features, 2), tf.pow(features, 3)], 1)
print(poly_features.shape)
labels = (true_w[0] * poly_features[:, 0] + true_w[1] * poly_features[:, 1]
          + true_w[2] * poly_features[:, 2] + true_b)
print(tf.shape(labels))
labels += tf.random.normal(labels.shape, 0, 0.1)
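To make the generating function explicit, here is a plain-NumPy sketch of the same synthetic data, y = 1.2x − 3.4x² + 5.6x³ + 5 + ε with ε ~ N(0, 0.1²); the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w, true_b = [1.2, -3.4, 5.6], 5
x = rng.standard_normal((200, 1))
poly = np.concatenate([x, x**2, x**3], axis=1)   # columns: x, x^2, x^3
y_clean = poly @ np.array(true_w) + true_b       # y = 1.2x - 3.4x^2 + 5.6x^3 + 5
y = y_clean + rng.normal(0, 0.1, y_clean.shape)  # Gaussian noise, std 0.1
print(poly.shape, y.shape)  # (200, 3) (200,)
```

The linear model below sees only the raw `x` column, while the polynomial model sees all three columns, which is what separates the underfitting and normal-fit experiments.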
from IPython import display

def use_svg_display():
    # Render matplotlib figures as SVG in the notebook.
    display.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    use_svg_display()
    plt.rcParams['figure.figsize'] = figsize
def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,
             legend=None, figsize=(3.5, 2.5)):
    # Plot one curve (and optionally a second one) with a log-scaled y-axis.
    set_figsize(figsize)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        plt.semilogy(x2_vals, y2_vals, linestyle=':')
        plt.legend(legend)
    plt.show()
num_epochs, loss = 100, tf.losses.MeanSquaredError()
def fit_and_plot(train_features, test_features, train_labels, test_labels):
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Dense(1))
    batch_size = min(10, train_labels.shape[0])
    train_iter = tf.data.Dataset.from_tensor_slices(
        (train_features, train_labels)).batch(batch_size)
    test_iter = tf.data.Dataset.from_tensor_slices(
        (test_features, test_labels)).batch(batch_size)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            with tf.GradientTape() as tape:
                l = loss(y, net(X))
            # Compute gradients outside the tape context and take one SGD step.
            grads = tape.gradient(l, net.trainable_variables)
            optimizer.apply_gradients(zip(grads, net.trainable_variables))
        # Record the full-dataset losses after each epoch.
        train_ls.append(loss(train_labels, net(train_features)).numpy().mean())
        test_ls.append(loss(test_labels, net(test_features)).numpy().mean())
    print('final epoch: train loss', train_ls[-1], 'test loss', test_ls[-1])
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('weight:', net.get_weights()[0],
          '\nbias:', net.get_weights()[1])
def split():
    # Print a separator line between the outputs of successive experiments.
    print('-------------------')
if __name__ == '__main__':
    # Third-order polynomial fitting (normal)
    fit_and_plot(poly_features[:n_train, :], poly_features[n_train:, :],
                 labels[:n_train], labels[n_train:])
    split()
    # Linear fitting (underfitting)
    fit_and_plot(features[:n_train, :], features[n_train:, :], labels[:n_train],
                 labels[n_train:])
    split()
    # Too few training examples (overfitting)
    fit_and_plot(poly_features[0:2, :], poly_features[n_train:, :], labels[0:2],
                 labels[n_train:])
Plot of the normal fit:
Plot of underfitting:
Plot of overfitting:
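The contrast between the first two experiments can also be reproduced numerically without plotting. This sketch fits the same cubic data with `np.polyfit` (variable names are illustrative): a degree-1 model underfits, while a degree-3 model, which matches the generating function, achieves a low error on held-out data:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Same generating function as the TensorFlow example above.
    x = rng.standard_normal(n)
    y = 1.2 * x - 3.4 * x**2 + 5.6 * x**3 + 5 + rng.normal(0, 0.1, n)
    return x, y

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the given data.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_tr, y_tr = make_data(100)
x_te, y_te = make_data(100)

c3 = np.polyfit(x_tr, y_tr, 3)  # matching capacity: normal fit
c1 = np.polyfit(x_tr, y_tr, 1)  # too simple: underfits
print('degree 3 test MSE:', mse(c3, x_te, y_te))
print('degree 1 test MSE:', mse(c1, x_te, y_te))
```

With only a couple of training points, as in the third experiment, even the correct model class interpolates the training data and generalizes poorly, which is the overfitting regime.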