This tutorial is based on TensorFlow 2.5. It trains a VGG16 network on the CIFAR10 dataset, and the final model reaches over 91% accuracy on the test set.
Note that if you want to train on a GPU, you must make sure your TensorFlow and CUDA versions match. You can check whether the GPU is usable with the following commands; if they return False, the GPU cannot be used:
import tensorflow as tf
tf.test.is_gpu_available()
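Note that tf.test.is_gpu_available() is deprecated in TF 2.x; an equivalent check with the current API (a sketch) is:
gpus = tf.config.list_physical_devices('GPU')  # an empty list means no usable GPU
print(len(gpus) > 0)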
1. Building the Dataset
CIFAR10 is a small dataset for generic object recognition. It contains RGB three-channel color images in 10 classes, each 32×32 pixels, as shown in the figure:
With TensorFlow, the dataset can be loaded directly via tensorflow.keras.datasets.cifar10.load_data().
def load_images():
    (x_img_train, y_label_train), (x_img_test, y_label_test) = cifar10.load_data()
    x_img_train = x_img_train.astype(np.float32)  # cast to float32
    x_img_test = x_img_test.astype(np.float32)
    (x_img_train, x_img_test) = normalization(x_img_train, x_img_test)
    y_label_train = to_categorical(y_label_train, 10)  # one-hot encode the labels
    y_label_test = to_categorical(y_label_test, 10)
    return x_img_train, y_label_train, x_img_test, y_label_test
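The normalization() helper used above standardizes both splits with the training-set mean and standard deviation; its definition (also part of the complete code in section 6) is:
def normalization(x_img_train, x_img_test):
    # reduce over all four axes (batch, height, width, channels) -> scalar statistics
    mean = np.mean(x_img_train, axis=(0, 1, 2, 3))
    std = np.std(x_img_train, axis=(0, 1, 2, 3))
    x_img_train = (x_img_train - mean) / (std + 1e-7)  # the 1e-7 epsilon guards against division by zero
    x_img_test = (x_img_test - mean) / (std + 1e-7)
    return x_img_train, x_img_test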
2. Data Augmentation
To improve generalization and keep the model from overfitting the training set, the training data is usually augmented on the fly, e.g. with normalization, random flips, and occlusion. Here we apply the following transformations to the training set:
datagen = ImageDataGenerator(
    featurewise_center=False,             # set the mean of the input data to 0, feature-wise
    samplewise_center=False,              # set the mean of each sample to 0
    featurewise_std_normalization=False,  # divide inputs by the dataset std, feature-wise
    samplewise_std_normalization=False,   # divide each input by its own std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=15,                    # random rotation range in degrees (0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images horizontally
    vertical_flip=False)                  # randomly flip images vertically
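A minimal usage sketch (variable names follow the complete code below): datagen.fit() is only required when featurewise_center, featurewise_std_normalization, or zca_whitening is enabled, so with the settings above it is statistically a no-op; augmented batches are drawn with datagen.flow():
datagen.fit(x_img_train)
batches = datagen.flow(x_img_train, y_label_train, batch_size=256)
x_batch, y_batch = next(batches)
print(x_batch.shape, y_batch.shape)  # (256, 32, 32, 3) (256, 10)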
3. Building the Model
VGG is a convolutional neural network architecture proposed by Simonyan and Zisserman in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". Its name comes from the authors' lab, the Visual Geometry Group at the University of Oxford.
The model competed in the 2014 ImageNet classification and localization challenge and performed very well: second place in classification and first place in localization.
Depending on kernel size and the number of convolutional layers, VGG comes in six configurations (ConvNet Configurations): A, A-LRN, B, C, D, and E. Configurations D and E are the most commonly used and are known as VGG16 and VGG19, respectively.
The figure below shows the six VGG configurations:
Looking at VGG16 specifically, it contains:
- 13 convolutional layers (Convolutional Layer), denoted conv3-XXX
- 3 fully connected layers (Fully Connected Layer), denoted FC-XXXX
- 5 pooling layers (Pooling Layer), denoted maxpool
The convolutional and fully connected layers carry trainable weights and are therefore called weight layers; there are 13 + 3 = 16 of them in total, which is where the "16" in VGG16 comes from. (Pooling layers have no weights, so they are not counted as weight layers.)
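To make "weight layers" concrete, a layer's parameter count follows directly from its kernel shape. A quick sketch for the first conv3-64 layer of the original 224×224 VGG16:
# 3x3 kernel, 3 input channels, 64 output channels, plus one bias per output channel
conv1_params = 3 * 3 * 3 * 64 + 64
print(conv1_params)  # 1792
# a pooling layer, by contrast, has no trainable parameters at all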
Features
The standout feature of VGG16 is simplicity, which shows in three ways:
1. All convolutional layers use the same kernel parameters
Every convolutional layer is written as conv3-XXX, where conv3 indicates that the kernel size is 3, i.e. both width and height are 3. A 3×3 kernel is very small; combined with the other parameters (stride = 1, padding = same), it lets each convolutional layer preserve the width and height of the previous layer's tensor (the sketch after this list checks the shape arithmetic). XXX denotes the layer's number of output channels.
2. All pooling layers use the same pooling parameters
Every pooling layer is a 2×2 max pooling with stride 2, which halves the width and height.
3. The model is built by stacking convolutional and pooling layers, which makes it easy to form fairly deep networks (in 2014, 16 layers was already considered very deep).
Summing up, VGG's strengths can be captured as: small filters, deeper networks.
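A minimal sketch checking the shape arithmetic behind points 1 and 2 (assuming the imports from the complete code below):
x = tf.zeros([1, 32, 32, 3])  # a dummy CIFAR10-sized batch
conv = keras.layers.Conv2D(64, 3, strides=1, padding='same')
pool = keras.layers.MaxPooling2D(pool_size=(2, 2))
print(conv(x).shape)        # (1, 32, 32, 64): 3x3 / stride 1 / same padding preserves H and W
print(pool(conv(x)).shape)  # (1, 16, 16, 64): 2x2 / stride 2 pooling halves H and W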
The implementation is as follows:
class ConvBNRelu(tf.keras.Model):
    def __init__(self, filters, kernel_size=3, strides=1, padding='SAME', weight_decay=0.0005, rate=0.4, drop=True):
        super(ConvBNRelu, self).__init__()
        self.drop = drop
        self.conv = keras.layers.Conv2D(filters=filters, kernel_size=kernel_size, strides=strides,
                                        padding=padding, kernel_regularizer=tf.keras.regularizers.l2(weight_decay))
        self.batchnorm = tf.keras.layers.BatchNormalization()
        self.dropOut = keras.layers.Dropout(rate=rate)

    def call(self, inputs, training=None):
        layer = self.conv(inputs)
        # note: ReLU is applied before BatchNormalization, i.e. the actual order is
        # Conv-ReLU-BN rather than the Conv-BN-ReLU the class name suggests
        layer = tf.nn.relu(layer)
        layer = self.batchnorm(layer, training=training)
        # self.drop controls whether the block ends with a Dropout layer
        if self.drop:
            layer = self.dropOut(layer, training=training)
        return layer
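A quick sanity check of the block on a CIFAR10-shaped input (a sketch, not part of the training script):
block = ConvBNRelu(filters=64, rate=0.3)
y = block(tf.zeros([1, 32, 32, 3]), training=False)
print(y.shape)  # (1, 32, 32, 64): spatial size preserved, 64 output channels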
class VGG16Model(tf.keras.Model):
    def __init__(self):
        super(VGG16Model, self).__init__()
        # Block 1: 2 x conv64 (the attribute numbering below follows the original
        # code, which skips conv8-conv10 and maxPooling4)
        self.conv1 = ConvBNRelu(filters=64, kernel_size=[3, 3], rate=0.3)
        self.conv2 = ConvBNRelu(filters=64, kernel_size=[3, 3], drop=False)
        self.maxPooling1 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 2: 2 x conv128
        self.conv3 = ConvBNRelu(filters=128, kernel_size=[3, 3])
        self.conv4 = ConvBNRelu(filters=128, kernel_size=[3, 3], drop=False)
        self.maxPooling2 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 3: 3 x conv256
        self.conv5 = ConvBNRelu(filters=256, kernel_size=[3, 3])
        self.conv6 = ConvBNRelu(filters=256, kernel_size=[3, 3])
        self.conv7 = ConvBNRelu(filters=256, kernel_size=[3, 3], drop=False)
        self.maxPooling3 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 4: 3 x conv512
        self.conv11 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv12 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv13 = ConvBNRelu(filters=512, kernel_size=[3, 3], drop=False)
        self.maxPooling5 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 5: 3 x conv512
        self.conv14 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv15 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv16 = ConvBNRelu(filters=512, kernel_size=[3, 3], drop=False)
        self.maxPooling6 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # classifier head: a single 512-unit hidden layer instead of VGG16's
        # original two 4096-unit FC layers (CIFAR10 inputs are only 32x32)
        self.flat = keras.layers.Flatten()
        self.dropOut = keras.layers.Dropout(rate=0.5)
        self.dense1 = keras.layers.Dense(units=512,
                                         activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.0005))
        self.batchnorm = tf.keras.layers.BatchNormalization()
        self.dense2 = keras.layers.Dense(units=10)
        self.softmax = keras.layers.Activation('softmax')

    def call(self, inputs, training=None):
        net = self.conv1(inputs, training=training)
        net = self.conv2(net, training=training)
        net = self.maxPooling1(net)
        net = self.conv3(net, training=training)
        net = self.conv4(net, training=training)
        net = self.maxPooling2(net)
        net = self.conv5(net, training=training)
        net = self.conv6(net, training=training)
        net = self.conv7(net, training=training)
        net = self.maxPooling3(net)
        net = self.conv11(net, training=training)
        net = self.conv12(net, training=training)
        net = self.conv13(net, training=training)
        net = self.maxPooling5(net)
        net = self.conv14(net, training=training)
        net = self.conv15(net, training=training)
        net = self.conv16(net, training=training)
        net = self.maxPooling6(net)
        net = self.dropOut(net, training=training)
        net = self.flat(net)
        net = self.dense1(net)
        net = self.batchnorm(net, training=training)
        net = self.dropOut(net, training=training)
        net = self.dense2(net)
        net = self.softmax(net)
        return net
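To inspect the resulting architecture and its parameter count, the subclassed model can be built on a CIFAR10-shaped input (a sketch):
model = VGG16Model()
model.build(input_shape=(None, 32, 32, 3))
model.summary()  # lists every block and the total trainable parameter count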
4. Training Strategy
Our training strategy is: batch size 256, initial learning rate 0.1, halving the learning rate every 20 epochs, 100 epochs in total, cross-entropy loss, and an SGD optimizer.
# Hyperparameters
training_epochs = 100
batch_size = 256
learning_rate = 0.1
momentum = 0.9       # SGD momentum
weight_decay = 1e-6  # passed to SGD's decay argument: per-update learning-rate decay, not true weight decay
lr_drop = 20         # halve the learning rate every lr_drop epochs
def lr_scheduler(epoch):  # step decay: the learning rate is halved every lr_drop epochs
    return learning_rate * (0.5 ** (epoch // lr_drop))
reduce_lr = keras.callbacks.LearningRateScheduler(lr_scheduler)
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate,
                                    decay=weight_decay, momentum=momentum, nesterov=True)
# cross-entropy loss, SGD optimizer, accuracy as the evaluation metric
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
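A quick sketch of the learning rates this step-decay schedule actually produces:
for epoch in (0, 19, 20, 40, 60, 80, 99):
    print(epoch, lr_scheduler(epoch))
# 0 0.1 / 19 0.1 / 20 0.05 / 40 0.025 / 60 0.0125 / 80 0.00625 / 99 0.00625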
5. Visualizing the Results
The accuracy and loss curves can be plotted with matplotlib to help analyze the training process. Example code:
from matplotlib import pyplot as plt
# accuracy, val_accuracy, loss and val_loss come from history.history (see the complete code below)
plt.subplot(1, 2, 1)
plt.plot(accuracy, label='Training Accuracy')
plt.plot(val_accuracy, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.savefig('./results.png')
plt.show()
6. Complete Code
import numpy as np
import time
from matplotlib import pyplot as plt
import tensorflow as tf
# packages needed for the GPU memory allocation settings below
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# GPU memory allocation: grow on demand instead of reserving everything up front
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
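The compat.v1 session above still works under TF 2.5; if you prefer to avoid the v1 API, a sketch of the TF2-native equivalent:
# TF2-native on-demand GPU memory growth
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)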
def normalization(x_img_train, x_img_test):
    # reduce over all four axes (batch, height, width, channels) -> scalar statistics
    mean = np.mean(x_img_train, axis=(0, 1, 2, 3))
    std = np.std(x_img_train, axis=(0, 1, 2, 3))
    # standardize the test set with the training-set mean and std, so both splits
    # follow the training distribution (note: mild information leakage is possible)
    x_img_train = (x_img_train - mean) / (std + 1e-7)  # the 1e-7 epsilon guards against division by zero
    x_img_test = (x_img_test - mean) / (std + 1e-7)
    return x_img_train, x_img_test
# load and preprocess the data
def load_images():
    (x_img_train, y_label_train), (x_img_test, y_label_test) = cifar10.load_data()
    x_img_train = x_img_train.astype(np.float32)  # cast to float32
    x_img_test = x_img_test.astype(np.float32)
    (x_img_train, x_img_test) = normalization(x_img_train, x_img_test)
    y_label_train = to_categorical(y_label_train, 10)  # one-hot encode the labels
    y_label_test = to_categorical(y_label_test, 10)
    return x_img_train, y_label_train, x_img_test, y_label_test
class ConvBNRelu(tf.keras.Model):
    def __init__(self, filters, kernel_size=3, strides=1, padding='SAME', weight_decay=0.0005, rate=0.4, drop=True):
        super(ConvBNRelu, self).__init__()
        self.drop = drop
        self.conv = keras.layers.Conv2D(filters=filters, kernel_size=kernel_size, strides=strides,
                                        padding=padding, kernel_regularizer=tf.keras.regularizers.l2(weight_decay))
        self.batchnorm = tf.keras.layers.BatchNormalization()
        self.dropOut = keras.layers.Dropout(rate=rate)

    def call(self, inputs, training=None):
        layer = self.conv(inputs)
        # note: ReLU is applied before BatchNormalization, i.e. the actual order is
        # Conv-ReLU-BN rather than the Conv-BN-ReLU the class name suggests
        layer = tf.nn.relu(layer)
        layer = self.batchnorm(layer, training=training)
        # self.drop controls whether the block ends with a Dropout layer
        if self.drop:
            layer = self.dropOut(layer, training=training)
        return layer
class VGG16Model(tf.keras.Model):
    def __init__(self):
        super(VGG16Model, self).__init__()
        # Block 1: 2 x conv64 (the attribute numbering below follows the original
        # code, which skips conv8-conv10 and maxPooling4)
        self.conv1 = ConvBNRelu(filters=64, kernel_size=[3, 3], rate=0.3)
        self.conv2 = ConvBNRelu(filters=64, kernel_size=[3, 3], drop=False)
        self.maxPooling1 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 2: 2 x conv128
        self.conv3 = ConvBNRelu(filters=128, kernel_size=[3, 3])
        self.conv4 = ConvBNRelu(filters=128, kernel_size=[3, 3], drop=False)
        self.maxPooling2 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 3: 3 x conv256
        self.conv5 = ConvBNRelu(filters=256, kernel_size=[3, 3])
        self.conv6 = ConvBNRelu(filters=256, kernel_size=[3, 3])
        self.conv7 = ConvBNRelu(filters=256, kernel_size=[3, 3], drop=False)
        self.maxPooling3 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 4: 3 x conv512
        self.conv11 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv12 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv13 = ConvBNRelu(filters=512, kernel_size=[3, 3], drop=False)
        self.maxPooling5 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # Block 5: 3 x conv512
        self.conv14 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv15 = ConvBNRelu(filters=512, kernel_size=[3, 3])
        self.conv16 = ConvBNRelu(filters=512, kernel_size=[3, 3], drop=False)
        self.maxPooling6 = keras.layers.MaxPooling2D(pool_size=(2, 2))
        # classifier head: a single 512-unit hidden layer instead of VGG16's
        # original two 4096-unit FC layers (CIFAR10 inputs are only 32x32)
        self.flat = keras.layers.Flatten()
        self.dropOut = keras.layers.Dropout(rate=0.5)
        self.dense1 = keras.layers.Dense(units=512,
                                         activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.0005))
        self.batchnorm = tf.keras.layers.BatchNormalization()
        self.dense2 = keras.layers.Dense(units=10)
        self.softmax = keras.layers.Activation('softmax')

    def call(self, inputs, training=None):
        net = self.conv1(inputs, training=training)
        net = self.conv2(net, training=training)
        net = self.maxPooling1(net)
        net = self.conv3(net, training=training)
        net = self.conv4(net, training=training)
        net = self.maxPooling2(net)
        net = self.conv5(net, training=training)
        net = self.conv6(net, training=training)
        net = self.conv7(net, training=training)
        net = self.maxPooling3(net)
        net = self.conv11(net, training=training)
        net = self.conv12(net, training=training)
        net = self.conv13(net, training=training)
        net = self.maxPooling5(net)
        net = self.conv14(net, training=training)
        net = self.conv15(net, training=training)
        net = self.conv16(net, training=training)
        net = self.maxPooling6(net)
        net = self.dropOut(net, training=training)
        net = self.flat(net)
        net = self.dense1(net)
        net = self.batchnorm(net, training=training)
        net = self.dropOut(net, training=training)
        net = self.dense2(net)
        net = self.softmax(net)
        return net
# training entry point
if __name__ == '__main__':
    print('tf.__version__:', tf.__version__)
    print('keras.__version__:', keras.__version__)

    # Hyperparameters
    training_epochs = 100
    batch_size = 256
    learning_rate = 0.1
    momentum = 0.9       # SGD momentum
    weight_decay = 1e-6  # passed to SGD's decay argument: per-update learning-rate decay, not true weight decay
    lr_drop = 20         # halve the learning rate every lr_drop epochs

    tf.random.set_seed(2022)  # fix the random seed for reproducibility

    def lr_scheduler(epoch):  # step decay: the learning rate is halved every lr_drop epochs
        return learning_rate * (0.5 ** (epoch // lr_drop))
    reduce_lr = keras.callbacks.LearningRateScheduler(lr_scheduler)

    x_img_train, y_label_train, x_img_test, y_label_test = load_images()

    datagen = ImageDataGenerator(
        featurewise_center=False,             # set the mean of the input data to 0, feature-wise
        samplewise_center=False,              # set the mean of each sample to 0
        featurewise_std_normalization=False,  # divide inputs by the dataset std, feature-wise
        samplewise_std_normalization=False,   # divide each input by its own std
        zca_whitening=False,                  # apply ZCA whitening
        rotation_range=15,                    # random rotation range in degrees (0 to 180)
        width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,                 # randomly flip images horizontally
        vertical_flip=False)                  # randomly flip images vertically
    datagen.fit(x_img_train)

    model = VGG16Model()  # instantiate the model
    optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate,
                                        decay=weight_decay, momentum=momentum, nesterov=True)
    # cross-entropy loss, SGD optimizer, accuracy as the evaluation metric
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    t1 = time.time()
    history = model.fit(datagen.flow(x_img_train, y_label_train,
                        batch_size=batch_size), epochs=training_epochs, verbose=2, callbacks=[reduce_lr],
                        steps_per_epoch=x_img_train.shape[0] // batch_size, validation_data=(x_img_test, y_label_test))
    t2 = time.time()
    CNNfit = float(t2 - t1)
    print("Time taken: {} seconds".format(CNNfit))

    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    plt.subplot(1, 2, 1)
    plt.plot(accuracy, label='Training Accuracy')
    plt.plot(val_accuracy, label='Validation Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.savefig('./results.png')
    plt.show()
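Finally, you may want to evaluate the trained model on the test set explicitly and save the weights. A sketch that would go at the end of the main block (the output path is just a hypothetical example):
    # explicit test-set evaluation; val_accuracy above already tracks this per epoch
    test_loss, test_acc = model.evaluate(x_img_test, y_label_test, verbose=0)
    print("Test accuracy: {:.4f}".format(test_acc))
    model.save_weights('./vgg16_cifar10_weights')  # hypothetical output path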