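7. Networks Using Repeating Elements (VGG)
About AlexNet
Improvements:
AlexNet adds three convolutional layers on top of LeNet and substantially changes the convolution window sizes, the numbers of output channels, and the order in which the layers are arranged.
Limitations:
Although AlexNet showed that deep convolutional neural networks can achieve excellent results, it did not give later researchers a simple rule for designing new networks.
VGG proposed building deep models by repeatedly stacking a simple basic block.
Its name comes from the lab of the paper's authors, the Visual Geometry Group.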
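7.1 VGG block
Composition rule:
several identical convolutional layers in a row, each with padding 1 and a 3×3 window;
followed by one max pooling layer with stride 2 and a 2×2 window.
The convolutional layers keep the input height and width unchanged, while the pooling layer halves them.
The vgg_block function implements this basic VGG block: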
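import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D

def vgg_block(num_convs, num_channels):
    blk = Sequential()
    # num_convs 3x3 convolutions; padding='same' keeps height and width unchanged
    for _ in range(num_convs):
        blk.add(Conv2D(num_channels, kernel_size=3, padding='same', activation='relu'))
    # 2x2 max pooling with stride 2 halves height and width
    blk.add(MaxPool2D(pool_size=2, strides=2))
    return blk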
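The shape behaviour described for the block can be checked with the usual output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1. The sketch below is plain Python with no TensorFlow dependency. The helper name out_size is made up for this illustration and is not part of any library; padding 1 with a 3×3 window and stride 1 is what padding='same' amounts to in vgg_block.

```python
def out_size(n, kernel, padding, stride):
    """Output length along one spatial axis for a conv or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

n = 224
# 3x3 convolution with padding 1 and stride 1: size is preserved
after_conv = out_size(n, kernel=3, padding=1, stride=1)
# 2x2 max pooling with stride 2 and no padding: size is halved
after_pool = out_size(after_conv, kernel=2, padding=0, stride=2)
print(after_conv, after_pool)
```

Stacking more 3×3 convolutions inside a block therefore never changes the spatial size; only the pooling layer at the end of the block does.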
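7.2 VGG network
Like AlexNet and LeNet, a VGG network consists of a convolutional module followed by a fully connected module.
The convolutional module chains several vgg_block instances; its hyperparameters are defined by the variable conv_arch.
This variable specifies the number of convolutional layers and the number of output channels in each VGG block.
The fully connected module is the same as in AlexNet.
We now construct a VGG network:
it has 5 convolutional blocks; the first 2 blocks use a single convolutional layer each, and the last 3 use two convolutional layers each.
The first block has 64 output channels, and each subsequent block doubles the number of output channels until it reaches 512.
Because this network uses 8 convolutional layers and 3 fully connected layers, it is commonly called VGG-11.
VGG-11 can be implemented as follows: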
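from tensorflow.keras.layers import Dense, Dropout, Flatten

# (number of conv layers, number of output channels) for each VGG block
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def vgg11(conv_arch):
    net = Sequential()
    # Convolutional module: one vgg_block per entry in conv_arch
    for (num_convs, num_channels) in conv_arch:
        net.add(vgg_block(num_convs, num_channels))
    # Fully connected module, as in AlexNet
    net.add(Sequential([
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        # softmax yields the class probabilities that sparse_categorical_crossentropy expects
        Dense(10, activation='softmax')
    ]))
    return net

net = vgg11(conv_arch)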
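The "11" in the name can be recovered directly from conv_arch. The following small sketch just sums the per-block convolution counts and adds the three dense layers of the fully connected module; the variable names num_conv_layers and num_dense_layers are introduced here for illustration only.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# Each entry is (number of conv layers, output channels) for one block
num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
# Two 4096-unit layers plus the 10-way output layer
num_dense_layers = 3

print(num_conv_layers, num_conv_layers + num_dense_layers)
```

Pooling, dropout, and flatten layers carry no weights and are not counted in the name.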
To observe the output shape of each layer, construct a single-channel data sample with height and width both equal to 224:
X = tf.random.uniform((1, 224, 224, 1))
for blk in net.layers:
    X = blk(X)
    print(blk.name, "shape: ", X.shape)
Output:
sequential_15 shape: (1, 112, 112, 64)
sequential_16 shape: (1, 56, 56, 128)
sequential_17 shape: (1, 28, 28, 256)
sequential_18 shape: (1, 14, 14, 512)
sequential_19 shape: (1, 7, 7, 512)
sequential_20 shape: (1, 10)
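As the output shows, each block halves the height and width of its input until both reach 7, at which point the result is passed to the fully connected layers.
Meanwhile, the number of output channels doubles from block to block until it reaches 512.
Because every convolutional layer uses the same window size, the computational cost of a layer is proportional to the product of its input height, input width, number of input channels, and number of output channels, while its parameter count is proportional to the product of the input and output channel counts only.
VGG's design of halving the height and width while doubling the channels therefore gives most convolutional layers a similar computational cost, even though the parameter count keeps growing with the channel counts.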
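To make the scaling argument concrete, the sketch below tabulates, for every convolutional layer of the VGG-11 configuration above, the number of weights and the number of multiply-add operations. It assumes the single-channel 224×224 sample used earlier, ignores biases and activations, and uses a hypothetical helper name, conv_layer_stats, that does not come from any library.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def conv_layer_stats(conv_arch, size=224, in_channels=1, kernel=3):
    """Return (size, in_ch, out_ch, weights, multiply_adds) for each conv layer."""
    stats = []
    for num_convs, out_channels in conv_arch:
        for _ in range(num_convs):
            # Weights depend only on window size and channel counts (biases ignored)
            weights = kernel * kernel * in_channels * out_channels
            # 'same' padding: one window evaluation per output position
            macs = size * size * weights
            stats.append((size, in_channels, out_channels, weights, macs))
            in_channels = out_channels
        size //= 2  # the 2x2, stride-2 max pool that closes each block
    return stats

for size, c_in, c_out, weights, macs in conv_layer_stats(conv_arch):
    print(f"{size:>3}x{size:<3} {c_in:>3}->{c_out:<3} "
          f"weights={weights:>9,} mult-adds={macs:>13,}")
```

Under these assumptions, the layers that halve the spatial size while doubling both channel counts (for example the convolution in block 2 and the first convolutions of blocks 3 and 4) have identical multiply-add counts, whereas their weight counts grow fourfold from one to the next.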
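7.3 Data loading and model training
Because VGG-11 is computationally more expensive than AlexNet, for testing purposes we construct a network with fewer channels (in other words, a narrower one) and train it on the Fashion-MNIST dataset: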
ratio = 4
small_conv_arch = [(pair[0], pair[1]//ratio) for pair in conv_arch]
net = vgg11(small_conv_arch)
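As a rough illustration of what the narrowing buys, the sketch below prints small_conv_arch and counts the convolutional parameters (weights plus biases) of both configurations. It assumes a single input channel, as in the Fashion-MNIST images, and the helper name conv_param_count is hypothetical.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]

def conv_param_count(arch, in_channels=1, kernel=3):
    """Total weights and biases across all convolutional layers."""
    total = 0
    for num_convs, out_channels in arch:
        for _ in range(num_convs):
            total += kernel * kernel * in_channels * out_channels + out_channels
            in_channels = out_channels
    return total

print(small_conv_arch)
print(conv_param_count(conv_arch), conv_param_count(small_conv_arch))
```

Dividing every channel count by 4 shrinks the convolutional parameters by close to a factor of 16, since each layer's weights scale with the product of its input and output channel counts. The two 4096-unit dense layers are not affected by ratio, apart from the input size of the first one.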
Data loading
import numpy as np

class DataLoader():
    def __init__(self):
        # Alternative: download the dataset through Keras instead of reading local files
        # fashion_mnist = tf.keras.datasets.fashion_mnist
        # (self.train_images, self.train_labels), (self.test_images, self.test_labels) = fashion_mnist.load_data()
        # Load the raw IDX files from local disk; offset skips the file header
        with open("../input/fashionmnist/train-labels-idx1-ubyte", 'rb') as f:
            self.train_labels = np.frombuffer(f.read(), np.uint8, offset=8)
        with open("../input/fashionmnist/train-images-idx3-ubyte", 'rb') as f:
            self.train_images = np.frombuffer(f.read(), np.uint8, offset=16).reshape(len(self.train_labels), 28, 28)
        with open("../input/fashionmnist/t10k-labels-idx1-ubyte", 'rb') as f:
            self.test_labels = np.frombuffer(f.read(), np.uint8, offset=8)
        with open("../input/fashionmnist/t10k-images-idx3-ubyte", 'rb') as f:
            self.test_images = np.frombuffer(f.read(), np.uint8, offset=16).reshape(len(self.test_labels), 28, 28)
        # Scale pixels to [0, 1] and add a channel axis:
        # np.expand_dims(images, axis=-1) converts (10000, 28, 28) into (10000, 28, 28, 1)
        self.train_images = np.expand_dims(self.train_images.astype(np.float32) / 255.0, axis=-1)
        self.test_images = np.expand_dims(self.test_images.astype(np.float32) / 255.0, axis=-1)
        self.train_labels = self.train_labels.astype(np.int32)
        self.test_labels = self.test_labels.astype(np.int32)
        self.num_train, self.num_test = self.train_images.shape[0], self.test_images.shape[0]

    def get_batch_train(self, batch_size):
        """
        Draw a random training batch (with replacement), resized to 224x224.

        Examples
        --------
        >>> np.random.randint(0, 10, size=2)
        array([5, 7])
        """
        index = np.random.randint(0, np.shape(self.train_images)[0], batch_size)
        resized_images = tf.image.resize_with_pad(self.train_images[index], 224, 224)
        return resized_images.numpy(), self.train_labels[index]

    def get_batch_test(self, batch_size):
        # Same sampling scheme as get_batch_train, applied to the test set
        index = np.random.randint(0, np.shape(self.test_images)[0], batch_size)
        resized_images = tf.image.resize_with_pad(self.test_images[index], 224, 224)
        return resized_images.numpy(), self.test_labels[index]
batch_size = 128
dataLoader = DataLoader()
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
print("x_batch shape:",x_batch.shape,"y_batch shape:", y_batch.shape)
Output:
x_batch shape: (128, 224, 224, 1) y_batch shape: (128,)
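The offset=8 and offset=16 arguments in the loader skip the IDX file headers: a label file starts with two big-endian 32-bit integers (magic number and item count), and an image file with four (magic number, item count, rows, columns). The sketch below builds a tiny synthetic image buffer in memory and parses it with the standard library only. The 2×2 "images" and the helper name parse_idx_images are invented for illustration and are not part of the dataset or of any library.

```python
import struct

def parse_idx_images(buf):
    """Split an IDX3 image buffer into its header fields and per-image pixel lists."""
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    pixels = buf[16:]  # the bytes np.frombuffer(..., offset=16) would read
    size = rows * cols
    images = [list(pixels[i * size:(i + 1) * size]) for i in range(count)]
    return magic, (count, rows, cols), images

# Synthetic file: header for two 2x2 images, followed by 8 pixel bytes
header = struct.pack(">IIII", 2051, 2, 2, 2)  # 2051 = 0x00000803, the IDX3 unsigned-byte magic number
buf = header + bytes([0, 64, 128, 255, 10, 20, 30, 40])

magic, shape, images = parse_idx_images(buf)
print(magic, shape, images)
```

Reading the count, rows, and columns from the header in this way would be an alternative to hard-coding 28×28 in the reshape calls.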
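Model training
The training procedure is similar to that in Part (4) Convolutional Neural Networks, Section 6 (AlexNet), but uses a slightly larger learning rate: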
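from tensorflow.keras.optimizers import SGD

def train_vgg():
    epoch = 5
    num_iter = dataLoader.num_train // batch_size
    for e in range(epoch):
        for n in range(num_iter):
            x_batch, y_batch = dataLoader.get_batch_train(batch_size)
            net.fit(x_batch, y_batch)
            # Checkpoint the weights every 20 iterations
            # (Keras 3 expects the file name to end in .weights.h5; rename if needed)
            if n % 20 == 0:
                net.save_weights("5.7_vgg_weights.h5")

optimizer = SGD(learning_rate=0.05, momentum=0.0, nesterov=False)
net.compile(optimizer=optimizer,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# Fit one batch first so the model is built, then run the full training loop
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
net.fit(x_batch, y_batch)
train_vgg()

# Reload the last checkpoint and evaluate on 2000 randomly drawn test images
net.load_weights("5.7_vgg_weights.h5")
x_test, y_test = dataLoader.get_batch_test(2000)
net.evaluate(x_test, y_test, verbose=2)
Output: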
63/63 - 1s - loss: 0.2325 - accuracy: 0.9185
[0.23246736824512482, 0.9185000061988831]
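The step counts in the training loop and in the evaluation log follow from simple arithmetic. The sketch below assumes the standard Fashion-MNIST training split of 60,000 images and the Keras default batch size of 32 for evaluate(); neither value is stated explicitly in the code above, so treat both as assumptions.

```python
import math

num_train = 60000      # assumed: standard Fashion-MNIST training set size
batch_size = 128
num_iter = num_train // batch_size             # iterations per epoch in train_vgg
saves_per_epoch = len(range(0, num_iter, 20))  # checkpoints written when n % 20 == 0

eval_samples = 2000
eval_batch_size = 32   # assumed: Keras default batch size for evaluate()
eval_steps = math.ceil(eval_samples / eval_batch_size)

print(num_iter, saves_per_epoch, eval_steps)
```

The resulting number of evaluation steps matches the "63/63" shown in the log. By the same default, each net.fit(x_batch, y_batch) call on 128 samples is split into 4 mini-batches of 32 unless batch_size is passed explicitly.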