一个VGG块由一系列卷积层组成,后面再加上用于空间下采样的最大汇聚层。在最初的VGG论文中 [Simonyan & Zisserman, 2014],作者使用了带有3×3卷积核、填充为1(保持高度和宽度)的卷积层,和带有2×2汇聚窗口、步幅为2(每个块后的分辨率减半)的最大汇聚层。
与AlexNet、LeNet一样,VGG网络可以分为两部分:第一部分主要由卷积层和汇聚层组成,第二部分由全连接层组成。如下图所示:
VGG神经网络的几个VGG块(在vgg_block
函数中定义)。其中有超参数变量conv_arch
。该变量指定了每个VGG块里卷积层个数和输出通道数。全连接模块则与AlexNet中的相同。原始VGG网络有5个卷积块,其中前两个块各有一个卷积层,后三个块各包含两个卷积层。 第一个模块有64个输出通道,每个后续模块将输出通道数量翻倍,直到该数字达到512。由于该网络使用8个卷积层和3个全连接层,因此它通常被称为VGG-11。
同样,在VGG网络上使用fashion_mnist数据集,代码如下:
!pip install git+https://github.com/d2l-ai/d2l-zh@release # installing d2l
!pip install matplotlib_inline
!pip install matplotlib==3.0.0
import torch
from torch import nn
from d2l import torch as d2l
def vgg_block(num_convs, in_channels, out_channels):
layers = []
for _ in range(num_convs):
layers.append(nn.Conv2d(in_channels,out_channels,kernel_size=3,padding=1))
layers.append(nn.ReLU())
in_channels = out_channels
layers.append(nn.MaxPool2d(kernel_size=2,stride=2))
return nn.Sequential(*layers)
conv_arch = ((1,64),(1,128),(2,256),(2,512),(2,512))#设计卷积层
def vgg(conv_arch):
conv_blks = []
in_channels = 1
#卷积部分
for (num_convs,out_channels) in conv_arch:
conv_blks.append(vgg_block(num_convs,in_channels,out_channels))
in_channels = out_channels
return nn.Sequential(*conv_blks,nn.Flatten(),
nn.Linear(out_channels * 7 * 7,4096),nn.ReLU(),nn.Dropout(0.5),
nn.Linear(4096,4096),nn.ReLU(),nn.Dropout(0.5),
nn.Linear(4096,10))
net = vgg(conv_arch)
ratio = 4
samll_conv_arch = [(pair[0],pair[1]//ratio) for pair in conv_arch]#把网络变小一点 每一层输出通道都变小4倍
#[(1, 16), (1, 32), (2, 64), (2, 128), (2, 128)]
net = vgg(samll_conv_arch)
lr, num_epochs, batch_size = 0.05, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
运行结果: