一、VGG's Innovations
(一)Smaller convolution kernels
Before VGG, most networks (e.g. AlexNet) extracted features with large convolution kernels. VGG instead stacks small kernels to cover the same receptive field as one large kernel. The two most significant advantages are:
1. A stack of three 3*3 kernels has the same receptive field as one 7*7 kernel but fewer parameters. With C input and C output channels:
One 7*7 conv: 7*7*C*C = 49*C*C parameters
Three 3*3 convs: 3*(3*3*C*C) = 27*C*C parameters
2. Because each small convolution is followed by a non-linear activation, the stack inserts three non-linearities where a single large kernel would give only one, strengthening the model's capacity for feature abstraction.
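The parameter arithmetic in point 1 can be checked with a quick PyTorch sketch (C = 64 is an arbitrary channel count chosen for illustration; biases are disabled to match the formula):

```python
import torch.nn as nn

C = 64  # arbitrary channel count for illustration

# One 7x7 conv: 7*7*C*C weights
big = nn.Conv2d(C, C, kernel_size=7, bias=False)

# Three stacked 3x3 convs: 3*(3*3*C*C) weights, same 7x7 receptive field
small = nn.Sequential(
    nn.Conv2d(C, C, 3, bias=False),
    nn.Conv2d(C, C, 3, bias=False),
    nn.Conv2d(C, C, 3, bias=False),
)

n_big = sum(p.numel() for p in big.parameters())
n_small = sum(p.numel() for p in small.parameters())
print(n_big, n_small)  # 200704 110592, i.e. 49*C*C and 27*C*C
```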
(二)Smaller pooling kernels
AlexNet's pooling layers use 3*3 kernels (stride 2, overlapping windows); VGG's pooling layers use 2*2 kernels with stride 2. The smaller, non-overlapping window discards less spatial detail at each downsampling step.
(三)Deeper layers, smaller feature maps
Two VGG variants are in common use: VGG16 and VGG19. VGG16 has 16 weight layers, and its simple, regular structure makes it easy to modify; VGG19 has 19 weight layers and trains to slightly higher accuracy, at the cost of more parameters and computation. In both, the convolutions focus on widening the channel dimension while the poolings shrink height and width, making the architecture deeper and wider within an acceptable computational budget.
(四)Fully connected layers implemented as convolutions
At test time, the three fully connected layers used in training are replaced by three convolutional layers that reuse the trained parameters. The resulting fully convolutional network, no longer constrained by fixed-size fully connected layers, can accept inputs of arbitrary width or height. What change does this produce?
# Assume an input feature map of 7x7x512, stride 1:
# the 7x7 "fully connected" conv yields an output of shape [1, 1000, 1, 1],
# which can be squeezed to [1, 1000]
# Assume an input feature map of 14x14x512:
# the same conv yields a [1, 1000, 8, 8] score map
# Averaging each 8x8 score map and squeezing again gives [1, 1000]
This change follows OverFeat, which accepts images of arbitrary resolution.
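The conversion can be sketched in PyTorch as follows (a minimal illustration: the sizes follow VGG16's first FC layer, 7*7*512 → 1000, and the variable names are mine, not from the paper):

```python
import torch
import torch.nn as nn

# A trained FC layer mapping the flattened 7x7x512 feature map to 1000 scores
fc = nn.Linear(7 * 7 * 512, 1000)

# Equivalent convolution: 1000 filters of size 512x7x7
conv = nn.Conv2d(512, 1000, kernel_size=7)
# Reuse the trained FC weights by reshaping them into conv filters
conv.weight.data.copy_(fc.weight.data.view(1000, 512, 7, 7))
conv.bias.data.copy_(fc.bias.data)

x7 = torch.randn(1, 512, 7, 7)     # feature map from a 224x224 image
x14 = torch.randn(1, 512, 14, 14)  # feature map from a larger image

print(conv(x7).shape)   # torch.Size([1, 1000, 1, 1])
print(conv(x14).shape)  # torch.Size([1, 1000, 8, 8]) -- a score map

# On the 7x7 input both layers compute the same scores (up to float rounding)
print(torch.allclose(fc(x7.view(1, -1)), conv(x7).view(1, -1), atol=1e-4))
```

Averaging the 8x8 score map over its spatial positions then yields a single [1, 1000] prediction for the larger input.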
二、Reproducing the VGG16 network
VGG16 has 16 weight layers in total: 13 convolutional and 3 fully connected. The input first passes through two convolutions with 64 kernels followed by one pooling, then two convolutions with 128 kernels followed by another pooling, then three convolutions with 256 kernels and a pooling, then twice a block of three convolutions with 512 kernels each followed by a pooling, and finally through three fully connected layers.
import torch.nn as nn

class vgg(nn.Module):
    def __init__(self, num_classes=1000):
        super(vgg, self).__init__()
        # 13 convolutional layers; every conv is 3x3, stride 1, padding 1
        self.Conv = nn.Sequential(
            # block 1: 3 -> 64 channels
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 2: 64 -> 128 channels
            nn.Conv2d(64, 128, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 3: 128 -> 256 channels
            nn.Conv2d(128, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 4: 256 -> 512 channels
            nn.Conv2d(256, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # block 5: 512 -> 512 channels
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        # 3 fully connected layers on the flattened 7x7x512 feature map
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.Conv(x)
        x = x.view(x.size(0), -1)  # flatten to (N, 25088)
        return self.classifier(x)

model = vgg()
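As a sanity check on the spatial sizes (a minimal sketch; since every conv above uses padding 1 and so preserves height and width, only the five poolings shrink the feature map):

```python
import torch
import torch.nn as nn

# Five 2x2 max-poolings halve 224 five times: 224 -> 112 -> 56 -> 28 -> 14 -> 7
x = torch.randn(1, 3, 224, 224)
pool = nn.MaxPool2d(2, 2)
for _ in range(5):
    x = pool(x)
print(x.shape)  # torch.Size([1, 3, 7, 7])
```

This is why the first fully connected layer expects a 7x7x512 input.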
三、Parameter calculation for VGG16
INPUT: [224x224x3] memory: 224*224*3=150K weights: 0
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K weights: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K weights: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K weights: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K weights: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K weights: 0
FC: [1x1x4096] memory: 4096 weights: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 weights: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 weights: 4096*1000 = 4,096,000
TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
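The 138M total can be reproduced with a short sketch that sums weights and biases layer by layer (the configuration list below is transcribed from the table above):

```python
# Channels per conv layer, following the table; 'M' marks a 2x2 pooling
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

params = 0
in_ch = 3
for v in cfg:
    if v == 'M':
        continue  # pooling has no parameters
    params += 3 * 3 * in_ch * v + v  # 3x3 conv weights + biases
    in_ch = v

# Three fully connected layers (weights + biases)
for n_in, n_out in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    params += n_in * n_out + n_out

print(params)  # 138357544, i.e. ~138M
```

Note the table above counts only weights; including biases adds 13,416 parameters, which disappears into the rounding of "138M".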