In the 2014 ImageNet image-recognition challenge, GoogLeNet stood out from the field.
GoogLeNet
The name pays homage to LeNet, but internally the network absorbs NiN's idea of chaining sub-networks in series.
Inception block
The basic convolutional block in GoogLeNet is called the Inception block (named after the movie Inception; it's great, worth a watch! haha)
If NiN is about chaining in series, then GoogLeNet, which draws on NiN, blends in the idea of parallelism:
an Inception block contains 4 parallel paths.
Starting from the input:
- 1x1 convolutional layer
- 1x1 convolutional layer + 3x3 convolutional layer
- 1x1 convolutional layer + 5x5 convolutional layer
- 3x3 max pooling layer + 1x1 convolutional layer
The 1x1, 3x3, and 5x5 convolutions extract information at different spatial scales. The 1x1 convolutions placed before the 3x3 and 5x5 convolutions change the channel count while reducing model complexity. The fourth path applies a 3x3 max pooling layer before its 1x1 convolution. Finally, the outputs of the four paths are concatenated along the channel dimension to form the block's output.
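The four-path structure described above can be sketched as a PyTorch module. This is a minimal sketch in the common d2l style; the attribute names (`p1_1`, `p2_1`, ...) mirror the model printout below, and the ReLU placement is an assumption of this sketch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Inception(nn.Module):
    # c1..c4 are the output channel counts of the four parallel paths
    def __init__(self, in_c, c1, c2, c3, c4):
        super().__init__()
        # Path 1: a single 1x1 convolution
        self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)
        # Path 2: 1x1 convolution to reduce channels, then 3x3 convolution
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Path 3: 1x1 convolution to reduce channels, then 5x5 convolution
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Path 4: 3x3 max pooling (stride 1 keeps the spatial size), then 1x1 conv
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Concatenate the four paths along the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)
```

Because every path preserves the spatial size (stride 1 with matching padding), the four outputs can be concatenated; only the channel count grows.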
The GoogLeNet model
Like VGG, the main body consists of 5 modules (blocks), and between modules a 3x3 max pooling layer with stride 2 reduces the output height and width.
The detailed structure is as follows:
Sequential(
  (0): Sequential(
    (0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (1): Sequential(
    (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (2): Sequential(
    (0): Inception(
      (p1_1): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(192, 16, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Inception(
      (p1_1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(32, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (2): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (3): Sequential(
    (0): Inception(
      (p1_1): Conv2d(480, 192, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(480, 96, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(96, 208, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(480, 16, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(16, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(480, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Inception(
      (p1_1): Conv2d(512, 160, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(112, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(24, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (2): Inception(
      (p1_1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(24, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (3): Inception(
      (p1_1): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(512, 144, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(144, 288, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    )
    (4): Inception(
      (p1_1): Conv2d(528, 256, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(528, 160, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(528, 32, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(32, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(528, 128, kernel_size=(1, 1), stride=(1, 1))
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (4): Sequential(
    (0): Inception(
      (p1_1): Conv2d(832, 256, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(832, 160, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(832, 32, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(32, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): Inception(
      (p1_1): Conv2d(832, 384, kernel_size=(1, 1), stride=(1, 1))
      (p2_1): Conv2d(832, 192, kernel_size=(1, 1), stride=(1, 1))
      (p2_2): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (p3_1): Conv2d(832, 48, kernel_size=(1, 1), stride=(1, 1))
      (p3_2): Conv2d(48, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (p4_1): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (p4_2): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
    )
    (2): GlobalAvgPool2d()
  )
  # Like NiN, a global average pooling layer is used, reducing the height and width to 1
  (5): FlattenLayer()
  (6): Linear(in_features=1024, out_features=10, bias=True)
)
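The channel counts in the printout can be sanity-checked by hand: each Inception block's output channels are simply the sum of its four path widths. A quick sketch, using the standard GoogLeNet path configuration (the per-block tuples below are an assumption drawn from that configuration, not from running the model):

```python
# (c1, c2_out, c3_out, c4): output channels of the four paths of each Inception block
blocks = [
    (64, 128, 32, 32),     # block 3, Inception 0: 192 in
    (128, 192, 96, 64),    # block 3, Inception 1: 256 in
    (192, 208, 48, 64),    # block 4, Inception 0: 480 in
    (160, 224, 64, 64),    # block 4, Inception 1: 512 in
    (128, 256, 64, 64),    # block 4, Inception 2: 512 in
    (112, 288, 64, 64),    # block 4, Inception 3: 512 in
    (256, 320, 128, 128),  # block 4, Inception 4: 528 in
    (256, 320, 128, 128),  # block 5, Inception 0: 832 in
    (384, 384, 128, 128),  # block 5, Inception 1: 832 in
]
out_channels = [sum(b) for b in blocks]
print(out_channels)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

Each block's output channel count matches the next block's input, and the final 1024 channels feed the Linear(1024, 10) classifier after global average pooling and flattening.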