Backbone-GoogLeNet
Why is it not written as GoogleNet? Because the spelling is a tribute to LeNet!
1. Introduction
GoogLeNet has several versions. The first, v1, appeared in the 2014 paper "Going deeper with convolutions" and won the ILSVRC competition that year by a wide margin. Google later improved on it and proposed v2 through v4. Its main characteristics are:
- GoogLeNet introduces the Inception module, which fuses feature information from different scales; this is its most important innovation;
- it uses 1×1 convolution kernels for dimensionality reduction and projection, which amounts to a weighted fusion of the values of the different channels at each pixel (see the sketch after this list);
- it adds two auxiliary classifiers to help training. As noted near the bottom of page 6 of the paper, because the network is deep, gradients may not propagate back well, so two auxiliary classifiers are attached in the middle of the network and their losses are added to the total loss during training;
- it discards the heavy fully connected layers in favor of average pooling (only a single linear classifier remains), greatly reducing the number of model parameters.
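To make the second point concrete, here is a minimal sketch (not from the paper) comparing the parameter count of a 5×5 convolution applied directly to 192 input channels with the 1×1-reduce-then-5×5 pattern used inside the Inception module; the channel numbers 192, 16 and 32 match the 5×5 branch of inception(3a) described in the next section:
import torch.nn as nn

def n_params(*layers):
    return sum(p.numel() for layer in layers for p in layer.parameters())

direct = nn.Conv2d(192, 32, kernel_size=5, padding=2)        #5x5 straight from 192 channels
reduced = (nn.Conv2d(192, 16, kernel_size=1),                #1x1 reduce: 192 -> 16 channels
           nn.Conv2d(16, 32, kernel_size=5, padding=2))      #5x5 on the reduced 16 channels
print(n_params(direct))    #153632 = 192*32*5*5 + 32 biases
print(n_params(*reduced))  #15920  = (192*16 + 16) + (16*32*5*5 + 32)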
2. Network structure
The core of GoogLeNet is the Inception module shown in Figure 1: the previous layer is fed into four parallel branches, and the outputs of the four branches are merged together at the end.
Figure 2 is the most information-dense figure and corresponds one-to-one with the structure in Figure 3. Let's use the inception(3a) row of Figure 2 to explain what the table means. From the row above inception(3a) (max pool), we know that the max-pool output is 28×28×192, i.e. 192 channels with height and width both 28. Now look at the inception(3a) row:
- output size is the output size of inception(3a);
- depth = 2 means inception(3a) is two layers deep, which can be seen from Figure 1;
- #1×1 = 64 means the 1×1 convolutions branch has 64 kernels, so after this convolution the output is 28×28×64;
- #3×3 reduce = 96 means the 1×1 convolution placed before the 3×3 convolutions has 96 kernels;
- #3×3 = 128 means the 3×3 convolutions layer has 128 kernels;
- #5×5 reduce = 16 means the 1×1 convolution placed before the 5×5 convolutions has 16 kernels;
- #5×5 = 32 means the 5×5 convolutions layer has 32 kernels;
- pool proj = 32 means the 1×1 convolution after the max pooling has 32 kernels;
- from the output size we can see that inception(3a) outputs 256 channels, which is exactly the sum of the output channels of its branches: 256 = 64 + 128 + 32 + 32 (verified in the sketch below).
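To make the table concrete, below is a quick standalone check of the inception(3a) row using plain nn.Conv2d / nn.MaxPool2d layers (a sketch only; the full Inception module is implemented in Section 3):
import torch
import torch.nn as nn

x = torch.rand(1, 192, 28, 28)                                #output of the preceding max pool
b1 = nn.Conv2d(192, 64, kernel_size=1)(x)                     # #1x1 = 64
b2 = nn.Conv2d(96, 128, kernel_size=3, padding=1)(
         nn.Conv2d(192, 96, kernel_size=1)(x))                # #3x3 reduce = 96, #3x3 = 128
b3 = nn.Conv2d(16, 32, kernel_size=5, padding=2)(
         nn.Conv2d(192, 16, kernel_size=1)(x))                # #5x5 reduce = 16, #5x5 = 32
b4 = nn.Conv2d(192, 32, kernel_size=1)(
         nn.MaxPool2d(kernel_size=3, stride=1, padding=1)(x)) # pool proj = 32
out = torch.cat((b1, b2, b3, b4), dim=1)
print(out.shape)  #torch.Size([1, 256, 28, 28]); 256 = 64 + 128 + 32 + 32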
3. Implementation
Compared with AlexNet and VGG, GoogLeNet adds one new engineering piece, the Inception module, and how to implement it is the key question. The implementation partly follows this video.
import torch.nn as nn
import torch
class GoogLeNet_V1(nn.Module):
def __init__(self, num_classes=1000, aux_logits=False):
super(GoogLeNet_V1, self).__init__()
self.aux_logits = aux_logits
self.pre = nn.Sequential(
BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3), #112,112,64
nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True), #(112-3)/2+1=56 #56,56,64
            BasicConv2d(64, 64, kernel_size=1, stride=1),  #56,56,64
            BasicConv2d(64, 192, kernel_size=3, stride=1, padding=1),  #(56-3+2)/1+1=56 #56,56,192
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)  #ceil((56-3)/2)+1=28 #28,28,192
)
self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32) #28,28,256
self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64) #28,28,480
self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True) #(28-3)/2+1=14 #14,14,480
        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)  #14,14,512
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)  #14,14,512
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)  #14,14,512
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)  #14,14,528
self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128) #14,14,832
self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True) #(14-3)/2+1=7#7,7,832
self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128) #7,7,1024
self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1) #1,1,1024
        if aux_logits:
            self.aux1 = Aux(512, num_classes)  #attached after inception4a (14,14,512)
            self.aux2 = Aux(528, num_classes)  #attached after inception4d (14,14,528)
        self.classifier = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(1024, num_classes),
            nn.Softmax(dim=1),  #drop this Softmax if training with nn.CrossEntropyLoss, which applies log-softmax itself
        )
def forward(self, x):
x = self.pre(x)
x = self.inception3a(x)
x = self.inception3b(x)
x = self.maxpool1(x)
x = self.inception4a(x)
if self.training and self.aux_logits:
aux1 = self.aux1(x)
x = self.inception4b(x)
x = self.inception4c(x)
x = self.inception4d(x)
if self.training and self.aux_logits:
aux2 = self.aux2(x)
x = self.inception4e(x)
x = self.maxpool2(x)
x = self.inception5a(x)
x = self.inception5b(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.classifier(x)
if self.training and self.aux_logits:
return x, aux2, aux1
return x
class Inception(nn.Module):
def __init__(self, in_channels, ch11, ch33reduce, ch33, ch55reduce, ch55, pool_proj):
super(Inception, self).__init__()
self.bran1 = BasicConv2d(in_channels, ch11, kernel_size=1)
self.bran2 = nn.Sequential(
BasicConv2d(in_channels, ch33reduce, kernel_size=1),
BasicConv2d(ch33reduce, ch33, kernel_size=3, stride=1, padding=1),
)
self.bran3 = nn.Sequential(
BasicConv2d(in_channels, ch55reduce, kernel_size=1),
BasicConv2d(ch55reduce, ch55, kernel_size=5, stride=1, padding=2),
)
self.bran4 = nn.Sequential(
nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
BasicConv2d(in_channels, pool_proj, kernel_size=1, stride=1),
)
def forward(self, x):
branch1 = self.bran1(x)
branch2 = self.bran2(x)
branch3 = self.bran3(x)
branch4 = self.bran4(x)
return torch.cat((branch1, branch2, branch3, branch4), dim=1)
class Aux(nn.Module):
def __init__(self, inchannels, num_classes=1000):
super(Aux, self).__init__()
        self.conv = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),  #(14-5)/3+1=4, so 14x14 -> 4x4
            BasicConv2d(inchannels, out_channels=128, kernel_size=1, stride=1),  #[batch, 128, 4, 4] -> flattens to 128*4*4=2048
)
self.linear = nn.Sequential(
nn.Linear(2048, 1024),
nn.ReLU(),
nn.Dropout(0.7),
nn.Linear(1024, num_classes),
)
def forward(self, x):
        x = self.conv(x)
x = torch.flatten(x, 1)
x = self.linear(x)
return x
# Every convolution is followed by a ReLU; wrapping them in one module keeps the code concise
class BasicConv2d(nn.Module):
def __init__(self, in_channels, out_channels, **kwargs):
super(BasicConv2d, self).__init__()
self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv(x)
x = self.relu(x)
return x
if __name__ == '__main__':
# Example
net = GoogLeNet_V1()
x = torch.rand(1, 3, 224, 224)
    out = net(x)
print(out.size())
print(out)
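For reference, here is a quick shape check using the classes just defined (a minimal sketch): with aux_logits=True and the model in training mode, the forward pass returns three tensors, in the order main output, aux2, aux1.
net = GoogLeNet_V1(aux_logits=True)
net.train()
out, aux2, aux1 = net(torch.rand(1, 3, 224, 224))
print(out.size(), aux2.size(), aux1.size())  #each is torch.Size([1, 1000])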
4. Notes
- How to debug a network structure: if the network throws an error, comment out all of the code in the forward function, then uncomment and run it line by line until the error shows up again. Going forward, make it a habit to annotate the output size of each layer as a comment in the forward function.
- nn.Sequential can also contain custom layers, as long as the custom layer inherits from nn.Module.
- Handling the auxiliary classifiers: the two auxiliary classifiers defined here also produce outputs during training, so together with the output of the main branch there are three outputs in total. They correspond to three losses, and the total loss is the weighted sum of the three with weights 1, 0.3, 0.3 (main, aux, aux); see the sketch after this list.
- Why is every output 0.001 after adding the softmax layer? Look closely: not all of them are exactly 0.001; a few are 0.0011 or 0.0009. The reason is that when computing e^x, the (untrained) values of x are all very small, only slightly above 0, so e^x is close to 1 for every class. With 1000 classes, each probability is therefore about 1/1000 = 0.001.
- torch.cat((branch1, branch2, branch3, branch4), dim=1) concatenates the four tensors branch1–branch4 along the second dimension (the channel dimension). For example, if the four tensors have shapes [1,10,7,7], [1,20,7,7], [1,30,7,7] and [1,40,7,7], the result has shape [1,100,7,7], where 100 = 10 + 20 + 30 + 40.
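Below is a minimal sketch of the weighted loss described above. It assumes the GoogLeNet_V1 from Section 3 with aux_logits=True, that the trailing nn.Softmax is dropped from the classifier (nn.CrossEntropyLoss already applies log-softmax internally), and a dummy batch in place of a real data loader:
import torch
import torch.nn as nn

model = GoogLeNet_V1(num_classes=1000, aux_logits=True)
criterion = nn.CrossEntropyLoss()
model.train()  #auxiliary outputs are only produced in training mode

images = torch.rand(2, 3, 224, 224)       #dummy batch
labels = torch.randint(0, 1000, (2,))     #dummy targets
logits, aux2, aux1 = model(images)        #forward returns (main, aux2, aux1)

loss = (criterion(logits, labels)
        + 0.3 * criterion(aux1, labels)
        + 0.3 * criterion(aux2, labels))  #weights 1, 0.3, 0.3 (main, aux, aux)
loss.backward()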