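GoogLeNet Inception V1(2014)
First introduced in: "Going deeper with convolutions"
Winner of the 2014 ImageNet (ILSVRC) competition, this model showed that more convolutions and a deeper network can yield a better architecture.
Its basic building blocks are much the same as AlexNet's, but it stacks several Inception modules in the middle. The Inception module was revised once; the original version is as follows: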
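Notes on the figure above:
1. Using kernels of different sizes means receptive fields of different sizes, and concatenating the results at the end fuses features from different scales;
2. Kernel sizes of 1, 3 and 5 are chosen mainly to make alignment easy: with a convolution stride of 1, setting pad=0, 1 and 2 respectively gives outputs with the same spatial dimensions, so these features can be concatenated directly;
3. The paper notes that pooling has proven effective in many settings, so a pooling branch is also embedded in the Inception module;
4. The deeper the network, the more abstract the features and the larger the receptive field each feature covers, so the proportion of 3x3 and 5x5 convolutions should increase with depth.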
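Point 2 can be checked with the standard output-size formula out = floor((in + 2*pad - kernel) / stride) + 1. The plain-Python sketch below (the 28x28 input size is an arbitrary example) confirms that every branch of the module, including a 3x3 pooling with stride 1 and pad 1, keeps the same spatial size, which is what makes channel-wise concatenation possible.

```python
def conv_out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer, rounding down as PyTorch does by default."""
    return (size + 2 * pad - kernel) // stride + 1

# (kernel, pad) for each branch, all with stride 1, as described in point 2
branches = {"1x1 conv": (1, 0), "3x3 conv": (3, 1), "5x5 conv": (5, 2), "3x3 pool": (3, 1)}
sizes = {name: conv_out_size(28, k, stride=1, pad=p) for name, (k, p) in branches.items()}
print(sizes)  # every branch stays 28x28
```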
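However, 5x5 kernels still bring a huge computational cost. To address this, the paper borrows from NIN (Network in Network) and uses 1x1 convolutions for dimensionality reduction.
For example, suppose the previous layer outputs 100x100x128. After a 5x5 convolution layer with 256 outputs (stride=1, pad=2), the output is 100x100x256, and the layer has 128x5x5x256 weights. If the previous output first goes through a 1x1 convolution layer with 32 outputs and then through the 5x5 convolution with 256 outputs, the final output is still 100x100x256, but the weight count drops to 128x1x1x32 + 32x5x5x256, roughly a 4x reduction.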
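The arithmetic of this example can be reproduced directly. The sketch below counts weights only (biases ignored, as in the text); the helper name `conv_params` is just for illustration. Because every weight is applied at each of the 100x100 output positions in both designs, the number of multiply-adds shrinks by the same factor.

```python
def conv_params(c_in, k, c_out):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return c_in * k * k * c_out

direct = conv_params(128, 5, 256)                            # 128x5x5x256
reduced = conv_params(128, 1, 32) + conv_params(32, 5, 256)  # 128x1x1x32 + 32x5x5x256
print(direct, reduced, round(direct / reduced, 2))           # 819200 208896 3.92
```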
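The improved Inception module is shown below:
Notes on the figure above:
1. GoogLeNet clearly uses a modular structure, which makes it easy to add or modify parts;
2. At the end, the network uses average pooling instead of fully connected layers, an idea taken from NIN; the authors report that this improved top-1 accuracy by about 0.6%. A fully connected layer is nevertheless kept at the very end, mainly to make fine-tuning on other tasks easy;
3. Although the fully connected layers are removed, the network still uses Dropout;
4. To mitigate vanishing gradients, the network adds two auxiliary softmax classifiers that inject extra gradient into the earlier layers. The paper says the losses of these two auxiliary classifiers should be multiplied by a discount weight, although the Caffe model the author looked at does not appear to apply any such weight. At test time, the two extra softmax heads are removed.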
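The discount in point 4 applies to the losses rather than to the outputs. Below is a minimal plain-Python sketch of the combined training loss. It assumes single-sample lists of logits and the weight of 0.3 reported in the paper; the function names are hypothetical. At test time only the main head's logits would be used.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def googlenet_train_loss(main_logits, aux0_logits, aux1_logits, target, aux_weight=0.3):
    """Main loss plus the two auxiliary losses, each discounted by aux_weight."""
    return (cross_entropy(main_logits, target)
            + aux_weight * (cross_entropy(aux0_logits, target)
                            + cross_entropy(aux1_logits, target)))

# With uniform logits over 4 classes every loss equals log(4), so the total is 1.6 * log(4).
print(googlenet_train_loss([0.0] * 4, [0.0] * 4, [0.0] * 4, target=0))
```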
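Conclusion:
The main idea of the paper is to approximate an optimal sparse structure with dense building blocks, improving performance without greatly increasing computation. GoogLeNet's caffemodel is only about 50 MB, yet its performance is excellent.
PyTorch model definition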
import torch
import torch.nn as nn

__all__ = ['InceptionV1', 'inception_v1']

# modified according to https://github.com/minghao-wu/DeepLearningFromScratch/blob/master/GoogLeNet/GoogLeNet.py
# aux_classifier and dropout


def inception_v1(**kwargs):
    return InceptionV1(**kwargs)

class Inception(nn.Module):
    # Inception module with dimension reduction: four parallel branches whose
    # outputs are concatenated along the channel axis, giving
    # n1x1 + n3x3 + n5x5 + pool_planes channels at the input resolution.
    def __init__(self, in_planes, n1x1, n3x3red, n3x3, n5x5red, n5x5,
                 pool_planes):
        super(Inception, self).__init__()
        # 1x1 conv branch
        self.b1 = nn.Sequential(
            nn.Conv2d(in_planes, n1x1, kernel_size=1),
            nn.ReLU(True),
        )
        # 1x1 conv -> 3x3 conv branch
        self.b2 = nn.Sequential(
            nn.Conv2d(in_planes, n3x3red, kernel_size=1),
            nn.ReLU(True),
            nn.Conv2d(n3x3red, n3x3, kernel_size=3, padding=1),
            nn.ReLU(True),
        )
        # 1x1 conv -> 5x5 conv branch
        self.b3 = nn.Sequential(
            nn.Conv2d(in_planes, n5x5red, kernel_size=1),
            nn.ReLU(True),
            nn.Conv2d(n5x5red, n5x5, kernel_size=5, padding=2),
            nn.ReLU(True),
        )
        # 3x3 pool -> 1x1 conv branch
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_planes, pool_planes, kernel_size=1),
            nn.ReLU(True),
        )

    def forward(self, x):
        y1 = self.b1(x)
        y2 = self.b2(x)
        y3 = self.b3(x)
        y4 = self.b4(x)
        return torch.cat([y1, y2, y3, y4], 1)

class AuxClassifier(nn.Module):
    # Auxiliary classifier attached to a 14x14 intermediate feature map.
    def __init__(self, in_channels, num_classes):
        super(AuxClassifier, self).__init__()
        # 5x5 average pooling with stride 3: 14x14 -> 4x4
        self.pool1 = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=in_channels, out_channels=128, kernel_size=1),
            nn.ReLU(inplace=True))
        self.fc1 = nn.Sequential(
            nn.Linear(in_features=4 * 4 * 128, out_features=1024),
            nn.ReLU(inplace=True))
        # the paper uses a dropout layer with a 70% drop ratio here
        self.drop = nn.Dropout(p=0.7)
        self.fc2 = nn.Linear(in_features=1024, out_features=num_classes)

    def forward(self, x):
        x = self.pool1(x)
        x = self.conv1(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.drop(x)
        x = self.fc2(x)
        return x

class InceptionV1(nn.Module):
    def __init__(self, num_classes=1000, aux_classifier=True):
        super(InceptionV1, self).__init__()
        self.aux_classifier = aux_classifier
        self.c1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(True),
        )
        self.c2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU(True),
        )
        self.c3 = nn.Sequential(
            nn.Conv2d(64, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(True),
        )
        self.a3 = Inception(192, 64, 96, 128, 16, 32, 32)
        self.b3 = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        self.lrn = nn.LocalResponseNorm(2)
        self.a4 = Inception(480, 192, 96, 208, 16, 48, 64)
        if aux_classifier:
            self.aux0 = AuxClassifier(in_channels=512, num_classes=num_classes)
        self.b4 = Inception(512, 160, 112, 224, 24, 64, 64)
        self.c4 = Inception(512, 128, 128, 256, 24, 64, 64)
        self.d4 = Inception(512, 112, 144, 288, 32, 64, 64)
        if aux_classifier:
            self.aux1 = AuxClassifier(in_channels=528, num_classes=num_classes)
        self.e4 = Inception(528, 256, 160, 320, 32, 128, 128)
        self.a5 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.b5 = Inception(832, 384, 192, 384, 48, 128, 128)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.drop = nn.Dropout(p=0.4)
        self.linear = nn.Linear(1024, num_classes)

    def forward(self, x):
        # shape comments assume a 224 x 224 x 3 input
        out = self.c1(x)          # 112 x 112 x 64
        out = self.maxpool(out)   # 56 x 56 x 64
        out = self.lrn(out)
        out = self.c2(out)
        out = self.c3(out)        # 56 x 56 x 192
        out = self.lrn(out)
        out = self.maxpool(out)   # 28 x 28 x 192
        out = self.a3(out)        # 28 x 28 x 256
        out = self.b3(out)        # 28 x 28 x 480
        out = self.maxpool(out)   # 14 x 14 x 480
        out = self.a4(out)        # 14 x 14 x 512
        if self.training and self.aux_classifier:
            output0 = self.aux0(out)
        out = self.b4(out)
        out = self.c4(out)
        out = self.d4(out)        # 14 x 14 x 528
        if self.training and self.aux_classifier:
            output1 = self.aux1(out)
        out = self.e4(out)        # 14 x 14 x 832
        out = self.maxpool(out)   # 7 x 7 x 832
        out = self.a5(out)
        out = self.b5(out)        # 7 x 7 x 1024
        out = self.avgpool(out)   # 1 x 1 x 1024
        out = out.view(out.size(0), -1)
        out = self.drop(out)
        out = self.linear(out)
        if self.training and self.aux_classifier:
            # Note: the paper adds the auxiliary *losses* to the total loss with
            # weight 0.3; this implementation instead adds the weighted
            # auxiliary logits to the main output during training.
            out += (output0 + output1) * 0.3
        return out
GoogLeNet Inception V2(2015)
Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
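Changes in Inception V2 compared with Inception V1:
- A BN (Batch Normalization) layer is added after each convolution.
- The 5x5 convolution layers are replaced by two consecutive 3x3 convolution layers, which increases the maximum depth of the network by 9 weight layers. Two 3x3 convolutions can do the job of one 5x5 convolution; for the same numbers of input and output channels, they need 28% fewer parameters and less computation.
- The number of Inception modules at 28x28 resolution increases from 2 to 3.
- Inside the Inception modules, both average and max pooling are used (see the table in the paper).
- Pooling layers are no longer used between Inception modules; instead, modules 3c and 4e apply stride-2 convolution/pooling layers before their concatenation.
- The first convolutional layer uses a separable convolution with depth multiplier 8, which reduces computation but increases memory consumption during training.
Note: the BN paper is essentially an enhanced version of V1. A later paper explained replacing the 5x5 convolution with two stacked 3x3 convolutions: a 5x5 convolution looks like a small fully connected network sliding over each 5x5 region, so two 3x3 convolutions can be used instead, where the first layer is a 3x3 convolution and the second acts like a fully connected layer over the 3x3 grid of its outputs. This increases the depth of the network and saves many parameters.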
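The 28% figure comes from counting weights per layer. The sketch assumes the 5x5 layer and both 3x3 layers share one channel count; the value 64 is hypothetical and cancels out of the ratio.

```python
def conv_weights(k_h, k_w, c_in, c_out):
    """Weights of a k_h x k_w convolution (no bias)."""
    return k_h * k_w * c_in * c_out

c = 64  # hypothetical channel count, identical for input, intermediate and output
one_5x5 = conv_weights(5, 5, c, c)
two_3x3 = 2 * conv_weights(3, 3, c, c)
print(two_3x3 / one_5x5)                # 0.72, i.e. 18 weights per position instead of 25
print(round(1 - two_3x3 / one_5x5, 2))  # 0.28 saving
```

The same ratio applies to multiply-adds, since both designs apply their weights at every output position of a stride-1 layer.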
import torch
import torch.nn as nn

__all__ = ['InceptionV2', 'inception_v2']

# modified according to
# https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/bninception.py
# batch normalization & 3×3*2 & delete maxpool in 3c and 4e


def inception_v2(**kwargs):
    return InceptionV2(**kwargs)

class Inception_2(nn.Module):
    # BN-Inception module: every conv is followed by BatchNorm and ReLU, and
    # V1's 5x5 branch is replaced by two stacked 3x3 convolutions.
    def __init__(self,
                 in_planes,
                 n1x1,
                 n3x3red,
                 n3x3,
                 n5x5red,
                 n5x5,
                 pool_planes,
                 pool_type='avg'):
        super(Inception_2, self).__init__()
        # 1x1 conv branch
        self.b1 = nn.Sequential(
            nn.Conv2d(in_planes, n1x1, kernel_size=1),
            nn.BatchNorm2d(n1x1, affine=True),
            nn.ReLU(True),
        )
        # 1x1 conv -> 3x3 conv branch
        self.b2 = nn.Sequential(
            nn.Conv2d(in_planes, n3x3red, kernel_size=1),
            nn.BatchNorm2d(n3x3red, affine=True),
            nn.ReLU(True),
            nn.Conv2d(n3x3red, n3x3, kernel_size=3, padding=1),
            nn.BatchNorm2d(n3x3, affine=True),
            nn.ReLU(True),
        )
        # 1x1 conv -> 3x3 conv -> 3x3 conv branch (replaces V1's 5x5 branch)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_planes, n5x5red, kernel_size=1),
            nn.BatchNorm2d(n5x5red, affine=True),
            nn.ReLU(True),
            nn.Conv2d(n5x5red, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5, affine=True),
            nn.ReLU(True),
            nn.Conv2d(n5x5, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5, affine=True),
            nn.ReLU(True),
        )
        # 3x3 pool (average or max, stride 1)
        if pool_type == 'avg':
            self.b4 = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1), )
        else:
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1), )
        # 1x1 conv applied to the pooled features (pool projection)
        self.b5 = nn.Sequential(
            nn.Conv2d(in_planes, pool_planes, kernel_size=1),
            nn.BatchNorm2d(pool_planes, affine=True),
            nn.ReLU(True),
        )

    def forward(self, x):
        y1 = self.b1(x)
        y2 = self.b2(x)
        y3 = self.b3(x)
        y4_pool = self.b4(x)
        y4 = self.b5(y4_pool)
        return torch.cat([y1, y2, y3, y4], 1)

class Inception_through(nn.Module):
    # Grid-reduction module (3c and 4e): the last layer of every branch has
    # stride 2, there is no 1x1 branch, and the pooling branch passes the input
    # channels through, so the output has n3x3 + n3x3_double + in_planes
    # channels at half the input resolution.
    def __init__(self, in_planes, n3x3red, n3x3, n3x3red_double, n3x3_double):
        super(Inception_through, self).__init__()
        # 1x1 conv -> 3x3 conv (stride 2) branch
        self.b2 = nn.Sequential(
            nn.Conv2d(in_planes, n3x3red, kernel_size=1),
            nn.BatchNorm2d(n3x3red, affine=True),
            nn.ReLU(True),
            nn.Conv2d(n3x3red, n3x3, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(n3x3, affine=True),
            nn.ReLU(True),
        )
        # 1x1 conv -> 3x3 conv -> 3x3 conv (stride 2) branch
        self.b3 = nn.Sequential(
            nn.Conv2d(in_planes, n3x3red_double, kernel_size=1),
            nn.BatchNorm2d(n3x3red_double, affine=True),
            nn.ReLU(True),
            nn.Conv2d(n3x3red_double, n3x3_double, kernel_size=3, padding=1),
            nn.BatchNorm2d(n3x3_double, affine=True),
            nn.ReLU(True),
            nn.Conv2d(
                n3x3_double, n3x3_double, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(n3x3_double, affine=True),
            nn.ReLU(True),
        )
        # 3x3 max pool (stride 2) branch, no projection
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=2, padding=1), )

    def forward(self, x):
        y2 = self.b2(x)
        y3 = self.b3(x)
        y4 = self.b4(x)
        return torch.cat([y2, y3, y4], 1)

class InceptionV2(nn.Module):
    def __init__(self, num_classes=1000):
        super(InceptionV2, self).__init__()
        self.c1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(True),
        )
        self.c2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU(True),
            nn.Conv2d(64, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(True),
        )
        self.a3 = Inception_2(192, 64, 64, 64, 64, 96, 32)
        self.b3 = Inception_2(256, 64, 64, 96, 64, 96, 64)
        self.c3 = Inception_through(320, 128, 160, 64, 96)
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        self.lrn = nn.LocalResponseNorm(2)
        self.a4 = Inception_2(576, 224, 64, 96, 96, 128, 128)
        self.b4 = Inception_2(576, 192, 96, 128, 96, 128, 128)
        self.c4 = Inception_2(576, 160, 128, 160, 128, 160, 128)
        self.d4 = Inception_2(608, 96, 128, 192, 160, 192, 128)
        self.e4 = Inception_through(608, 128, 192, 192, 256)
        self.a5 = Inception_2(1056, 352, 192, 320, 160, 224, 128)
        self.b5 = Inception_2(1024, 352, 192, 320, 192, 224, 128, 'max')
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.linear = nn.Linear(1024, num_classes)

    def forward(self, x):
        out = self.c1(x)
        out = self.maxpool(out)
        out = self.lrn(out)
        out = self.c2(out)
        out = self.lrn(out)
        out = self.maxpool(out)
        out = self.a3(out)
        out = self.b3(out)
        out = self.c3(out)
        out = self.a4(out)
        out = self.b4(out)
        out = self.c4(out)
        out = self.d4(out)
        out = self.e4(out)
        out = self.a5(out)
        out = self.b5(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
GoogLeNet Inception V3(2016)
Paper: Rethinking the Inception Architecture for Computer Vision
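The Google team explored and improved the design further, producing an upgraded GoogLeNet. The version covered in this section is called V3, from the paper "Rethinking the Inception Architecture for Computer Vision". The paper looks for ways to scale up the network while making the most efficient use of the added computation.
First, when GoogLeNet V1 appeared, its only contemporary with comparable performance was probably VGGNet, and both were successfully applied in many areas beyond image classification. GoogLeNet, however, is far more computationally efficient than VGGNet: it has only about 5 million parameters, about 1/12 of AlexNet's (GoogLeNet's caffemodel is about 50 MB, while VGGNet's exceeds 600 MB).
GoogLeNet performs well, but simply scaling up the Inception structure to build a larger network immediately drives up the computational cost. Moreover, the V1 paper gave no clear description of the considerations that guided the design of the Inception structure. The authors therefore first present some general principles and optimization methods that have proven effective for scaling up networks. These principles apply to, but are not limited to, Inception architectures.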
General Design Principles
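The following principles come from extensive experimentation, so they involve some speculation, but in practice they have largely proven effective.
1. Avoid representational bottlenecks, especially early in the network. Information flowing forward should not pass through highly compressed layers, i.e. representational bottlenecks. From input to output, the width and height of the feature maps generally shrink gradually, but they should not drop sharply all at once. For example, opening the network with kernel = 7, stride = 5 would clearly be inappropriate.
In addition, the number of output channels (each layer's num_output) should generally increase gradually; otherwise the network becomes hard to train. (The feature dimension does not measure the amount of information; it is only a rough estimate.)
2. Higher-dimensional representations are easier to process. Higher-dimensional features are easier to disentangle and make training faster.
3. Spatial aggregation can be done over lower-dimensional embeddings without losing much information. For example, the input can be reduced in dimension before a 3x3 convolution without serious adverse effects. Since the information is easy to compress, this reduction even speeds up training.
4. Balance the width and depth of the network.
These principles cannot be applied directly to improve network quality; they only serve as general guidance.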
Factorizing Convolutions with Large Filter Size
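Larger kernels give larger receptive fields but also mean more parameters: a 5x5 kernel has 25/9 = 2.78 times as many parameters as a 3x3 kernel. The authors therefore propose replacing a single 5x5 convolution layer with a small network of two consecutive 3x3 convolution layers (stride=1), which keeps the receptive field while reducing the number of parameters, as shown below: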
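That two stacked 3x3 layers see the same 5x5 input window as one 5x5 layer follows from the usual receptive-field recurrence. The sketch below assumes plain convolutions without dilation, given as (kernel, stride) pairs; the function name is illustrative.

```python
def receptive_field(layers):
    """Receptive field of stacked conv layers given as (kernel, stride) pairs.

    rf grows by (k - 1) * jump per layer, where jump is the distance in input
    pixels between two adjacent units of the current layer.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(5, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1)]))  # 5
print(receptive_field([(3, 1)] * 3))      # 7, so three 3x3 layers cover a 7x7 window
```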
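Two questions:
1. Does this replacement reduce representational power?
Extensive experiments later in the paper show that it does not;
2. Should there be an activation after the first 3x3 convolution?
The authors ran a comparison showing that adding a nonlinear activation improves performance.
So large kernels can be replaced entirely by a series of 3x3 kernels. Can they be factorized even further? The paper considers nx1 kernels.
For example, a 3x3 convolution can be replaced as shown below:
Thus any nxn convolution can be replaced by a 1xn convolution followed by an nx1 convolution. In practice, the authors found that this factorization does not work well early in the network; it works better on medium-sized feature maps (for mxm feature maps, m between 12 and 20 is recommended).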
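The saving from this asymmetric factorization grows with n. The sketch assumes equal input, intermediate and output channel counts, so only the per-position weight counts (n*n versus n + n) matter.

```python
def factorization_saving(n):
    """Relative weight saving of (1 x n then n x 1) versus a single n x n conv."""
    full = n * n          # one n x n kernel
    factorized = n + n    # a 1 x n kernel followed by an n x 1 kernel
    return 1 - factorized / full

for n in (3, 5, 7):
    print(n, round(factorization_saving(n), 3))
```

For n = 3 this gives the 33% saving reported in the paper. For n = 7, the setting used on the 17x17 feature maps, the saving is about 71%.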
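The overall summary is shown below:
The first two Inception modules follow principle 3 above. The last module follows principle 2, i.e. it extracts different features along different directions to speed up convergence.
The placement of the three Inception modules also matters. The Figure 6 module sits in the middle, because the authors found that asymmetric convolutions only work well in the middle layers of the network. The Figure 7 module is placed at the end, where the feature maps are already highly abstract; asymmetric convolutions along different directions there produce weakly correlated (orthogonal) features and speed up convergence during training.
(1) Figure 4 is the Inception module used in GoogLeNet V1;
(2) Figure 5 replaces the large 5x5 kernel with two 3x3 convolutions;
(3) Figure 6 replaces large kernels with 1xn and nx1 convolutions, here with n=7 for the 17x17 feature maps. This structure is used in GoogLeNet V3.
(Following principle 1) In addition to the three Inception modules above, there is also a parallel pooling structure that reduces the grid size: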
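This parallel structure halves the grid while expanding the channels. It avoids both pooling first, which would create a bottleneck, and running an expensive full-resolution convolution before pooling. The plain-Python sketch below checks the shapes of the grid-reduction block that the V3 code in this section calls InceptionB (Mixed_5to6). It uses only the layer settings that appear in that code and tracks only the stride-2 layer that sets each branch's output size.

```python
def out_size(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

H, C_in = 35, 288  # input of Mixed_5to6 (InceptionB) in the V3 code
branches = {
    # name: (output channels, kernel, stride, pad) of the branch's last layer
    "1x1 -> 3x3/2": (384, 3, 2, 0),
    "1x1 -> 3x3 -> 3x3/2": (96, 3, 2, 0),  # the middle 3x3 uses pad 1, keeping 35x35
    "max pool 3x3/2": (C_in, 3, 2, 0),     # pooling keeps the input channels
}
sizes = {name: out_size(H, k, s, p) for name, (_, k, s, p) in branches.items()}
assert len(set(sizes.values())) == 1, "branches must agree spatially before concat"
channels = sum(v[0] for v in branches.values())
print(sizes, channels)  # every branch is 17x17; 768 channels after concatenation
```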
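In the "Inception-v2" row of the results table, the changes are cumulative: each following row includes its new change in addition to all previous ones. The last row, which contains all the changes, is called "Inception-v3".
(Note that this paper calls the lowest-configuration V3 "Inception-v2"; the BN paper discussed earlier is what is usually meant by V2.)
Changes in V3 relative to V2:
- Trained with RMSProp
- Label smoothing for model regularization
- Factorized 7x7 convolutions (asymmetric convolutions with n=7)
- Auxiliary classifier with BN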
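Label smoothing replaces the one-hot target with q'(k) = (1 - eps) * [k == y] + eps / K, mixing it with a uniform distribution over the K classes. The paper uses eps = 0.1 with K = 1000 for ImageNet. Below is a minimal plain-Python sketch; the function name and the toy 5-class example are illustrative.

```python
def smooth_labels(target, num_classes, eps=0.1):
    """Label-smoothed target distribution: (1 - eps) * one_hot + eps / num_classes."""
    uniform = eps / num_classes
    return [(1.0 - eps if k == target else 0.0) + uniform for k in range(num_classes)]

q = smooth_labels(target=2, num_classes=5, eps=0.1)
print(q)  # the true class gets about 0.92, every other class about 0.02
```

Training then minimizes the cross-entropy against q instead of the one-hot label, which discourages the model from becoming over-confident in the largest logit.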
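Summary of network details:
- No large kernels: each 5x5 is replaced by two 3x3 convolutions.
- 1x3 and 3x1 convolutions replace 3x3 kernels.
- Inception V3 splits 7x7 convolutions into two stacked layers, 1x7 and 7x1, and likewise splits 3x3 into 1x3 and 3x1. This speeds up computation and adds nonlinearity, reducing the risk of overfitting. In addition, the network input size changes from 224 to 299.
- Inception V3 makes the structure more complex, mainly to reduce computation while improving accuracy. Compared with Inception V1, the increase in parameters and computation is limited (see the last column of the last table above), but accuracy improves substantially.
(Other details: the pooling branch inside the Inception V1 modules uses max pooling; from Inception V2 onward most modules use average pooling, while max pooling remains in the grid-reduction modules and in V2's last module.)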
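import math
import torch
import torch.nn as nn
import torch.nn.functional as F

__all__ = ['InceptionV3', 'inception_v3']

# modified according to https://github.com/JJBOY/CNN-repository/blob/master/model/Inception_v3.py


def inception_v3(**kwargs):
    # auxiliary logits are disabled by default here; using setdefault lets a
    # caller pass aux_logits=True without a duplicate-keyword TypeError
    kwargs.setdefault('aux_logits', False)
    return InceptionV3(**kwargs)
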
class InceptionV3(nn.Module):
    def __init__(self,
                 num_classes=1000,
                 aux_logits=True):
        super(InceptionV3, self).__init__()
        self.aux_logits = aux_logits
        self.Conv2d_1a_3x3 = BasicConv2d(3, 32, kernel_size=3, stride=2)
        self.Conv2d_2a_3x3 = BasicConv2d(32, 32, kernel_size=3)
        self.Conv2d_2b_3x3 = BasicConv2d(32, 64, kernel_size=3, padding=1)
        self.Conv2d_3a_3x3 = BasicConv2d(64, 80, kernel_size=3)
        self.Conv2d_3b_3x3 = BasicConv2d(80, 192, kernel_size=3, stride=2)
        self.Conv2d_3c_3x3 = BasicConv2d(192, 288, kernel_size=3, padding=1)
        self.Mixed_5b = InceptionA(288, pool_features=64)
        self.Mixed_5c = InceptionA(288, pool_features=64)
        self.Mixed_5d = InceptionA(288, pool_features=64)
        self.Mixed_5to6 = InceptionB(288)
        self.Mixed_6a = InceptionC(768, channels_7x7=128)
        self.Mixed_6b = InceptionC(768, channels_7x7=160)
        self.Mixed_6c = InceptionC(768, channels_7x7=160)
        self.Mixed_6d = InceptionC(768, channels_7x7=160)
        self.Mixed_6e = InceptionC(768, channels_7x7=192)
        if aux_logits:
            self.AuxLogits = InceptionAux(768, num_classes)
        self.Mixed_7a = InceptionD(768)
        self.Mixed_7b = InceptionE(1280)
        self.Mixed_7c = InceptionE(2048)
        self.avg_pool = nn.AvgPool2d(8)
        self.fc = nn.Linear(2048, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # He (Kaiming) normal initialization based on fan-out
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                # Disabled alternative: truncated-normal init for Conv2d and
                # Linear layers, using the per-layer `stddev` attributes:
                #   import scipy.stats as stats
                #   stddev = m.stddev if hasattr(m, 'stddev') else 0.1
                #   X = stats.truncnorm(-2, 2, scale=stddev)
                #   values = torch.Tensor(X.rvs(m.weight.numel()))
                #   m.weight.data.copy_(values.view(m.weight.size()))
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # 299 x 299 x 3
        x = self.Conv2d_1a_3x3(x)
        # 149 x 149 x 32
        x = self.Conv2d_2a_3x3(x)
        # 147 x 147 x 32
        x = self.Conv2d_2b_3x3(x)
        # 147 x 147 x 64
        x = F.max_pool2d(x, kernel_size=3, stride=2)
        # 73 x 73 x 64
        x = self.Conv2d_3a_3x3(x)
        # 71 x 71 x 80
        x = self.Conv2d_3b_3x3(x)
        # 35 x 35 x 192
        x = self.Conv2d_3c_3x3(x)
        # 35 x 35 x 288
        x = self.Mixed_5b(x)
        # 35 x 35 x 288
        x = self.Mixed_5c(x)
        # 35 x 35 x 288
        x = self.Mixed_5d(x)
        # 35 x 35 x 288
        x = self.Mixed_5to6(x)
        # 17 x 17 x 768
        x = self.Mixed_6a(x)
        # 17 x 17 x 768
        x = self.Mixed_6b(x)
        # 17 x 17 x 768
        x = self.Mixed_6c(x)
        # 17 x 17 x 768
        x = self.Mixed_6d(x)
        # 17 x 17 x 768
        x = self.Mixed_6e(x)
        # 17 x 17 x 768
        if self.training and self.aux_logits:
            aux = self.AuxLogits(x)
        # 17 x 17 x 768
        x = self.Mixed_7a(x)
        # 8 x 8 x 1280
        x = self.Mixed_7b(x)
        # 8 x 8 x 2048
        x = self.Mixed_7c(x)
        # 8 x 8 x 2048
        x = self.avg_pool(x)
        # x = F.avg_pool2d(x, kernel_size=8)
        # 1 x 1 x 2048
        # x = F.dropout(x, training=self.training)
        # 1 x 1 x 2048
        x = x.view(x.size(0), -1)
        # 2048
        x = self.fc(x)
        # 1000 (num_classes)
        if self.training and self.aux_logits:
            return x, aux
        return x

class InceptionA(nn.Module):
    def __init__(self, in_channels, pool_features):
        super(InceptionA, self).__init__()
        self.branch1x1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch5x5_1 = BasicConv2d(in_channels, 48, kernel_size=1)
        self.branch5x5_2 = BasicConv2d(48, 64, kernel_size=5, padding=2)
        self.branch3x3dbl_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = BasicConv2d(96, 96, kernel_size=3, padding=1)
        self.branch_pool = BasicConv2d(
            in_channels, pool_features, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)
        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class InceptionB(nn.Module):
    def __init__(self, in_channels):
        super(InceptionB, self).__init__()
        self.branch3x3_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3_2 = BasicConv2d(64, 384, kernel_size=3, stride=2)
        self.branch3x3dbl_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = BasicConv2d(96, 96, kernel_size=3, stride=2)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
        branch_pool = self.maxpool(x)
        outputs = [branch3x3, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class InceptionC(nn.Module):
    def __init__(self, in_channels, channels_7x7):
        super(InceptionC, self).__init__()
        self.branch1x1 = BasicConv2d(in_channels, 192, kernel_size=1)
        c7 = channels_7x7
        self.branch7x7_1 = BasicConv2d(in_channels, c7, kernel_size=1)
        self.branch7x7_2 = BasicConv2d(
            c7, c7, kernel_size=(1, 7), padding=(0, 3))
        self.branch7x7_3 = BasicConv2d(
            c7, 192, kernel_size=(7, 1), padding=(3, 0))
        self.branch7x7dbl_1 = BasicConv2d(in_channels, c7, kernel_size=1)
        self.branch7x7dbl_2 = BasicConv2d(
            c7, c7, kernel_size=(7, 1), padding=(3, 0))
        self.branch7x7dbl_3 = BasicConv2d(
            c7, c7, kernel_size=(1, 7), padding=(0, 3))
        self.branch7x7dbl_4 = BasicConv2d(
            c7, c7, kernel_size=(7, 1), padding=(3, 0))
        self.branch7x7dbl_5 = BasicConv2d(
            c7, 192, kernel_size=(1, 7), padding=(0, 3))
        self.branch_pool = BasicConv2d(in_channels, 192, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch7x7 = self.branch7x7_1(x)
        branch7x7 = self.branch7x7_2(branch7x7)
        branch7x7 = self.branch7x7_3(branch7x7)
        branch7x7dbl = self.branch7x7dbl_1(x)
        branch7x7dbl = self.branch7x7dbl_2(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_3(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_4(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_5(branch7x7dbl)
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        outputs = [branch1x1, branch7x7, branch7x7dbl, branch_pool]
        return torch.cat(outputs, 1)


class InceptionD(nn.Module):
    def __init__(self, in_channels):
        super(InceptionD, self).__init__()
        self.branch3x3_1 = BasicConv2d(in_channels, 192, kernel_size=1)
        self.branch3x3_2 = BasicConv2d(192, 320, kernel_size=3, stride=2)
        self.branch7x7x3_1 = BasicConv2d(in_channels, 192, kernel_size=1)
        self.branch7x7x3_2 = BasicConv2d(
            192, 192, kernel_size=(1, 7), padding=(0, 3))
        self.branch7x7x3_3 = BasicConv2d(
            192, 192, kernel_size=(7, 1), padding=(3, 0))
        self.branch7x7x3_4 = BasicConv2d(192, 192, kernel_size=3, stride=2)

    def forward(self, x):
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch7x7x3 = self.branch7x7x3_1(x)
        branch7x7x3 = self.branch7x7x3_2(branch7x7x3)
        branch7x7x3 = self.branch7x7x3_3(branch7x7x3)
        branch7x7x3 = self.branch7x7x3_4(branch7x7x3)
        branch_pool = F.max_pool2d(x, kernel_size=3, stride=2)
        outputs = [branch3x3, branch7x7x3, branch_pool]
        return torch.cat(outputs, 1)

class InceptionE(nn.Module):
    def __init__(self, in_channels):
        super(InceptionE, self).__init__()
        self.branch1x1 = BasicConv2d(in_channels, 320, kernel_size=1)
        self.branch3x3_1 = BasicConv2d(in_channels, 384, kernel_size=1)
        self.branch3x3_2a = BasicConv2d(
            384, 384, kernel_size=(1, 3), padding=(0, 1))
        self.branch3x3_2b = BasicConv2d(
            384, 384, kernel_size=(3, 1), padding=(1, 0))
        self.branch3x3dbl_1 = BasicConv2d(in_channels, 448, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(448, 384, kernel_size=3, padding=1)
        self.branch3x3dbl_3a = BasicConv2d(
            384, 384, kernel_size=(1, 3), padding=(0, 1))
        self.branch3x3dbl_3b = BasicConv2d(
            384, 384, kernel_size=(3, 1), padding=(1, 0))
        self.branch_pool = BasicConv2d(in_channels, 192, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = [
            self.branch3x3_2a(branch3x3),
            self.branch3x3_2b(branch3x3),
        ]
        branch3x3 = torch.cat(branch3x3, 1)
        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = [
            self.branch3x3dbl_3a(branch3x3dbl),
            self.branch3x3dbl_3b(branch3x3dbl),
        ]
        branch3x3dbl = torch.cat(branch3x3dbl, 1)
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.conv0 = BasicConv2d(in_channels, 128, kernel_size=1)
        self.conv1 = BasicConv2d(128, 768, kernel_size=5)
        self.conv1.stddev = 0.01
        self.fc = nn.Linear(768, num_classes)
        self.fc.stddev = 0.001

    def forward(self, x):
        # 17 x 17 x 768
        x = F.avg_pool2d(x, kernel_size=5, stride=3)
        # 5 x 5 x 768
        x = self.conv0(x)
        # 5 x 5 x 128
        x = self.conv1(x)
        # 1 x 1 x 768
        x = x.view(x.size(0), -1)
        # 768
        x = self.fc(x)
        # 1000
        return x


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

if __name__ == '__main__':
    model = inception_v3()
    print(model)
GoogLeNet Inception V4(2016)
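Paper: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
The Inception V4 paper proposes three networks in total: Inception-V4, Inception-ResNet-V1 and Inception-ResNet-V2.
Inception-ResNet-V1 can be seen as an improvement on Inception-V3, with similar computational cost and final test performance.
Inception-ResNet-V2's test results are also very close to Inception-V4's.
After four iterations of the same model the basic recipe was well established, so the authors fixed the overall architecture:
- The entry part is called the Stem module (V3 had a similar part but did not present it as a unit); it extracts low-level features and reduces the resolution from 299x299 to 35x35;
- Three kinds of Inception modules, operating at 35x35, 17x17 and 8x8 respectively;
- Two kinds of Reduction modules, placed between the three kinds of Inception modules.
So each network is essentially determined once the details of six modules are specified: the three Inception modules, the two Reduction modules and the Stem.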
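The 299 -> 35 -> 17 -> 8 progression can be traced with the output-size formula. The sketch below lists the layers that set the spatial size in the InceptionV4 code later in this section. It assumes, as holds in that code, that the parallel branches of each module agree spatially and that the Inception modules themselves preserve the grid size.

```python
def out_size(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

# (description, kernel, stride, pad) of the size-setting layer at each step
stages = [
    ("stem conv 3x3/2", 3, 2, 0),
    ("stem conv 3x3", 3, 1, 0),
    ("stem conv 3x3, pad 1", 3, 1, 1),
    ("Mixed_3a (3x3/2)", 3, 2, 0),
    ("Mixed_4a (3x3)", 3, 1, 0),
    ("Mixed_5a (3x3/2)", 3, 2, 0),
    ("Reduction_A (3x3/2)", 3, 2, 0),
    ("Reduction_B (3x3/2)", 3, 2, 0),
]
size, sizes = 299, []
for name, k, s, p in stages:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:22s} -> {size}x{size}")
```

The stem ends at 35x35 (Mixed_5a), Reduction_A brings the grid to 17x17 and Reduction_B to 8x8, matching the three Inception stages.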
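The Inception-V4 architecture diagram is as follows:
Inception-ResNet-V1 and Inception-ResNet-V2 share the same architecture diagram:
The architecture looks familiar: it is essentially the same as Inception-V4's.
Since the overall architecture changes little, only the details of the Stem, Inception and Reduction modules differ. Here we mainly cover the changes in Inception-V4.
Compared with Inception-V3, the stem module of Inception-V4 changed considerably:
This stem module is shared by Inception-V4 and Inception-ResNet-V2. Instead of Inception-V3's simple stack of convolution and pooling layers, the Inception-V4 stem applies some of the basic principles from the Inception-V3 paper, such as parallel pooling and asymmetric convolutions.
Inception-V4's three Inception modules and two Reduction modules are essentially the same as Inception-V3's (the V3 paper does not give the width of each Inception branch, but comparing Google's official source code shows that V4 increases the channel count of every module type, i.e. the network is wider). In addition, V4 stacks more copies of each Inception module, increasing depth as well as width.
The stem of Inception-ResNet-V1 is not as complex as Inception-V4's; it is essentially the same as Inception-V3's (with just one extra 1x1 convolution layer).
The three Inception-ResNet modules and two Reduction modules of Inception-ResNet-V1 are very similar to those of Inception-ResNet-V2, just with slightly fewer channels. Inception-ResNet-V1 can be considered a scaled-down version of Inception-ResNet-V2.
Only the first Inception-ResNet module of Inception-ResNet-V1 is shown here; compared with the plain Inception module, its width is reduced. Each Inception block is followed by a filter-expansion layer (a 1x1 convolution without activation) that scales the dimensionality back up to match the depth of the input before the residual addition. This compensates for the dimensionality reduction inside the Inception block.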
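The filter-expansion idea can be shown at a single spatial position, where a 1x1 convolution is just a matrix-vector product over channels. The toy sketch below is hypothetical. It stands in for the Inception block with one reducing 1x1 convolution plus ReLU, follows it with a linear 1x1 expansion back to the input depth, adds the residual, and applies ReLU. The optional `scale` argument reflects the residual scaling (roughly 0.1 to 0.3) that the paper reports using to stabilize training of very wide variants; the weights are made up.

```python
def matvec(W, x):
    """Matrix (list of rows) times vector: a 1x1 conv at one pixel."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def inception_resnet_pixel(x, w_reduce, w_expand, scale=1.0):
    """Toy residual update at one position: relu(x + scale * expand(branch(x)))."""
    branch = relu(matvec(w_reduce, x))   # reduced width inside the block
    expanded = matvec(w_expand, branch)  # filter expansion: linear, no activation
    assert len(expanded) == len(x), "expansion must restore the input depth"
    return relu([xi + scale * ei for xi, ei in zip(x, expanded)])

x = [1.0, -2.0, 3.0, 0.5]                          # 4 input channels
w_reduce = [[1, 0, 0, 0], [0, 0, 1, 0]]            # 4 -> 2 channels
w_expand = [[1, 0], [0, 0], [0, 1], [0, 0]]        # 2 -> 4 channels
out = inception_resnet_pixel(x, w_reduce, w_expand)
print(out)  # [2.0, 0.0, 6.0, 0.5]
```

Without the expansion the branch would output only 2 channels and could not be added to the 4-channel input, which is exactly the mismatch the filter-expansion layer resolves.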
Conclusion
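Inception-V4: uses the same Inception and Reduction modules as Inception-V3, but with more channels and more Inception modules, so both depth and width increase relative to V3. It shares the same Stem as Inception-ResNet-V2, which strengthens the early feature-extraction stage of the network;
Inception-ResNet-V1: shares the stem of Inception-V3 (plus one extra 1x1 layer) and uses the same Inception-ResNet and Reduction modules as Inception-ResNet-V2, only narrower;
Inception-ResNet-V2: shares the same Stem as Inception-V4, and uses the same Inception-ResNet and Reduction modules as Inception-ResNet-V1, with more channels.
Comparison of results:
The paper shows experimentally that residual connections significantly speed up the training of Inception networks. The experiments also show that residual Inception networks slightly outperform Inception networks of similar cost without residual connections. Because the advantage is small, residual connections are not strictly necessary for training deep convolutional models. The authors also found that improving single-model performance does not produce a comparably large improvement in ensemble performance, although combining several networks still improves results considerably.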
import math
import torch
import torch.nn as nn

__all__ = ['InceptionV4', 'inception_v4']


class BasicConv2d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(
            in_planes,
            out_planes,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            bias=False)  # verify bias false
        self.bn = nn.BatchNorm2d(
            out_planes,
            eps=0.001,  # value found in tensorflow
            momentum=0.1,  # default pytorch value
            affine=True)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

class Mixed_3a(nn.Module):
    def __init__(self):
        super(Mixed_3a, self).__init__()
        self.maxpool = nn.MaxPool2d(3, stride=2)
        self.conv = BasicConv2d(64, 96, kernel_size=3, stride=2)

    def forward(self, x):
        x0 = self.maxpool(x)
        x1 = self.conv(x)
        out = torch.cat((x0, x1), 1)
        return out


class Mixed_4a(nn.Module):
    def __init__(self):
        super(Mixed_4a, self).__init__()
        self.branch0 = nn.Sequential(
            BasicConv2d(160, 64, kernel_size=1, stride=1),
            BasicConv2d(64, 96, kernel_size=3, stride=1))
        self.branch1 = nn.Sequential(
            BasicConv2d(160, 64, kernel_size=1, stride=1),
            BasicConv2d(64, 64, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            BasicConv2d(64, 64, kernel_size=(7, 1), stride=1, padding=(3, 0)),
            BasicConv2d(64, 96, kernel_size=(3, 3), stride=1))

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        out = torch.cat((x0, x1), 1)
        return out


class Mixed_5a(nn.Module):
    def __init__(self):
        super(Mixed_5a, self).__init__()
        self.conv = BasicConv2d(192, 192, kernel_size=3, stride=2)
        self.maxpool = nn.MaxPool2d(3, stride=2)

    def forward(self, x):
        x0 = self.conv(x)
        x1 = self.maxpool(x)
        out = torch.cat((x0, x1), 1)
        return out

class Inception_A(nn.Module):
    def __init__(self):
        super(Inception_A, self).__init__()
        self.branch0 = BasicConv2d(384, 96, kernel_size=1, stride=1)
        self.branch1 = nn.Sequential(
            BasicConv2d(384, 64, kernel_size=1, stride=1),
            BasicConv2d(64, 96, kernel_size=3, stride=1, padding=1))
        self.branch2 = nn.Sequential(
            BasicConv2d(384, 64, kernel_size=1, stride=1),
            BasicConv2d(64, 96, kernel_size=3, stride=1, padding=1),
            BasicConv2d(96, 96, kernel_size=3, stride=1, padding=1))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),
            BasicConv2d(384, 96, kernel_size=1, stride=1))

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)
        x3 = self.branch3(x)
        out = torch.cat((x0, x1, x2, x3), 1)
        return out


class Reduction_A(nn.Module):
    def __init__(self):
        super(Reduction_A, self).__init__()
        self.branch0 = BasicConv2d(384, 384, kernel_size=3, stride=2)
        self.branch1 = nn.Sequential(
            BasicConv2d(384, 192, kernel_size=1, stride=1),
            BasicConv2d(192, 224, kernel_size=3, stride=1, padding=1),
            BasicConv2d(224, 256, kernel_size=3, stride=2))
        self.branch2 = nn.MaxPool2d(3, stride=2)

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)
        out = torch.cat((x0, x1, x2), 1)
        return out


class Inception_B(nn.Module):
    def __init__(self):
        super(Inception_B, self).__init__()
        self.branch0 = BasicConv2d(1024, 384, kernel_size=1, stride=1)
        self.branch1 = nn.Sequential(
            BasicConv2d(1024, 192, kernel_size=1, stride=1),
            BasicConv2d(
                192, 224, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            BasicConv2d(
                224, 256, kernel_size=(7, 1), stride=1, padding=(3, 0)))
        self.branch2 = nn.Sequential(
            BasicConv2d(1024, 192, kernel_size=1, stride=1),
            BasicConv2d(
                192, 192, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            BasicConv2d(
                192, 224, kernel_size=(7, 1), stride=1, padding=(3, 0)),
            BasicConv2d(
                224, 224, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            BasicConv2d(
                224, 256, kernel_size=(7, 1), stride=1, padding=(3, 0)))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),
            BasicConv2d(1024, 128, kernel_size=1, stride=1))

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)
        x3 = self.branch3(x)
        out = torch.cat((x0, x1, x2, x3), 1)
        return out


class Reduction_B(nn.Module):
    def __init__(self):
        super(Reduction_B, self).__init__()
        self.branch0 = nn.Sequential(
            BasicConv2d(1024, 192, kernel_size=1, stride=1),
            BasicConv2d(192, 192, kernel_size=3, stride=2))
        self.branch1 = nn.Sequential(
            BasicConv2d(1024, 256, kernel_size=1, stride=1),
            BasicConv2d(
                256, 256, kernel_size=(1, 7), stride=1, padding=(0, 3)),
            BasicConv2d(
                256, 320, kernel_size=(7, 1), stride=1, padding=(3, 0)),
            BasicConv2d(320, 320, kernel_size=3, stride=2))
        self.branch2 = nn.MaxPool2d(3, stride=2)

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)
        out = torch.cat((x0, x1, x2), 1)
        return out


class Inception_C(nn.Module):
    def __init__(self):
        super(Inception_C, self).__init__()
        self.branch0 = BasicConv2d(1536, 256, kernel_size=1, stride=1)
        self.branch1_0 = BasicConv2d(1536, 384, kernel_size=1, stride=1)
        self.branch1_1a = BasicConv2d(
            384, 256, kernel_size=(1, 3), stride=1, padding=(0, 1))
        self.branch1_1b = BasicConv2d(
            384, 256, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.branch2_0 = BasicConv2d(1536, 384, kernel_size=1, stride=1)
        self.branch2_1 = BasicConv2d(
            384, 448, kernel_size=(1, 3), stride=1, padding=(0, 1))
        self.branch2_2 = BasicConv2d(
            448, 512, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.branch2_3a = BasicConv2d(
            512, 256, kernel_size=(1, 3), stride=1, padding=(0, 1))
        self.branch2_3b = BasicConv2d(
            512, 256, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),
            BasicConv2d(1536, 256, kernel_size=1, stride=1))

    def forward(self, x):
        x0 = self.branch0(x)
        x1_0 = self.branch1_0(x)
        x1_1a = self.branch1_1a(x1_0)
        x1_1b = self.branch1_1b(x1_0)
        x1 = torch.cat((x1_1a, x1_1b), 1)
        x2_0 = self.branch2_0(x)
        x2_1 = self.branch2_1(x2_0)
        x2_2 = self.branch2_2(x2_1)
        x2_3a = self.branch2_3a(x2_2)
        x2_3b = self.branch2_3b(x2_2)
        x2 = torch.cat((x2_3a, x2_3b), 1)
        x3 = self.branch3(x)
        out = torch.cat((x0, x1, x2, x3), 1)
        return out

class InceptionV4(nn.Module):
    def __init__(self, num_classes=1000):
        super(InceptionV4, self).__init__()
        # Special attributes
        self.input_space = None
        self.input_size = (299, 299, 3)
        self.mean = None
        self.std = None
        # Modules
        self.features = nn.Sequential(
            BasicConv2d(3, 32, kernel_size=3, stride=2),
            BasicConv2d(32, 32, kernel_size=3, stride=1),
            BasicConv2d(32, 64, kernel_size=3, stride=1, padding=1),
            Mixed_3a(),
            Mixed_4a(),
            Mixed_5a(),
            Inception_A(),
            Inception_A(),
            Inception_A(),
            Inception_A(),
            Reduction_A(),  # Mixed_6a
            Inception_B(),
            Inception_B(),
            Inception_B(),
            Inception_B(),
            Inception_B(),
            Inception_B(),
            Inception_B(),
            Reduction_B(),  # Mixed_7a
            Inception_C(),
            Inception_C(),
            Inception_C())
        self.avg_pool = nn.AvgPool2d(8)
        self.drop = nn.Dropout(p=0.2)
        self.last_linear = nn.Linear(1536, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def logits(self, features):
        x = self.avg_pool(features)
        x = self.drop(x)
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x


def inception_v4(**kwargs):
    model = InceptionV4(**kwargs)
    return model