[Paper Reading] GoogLeNet (2014)

Title: Going deeper with convolutions; Rethinking the Inception Architecture for Computer Vision
Links: https://arxiv.org/abs/1409.4842, https://arxiv.org/abs/1512.00567
Authors: Christian Szegedy et al.
Abstract:
[Figures: abstracts of the two papers]
Inception-v1 is a fun paper. The module name "Inception" and the "go deeper" in the title are a nod to the classic science-fiction film Inception ("We need to go deeper"), while the name GoogLeNet pays tribute to the pioneering LeNet.

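Introduction

Inspired by Network in Network, GoogLeNet introduces $1\times 1$ convolutional layers to increase the representational power of the network. In GoogLeNet they also act as dimension-reduction layers, which is what makes it affordable to widen and deepen the network (22 layers). GoogLeNet also weighs the parameter count (about $\frac{1}{12}$ of AlexNet's) against inference time, which gives it real practical value. The "deeper" in the title refers both to the literal increase in network depth and to the exploration of a new kind of architecture, Inception.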
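
To make the parameter saving from a $1\times 1$ reduction layer concrete, here is a rough weight count for a single $5\times 5$ branch with and without the reduction. The channel sizes (192 in, 16 after reduction, 32 out) are taken from the inception (3a) row of Table 1 in the v1 paper; the helper name `conv_params` is made up for this sketch, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

c_in, c_reduce, c_out = 192, 16, 32

# 5x5 convolution applied directly to the 192-channel input
direct = conv_params(c_in, c_out, 5)
# 1x1 reduction to 16 channels, followed by the 5x5 convolution
reduced = conv_params(c_in, c_reduce, 1) + conv_params(c_reduce, c_out, 5)

print(direct)                      # 153600
print(reduced)                     # 15872
print(round(direct / reduced, 1))  # 9.7
```

In this one branch the reduction layer cuts the weights by roughly a factor of ten; the saving over the whole network depends on every branch and is not computed here.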

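Motivation

The most direct way to improve a DNN's performance is to increase its depth and width. This has two drawbacks, however: overfitting, and inefficiency in time and memory. The authors therefore propose replacing fully connected structures with sparsely connected ones, even inside the convolutions (although convolution is itself already a form of sparse connection). Sparse connections save computation in theory, but the paper points out that current hardware handles non-uniform sparse data structures inefficiently, so Inception approximates the sparse structure with dense building blocks.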

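Network Architecture

$1\times 1$ convolution

First, a word on why a $1\times 1$ convolution is equivalent to a fully connected layer.
More precisely, it is a fully connected layer over the channel dimension, applied at every spatial position with shared weights. Take a $C\times 5\times 5$ feature map: a fully connected layer with a $C\times 10$ weight matrix turns the $C$ channel values at one pixel into a 10-dimensional vector, and by the definition of matrix multiplication each element of that vector is a linear combination of the $C$ inputs. Ten $1\times 1$ kernels (weight shape $10\times C\times 1\times 1$) do exactly this at all 25 positions and produce a $10\times 5\times 5$ feature map. Note that this is not the same as flattening the 25 spatial elements and multiplying by a $25\times 10$ matrix: a $1\times 1$ convolution never mixes values from different spatial positions.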
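
The relationship between a $1\times 1$ convolution and a fully connected layer can be checked numerically. The sketch below uses plain Python lists rather than a framework so that it has no dependencies. The function names `conv2d_naive` and `linear`, the 3 input channels, and the random inputs are all choices made for this illustration.

```python
import random

def conv2d_naive(x, w):
    """Stride-1 convolution without padding.
    x: [C_in][H][W], w: [C_out][C_in][k][k] -> [C_out][H-k+1][W-k+1]
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for wk in w:
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(wd - k + 1):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            s += wk[c][di][dj] * x[c][i + di][j + dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def linear(v, weight):
    """Fully connected layer without bias; weight is [C_out][C_in]."""
    return [sum(wc * vc for wc, vc in zip(row, v)) for row in weight]

random.seed(0)
C_IN, C_OUT, H, W = 3, 10, 5, 5
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
fc_weight = [[random.random() for _ in range(C_IN)] for _ in range(C_OUT)]
# The same weights viewed as C_OUT kernels of shape C_IN x 1 x 1
kernel = [[[[wc]] for wc in row] for row in fc_weight]

y = conv2d_naive(x, kernel)  # shape: 10 x 5 x 5
for i in range(H):
    for j in range(W):
        pixel = [x[c][i][j] for c in range(C_IN)]  # channel vector at (i, j)
        fc_out = linear(pixel, fc_weight)
        assert all(abs(y[o][i][j] - fc_out[o]) < 1e-12 for o in range(C_OUT))
```

The final loop confirms that every output pixel of the convolution matches the fully connected layer applied to the channel vector at that pixel.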

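Auxiliary module

In v1, the auxiliary module is mainly meant to strengthen feature extraction in the shallower layers during training. The later paper, however, argues that its main effect is regularization, similar to dropout and batch normalization. Note that the PyTorch code below omits the fully connected ReLU layer and the Dropout of the v1 design.
[Figure: auxiliary classifier in v1]

The figure above is from v1; the code below is the PyTorch Inception v3 version, see Fig. 8 of the Rethinking paper.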

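import torch.nn as nn
import torch.nn.functional as F

# BasicConv2d is torchvision's helper block: Conv2d (no bias) -> BatchNorm2d -> ReLU
class InceptionAux(nn.Module):
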
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.conv0 = BasicConv2d(in_channels, 128, kernel_size=1)
        self.conv1 = BasicConv2d(128, 768, kernel_size=5)
        self.conv1.stddev = 0.01
        self.fc = nn.Linear(768, num_classes)
        self.fc.stddev = 0.001

    def forward(self, x):
        # N x 768 x 17 x 17
        x = F.avg_pool2d(x, kernel_size=5, stride=3)
        # N x 768 x 5 x 5
        x = self.conv0(x)
        # N x 128 x 5 x 5
        x = self.conv1(x)
        # N x 768 x 1 x 1
        # Adaptive average pooling
        x = F.adaptive_avg_pool2d(x, (1, 1))
        # N x 768 x 1 x 1
        x = x.view(x.size(0), -1)
        # N x 768
        x = self.fc(x)
        # N x 1000
        return x
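
During training the auxiliary logits only contribute an extra loss term, and at inference time the auxiliary heads are discarded. A minimal sketch of how the losses might be combined follows. The 0.3 weight is the value reported in the v1 paper; the function name `total_loss` and the example loss values are invented for illustration.

```python
def total_loss(main_loss, aux_losses, aux_weight=0.3, training=True):
    """Combine the main loss with discounted auxiliary losses.

    aux_weight=0.3 is the discount reported in the v1 paper;
    at inference time the auxiliary heads are ignored.
    """
    if not training:
        return main_loss
    return main_loss + aux_weight * sum(aux_losses)

# Hypothetical loss values, for illustration only
print(total_loss(1.0, [2.0, 3.0]))                  # 2.5
print(total_loss(1.0, [2.0, 3.0], training=False))  # 1.0
```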
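
Before the Inception modules

[Figure: network stem, from the v1 paper]

The figure is from v1; the code is from v3, which replaces the large kernels with small kernels that cover the same receptive field.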

    def forward(self, x):
        if self.transform_input:
            x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
            x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
            x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
            x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
        # N x 3 x 299 x 299
        x = self.Conv2d_1a_3x3(x)
        # N x 32 x 149 x 149
        x = self.Conv2d_2a_3x3(x)
        # N x 32 x 147 x 147
        x = self.Conv2d_2b_3x3(x)
        # N x 64 x 147 x 147
        x = F.max_pool2d(x, kernel_size=3, stride=2)
        # N x 64 x 73 x 73
        x = self.Conv2d_3b_1x1(x)
        # N x 80 x 73 x 73
        x = self.Conv2d_4a_3x3(x)
        # N x 192 x 71 x 71
        x = F.max_pool2d(x, kernel_size=3, stride=2)
        # N x 192 x 35 x 35
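
The shape comments in the stem can be reproduced with the usual output-size formula for convolution and pooling. The kernel, stride, and padding values below are what I recall from the layer definitions in torchvision's `Inception3`; they are not shown in the snippet above, so treat them as assumptions. The helper `out_size` is a name made up for this sketch.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding); values assumed from torchvision's Inception3
stem = [
    ("Conv2d_1a_3x3", 3, 2, 0),
    ("Conv2d_2a_3x3", 3, 1, 0),
    ("Conv2d_2b_3x3", 3, 1, 1),
    ("max_pool2d", 3, 2, 0),
    ("Conv2d_3b_1x1", 1, 1, 0),
    ("Conv2d_4a_3x3", 3, 1, 0),
    ("max_pool2d", 3, 2, 0),
]

n = 299
sizes = []
for name, k, s, p in stem:
    n = out_size(n, k, s, p)
    sizes.append(n)
    print(f"{name:14s} -> {n} x {n}")

# sizes == [149, 147, 147, 73, 73, 71, 35]
```

The same formula explains the auxiliary classifier: `out_size(17, 5, stride=3)` gives 5, matching the `N x 768 x 5 x 5` comment after the average pooling.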

The Inception modules in v3

All of the code below comes from https://github.com/pytorch/vision/blob/master/torchvision/models/inception.py (as of 2019.3.1).
The TF code is linked as well:
https://github.com/tensorflow/models/tree/master/research/inception

# InceptionA
    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)
# InceptionB
    def forward(self, x):
        branch3x3 = self.branch3x3(x)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.max_pool2d(x, kernel_size=3, stride=2)

        outputs = [branch3x3, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)
# InceptionC
    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch7x7 = self.branch7x7_1(x)
        branch7x7 = self.branch7x7_2(branch7x7)
        branch7x7 = self.branch7x7_3(branch7x7)

        branch7x7dbl = self.branch7x7dbl_1(x)
        branch7x7dbl = self.branch7x7dbl_2(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_3(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_4(branch7x7dbl)
        branch7x7dbl = self.branch7x7dbl_5(branch7x7dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch7x7, branch7x7dbl, branch_pool]
        return torch.cat(outputs, 1)
# InceptionD
    def forward(self, x):
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)

        branch7x7x3 = self.branch7x7x3_1(x)
        branch7x7x3 = self.branch7x7x3_2(branch7x7x3)
        branch7x7x3 = self.branch7x7x3_3(branch7x7x3)
        branch7x7x3 = self.branch7x7x3_4(branch7x7x3)

        branch_pool = F.max_pool2d(x, kernel_size=3, stride=2)
        outputs = [branch3x3, branch7x7x3, branch_pool]
        return torch.cat(outputs, 1)
# InceptionE
    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch3x3 = self.branch3x3_1(x)
        branch3x3 = [
            self.branch3x3_2a(branch3x3),
            self.branch3x3_2b(branch3x3),
        ]
        branch3x3 = torch.cat(branch3x3, 1)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = [
            self.branch3x3dbl_3a(branch3x3dbl),
            self.branch3x3dbl_3b(branch3x3dbl),
        ]
        branch3x3dbl = torch.cat(branch3x3dbl, 1)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)
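
The branch names above reflect the factorizations proposed in the v3 paper: `branch3x3dbl` stacks two $3\times 3$ convolutions in place of one $5\times 5$, and the `branch7x7` branches use $1\times 7$ and $7\times 1$ convolutions in place of $7\times 7$. The sketch below compares receptive field and weight count for these cases. It counts weights per input/output channel pair and assumes stride 1 and a constant channel count through the stack, which is a simplification; the function names are made up for illustration.

```python
def weights_per_channel_pair(kernels):
    """Total weights of a stack of (kh, kw) kernels, per input/output channel pair."""
    return sum(kh * kw for kh, kw in kernels)

def receptive_field(kernels):
    """Receptive field (height, width) of stride-1 convolutions applied in sequence."""
    rf_h = 1 + sum(kh - 1 for kh, _ in kernels)
    rf_w = 1 + sum(kw - 1 for _, kw in kernels)
    return rf_h, rf_w

cases = {
    "5x5": [(5, 5)],
    "3x3 + 3x3": [(3, 3), (3, 3)],
    "7x7": [(7, 7)],
    "1x7 + 7x1": [(1, 7), (7, 1)],
}
for name, ks in cases.items():
    print(name, receptive_field(ks), weights_per_channel_pair(ks))

# 5x5 (5, 5) 25
# 3x3 + 3x3 (5, 5) 18
# 7x7 (7, 7) 49
# 1x7 + 7x1 (7, 7) 14
```

In both pairs the factorized stack covers the same receptive field with fewer weights.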

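Training

Stochastic gradient descent with momentum 0.9. The learning rate is decreased on a fixed per-epoch schedule (the v1 paper reports a 4% reduction every 8 epochs), and Polyak averaging is used to produce the final model used at inference.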
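
The schedule and the averaging step can be sketched in a few lines. The 4%-every-8-epochs decay is the figure given in the v1 paper; the base learning rate of 0.01 is a placeholder, and using a plain running mean of the parameter iterates is one common reading of Polyak averaging, not necessarily the exact variant the authors used. `learning_rate` and `PolyakAverager` are names invented for this sketch.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.96, step=8):
    """Step schedule: multiply the rate by `decay` every `step` epochs.

    base_lr is a placeholder; decay and step follow the v1 paper.
    """
    return base_lr * decay ** (epoch // step)

class PolyakAverager:
    """Keeps a running arithmetic mean of the parameter vectors seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, params):
        self.count += 1
        if self.mean is None:
            self.mean = list(params)
        else:
            # Incremental mean: m_new = m + (p - m) / n
            self.mean = [m + (p - m) / self.count for m, p in zip(self.mean, params)]
        return self.mean

avg = PolyakAverager()
for params in ([1.0, 4.0], [3.0, 0.0], [2.0, 2.0]):  # toy parameter iterates
    avg.update(params)
print(avg.mean)  # [2.0, 2.0]
```

The averaged parameters, rather than the last iterate, would then be the ones loaded for inference.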

References:
https://blog.csdn.net/docrazy5351/article/details/78993269
https://blog.csdn.net/dcrmg/article/details/79246654
https://www.cnblogs.com/Allen-rg/p/5833919.html
https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202

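Postscript:
When I first read this paper I skimmed through it, and that always nagged at me. Sure enough, the SenseTime interview ended up asking about the details of the model; call it Murphy's law, haha. Rereading the paper now alongside the code finally settles something that had bothered me for a long time, so failing that interview was not for nothing.
19.3.1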
