GoogLeNet Explained - CSDN Blog

Article link: https://blog.csdn.net/weixin_68114439/article/details/147773227

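GoogLeNet is a deep convolutional neural network architecture proposed by Google in 2014. It won first place in the ILSVRC 2014 classification challenge. Its main innovation is the "Inception" module, a carefully designed building block that improves accuracy while keeping the computational cost under control.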

1. Core Innovation: The Inception Module

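The Inception module is the core building block of GoogLeNet. Its design is motivated by the observation that features at different scales all matter in a convolutional neural network, so a single layer should be able to extract them in parallel.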

The Original Inception Idea (Naive Inception)

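The first version of the Inception module applies the following operations in parallel to the same input:

1×1 convolution
3×3 convolution
5×5 convolution
3×3 max pooling
The outputs of all branches are then concatenated along the depth (channel) dimension. Every branch uses stride 1 and padding so that all outputs keep the same spatial size.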

Input
│
├── 1×1 conv ───────┐
├── 3×3 conv ───────┤
├── 5×5 conv ───────┼── depth concat ── Output
└── 3×3 max pool ───┘
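A minimal PyTorch sketch of the naive module helps show why it is expensive. The class name `NaiveInception` and the branch widths (64, 128 and 32 on a 28×28×192 input) are illustrative assumptions, not part of the original article. Because the pooling branch passes every input channel through, the output is always deeper than the input:

```python
import torch
import torch.nn as nn


class NaiveInception(nn.Module):
    """Hypothetical naive Inception module: parallel 1x1/3x3/5x5 convs and a 3x3 max pool."""

    def __init__(self, in_channels, ch1x1, ch3x3, ch5x5):
        super().__init__()
        # Padding and stride 1 keep every branch at the input's spatial size
        self.branch1 = nn.Conv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Conv2d(in_channels, ch3x3, kernel_size=3, padding=1)
        self.branch3 = nn.Conv2d(in_channels, ch5x5, kernel_size=5, padding=2)
        # Max pooling has no weights and keeps all in_channels
        self.branch4 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        outs = [
            torch.relu(self.branch1(x)),
            torch.relu(self.branch2(x)),
            torch.relu(self.branch3(x)),
            self.branch4(x),
        ]
        # Depth concatenation: output channels = ch1x1 + ch3x3 + ch5x5 + in_channels
        return torch.cat(outs, dim=1)


x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # torch.Size([1, 416, 28, 28])
```

Stacking such modules makes the depth grow with every module, on top of the cost of running 5×5 convolutions over all 192 input channels.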

Improved Inception Module (1×1 Convolutions for Dimensionality Reduction)

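The naive design is too expensive: the 5×5 convolutions are costly on inputs with many channels, and because the pooling branch passes all input channels through, the output depth grows with every module. The improved module therefore inserts 1×1 convolutions before the 3×3 and 5×5 convolutions, and after the pooling, to reduce the number of channels: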

Input
│
├── 1×1 conv ───────────────────┐
├── 1×1 conv → 3×3 conv ────────┤
├── 1×1 conv → 5×5 conv ────────┼── depth concat ── Output
└── 3×3 max pool → 1×1 conv ────┘

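Benefits of this design:

It reduces computation (a 1×1 convolution can sharply reduce the depth of the feature map before the expensive convolutions)
It adds depth and non-linearity to the network (every 1×1 convolution is followed by a ReLU activation)
It keeps the ability to extract features at multiple scales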
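The savings can be made concrete by counting multiply-accumulate operations (MACs). This sketch assumes stride 1, "same" padding, and the 5×5 branch settings of Inception(3a) from Section 5.3 (a 28×28×192 input, a 16-channel 1×1 reduction, 32 output channels); the helper name `conv_macs` is hypothetical:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k convolution with 'same' padding and stride 1."""
    return h * w * c_out * k * k * c_in


# Direct 5x5 convolution on all 192 input channels
direct = conv_macs(28, 28, 192, 32, 5)
# 1x1 reduction to 16 channels, then the 5x5 convolution on the reduced map
reduced = conv_macs(28, 28, 192, 16, 1) + conv_macs(28, 28, 16, 32, 5)

print(f"direct 5x5:      {direct:,}")               # 120,422,400
print(f"1x1 then 5x5:    {reduced:,}")              # 12,443,648
print(f"reduction ratio: {direct / reduced:.1f}x")  # 9.7x
```

The 1×1 layer itself is cheap because its kernel covers a single pixel, so almost all of the remaining cost is the 5×5 convolution on just 16 channels.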

2. Overall GoogLeNet Architecture

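GoogLeNet (also known as Inception v1) is 22 layers deep when counting only layers with parameters, and the body of the network is a stack of 9 Inception modules. The overall structure is: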

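Initial convolution layers (for a 224×224×3 input)
- 7×7 convolution, stride 2, output 112×112×64
- 3×3 max pooling, stride 2, output 56×56×64
Local Response Normalization (LRN)
1×1 convolution (dimensionality reduction)
3×3 convolution, output 56×56×192
LRN
Max pooling, stride 2, output 28×28×192
Inception(3a) through Inception(5b): 9 stacked Inception modules, with a stride-2 max pooling after 3b and after 4e
Average pooling (7×7, output 1×1×1024) in place of large fully connected layers
Dropout (40%)
Fully connected layer + Softmax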
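The spatial sizes above follow from the standard output-size formula floor((n + 2p − k) / s) + 1. This sketch traces them for a 224×224 input; the paddings (3 for the 7×7 convolution, 1 for the 3×3 poolings) are assumptions taken from the PyTorch code in Section 5, and the helper name `out_size` is hypothetical:

```python
def out_size(size, kernel, stride, padding):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, kernel, stride, padding). The 1x1 convs, the padded 3x3 conv and the
# Inception modules keep the spatial size, so only these layers change it.
downsampling_layers = [
    ("7x7 conv, stride 2", 7, 2, 3),
    ("3x3 max pool, stride 2", 3, 2, 1),
    ("3x3 max pool before 3a", 3, 2, 1),
    ("3x3 max pool after 3b", 3, 2, 1),
    ("3x3 max pool after 4e", 3, 2, 1),
]

size = 224
sizes = []
for name, k, s, p in downsampling_layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:24s} -> {size}x{size}")

# A 7x7 average pool (stride 1, no padding) then reduces the 7x7 map to 1x1
print(out_size(size, 7, 1, 0))  # 1
```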

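Auxiliary Classifiers

To counter vanishing gradients in such a deep network (and to add some regularization), GoogLeNet attaches two auxiliary classifiers to intermediate layers:

They sit after Inception(4a) and Inception(4d)
Structure: average pooling (5×5, stride 3) → 1×1 convolution (128 filters) → fully connected (1024 units) → dropout (70%) → fully connected → Softmax
During training, the losses of the three classifiers are summed with weights (1 for the main classifier, 0.3 for each auxiliary classifier)
At inference time only the main classifier is used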
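The weighted sum can be written in a few lines. This sketch assumes softmax cross-entropy (`F.cross_entropy`) as the loss of each classifier and uses random logits in place of real model outputs; `googlenet_loss` is a hypothetical helper name:

```python
import torch
import torch.nn.functional as F


def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    """Main loss (weight 1) plus the two auxiliary losses (weight 0.3 each)."""
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
    return main_loss + aux_weight * aux_loss


# Dummy logits for a batch of 8 samples and 10 classes (illustrative values only)
torch.manual_seed(0)
targets = torch.randint(0, 10, (8,))
main, aux1, aux2 = (torch.randn(8, 10) for _ in range(3))
loss = googlenet_loss(main, aux1, aux2, targets)
print(loss.item())
```

At inference time the auxiliary outputs are simply not computed, so only `F.cross_entropy(main_logits, targets)` (or just the argmax of `main_logits`) is relevant.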

3. Key Techniques and Advantages

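The role of 1×1 convolutions:
- Reduce dimensionality and therefore computation
- Add non-linearity (together with ReLU)
- Mix information across channels
Global average pooling:
- Replaces large fully connected layers, cutting the parameter count
- Lowers the risk of overfitting
Efficient allocation of computation:
- Most of the computation lies in the 3×3 and 5×5 convolutions
- 1×1 convolutions in front of them keep that cost under control
Multi-scale feature fusion:
- Kernels of different sizes process the input in parallel
- The network learns how to combine the resulting features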
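Two of these points can be checked numerically. The sketch below (layer sizes are illustrative assumptions) confirms that a 1×1 convolution is the same linear map applied to the channel vector at every pixel, and compares the parameter count of a fully connected head on a flattened 7×7×1024 map with one placed after global average pooling, for 1000 classes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A 1x1 convolution applies one (64 x 192) matrix to the channel vector at each pixel
conv1x1 = nn.Conv2d(192, 64, kernel_size=1)
x = torch.randn(2, 192, 28, 28)
w = conv1x1.weight.squeeze(-1).squeeze(-1)  # shape (64, 192)
manual = torch.einsum("oc,nchw->nohw", w, x) + conv1x1.bias.view(1, -1, 1, 1)
print(torch.allclose(conv1x1(x), manual, atol=1e-4))  # True


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


fc_on_flattened = linear_params(7 * 7 * 1024, 1000)  # flatten the 7x7x1024 map, then FC
fc_after_gap = linear_params(1024, 1000)             # global average pool to 1024, then FC
print(f"{fc_on_flattened:,} vs {fc_after_gap:,}")    # 50,177,000 vs 1,025,000
```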

4. Later Developments

Several improved versions followed GoogLeNet:

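Inception v2/v3:
- Introduced BN (Batch Normalization)
- Factorized large kernels (e.g., a 5×5 convolution into two 3×3 convolutions)
- More efficient dimensionality reduction
Inception v4 / Inception-ResNet:
- The Inception-ResNet variants add residual connections (the ResNet idea); Inception-v4 itself has no residual connections
- A more uniform Inception module design
Xception:
- "Extreme Inception"
- Replaces Inception modules with depthwise separable convolutions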
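The parameter savings behind the kernel factorization in Inception v3 and the depthwise separable convolutions in Xception can be compared directly in PyTorch. This is an illustrative sketch with an assumed channel count of 64 and no biases; it compares single layers, not the actual v3 or Xception modules, and `n_params` is a hypothetical helper:

```python
import torch
import torch.nn as nn


def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


c = 64  # assumed channel count, chosen only for illustration

conv5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
# Factorization: two stacked 3x3 convs cover the same 5x5 receptive field
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)
# Depthwise separable: per-channel 3x3 (groups=c) followed by a 1x1 pointwise conv
separable = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),
    nn.Conv2d(c, c, kernel_size=1, bias=False),
)

x = torch.randn(1, c, 28, 28)
for name, m in [("5x5", conv5x5), ("two 3x3", two_3x3), ("separable 3x3", separable)]:
    print(f"{name:14s} params={n_params(m):6d} out={tuple(m(x).shape)}")
# 5x5: 25*c*c = 102400, two 3x3: 18*c*c = 73728, separable: 9*c + c*c = 4672
```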

5. PyTorch Implementation

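This section gives a complete PyTorch implementation of the architecture described above. It follows a common modern variant: every convolution is followed by Batch Normalization, and the LRN layers of the original network are omitted.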

5.1. Inception Module Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F

class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        
        # Branch 1: 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1),
            nn.BatchNorm2d(ch1x1),
            nn.ReLU(inplace=True)
        )
        
        # Branch 2: 1x1 reduction + 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.BatchNorm2d(ch3x3red),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch3x3),
            nn.ReLU(inplace=True)
        )
        
        # Branch 3: 1x1 reduction + 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.BatchNorm2d(ch5x5red),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
            nn.BatchNorm2d(ch5x5),
            nn.ReLU(inplace=True)
        )
        
        # Branch 4: 3x3 max pooling + 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1),
            nn.BatchNorm2d(pool_proj),
            nn.ReLU(inplace=True)
        )
    
    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        
        # Concatenate along the channel dimension (dim=1 for NCHW tensors)
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)
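Because `torch.cat(outputs, 1)` stacks the branches along the channel dimension, a module's output depth is `ch1x1 + ch3x3 + ch5x5 + pool_proj`; the reduction widths stay internal. The pure-Python sketch below (the helper name `inception_out_channels` is hypothetical) chains the configurations used in Section 5.3 to check that each module's `in_channels` matches the previous module's output:

```python
# (ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj) for each module, as in Section 5.3
configs = {
    "3a": (64, 96, 128, 16, 32, 32),
    "3b": (128, 128, 192, 32, 96, 64),
    "4a": (192, 96, 208, 16, 48, 64),
    "4b": (160, 112, 224, 24, 64, 64),
    "4c": (128, 128, 256, 24, 64, 64),
    "4d": (112, 144, 288, 32, 64, 64),
    "4e": (256, 160, 320, 32, 128, 128),
    "5a": (256, 160, 320, 32, 128, 128),
    "5b": (384, 192, 384, 48, 128, 128),
}


def inception_out_channels(ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
    """Output depth of one module; ch3x3red and ch5x5red do not reach the output."""
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


channels = [192]  # output depth of the stem (conv2)
for name, cfg in configs.items():
    out_ch = inception_out_channels(*cfg)
    print(f"Inception({name}): {channels[-1]} -> {out_ch}")
    # Max pooling between stages changes only the spatial size, not the depth
    channels.append(out_ch)
```

The final depth of 1024 is what the last `nn.Linear(1024, num_classes)` expects.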

5.2. Auxiliary Classifier Implementation

class AuxiliaryClassifier(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(AuxiliaryClassifier, self).__init__()
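        # The paper uses 5x5 average pooling with stride 3 (14x14 -> 4x4);
        # adaptive pooling produces a 4x4 output for any input size
        self.avg_pool = nn.AdaptiveAvgPool2d((4, 4))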
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True)
        )
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.fc2 = nn.Linear(1024, num_classes)
        self.dropout = nn.Dropout(0.7)
    
    def forward(self, x):
        x = self.avg_pool(x)
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
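The code uses `nn.AdaptiveAvgPool2d((4, 4))` where the paper describes a 5×5 average pooling with stride 3. For the 14×14 maps at Inception(4a) and (4d) (assuming a 224×224 input), both produce a 4×4 grid, so `fc1` receives 128 × 4 × 4 = 2048 features either way; the pooling windows differ slightly, so the values are not identical. A quick shape check on random inputs:

```python
import torch
import torch.nn as nn

# Assumed: the 14x14x512 output of Inception(4a) for a 224x224 input
x = torch.randn(1, 512, 14, 14)

paper_pool = nn.AvgPool2d(kernel_size=5, stride=3)  # pooling as described in the paper
adaptive_pool = nn.AdaptiveAvgPool2d((4, 4))        # pooling used in AuxiliaryClassifier

print(paper_pool(x).shape)     # torch.Size([1, 512, 4, 4])
print(adaptive_pool(x).shape)  # torch.Size([1, 512, 4, 4])

# After the 128-filter 1x1 conv and flattening, fc1 sees 2048 features
conv = nn.Conv2d(512, 128, kernel_size=1)
features = torch.flatten(conv(adaptive_pool(x)), 1)
print(features.shape)  # torch.Size([1, 2048])

# Unlike the fixed 5x5/3 pooling, adaptive pooling still yields 4x4 for other input sizes
print(adaptive_pool(torch.randn(1, 512, 17, 17)).shape)  # torch.Size([1, 512, 4, 4])
```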

5.3. Full GoogLeNet Implementation

class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits
        
        # Stem: initial convolution layers
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        self.conv2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        # Inception modules
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)
        
        # Auxiliary classifiers (used only during training)
        if self.aux_logits:
            self.aux1 = AuxiliaryClassifier(512, num_classes)
            self.aux2 = AuxiliaryClassifier(528, num_classes)
        
        # Global average pooling and classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        
    def forward(self, x):
        # Stem: initial convolution layers
        x = self.conv1(x)
        x = self.conv2(x)
        
        # Inception stage 3
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        
        # Inception stage 4
        x = self.inception4a(x)
        
        # Auxiliary classifier 1 (after Inception 4a)
        if self.training and self.aux_logits:
            aux1 = self.aux1(x)
        
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        
        # Auxiliary classifier 2 (after Inception 4d)
        if self.training and self.aux_logits:
            aux2 = self.aux2(x)
        
        x = self.inception4e(x)
        x = self.maxpool4(x)
        
        # Inception stage 5
        x = self.inception5a(x)
        x = self.inception5b(x)
        
        # Global average pooling and classification
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = self.fc(x)
        
        if self.training and self.aux_logits:
            return x, aux1, aux2
        return x

5.4. Model Usage Example

# Create a model instance
model = GoogLeNet(num_classes=1000)

# Example input
input_tensor = torch.randn(1, 3, 224, 224)  # a batch with one 224x224 RGB image

# Forward pass
output = model(input_tensor)

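# A freshly constructed model is in training mode, so it returns the main output plus
# the two auxiliary outputs; after model.eval() it returns only the main output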
if model.training and model.aux_logits:
    main_output, aux1_output, aux2_output = output
    print("Main output shape:", main_output.shape)
    print("Auxiliary output 1 shape:", aux1_output.shape)
    print("Auxiliary output 2 shape:", aux2_output.shape)
else:
    print("Output shape:", output.shape)