GoogLeNet: Key Points and Source Code

Philo`

Last modified: 2022-10-26 13:49:31


First published: 2022-10-24 14:45:15

Article link: https://blog.csdn.net/qq_44864833/article/details/127447721


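1 GoogLeNet Background

GoogLeNet was proposed by a team at Google in 2014, the same year as VGG (the capital L in GoogLeNet is a tribute to LeNet). It won first place in the Classification Task of that year's ImageNet competition.
Key innovations:

Introduces the Inception module to capture image features at different scales
Uses 1×1 convolutions to reduce dimensionality and to combine information across channels
Adds two auxiliary classifiers to help training
Replaces the large fully connected layers at the end with an average pooling layer (a single linear classifier remains), which reduces the parameter count
[Figure: GoogLeNet architecture diagram]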

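2 Inception Module

The module uses parallel branches to extract features at different scales from the input and concatenates the branch outputs along the channel dimension at the end. Combined with 1×1 reduction convolutions, this keeps the parameter count low while improving the network's ability to capture features, and it allows the network to grow both wider and deeper.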
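
As a small illustration of the channel-wise concatenation, here is a minimal sketch. The tensor shapes are assumptions chosen for the example (four branch outputs with 64, 128, 32 and 32 channels on a 28×28 feature map); only `torch.cat` along `dim=1` is the point.

```python
import torch

# Four hypothetical branch outputs: same batch size, height and width,
# but different channel counts.
branch_outputs = [torch.zeros(1, c, 28, 28) for c in (64, 128, 32, 32)]

# Concatenate along the channel dimension (dim=1 in NCHW layout).
merged = torch.cat(branch_outputs, dim=1)
print(merged.shape)  # torch.Size([1, 256, 28, 28])
```

The height and width stay unchanged; only the channel counts add up.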

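2.1 InceptionV1

[Figure: InceptionV1 module]

The four branches must produce outputs with the same height and width so that they can be concatenated without cropping. The output-size formulas for Conv2d and MaxPool2d are given in the linked note: size calculation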
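
For reference, a minimal sketch of the standard output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1, which applies to both Conv2d and MaxPool2d when dilation is 1 and the default floor rounding is used. The helper name `out_size` is hypothetical; the kernel, stride and padding values are the ones used by the Inception branches and the first block of the code in section 5.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

# The four Inception branches applied to a 28x28 feature map.
branch_sizes = [
    out_size(28, kernel=1),                       # branch 1: 1x1 conv
    out_size(28, kernel=3, padding=1),            # branch 2: 3x3 conv
    out_size(28, kernel=5, padding=2),            # branch 3: 5x5 conv
    out_size(28, kernel=3, stride=1, padding=1),  # branch 4: 3x3 max pool
]
print(branch_sizes)  # [28, 28, 28, 28]

# First block of the network on a 227x227 input:
# 7x7 conv (stride 2, padding 3), then 3x3 max pool (stride 2, padding 1).
after_conv = out_size(227, kernel=7, stride=2, padding=3)
after_pool = out_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 114 57
```

Because every branch uses stride 1 and padding of (kernel − 1) / 2, all four outputs keep the input size and can be concatenated directly.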

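2.2 InceptionV2

[Figure: InceptionV2 module]

Compared with V1, branches 2, 3, and 4 each gain a 1×1 convolution, which reduces the number of parameters. (In the original paper both variants belong to Inception v1: they are called the "naive version" and the version "with dimension reductions".)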

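Suppose the previous layer outputs a tensor of shape 64 × 192 × 56 × 56 (batch_size, channels, height, width).
From left to right, branches 1, 2, 3, and 4 must output 64, 512, 256, and 128 channels, and each 1×1 reduction convolution outputs 64 channels.
The V1 parameter count is sum1 = 64×192×1×1 + 512×192×3×3 + 256×192×5×5 + 128×192×1×1 = 2150400
The V2 parameter count is sum2 = 64×192×1×1 + (64×192×1×1 + 512×64×3×3) + (64×192×1×1 + 256×64×5×5) + 128×192×1×1 = 765952
Ratio: sum2/sum1 ≈ 0.3562
CNN parameter formula (biases ignored): number of parameters = kernel size × kernel depth × number of kernels = kernel size × input feature-map depth × output feature-map depth

In this example V2 needs only about 36% of the parameters of V1, which is a clear improvement.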
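
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same calculation: the helper `conv_params` is a hypothetical name, and biases are ignored as in the formula.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights in a conv layer: kernel_h * kernel_w * in_channels * out_channels (no bias)."""
    return kernel * kernel * in_channels * out_channels

in_c = 192      # channels coming from the previous layer
reduce_c = 64   # output channels of each 1x1 reduction convolution

# V1: every branch convolves the 192-channel input directly.
sum1 = (conv_params(in_c, 64, 1)
        + conv_params(in_c, 512, 3)
        + conv_params(in_c, 256, 5)
        + conv_params(in_c, 128, 1))

# V2: the 3x3 and 5x5 convolutions see only 64 channels after a 1x1 reduction.
sum2 = (conv_params(in_c, 64, 1)
        + conv_params(in_c, reduce_c, 1) + conv_params(reduce_c, 512, 3)
        + conv_params(in_c, reduce_c, 1) + conv_params(reduce_c, 256, 5)
        + conv_params(in_c, 128, 1))

print(sum1, sum2, round(sum2 / sum1, 4))  # 2150400 765952 0.3562
```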

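3 Auxiliary Classifiers

These are the side branches on the right of the architecture diagram. During training, the losses of the two auxiliary classifiers are multiplied by a weight (0.3 in the paper) and added to the main loss of the network before backpropagation.
What the auxiliary classifiers do:

They can be seen as a small detail of the Inception network. They ensure that hidden units and intermediate layers also take part in feature computation and can themselves predict the image class. This has a regularizing effect on the network and helps prevent overfitting.
For a relatively deep network, propagating gradients back through all layers effectively is a concern. Adding auxiliary classifiers to intermediate layers is expected to encourage discrimination in the lower stages of the classifier. During training, their losses are added to the total loss of the network with a discount weight (0.3 for the auxiliary classifier losses).
Parameters used (quoted from the paper):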
The exact structure of the extra network on the side, including the auxiliary classifier, is as follows:
An average pooling layer with 5×5 filter size and stride 3, resulting in an 4×4×512 output for the (4a), and 4×4×528 for the (4d) stage.
A 1×1 convolution with 128 filters for dimension reduction and rectified linear activation.
A fully connected layer with 1024 units and rectified linear activation.
A dropout layer with 70% ratio of dropped outputs.
A linear layer with softmax loss as the classifier (predicting the same 1000 classes as the main classifier, but removed at inference time).
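I later learned from other papers that the auxiliary classifiers are of little practical use, so the code at the end of this post does not implement them. Feel free to try them yourself if you need them.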
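
For readers who do want to try it, here is a minimal sketch of one auxiliary head that follows the layer list quoted above, plus the 0.3-weighted loss. It is not the author's code: the names `AuxClassifier` and `combined_loss` are hypothetical, and the sketch assumes the head receives a 14×14 (or 15×15) feature map so that the 5×5, stride-3 average pooling yields 4×4.

```python
import torch
from torch import nn
import torch.nn.functional as F

class AuxClassifier(nn.Module):
    """Hypothetical auxiliary head: avg pool -> 1x1 conv -> FC 1024 -> dropout -> linear."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)       # 14x14 -> 4x4
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)  # dimension reduction
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.dropout = nn.Dropout(p=0.7)                        # 70% of outputs dropped
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)  # raw logits; softmax is applied inside the loss

def combined_loss(main_logits, aux_logits_list, target, aux_weight=0.3):
    """Main cross-entropy loss plus the discounted auxiliary losses (training only)."""
    loss = F.cross_entropy(main_logits, target)
    for aux_logits in aux_logits_list:
        loss = loss + aux_weight * F.cross_entropy(aux_logits, target)
    return loss

# Usage on dummy data: a head attached after stage (4a), which outputs 512 channels.
aux_4a = AuxClassifier(in_channels=512, num_classes=6)
dummy_features = torch.randn(2, 512, 14, 14)
print(aux_4a(dummy_features).shape)  # torch.Size([2, 6])
```

At inference time the auxiliary heads are simply not evaluated, so they add no cost to prediction.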

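4 Model Parameters

[Figure: table of GoogLeNet layer parameters]
Column descriptions:
#1x1: number of 1x1 kernels on branch 1
#3x3reduce: number of 1x1 kernels on branch 2
#3x3: number of 3x3 kernels on branch 2
#5x5reduce: number of 1x1 kernels on branch 3
#5x5: number of 5x5 kernels on branch 3
poolproj: number of 1x1 kernels on branch 4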
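
To see how these columns fit together, the sketch below adds up the output channels of each Inception block and checks that they match the input channels of the next block. The channel numbers are the ones passed to `Inception(...)` in the code in section 5; the block names (3a to 5b) follow the paper's naming, and the helper `out_channels` is hypothetical.

```python
# (name, in_channels, #1x1, (#3x3reduce, #3x3), (#5x5reduce, #5x5), poolproj)
blocks = [
    ("3a", 192, 64, (96, 128), (16, 32), 32),
    ("3b", 256, 128, (128, 192), (32, 96), 64),
    ("4a", 480, 192, (96, 208), (16, 48), 64),
    ("4b", 512, 160, (112, 224), (24, 64), 64),
    ("4c", 512, 128, (128, 256), (24, 64), 64),
    ("4d", 512, 112, (144, 288), (32, 64), 64),
    ("4e", 528, 256, (160, 320), (32, 128), 128),
    ("5a", 832, 256, (160, 320), (32, 128), 128),
    ("5b", 832, 384, (192, 384), (48, 128), 128),
]

def out_channels(c1, c2, c3, c4):
    """Channels after concatenation: only the last conv of each branch counts."""
    return c1 + c2[1] + c3[1] + c4

outs = [out_channels(c1, c2, c3, c4) for _, _, c1, c2, c3, c4 in blocks]
for (name, in_c, *_), out_c in zip(blocks, outs):
    print(f"inception {name}: {in_c} -> {out_c}")

# Each block's output must equal the next block's input
# (max pooling between stages does not change the channel count).
chain_ok = all(outs[i] == blocks[i + 1][1] for i in range(len(blocks) - 1))
print(chain_ok)  # True
```

The final 1024 channels are what the last linear layer receives after global average pooling.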

5 Source Code

5.1 Usage Option 1

Use Kaggle, which provides free GPU compute and also hosts the dataset. Search for the dataset on Kaggle and create a new notebook to start using it.

import torch
from torch import nn, optim 
import torch.nn.functional as F
import torchvision

# dataLoader
import os
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchvision.utils import make_grid

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# pred_list = os.listdir("../input/intel-image-classification/seg_pred/seg_pred")

# Preparation: the transforms below are for reference only
train_transforms = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomHorizontalFlip(),
    transforms.Resize(227),
    transforms.CenterCrop(227),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize([227,227]),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225])
])

# Step 1: build the training and test datasets
train_dataset = datasets.ImageFolder(root="./seg_train/seg_train", transform=train_transforms)
test_dataset = datasets.ImageFolder(root="./seg_test/seg_test" , transform = test_transforms)

# Step 2: no manual indexing needed; load the data directly with DataLoader
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)  # evaluation does not need shuffling

# Parameters
NUM_CLASSES = 6
EPOCH = 150
LR = 0.001

# inception block
class Inception(nn.Module):
    # c1, c2, c3, c4 are the output channel counts of the four branches
    def __init__(self, in_c, c1, c2, c3, c4):
        super(Inception, self).__init__()
        # path1 with 1*1
        self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)
        
        # path2 with 1*1 + 3*3
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        
        # path3 with 1*1 + 5*5
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        
        # path4 with 3*3Pool + 1*1
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)
    
    def forward(self,x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        return torch.cat((p1,p2,p3,p4), dim=1)  # (batch, channel, h, w): concatenate along the channel dimension

class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])
    
class FlattenLayer(nn.Module):  # flatten each sample to a 1-D vector via view
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x):
        return x.view(x.shape[0], -1)


class GoogleNet(nn.Module):
    def __init__(self ,num_classes=NUM_CLASSES):
        super(GoogleNet, self).__init__()
        self.b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                                nn.ReLU(inplace=True),
                                # nn.BatchNorm2d(64),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        self.b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1),
                                nn.ReLU(inplace=True),  # GoogLeNet applies ReLU after every convolution
                                nn.Conv2d(64, 192, kernel_size=3, padding=1),
                                nn.ReLU(inplace=True),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        self.b3 = nn.Sequential(Inception(192, 64, (96,128), (16,32), 32),
                                Inception(256, 128, (128,192), (32,96), 64),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        

        
        self.b4 = nn.Sequential(Inception(480, 192, (96,208), (16,48), 64),
                               Inception(512, 160, (112, 224), (24, 64), 64),
                               Inception(512, 128, (128, 256), (24, 64), 64),
                               Inception(512, 112, (144, 288), (32, 64), 64),
                               Inception(528, 256, (160, 320), (32, 128), 128),
                               nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        

        
        self.b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                               Inception(832, 384, (192, 384), (48, 128), 128),
                               GlobalAvgPool2d()
        )
        
        self.output = nn.Sequential(FlattenLayer(),
                                    nn.Dropout(p=0.4),
                                    nn.Linear(1024, num_classes)  # use the constructor argument, not the global
        )
        
    def forward(self, x):
        x = self.b1(x)
        x = self.b2(x)
        x = self.b3(x)
        x = self.b4(x)
        x = self.b5(x)
        x = self.output(x)
        return x

# Initialization

device  = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = GoogleNet(NUM_CLASSES).to(device)

criterion = nn.CrossEntropyLoss().to(device)

optimizer = torch.optim.Adam(net.parameters(), lr=LR)

lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

# X = torch.rand(1, 3, 227, 227).to(device)

# for blk in net.children():
    # X  = blk(X)
    # print( "output shape:", X.shape)

def count_parameters(model):
    params = [p.numel() for p in model.parameters() if p.requires_grad]
#     for item in params:
#         print(f'{item:>8}')
    print(f'________\n{sum(params):>8}')
count_parameters(net)

import time
train_losses = []
test_losses = []
train_acc = []
test_acc = []
for i in range(EPOCH):
    start_time = time.time()
    net.train()  # enable dropout for training
    total_train_loss = 0
    total_train_acc = 0
    for idx, (x_train, y_train) in enumerate(train_loader):
        x_train, y_train = x_train.to(device), y_train.to(device)
        y_pred = net(x_train)
        
        loss = criterion(y_pred, y_train)
        total_train_loss += loss.item()
        # .item() keeps the running count a plain Python number instead of a GPU tensor
        total_train_acc += (y_pred.argmax(1) == y_train).sum().item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    lr_scheduler.step()  # once per epoch, after the optimizer updates
        
    train_losses.append(total_train_loss)
    train_acc.append(total_train_acc/len(train_dataset))
    
    net.eval()  # disable dropout for evaluation
    total_test_loss = total_test_acc = 0
    with torch.no_grad():
        for idx, (x_test, y_test) in enumerate(test_loader):
            x_test, y_test = x_test.to(device), y_test.to(device)
            
            y_pred = net(x_test)
            loss = criterion(y_pred, y_test)
            total_test_loss += loss.item()
            
            total_test_acc += (y_pred.argmax(1) == y_test).sum().item()
            
        test_losses.append(total_test_loss)
        test_acc.append(total_test_acc/len(test_dataset))
    end_time = time.time()
    print(f"{i+1}/{EPOCH}, time: {end_time-start_time} \t train_loss:{total_train_loss}\t train_acc:{total_train_acc/len(train_dataset)}\t test_loss:{total_test_loss} \t test_acc:{total_test_acc/len(test_dataset)}")

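5.2 Usage Option 2

Download the dataset from Kaggle
Click Download, then copy the code above and adjust the input paths; it should then run locally.

Training results:
[Figure: training results]

Update: the lr_scheduler call in the earlier version of the code was in the wrong place, which prevented the model from converging. The code above has been corrected.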
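
The post does not say where the scheduler call was originally placed. One common mistake, assumed here purely for illustration, is calling it once per batch instead of once per epoch. The sketch below uses a hypothetical stand-in model and the helper name `lr_after` to show how quickly StepLR (step_size=20, gamma=0.5, as in the code above) shrinks the learning rate in that case.

```python
import torch
from torch import nn

def lr_after(epochs, batches_per_epoch, step_per_batch):
    """Return the learning rate after a dry run with the given scheduler placement."""
    model = nn.Linear(1, 1)  # hypothetical stand-in for the real network
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    for _ in range(epochs):
        for _ in range(batches_per_epoch):
            optimizer.step()      # no gradients here; only the call order matters
            if step_per_batch:
                scheduler.step()  # misplaced: counts batches as if they were epochs
        if not step_per_batch:
            scheduler.step()      # intended: once per epoch
    return optimizer.param_groups[0]["lr"]

# Once per epoch: the rate is halved only after 20 full epochs.
print(lr_after(epochs=20, batches_per_epoch=3, step_per_batch=False))
# Once per batch: after a single epoch of 100 batches it has been halved 5 times.
print(lr_after(epochs=1, batches_per_epoch=100, step_per_batch=True))
```

With hundreds of batches per epoch, the per-batch placement drives the learning rate toward zero within the first few epochs, which would be consistent with the non-convergence described above.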
