Hand-Coding Deep Learning: A Summary of Classic CNN Architectures (LeNet, AlexNet, VGG, GoogLeNet, ResNet)

Nathan0921

Published 2024-08-03 16:01:11

Tags: deep learning, CNN, artificial intelligence

Article link: https://blog.csdn.net/Nathan0921/article/details/140743162

Introduction
Five Network Architectures
Project Structure and Code Implementation
Summary

I. Introduction

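This post also serves as a short summary of convolutional neural networks. As of 2024, hand-coding each network one at a time no longer seems necessary; summarizing the classic CNNs in a unified way is a handy shortcut for moving quickly on to the next stage of learning. So this time I cover all five classic CNN architectures in one place.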

II. Five Network Architectures

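In this section we introduce five classic convolutional neural network architectures in detail: LeNet, AlexNet, VGG, GoogLeNet and ResNet. Each played a key role in the development of deep learning, and their design ideas and innovations have informed much of the research and many of the applications that followed.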
1. LeNet

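LeNet was proposed by Yann LeCun et al. in 1998, mainly for handwritten digit recognition (the MNIST dataset). It is one of the earliest convolutional neural networks; its structure is simple but effective, and it laid the groundwork for later deep learning networks.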

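Network structure:
Input layer: 28x28x1 grayscale image
Conv layer 1: 6 kernels of size 5x5, stride 1, padding 2, output 28x28x6
Pooling layer 1: 2x2 average pooling, stride 2, output 14x14x6
Conv layer 2: 16 kernels of size 5x5, stride 1, output 10x10x16
Pooling layer 2: 2x2 average pooling, stride 2, output 5x5x16
Fully connected layer 1: 120 neurons
Fully connected layer 2: 84 neurons
Output layer: 10 neurons (one per digit 0-9)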
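Each output size in the list follows from output = floor((input - kernel + 2 * padding) / stride) + 1. A small sketch that walks the LeNet layers above and reproduces those sizes; `conv_out_size` and `lenet_sizes` are illustrative helper names, and floor rounding (PyTorch's default for convolution and for pooling without `ceil_mode`) is assumed:

```python
from typing import List


def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Side length after a conv or pooling layer (floor rounding)."""
    return (size - kernel + 2 * padding) // stride + 1


def lenet_sizes(input_size: int = 28) -> List[int]:
    """Feature-map side length after each LeNet conv/pool layer listed above."""
    s1 = conv_out_size(input_size, kernel=5, stride=1, padding=2)  # conv layer 1
    s2 = conv_out_size(s1, kernel=2, stride=2)                      # pooling layer 1
    s3 = conv_out_size(s2, kernel=5, stride=1)                      # conv layer 2
    s4 = conv_out_size(s3, kernel=2, stride=2)                      # pooling layer 2
    return [s1, s2, s3, s4]


print(lenet_sizes())                # [28, 14, 10, 5]
print(16 * 5 * 5)                   # 400 inputs to fully connected layer 1
print(conv_out_size(28, kernel=5))  # 24: conv layer 1 without padding
```

Without padding 2, the first convolution would shrink 28 to 24 and the flattened size would no longer be 16*5*5; the original LeNet-5 fed 32x32 inputs instead, which gives 28x28 after the first 5x5 convolution without padding.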
2. AlexNet

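AlexNet was proposed by Alex Krizhevsky et al. in 2012. Its breakthrough result in the ImageNet image classification challenge marked the start of deep learning's widespread use in computer vision.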

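Network structure:
Input layer: 224x224x3 RGB image
Conv layer 1: 96 kernels of size 11x11, stride 4, padding 2, output 55x55x96
Pooling layer 1: 3x3 max pooling, stride 2, output 27x27x96
Conv layer 2: 256 kernels of size 5x5, stride 1, padding 2, output 27x27x256
Pooling layer 2: 3x3 max pooling, stride 2, output 13x13x256
Conv layer 3: 384 kernels of size 3x3, stride 1, padding 1, output 13x13x384
Conv layer 4: 384 kernels of size 3x3, stride 1, padding 1, output 13x13x384
Conv layer 5: 256 kernels of size 3x3, stride 1, padding 1, output 13x13x256
Pooling layer 3: 3x3 max pooling, stride 2, output 6x6x256
Fully connected layer 1: 4096 neurons
Fully connected layer 2: 4096 neurons
Output layer: 1000 neurons (one per class)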
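Counting parameters layer by layer shows where AlexNet's capacity actually sits. A rough sketch, assuming the single-stream layer sizes listed above and counting weights plus biases; `conv_params` and `linear_params` are illustrative helper names:

```python
from typing import List, Tuple


def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch


def linear_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector of a fully connected layer."""
    return in_features * out_features + out_features


# (in_channels, out_channels, kernel_size) for conv layers 1-5
CONV_LAYERS: List[Tuple[int, int, int]] = [
    (3, 96, 11), (96, 256, 5), (256, 384, 3), (384, 384, 3), (384, 256, 3),
]
# (in_features, out_features); 256 * 6 * 6 is the flattened output of pooling layer 3
FC_LAYERS: List[Tuple[int, int]] = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(*layer) for layer in CONV_LAYERS)
fc_total = sum(linear_params(*layer) for layer in FC_LAYERS)
total = conv_total + fc_total

print(f"conv:  {conv_total:,}")  # conv:  3,747,200
print(f"fc:    {fc_total:,}")    # fc:    58,631,144
print(f"total: {total:,}")       # total: 62,378,344
print(f"share in FC layers: {fc_total / total:.1%}")
```

Roughly 94% of the parameters sit in the three fully connected layers, which is consistent with Dropout being applied to the classifier rather than to the convolutional layers.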
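AlexNet used a deeper network and larger convolution kernels than LeNet, and combined them with the ReLU activation function and Dropout regularization, which significantly improved image classification performance.

3. VGG

VGG was proposed by the Visual Geometry Group at the University of Oxford. Its design idea is to deepen the network by stacking small (3x3) convolution kernels in order to extract finer features. VGG achieved excellent results on the ImageNet image classification task.

Network structure (VGG16):
Input layer: 224x224x3 RGB image
Conv block 1: 2 conv layers with 64 3x3 kernels each, stride 1, padding 1, output 224x224x64
Pooling layer 1: 2x2 max pooling, stride 2, output 112x112x64
Conv block 2: 2 conv layers with 128 3x3 kernels each, stride 1, padding 1, output 112x112x128
Pooling layer 2: 2x2 max pooling, stride 2, output 56x56x128
Conv block 3: 3 conv layers with 256 3x3 kernels each, stride 1, padding 1, output 56x56x256
Pooling layer 3: 2x2 max pooling, stride 2, output 28x28x256
Conv block 4: 3 conv layers with 512 3x3 kernels each, stride 1, padding 1, output 28x28x512
Pooling layer 4: 2x2 max pooling, stride 2, output 14x14x512
Conv block 5: 3 conv layers with 512 3x3 kernels each, stride 1, padding 1, output 14x14x512
Pooling layer 5: 2x2 max pooling, stride 2, output 7x7x512
Fully connected layer 1: 4096 neurons
Fully connected layer 2: 4096 neurons
Output layer: 1000 neurons (one per class)

VGG's structure is simple and uniform: combining a deep network with small convolution kernels gives it strong feature-extraction capability.

4. GoogLeNet (Inception)

GoogLeNet was proposed by Google. Its main innovation is the Inception module, which applies convolution kernels of different sizes in parallel to extract multi-scale features.

Network structure:
Input layer: 224x224x3 RGB image
Initial convolution and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
Inception modules: each module has four parallel branches, namely a 1x1 convolution, a 3x3 convolution, a 5x5 convolution and a 3x3 max pooling (1x1 convolutions reduce the channel count before the 3x3 and 5x5 convolutions and project the pooling output); the branch outputs are concatenated along the channel dimension
Global average pooling: replaces the stack of fully connected layers, reducing the parameter count and the risk of overfitting
Output layer: 1000 neurons (one per class)

Through its Inception modules, GoogLeNet fuses features at different scales, and global average pooling keeps its parameter count low.

5. ResNet

ResNet was proposed by Microsoft Research. Its main innovation is the residual block: a shortcut connection performs an identity mapping that adds the block's input directly to its output. This eases gradient flow and addresses the degradation problem (deeper plain networks reaching higher training error), so much deeper networks can be trained.

Network structure:
Input layer: 224x224x3 RGB image
Initial convolution and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
Residual blocks: built from BasicBlock or Bottleneck blocks, each adding its input directly to its output through an identity mapping
Stage 1: residual blocks with 64 channels
Stage 2: residual blocks with 128 channels, stride 2
Stage 3: residual blocks with 256 channels, stride 2
Stage 4: residual blocks with 512 channels, stride 2
Global average pooling: reduces each feature map to 1x1
Output layer: 1000 neurons (one per class)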

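With residual blocks, ResNet effectively alleviates the vanishing-gradient and degradation problems of deep networks, allowing networks to reach far greater depths.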
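The identity mapping means a residual block computes y = F(x) + x, so if the residual branch F outputs zero, the block simply passes its input through. A small sketch of that case in PyTorch; the `Residual` wrapper is a hypothetical helper (not part of the code in section III), and the last layer of F is zero-initialized only to demonstrate the identity behavior:

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Computes y = F(x) + x for any shape-preserving branch F."""

    def __init__(self, branch: nn.Module):
        super().__init__()
        self.branch = branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.branch(x) + x  # shortcut adds the input back


channels = 8
branch = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)
# Zero the last conv so that F(x) == 0 and the block reduces to y = x
nn.init.zeros_(branch[2].weight)
nn.init.zeros_(branch[2].bias)

block = Residual(branch)
x = torch.randn(1, channels, 16, 16)
print(torch.allclose(block(x), x))  # True
```

A plain (non-residual) stack would have to learn the identity explicitly through its weights; with the shortcut, "do nothing" is the easy default and the branch only has to learn the correction.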

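Understanding these five classic architectures shows how CNN design evolved and the ideas behind it, and provides a solid foundation for further deep learning research and applications.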

III. Code Implementation

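In this section we give the complete network definition of each of the five models, so that they can be called in a uniform way for model training.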
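Since every model below is an `nn.Module` that maps a batch of images to class logits, one training step can be shared by all of them. A minimal sketch, assuming cross-entropy loss and a standard PyTorch optimizer; `train_step` is a hypothetical helper, not part of the original code. The only special case is GoogLeNet, whose `forward()` returns `(x, aux2, aux1)` in training mode; the 0.3 weight on the auxiliary losses follows the GoogLeNet paper:

```python
import torch
from torch import nn
import torch.nn.functional as F


def train_step(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
               optimizer: torch.optim.Optimizer, aux_weight: float = 0.3) -> float:
    """Run one optimization step and return the scalar loss."""
    model.train()
    optimizer.zero_grad()
    outputs = model(images)
    if isinstance(outputs, tuple):
        # GoogLeNet in training mode: main logits followed by auxiliary logits
        logits, *aux_logits = outputs
        loss = F.cross_entropy(logits, labels)
        for aux in aux_logits:
            loss = loss + aux_weight * F.cross_entropy(aux, labels)
    else:
        loss = F.cross_entropy(outputs, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == '__main__':
    # Stand-in model so the sketch runs on its own; swap in Lenet(), vgg(), etc.
    model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    images = torch.randn(4, 1, 28, 28)
    labels = torch.randint(0, 10, (4,))
    print(train_step(model, images, labels, optimizer))
```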

1. LeNet

import torch
from torch import nn
import torch.nn.functional as F


class Lenet(nn.Module):
    def __init__(self, num_classes=10):
        super(Lenet, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

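    def forward(self, x):
        # Modern variant: ReLU + max pooling instead of the original average pooling
        x = F.relu(self.conv1(x))                      # N x 6 x 28 x 28 (padding=2 keeps 28x28)
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # N x 6 x 14 x 14
        x = F.relu(self.conv2(x))                      # N x 16 x 10 x 10
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # N x 16 x 5 x 5
        x = torch.flatten(x, 1)                        # N x 400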
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

if __name__ == '__main__':
    X = torch.randn(1, 1, 28, 28)
    model = Lenet(num_classes=5)
    output = model(X)
    print(output.shape)

2. AlexNet

import torch
from torch import nn
import torch.nn.functional as F

class Alexnet(nn.Module):
    def __init__(self, num_classes=1000):
        super(Alexnet, self).__init__()
        # Output size: (input_size - kernel_size + 2 * padding) / stride + 1, rounded down
        # e.g. conv1: (224 - 11 + 2*2) / 4 + 1 = 55.25 -> 55
        self.features = nn.Sequential(
            # first conv2d
            # input  (1, 3, 224, 224)
            # output (1, 96, 55, 55)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # second conv2d
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # third conv2d
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # fourth conv2d
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # fifth conv2d
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256*6*6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096,4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096,num_classes),
        )
    
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x




if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = Alexnet()
    output = model(X)
    print(output.shape)

3. VGG

import torch 
from torch import nn
import torch.nn.functional as F

# Layer configurations for the VGG variants (numbers = conv output channels, 'M' = max pooling)
cfgs = {
    'vgg11' : [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13' : [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

# Build the convolutional feature extractor from a config list
def make_feature(cfg: list):  # returns the module used as VGG.features
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        # The original VGG uses 4096 hidden units here; this version uses 2048, which reduces the parameter count
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(512*7*7, 2048),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(True),
            nn.Linear(2048, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x


def vgg(model_name='vgg16', num_classes=10):
    cfg = cfgs[model_name]
    model = VGG(make_feature(cfg), num_classes=num_classes)
    return model

if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = vgg(model_name='vgg16', num_classes=10)
    output = model(X)
    print(output.shape)
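Every convolution in `cfgs` is 3x3. VGG's argument for this is that a stack of small kernels covers the same receptive field as one large kernel with fewer weights, which is easy to check with arithmetic. A small sketch, assuming stride-1 convolutions with C input and C output channels and ignoring biases; the helper names are illustrative:

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field (side length) of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(channels: int, kernel: int, num_layers: int = 1) -> int:
    """Weights of `num_layers` conv layers mapping C -> C channels (no bias)."""
    return num_layers * kernel * kernel * channels * channels


C = 256
for n, big_kernel in [(2, 5), (3, 7)]:
    rf = stacked_receptive_field(n)
    small = conv_weights(C, 3, num_layers=n)
    big = conv_weights(C, big_kernel)
    print(f"{n} x 3x3: receptive field {rf}x{rf}, {small:,} weights; "
          f"one {big_kernel}x{big_kernel}: {big:,} weights")
```

For C = 256, two 3x3 layers need 1,179,648 weights versus 1,638,400 for one 5x5 layer, and three 3x3 layers need 1,769,472 versus 3,211,264 for one 7x7 layer. The stacked version also inserts extra ReLUs between the layers, adding non-linearity.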

4. GoogLeNet

import torch.nn as nn
import torch
import torch.nn.functional as F


class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifiers only run in training mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifiers only run in training mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:   # auxiliary outputs are returned only in training mode
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()

        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)

        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)   # keeps the spatial size unchanged
        )

        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            # torchvision's implementation actually uses a 3x3 kernel here rather than 5x5;
            # this code keeps the 5x5 kernel. Please see https://github.com/pytorch/vision/issues/906 for details.
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)   # keeps the spatial size unchanged
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x


if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = GoogLeNet(num_classes=10)
    model.eval()  # in training mode forward() returns a tuple (x, aux2, aux1), which has no .shape
    output = model(X)
    print(output.shape)
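The channel counts in the shape comments of `forward` follow directly from the `Inception(...)` arguments: because the four branches are concatenated along dim 1, a module's output width is ch1x1 + ch3x3 + ch5x5 + pool_proj. The same arithmetic shows why the 1x1 "reduce" convolutions matter. A sketch using numbers from the code above (helper names are illustrative, biases ignored):

```python
from typing import Dict, Tuple

# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj), as passed to Inception(...)
INCEPTION_CFG: Dict[str, Tuple[int, ...]] = {
    "3a": (192, 64, 96, 128, 16, 32, 32),
    "3b": (256, 128, 128, 192, 32, 96, 64),
    "4a": (480, 192, 96, 208, 16, 48, 64),
    "4e": (528, 256, 160, 320, 32, 128, 128),
    "5b": (832, 384, 192, 384, 48, 128, 128),
}


def inception_out_channels(cfg: Tuple[int, ...]) -> int:
    """Concatenating the four branches sums their output channels."""
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


def conv_weights(in_ch: int, out_ch: int, kernel: int) -> int:
    """Number of weights in one conv layer (bias ignored)."""
    return in_ch * out_ch * kernel * kernel


for name, cfg in INCEPTION_CFG.items():
    print(name, inception_out_channels(cfg))  # 256, 480, 512, 832, 1024

# 5x5 branch of inception3a: 192 -> 32 channels
direct = conv_weights(192, 32, 5)                             # one 5x5 conv on all 192 channels
reduced = conv_weights(192, 16, 1) + conv_weights(16, 32, 5)  # 1x1 reduce to 16, then 5x5
print(direct, reduced)  # 153600 15872
```

The reduced branch needs roughly a tenth of the weights, which is what makes it affordable to place several large-kernel branches side by side in every module.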

5. ResNet

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = F.relu(out)

        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

def resnet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)

# Example usage:
if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = resnet18()
    print(model)
    output = model(X)
    print(output.shape)
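The structure list in section II mentions Bottleneck blocks, but the code above only defines `BasicBlock`. A minimal Bottleneck sketch with the same constructor signature and an `expansion` attribute, so it should be usable with the `ResNet` class above (e.g. `ResNet(Bottleneck, [3, 4, 6, 3])` for a ResNet-50 layout). It assumes the torchvision convention of placing the stride on the 3x3 convolution rather than on the first 1x1 convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Bottleneck(nn.Module):
    # Block output channels = out_channels * expansion (e.g. 64 -> 256)
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv: reduce the channel count
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 conv: spatial processing; carries the stride (torchvision convention)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 conv: expand the channels back
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)  # match shape when stride or channels change
        out += identity
        return F.relu(out)


if __name__ == '__main__':
    # Identity shortcut: 256 -> (64 -> 64) -> 256 channels, spatial size unchanged
    block = Bottleneck(256, 64)
    print(block(torch.randn(2, 256, 14, 14)).shape)  # torch.Size([2, 256, 14, 14])
    # Projection shortcut: 256 -> 512 channels, stride 2
    down = nn.Sequential(nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),
                         nn.BatchNorm2d(512))
    block2 = Bottleneck(256, 128, stride=2, downsample=down)
    print(block2(torch.randn(2, 256, 14, 14)).shape)  # torch.Size([2, 512, 7, 7])
```

The 1x1 convolutions keep the expensive 3x3 convolution working on a quarter of the channels, which is why deeper ResNets use this block instead of stacking more BasicBlocks.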