
  • Introduction
  • Five network architectures
  • Project structure and code implementation
  • Summary





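Five network architectures

1. LeNet

LeNet was proposed by Yann LeCun et al. in 1998, mainly for handwritten digit recognition (the MNIST dataset). It is one of the earliest convolutional neural networks; its structure is simple but effective, and it laid the groundwork for later deep learning networks.

  • Input layer: 28x28x1 grayscale image
  • Conv layer 1: 6 kernels of 5x5, stride 1, padding 2, output 28x28x6
  • Pooling layer 1: 2x2 average pooling, stride 2, output 14x14x6
  • Conv layer 2: 16 kernels of 5x5, stride 1, no padding, output 10x10x16
  • Pooling layer 2: 2x2 average pooling, stride 2, output 5x5x16
  • Fully connected layer 1: 120 neurons
  • Fully connected layer 2: 84 neurons
  • Output layer: 10 neurons (one per digit, 0-9)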
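
The output sizes in the LeNet list above follow from the usual size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. A minimal sketch of the check, assuming padding 2 on the first convolution so that 28x28 is preserved (the helper name `out_size` is hypothetical):

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1

n = 28                                        # input: 28x28x1
conv1 = out_size(n, kernel=5, padding=2)      # assumes padding 2
pool1 = out_size(conv1, kernel=2, stride=2)
conv2 = out_size(pool1, kernel=5)
pool2 = out_size(conv2, kernel=2, stride=2)
flat = 16 * pool2 * pool2                     # inputs to the first fully connected layer

print(conv1, pool1, conv2, pool2, flat)       # 28 14 10 5 400
```

The final value, 16 x 5 x 5 = 400, is the number of features fed into the 120-neuron fully connected layer.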
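
2. AlexNet

AlexNet was proposed by Alex Krizhevsky et al. in 2012. It achieved a breakthrough result in the ImageNet image classification challenge, marking the start of deep learning's widespread use in computer vision.

  • Input layer: 224x224x3 RGB image
  • Conv layer 1: 96 kernels of 11x11, stride 4, padding 2, output 55x55x96
  • Pooling layer 1: 3x3 max pooling, stride 2, output 27x27x96
  • Conv layer 2: 256 kernels of 5x5, stride 1, padding 2, output 27x27x256
  • Pooling layer 2: 3x3 max pooling, stride 2, output 13x13x256
  • Conv layer 3: 384 kernels of 3x3, stride 1, padding 1, output 13x13x384
  • Conv layer 4: 384 kernels of 3x3, stride 1, padding 1, output 13x13x384
  • Conv layer 5: 256 kernels of 3x3, stride 1, padding 1, output 13x13x256
  • Pooling layer 3: 3x3 max pooling, stride 2, output 6x6x256
  • Fully connected layer 1: 4096 neurons
  • Fully connected layer 2: 4096 neurons
  • Output layer: 1000 neurons (one per class)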
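
To get a feel for where AlexNet's parameters sit, the layer list above can be turned into a rough count. This is a sketch under assumptions: every convolution is treated as a plain (ungrouped) convolution with a bias term, exactly as listed, and the helper names `conv_params` and `fc_params` are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

conv_total = sum([
    conv_params(3, 96, 11),     # conv layer 1
    conv_params(96, 256, 5),    # conv layer 2
    conv_params(256, 384, 3),   # conv layer 3
    conv_params(384, 384, 3),   # conv layer 4
    conv_params(384, 256, 3),   # conv layer 5
])
fc_total = sum([
    fc_params(6 * 6 * 256, 4096),   # fully connected layer 1
    fc_params(4096, 4096),          # fully connected layer 2
    fc_params(4096, 1000),          # output layer
])
total = conv_total + fc_total

print(conv_total, fc_total, total)
print(f"fully connected share: {fc_total / total:.1%}")
```

Under these assumptions the three fully connected layers hold roughly 94% of about 62 million parameters.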
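
3. VGG

  • Input layer: 224x224x3 RGB image
  • Conv block 1: two 3x3 conv layers with 64 channels, stride 1, padding 1, output 224x224x64
  • Pooling layer 1: 2x2 max pooling, stride 2, output 112x112x64
  • Conv block 2: two 3x3 conv layers with 128 channels, stride 1, padding 1, output 112x112x128
  • Pooling layer 2: 2x2 max pooling, stride 2, output 56x56x128
  • Conv block 3: three 3x3 conv layers with 256 channels, stride 1, padding 1, output 56x56x256
  • Pooling layer 3: 2x2 max pooling, stride 2, output 28x28x256
  • Conv block 4: three 3x3 conv layers with 512 channels, stride 1, padding 1, output 28x28x512
  • Pooling layer 4: 2x2 max pooling, stride 2, output 14x14x512
  • Conv block 5: three 3x3 conv layers with 512 channels, stride 1, padding 1, output 14x14x512
  • Pooling layer 5: 2x2 max pooling, stride 2, output 7x7x512
  • Fully connected layer 1: 4096 neurons
  • Fully connected layer 2: 4096 neurons
  • Output layer: 1000 neurons (one per class)

4. GoogLeNet (Inception)

  • Input layer: 224x224x3 RGB image
  • Initial conv and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
  • Inception modules: each module runs 1x1, 3x3 and 5x5 convolutions and a 3x3 max pooling in parallel, then concatenates the results along the channel dimension
  • Global average pooling: replaces the traditional fully connected layers, reducing the parameter count and the risk of overfitting
  • Output layer: 1000 neurons (one per class)

5. ResNet

ResNet was proposed by Microsoft Research. Its main innovation is the residual block: a shortcut (identity mapping) adds the block's input directly to its output, which eases gradient flow and addresses the degradation problem of very deep networks, so that much deeper networks can be trained.

  • Input layer: 224x224x3 RGB image
  • Initial conv and pooling layers: 7x7 convolution, stride 2; 3x3 max pooling, stride 2
  • Residual blocks: built from BasicBlock or Bottleneck blocks; the identity shortcut adds the input directly to the output
  • Stage 1: residual blocks with 64 channels
  • Stage 2: residual blocks with 128 channels, stride 2
  • Stage 3: residual blocks with 256 channels, stride 2
  • Stage 4: residual blocks with 512 channels, stride 2
  • Global average pooling: reduces each feature map to 1x1
  • Output layer: 1000 neurons (one per class)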
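
The identity shortcut can be illustrated in a few lines. The sketch below is not the implementation used later in this article: `TinyResidual` is a hypothetical block whose residual branch is a single 3x3 convolution, initialised to zero on purpose. With a zero branch the block reduces to the identity on non-negative inputs, which is the intuition for why adding residual blocks should not make a network worse than its shallower counterpart.

```python
import torch
from torch import nn

class TinyResidual(nn.Module):
    """y = relu(F(x) + x), where F is one 3x3 conv (hypothetical, for illustration only)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        nn.init.zeros_(self.conv.weight)   # residual branch F(x) starts out as 0

    def forward(self, x):
        return torch.relu(self.conv(x) + x)   # shortcut: add the input to the branch output

x = torch.rand(1, 4, 8, 8)        # values in [0, 1), so ReLU leaves them unchanged
block = TinyResidual(channels=4)
y = block(x)
is_identity = torch.equal(y, x)
print(y.shape, is_identity)
```

Once training moves the convolution weights away from zero, the block only has to learn the residual F(x) rather than the full mapping.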






Project structure and code implementation

1. LeNet
import torch
from torch import nn
import torch.nn.functional as F

class Lenet(nn.Module):
    def __init__(self, num_classes=10):
        super(Lenet, self).__init__()

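        # padding=2 keeps the 28x28 input at 28x28 after the 5x5 convolution
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
        # no padding: 14x14 -> 10x10
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))                      # (N, 6, 28, 28)
        # Note: max pooling is used here, whereas the classic LeNet uses average pooling
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # (N, 6, 14, 14)
        x = F.relu(self.conv2(x))                      # (N, 16, 10, 10)
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # (N, 16, 5, 5)
        x = x.view(-1, 16*5*5)                         # flatten to (N, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                                # raw class scores (logits)
        return x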

if __name__ == '__main__':
    X = torch.randn(1, 1, 28, 28)
    model = Lenet(num_classes=5)
    output = model(X)
    print(output.shape)  # torch.Size([1, 5])
2. AlexNet
import torch 
from torch import nn
import torch.nn.functional as F

class Alexnet(nn.Module):
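    def __init__(self, num_classes=1000):
        super(Alexnet, self).__init__()
        # Output size: floor((input_size - kernel_size + 2 * padding) / stride) + 1
        # conv1: floor((224 - 11 + 2*2) / 4) + 1 = 55
        self.features = nn.Sequential(
            # conv1: (N, 3, 224, 224) -> (N, 96, 55, 55)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),   # ReLU added after every conv layer, as in the original AlexNet
            nn.MaxPool2d(kernel_size=3, stride=2),   # -> (N, 96, 27, 27)
            # conv2 -> (N, 256, 27, 27)
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # -> (N, 256, 13, 13)
            # conv3 -> (N, 384, 13, 13)
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # conv4 -> (N, 384, 13, 13)
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # conv5 -> (N, 256, 13, 13)
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # -> (N, 256, 6, 6)
        )

        # Layers after the first Linear are filled in from the layer list (4096 -> 4096 -> num_classes)
        self.classifier = nn.Sequential(
            nn.Linear(256*6*6, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):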
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

if __name__ == '__main__':
    # 224x224 input, matching the layer list and the shape comments in __init__
    X = torch.randn(1, 3, 224, 224)
    model = Alexnet()
    output = model(X)
3. VGG
import torch 
from torch import nn
import torch.nn.functional as F

# Configuration lists for the VGG variants:
# a number is the output channel count of a 3x3 conv layer, 'M' is a max-pooling layer
cfgs = {
    'vgg11' : [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13' : [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

# Build the feature extractor (self.features) from a configuration list
def make_feature(cfg: list):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # 3x3 conv with padding 1 keeps the spatial size unchanged
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)

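class VGG(nn.Module):
    def __init__(self, features, num_classes=1000):
        super(VGG, self).__init__()
        self.features = features
        # Reduced classifier (one 2048-unit hidden layer) rather than the 4096-4096-1000 head of the original VGG
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 2048),
            nn.ReLU(True),   # nonlinearity added so the two linear layers do not collapse into one
            nn.Linear(2048, num_classes)
        )

    def forward(self, x):
        x = self.features(x)                # (N, 512, 7, 7) for a 224x224 input
        x = torch.flatten(x, start_dim=1)   # (N, 512*7*7)
        x = self.classifier(x)
        return x

def vgg(model_name='vgg16', num_classes=10):
    cfg = cfgs[model_name]
    model = VGG(make_feature(cfg), num_classes=num_classes)
    return model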

if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = vgg(model_name='vgg16', num_classes=10)
    output = model(X)
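
The `512*7*7` input size of the classifier follows directly from the configuration list: each 'M' halves the spatial size, and the last number gives the channel count. A small sketch of that arithmetic, assuming a 224x224 input and repeating the `vgg16` list so the snippet is self-contained (the helper name `feature_shape` is hypothetical):

```python
vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def feature_shape(cfg, size=224):
    """Return (channels, height, width) after the feature extractor.

    3x3 convolutions with padding 1 keep the spatial size;
    each 2x2 max pooling with stride 2 halves it.
    """
    channels = 3
    for v in cfg:
        if v == 'M':
            size //= 2
        else:
            channels = v
    return channels, size, size

c, h, w = feature_shape(vgg16_cfg)
conv_layers = sum(1 for v in vgg16_cfg if v != 'M')
print((c, h, w), c * h * w, conv_layers)   # (512, 7, 7) 25088 13
```

The 13 convolutional layers, together with the three fully connected layers of the original design, are the 16 weight layers that give VGG16 its name.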
4. GoogLeNet
import torch.nn as nn
import torch
import torch.nn.functional as F

class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifier 1 runs only in training mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:    # auxiliary classifier 2 runs only in training mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:   # auxiliary outputs are returned only in training mode
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()

        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)

        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)   # padding=1 keeps the output size equal to the input size
        )

        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            # The official torchvision implementation actually uses a 3x3 kernel here rather than 5x5;
            # I have left it as 5x5 in this version.
            # Please see https://github.com/pytorch/vision/issues/906 for details.
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)   # padding=2 keeps the output size equal to the input size
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)

class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x

class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

if __name__ == '__main__':
    X = torch.randn(1, 3, 224, 224)
    model = GoogLeNet(num_classes=10)
    # A new module starts in training mode, so with aux_logits=True this returns (main, aux2, aux1)
    output = model(X)
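
Because each Inception module concatenates its four branches along the channel dimension, its output channel count is `ch1x1 + ch3x3 + ch5x5 + pool_proj`, and that sum has to equal the `in_channels` of the next module. A quick sketch that checks the numbers used in the model above (the tuples are copied from the `Inception(...)` calls; the helper name `inception_out` is hypothetical):

```python
# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
modules = {
    '3a': (192, 64, 96, 128, 16, 32, 32),
    '3b': (256, 128, 128, 192, 32, 96, 64),
    '4a': (480, 192, 96, 208, 16, 48, 64),
    '4b': (512, 160, 112, 224, 24, 64, 64),
    '4c': (512, 128, 128, 256, 24, 64, 64),
    '4d': (512, 112, 144, 288, 32, 64, 64),
    '4e': (528, 256, 160, 320, 32, 128, 128),
    '5a': (832, 256, 160, 320, 32, 128, 128),
    '5b': (832, 384, 192, 384, 48, 128, 128),
}

def inception_out(cfg):
    """Channels after concatenating the four branches (the reduce layers do not count)."""
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj

# The max-pooling layers between stages do not change the channel count,
# so each module's output must match the next module's input.
names = list(modules)
for prev, nxt in zip(names, names[1:]):
    assert inception_out(modules[prev]) == modules[nxt][0], (prev, nxt)

outs = {name: inception_out(cfg) for name, cfg in modules.items()}
print(outs)
```

The last module, 5b, yields 1024 channels, which matches `nn.Linear(1024, num_classes)`; 4a and 4d yield 512 and 528, the input sizes of the two auxiliary classifiers.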
5. ResNet
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = F.relu(out)

        return out

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # When the stride or channel count changes, a 1x1 conv on the shortcut matches the shapes
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

def resnet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)

# Example usage:
model = resnet18()
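
As a sanity check of the `[2, 2, 2, 2]` configuration, the number of weight layers and the feature-map size after each stage can be worked out by hand. The sketch assumes a 224x224 input and `BasicBlock` (two 3x3 convolutions per block); the 1x1 convolutions on the downsample shortcuts are, by the usual convention, not counted. The helper name `out_size` is hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1

blocks_per_stage = [2, 2, 2, 2]
convs_per_block = 2   # BasicBlock has two 3x3 convolutions

# stem convolution + convolutions inside the blocks + final fully connected layer
weight_layers = 1 + convs_per_block * sum(blocks_per_stage) + 1

size = out_size(224, kernel=7, stride=2, padding=3)    # stem conv
size = out_size(size, kernel=3, stride=2, padding=1)   # max pooling
stage_sizes = []
for stride in [1, 2, 2, 2]:
    # only the first 3x3 conv of a stage uses the stride; the rest keep the size
    size = out_size(size, kernel=3, stride=stride, padding=1)
    stage_sizes.append(size)

print(weight_layers, stage_sizes)   # 18 [56, 28, 14, 7]
```

The final 7x7 map is what `AdaptiveAvgPool2d((1, 1))` reduces to 1x1 before the fully connected layer.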





