Week J1: ResNet-50 Hands-On Practice and Analysis

My Environment

  • Language: Python 3.11.9
  • Tooling: Jupyter Lab
  • Deep learning stack:
    • torch==2.3.1+cu121
    • torchvision==0.18.1+cu121

I. This Week's Content and Personal Takeaways

1. This Week's Content

This week covered:

  • The evolution of CNN architectures
  • The origin of residual networks
  • An introduction to ResNet-50
  • Building the ResNet-50 network by hand

2. Personal Takeaways

2.1 ResNet

ResNet (Residual Network) is a deep learning architecture proposed in 2015 by Kaiming He and colleagues at Microsoft Research to address the degradation problem in training deep neural networks. Its core idea is residual learning: instead of learning the target mapping directly, each block learns the residual (the difference) between its input and output. This design lets the network grow deeper simply by stacking more layers without degrading training performance; in practice, performance improves.
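Stated with the paper's notation, the idea above is a one-line reformulation:

```latex
% If H(x) is the desired underlying mapping, the stacked layers learn the
% residual F(x) := H(x) - x instead, and the block outputs
y = F(x, \{W_i\}) + x
% When the identity mapping is close to optimal, the network only needs to
% drive F(x) toward zero, which is easier than fitting H(x) = x directly
% with a stack of nonlinear layers.
```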
1. Residual Block
The basic building unit of ResNet is the residual block, shown below. Each block contains a shortcut connection (also called an identity shortcut) between its input and output. This structure lets gradients propagate directly through the network, alleviating the vanishing-gradient problem.
2. Identity Mapping
Inside a residual block, if the input and output have the same dimensions, the input can be added directly to the output through the shortcut, i.e. F(x) + x. This keeps performance from degrading as depth grows, which resolves the degradation problem.
(figure: a residual block with its identity shortcut)
3. Dimension Matching
When the input and output dimensions differ, a 1x1 convolution reduces or expands the dimensions so that the shortcut can still be added to the main path, as in variant (b) of the figure below.
(figure: identity shortcut (a) vs. projection shortcut with a 1x1 convolution (b))
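A minimal PyTorch sketch of this projection (all shapes and names here are illustrative): a 1x1 convolution with stride 2 aligns both the channel count and the spatial size of the shortcut with the main path.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)          # input: 64 channels, 56x56
main_out = torch.randn(1, 128, 28, 28)  # main path halved the map and doubled channels

# 1x1 conv with stride 2 projects the shortcut to the main path's shape
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
shortcut = proj(x)
print(shortcut.shape)  # torch.Size([1, 128, 28, 28]) -- now addable
y = main_out + shortcut
```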
4. Identity Block
The Identity Block is one of ResNet's basic building blocks; its defining property is that input and output have the same dimensions. The direct shortcut path lets the input pass straight through to the output, so the network can grow deeper without losing performance and without suffering from vanishing gradients.

Input ──────────────────────────────────┐
  │                                     │
  ↓                                     │
1x1 Conv (filters1)                     │
  │                                     │
  ↓                                     │
BatchNorm + ReLU                        │
  │                                     │
  ↓                                     │
3x3 Conv (filters2)                     │
  │                                     │
  ↓                                     │
BatchNorm + ReLU                        │
  │                                     │
  ↓                                     │
1x1 Conv (filters3)                     │
  │                                     │
  ↓                                     │
BatchNorm                               │
  │                                     │
  ↓                                     ↓
  └─────────────── + ──────────────────┘
                    │
                    ↓
                   ReLU

An Identity Block typically consists of:

  • A 1x1 convolution: reduces the channel count (dimensionality reduction).
  • A 3x3 convolution: extracts features while keeping the channel count fixed.
  • A 1x1 convolution: restores the channel count to match the input.
  • Batch Normalization: applied after each convolution to speed up training and improve performance.
  • Activation: usually ReLU, introducing non-linearity.

Because the Identity Block is only used where the input and output dimensions already match, its shortcut adds the input to the output directly and contains no extra convolution to adjust dimensions.

5. Convolutional Block
The Convolutional Block differs from the Identity Block mainly in that its shortcut contains a 1x1 convolution that adjusts the input's dimensions to match the output. It is used wherever the feature-map dimensions must change, for example at the transitions between stages.

Input ─────────────────────────────────────┐
  │                                        │
  ↓                                        ↓
1x1 Conv (filters1, stride=2)     1x1 Conv (filters3, stride=2)
  │                                        │
  ↓                                        │
BatchNorm + ReLU                           │
  │                                        │
  ↓                                        │
3x3 Conv (filters2)                        │
  │                                        │
  ↓                                        │
BatchNorm + ReLU                           │
  │                                        │
  ↓                                        │
1x1 Conv (filters3)                        │
  │                                        │
  ↓                                        │
BatchNorm                            BatchNorm
  │                                        │
  ↓                                        ↓
  └──────────────── + ──────────────────┘
                     │
                     ↓
                    ReLU

A Convolutional Block typically consists of:

  • A 1x1 convolution: reduces channels, and applies a stride for downsampling.
  • A 3x3 convolution: extracts features.
  • A 1x1 convolution: expands channels to match the shortcut's output dimensions.
  • Batch Normalization: after each convolution.
  • Activation: usually ReLU.

Because the input and output dimensions may differ in a Convolutional Block, the shortcut needs its own 1x1 convolution so that the input can be projected to the output's shape before the addition.

2.2 The "50" in ResNet50

  • The "50" in ResNet50 is the network's depth: 50 weighted layers in total (49 convolutional layers plus 1 fully connected layer)
  • Built by stacking residual blocks
  • Organized into 5 stages of convolutions
Input image (224x224x3)
      ↓
7x7 conv, 64, /2
      ↓
3x3 max pool, /2
      ↓
[1x1 conv, 64   ]
[3x3 conv, 64   ] x 3
[1x1 conv, 256  ]
      ↓
[1x1 conv, 128  ]
[3x3 conv, 128  ] x 4
[1x1 conv, 512  ]
      ↓
[1x1 conv, 256  ]
[3x3 conv, 256  ] x 6
[1x1 conv, 1024 ]
      ↓
[1x1 conv, 512  ]
[3x3 conv, 512  ] x 3
[1x1 conv, 2048 ]
      ↓
Global average pooling
      ↓
1000-d fully connected layer
      ↓
Softmax
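The layer count in the diagram above can be verified with a line of arithmetic: each bottleneck block carries 3 convolutions, plus the stem convolution and the final fully connected layer.

```python
# Weighted layers in ResNet-50:
# 1 stem conv + 3 convs per bottleneck block + 1 fully connected layer
blocks_per_stage = [3, 4, 6, 3]
num_layers = 1 + 3 * sum(blocks_per_stage) + 1
print(num_layers)  # 50
```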

II. Running the Code, with Screenshots

1. Data Processing and Visualization

import os, PIL, random, pathlib
import torch
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
import matplotlib.pyplot as plt

plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly

data_dir = './bird_photos/'
data_dir = pathlib.Path(data_dir)

data_paths  = list(data_dir.glob('*'))
classeNames = [path.name for path in data_paths]  # class folder names (portable across OSes)
print(classeNames)


image_count = len(list(data_dir.glob('*/*')))
print("Total number of images:", image_count)

output:
(screenshot omitted)

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

test_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

total_data = datasets.ImageFolder("bird_photos/", transform=train_transforms)
total_data

output:

Dataset ImageFolder
    Number of datapoints: 568
    Root location: bird_photos/
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=warn)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
train_size = int(0.8 * len(total_data))
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
batch_size = 32

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,            # shuffle the training set each epoch
    num_workers=4,           # multi-process data loading (akin to TensorFlow's AUTOTUNE)
    pin_memory=True,         # page-locked host memory speeds up host-to-GPU transfer
    prefetch_factor=2,       # batches prefetched per worker
    persistent_workers=True  # keep workers alive between epochs to avoid respawn overhead
)

test_loader = DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=False,           # no shuffling for the validation set
    num_workers=4,
    pin_memory=True,
    prefetch_factor=2,
    persistent_workers=True
)


for X, y in test_loader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
import matplotlib.pyplot as plt

plt.figure(figsize=(10,5))  # figure 10 wide by 5 tall
# Grab one batch from the DataLoader
images, labels = next(iter(train_loader))  # PyTorch's next(iter()) plays the role of TensorFlow's take(1)

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

for i in range(8):
    ax = plt.subplot(2, 4, i + 1)

    img = images[i] * std + mean          # undo the Normalize transform
    img = img.permute(1, 2, 0)            # CHW -> HWC for matplotlib
    img = (img * 255).clamp(0, 255).numpy().astype('uint8')
    
    plt.imshow(img)
    plt.title(classeNames[labels[i]])
    
    plt.axis("off")
plt.show()

output:
(screenshot omitted)

2. Building the ResNet-50 Network in PyTorch

import torch
import torch.nn as nn
import torchsummary
import torchvision.models as models

class IdentityBlock(nn.Module):
    """Residual block whose shortcut is a pure identity (input/output shapes match)."""
    def __init__(self, in_channels, kernel_size, filters):
        super().__init__()
        filters1, filters2, filters3 = filters
        # First 1x1 conv: reduce channels
        self.conv1 = nn.Conv2d(in_channels, filters1, 1, bias=False)
        self.bn1   = nn.BatchNorm2d(filters1)
        # 3x3 conv: extract features
        self.conv2 = nn.Conv2d(filters1, filters2, kernel_size,
                               padding=kernel_size // 2, bias=False)
        self.bn2   = nn.BatchNorm2d(filters2)
        # Second 1x1 conv: restore channels to match the input
        self.conv3 = nn.Conv2d(filters2, filters3, 1, bias=False)
        self.bn3   = nn.BatchNorm2d(filters3)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)   # skip connection

class ConvBlock(nn.Module):
    """Residual block with a 1x1 conv on the shortcut to match dimensions (and downsample)."""
    def __init__(self, in_channels, kernel_size, filters, stride=2):
        super().__init__()
        filters1, filters2, filters3 = filters
        self.conv1 = nn.Conv2d(in_channels, filters1, 1, stride=stride, bias=False)
        self.bn1   = nn.BatchNorm2d(filters1)
        self.conv2 = nn.Conv2d(filters1, filters2, kernel_size,
                               padding=kernel_size // 2, bias=False)
        self.bn2   = nn.BatchNorm2d(filters2)
        self.conv3 = nn.Conv2d(filters2, filters3, 1, bias=False)
        self.bn3   = nn.BatchNorm2d(filters3)
        # Shortcut path: project the input to the main path's output shape
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, filters3, 1, stride=stride, bias=False),
            nn.BatchNorm2d(filters3),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))   # skip connection

class ResNet50(nn.Module):
    """Hand-built ResNet-50: a stem, four residual stages, and a classification head."""
    def __init__(self, num_classes=1000):
        super().__init__()
        # Stem: 7x7/2 conv + 3x3/2 max pool
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        # Stage 2: stride 1 -- the max pool has already halved the feature map
        self.stage2 = nn.Sequential(
            ConvBlock(64, 3, [64, 64, 256], stride=1),
            IdentityBlock(256, 3, [64, 64, 256]),
            IdentityBlock(256, 3, [64, 64, 256]),
        )
        # Stage 3: 1 conv block + 3 identity blocks
        self.stage3 = nn.Sequential(
            ConvBlock(256, 3, [128, 128, 512]),
            *[IdentityBlock(512, 3, [128, 128, 512]) for _ in range(3)],
        )
        # Stage 4: 1 conv block + 5 identity blocks
        self.stage4 = nn.Sequential(
            ConvBlock(512, 3, [256, 256, 1024]),
            *[IdentityBlock(1024, 3, [256, 256, 1024]) for _ in range(5)],
        )
        # Stage 5: 1 conv block + 2 identity blocks
        self.stage5 = nn.Sequential(
            ConvBlock(1024, 3, [512, 512, 2048]),
            *[IdentityBlock(2048, 3, [512, 512, 2048]) for _ in range(2)],
        )
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc      = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.stage5(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50().to(device)
model

output:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
torchsummary.summary(model, (3, 224, 224))

output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]           4,096
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
... (intermediate layers omitted) ...
          Conv2d-166            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-167            [-1, 512, 7, 7]           1,024
            ReLU-168            [-1, 512, 7, 7]               0
          Conv2d-169           [-1, 2048, 7, 7]       1,048,576
     BatchNorm2d-170           [-1, 2048, 7, 7]           4,096
            ReLU-171           [-1, 2048, 7, 7]               0
      Bottleneck-172           [-1, 2048, 7, 7]               0
AdaptiveAvgPool2d-173           [-1, 2048, 1, 1]               0
          Linear-174                 [-1, 1000]       2,049,000
================================================================
Total params: 25,557,032
Trainable params: 25,557,032
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 286.56
Params size (MB): 97.49
Estimated Total Size (MB): 384.62
----------------------------------------------------------------

3. Training and Visualization

# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size        = len(dataloader.dataset)   # number of training samples
    num_batches = len(dataloader)

    train_loss, train_acc = 0, 0  # running loss and accuracy
    
    for X, y in dataloader:  # fetch images and their labels
        X, y = X.to(device), y.to(device)
        
        # Compute the prediction error
        pred = model(X)          
        loss = loss_fn(pred, y) 
        
        # Backpropagation
        optimizer.zero_grad() 
        loss.backward()       
        optimizer.step()
        
        # Accumulate accuracy and loss
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss

def test(dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  
    num_batches = len(dataloader)          
    test_loss, test_acc = 0, 0

    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # Compute the loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss

loss_fn = nn.CrossEntropyLoss()
opt     = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate assumed; not shown in the original

epochs     = 20
train_loss = []
train_acc  = []
test_loss  = []
test_acc   = []

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_loader, model, loss_fn, opt)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_loader, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))
print('Done')

output:
(screenshot omitted)

import matplotlib.pyplot as plt

plt.rcParams['axes.unicode_minus'] = False      # display minus signs correctly
plt.rcParams['figure.dpi']         = 100        # resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

output:
(screenshot omitted)

### 3D ResNet-50 Architecture Explained

3D ResNet-50 extends the classic ResNet-50 to spatio-temporal data such as video or volumetric medical images. The core idea is to replace 2D convolutions with 3D convolutions so that the residual blocks also extract features along the time (or depth) dimension.

#### 1. Overall structure

The network consists of an input layer, 4 residual stages, and an output head, for roughly 50 weighted layers:

1. **Input**: a 3D volume of shape $C \times T \times H \times W$, where $T$ is the time or depth dimension
2. **Initial convolution**: a $7 \times 7 \times 7$ kernel with stride $2 \times 2 \times 2$, extracting low-level spatio-temporal features
3. **Max pooling**: a $3 \times 3 \times 3$ kernel with stride $2 \times 2 \times 2$ for further downsampling
4. **Residual stages**: 4 stages with $3, 4, 6, 3$ residual blocks respectively, progressively widening the channels
5. **Global average pooling + fully connected layer**: produces the classification output

The parameter count is substantially larger than the 2D model's 25.6 million, since every $k \times k$ kernel becomes $k \times k \times k$ (roughly 46 million for the common video-classification variant).

#### 2. The 3D residual block

Each residual block keeps the **Bottleneck structure**:

1. **Reduction**: a $1 \times 1 \times 1$ convolution to shrink the channel count
2. **Feature extraction**: a $3 \times 3 \times 3$ convolution fusing spatial and temporal features
3. **Expansion**: a $1 \times 1 \times 1$ convolution to restore the channel count
4. **Shortcut**: when the input and output dimensions differ, a $1 \times 1 \times 1$ convolution adjusts the shortcut

A sketch of such a block (BatchNorm layers added for a faithful ResNet-style block; the class name and sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck3D(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        mid = out_channels // 4  # bottleneck width
        self.conv1 = nn.Conv3d(in_channels, mid, kernel_size=1, bias=False)
        self.bn1   = nn.BatchNorm3d(mid)
        self.conv2 = nn.Conv3d(mid, mid, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2   = nn.BatchNorm3d(mid)
        self.conv3 = nn.Conv3d(mid, out_channels, kernel_size=1, bias=False)
        self.bn3   = nn.BatchNorm3d(out_channels)
        # Projection shortcut when shape changes; identity otherwise
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm3d(out_channels),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return F.relu(out)
```

#### 3. Typical layer configuration

The sizes below assume a $3 \times 64 \times 224 \times 224$ input with stride-2 downsampling at the initial convolution, the max pool, and the first block of each of Stages 2-4; exact shapes vary across implementations:

| Stage | Residual blocks | Output channels | Feature map (T×H×W) |
|------|------|------|------|
| Input | – | 3 | 64×224×224 |
| Conv1 + MaxPool | 1 | 64 | 16×56×56 |
| Stage1 | 3 | 256 | 16×56×56 |
| Stage2 | 4 | 512 | 8×28×28 |
| Stage3 | 6 | 1024 | 4×14×14 |
| Stage4 | 3 | 2048 | 2×7×7 |

#### 4. Technical highlights

1. **Spatio-temporal fusion**: 3D convolutions capture spatial and temporal correlations jointly
2. **Parameter efficiency**: the Bottleneck design balances compute against representational power
3. **Gradient flow**: shortcut connections alleviate vanishing gradients, supporting networks beyond 100 layers
4. **Multi-scale features**: stage-wise downsampling yields spatio-temporal features at several granularities

#### 5. Typical applications

- Video action recognition (e.g. the Kinetics dataset)
- Medical image analysis (3D CT/MRI reconstruction)
- Spatio-temporal action detection
- Video content understanding
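A quick shape check for the 3D case (illustrative sizes, assuming a 16-frame clip at 112×112): a stride-2 3D convolution halves time, height, and width together.

```python
import torch
import torch.nn as nn

# A 3x3x3 conv with stride 2 downsamples all three dimensions at once
conv = nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1)
clip = torch.randn(1, 64, 16, 112, 112)   # N, C, T, H, W
print(conv(clip).shape)  # torch.Size([1, 128, 8, 56, 56])
```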