Defining the ResNet3d network class and the configurations for networks of different depths

鱼儿会飞吗

Last modified 2024-04-18 10:33:15

Tags: notes, artificial intelligence, pytorch, deep learning

First published 2024-04-18 10:29:53

Article link: https://blog.csdn.net/qq_34425255/article/details/137910932

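This article introduces the ResNet3d PyTorch module, a backbone for 3D convolutional neural networks that supports several depths (18, 34, 50, 101, and 152) as well as custom stage structures. It offers flexible configuration options, including the basic block type, pretrained weights, and the number of input channels.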

@BACKBONES.register_module()
class ResNet3d(nn.Module):
    """ResNet 3d backbone.

    Args:
        depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. Default: 50.
        pretrained (str | None): Name of pretrained model.
        stage_blocks (tuple | None): Set number of stages for each res layer. Default: None.
        pretrained2d (bool): Whether to load pretrained 2D model. Default: True.
        in_channels (int): Channel num of input features. Default: 3.
        base_channels (int): Channel num of stem output features. Default: 64.
        out_indices (tuple[int]): Indices of output feature. Default: (3, ).
        num_stages (int): Resnet stages. Default: 4.
        spatial_strides (tuple[int]): Spatial strides of residual blocks of each stage. Default: (1, 2, 2, 2).
        temporal_strides (tuple[int]): Temporal strides of residual blocks of each stage. Default: (1, 1, 1, 1).
        conv1_kernel (tuple[int]): Kernel size of the first conv layer. Default: (3, 7, 7).
        conv1_stride (tuple[int]): Stride of the first conv layer (temporal, spatial). Default: (1, 2).
        pool1_stride (tuple[int]): Stride of the first pooling layer (temporal, spatial). Default: (1, 2).
        advanced (bool): Flag indicating if an advanced design for downsample is adopted. Default: False.
        frozen_stages (int): Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
        inflate (tuple[int]): Inflate Dims of each block. Default: (1, 1, 1, 1).
        inflate_style (str): '3x1x1' or '3x3x3'. which determines the kernel sizes and padding strides
            for conv1 and conv2 in each block. Default: '3x1x1'.
        conv_cfg (dict): Config for conv layers. required keys are 'type'. Default: 'dict(type='Conv3d')'.
        norm_cfg (dict): Config for norm layers. required keys are 'type' and 'requires_grad'.
            Default: 'dict(type='BN3d', requires_grad=True)'.
        act_cfg (dict): Config dict for activation layer. Default: 'dict(type='ReLU', inplace=True)'.
        norm_eval (bool): Whether to set BN layers to eval mode, namely, freeze running stats (mean and var).
            Default: False.
        zero_init_residual (bool): Whether to use zero initialization for residual block. Default: True.
    """

    arch_settings = {
        18: (BasicBlock3d, (2, 2, 2, 2)),
        34: (BasicBlock3d, (3, 4, 6, 3)),
        50: (Bottleneck3d, (3, 4, 6, 3)),
        101: (Bottleneck3d, (3, 4, 23, 3)),
        152: (Bottleneck3d, (3, 8, 36, 3))
    }
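
As a rough illustration of what the stride defaults in the docstring imply, the sketch below multiplies the temporal and spatial strides of the first conv layer, the first pooling layer, and the four stages. It assumes each stage applies its stride exactly once and that padding keeps the sizes evenly divisible. `total_downsampling` is a hypothetical helper written for this note, not part of ResNet3d.

```python
from math import prod

# Defaults copied from the ResNet3d docstring.
CONV1_STRIDE = (1, 2)            # (temporal, spatial)
POOL1_STRIDE = (1, 2)            # (temporal, spatial)
SPATIAL_STRIDES = (1, 2, 2, 2)   # one entry per stage
TEMPORAL_STRIDES = (1, 1, 1, 1)  # one entry per stage


def total_downsampling(conv1_stride=CONV1_STRIDE,
                       pool1_stride=POOL1_STRIDE,
                       spatial_strides=SPATIAL_STRIDES,
                       temporal_strides=TEMPORAL_STRIDES):
    """Hypothetical helper: overall (temporal, spatial) stride of the backbone.

    Assumption: every stage applies its stride exactly once.
    """
    temporal = conv1_stride[0] * pool1_stride[0] * prod(temporal_strides)
    spatial = conv1_stride[1] * pool1_stride[1] * prod(spatial_strides)
    return temporal, spatial


temporal, spatial = total_downsampling()
print(f"temporal stride: {temporal}, spatial stride: {spatial}")
```

Under these assumptions, the defaults leave the number of frames unchanged and shrink height and width by a factor of 32, so a 224x224 frame would come out at roughly 7x7 after the last stage.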

Analysis: this code defines a PyTorch module named ResNet3d, which implements the backbone of a 3D ResNet.

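class ResNet3d(nn.Module): defines a class named ResNet3d that inherits from PyTorch's nn.Module, which marks it as a neural network module.
"""ResNet 3d backbone.""" is the docstring. It states the module's purpose, namely serving as the backbone of a 3D ResNet.
Args: is the parameter list. It documents the arguments the module accepts.
arch_settings = {...} is a dictionary that maps each supported ResNet3d depth to the basic block type it uses and the number of blocks in each stage.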

In short, this code defines a ResNet backbone that can be used for 3D video processing, and it exposes many configurable parameters to suit different needs.
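
To make the role of arch_settings more concrete, here is a minimal standalone sketch of how a depth value might be resolved into a block type and per-stage block counts. The `BasicBlock3d` and `Bottleneck3d` classes below are empty placeholders, and `resolve_arch` is a hypothetical helper. The handling of `stage_blocks` and `num_stages` is an assumption based on the docstring, not a copy of the actual constructor.

```python
class BasicBlock3d:
    """Placeholder standing in for the real basic block class."""


class Bottleneck3d:
    """Placeholder standing in for the real bottleneck block class."""


# Same table as arch_settings in the class excerpt.
ARCH_SETTINGS = {
    18: (BasicBlock3d, (2, 2, 2, 2)),
    34: (BasicBlock3d, (3, 4, 6, 3)),
    50: (Bottleneck3d, (3, 4, 6, 3)),
    101: (Bottleneck3d, (3, 4, 23, 3)),
    152: (Bottleneck3d, (3, 8, 36, 3)),
}


def resolve_arch(depth, stage_blocks=None, num_stages=4):
    """Hypothetical helper: pick the block type and block counts for a depth."""
    if depth not in ARCH_SETTINGS:
        raise KeyError(f"invalid depth {depth} for resnet")
    block, default_blocks = ARCH_SETTINGS[depth]
    # Assumption: an explicit stage_blocks overrides the table entry,
    # otherwise the first num_stages entries of the default are used.
    if stage_blocks is None:
        stage_blocks = default_blocks[:num_stages]
    if len(stage_blocks) != num_stages:
        raise ValueError("stage_blocks must have one entry per stage")
    return block, tuple(stage_blocks)


block, blocks = resolve_arch(50)
print(block.__name__, blocks)
```

Keeping the table as a class-level dictionary means a new depth only needs one extra entry, while a custom layout can still be passed in through `stage_blocks`.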

arch_settings = {
    18: (BasicBlock3d, (2, 2, 2, 2)),
    34: (BasicBlock3d, (3, 4, 6, 3)),
    50: (Bottleneck3d, (3, 4, 6, 3)),
    101: (Bottleneck3d, (3, 4, 23, 3)),
    152: (Bottleneck3d, (3, 8, 36, 3))
}

Here is what each entry of the arch_settings dictionary means:

18: (BasicBlock3d, (2, 2, 2, 2)),
- When depth is set to 18, BasicBlock3d is used as the basic block, and the stages contain (2, 2, 2, 2) blocks.
34: (BasicBlock3d, (3, 4, 6, 3)),
- When depth is set to 34, BasicBlock3d is used as the basic block, and the stages contain (3, 4, 6, 3) blocks.
50: (Bottleneck3d, (3, 4, 6, 3)),
- When depth is set to 50, Bottleneck3d is used as the basic block, and the stages contain (3, 4, 6, 3) blocks.
101: (Bottleneck3d, (3, 4, 23, 3)),
- When depth is set to 101, Bottleneck3d is used as the basic block, and the stages contain (3, 4, 23, 3) blocks.
152: (Bottleneck3d, (3, 8, 36, 3))
- When depth is set to 152, Bottleneck3d is used as the basic block, and the stages contain (3, 8, 36, 3) blocks.
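
The depth names can be recovered from these block counts with the usual ResNet layer-counting convention: a basic block contributes 2 conv layers, a bottleneck block contributes 3, and the stem conv plus the final classification layer add 2 more. The check below is only a sanity calculation. `LAYERS_PER_BLOCK` and `counted_depth` are illustrative names, and the classification layer is counted by convention even though a backbone usually leaves it to a separate head.

```python
# Conv layers per block under the standard ResNet counting convention.
LAYERS_PER_BLOCK = {"BasicBlock3d": 2, "Bottleneck3d": 3}

# Block names are plain strings here so the check runs on its own.
ARCH = {
    18: ("BasicBlock3d", (2, 2, 2, 2)),
    34: ("BasicBlock3d", (3, 4, 6, 3)),
    50: ("Bottleneck3d", (3, 4, 6, 3)),
    101: ("Bottleneck3d", (3, 4, 23, 3)),
    152: ("Bottleneck3d", (3, 8, 36, 3)),
}


def counted_depth(block_name, stage_blocks):
    """Layers in all blocks, plus the stem conv and the classification layer."""
    return LAYERS_PER_BLOCK[block_name] * sum(stage_blocks) + 2


for depth, (block_name, stage_blocks) in ARCH.items():
    print(depth, counted_depth(block_name, stage_blocks))
```

This also shows why depths 34 and 50 share the same (3, 4, 6, 3) layout: the difference comes entirely from the block type.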