FastInst Backbone: ResNet-vd-dcn

FastInst's backbone combines the classic ResNet residual block (Bottleneck) design with deformable convolution (Deformable Convolution):

  1. Low-level feature extraction (the conv1 stem)
    Starting point: the initial convolution layers scan the image with small 3x3 filters, picking up basic visual cues. In the vd variant, conv1 is a deep stem of three 3x3 convolutions rather than vanilla ResNet's single 7x7 convolution (see the printout below).
    Function: detects the most elementary image content, such as edges, textures, and changes in color and brightness.

  2. Feature refinement and channel adjustment (layer1)
    Transition: after the initial low-level extraction, the network moves into deeper feature refinement, implemented with residual Bottleneck blocks.
    Channel adjustment: a 1x1 convolution reduces the channel count, a 3x3 convolution extracts features, and another 1x1 convolution expands the channels again; this sequence keeps the computational cost down while strengthening the representation.

  3. Spatial downsampling and feature aggregation (layer2 and beyond)
    Purpose: as the network gets deeper, reducing the spatial resolution of the feature maps becomes important; it cuts the computational load and lets each feature see a wider context.
    Operation: downsampling uses stride-2 convolutions or pooling (Max/Avg Pooling); the feature maps shrink, but each feature covers a larger image region, which helps capture global structure and context. Note that in the vd variant the shortcut downsamples with an AvgPool2d followed by a 1x1 convolution instead of a strided 1x1 convolution, so no activations are simply skipped over.

  4. Introducing deformable convolution (layer3 and layer4)
    Key change: in the deeper stages, the 3x3 convolution inside each Bottleneck is replaced with a deformable convolution (Deformable Convolution), which improves robustness to complex geometric deformations and scale changes.
    Dynamics: a small companion convolution (conv2_offset) predicts per-location offsets before the main convolution runs, so the sampling grid shifts with the input and adapts to object shape and position changes; this is especially valuable for detection and segmentation (see the sketch after this list).
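
To make steps 2-4 concrete, here is a minimal, self-contained sketch of a vd-style deformable Bottleneck in PyTorch. It mirrors the channel and stride layout of the printed layer3 blocks below, but it is only an illustration: the real FastInst/detectron2 implementation uses FrozenBatchNorm2d, its own DeformConv layer, and different initialization.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBottleneck(nn.Module):
    """Sketch of a ResNet-vd Bottleneck whose 3x3 conv is deformable."""

    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        # 1x1 reduce
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        # conv2_offset predicts 2 * 3 * 3 = 18 offset channels (an x and y shift
        # per sampling point), matching the printed conv2_offset layers
        self.conv2_offset = nn.Conv2d(mid_ch, 18, 3, stride=stride, padding=1)
        self.conv2 = DeformConv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        # 1x1 expand
        self.conv3 = nn.Conv2d(mid_ch, mid_ch * self.expansion, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(mid_ch * self.expansion)
        self.act = nn.ReLU(inplace=True)
        # "vd" shortcut: AvgPool before the 1x1 projection instead of a strided 1x1
        self.downsample = None
        if stride != 1 or in_ch != mid_ch * self.expansion:
            pool = nn.AvgPool2d(stride, stride) if stride != 1 else nn.Identity()
            self.downsample = nn.Sequential(
                pool,
                nn.Conv2d(in_ch, mid_ch * self.expansion, 1, bias=False),
                nn.BatchNorm2d(mid_ch * self.expansion),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.act(self.bn1(self.conv1(x)))
        offset = self.conv2_offset(out)            # sampling offsets, predicted from the input
        out = self.act(self.bn2(self.conv2(out, offset)))
        out = self.bn3(self.conv3(out))
        return self.act(out + identity)


# Shape check against layer3's first block: 512 -> 1024 channels, stride 2.
block = DeformableBottleneck(512, 256, stride=2)
print(block(torch.randn(1, 512, 32, 32)).shape)   # torch.Size([1, 1024, 16, 16])
```

In the printout below, layer1 through layer4 contain 3, 4, 6, and 3 blocks respectively, i.e. the standard ResNet-50 layout, with the deformable variant used only in layer3 and layer4.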

Doesn't that suggest we could make a targeted change at one of these stages and use it as an innovation point?
How could we change it? One possible direction is sketched right below; the full module printout of the backbone then follows for reference.
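
As one hedged example of such a change, the plain DeformConv used here (offsets only, DCNv1-style) could be upgraded to a modulated deformable convolution (DCNv2-style), where each sampling point also gets a learned weight. The sketch below uses the optional mask argument of torchvision.ops.DeformConv2d; it is an illustration of the idea, not FastInst's code, and whether it actually helps would have to be verified experimentally.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class ModulatedDeformConv3x3(nn.Module):
    """DCNv2-style 3x3 block: predict offsets plus a per-point modulation mask."""

    def __init__(self, channels, stride=1):
        super().__init__()
        k = 3
        # 3 * k * k = 27 channels: 18 for (x, y) offsets + 9 for the modulation mask
        self.offset_mask = nn.Conv2d(channels, 3 * k * k, k, stride=stride, padding=1)
        self.dcn = DeformConv2d(channels, channels, k, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :18], om[:, 18:].sigmoid()   # mask values in (0, 1)
        return self.act(self.bn(self.dcn(x, offset, mask)))


# Shape check on a layer3-sized feature map (256 channels after the 1x1 reduce).
x = torch.randn(1, 256, 32, 32)
print(ModulatedDeformConv3x3(256)(x).shape)   # torch.Size([1, 256, 32, 32])
```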


(backbone): ResNet(
(conv1): Sequential(
  (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (1): FrozenBatchNorm2d(num_features=32, eps=1e-05)
  (2): ReLU(inplace=True)
  (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (4): FrozenBatchNorm2d(num_features=32, eps=1e-05)
  (5): ReLU(inplace=True)
  (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
(bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
(act1): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
  (0): Bottleneck(
    (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act3): ReLU(inplace=True)
    (downsample): Sequential(
      (0): Identity()
      (1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    )
  )
  (1): Bottleneck(
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (2): Bottleneck(
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
)
  
(layer2): Sequential(
  (0): Bottleneck(
    (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act3): ReLU(inplace=True)
    (downsample): Sequential(
      (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    )
  )
  (1): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (2): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (3): Bottleneck(
    (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
    (drop_block): Identity()
    (act2): ReLU(inplace=True)
    (aa): Identity()
    (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
)
  
(layer3): Sequential(
  (0): DeformableBottleneck(
    (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
    (downsample): Sequential(
      (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (1): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    )
  )
  (1): DeformableBottleneck(
    (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (2): DeformableBottleneck(
    (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (3): DeformableBottleneck(
    (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (4): DeformableBottleneck(
    (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (5): DeformableBottleneck(
    (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
)
  
(layer4): Sequential(
  (0): DeformableBottleneck(
    (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
    (act3): ReLU(inplace=True)
    (downsample): Sequential(
      (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (1): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
    )
  )
  (1): DeformableBottleneck(
    (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
  (2): DeformableBottleneck(
    (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act1): ReLU(inplace=True)
    (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
    (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
    (act3): ReLU(inplace=True)
  )
)

)
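
Reading the printout back against step 1: the conv1 stem is three 3x3 convolutions (3 -> 32 -> 32 -> 64) followed by bn1/act1 and a stride-2 max pool, so the input is already at stride 4 before layer1. A stand-alone reconstruction to confirm the shapes (plain BatchNorm2d standing in for detectron2's FrozenBatchNorm2d):

```python
import torch
import torch.nn as nn

# Reconstruction of the stem printed above: conv1 (three 3x3 convs) + bn1 + act1 + maxpool.
stem = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, 3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),        # bn1 / act1 in the printout
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 640, 640)
print(stem(x).shape)   # torch.Size([1, 64, 160, 160]) -> stride 4 before layer1
```

From there, layer2, layer3, and layer4 each downsample once more (a stride-2 3x3 or deformable conv plus the AvgPool shortcut), giving the usual 1/4, 1/8, 1/16, 1/32 feature pyramid with 256, 512, 1024, and 2048 output channels.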
