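This backbone combines the residual Bottleneck block of the classic ResNet with Deformable Convolution:
- Low-level feature extraction (conv1)
  Starting point: the initial convolution layers scan the image with small filters (e.g. 3x3) to pick up basic visual cues.
  Function: detect the most basic image elements, such as edges, textures, and changes in color and brightness.
- Feature refinement and channel adjustment (layer1)
  Transition: after the initial low-level feature extraction, the network moves on to deeper feature refinement, typically implemented with residual blocks (e.g. Bottleneck).
  Channel adjustment: a 1x1 convolution reduces the channel count, a 3x3 convolution extracts features, and a second 1x1 convolution expands the channels again. This sequence strengthens the feature representation while keeping the computational cost under control.
- Spatial downsampling and feature aggregation (layer2 onward)
  Purpose: as the network gets deeper, lowering the spatial resolution of the feature maps becomes important. It reduces computation and gives each feature a larger context.
  Operation: a convolution with stride greater than 1, or a pooling layer (e.g. max/average pooling), downsamples the feature map. The map gets smaller, but each feature covers a larger image region, which helps capture global structure and context. In the model printed below, the first block of layer2, layer3, and layer4 does this with a stride-2 3x3 convolution on the main path (deformable in layer3 and layer4) and a 2x2 average pool on the shortcut.
- Introducing deformable convolution (layer3 and layer4)
  Key idea: in the deeper layers, deformable convolution is introduced to better handle complex deformations and to improve scale invariance.
  Dynamic sampling: an offset for each sampling point is predicted before the convolution is applied, so the sampling locations adapt to the input features and follow changes in object shape and position. This matters most in tasks such as object detection and segmentation.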
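The dynamic sampling described above can be sketched in plain Python for a single output location on a single-channel feature map. This is only an illustration under stated assumptions, not the DeformConv implementation used in the printed model: the names `bilinear`, `deform_sample_3x3`, and `offset_channels` are hypothetical, stride and dilation are fixed at 1, and samples that fall outside the map are assumed to read as zero. It also shows why the conv2_offset layers in the printout output 18 channels: one (dy, dx) pair for each of the 3x3 kernel taps.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate the 2D list `feat` at fractional (y, x).

    Neighbours outside the map contribute zero (an assumption of this sketch).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def deform_sample_3x3(feat, weights, offsets, cy, cx):
    """Deformable 3x3 response at centre (cy, cx).

    weights: 3x3 kernel; offsets: nine (dy, dx) pairs, one per kernel tap.
    Each tap reads the feature map at its regular grid position plus its offset.
    """
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = 0.0
    for (ky, kx), (dy, dx) in zip(taps, offsets):
        out += weights[ky + 1][kx + 1] * bilinear(feat, cy + ky + dy, cx + kx + dx)
    return out


def offset_channels(kernel_size, deformable_groups=1):
    """Channels the offset branch must predict: (dy, dx) per kernel tap."""
    return 2 * kernel_size * kernel_size * deformable_groups


feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # values 0..15
box = [[1.0] * 3 for _ in range(3)]                               # all-ones kernel

regular = deform_sample_3x3(feat, box, [(0.0, 0.0)] * 9, 1, 1)  # zero offsets
moved = deform_sample_3x3(feat, box, [(0.0, 0.5)] * 9, 1, 1)    # shift right by 0.5

print(regular, moved, offset_channels(3))
```

With all offsets at zero the result equals an ordinary 3x3 convolution. Shifting every tap half a pixel to the right changes which values are read without changing the kernel weights. In the real layer the offsets differ per tap and per location, because they are predicted from the input.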
So does that mean we could modify one of these layers as our novel contribution?
How could it be changed?
(backbone): ResNet(
  (conv1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): FrozenBatchNorm2d(num_features=32, eps=1e-05)
    (2): ReLU(inplace=True)
    (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (4): FrozenBatchNorm2d(num_features=32, eps=1e-05)
    (5): ReLU(inplace=True)
    (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  )
  (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
  (act1): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Identity()
        (1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): DeformableBottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (1): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (1): DeformableBottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (2): DeformableBottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (3): DeformableBottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (4): DeformableBottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (5): DeformableBottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(256, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=256, out_channels=256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): DeformableBottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      (act3): ReLU(inplace=True)
      (downsample): Sequential(
        (0): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (1): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      )
    )
    (1): DeformableBottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
    (2): DeformableBottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act1): ReLU(inplace=True)
      (conv2_offset): Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): DeformConv(in_channels=512, out_channels=512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deformable_groups=1, bias=False)
      (bn2): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      (act2): ReLU(inplace=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      (act3): ReLU(inplace=True)
    )
  )
)
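To make the printout easier to read, the sketch below traces the feature-map size and channel count through each stage, using the standard convolution output-size formula floor((n + 2p - k) / s) + 1. It also counts the convolution weights of one bottleneck to illustrate the cost saving of the 1x1, 3x3, 1x1 design. The 640x640 input size is an assumption chosen for illustration, and the helper names (`conv_out`, `trace`, `bottleneck_weights`) are hypothetical. The kernel, stride, padding, and channel values are copied from the printout.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, out_channels) of the size-changing op per stage.
# For layer1-4 this is conv2 of the first block; every other conv in a stage
# is either 1x1 or a stride-1 3x3 with padding 1, so it keeps the size.
STAGES = [
    ("conv1.0", 3, 2, 1, 32),
    ("conv1.3", 3, 1, 1, 32),
    ("conv1.6", 3, 1, 1, 64),
    ("maxpool", 3, 2, 1, 64),
    ("layer1", 3, 1, 1, 256),
    ("layer2", 3, 2, 1, 512),
    ("layer3", 3, 2, 1, 1024),
    ("layer4", 3, 2, 1, 2048),
]


def trace(size):
    """Return (name, channels, size) after each stage for a square input."""
    rows = []
    for name, kernel, stride, padding, channels in STAGES:
        size = conv_out(size, kernel, stride, padding)
        rows.append((name, channels, size))
    return rows


def bottleneck_weights(c_in, c_mid, c_out):
    """Conv weights in a 1x1 -> 3x3 -> 1x1 bottleneck (no bias, no shortcut conv)."""
    return c_in * c_mid + c_mid * c_mid * 9 + c_mid * c_out


rows = trace(640)  # 640 is an assumed input size
for name, channels, size in rows:
    print(f"{name:8s} channels={channels:5d} size={size}x{size}")

bottleneck = bottleneck_weights(256, 64, 256)  # layer1 blocks (1) and (2)
plain_3x3 = 256 * 256 * 9                      # one 3x3 conv kept at 256 channels
print(bottleneck, plain_3x3)
```

With a 640x640 input, layer1 to layer4 come out at 160, 80, 40, and 20 pixels per side, i.e. overall strides of 4, 8, 16, and 32. A layer1 bottleneck holds 69,632 convolution weights, against 589,824 for a single 3x3 convolution at 256 channels.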