This article comes from a featured project in the AI Studio community.
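If you have not yet built a network by yourself, this project is a good way to lay the groundwork.
This article reproduces the classic FPN. The Feature Pyramid Network (FPN), proposed in 2017, mainly addresses the multi-scale problem in object detection: by changing only how the layers are connected, it substantially improves small-object detection while adding very little computation to the original model.
Paper:
In a convolutional network, deep layers tend to respond to semantic features, while shallow layers tend to respond to low-level image features. In object detection and image segmentation, this property causes a fair amount of trouble:
Deep layers respond to semantic features, but their feature maps are so small that they retain little geometric information, which hurts localization. Shallow layers retain more geometric information but carry little semantic information, which hurts classification. The problem is most pronounced for small objects.
Here is the network-evolution figure from the original paper, which makes the construction clear: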

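In short: from single scale, to multi-scale, to multi-scale fusion.
(Readers familiar with YOLOv5 may notice that this looks almost identical to what YOLOv5 does.)
Now a more detailed figure: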

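Code reproduction
With the model structure in mind, we can reproduce it step by step.
As far as I can tell, FPN is not normally used as a standalone network; it is built on top of a backbone such as ResNet. On that basis I drew the following diagram: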

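(Please do not copy the figure directly.)
Looking closely at the structure above, I divide it into three parts:
- Obtain the four outputs of ResNet
- Channel conversion: C -> 256
- Feature fusion
The code reproduction naturally follows the same three parts.
1. The four outputs of ResNet
ResNet is by now a very common architecture. I took its implementation directly from the Paddle source code and simplified and adapted it.
(The code is fairly long. If you do not want to read it all, skip to the last part and look at the test output to see what it does.)
Simplifications:
Removed the other network variants and their pretrained models.
Modifications:
Because we need four outputs, forward has to keep the output of each of the four stages.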
import paddle
from paddle import nn
from paddle.utils.download import get_weights_path_from_url

__all__ = []

model_urls = {
    'resnet50': (
        'https://paddle-hapi.bj.bcebos.com/models/resnet50.pdparams',
        'ca6f485ee1ab0492d38f323885b0ad80',
    )
}

class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        if dilation > 1:
            raise NotImplementedError(
                "Dilation > 1 not supported in BasicBlock"
            )
        self.conv1 = nn.Conv2D(
            inplanes, planes, 3, padding=1, stride=stride, bias_attr=False
        )
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Project the shortcut when the shape changes
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class BottleneckBlock(nn.Layer):
    expansion = 4

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.0)) * groups
        # 1x1 reduce -> 3x3 (carries the stride) -> 1x1 expand
        self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
        self.bn1 = norm_layer(width)
        self.conv2 = nn.Conv2D(
            width,
            width,
            3,
            padding=dilation,
            stride=stride,
            groups=groups,
            dilation=dilation,
            bias_attr=False,
        )
        self.bn2 = norm_layer(width)
        self.conv3 = nn.Conv2D(
            width, planes * self.expansion, 1, bias_attr=False
        )
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # Project the shortcut when the shape changes
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class ResNet(nn.Layer):
    """ResNet model from
    "Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>
    """

    def __init__(
        self,
        block,
        depth=50,
        width=64,
        num_classes=2,
        with_pool=True,
        groups=1,
    ):
        super().__init__()
        layer_cfg = {
            50: [3, 4, 6, 3]
        }
        layers = layer_cfg[depth]
        self.groups = groups
        self.base_width = width
        self.num_classes = num_classes
        self.with_pool = with_pool
        self._norm_layer = nn.BatchNorm2D
        self.inplanes = 64
        self.dilation = 1
        # Stem: 3x3 conv and max-pool, both with stride 1, so the
        # input resolution is preserved up to layer1
        self.conv1 = nn.Conv2D(
            3,
            self.inplanes,
            kernel_size=3,
            stride=1,
            padding=1,
            bias_attr=False,
        )
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=1, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        if with_pool:
            self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        if num_classes > 0:
            self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(
                    self.inplanes,
                    planes * block.expansion,
                    1,
                    stride=stride,
                    bias_attr=False,
                ),
                norm_layer(planes * block.expansion),
            )
        layers = []
        # Only the first block of a stage changes stride and channel count
        layers.append(
            block(
                self.inplanes,
                planes,
                stride,
                downsample,
                self.groups,
                self.base_width,
                previous_dilation,
                norm_layer,
            )
        )
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    groups=self.groups,
                    base_width=self.base_width,
                    norm_layer=norm_layer,
                )
            )
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # Keep the output of every stage; the FPN consumes all four
        x1 = self.layer1(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        x = x4
        out = None  # stays None when the classification head is disabled
        if self.with_pool:
            x = self.avgpool(x)
        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            out = self.fc(x)
        return [x1, x2, x3, x4, out]

def _resnet(arch, num_classes, Block, depth, pretrained, **kwargs):
    model = ResNet(Block, depth, num_classes=num_classes, **kwargs)
    if pretrained:
        assert (
            arch in model_urls
        ), "{} model do not have a pretrained model now, you should set pretrained=False".format(
            arch
        )
        weight_path = get_weights_path_from_url(
            model_urls[arch][0], model_urls[arch][1]
        )
        param = paddle.load(weight_path)
        model.set_dict(param)
    return model


def resnet50(num_classes, pretrained=False, **kwargs):
    return _resnet('resnet50', num_classes, BottleneckBlock, 50, pretrained, **kwargs)


if __name__ == "__main__":
    model = resnet50(num_classes=2, pretrained=False)
    # print(paddle.summary(model, (1, 3, 224, 224)))
    x = paddle.rand([4, 3, 256, 256])
    out = model(x)
    print(out[0].shape)  # [b, 256, 256, 256]
    print(out[1].shape)  # [b, 512, 128, 128]
    print(out[2].shape)  # [b, 1024, 64, 64]
    print(out[3].shape)  # [b, 2048, 32, 32]
    print(out[4].shape)  # [b, 2]: (batch_size, num_classes=2). The FPN does not use this output; it serves tasks such as image classification.
W0524 15:13:14.657167 10362 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0524 15:13:14.661573 10362 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:712: UserWarning: When training, we now always track global mean and variance.
"When training, we now always track global mean and variance."
[4, 256, 256, 256]
[4, 512, 128, 128]
[4, 1024, 64, 64]
[4, 2048, 32, 32]
[4, 2]
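To see where the four shapes printed above come from, the strides can be traced by hand. The following is a minimal pure-Python sketch, not part of the original project; the helper names `conv_out` and `resnet50_stage_shapes` are made up for illustration, and the stride settings are read off the modified ResNet code in this article (stride-1 stem and max-pool, stage strides 1, 2, 2, 2).

```python
# Shape trace for the modified ResNet-50 in this article (no Paddle needed).
# Assumption: stem conv and max-pool both use stride 1, as in the code above.

def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_stage_shapes(batch, height, width):
    # Stem: 3x3 conv (stride 1, padding 1), then 3x3 max-pool (stride 1, padding 1)
    h = conv_out(conv_out(height, 3, 1, 1), 3, 1, 1)
    w = conv_out(conv_out(width, 3, 1, 1), 3, 1, 1)
    expansion = 4  # BottleneckBlock.expansion
    shapes = []
    for planes, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        # Only the first block of a stage is strided (3x3 conv, padding 1)
        h = conv_out(h, 3, stride, 1)
        w = conv_out(w, 3, stride, 1)
        shapes.append([batch, planes * expansion, h, w])
    return shapes


for shape in resnet50_stage_shapes(4, 256, 256):
    print(shape)
```

Because the stem here keeps the full resolution, the first stage outputs 256x256 maps. With the more common stride-2 stem convolution and stride-2 max-pool, the same input would instead give spatial sizes of 64, 32, 16 and 8.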
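2. Channel conversion: C -> 256
The channel count can be changed with Conv2D.
The most common configurations are:
Conv2D(in, out, kernel_size=1, stride=1, padding=0): channels become out, H and W unchanged
Conv2D(in, out, kernel_size=3, stride=1, padding=1): channels become out, H and W unchanged
Conv2D(in, out, kernel_size=3, stride=2, padding=1): channels become out, H and W halved
There are other combinations too; it is worth experimenting.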
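The three configurations above follow from the standard convolution output-size formula. Here is a small sketch that checks them without running any framework; `conv2d_out_size` is a hypothetical helper written for this illustration, not a Paddle function.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution along one axis (floor mode)."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# The three configurations listed above, applied to a 64-pixel axis
for k, s, p in [(1, 1, 0), (3, 1, 1), (3, 2, 1)]:
    print(f"kernel={k} stride={s} padding={p}: 64 -> {conv2d_out_size(64, k, s, p)}")
```

Note that "halved" is exact only for even sizes: with kernel 3, stride 2 and padding 1, an axis of 65 becomes 33.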
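3. Feature fusion
Feature fusion simply means taking the lower-resolution feature map from the level above in the diagram, upsampling it by a factor of two, and adding it to the feature map of the current level. Repeating this level by level completes the fusion.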
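The fusion rule can be tried on tiny single-channel maps before touching any tensors. The sketch below uses plain nested lists; the function names are illustrative only, and nearest-neighbour upsampling is assumed because that is the mode used in the FPN code of this article.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list (one channel)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shape mismatch"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(laterals):
    """laterals: maps ordered fine -> coarse, each half the size of the previous."""
    fused = [laterals[-1]]  # the coarsest level is passed through unchanged
    for lateral in reversed(laterals[:-1]):
        fused.insert(0, add_maps(lateral, upsample_nearest_2x(fused[0])))
    return fused


r1 = [[1] * 4 for _ in range(4)]  # 4x4, finest level
r2 = [[10, 20], [30, 40]]         # 2x2
r3 = [[100]]                      # 1x1, coarsest level
f1, f2, f3 = top_down_fuse([r1, r2, r3])
print(f3)  # [[100]]
print(f2)  # [[110, 120], [130, 140]]
print(f1)
```

Each fused map adds the already fused coarser map, not the raw one, so information from the coarsest level reaches every finer level.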
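Next, the modules need to be connected, stitching the three parts together like building blocks.
I have put the explanations in the code comments, so it should be easy to follow.
Every convolution here is, by convention, followed by BN for normalization and ReLU for activation, so I wrap the three into one block for later reuse.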
# Import the libraries we may need
import paddle
import paddle.nn as nn
from paddle.nn import functional as F


# Conv + BN + ReLU block
class ConvBnReLU(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super(ConvBnReLU, self).__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size, stride, padding, bias_attr=False)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


# FPN network
class FPN(nn.Layer):
    def __init__(self, backbone_out_channels=(256, 512, 1024, 2048)):
        super(FPN, self).__init__()
        self.backbone_out_channels = backbone_out_channels
        # 1x1 convolutions that map every backbone output to 256 channels
        # (kernel_size and padding are passed explicitly, because ConvBnReLU defaults to 3x3)
        self.conv1 = ConvBnReLU(self.backbone_out_channels[0], 256, kernel_size=1, padding=0)
        self.conv2 = ConvBnReLU(self.backbone_out_channels[1], 256, kernel_size=1, padding=0)
        self.conv3 = ConvBnReLU(self.backbone_out_channels[2], 256, kernel_size=1, padding=0)
        self.conv4 = ConvBnReLU(self.backbone_out_channels[3], 256, kernel_size=1, padding=0)
        # 2x upsampling used in the top-down pathway
        self.upsample = nn.Upsample(scale_factor=2.0, mode='nearest')
        # 3x3 convolution that smooths the fused feature maps (shared by all levels)
        self.smooth = nn.Conv2D(256, 256, kernel_size=3, stride=1, padding=1)

    def forward(self, inputs):
        # The fifth backbone output (classification logits) is not used
        input1, input2, input3, input4, _ = inputs
        # Lateral feature maps at each level
        # input1 R1 256  -> 256
        # input2 R2 512  -> 256
        # input3 R3 1024 -> 256
        # input4 R4 2048 -> 256
        r1 = self.conv1(input1)
        r2 = self.conv2(input2)
        r3 = self.conv3(input3)
        r4 = self.conv4(input4)
        # Top-down fusion across levels
        f4 = r4
        f3 = r3 + self.upsample(f4)  # 32 -> 64,   64 + 64   -> 64
        f2 = r2 + self.upsample(f3)  # 64 -> 128,  128 + 128 -> 128
        f1 = r1 + self.upsample(f2)  # 128 -> 256, 256 + 256 -> 256
        return self.smooth(f1), self.smooth(f2), self.smooth(f3), self.smooth(f4)
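ConvBnReLU defaults to a 3x3 kernel, whereas the lateral convolutions are described as 1x1, and the two choices differ considerably in size. The following rough count is a sketch, not part of the original project: it assumes bias-free convolutions followed by batch norm for the four lateral branches and one shared 3x3 smoothing convolution with bias, as in the FPN class above, and the helper names are hypothetical.

```python
def conv_bn_params(in_channels, out_channels, kernel_size):
    """Learnable parameters of a bias-free conv followed by batch norm."""
    conv = in_channels * out_channels * kernel_size * kernel_size
    bn = 2 * out_channels  # learnable scale and shift
    return conv + bn


def fpn_head_params(backbone_channels, lateral_kernel, out_channels=256):
    """Parameters of the four lateral branches plus the shared smoothing conv."""
    lateral = sum(
        conv_bn_params(c, out_channels, lateral_kernel) for c in backbone_channels
    )
    # One shared 3x3 smoothing conv with bias
    smooth = out_channels * out_channels * 3 * 3 + out_channels
    return lateral + smooth


channels = [256, 512, 1024, 2048]
print(fpn_head_params(channels, lateral_kernel=1))  # 1575168
print(fpn_head_params(channels, lateral_kernel=3))  # 9439488
```

With 1x1 lateral convolutions the head stays at roughly 1.6 million parameters, which fits the claim that FPN adds little on top of the backbone; 3x3 laterals would raise this to about 9.4 million.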
Verification
if __name__ == '__main__':
    resnet50_model = resnet50(2, pretrained=False)
    x = paddle.rand([2, 3, 256, 256])
    out = resnet50_model(x)
    fpn = FPN()
    y = fpn(out)
    # Print the shape of each fused level, finest first
    for i in y:
        print(i.shape)
[2, 256, 256, 256]
[2, 256, 128, 128]
[2, 256, 64, 64]
[2, 256, 32, 32]
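As you can see, the model output matches the diagram drawn earlier, so the reproduction is complete.
Project summary
This is an introductory, fundamentals-level tutorial on building a network. Suggestions for improvement are welcome, and feel free to fork it to learn from.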
This article is a repost.
Original project link
