A Step-by-Step Guide to Reproducing the Classic FPN Network [Beginner Friendly]


If you don't yet know how to build a network yourself, this project is a great way to lay the groundwork!

This article reproduces the classic FPN network. The Feature Pyramid Network (FPN), proposed in 2017, mainly addresses the multi-scale problem in object detection. By changing only how the network is connected, it greatly improves small-object detection performance with essentially no increase in the original model's computation.

Paper:

Feature Pyramid Networks for Object Detection (CVPR 2017)

In convolutional networks, deep layers tend to respond to semantic features, while shallow layers tend to respond to low-level image features. In object detection and image segmentation, this property causes quite a bit of trouble:

High-level feature maps respond well to semantic features, but because they are spatially small they carry little geometric information, which hurts object localization; shallow layers contain more geometric information but few semantic features, which hurts classification. This problem is especially pronounced in small-object detection.

To start, here is the network-evolution figure from the original paper, from which you can clearly see how the idea develops:

In short: from single scale, to multi-scale, to multi-scale fusion.

(If you are familiar with YOLOv5, you may notice this looks remarkably similar to YOLOv5.)

Now let's look at the detailed diagram:

Code Reproduction

Now that we know the model structure, we can reproduce it step by step.

In practice, FPN is rarely used as a standalone network; it is usually built on top of a ResNet backbone. Based on this, I drew the following diagram:

(Please do not copy the diagram directly.)

Looking closely at the network structure above, I split it into three major parts:

  1. Get the four outputs of ResNet
  2. Channel conversion C -> 256
  3. Feature fusion

So when we reproduce it in code, we naturally need these same three parts.

1. The four outputs of ResNet

ResNet is by now a very widely used network structure, so here I took its implementation directly from the Paddle source code and made a few simplifications and adaptations.

(The code is fairly long. If you don't want to read it all, jump straight to the last part, the test output, to see what it does.)

Simplifications:

Removed the miscellaneous network variants and the pretrained models we don't need.

Modifications:

Because we need four outputs, the forward pass keeps the outputs of all four stages.

import paddle
from paddle import nn
from paddle.utils.download import get_weights_path_from_url

__all__ = []

model_urls = {
    'resnet50': (
        'https://paddle-hapi.bj.bcebos.com/models/resnet50.pdparams',
        'ca6f485ee1ab0492d38f323885b0ad80',
    )
}


class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D

        if dilation > 1:
            raise NotImplementedError(
                "Dilation > 1 not supported in BasicBlock"
            )

        self.conv1 = nn.Conv2D(
            inplanes, planes, 3, padding=1, stride=stride, bias_attr=False
        )
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class BottleneckBlock(nn.Layer):

    expansion = 4

    def __init__(
        self,
        inplanes,
        planes,
        stride=1,
        downsample=None,
        groups=1,
        base_width=64,
        dilation=1,
        norm_layer=None,
    ):
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.0)) * groups

        self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
        self.bn1 = norm_layer(width)

        self.conv2 = nn.Conv2D(
            width,
            width,
            3,
            padding=dilation,
            stride=stride,
            groups=groups,
            dilation=dilation,
            bias_attr=False,
        )
        self.bn2 = norm_layer(width)

        self.conv3 = nn.Conv2D(
            width, planes * self.expansion, 1, bias_attr=False
        )
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Layer):
    """ResNet model from
    "Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>

    """

    def __init__(
        self,
        block,
        depth=50,
        width=64,
        num_classes=2,
        with_pool=True,
        groups=1,
    ):
        super().__init__()
        layer_cfg = {
            50: [3, 4, 6, 3]
        }
        layers = layer_cfg[depth]
        self.groups = groups
        self.base_width = width
        self.num_classes = num_classes
        self.with_pool = with_pool
        self._norm_layer = nn.BatchNorm2D

        self.inplanes = 64
        self.dilation = 1

        self.conv1 = nn.Conv2D(
            3,
            self.inplanes,
            kernel_size=3,
            stride=1,
            padding=1,
            bias_attr=False,
        )
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=1, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        if with_pool:
            self.avgpool = nn.AdaptiveAvgPool2D((1, 1))

        if num_classes > 0:
            self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(
                    self.inplanes,
                    planes * block.expansion,
                    1,
                    stride=stride,
                    bias_attr=False,
                ),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(
            block(
                self.inplanes,
                planes,
                stride,
                downsample,
                self.groups,
                self.base_width,
                previous_dilation,
                norm_layer,
            )
        )
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    groups=self.groups,
                    base_width=self.base_width,
                    norm_layer=norm_layer,
                )
            )

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # Keep the output of each of the four stages for the FPN
        x1 = self.layer1(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        x = x4

        if self.with_pool:
            x = self.avgpool(x)

        out = None
        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            out = self.fc(x)

        # x1..x4 feed the FPN; out is the classification output
        return [x1, x2, x3, x4, out]


def _resnet(arch, num_classes, Block, depth, pretrained, **kwargs):
    model = ResNet(Block, depth, num_classes=num_classes, **kwargs)
    if pretrained:
        assert (
            arch in model_urls
        ), "{} model do not have a pretrained model now, you should set pretrained=False".format(
            arch
        )
        weight_path = get_weights_path_from_url(
            model_urls[arch][0], model_urls[arch][1]
        )

        param = paddle.load(weight_path)
        model.set_dict(param)

    return model

def resnet50(num_classes, pretrained=False, **kwargs):
    return _resnet('resnet50', num_classes, BottleneckBlock, 50, pretrained, **kwargs)


if __name__ == "__main__":
    
    model = resnet50(num_classes=2, pretrained=False)
    # print(paddle.summary(model, (1, 3, 224, 224)))
    x = paddle.rand([4, 3, 256, 256])
    out = model(x)
    print(out[0].shape) # [b,256,256,256] 
    print(out[1].shape) # [b,512,128,128]
    print(out[2].shape) # [b,1024,64,64]
    print(out[3].shape) # [b,2048,32,32]
    print(out[4].shape) # [b, 2] = [batch_size, num_classes]; we don't use this output, it is for image classification etc.


[4, 256, 256, 256]
[4, 512, 128, 128]
[4, 1024, 64, 64]
[4, 2048, 32, 32]
[4, 2]

2. Channel conversion C -> 256

For channel conversion, we can use a Conv2D layer to change the number of channels.

The most classic configurations are:

Conv2D(in, out, kernel_size=1, stride=1, padding=0)  channels become out, H and W unchanged
Conv2D(in, out, kernel_size=3, stride=1, padding=1)  channels become out, H and W unchanged
Conv2D(in, out, kernel_size=3, stride=2, padding=1)  channels become out, H and W halved

and so on; try a few yourself.
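
As a quick sanity check, here is a minimal sketch (with a made-up 512-channel, 64x64 input tensor) that confirms the three configurations above behave as described:

import paddle
from paddle import nn

x = paddle.rand([1, 512, 64, 64])  # a feature map with 512 channels and 64x64 spatial size

# 1x1 conv, stride 1, padding 0: channels 512 -> 256, H and W unchanged
conv_1x1 = nn.Conv2D(512, 256, kernel_size=1, stride=1, padding=0)
print(conv_1x1(x).shape)    # [1, 256, 64, 64]

# 3x3 conv, stride 1, padding 1: channels 512 -> 256, H and W unchanged
conv_3x3 = nn.Conv2D(512, 256, kernel_size=3, stride=1, padding=1)
print(conv_3x3(x).shape)    # [1, 256, 64, 64]

# 3x3 conv, stride 2, padding 1: channels 512 -> 256, H and W halved
conv_3x3_s2 = nn.Conv2D(512, 256, kernel_size=3, stride=2, padding=1)
print(conv_3x3_s2(x).shape)  # [1, 256, 32, 32]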

3. Feature fusion

Feature fusion simply means taking the lower-resolution feature map from the level above, upsampling it by a factor of 2, and merging it with the current level's feature map; repeat this for each level and the fusion is done.
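
To make this concrete, here is a minimal sketch (using made-up 256-channel feature maps at 32x32 and 64x64) of one upsample-and-add step, the same operation the FPN forward pass below repeats level by level:

import paddle
from paddle import nn

upsample = nn.Upsample(scale_factor=2.0, mode='nearest')

r4 = paddle.rand([1, 256, 32, 32])  # coarser level (lower resolution)
r3 = paddle.rand([1, 256, 64, 64])  # current level (higher resolution)

# upsample the coarser map by 2x and add it element-wise to the current level
f3 = r3 + upsample(r4)
print(f3.shape)  # [1, 256, 64, 64]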

Next we connect the modules, stitching the three parts together like building blocks.

I've put the explanations directly into the code as comments; it should be easy to follow.

Since each convolution is conventionally followed by BN for normalization and ReLU for activation, we wrap this Conv + BN + ReLU pattern into a single block for later reuse.

# Import the libraries we may need
import paddle
import paddle.nn as nn
from paddle.nn import functional as F

# Build a Conv + BN + ReLU block:
class ConvBnReLU(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super(ConvBnReLU, self).__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size, stride, padding, bias_attr=False)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
# Build the FPN network
class FPN(nn.Layer):
    def __init__(self, backbone_out_channels=[256, 512, 1024, 2048]):
        super(FPN, self).__init__()
        self.backbone_out_channels = backbone_out_channels
        
        # Lateral convolutions that convert each backbone output to 256 channels
        # (the paper uses 1x1 convs; ConvBnReLU defaults to 3x3 with padding 1, which also keeps H and W unchanged)
        self.conv1 = ConvBnReLU(self.backbone_out_channels[0], 256)
        self.conv2 = ConvBnReLU(self.backbone_out_channels[1], 256)
        self.conv3 = ConvBnReLU(self.backbone_out_channels[2], 256)
        self.conv4 = ConvBnReLU(self.backbone_out_channels[3], 256)
        
        # Define the upsampling operation used in the top-down pathway
        self.upsample = nn.Upsample(scale_factor=2.0, mode='nearest')
        
        # A 3x3 convolution to smooth the fused feature maps
        self.smooth = nn.Conv2D(256, 256, kernel_size=3, stride=1, padding=1)
        
    def forward(self, inputs):
        
        input1, input2, input3, input4, _ = inputs
        
        # Convert each level's feature map to 256 channels
        # input1 R1  256  -> 256
        # input2 R2  512  -> 256
        # input3 R3  1024 -> 256
        # input4 R4  2048 -> 256
        r1 = self.conv1(input1)
        r2 = self.conv2(input2)
        r3 = self.conv3(input3)
        r4 = self.conv4(input4)
        
        # Fuse the feature maps across levels, top-down
        f4 = r4
        f3 = r3 + self.upsample(r4) # 32 -> 64 + 64 -> 64
        f2 = r2 + self.upsample(f3) # 64 -> 128 + 128 -> 128
        f1 = r1 + self.upsample(f2) # 128 -> 256 + 256 -> 256
        
        return self.smooth(f1), self.smooth(f2), self.smooth(f3), self.smooth(f4)

Verification

if __name__ == '__main__':
    
    resnet50_model = resnet50(2, pretrained=False)
    x = paddle.rand([2, 3, 256, 256])
    out = resnet50_model(x)
    fpn = FPN()
    y = fpn(out)
    for i in y:
        print(i.shape)
[2, 256, 256, 256]
[2, 256, 128, 128]
[2, 256, 64, 64]
[2, 256, 32, 32]

As you can see, the model outputs match the structure diagram drawn earlier exactly, and the reproduction is complete.

Project Summary

This is a fairly introductory, basics-oriented network-building tutorial project. Suggestions for improvement and forks for learning are welcome.

This article is a repost.
Original project link
