[Object Detection] FPN (Feature Pyramid Network) Explained

Paper: Feature Pyramid Networks for Object Detection
Paper link: https://arxiv.org/pdf/1612.03144.pdf

Overview

This paper addresses a weakness of object detectors in handling scale variation. Many networks rely on a single high-level feature map (for example, Faster R-CNN performs classification and bounding-box regression on the Conv4 feature map, which has already been downsampled four times). The obvious drawback is that small objects carry few pixels to begin with, and that information is easily lost during downsampling. Low-level features carry little semantic information but localize objects accurately, while high-level features are semantically rich but localize objects only coarsely; shallow features therefore help in detecting small objects. The classic way to handle scale variation is an image pyramid, running the detector over multiple rescaled copies of the image, but this multiplies the computation. This paper therefore proposes the feature pyramid network, a structure that handles scale variation in object detection well while adding only a small amount of computation.
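To make the cost contrast concrete, below is a minimal sketch (my own illustration, not from the paper) of the classic featurized-image-pyramid baseline; `detector` is a hypothetical placeholder for any single-scale detector.

```python
import torch.nn.functional as F

def image_pyramid_inference(detector, image, scales=(0.5, 1.0, 1.5, 2.0)):
    """Classic multi-scale baseline: run the detector once per rescaled copy.

    Cost grows roughly linearly with the number of scales, because every
    level needs its own full forward pass -- the overhead FPN avoids by
    reusing the backbone's built-in multi-scale feature hierarchy.
    """
    results = []
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode='bilinear',
                               align_corners=False)
        detections = detector(scaled)      # one full forward pass per scale
        results.append((s, detections))    # boxes still need mapping back to the original scale
    return results
```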

1. Introduction

Figure 1 of the paper contrasts four different ways of obtaining multi-scale features from a single image:

- (a) Featurized image pyramid: rescale the image to several sizes and compute features for each scale independently; accurate but computationally expensive.
- (b) Single feature map: use only one high-level feature map of the backbone, as in the original Faster R-CNN; fast but weak on small objects.
- (c) Pyramidal feature hierarchy: predict from each backbone stage independently, as in SSD, without combining features across levels (see the sketch below).
- (d) Feature Pyramid Network: fuse the low-resolution, semantically strong features with the high-resolution, semantically weak features through a top-down pathway and lateral connections.
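As a point of comparison with the FPN code in the next section, here is a rough sketch (my own illustration, not from the paper) of approach (c): each backbone stage gets its own prediction head, and no information flows back down from the deeper stages.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PyramidalHierarchyHead(nn.Module):
    """Approach (c): predict directly from each backbone stage.

    Unlike FPN, there is no top-down pathway, so the shallow,
    high-resolution maps keep their weak semantics (SSD-style design).
    """
    def __init__(self, num_outputs=256):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                                  resnet.maxpool, resnet.layer1)
        self.layer2, self.layer3, self.layer4 = resnet.layer2, resnet.layer3, resnet.layer4
        # one independent prediction head per stage (channel counts differ per stage)
        self.heads = nn.ModuleList([
            nn.Conv2d(c, num_outputs, kernel_size=3, padding=1)
            for c in (512, 1024, 2048)
        ])

    def forward(self, x):
        c2 = self.layer2(self.stem(x))
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)
        # each level is predicted from independently; no cross-level fusion
        return [head(c) for head, c in zip(self.heads, (c2, c3, c4))]
```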

### FPN (Feature Pyramid Network) Example Implementation

Below is a simple PyTorch implementation of a Feature Pyramid Network (FPN). It shows how to build the multi-scale feature pyramid and produce feature maps suitable for object detection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FPN(nn.Module):
    def __init__(self, out_channels=256):
        super(FPN, self).__init__()
        # Use ResNet-50 as the backbone for the bottom-up pathway
        resnet = models.resnet50(pretrained=True)
        self.layer1 = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                                    resnet.maxpool, resnet.layer1)
        self.layer2 = resnet.layer2
        self.layer3 = resnet.layer3
        self.layer4 = resnet.layer4

        # Lateral connections: 1x1 convs that bring each stage to the same channel count
        self.latlayer1 = nn.Conv2d(2048, out_channels, kernel_size=1, stride=1, padding=0)
        self.latlayer2 = nn.Conv2d(1024, out_channels, kernel_size=1, stride=1, padding=0)
        self.latlayer3 = nn.Conv2d(512, out_channels, kernel_size=1, stride=1, padding=0)

        # 3x3 convs that smooth the merged maps (reduce upsampling aliasing)
        self.toplayer = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth1 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.smooth2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

    def _upsample_add(self, x, y):
        """Upsample `x` to the spatial size of `y`, then add them element-wise."""
        _, _, H, W = y.size()
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False) + y

    def forward(self, x):
        # Bottom-up pathway
        c1 = self.layer1(x)   # [B, 256, H/4, W/4]
        c2 = self.layer2(c1)  # [B, 512, H/8, W/8]
        c3 = self.layer3(c2)  # [B, 1024, H/16, W/16]
        c4 = self.layer4(c3)  # [B, 2048, H/32, W/32]

        # Top-down pathway with lateral connections
        p4 = self.latlayer1(c4)                          # coarsest pyramid level
        p3 = self._upsample_add(p4, self.latlayer2(c3))  # upsample P4, add lateral C3
        p2 = self._upsample_add(p3, self.latlayer3(c2))  # upsample P3, add lateral C2

        # 3x3 smoothing on every output map
        p4 = self.toplayer(p4)
        p3 = self.smooth1(p3)
        p2 = self.smooth2(p2)
        return p2, p3, p4   # multi-scale feature maps, fine to coarse


# Example usage:
if __name__ == "__main__":
    model = FPN(out_channels=256).cuda()
    input_tensor = torch.randn((1, 3, 224, 224)).cuda()   # dummy batched image
    features = model(input_tensor)
    # Each map has 256 channels; spatial sizes are the input size divided by 8, 16 and 32.
    print([f.shape for f in features])
```

The code above implements a basic FPN, using ResNet-50 as the backbone to extract multi-level feature maps. Through lateral connections and a top-down pathway, the architecture produces several feature maps with the same number of channels but different resolutions, which supports multi-scale object detection.

#### Key points

- **Backbone**: a pretrained ResNet-50 serves as the trunk network and extracts the base features from the input image.
- **Lateral connections**: 1x1 convolutions reduce each high-level feature map to a common channel count so the maps can be fused.
- **Top-down pathway**: the low-resolution, semantically strong features are progressively upsampled and added to the higher-resolution, semantically weaker maps to form the final outputs.
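As a follow-up usage note: when these pyramid levels feed RoI pooling (as in the paper's Fast/Faster R-CNN experiments), each region proposal is assigned to a level by Eq. (1) of the paper, k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4. The sketch below implements that formula; the clamping range is an assumption matched to the three levels (P2–P4) produced by the example above, whereas the paper uses P2–P5.

```python
import torch

def assign_fpn_level(boxes, k_min=2, k_max=4, k0=4, canonical_size=224):
    """Map each RoI to a pyramid level using Eq. (1) of the FPN paper:
        k = floor(k0 + log2(sqrt(w * h) / canonical_size))
    boxes: tensor of shape [N, 4] in (x1, y1, x2, y2) image coordinates.
    k_min/k_max clamp to the levels the network actually produces
    (P2..P4 for the example FPN above; the paper uses P2..P5).
    """
    w = (boxes[:, 2] - boxes[:, 0]).clamp(min=1)
    h = (boxes[:, 3] - boxes[:, 1]).clamp(min=1)
    k = torch.floor(k0 + torch.log2(torch.sqrt(w * h) / canonical_size))
    return k.clamp(min=k_min, max=k_max).long()

# Example: a 112x112 RoI lands on P3, a 448x448 RoI is clamped to P4, a tiny RoI goes to P2.
rois = torch.tensor([[0., 0., 112., 112.],
                     [0., 0., 448., 448.],
                     [0., 0., 32., 32.]])
print(assign_fpn_level(rois))  # tensor([3, 4, 2])
```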