[Deep Learning | Study Notes] Fully Convolutional Network with Pyramid Pooling (FCN-PP): a detailed look at combining a fully convolutional network with pyramid pooling (with code)
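I. Origins
Following the success of deep convolutional neural networks (CNNs) in tasks such as semantic segmentation and object detection, FCNs (Fully Convolutional Networks) have been widely adopted for image segmentation. However, FCNs have the following shortcomings:
- Limited receptive field: the network sees only local context, which blurs boundaries and makes small objects hard to recognize;
- Missing semantic context: global information is not integrated well.
To address this, Zhao et al. proposed the Pyramid Pooling Module (PPM) in their 2017 paper "Pyramid Scene Parsing Network" (PSPNet) and embedded it in an FCN to capture multi-scale context and improve segmentation performance.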
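II. Principles
1. Fully Convolutional Network (FCN)
An FCN chains convolutional layers with upsampling layers to produce a semantic segmentation map the same size as the input image. Core ideas:
- Remove the fully connected layers and use a 1x1 convolution as the classifier instead;
- Add transposed convolutions (upsampling) to restore the spatial dimensions;
- Train end to end to obtain pixel-level predictions.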
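The three ideas above can be sketched in a few lines of PyTorch. This is a minimal toy network, not the architecture from any paper: the class name `TinyFCN`, the helper `fcn_output_shape`, and all layer widths and kernel sizes are assumptions chosen only so that the output matches the input size when height and width are multiples of 4.

```python
import torch
import torch.nn as nn


class TinyFCN(nn.Module):
    """Toy fully convolutional network; every layer size here is illustrative."""

    def __init__(self, num_classes: int = 21):
        super().__init__()
        # Encoder: two stride-2 convolutions shrink H and W by a factor of 4.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # A 1x1 convolution scores every spatial position; no fully connected layer.
        self.score = nn.Conv2d(32, num_classes, kernel_size=1)
        # Transposed convolution upsamples the score map by 4x:
        # out = (in - 1) * 4 - 2 * 2 + 8 = 4 * in
        self.upsample = nn.ConvTranspose2d(
            num_classes, num_classes, kernel_size=8, stride=4, padding=2
        )

    def forward(self, x):
        return self.upsample(self.score(self.encoder(x)))


def fcn_output_shape(height: int = 64, width: int = 64) -> tuple:
    """Run a random image through TinyFCN and return the output shape."""
    model = TinyFCN(num_classes=21).eval()
    with torch.no_grad():
        out = model(torch.randn(1, 3, height, width))
    return tuple(out.shape)
```

Because no layer depends on a fixed input size, the same weights accept images of different resolutions, which is what makes per-pixel prediction possible.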
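2. Pyramid Pooling Module (PPM)
The PPM captures context at different scales in the image. Its structure is as follows:
- Pool the feature map to several grid sizes (for example 1×1, 2×2, 3×3 and 6×6);
- Upsample each pooled result back to the original size and concatenate it with the original feature map;
- Apply a 1×1 convolution to merge everything into a fused feature.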
Overall structure:
                +------------------------+
                |  Feature Map (C×H×W)   |
                +-----------+------------+
                            |
      +-------------+-------+-----+-------------+
      |             |             |             |
  Pool(1x1)     Pool(2x2)     Pool(3x3)     Pool(6x6)
      ↓             ↓             ↓             ↓
  1x1 conv      1x1 conv      1x1 conv      1x1 conv
      ↓             ↓             ↓             ↓
  Upsample      Upsample      Upsample      Upsample
      ↓             ↓             ↓             ↓
+----------------------------------------------------+
|  Concatenate along channel axis (with input map)   |
+----------------------------------------------------+
                            ↓
                        1x1 conv
                            ↓
                    Final Feature Map
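To make the tensor shapes in this structure concrete, the sketch below traces one pass through the pooling branches. It is only an illustration under assumed sizes: the function name `ppm_shape_trace`, the 64-channel 30×40 input, and the choice to give each branch `C // 4` channels are assumptions, and the 1x1 convolutions are freshly initialized rather than trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ppm_shape_trace(channels=64, height=30, width=40, pool_sizes=(1, 2, 3, 6)):
    """Return the tensor shape at each stage of a pyramid pooling pass."""
    x = torch.randn(1, channels, height, width)
    branch_channels = channels // len(pool_sizes)  # assumed per-branch width
    shapes = {"input": tuple(x.shape)}
    branches = [x]  # the original feature map is part of the concatenation
    with torch.no_grad():
        for ps in pool_sizes:
            # Adaptive pooling yields a ps x ps grid regardless of H and W.
            pooled = nn.AdaptiveAvgPool2d(ps)(x)
            # 1x1 convolution reduces the channel count of the branch.
            reduced = nn.Conv2d(channels, branch_channels, kernel_size=1)(pooled)
            # Bilinear upsampling brings the branch back to H x W.
            upsampled = F.interpolate(
                reduced, size=(height, width), mode="bilinear", align_corners=False
            )
            shapes[f"pool{ps}"] = tuple(pooled.shape)
            shapes[f"up{ps}"] = tuple(upsampled.shape)
            branches.append(upsampled)
        fused = torch.cat(branches, dim=1)
    shapes["concat"] = tuple(fused.shape)
    return shapes
```

With these assumed sizes, the concatenated tensor has 64 + 4 × 16 = 128 channels, which is why the final 1x1 convolution is needed to bring the channel count back down.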
III. Development
IV. Directions for Improvement
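V. Applications
- Remote sensing image segmentation (e.g., land cover classification, landslide detection)
- Urban street scene segmentation (e.g., the Cityscapes dataset)
- Medical image analysis (e.g., tumor region identification)
- Visual perception for autonomous driving
- Scene understanding and 3D reconstruction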
VI. PyTorch Code Example: Pyramid Pooling Module
✅ PPM implementation plus an example backbone network (ResNet as the backbone)
import torch
import torch.nn as nn
import torch.nn.functional as F
# The weights enum requires torchvision >= 0.13; it replaces the deprecated pretrained=True.
from torchvision.models import resnet50, ResNet50_Weights


# Pyramid Pooling Module
class PyramidPoolingModule(nn.Module):
    def __init__(self, in_channels, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        # Each branch gets an equal share of the channels.
        branch_channels = in_channels // len(pool_sizes)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(output_size=ps),
                nn.Conv2d(in_channels, branch_channels, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            for ps in pool_sizes
        ])
        # Bottleneck input = original features + all pooled branches.
        # Computed explicitly so it stays correct when in_channels is not
        # divisible by len(pool_sizes).
        self.bottleneck = nn.Conv2d(
            in_channels + branch_channels * len(pool_sizes),
            in_channels,
            kernel_size=1,
        )

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        pyramids = [x]
        for stage in self.stages:
            out = stage(x)
            # Upsample each pooled branch back to the input resolution.
            out = F.interpolate(out, size=(h, w), mode='bilinear', align_corners=True)
            pyramids.append(out)
        output = torch.cat(pyramids, dim=1)
        output = self.bottleneck(output)
        return output


# FCN + PPM model
class FCN_PPM(nn.Module):
    def __init__(self, num_classes, pretrained=True):
        super().__init__()
        # pretrained=True downloads ImageNet weights on first use.
        weights = ResNet50_Weights.DEFAULT if pretrained else None
        backbone = resnet50(weights=weights)
        self.layer0 = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.layer1 = backbone.layer1
        self.layer2 = backbone.layer2
        self.layer3 = backbone.layer3
        self.layer4 = backbone.layer4  # small output feature map: 2048 x H/32 x W/32
        self.ppm = PyramidPoolingModule(in_channels=2048)
        self.classifier = nn.Sequential(
            nn.Conv2d(2048, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Conv2d(512, num_classes, kernel_size=1),
        )

    def forward(self, x):
        input_size = x.size()[2:]
        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.ppm(x)
        x = self.classifier(x)
        # Upsample the class scores back to the input resolution.
        x = F.interpolate(x, size=input_size, mode='bilinear', align_corners=True)
        return x


# Test the model structure
if __name__ == "__main__":
    model = FCN_PPM(num_classes=21)
    model.eval()  # disable dropout and use BatchNorm running statistics
    dummy_input = torch.randn(2, 3, 256, 256)
    with torch.no_grad():
        output = model(dummy_input)
    print("Output shape:", output.shape)  # Expect: [2, 21, 256, 256]
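Since the model returns per-pixel class scores of shape [N, num_classes, H, W], one training step and one prediction step might look like the sketch below. Nothing here comes from the original post: the function names, the SGD optimizer, and `ignore_index=255` (a common convention for unlabeled pixels in segmentation datasets) are assumptions, and a single 1x1 convolution stands in for the model so the example runs without downloading weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def segmentation_train_step(model, images, targets, optimizer, ignore_index=255):
    """One optimization step with per-pixel cross-entropy (illustrative)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)  # (N, num_classes, H, W)
    # targets holds one integer class id per pixel: shape (N, H, W), dtype long.
    loss = F.cross_entropy(logits, targets, ignore_index=ignore_index)
    loss.backward()
    optimizer.step()
    return loss.item()


def predict_labels(model, images):
    """Return the most likely class id for every pixel: shape (N, H, W)."""
    model.eval()
    with torch.no_grad():
        return model(images).argmax(dim=1)


def demo():
    torch.manual_seed(0)
    num_classes = 21
    # Stand-in for a segmentation model: any module mapping
    # (N, 3, H, W) -> (N, num_classes, H, W) can be used here.
    stand_in = nn.Conv2d(3, num_classes, kernel_size=1)
    optimizer = torch.optim.SGD(stand_in.parameters(), lr=0.01)
    images = torch.randn(2, 3, 32, 32)
    targets = torch.randint(0, num_classes, (2, 32, 32))
    loss = segmentation_train_step(stand_in, images, targets, optimizer)
    labels = predict_labels(stand_in, images)
    return loss, tuple(labels.shape)
```

Replacing `stand_in` with an instance of a full segmentation network should work the same way, as long as its output has one channel per class at the input resolution.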