PPM: Pyramid Pooling Module - PSPNet

酿久诗

Last modified: 2022-04-11 10:29:42


First published: 2022-01-18 14:01:07

Article link: https://blog.csdn.net/qq_41731861/article/details/122557035


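This article introduces the Pyramid Pooling Module (PPM), proposed in 2017, which aggregates context from different regions to improve a network's ability to capture global information. It fuses feature maps pooled at several scales, balancing global semantics with local detail, and includes a PyTorch implementation.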


A Brief Look at the Principle

The Pyramid Pooling Module (PPM) was proposed in 2017. The paper details are:

Paper: "Pyramid Scene Parsing Network"
Authors: Hengshuang Zhao et al. (The Chinese University of Hong Kong & SenseTime)
Source: CVPR 2017

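PPM was proposed to aggregate contextual information from different regions and thereby improve the network's ability to capture global information. Concretely, the original feature map is pooled at several scales to produce feature maps of different sizes, and these feature maps (together with the original one) are concatenated along the channel dimension. The output is a composite feature map that mixes several scales, which lets the network balance global semantic information against local detail. The PSPNet architecture is as follows: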

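$(a)$ Input image;
$(b)$ Original feature map extracted by a CNN ($6 \times 6$);
$(c)$ PPM module: pool the original feature map at several scales to obtain feature maps of different sizes (4 in the figure). Upsample each of them back to the size of the original feature map ($6 \times 6$), then concatenate them along the channel dimension to obtain the final composite feature map;

Red: $6 \times 6$ pooling, output size $1 \times 1$, then upsampled to $6 \times 6$ by bilinear interpolation;
Orange: $3 \times 3$ pooling, output size $2 \times 2$, then upsampled to $6 \times 6$ by bilinear interpolation;
Blue: $2 \times 2$ pooling, output size $3 \times 3$, then upsampled to $6 \times 6$ by bilinear interpolation;
Green: $1 \times 1$ pooling, output size $6 \times 6$.

$(d)$ A final convolution layer performs scene parsing, i.e. pixel-level classification.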
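The relationship between pooling window and output size in the four branches above can be checked with a small pure-Python sketch. It assumes non-overlapping average pooling (stride equal to the window size) on a single-channel $6 \times 6$ map. The helper `avg_pool2d` and the toy values are hypothetical and written only for illustration; the PyTorch listing later in the post uses `nn.AdaptiveAvgPool2d` instead.

```python
def avg_pool2d(grid, kernel):
    """Non-overlapping average pooling (stride == kernel) on a 2D list."""
    height, width = len(grid), len(grid[0])
    assert height % kernel == 0 and width % kernel == 0, "kernel must divide the map size"
    pooled = []
    for top in range(0, height, kernel):
        row = []
        for left in range(0, width, kernel):
            window = [grid[r][c]
                      for r in range(top, top + kernel)
                      for c in range(left, left + kernel)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# A single-channel 6x6 feature map holding the values 0..35 (toy data).
feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# Pooling window -> pooled output size, one entry per branch (red, orange, blue, green).
output_sizes = {}
for kernel in (6, 3, 2, 1):
    pooled = avg_pool2d(feature_map, kernel)
    output_sizes[kernel] = (len(pooled), len(pooled[0]))

# The red branch collapses the whole map into one global average.
global_average = avg_pool2d(feature_map, 6)[0][0]

print(output_sizes)
print(global_average)
```

The $6 \times 6$ window yields a single global descriptor, while the $1 \times 1$ window leaves the map unchanged, which is why the green branch needs no upsampling.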

Code Implementation - PyTorch

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F


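class PPM(nn.Module):
    """Pyramid Pooling Module: pool at several bin sizes, upsample, and concatenate."""

    def __init__(self, in_dim, out_dim, bins):
        super().__init__()
        # One branch per bin size: adaptive average pooling to (bin_size x bin_size),
        # then a 1x1 convolution, batch norm, and ReLU to produce out_dim channels.
        self.features = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),
                nn.Conv2d(in_dim, out_dim, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_dim),
                nn.ReLU(inplace=True),
            )
            for bin_size in bins
        ])

    def forward(self, x):
        target_size = x.shape[2:]  # (H, W) of the input feature map
        out = [x]  # keep the original feature map in the concatenation
        for branch in self.features:
            pooled = branch(x)
            # Upsample each pooled map back to the input resolution.
            pooled = F.interpolate(pooled, size=target_size, mode="bilinear", align_corners=True)
            out.append(pooled)

        # Concatenate along the channel dimension: in_dim + out_dim * len(bins) channels.
        return torch.cat(out, 1)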


if __name__ == "__main__":
    # inputs: (B, C, H, W)
    inputs = torch.rand((8, 3, 16, 16))
    # PPM params: (in_dim, out_dim, bins)
    ppm = PPM(3, 2, [1, 2, 3, 6])
    # outputs: (B=8, C=3+2*4=11, H=16, W=16)
    outputs = ppm(inputs)
    print("Outputs shape:", outputs.size())
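The channel arithmetic in the comment above (3 + 2 * 4 = 11) generalizes to `in_dim + out_dim * len(bins)`, because the module concatenates the input with one `out_dim`-channel map per bin and leaves the batch and spatial dimensions unchanged. A minimal sketch of that bookkeeping follows. The helper `ppm_output_shape` is hypothetical and not part of the listing above, and the second configuration (2048 input channels, 512 per branch, a 60 x 60 map) is an assumed setting in the style of a ResNet-backed PSPNet, used here only as an illustration.

```python
def ppm_output_shape(input_shape, out_dim, bins):
    """Shape of the PPM output for an input of shape (B, C, H, W)."""
    batch, in_dim, height, width = input_shape
    # The original map contributes in_dim channels; each pyramid branch adds out_dim.
    return (batch, in_dim + out_dim * len(bins), height, width)


# The configuration used in the demo above.
demo_shape = ppm_output_shape((8, 3, 16, 16), out_dim=2, bins=[1, 2, 3, 6])

# An assumed, larger configuration for illustration.
large_shape = ppm_output_shape((1, 2048, 60, 60), out_dim=512, bins=[1, 2, 3, 6])

print(demo_shape)
print(large_shape)
```

Choosing `out_dim` as `in_dim` divided by the number of bins keeps the pooled branches, taken together, as wide as the original feature map, so neither side dominates the concatenation.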