[EfficientFCN] EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

bryant_meng

Published 2022-08-31 20:47:49


Article link: https://blog.csdn.net/bryant_meng/article/details/113501198


ECCV-2020

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own) / Future work

1 Background and Motivation


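FCN was the first to demonstrate the success of fully convolutional networks on semantic segmentation, but its aggressive "leap-style" upsampling (e.g. ×32 in a single step in FCN-32s) loses a great deal of detailed spatial information.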

As CNN techniques developed, two families derived from FCN, DilatedFCN-based methods and Encoder-Decoder-based methods, were proposed to address FCN's shortcomings.

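1) DilatedFCN-based methods

These use dilated convolution to keep the feature maps at high resolution, which greatly alleviates the loss of detailed spatial information caused by aggressive upsampling. However, such methods require high computational complexity and memory consumption.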
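To put a number on the cost argument: the multiply-accumulate (MAC) count of a convolution grows linearly with the number of output positions, so keeping the last feature maps at 1/8 instead of 1/32 of the input resolution makes every such layer $(32/8)^2 = 16$ times more expensive. A minimal sketch; the input size and channel widths below are hypothetical, chosen only for illustration:

```python
# Rough multiply-accumulate (MAC) count of one k x k convolution layer.
# Input size and channel widths are hypothetical, for illustration only.


def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """MACs of a stride-1 'same' convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k


img_h = img_w = 512       # hypothetical input resolution
c_in, c_out = 2048, 512   # hypothetical channel widths of one head layer

# Output stride 32 (plain backbone / encoder bottleneck) vs output stride 8 (DilatedFCN).
macs_os32 = conv_macs(img_h // 32, img_w // 32, c_in, c_out)
macs_os8 = conv_macs(img_h // 8, img_w // 8, c_in, c_out)

print(f"OS32: {macs_os32 / 1e9:.2f} GMACs")
print(f"OS8 : {macs_os8 / 1e9:.2f} GMACs")
print(f"ratio: {macs_os8 // macs_os32}x")
```

Real DilatedFCN heads differ in depth and width, so this only illustrates how the cost scales with resolution, not any reported FLOP count.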

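2) Encoder-Decoder-based methods (represented by U-Net)

These use interpolation or transposed convolution to gradually recover the feature-map resolution, which also effectively alleviates the loss of detailed spatial information. However, the authors argue that "lower-level high-resolution feature maps cannot provide abstractive enough features for achieving high performance segmentation" (in other words, the U-Net style of decoding still falls short).

Moreover, both interpolation and transposed convolution operate on local regions and have no global view, so the recovered high-resolution feature maps are inevitably "short-sighted".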
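The locality of interpolation can be checked directly: in bilinear interpolation each output pixel is a weighted average of at most 2×2 input pixels, so changing one corner of a coarse map cannot affect the opposite corner of the upsampled map. A minimal NumPy sketch; the `resize_bilinear` helper is our own, written to mimic the common half-pixel-center (`align_corners=False`) convention, and the map sizes are arbitrary:

```python
import numpy as np


def resize_bilinear(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array using half-pixel centers."""
    c, h, w = x.shape
    # Map each output coordinate back to a (fractional) input coordinate.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # Blend the 2x2 neighbours: first along x, then along y.
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


rng = np.random.default_rng(0)
low = rng.standard_normal((1, 8, 8))   # a coarse feature map
high = resize_bilinear(low, 32, 32)    # 4x upsampling

# Perturb a single coarse pixel in the bottom-right corner ...
low_perturbed = low.copy()
low_perturbed[0, 7, 7] += 10.0
high_perturbed = resize_bilinear(low_perturbed, 32, 32)

# ... and find which high-resolution outputs actually changed.
changed = np.abs(high_perturbed - high)[0] > 1e-12
print("changed outputs:", int(changed.sum()), "of", changed.size)
print("top-left output changed?", bool(changed[0, 0]))
```

Only a 6×6 patch (36 of 1024 outputs) next to the perturbed pixel changes; no output "sees" the rest of the image, which is the limitation the holistically-guided decoder is meant to address.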

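DilatedFCN is accurate and Encoder-Decoder is fast. Combining the strengths of both, the authors propose EfficientFCN, which redesigns the decoder so that images can be segmented both quickly and accurately.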

2 Related Work

DilatedFCN
DeepLab V2 / PSPNet / EncNet / CFNet / Gated-SCNN / DANet / ACNet / DMNet
Encoder-Decoder
UNet / DUsampling / FastFCN

3 Advantages / Contributions

Building on UNet, EfficientFCN redesigns the decoder with a holistically-guided decoder module. It achieves competitive (or better) results on three datasets, PASCAL Context, PASCAL VOC, and ADE20K, with only 1/3 of the computational cost (FLOPs).

4 Method

Holistically-guided Decoder for Semantic-rich Feature Upsampling

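Three core components:

Multi-scale features fusion
Holistic codebook generation
further extracts bag-of-words (codeword) features from the output of the previous step
Codeword assembly for high-resolution feature upsampling
combines the bag-of-words features according to per-position weights to achieve upsampling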

4.1 Multi-scale features fusion


The backbone features are $f_8$, $f_{16}$, $f_{32}$ (at 1/8, 1/16, and 1/32 of the input resolution).

1×1 convolutions map them to $e_8$, $e_{16}$, $e_{32}$, each with 512 channels.
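A 1×1 convolution is a per-pixel linear map over channels, so projecting each backbone feature to 512 channels amounts to one matrix multiply at every spatial position. A minimal NumPy sketch, assuming ResNet-style backbone widths (512/1024/2048 for $f_8$/$f_{16}$/$f_{32}$) and a hypothetical 128×128 input; the weights are random stand-ins, and bias, normalization, and activation are omitted:

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, h * w)).reshape(weight.shape[0], h, w)


rng = np.random.default_rng(0)
H = W = 128  # hypothetical input size

# Hypothetical backbone outputs f_8, f_16, f_32 (ResNet-style widths assumed).
feats = {
    8: rng.standard_normal((512, H // 8, W // 8)),
    16: rng.standard_normal((1024, H // 16, W // 16)),
    32: rng.standard_normal((2048, H // 32, W // 32)),
}

# Project every scale to 512 channels with its own (random) 1x1 conv -> e_8, e_16, e_32.
proj = {
    s: conv1x1(f, rng.standard_normal((512, f.shape[0])) / np.sqrt(f.shape[0]))
    for s, f in feats.items()
}

for s, e in proj.items():
    print(f"e_{s}: {e.shape}")
```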

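The features are fused by bilinear interpolation and concatenation, merging $e_8$, $e_{16}$, $e_{32}$ into $m_8$ and $m_{32}$:

$m_8 = [e_8; e_{16}^{\uparrow}; e_{32}^{\uparrow}], \quad m_{32} = [e_8^{\downarrow}; e_{16}^{\downarrow}; e_{32}]$

The ↑ and ↓ arrows denote bilinear interpolation used for upsampling and downsampling, respectively (ha, I'm used to bilinear interpolation for upsampling; this is the first time I've seen it used for downsampling)
[;;] denotes concatenation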
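The fusion step can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: $e_8$, $e_{16}$, $e_{32}$ are random 512-channel stand-ins for a hypothetical 128×128 input, `resize_bilinear` is our own half-pixel-center helper (no anti-aliasing when downsampling), and the concatenation order is our assumption:

```python
import numpy as np


def resize_bilinear(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array using half-pixel centers."""
    c, h, w = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


rng = np.random.default_rng(0)
H = W = 128  # hypothetical input size
# Random stand-ins for the 512-channel projected features.
e8 = rng.standard_normal((512, H // 8, W // 8))
e16 = rng.standard_normal((512, H // 16, W // 16))
e32 = rng.standard_normal((512, H // 32, W // 32))

h8, w8 = e8.shape[1:]
h32, w32 = e32.shape[1:]

# m_8: upsample e_16 and e_32 to 1/8 resolution, then concatenate along channels.
m8 = np.concatenate(
    [e8, resize_bilinear(e16, h8, w8), resize_bilinear(e32, h8, w8)], axis=0)
# m_32: downsample e_8 and e_16 to 1/32 resolution, then concatenate along channels.
m32 = np.concatenate(
    [resize_bilinear(e8, h32, w32), resize_bilinear(e16, h32, w32), e32], axis=0)

print("m_8 :", m8.shape)
print("m_32:", m32.shape)
```

Both fused maps carry 3 × 512 = 1536 channels; they differ only in spatial resolution.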

Experiments show that fusing features from multiple scales gives somewhat better results.

4.2 Holistic codebook generation

$m_{32}$ has lost too much detailed information, and the authors argue that a conventional U-Net decoder cannot make up for it.

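The spatial weighting maps are normalized with a softmax over the spatial dimensions:

$\tilde{A}_i(x, y) = \frac{\exp(A_i(x, y))}{\sum_{p, q} \exp(A_i(p, q))}$

where

$A \in \mathbb{R}^{n \times (H/32) \times (W/32)}$: the spatial weighting maps
$A_i \in \mathbb{R}^{(H/32) \times (W/32)}$: the $i$-th weighting map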
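The spatial normalization can be written as a softmax over the flattened $(x, y)$ positions of each map. A minimal NumPy sketch; the number of codewords $n = 256$ and the 4×4 map size are hypothetical, and `A` is random rather than predicted by a network:

```python
import numpy as np


def spatial_softmax(a):
    """Softmax over the spatial positions of each map in an (n, h, w) array."""
    n, h, w = a.shape
    flat = a.reshape(n, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(flat)
    return (e / e.sum(axis=1, keepdims=True)).reshape(n, h, w)


rng = np.random.default_rng(0)
n = 256                               # hypothetical number of codewords
A = rng.standard_normal((n, 4, 4))    # random stand-in for the weighting maps at 1/32 resolution
A_tilde = spatial_softmax(A)

# Each normalized map is a probability distribution over the (x, y) positions.
print(A_tilde.sum(axis=(1, 2))[:4])
```

Normalizing over space (rather than over the $n$ maps) is what lets each $\tilde{A}_i$ act as an attention-style pooling mask over the whole image.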

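Each holistic codeword is then a weighted sum of the codeword base map over all spatial positions, using the normalized weighting map $\tilde{A}_i$ (the spatially softmax-normalized $A_i$) as weights:

$c_i = \sum_{x, y} \tilde{A}_i(x, y) \, B(x, y)$

where

$B \in \mathbb{R}^{1024 \times (H/32) \times (W/32)}$: the codeword base map, with $B(x, y) \in \mathbb{R}^{1024}$ its feature vector at position $(x, y)$
$c_i \in \mathbb{R}^{1024}$: the $i$-th codeword
$[c_1, \cdots, c_n] \in \mathbb{R}^{n \times 1024}$: the holistic codewords, which encode high-level global features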
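Because every codeword is a weighted sum over positions, all $n$ codewords can be computed with a single matrix product. A minimal NumPy sketch, assuming (our assumption, not stated above) that both $A$ and $B$ are predicted from $m_{32}$ by separate 1×1 convolutions; $m_{32}$ and all weights are random stand-ins, and $n = 256$ is hypothetical:

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, h * w)).reshape(weight.shape[0], h, w)


def spatial_softmax(a):
    """Softmax over the spatial positions of each map in an (n, h, w) array."""
    n, h, w = a.shape
    flat = a.reshape(n, h * w)
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(n, h, w)


rng = np.random.default_rng(0)
n, d = 256, 1024                          # hypothetical codeword count; codeword length
m32 = rng.standard_normal((1536, 4, 4))   # random stand-in for the fused 1/32 feature

# Assumption: A and B come from m_32 via two separate (random) 1x1 convolutions.
B = conv1x1(m32, rng.standard_normal((d, 1536)) / np.sqrt(1536))   # (1024, h, w)
A = conv1x1(m32, rng.standard_normal((n, 1536)) / np.sqrt(1536))   # (n, h, w)
A_tilde = spatial_softmax(A)

# c_i = sum_{x,y} A_tilde_i(x, y) * B(x, y), for all i at once.
h, w = B.shape[1:]
codewords = A_tilde.reshape(n, h * w) @ B.reshape(d, h * w).T      # (n, 1024)

print(codewords.shape)
```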

4.3 Codeword assembly for high-resolution feature upsampling


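This step recovers the structural (spatial) information that the holistic codewords, being pooled over all positions, do not carry.

the raw codeword assembly guidance feature map, of size $1024 \times (H/8) \times (W/8)$
$\bar{B} \in \mathbb{R}^{1024}$: the global average vector of the codeword base map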
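The assembly step can be sketched as follows, under loudly stated assumptions: the guidance map (called `G` here, a name we introduce) comes from $m_8$ via a 1×1 convolution; the per-position assembly weights (`coeff`, also our name) are predicted by a 1×1 convolution over `G` concatenated with $\bar{B}$ tiled to every position; and each 1/8-resolution position linearly combines the $n$ holistic codewords with its own weights. All inputs and weights are random stand-ins, $n = 256$ is hypothetical, and any normalization or later fusion of the result is omitted:

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, h * w)).reshape(weight.shape[0], h, w)


rng = np.random.default_rng(0)
n, d = 256, 1024                          # hypothetical codeword count; codeword length
m8 = rng.standard_normal((1536, 16, 16))  # random stand-in for the fused 1/8 feature
B = rng.standard_normal((d, 4, 4))        # random stand-in for the codeword base map
codewords = rng.standard_normal((n, d))   # random stand-in for [c_1, ..., c_n]

# Hypothetical raw guidance map G: a 1x1 conv from m_8 to 1024 channels.
G = conv1x1(m8, rng.standard_normal((d, 1536)) / np.sqrt(1536))    # (1024, H/8, W/8)

# Global average of the codeword base map, tiled to every 1/8 position.
B_bar = B.mean(axis=(1, 2))                                         # (1024,)
h8, w8 = G.shape[1:]
B_bar_map = np.broadcast_to(B_bar[:, None, None], (d, h8, w8))

# Hypothetical assembly weights: 1x1 conv over [G; B_bar] -> n weights per position.
guide = np.concatenate([G, B_bar_map], axis=0)                      # (2048, H/8, W/8)
coeff = conv1x1(guide, rng.standard_normal((n, 2 * d)) / np.sqrt(2 * d))  # (n, H/8, W/8)

# Every 1/8 position mixes the n global codewords with its own weights.
upsampled = np.einsum("nhw,nd->dhw", coeff, codewords)              # (1024, H/8, W/8)

print(upsampled.shape)
```

The design point the sketch tries to show: spatial detail enters only through the position-dependent weights, while the content being assembled is global, which is how the decoder avoids the purely local view of interpolation.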