ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation (CVPR 2022)
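Paper overview: High-resolution image segmentation places heavy demands on both computation and memory. Existing methods mainly follow a global-local refinement route, which keeps memory consumption under control but neglects inference speed. This paper instead focuses on direct inference over the whole image. It proposes a high-resolution image segmentation framework (Integrating Shallow and Deep Networks, ISDNet) that integrates a shallow and a deep network, significantly improving inference speed while still producing accurate segmentation results. To better exploit the correlation between shallow and deep features, it also proposes a Relational-Aware feature Fusion module, which is intended to secure the segmentation performance and robustness of the network.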
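Figure: comparison of high-resolution image segmentation frameworks
Figure: the segmentation network framework proposed in this paper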
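Inputs and outputs of the deep and shallow networks:
The high-frequency residuals are computed as follows:
The image obtained after high-frequency residual processing is used as the input to the shallow network, so that it learns the complementary spatial details. For the deep network, the original image is downsampled to a small-scale image and used as the input. The deep network branch has three losses, namely: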
Auxiliary segmentation head: the standard cross-entropy function
Super-resolution head
Structure distillation loss
Finally, the shallow network outputs feature maps at 1/8 and 1/16 scale, and the deep network outputs a feature map at 1/32 scale.
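The input preparation described above might be sketched as follows. The paper's exact residual formula is not reproduced in these notes, so this sketch assumes a Laplacian-pyramid-style residual (the image minus its downsampled-then-upsampled copy), with average pooling and nearest-neighbour upsampling chosen purely for illustration. The function names and the factors 2 and 4 are hypothetical.

```python
import numpy as np

def downsample(img, factor=2):
    """Average-pool an (H, W) image by `factor`; H and W must be divisible by it."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor=2):
    """Nearest-neighbour upsampling of an (H, W) image by `factor`."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def high_frequency_residual(img, factor=2):
    """Assumed residual: the image minus its low-frequency (blurred) copy."""
    low = upsample(downsample(img, factor), factor)
    return img - low

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # stand-in for a full-resolution image

residual = high_frequency_residual(image)  # would feed the shallow network
deep_input = downsample(image, factor=4)   # small-scale image for the deep network
```

Under these assumptions the residual keeps the full resolution but carries only the detail that the low-resolution copy cannot represent, which matches the role the notes give to the shallow branch.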
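Feature fusion: (the relation-aware feature fusion module for shallow-network and deep-network features)
First, channel-wise attention is applied to the deep and shallow feature maps; then the relation matrix between the two is computed via an inner product; finally, the features are fused.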
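The three steps above could be sketched roughly as below. This is one possible reading rather than the paper's module: it assumes both feature maps have already been brought to the same (C, H, W) shape, uses a parameter-free gate from global average pooling in place of a learned channel attention, and fuses by adding the relation-weighted deep features to the shallow ones. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    """Reweight the channels of a (C, H, W) map with a gate from global average pooling."""
    gate = sigmoid(feat.mean(axis=(1, 2)))   # shape (C,); no learned weights in this sketch
    return feat * gate[:, None, None]

def relation_fusion(shallow, deep):
    """Assumed fusion: channel attention, inner-product relation matrix, weighted sum."""
    c, h, w = shallow.shape
    s = channel_attention(shallow).reshape(c, h * w)
    d = channel_attention(deep).reshape(c, h * w)
    # Entry (i, j) is the inner product of shallow channel i with deep channel j.
    relation = softmax(s @ d.T / np.sqrt(h * w), axis=1)  # shape (C, C)
    fused = s + relation @ d                 # add relation-weighted deep features
    return fused.reshape(c, h, w), relation

rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 6, 6))
deep = rng.standard_normal((4, 6, 6))
fused, relation = relation_fusion(shallow, deep)
```

The row-wise softmax is a design choice of this sketch: it turns each row of the relation matrix into weights that sum to one, so every shallow channel receives a convex combination of deep channels.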
Experimental analysis:
(1) DeepGlobe. The DeepGlobe dataset contains 803 images with 2448 × 2448 resolution. It contains 7 classes of landscape regions, of which the class named "unknown" is not considered in the evaluation. We follow the protocol of [3], splitting the images into training, validation and test sets with 455, 207 and 142 images respectively.
(2) Inria Aerial. The Inria Aerial dataset provides 180 images with 5000 × 5000 resolution and dense annotations in the form of a binary mask for building and non-building areas. Following [3], we split the images into training, validation and test sets with 126, 27 and 27 images respectively.
(3) Cityscapes. The Cityscapes dataset is a popular generic dataset for semantic segmentation, which has 5,000 finely annotated images with 1024 × 2048 resolution. We follow the official data split for our experiments, which contains 2,975 images for training, 500 images for validation and the remaining 1,525 images for testing.