Paper reading: ISDNet: Integrating Shallow and Deep Networks CVPR2022

Author: 我是家家

Last modified: 2022-06-26 09:57:35



First published: 2022-06-26 09:53:28

Article link: https://blog.csdn.net/yihaizhiyan/article/details/125448530


ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation CVPR2022

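Paper overview: Segmenting ultra-high-resolution images is very demanding in both computation and memory. Previous methods mostly follow a global-local refinement pipeline, which keeps memory consumption under control but neglects inference speed. This paper instead focuses on running inference directly on the whole image. It proposes a high-resolution image segmentation framework, ISDNet (Integrating Shallow and Deep Networks), which integrates a shallow network and a deep network, significantly speeding up inference while still producing accurate segmentation results. To better exploit the correlation between shallow and deep features, the paper also proposes a Relational-Aware feature Fusion module, which helps maintain the segmentation performance and robustness of the network.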

Figure: comparison of high-resolution image segmentation frameworks

Figure: architecture of the segmentation network proposed in the paper

Inputs and outputs of the deep and shallow networks:

Computation of the high-frequency residuals:
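As a rough illustration only, the sketch below assumes the high-frequency residuals are the levels of a Laplacian-style pyramid: each residual is the image at one scale minus the upsampled version of the next coarser scale. The function names, the 2x2 average-pooling downsample and the nearest-neighbour upsample are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def downsample2x(img):
    """Halve height and width by averaging each 2x2 block (assumed stand-in for blur + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(img):
    """Double height and width by nearest-neighbour repetition (assumed upsampling)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def high_frequency_residuals(img, levels=2):
    """Return [H_0, ..., H_{levels-1}] where H_i = g_i - upsample(g_{i+1}).

    img: 2-D grayscale array whose height and width are divisible by 2**levels.
    """
    h, w = img.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")
    residuals = []
    current = img.astype(np.float64)
    for _ in range(levels):
        coarser = downsample2x(current)
        # What is lost by going to the coarser scale is the high-frequency detail.
        residuals.append(current - upsample2x(coarser))
        current = coarser
    return residuals
```

With this definition a perfectly flat image yields all-zero residuals, while fine textures such as a checkerboard survive almost entirely in the first residual, which is the kind of spatial detail the shallow branch is meant to pick up.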

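The image obtained from the high-frequency residuals serves as the input to the shallow network, which learns the complementary spatial details. For the deep network, the original image is downsampled to a small-scale image and used as the input. The deep network branch has three losses: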

Auxiliary segmentation head: the standard cross-entropy loss

Super-resolution head

Structure distillation loss
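Of the three losses, only the auxiliary head is specified above (standard cross-entropy), so only that one is sketched here. This is a minimal NumPy version of pixel-wise cross-entropy; the function name and the (num_classes, H, W) layout are assumptions chosen for the example.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: float array of shape (num_classes, H, W)
    labels: int array of shape (H, W) with values in [0, num_classes)
    """
    # Subtract the per-pixel maximum so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = np.take_along_axis(log_probs, labels[None, ...], axis=0)[0]
    return float(-picked.mean())
```

When every class receives the same logit, the loss equals the logarithm of the number of classes, which is a handy sanity check at the start of training.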

Finally, the shallow network outputs feature maps at 1/8 and 1/16 scale, and the deep network outputs a feature map at 1/32 scale.

Feature fusion: (relation-aware feature fusion module for the shallow-network and deep-network features)

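Channel-wise attention is first applied to the deep and shallow feature maps; the relation matrix between the two is then obtained via an inner product, and finally the features are fused.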
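The three steps just described (channel attention, relation matrix, fusion) can be illustrated with the simplified sketch below. It is not the paper's exact module: the learned layers are omitted, the two feature maps are assumed to already share the same (C, H, W) shape, and the function name and the particular way the relation matrix is turned into channel weights are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_fusion(shallow, deep):
    """Fuse two feature maps of identical shape (C, H, W) into one (C, H, W) map."""
    if shallow.shape != deep.shape:
        raise ValueError("feature maps must share the same shape")
    # 1) Channel-wise attention: global average pooling gives one descriptor per channel.
    a_s = sigmoid(shallow.mean(axis=(1, 2)))  # shape (C,)
    a_d = sigmoid(deep.mean(axis=(1, 2)))     # shape (C,)
    # 2) Relation matrix: products between every shallow and every deep descriptor.
    relation = np.outer(a_s, a_d)             # shape (C, C)
    # 3) Assumed re-weighting: each channel's weight mixes the other branch's
    #    descriptors according to the normalised relations.
    w_s = softmax(relation, axis=1) @ a_d     # shape (C,)
    w_d = softmax(relation.T, axis=1) @ a_s   # shape (C,)
    return w_s[:, None, None] * shallow + w_d[:, None, None] * deep
```

In a real network the 1/32-scale deep features would first have to be upsampled to the shallow resolution, and the pooled descriptors would pass through learned layers before the relation matrix is formed.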

Experimental analysis:

(1) DeepGlobe. The DeepGlobe dataset contains 803 images with 2448 × 2448 resolution. It contains 7 classes of landscape regions, of which the class named "unknown" is not considered in the evaluation. We follow the protocol of [3], splitting all of the images into training, validation and test sets with 455, 207 and 142 images respectively.

(2) Inria Aerial. The Inria Aerial dataset provides 180 images with 5000 × 5000 resolution and dense annotations in the form of a binary mask for building and non-building areas. Following [3], we split the images into training, validation and test sets with 126, 27 and 27 images respectively.

(3) Cityscapes. The Cityscapes dataset is a popular generic dataset for semantic segmentation, which has 5,000 finely annotated images with 1024 × 2048 resolution. We follow the official data split for our experiments, which contains 2,975 images for training, 500 images for validation and the remaining 1,525 images for testing.