[Semantic Segmentation] BiSeNet -- Bilateral Segmentation Network for Real-time Semantic Segmentation

1273545169

Article link: https://blog.csdn.net/baidu_27643275/article/details/90300882

Official unofficial

similar concept

high level features -> semantic information -> context information -> receptive field -> classification ability

high level semantic context information to get sufficient receptive field

low level -> spatial information -> detail information -> resolution information

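Abstract

The task of semantic segmentation is to assign a semantic label to each pixel, which requires both rich spatial information and a sizeable receptive field. To reach real-time inference speed, however, current methods usually reduce the spatial resolution, which degrades accuracy. BiSeNet first designs a Spatial Path that preserves spatial information and produces high-resolution feature maps. In parallel, a Context Path uses a fast downsampling strategy to obtain a sufficient receptive field. Finally, a Feature Fusion Module fuses the outputs of the two paths. BiSeNet balances speed and segmentation accuracy: on the Cityscapes test set it reaches 68.4% mIoU at 105 FPS.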

Common ways to accelerate existing models:

Restrict the input size by cropping or resizing to reduce computational complexity, at the cost of losing detail around image boundaries and in small objects.
Prune the channels of the network to boost inference speed.
Drop the last stage of the model (e.g. ENet), which leaves the receptive field too small.

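Spatial information:

U-shape: a U-shape structure (U-Net, RefineNet) can compensate for the loss of spatial details described above. However, adding a U-shape usually slows the model down, and most of the spatial information has already been lost at the pruning or cropping stage, so bringing in shallow features cannot easily recover it.
Dilated convolution: PSPNet, DeepLab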
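As a rough illustration of why dilated convolution appears in this list, the sketch below computes the receptive field of a stack of convolutions with and without dilation. The helper names `effective_kernel` and `receptive_field` and the layer configurations are illustrative assumptions, not taken from PSPNet or DeepLab.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1


def receptive_field(layers):
    """Receptive field of a conv stack.

    layers: list of (kernel, stride, dilation) tuples, first layer first.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


# Three 3x3 convolutions, stride 1 (hypothetical configuration).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
# Same stack with dilation rates 1, 2, 4 (hypothetical configuration).
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Both stacks keep the stride at 1, so the feature map resolution is unchanged while the dilated version sees roughly twice as far.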

In short, a real-time semantic segmentation algorithm has to preserve spatial information while it speeds up inference.

Other methods:

Larger kernel: Global Convolution Network
Multi-scale feature ensemble: ASPP module, PSPNet

BiSeNet

BiSeNet consists of two parts: the Spatial Path (SP) and the Context Path (CP).

SP: affluent spatial details. Three convolution layers with stride = 2 produce a feature map at 1/8 of the input size.
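The 1/8 figure follows directly from the three stride-2 layers. A minimal sketch of the shape arithmetic, assuming 3x3 kernels with padding 1 (the kernel size and padding are assumptions for illustration; the notes only state the stride), and using hypothetical helper names:

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_path_shape(height, width, num_layers=3):
    """Apply num_layers stride-2 convolutions and return the final (H, W)."""
    for _ in range(num_layers):
        height = conv_out_size(height)
        width = conv_out_size(width)
    return height, width


# A 1024 x 2048 input (H x W) ends up at 1/8 resolution.
print(spatial_path_shape(1024, 2048))  # (128, 256)
```

Each layer halves the height and width, so three layers give 1/2 * 1/2 * 1/2 = 1/8.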

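CP: sufficient receptive field. It is based on the Xception network, with global average pooling appended at the tail of Xception. Global average pooling captures global context and therefore provides the maximum receptive field.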

Global average pooling can provide the maximum receptive field with global context information.
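To make the quoted statement concrete: global average pooling reduces every channel to a single number computed from all spatial positions, so each output value depends on the whole feature map. A framework-free sketch on nested lists (the function name and the toy tensor are illustrative assumptions):

```python
def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map: list of channels, each a list of rows (C x H x W).
    Returns a list with one mean value per channel.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy feature map with 2 channels of size 2 x 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_average_pool(features))  # [2.5, 2.0]
```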

In addition, BiSeNet has two modules: the Feature Fusion Module (FFM) and the Attention Refinement Module (ARM).

ARM: refines the features of each stage. It uses global average pooling to capture global context and computes an attention vector that guides feature learning (see SENet for details).

ARM employs global average pooling to capture global context and
computes an attention vector to guide the feature learning.
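The ARM idea can be sketched without a deep learning framework: pool each channel to one value, map the pooled vector through a 1x1 convolution (equivalent to a matrix multiply on a C-dimensional vector), squash with a sigmoid, and rescale each channel. This is a simplified illustration, not the authors' implementation: the function names, the toy weights, and the omission of batch normalization are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_refine(feature_map, weights, biases):
    """SE-style channel attention on a C x H x W nested list.

    weights: C x C matrix standing in for a 1x1 convolution (assumed).
    biases:  length-C list (assumed).
    Returns (refined feature map, attention vector).
    """
    num_channels = len(feature_map)
    # Global average pooling: one value per channel.
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    # 1x1 convolution on the pooled vector, then sigmoid (batch norm omitted).
    attention = [
        sigmoid(sum(weights[o][i] * pooled[i] for i in range(num_channels)) + biases[o])
        for o in range(num_channels)
    ]
    # Rescale every channel by its attention weight.
    refined = [
        [[v * attention[c] for v in row] for row in feature_map[c]]
        for c in range(num_channels)
    ]
    return refined, attention


features = [[[2.0, 4.0]], [[1.0, 3.0]]]          # 2 channels, 1 x 2 each
zero_weights = [[0.0, 0.0], [0.0, 0.0]]          # toy weights: attention = 0.5
refined, attention = attention_refine(features, zero_weights, [0.0, 0.0])
print(attention)  # [0.5, 0.5]
print(refined)    # [[[1.0, 2.0]], [[0.5, 1.5]]]
```

Because the attention vector is computed from globally pooled features, every channel is reweighted using context from the whole image at negligible extra cost.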

Experimental Results

Ablation study

U-shape-8s: combines the features of the last two stages in the Xception39 network
U-shape-4s: standard U-shape structure

CP：5.23
SP：0.81
FFM：0.6
GP：1
ARM：1.3
GP + ARM：2.98

Speed and Accuracy Analysis

Summary

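Advantage: a multi-branch architecture combines low-level spatial details with high-level semantic context information.
Disadvantages:

The SP branch limits speed.
The two branches are independent, which limits the model's learning capacity.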
