2017 (arXiv): Rethinking Atrous Convolution for Semantic Image Segmentation (augmenting the ASPP module)

Abstract:

In this work, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling (ASPP) module, which probes convolutional features at multiple scales, with image-level features that encode global context and further boost performance.

Motivation:

Two challenges in semantic segmentation: the first is the reduced feature resolution caused by consecutive pooling operations or convolution striding; to overcome this, we advocate the use of atrous convolution. The second is the existence of objects at multiple scales (objects at different scales become prominent on different feature maps).

Overview:

In this work, we revisit atrous convolution, which allows us to effectively enlarge the field of view of filters and incorporate multi-scale context, in the framework of both cascaded modules and spatial pyramid pooling. In particular, our proposed module consists of atrous convolutions with various rates and batch normalization layers, which we found important to train as well. We discuss an important practical issue when applying a 3 × 3 atrous convolution with an extremely large rate: owing to image boundary effects it fails to capture long-range information and effectively degenerates to a 1 × 1 convolution, so we propose to incorporate image-level features into the ASPP module. Furthermore, we elaborate on implementation details and share experience on training the proposed models, including a simple yet effective bootstrapping method for handling rare and finely annotated objects. In the end, our proposed model, 'DeepLabv3', improves over our previous works.
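To make the boundary-effect argument concrete, here is a minimal Python sketch (my own illustration, not the paper's code): it counts how many of the nine sampling positions of a dilated 3 × 3 filter fall inside an S × S feature map, averaged over all output locations. With S = 65 (e.g. a 513 × 513 crop at output stride = 8), the average drops toward one as the rate grows, i.e. the filter effectively behaves like a 1 × 1 convolution.

```python
# Illustration only: average number of valid sampling positions (out of 9)
# of a dilated 3x3 filter on an S x S feature map with zero padding.
S = 65  # e.g. a 513x513 crop at output stride = 8

def mean_valid_taps(size, rate):
    total = 0
    for i in range(size):
        for j in range(size):
            valid = 0
            for di in (-rate, 0, rate):
                for dj in (-rate, 0, rate):
                    if 0 <= i + di < size and 0 <= j + dj < size:
                        valid += 1
            total += valid
    return total / (size * size)

for rate in (1, 6, 12, 18, 32, 64):
    # as rate approaches S, only the centre weight sees valid input
    print(f"rate={rate:3d}  mean valid taps out of 9: {mean_valid_taps(S, rate):.2f}")
```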

Method:

We review how atrous convolution is applied to extract dense features for semantic segmentation. We then discuss the proposed modules with atrous convolution modules employed in cascade or in parallel.
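As a refresher, the following is a minimal PyTorch sketch (not from the paper; the channel count, feature-map size, and rates are illustrative) of how a 3 × 3 atrous convolution with rate r, stride 1, and padding r keeps the output at the input resolution while enlarging the effective kernel extent to 2r + 1, which is how dense features are extracted without further downsampling.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)   # a hypothetical feature map

for rate in (1, 6, 12, 18):
    conv = nn.Conv2d(256, 256, kernel_size=3, stride=1,
                     padding=rate, dilation=rate, bias=False)
    y = conv(x)
    k_eff = 3 + 2 * (rate - 1)    # effective extent of the dilated 3x3 kernel
    print(f"rate={rate:2d}  output {tuple(y.shape[2:])}  effective kernel {k_eff}x{k_eff}")
```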

Different from DeepLabv2, we include batch normalization within ASPP.

The motivation behind this model is that the introduced striding makes it easy to capture long-range information in the deeper blocks. For example, the whole image feature could be summarized in the last small-resolution feature map, as illustrated in Fig. 3 (a). However, we discover that consecutive striding is harmful for semantic segmentation (see Tab. 1 in Sec. 4) since detail information is decimated, and thus we apply atrous convolution with rates determined by the desired output stride value, as shown in Fig. 3 (b), where output stride = 16.
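A toy sketch of this idea, using an assumed five-stage backbone rather than the paper's ResNet: in (a) every stage strides, while in (b) the last stage keeps stride 1 and uses dilation 2 in place of the stride it replaced, so the feature map stops shrinking at output stride = 16 while the receptive field keeps growing.

```python
import torch
import torch.nn as nn

def block(cin, cout, stride=1, dilation=1):
    # a single conv stands in for a whole backbone stage; shapes are the point here
    return nn.Conv2d(cin, cout, 3, stride=stride,
                     padding=dilation, dilation=dilation, bias=False)

x = torch.randn(1, 3, 224, 224)

# (a) consecutive striding: 224 -> 112 -> 56 -> 28 -> 14 -> 7 (output stride 32)
strided = nn.Sequential(block(3, 64, 2), block(64, 128, 2), block(128, 256, 2),
                        block(256, 512, 2), block(512, 512, 2))

# (b) stop striding at output stride 16: the last stage uses dilation 2 instead
atrous = nn.Sequential(block(3, 64, 2), block(64, 128, 2), block(128, 256, 2),
                       block(256, 512, 2), block(512, 512, 1, dilation=2))

print(strided(x).shape)  # torch.Size([1, 512, 7, 7])
print(atrous(x).shape)   # torch.Size([1, 512, 14, 14])  -> output stride = 16
```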

Details of the method: We apply global average pooling on the last feature map of the model, feed the resulting image-level features to a 1 × 1 convolution with 256 filters (and batch normalization [38]), and then bilinearly upsample the feature to the desired spatial dimension. In the end, our improved ASPP consists of (a) one 1 × 1 convolution and three 3 × 3 convolutions with rates = (6, 12, 18) when output stride = 16 (all with 256 filters and batch normalization), and (b) the image-level features, as shown in Fig. 5. Note that the rates are doubled when output stride = 8. The resulting features from all the branches are then concatenated and passed through another 1 × 1 convolution (also with 256 filters and batch normalization) before the final 1 × 1 convolution that generates the final logits.
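A possible PyTorch rendering of this improved ASPP (the paper gives no reference code in this section, so the module layout and names below are my own assumptions): one 1 × 1 convolution, three 3 × 3 atrous convolutions with rates (6, 12, 18), and an image-level branch, each with 256 filters and batch normalization, concatenated and fused by a 1 × 1 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        def conv_bn(k, rate=1):
            pad = 0 if k == 1 else rate
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=rate, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # (a) one 1x1 conv and three 3x3 atrous convs with the given rates
        self.branches = nn.ModuleList([conv_bn(1)] + [conv_bn(3, r) for r in rates])
        # (b) image-level features: global average pooling -> 1x1 conv + BN
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # fuse the concatenated branches with another 1x1 conv + BN
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        # bilinearly upsample the pooled features back to the feature-map size
        img = F.interpolate(self.image_pool(x), size=(h, w),
                            mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [img], dim=1))

# hypothetical usage: aspp = ASPP(2048); out = aspp(features)
# followed by a final 1x1 convolution producing the class logits
```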

Conclusion:

Specifically, to encode multi-scale information, our proposed cascaded module gradually doubles the atrous rates, while our proposed atrous spatial pyramid pooling module, augmented with image-level features, probes the features with filters at multiple sampling rates and effective fields of view.

Supplement:

I. Consecutive striding, in the context of semantic segmentation, can be understood as running a sliding-window or sliding-convolution prediction over the input image with a stride of 1, i.e. shifting by one pixel per prediction. This usually leads to the following problems:

  1. Overlapping regions: consecutive striding makes neighbouring prediction windows overlap. The same pixel may be predicted several times, and the different predictions may disagree, which makes the segmentation result discontinuous or inconsistent.

  2. High computational cost: consecutive striding produces a large amount of redundant computation. Because of the overlapping regions, neighbouring windows repeat much of the same work, which drives up the cost.

  3. Large memory footprint: consecutive striding can make the assembled prediction output larger than the input image. For a large input, the resulting prediction map can become very large and occupy considerably more memory.

To address these problems, several approaches can mitigate the negative effects of consecutive striding:

  1. Overlapping-window fusion: predictions in overlapping regions can be fused, for example by simple or weighted averaging, to reduce inconsistency (see the sketch after this list).

  2. Skipping strides: the window can be slid with a stride greater than 1, reducing the overlap and the amount of computation. This trades prediction accuracy against computational cost to some extent.

  3. Image pyramids: the input can be processed at multiple scales by resizing it to several resolutions and running segmentation at each scale. This captures richer semantic information across scales and avoids producing an overly large prediction map.
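A minimal sketch of mitigation 1 above (overlapping-window fusion), assuming a model that returns per-pixel logits at the same resolution as its input; the window and stride values are only illustrative.

```python
import torch

def sliding_window_logits(model, image, num_classes, window=513, stride=342):
    """Average per-pixel logits from overlapping sliding-window crops."""
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)

    def starts(total, win, step):
        # window start positions that cover the whole axis, including the far edge
        return [0] if total <= win else list(range(0, total - win, step)) + [total - win]

    for y in starts(H, window, stride):
        for x in starts(W, window, stride):
            y2, x2 = min(y + window, H), min(x + window, W)
            # assumes model(crop) has shape (1, num_classes, y2 - y, x2 - x)
            logits[:, :, y:y2, x:x2] += model(image[:, :, y:y2, x:x2])
            counts[:, :, y:y2, x:x2] += 1
    return logits / counts   # overlapping regions are averaged
```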

II. The main reason consecutive striding is harmful to semantic segmentation is that it can lose or blur spatial semantic information. Semantic segmentation aims to label every pixel of an image with a specific semantic class, such as person, car, or tree, and is usually tackled with deep models such as convolutional neural networks (CNNs).

In a semantic segmentation task, each pixel is treated as independent, and its class label is decided from local information rather than from the labels of surrounding pixels. When the image is processed with consecutive strides, however, each pixel's classification decision is influenced by its neighbouring pixels.

Such consecutive striding can lead to two problems:

  1. Loss of spatial information: consecutive striding can cause the model to ignore fine detail and spatial context in the image. Because each pixel's classification is decided from local information, the model may not fully exploit the spatial relationships between pixels, which can lead to blurry or inaccurate segmentation around object boundaries and small objects.

  2. Discontinuous segmentation: consecutive striding can introduce discontinuities in the result. When adjacent pixels belong to different semantic classes, it can cause the model to produce broken boundaries between them, making the segmentation look unnatural or inaccurate.
