(TPAMI 2017) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

m0_55384957

Last modified: 2023-09-21 17:46:53

Tags: artificial intelligence

First published: 2023-09-10 21:31:43

Article link: https://blog.csdn.net/m0_55384957/article/details/132795781

1. Problems Addressed

In particular we consider three challenges in the application of DCNNs to semantic image segmentation: (1) reduced feature resolution, (2) existence of objects at multiple scales, and (3) reduced localization accuracy due to DCNN invariance. Next, we discuss these challenges and our approach to overcome them in our proposed DeepLab system.

2. Proposed Approach

(1) First, atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation.
(2) Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales.
(3) Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance.
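To make point (1) concrete, here is a minimal 1-D sketch of atrous convolution in plain Python, following the usual definition y[i] = sum_k x[i + r*k] * w[k]. The function names and the toy signal are illustrative assumptions, not code from the paper. The same three weights are reused at every rate, so the parameter count stays fixed while the span of input covered by the filter grows.

```python
def atrous_conv1d(x, w, rate):
    """1-D atrous convolution over 'valid' positions: y[i] = sum_k x[i + rate*k] * w[k]."""
    span = rate * (len(w) - 1) + 1           # input samples covered by the filter
    out_len = len(x) - span + 1
    return [sum(x[i + rate * k] * w[k] for k in range(len(w)))
            for i in range(out_len)]


def field_of_view(kernel_size, rate):
    """Effective filter size: holes are inserted between taps, weights are unchanged."""
    return rate * (kernel_size - 1) + 1


x = list(range(10))        # toy ramp signal 0..9 (illustrative input)
w = [1, 0, -1]             # one 3-tap filter, reused for every rate

y1 = atrous_conv1d(x, w, rate=1)   # ordinary convolution, field of view 3
y2 = atrous_conv1d(x, w, rate=2)   # same 3 weights, field of view 5

print(field_of_view(3, 1), y1)     # 3 [-2, -2, -2, -2, -2, -2, -2, -2]
print(field_of_view(3, 2), y2)     # 5 [-4, -4, -4, -4, -4, -4]
```

With rate 2 the filter compares samples four positions apart instead of two, which is the larger context the quoted passage refers to, at the cost of no extra weights.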

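3. Method Details

(1) How does DCNN invariance reduce localization accuracy?

The invariance of a deep convolutional neural network (DCNN) means that its output is largely insensitive to certain changes or details in the input image. This invariance can reduce localization accuracy for the following reasons:

Translation invariance: a DCNN is largely invariant to translation, so it recognizes the same object wherever it appears in the image. The flip side is that the network may be unable to report exactly where the object is.
Scale invariance: a DCNN is also relatively insensitive to changes in object scale. If an object becomes larger or smaller in the image, the network still recognizes it but cannot give accurate scale information, which makes it hard to determine the object's size and position precisely.
Deformation invariance: a DCNN tolerates a certain amount of deformation. Even if an object is deformed in the image, the network can still recognize it, but it may then fail to capture the object's exact shape and pose.

In short, because of these invariances a DCNN may not provide precise position, scale, and shape information, and localization accuracy drops. The design favors extracting features for recognition over precise localization; as the paper notes, the invariance comes from the combination of max-pooling and downsampling, which discards spatial detail.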
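A small sketch of why pooling trades position for invariance, using non-overlapping 1-D max pooling in plain Python. The signals and the window size of 4 are illustrative assumptions. Two inputs whose strong response sits at different positions inside the same window produce identical pooled outputs, so nothing downstream can tell them apart.

```python
def max_pool1d(x, size):
    """Non-overlapping max pooling (window = stride = size)."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


a = [0, 9, 0, 0, 0, 0, 0, 0]   # strong response at position 1
b = [0, 0, 0, 9, 0, 0, 0, 0]   # same response shifted to position 3

pooled_a = max_pool1d(a, 4)
pooled_b = max_pool1d(b, 4)

print(pooled_a, pooled_b)      # [9, 0] [9, 0]
print(pooled_a == pooled_b)    # True: the shift is invisible after pooling
```

This is helpful for classification, where the shift should not matter, and harmful for segmentation, where the exact pixel matters.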

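(2) How does the CRF determine object boundaries and thereby counter DCNN invariance?

Applying a conditional random field (CRF) to the output of a convolutional neural network (CNN) sharpens the predicted boundaries. A plain CNN tends to produce blurry or misplaced boundaries, whereas a CRF improves boundary accuracy by modeling the spatial relationships between pixels.

The general steps for using a CRF with a convolutional network to determine boundaries are: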

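Train the convolutional network: train the network on a labeled image dataset so that it learns feature representations and per-pixel predictions, typically through supervised learning on a segmentation or boundary detection task.
Extract the score maps: the CRF takes the output of the network's convolutional layers as its input, so these maps must be extracted before the CRF is run.
Define the potential functions: the key to a CRF is its potential functions, which describe the relationships between pixels. For boundary determination they can take into account pixel similarity, color difference, and spatial distance, and they can be customized to the task.
Define the label variables: treat each pixel as a label variable and use the network's output as its initial label probabilities. These variables are the nodes of the CRF.
Build the graph: make every pixel a CRF node and connect nodes according to their spatial relationships. A locally connected CRF uses 4- or 8-connectivity; the fully connected CRF used in DeepLab instead links every pair of pixels.
CRF inference: run an inference algorithm to obtain more accurate labels. Exact algorithms such as Viterbi apply only to chain-structured models; for its fully connected CRF, DeepLab relies on approximate mean-field inference following Krähenbühl and Koltun. Inference uses the potentials and the graph structure to estimate the label distribution at each pixel.
Combine the results: take the CRF's label prediction as the refined result and, if needed, fuse it with the network's own prediction by weighting or further post-processing. In DeepLab the CRF output itself serves as the final segmentation.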

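Combining a convolutional network with a CRF exploits the CNN's ability to learn image features together with the CRF's ability to model spatial relationships, which improves the accuracy and continuity of boundaries. The combination has worked well in image segmentation, object recognition, and semantic segmentation.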
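The steps above can be sketched as brute-force mean-field inference for a fully connected CRF on a tiny 1-D "image". This is an illustration under stated assumptions, not DeepLab's implementation: the kernel weights and bandwidths (`w1`, `w2`, `s_alpha`, `s_beta`, `s_gamma`), the toy intensities, and the toy probabilities are all made up for the example, and the O(N^2) loops stand in for the efficient filtering that a practical dense CRF needs.

```python
import math


def mean_field_crf(probs, intensity, w1=3.0, w2=1.0,
                   s_alpha=3.0, s_beta=0.1, s_gamma=1.0, iters=5):
    """Approximate MAP labels for a fully connected CRF with a Potts model.

    probs[i][l]  : DCNN-style probability of label l at pixel i (unary term)
    intensity[i] : pixel intensity, used by the appearance kernel
    All kernel parameters are illustrative assumptions.
    """
    n, n_labels = len(probs), len(probs[0])
    unary = [[-math.log(max(p, 1e-9)) for p in row] for row in probs]

    def kernel(i, j):
        d2 = (i - j) ** 2                              # squared position distance
        c2 = (intensity[i] - intensity[j]) ** 2        # squared intensity distance
        appearance = w1 * math.exp(-d2 / (2 * s_alpha ** 2) - c2 / (2 * s_beta ** 2))
        smoothness = w2 * math.exp(-d2 / (2 * s_gamma ** 2))
        return appearance + smoothness

    # Every pixel is connected to every other pixel (no self-connection).
    k = [[kernel(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]

    q = [row[:] for row in probs]                      # initialise Q with the unaries
    for _ in range(iters):
        new_q = []
        for i in range(n):
            energies = []
            for l in range(n_labels):
                # Potts compatibility: pay k[i][j] whenever pixel j disagrees with l.
                penalty = sum(k[i][j] * (1.0 - q[j][l]) for j in range(n))
                energies.append(unary[i][l] + penalty)
            m = min(energies)                          # shift for numerical stability
            weights = [math.exp(-(e - m)) for e in energies]
            z = sum(weights)
            new_q.append([wt / z for wt in weights])
        q = new_q                                      # parallel (synchronous) update
    return [max(range(n_labels), key=lambda l: row[l]) for row in q]


# Toy data: dark pixels on the left, bright on the right, with two noisy unaries.
intensity = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
probs = [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.8, 0.2],
         [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]

before = [max(range(2), key=lambda l: row[l]) for row in probs]
after = mean_field_crf(probs, intensity)
print(before)   # [0, 0, 1, 0, 1, 0, 1, 1]
print(after)    # [0, 0, 0, 0, 1, 1, 1, 1]
```

The two mislabeled pixels are pulled toward the label of pixels with similar intensity, while the intensity jump between positions 3 and 4 stops the smoothing from crossing the boundary.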

The CRF formulation used in the paper:
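For reference, the fully connected CRF energy can be restated as follows. This follows the DeepLab paper and the Krähenbühl and Koltun model it builds on; the notation here is a restatement and may differ cosmetically from the paper's. Here x is the label assignment, P(x_i) is the DCNN's label probability at pixel i, p_i is pixel position, I_i is pixel color, and the weights w_1, w_2 and bandwidths σ_α, σ_β, σ_γ are hyperparameters.

```latex
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j),
\qquad
\theta_i(x_i) = -\log P(x_i)

\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[
    w_1 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\alpha^2}
                    -\frac{\lVert I_i - I_j\rVert^2}{2\sigma_\beta^2}\right)
  + w_2 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\gamma^2}\right)
\right],
\qquad
\mu(x_i, x_j) =
\begin{cases}
1 & x_i \neq x_j \\
0 & \text{otherwise}
\end{cases}
```

The first kernel depends on both position and color, so nearby pixels with similar color are pushed toward the same label; the second depends on position only and enforces smoothness.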

(3) The ASPP structure

We have experimented with two approaches to handling scale variability in semantic segmentation. The first approach amounts to standard multiscale processing. We extract DCNN score maps from multiple (three in our experiments) rescaled versions of the original image using parallel DCNN branches that share the same parameters. To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them, by taking at each position the maximum response across the different scales. We do this both during training and testing. Multiscale processing significantly improves performance, but at the cost of computing feature responses at all DCNN layers for multiple scales of input.

The second approach is inspired by the success of the R-CNN spatial pyramid pooling method, which showed that regions of an arbitrary scale can be accurately and efficiently classified by resampling convolutional features extracted at a single scale. We have implemented a variant of their scheme which uses multiple parallel atrous convolutional layers with different sampling rates. The features extracted for each sampling rate are further processed in separate branches and fused to generate the final result. The proposed "atrous spatial pyramid pooling" (DeepLab-ASPP) approach generalizes our DeepLab-LargeFOV variant and is illustrated in Fig. 4.
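The second approach can be sketched in 1-D with plain Python: the same input is filtered by several atrous branches with different sampling rates, and the branch outputs are fused. The rates (1, 2, 3), the all-ones filters, and fusion by summation are assumptions chosen so the example fits a 9-sample signal; the quoted passage only says the branches are "fused".

```python
def atrous_conv1d_same(x, w, rate):
    """Same-length 1-D atrous convolution with zero padding; the filter is centred on i."""
    half = len(w) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + rate * (k - half)       # tap position, spaced `rate` samples apart
            if 0 <= j < len(x):             # taps outside the signal read as zero
                acc += wk * x[j]
        out.append(acc)
    return out


def aspp_1d(x, filters, rates):
    """Run one atrous branch per (filter, rate) pair and fuse the branches by summation."""
    branches = [atrous_conv1d_same(x, w, r) for w, r in zip(filters, rates)]
    return [sum(vals) for vals in zip(*branches)]


x = [0, 0, 0, 0, 1, 0, 0, 0, 0]             # unit impulse at the centre
filters = [[1, 1, 1]] * 3                   # one 3-tap filter per branch (illustrative)
rates = (1, 2, 3)                           # small rates for a short toy signal

fused = aspp_1d(x, filters, rates)
print(fused)   # [0.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 0.0]
```

Each branch sees the impulse from a different distance, so the fused response covers positions 1 through 7 even though every branch has only three weights.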

4. Comparison:

(TPAMI 2015) Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (SPPNet)

SPPNet (Spatial Pyramid Pooling Network) and ASPP (Atrous Spatial Pyramid Pooling) are two structures built on the spatial-pyramid idea, and they differ in several respects.

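SPPNet was proposed by Kaiming He et al. in 2014 for image classification and localization. Its main idea is a spatial pyramid pooling layer that handles inputs of different sizes and scales. A conventional CNN with fully connected layers requires a fixed input size, which limits how well it handles objects at different scales. SPPNet first applies convolution and pooling to the input image, then pools the resulting feature map into a fixed number of bins at several pyramid levels, and finally concatenates the pooled features from all levels into one fixed-length vector for classification or localization.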
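A minimal sketch of the spatial pyramid pooling idea in plain Python, for a single-channel feature map stored as a list of rows. The pyramid levels (1, 2, 4) and the floor/ceil bin boundaries are illustrative assumptions, not SPPNet's exact configuration. The point it shows is that feature maps of different sizes are reduced to vectors of the same length.

```python
import math


def spp_max(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into n x n bins per level and concatenate the results."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for by in range(n):
            y0 = math.floor(by * h / n)            # bin start row (inclusive)
            y1 = math.ceil((by + 1) * h / n)       # bin end row (exclusive)
            for bx in range(n):
                x0 = math.floor(bx * w / n)
                x1 = math.ceil((bx + 1) * w / n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled


def ramp(h, w):
    """Toy feature map whose value at (y, x) is y*w + x."""
    return [[y * w + x for x in range(w)] for y in range(h)]


small = spp_max(ramp(5, 7))
large = spp_max(ramp(13, 9))
print(len(small), len(large))   # 21 21  (1 + 4 + 16 bins, whatever the input size)
```

Because the output length depends only on the pyramid levels, a fully connected classifier can follow this layer without forcing every input image to the same size.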

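ASPP was proposed by Liang-Chieh Chen et al. in the DeepLab paper discussed above, as a structure for semantic segmentation that captures context at multiple scales. It is built on atrous (dilated) convolution, which enlarges the field of view without increasing the number of parameters and so lets the network take in wider context. Where SPPNet pools a feature map into bins of several sizes, ASPP applies several atrous convolutions with different sampling rates in parallel to the same feature map, processes each branch separately, and then fuses the branch outputs.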

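In summary, SPPNet and ASPP both use the spatial-pyramid idea to handle features at different scales. SPPNet targets image classification and localization and pools features into a fixed-length representation, whereas ASPP targets semantic segmentation and uses parallel atrous convolutions at several sampling rates to capture multi-scale context.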
