【DeepLab-v1】《Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs》

bryant_meng

Last modified 2024-03-25 19:28:44


Category: CNN / Transformer. Tags: deep learning, artificial intelligence, deeplabv1, deeplab, CRF

First published 2024-03-25 19:26:52

Article link: https://blog.csdn.net/bryant_meng/article/details/106079834

ICLR-2015


Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
- 4.1 Convolutional Neural Networks for Dense Image Labeling
- 4.2 Detailed Boundary Recovery: Fully-connected Conditional Random Fields and Multi-scale Prediction
5 Experiments
- 5.1 Datasets and Metrics
- 5.2 Experimental Evaluation
6 Conclusion (own) / Future work

1 Background and Motivation

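In recent years, deep convolutional neural networks (DCNNs) have pushed the performance of computer vision systems to a new level on tasks such as image classification, object detection, and fine-grained categorization. Part of this success can be attributed to the built-in invariance of DCNNs to local image transformations: the network extracts the essential features. Take cat classification: however the cat poses, a network that captures the essential features will still recognize it, just as the White Bone Demon cannot escape the Monkey King's fiery golden eyes. This property is what lets DCNNs learn hierarchical abstractions of data.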

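However, this invariance suits high-level vision tasks (region-level, object-level, image-level) and hinders low-level, pixel-level tasks such as the dense prediction problems of pose estimation and semantic segmentation. Low-level tasks need precise localization rather than abstraction of spatial details. This can be understood as a conflict between high-level semantic information and low-level spatial detail.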

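Applying DCNNs to image labeling tasks runs into two technical hurdles:

- signal downsampling (as the network gets deeper, the spatial resolution shrinks and much spatial detail is lost); the authors sidestep this with the 'atrous' (with holes) algorithm, which is similar to dilated convolution
- spatial insensitivity (because CNNs are translation invariant); the authors address this with a fully-connected Conditional Random Field (CRF)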

We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation

2 Related Work

Omitted.

3 Advantages / Contributions

- Speed: the network runs at 8 fps, while Mean Field Inference for the fully-connected CRF requires 0.5 second
- Accuracy: exceeds the previous state of the art on PASCAL VOC by 7.2%
- Simplicity: DCNN (atrous) + CRFs + Multi-scale Prediction

4 Method

4.1 Convolutional Neural Networks for Dense Image Labeling

The network is a modified VGG16: the fc layers are converted to conv layers, and atrous convolution is introduced.

‘hole algorithm’ (‘atrous algorithm’)

Figure: standard convolution vs. atrous convolution
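
To make the difference concrete, here is a minimal 1-D sketch in NumPy. It is an illustration only, not the authors' Caffe implementation: the function name `atrous_conv1d` and the `rate` parameter are hypothetical, and the operation is the cross-correlation form commonly used in CNNs. With `rate=1` it reduces to a standard convolution; a larger `rate` spreads the same kernel taps further apart, so the field of view grows without adding weights.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate):
    """1-D 'valid' cross-correlation with holes (hypothetical helper).

    rate=1 is a standard convolution; rate=r places r-1 skipped
    samples between consecutive kernel taps.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples spanned by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.zeros(out_len)
    for i in range(out_len):
        # Pick every `rate`-th sample inside the span: the kernel taps.
        taps = signal[i:i + span:rate]
        out[i] = np.dot(taps, kernel)
    return out


ramp = np.arange(8)
dense = atrous_conv1d(ramp, [1, 1, 1], rate=1)   # each output sees 3 samples
holes = atrous_conv1d(ramp, [1, 1, 1], rate=2)   # each output sees a span of 5
```

Both calls use the same three weights; only the spacing between the taps changes.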

Number of segmentation classes: 21

Loss function:

the sum of cross-entropy terms for each spatial position in the CNN output map (subsampled by 8 compared to the original image)
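
A minimal NumPy sketch of this loss, under stated assumptions: the function name `dense_cross_entropy` is hypothetical, the ground-truth labels are subsampled by plain strided slicing, and pixels labelled 255 (the PASCAL VOC void label) are skipped. The paper itself only specifies a sum of per-position cross-entropy terms on the 8x-subsampled map.

```python
import numpy as np


def dense_cross_entropy(scores, labels, stride=8, ignore_label=255):
    """Sum of per-position cross-entropy terms (hypothetical helper).

    scores: (C, h, w) raw class scores on the subsampled output map.
    labels: (H, W) integer ground truth at full image resolution.
    """
    scores = np.asarray(scores, dtype=float)
    # Assumption: subsample the labels by taking every `stride`-th pixel.
    targets = np.asarray(labels)[::stride, ::stride]
    if targets.shape != scores.shape[1:]:
        raise ValueError("subsampled labels do not match the score map")
    # Numerically stable log-softmax over the class axis.
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Keep only positions with a valid label.
    rows, cols = np.nonzero(targets != ignore_label)
    picked = log_prob[targets[rows, cols], rows, cols]
    return -picked.sum()


scores = np.zeros((21, 2, 2))            # uniform prediction over 21 classes
labels = np.zeros((16, 16), dtype=int)   # 16x16 image, all background
loss = dense_cross_entropy(scores, labels)  # 4 positions, each costs log(21)
```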

4.2 Detailed Boundary Recovery: Fully-connected Conditional Random Fields and Multi-scale Prediction

(1) Conditional Random Fields
the class score maps (corresponding to log-probabilities)

The first row looks like logits; the second row is the result after softmax.

As Figure 2 shows, DCNN score maps can reliably predict the presence and rough position of objects in an image but are less well suited for pin-pointing their exact outline

There is a natural trade-off between classification accuracy and localization accuracy with convolutional networks

To improve the localization ability of DCNNs, recent work takes two directions:

- harness information from multiple layers
- employ a super-pixel representation

The authors' solution is to use Conditional Random Fields (CRFs).

The DCNN predictions are already quite smooth and produce homogeneous classification results. In this regime, using (traditional) short-range CRFs can be detrimental, as our goal should be to recover detailed local structure rather than further smooth it.

The authors use the fully connected CRF model from "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials" (NIPS 2011).

The DCNN classifies pixels and determines rough object boundaries; the fully connected CRF is used as post-processing to recover precise pixel-level object boundaries.

Below is a brief introduction to the fully connected CRF the authors use (for background, see Conditional Random Field). It is meant to:

- capture fine edge details
- cater for long range dependencies

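First, a more intuitive view.

Here $y_i$ is the observed value (the network prediction).

A more rigorous reading:

The loss is expressed as an energy function. The lower the energy, the more stable the model, so the goal is to minimize the energy function.

The energy function uses Gaussian kernels.

In the energy function, $x$ is the label assignment for pixels.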
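
For reference, the energy function of the fully connected CRF as given in the paper, in the same notation as the rest of this section:

$$
E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j), \qquad \theta_i(x_i) = -\log P(x_i)
$$

where $P(x_i)$ is the label assignment probability at pixel $i$ computed by the DCNN.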

In the formula for $\theta_{ij}(x_i, x_j)$, the kernels depend on pixel positions (denoted as $p$) and pixel color intensities (denoted as $I$).
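
In the paper the pairwise term is $\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[w_1 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2}\right) + w_2 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2}\right)\right]$, with $\mu(x_i, x_j) = 1$ if $x_i \neq x_j$ and $0$ otherwise. The sketch below evaluates it for a single pair of pixels. The function name `pairwise_potential` is hypothetical and the parameter values in the usage lines are illustrative, not the cross-validated values from the paper.

```python
import numpy as np


def pairwise_potential(x_i, x_j, p_i, p_j, I_i, I_j,
                       w1, w2, sigma_alpha, sigma_beta, sigma_gamma):
    """Pairwise CRF term for one pixel pair (hypothetical helper)."""
    # Potts compatibility: only differently labelled pairs are penalized.
    if x_i == x_j:
        return 0.0
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)
    # Bilateral kernel: nearby pixels with similar color.
    bilateral = w1 * np.exp(-d_pos / (2 * sigma_alpha ** 2)
                            - d_col / (2 * sigma_beta ** 2))
    # Spatial kernel: nearby pixels regardless of color.
    spatial = w2 * np.exp(-d_pos / (2 * sigma_gamma ** 2))
    return float(bilateral + spatial)


# Illustrative parameter values only.
params = dict(w1=4.0, w2=3.0, sigma_alpha=50.0, sigma_beta=10.0, sigma_gamma=3.0)
near_similar = pairwise_potential(0, 1, (5, 5), (5, 6), (10, 20, 30), (12, 20, 30), **params)
far_different = pairwise_potential(0, 1, (5, 5), (90, 90), (10, 20, 30), (200, 200, 200), **params)
```

Giving different labels to neighbouring pixels of similar color costs much more than giving different labels to distant pixels of different color, which is why minimizing the energy pulls label boundaries toward image edges.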

(2) Multi-scale Prediction

Next, Multi-scale Prediction.

Figure: architecture of DeepLab v1

5 Experiments

5.1 Datasets and Metrics

Dataset
PASCAL VOC 2012 segmentation benchmark: 20 foreground classes and 1 background class

Evaluation metric
pixel intersection-over-union (IOU) averaged across the 21 classes.
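
A minimal sketch of this metric via a confusion matrix. Assumptions for illustration: the function name `mean_iou` is hypothetical, pixels labelled 255 (the VOC void label) are ignored, and classes that appear in neither prediction nor ground truth are left out of the average. The official benchmark accumulates counts over the whole dataset before averaging.

```python
import numpy as np


def mean_iou(pred, target, num_classes=21, ignore_label=255):
    """Mean pixel IoU over classes (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_label
    pred, target = pred[keep], target[keep]
    # conf[t, p] counts pixels of true class t predicted as class p.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())


# Two-class toy example: one pixel of class 1 is predicted as class 0.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.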

5.2 Experimental Evaluation

After the DCNN has been fine-tuned (pre-trained), we cross-validate the parameters of the fully connected CRF model

(1) Multi-Scale features
DeepLab-MSc

A visual comparison of the results before and after using multi-scale features (figure).
(2) Field of View

This experiment varies the atrous input stride and the kernel size of the first fc layer after it is converted to a conv layer.
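
The interplay between kernel size and atrous input stride can be checked with a line of arithmetic: a $k \times k$ kernel applied with input stride $r$ spans $k + (k-1)(r-1)$ input positions per side. The helper below is a hypothetical illustration, and the (kernel size, input stride) pairs are illustrative settings of the kind this experiment compares, not a reproduction of the paper's table.

```python
def effective_kernel_size(kernel_size, input_stride):
    """Side length spanned by a kernel with holes (hypothetical helper)."""
    return kernel_size + (kernel_size - 1) * (input_stride - 1)


# Illustrative (kernel size, input stride) pairs.
for k, r in [(7, 4), (4, 4), (4, 8), (3, 12)]:
    weights_per_filter_slice = k * k  # weights per input channel per filter
    print(k, r, effective_kernel_size(k, r), weights_per_filter_slice)
```

A small kernel with a large input stride can cover the same extent as a large dense kernel while using far fewer weights, which is the trade-off this experiment explores.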

(3) Mean Pixel IOU along Object Boundaries
Adding the CRF does improve the results.

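(4) Mean Pixel IOU along Object Boundaries
A trimap is auxiliary information used to determine which region each pixel of an image belongs to. It is a coarse partition of the given image into foreground, background, and an unknown region still to be resolved.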
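
As a rough sketch of how a narrow evaluation band around object boundaries can be built, the code below marks a band that extends `width` pixels on each side of every label change (4-connected). This is an assumption made for illustration: as far as I understand, the paper builds its band around the 'void' labels annotated in the PASCAL VOC val set rather than around raw label changes, and the function name `boundary_band` is hypothetical.

```python
import numpy as np


def boundary_band(labels, width):
    """Boolean mask of pixels near a label change (hypothetical helper)."""
    labels = np.asarray(labels)
    edge = np.zeros(labels.shape, dtype=bool)
    # A pixel is on the boundary if a vertical or horizontal neighbour differs.
    diff_v = labels[1:, :] != labels[:-1, :]
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v
    diff_h = labels[:, 1:] != labels[:, :-1]
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    # Grow the boundary by one pixel per iteration (4-connected dilation).
    band = edge.copy()
    for _ in range(width - 1):
        grown = band.copy()
        grown[1:, :] |= band[:-1, :]
        grown[:-1, :] |= band[1:, :]
        grown[:, 1:] |= band[:, :-1]
        grown[:, :-1] |= band[:, 1:]
        band = grown
    return band


toy = np.zeros((6, 6), dtype=int)
toy[:, 3:] = 1                      # left half class 0, right half class 1
narrow = boundary_band(toy, width=1)
wide = boundary_band(toy, width=2)
```

Restricting an IoU computation to `band` pixels then measures accuracy near boundaries only.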

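(5) Comparison with State-of-art
The authors' method does produce richer detail, but in complex scenes (the people beside the bus and the driver) the small figures are still not segmented.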

(6) Test set results

6 Conclusion (own) / Future work

Future work:
- fully integrating its two main components (CNN and CRF) and train the whole system in an end-to-end fashion
- integrate convolutional neural networks and probabilistic graphical models and explore their synergistic potential

The core innovations, atrous convolution and the CRF, are not original to the authors. Combining them so that performance takes off is what counts 👍
classification head of FCN and Deeplabv1
Trimap
Field of View
