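If you repost this, please credit the source: http://blog.csdn.net/G_Dragon_Li/article/details/79585542, author: 小小希的权世界 (username: g_dragon_li)
This post briefly covers the paper "EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching", which had just been posted to arXiv on March 14. Before reading the paper in detail, I looked at its network architecture diagram. Unlike earlier end-to-end architectures, it breaks with the status quo in which end-to-end stereo networks have so far all been extensions of DispNetCorr, and proposes a brand-new network, EdgeStereo.
Keywords: context information, residual pyramid, edge detection, multi-task learning
I. Preface
The authors argue that previous stereo matching networks paid little attention to encoding context information: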
less attention is paid on encoding context information, simplifying two-stage disparity learning pipeline and improving details in disparity maps
To address this, the authors propose a context pyramid and a multi-task learning network, EdgeStereo:
1. Firstly, we propose an one-stage context pyramid based residual pyramid network (CP-RPN) for disparity estimation, in which a context pyramid is embedded to encode multi-scale context clues explicitly.
2. Next, we design a CNN based multi-task learning network called EdgeStereo to recover missing details in disparity maps, utilizing mid-level features from edge detection task.
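1. Steps of stereo matching
- Matching cost computation: measure the similarity between two pixels, one from each image;
- Cost aggregation: smooth the matching cost over each pixel's neighborhood;
- Disparity computation: predict the initial disparity map from the cost values;
- Disparity refinement: refine the unstable points.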
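To make the four steps concrete, here is a minimal NumPy sketch of a classical block-matching pipeline. It is not from the paper: the absolute-difference cost, the box-filter aggregation, the winner-takes-all selection and the 3x3 median refinement are my own simple stand-ins, and the names `block_matching`, `box_filter`, `median3x3` and `invalid_cost` are hypothetical.

```python
import numpy as np

def box_filter(cost, radius):
    """Step 2 (aggregation): average the cost over a (2r+1) x (2r+1) window."""
    k = 2 * radius + 1
    h, w = cost.shape
    padded = np.pad(cost, radius, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def median3x3(disp):
    """Step 4 (refinement): a 3x3 median filter as a simple stand-in."""
    h, w = disp.shape
    padded = np.pad(disp, 1, mode="edge")
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifted), axis=0)

def block_matching(left, right, max_disp, radius=1, invalid_cost=1e3):
    """Disparity map for a rectified grayscale pair (assumed setup)."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    volume = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        # Step 1 (matching cost): compare left(y, x) with right(y, x - d).
        cost = np.full((h, w), invalid_cost)  # columns x < d have no match
        cost[:, d:] = np.abs(left[:, d:] - right[:, :w - d])
        volume[d] = box_filter(cost, radius)
    # Step 3 (disparity computation): winner-takes-all over the cost volume.
    disp = np.argmin(volume, axis=0)
    return median3x3(disp)
```

End-to-end networks such as the ones discussed below replace these hand-crafted stages with learned layers, but the cost volume indexed by disparity is the same idea.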
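2. Categories of stereo matching methods
- Traditional methods
- Patch-based neural network models
- End-to-end disparity networks
3. Existing algorithms and their limitations
- CRL: cascades a disparity refinement network on top of DispNet to improve the initial disparity map
- GC-Net: uses 3D convolutions to learn the disparity map directly from a specially designed feature volume
- Limitations: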
- (i) Semantical context information isn’t encoded explicitly for ill-posed regions, such as occlusions and textureless regions.
- (ii) Experiments reveal that multi-stage refinement structures are not efficient.
- (iii) High-dimensional feature volume based 3D convolution is computationally expensive.
II. The proposed algorithm
To overcome the limitations above, the paper proposes an end-to-end context pyramid based residual pyramid network (CP-RPN). Two points are central to this network:
(i) An embedding context pyramid.
(ii) A residual pyramid for learning and refining the disparity map in a single encoder-decoder.
1. Problems and solutions
a. Problem: stereo matching will benefit from semantical context cues, especially for ill-posed regions. However, utilizing a single-scale context cue can’t encode context for objects with arbitrary sizes and may cause over-smoothing.
Solution: Hence encoding both local priors and multi-scale context cues is crucial for stereo matching.
b. Problem: the introduced CP-RPN inevitably lose some subtle details in the disparity map, due to the lack of mid-level features after too much forward propagation.
Solution: resort to the multi-task learning framework to obtain extra mid-level information from other visual cues;
design an end-to-end edge network called HEDβ based on the baseline model HED, in which the shared "aggregated edge channel feature" is constructed for multi-task interactions.
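Point (a) above, combining local priors with context cues at several scales, can be illustrated with a small pooling pyramid. This is only a sketch under my own assumptions and not necessarily the context pyramid used in the paper: I assume average pooling with hypothetical kernel sizes 2, 4 and 8, nearest-neighbour upsampling, and channel-wise concatenation with the unpooled feature map.

```python
import numpy as np

def avg_pool(feat, k):
    """Non-overlapping k x k average pooling on a (C, H, W) array.

    Assumes H and W are divisible by k.
    """
    c, h, w = feat.shape
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(feat, k):
    """Repeat every pixel k times along height and width."""
    return feat.repeat(k, axis=1).repeat(k, axis=2)

def context_pyramid(feat, kernel_sizes=(2, 4, 8)):
    """Concatenate the local feature with context pooled at several scales."""
    branches = [feat]  # local prior: the original feature map
    for k in kernel_sizes:
        pooled = avg_pool(feat, k)                      # context at scale k
        branches.append(upsample_nearest(pooled, k))    # back to H x W
    return np.concatenate(branches, axis=0)             # stack along channels
```

Larger kernels summarize larger regions, so objects of different sizes each get a branch whose receptive field roughly fits them, while the first branch keeps the fine local detail.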
2. Multi-scale decoder
In the paper, the multi-scale decoder of CP-RPN acts as the residual pyramid:
(i) A smallest-scale disparity map is directly learned at the bottom of residual pyramid.
(ii) For other scales, a residual map is learned for refinement then added to a coarse disparity map upsampled from previous scale, obtaining a larger and refined disparity map.
(iii) At each scale besides the smallest, right image is downsampled then warped according to the upsampled disparity map from previous scale, obtaining a synthesized left image.
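Here, the error map between the left image synthesized from the disparity map and the real left image serves as a geometric constraint for learning the residual map at each scale.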
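The per-scale refinement described above can be sketched in NumPy as follows. This is my own simplified reading, not the paper's implementation: I assume a factor of 2 between scales, nearest-neighbour upsampling, linear interpolation along x for the warp, and an absolute-difference error map. `predict_residual` is a hypothetical callable standing in for the learned layers that output the residual map.

```python
import numpy as np

def upsample2x(disp):
    """Nearest-neighbour 2x upsampling; disparities double with the width."""
    return 2.0 * disp.repeat(2, axis=0).repeat(2, axis=1)

def warp_right_to_left(right, disp):
    """Synthesize the left view: left_hat(y, x) = right(y, x - disp(y, x))."""
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)  # sampling positions
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two neighbouring columns.
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]

def refine_scale(left, right, coarse_disp, predict_residual):
    """One step of the residual pyramid at the current scale.

    left, right: images already downsampled to the current scale.
    coarse_disp: disparity map from the previous (half-size) scale.
    """
    up = upsample2x(coarse_disp)              # coarse disparity at this scale
    synth = warp_right_to_left(right, up)     # synthesized left image
    error_map = np.abs(left - synth)          # geometric constraint
    residual = predict_residual(error_map, up)
    return up + residual, error_map           # refined disparity
```

Where the upsampled disparity is already correct, the synthesized and real left images agree and the error map is near zero, so the residual only has to fix the remaining regions.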
3. Interaction between the HEDβ task and CP-RPN
(i) The aggregated edge channel feature is downsampled to varied sizes through a differentiable interpolation layer, then concatenated at the corresponding scale in residual pyramid.
(ii) The edge map output from HEDβ is also downsampled to varied sizes, serving as a regularization prior for learning disparity or residual map at corresponding scale.
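Interaction (i) can be sketched as a resize followed by a channel-wise concatenation. The code below is an illustration under my own assumptions rather than the paper's layer: I use a plain bilinear resize (align-corners style sampling) as the "differentiable interpolation", and `resize_bilinear` and `inject_edge_feature` are hypothetical names.

```python
import numpy as np

def resize_bilinear(feat, out_h, out_w):
    """Bilinear resize of a (C, H, W) array to (C, out_h, out_w)."""
    c, h, w = feat.shape
    ys = np.linspace(0, h - 1, out_h)   # source row for each output row
    xs = np.linspace(0, w - 1, out_w)   # source column for each output column
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    top = (1 - wx) * feat[:, y0][:, :, x0] + wx * feat[:, y0][:, :, x1]
    bottom = (1 - wx) * feat[:, y1][:, :, x0] + wx * feat[:, y1][:, :, x1]
    return (1 - wy) * top + wy * bottom

def inject_edge_feature(decoder_feats, edge_feat):
    """Concatenate the resized edge feature at every scale of the pyramid.

    decoder_feats: list of (C_i, H_i, W_i) arrays, one per scale.
    edge_feat: (C_e, H, W) aggregated edge channel feature.
    """
    fused = []
    for feat in decoder_feats:
        _, h, w = feat.shape
        resized = resize_bilinear(edge_feat, h, w)
        fused.append(np.concatenate([feat, resized], axis=0))
    return fused
```

The same `resize_bilinear` call could bring the single-channel edge map of interaction (ii) to each scale; how that map then enters the loss as a regularization prior is not spelled out in the passages quoted here, so I leave it out of the sketch.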
4. Contributions of the paper:
(i) We design a novel one-stage network for stereo matching, in which a context pyramid and a residual pyramid are embedded.