[Paper Summary] Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity... (ECCV 2018)

华科附小第一名

Last modified: 2023-03-24 18:58:46

First published: 2023-03-24 18:57:37

Article link: https://blog.csdn.net/qq_43307074/article/details/129618112

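The paper integrates occlusion estimation and depth or motion boundary estimation into a FlowNet 2.0-based deep network to improve disparity and optical flow estimation. Through joint training, the network estimates occlusions and boundaries more accurately, which improves motion segmentation, and it generalizes well across several datasets. Experiments show that the approach markedly improves the quality of occlusion and boundary estimates while also making the overall runtime faster.

(Summary generated automatically by CSDN.)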

Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation

I. Paper Overview

1. First authors: Eddy Ilg, Tonmoy Saikia

2. Year: 2018

3. Venue: ECCV (conference)

4. Keywords: disparity, optical flow, scene flow, occlusion, convolution

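5. Motivation: A typical way to estimate occluded areas is to compute correspondences in both directions and verify their consistency post hoc. However, occlusions and correspondences are mutually dependent, and the presence of occlusions already degrades the correspondence estimation itself. Such post-processing is therefore suboptimal and leads to unreliable occlusion estimates.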

The areas in one image that are occluded in the other image are important to get an indication of potentially unreliable estimates due to missing measurements. A typical approach to estimate occluded areas is by computing correspondences in both directions and verifying their consistency post-hoc. However, since occlusions and correspondences are mutually dependent and the presence of occlusions already negatively influences the correspondence estimation itself, post-processing is suboptimal and leads to unreliable occlusion estimates.
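For reference, the post-hoc baseline criticized here can be sketched as a forward-backward consistency check. This is a minimal NumPy sketch of the classical approach, not the paper's method: the criterion follows the form popularized by Sundaram et al. (ECCV 2010), and the function names `bilinear_sample` and `fb_occlusion` as well as the thresholds `alpha1`, `alpha2` are illustrative assumptions. A pixel x of image 1 is flagged when the forward flow F_fw(x) is not cancelled by the backward flow at its target, F_bw(x + F_fw(x)), or when that target lies outside image 2.

```python
import numpy as np


def bilinear_sample(field, x, y):
    """Bilinearly sample an (H, W, C) array at float coordinates (x, y).

    Coordinates are clamped to the image border.
    """
    h, w = field.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def fb_occlusion(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Post-hoc occlusion mask for image 1 from (H, W, 2) forward/backward flows."""
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    tx = xs + flow_fw[..., 0]  # target x in image 2
    ty = ys + flow_fw[..., 1]  # target y in image 2
    # Backward flow looked up at the target location x + F_fw(x)
    bw_at_target = bilinear_sample(flow_bw, tx, ty)
    # For consistent pixels, F_fw(x) + F_bw(x + F_fw(x)) is close to zero
    diff = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_target ** 2, axis=-1)
    inconsistent = diff > alpha1 * mag + alpha2
    # Pixels whose target leaves image 2 have no correspondence at all
    outside = (tx < 0) | (tx > w - 1) | (ty < 0) | (ty > h - 1)
    return inconsistent | outside
```

Because both input flows are themselves degraded in occluded regions, a mask computed this way inherits their errors, which is exactly the weakness the authors point out.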

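6. Goal: Besides occlusions, explicit depth boundaries (in disparity maps) and motion boundaries (in flow fields) are another valuable piece of extra information; the aim is to estimate occlusions and these boundaries explicitly, together with disparity or optical flow.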

Another valuable extra information in disparity maps and flow fields are explicit depth and motion boundaries, respectively. Referring to the classic work of Black & Jepson, "motion boundaries may be useful for navigation, structure from motion, video compression, perceptual organization and object recognition".

We show that training a network for occlusion estimation is clearly beneficial, especially if the trained network is combined with a network formulation of disparity or optical flow estimation. We do not try to disentangle the chicken-and-egg problem, but instead solve this problem using the joint training procedure.

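7. Core idea: Integrate occlusion estimation, as well as depth or motion boundary estimation, elegantly into a FlowNet 2.0-based deep network for disparity or optical flow estimation, and provide these quantities explicitly as outputs. Training a network for occlusion estimation turns out to be clearly beneficial, especially when it is combined with the network formulation of disparity or optical flow estimation. Rather than trying to disentangle the chicken-and-egg problem, the authors address it through joint training.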

We integrate occlusion estimation as well as depth or motion boundary estimation elegantly with a deep network for disparity or optical flow estimation based on FlowNet 2.0 and provide these quantities explicitly as output.
Using our predicted occlusions as input, we present a network that learns to interpolate the occluded areas to avoid the erroneous or missing information when computing the motion compensated difference between two disparity maps for scene flow.

8. Results:

In contrast to many prior works, this leads to much improved occlusion and boundary estimates and much faster overall runtimes. We quantify this improvement directly by measuring the accuracy of the occlusions and motion boundaries. We also quantify the effect of this improved accuracy on motion segmentation.
Furthermore, we improved on some details in the implementation of the disparity and optical flow estimation networks from, which gives us state-of-the-art results on the KITTI benchmarks.
Moreover, the networks show good generic performance on various datasets if we do not fine-tune them to a particular scenario. While these are smaller technical contributions, they are very relevant for applications of optical flow and disparity.
Finally, with state-of-the-art optical flow, disparity and occlusion estimates in place, we put everything together to achieve good scene flow performance at a high frame-rate, using only 2D motion information.
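The post does not say which accuracy measure is used for occlusions and motion boundaries. For binary maps like these, the F-measure (harmonic mean of precision and recall) is a common choice; the minimal sketch below assumes that metric, and the function name `f_measure` is hypothetical.

```python
import numpy as np


def f_measure(pred, gt):
    """F-measure of a binary prediction mask against a binary ground-truth mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)   # correctly detected occlusion/boundary pixels
    fp = np.count_nonzero(pred & ~gt)  # false alarms
    fn = np.count_nonzero(~pred & gt)  # misses
    denom = 2 * tp + fp + fn
    # 2TP / (2TP + FP + FN) equals 2PR / (P + R); defined as 1 when both masks are empty
    return 1.0 if denom == 0 else 2 * tp / denom
```

Unlike plain pixel accuracy, this score is not inflated by the large number of easy non-occluded, non-boundary pixels.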

9. Paper download:

https://openaccess.thecvf.com/content_ECCV_2018/papers/Eddy_Ilg_Occlusions_Motion_and_ECCV_2018_paper.pdf

II. Method

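Occlusions and depth or motion boundaries are estimated with a CNN, together with disparity and optical flow. To this end, the convolutional encoder-decoder architecture from FlowNet and the stacking scheme from FlowNet 2.0 are used. The modifications are shown in Figure 1(a).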

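Compared with FlowNet 2.0, the small-displacement network is dropped: experiments with the authors' re-implementation showed that the stack handles small displacements well without it. The former fusion network is kept; it also performs smoothing and sharpening (see Figure 1(a)). In network names this network is denoted by the letter "R" (e.g., FlowNet-CSSR). It serves only for refinement and does not see the second image. The stack is further modified by incorporating the suggestions of Pang et al. and adding residual connections to the refinement networks. As in FlowNet 2.0, the warped images are also given as input, but the photometric error input is omitted, since the network can easily compute it itself.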
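The warped-image input can be illustrated with a backward warp: image 2 is resampled at x + F(x) so that, for a correct flow F, it lines up with image 1. The minimal NumPy sketch below assumes bilinear interpolation with border clamping; the names `warp_backward` and `brightness_error` are hypothetical. The brightness error it computes is the kind of photometric-error input the authors drop, because the network can derive it from image 1 and the warped image 2 on its own.

```python
import numpy as np


def warp_backward(img2, flow):
    """Return img2 sampled at x + flow(x) with bilinear interpolation.

    img2: (H, W) or (H, W, C) array; flow: (H, W, 2) array of (dx, dy).
    """
    squeeze = img2.ndim == 2
    if squeeze:
        img2 = img2[..., None]
    h, w = img2.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)  # clamp sample positions to the border
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = img2[y0, x0] * (1 - wx) + img2[y0, x1] * wx
    bottom = img2[y1, x0] * (1 - wx) + img2[y1, x1] * wx
    out = top * (1 - wy) + bottom * wy
    return out[..., 0] if squeeze else out


def brightness_error(img1, img2, flow):
    """Per-pixel photometric error |img1 - warp(img2)|."""
    return np.abs(img1 - warp_backward(img2, flow))
```

Usage: with image content shifted one pixel to the right and a flow of (1, 0), the error is zero everywhere except the last column, where the sample position is clamped.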

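Finally, occlusions and depth or motion boundaries are added. Occlusions are important from the very beginning, whereas boundaries are needed only in the later refinement stages; hence the boundaries are added in the third network. The experiments also showed that when depth or motion boundary prediction was added to the earlier networks, these networks predicted details better but failed more severely when errors occurred. Predicting exact boundaries early contradicts the concept of a refinement pipeline.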
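The stage layout described above can be summarized as a small schematic. This is a hypothetical sketch (the `Stage` class and `build_stack` function are illustrative, not code from the paper): every network predicts flow (or disparity) and occlusions, boundaries are added from the third network on, and the refinement network "R" sees only image 1. Whether every network after the third keeps predicting boundaries is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    kind: str          # 'C' (correlation), 'S' (simple) or 'R' (refinement)
    sees_image2: bool  # the refinement network does not see the second image
    outputs: tuple     # quantities the stage predicts


def build_stack(spec="CSSR", boundary_from=3):
    """Describe a FlowNet-style stack such as 'CSSR' stage by stage."""
    stages = []
    for index, kind in enumerate(spec, start=1):
        outputs = ["flow", "occlusion"]  # occlusions matter from the first network on
        if index >= boundary_from:
            outputs.append("boundary")   # boundaries only in the later refinement stages
        stages.append(Stage(kind, kind != "R", tuple(outputs)))
    return stages


for stage in build_stack():
    print(stage.kind, stage.sees_image2, stage.outputs)
```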

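In general, in an occluded area the forward flow from the first image to the second does not match the backward flow from the second image to the first. If the forward flow is correctly interpolated into the occluded area, it resembles the flow of the background object. Since this object is not visible in the second image, the backward flow at the target location comes from a different object, so forward and backward flow are inconsistent. Many classical methods use this fact to determine occlusions. The authors bring this idea into the network architecture in Figure 1(b). In this version, the network jointly estimates forward and backward flow as well as occlusions. To this end, FlowNetC is modified to include a second correlation, which takes a feature vector from the second image and computes its correlation with a neighborhood in the first image. The outputs are concatenated, and a second skip connection is added for the second image. This setup is shown as FlowNetC-Bi in Figure 1(b). Throughout the stack, flow and occlusions are estimated in both the forward and the backward direction.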
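The bidirectional correlation of FlowNetC-Bi can be sketched as two cost volumes concatenated along the channel axis: one correlating each feature of image 1 with a neighborhood in image 2, and a second correlating each feature of image 2 with a neighborhood in image 1. This NumPy sketch simplifies FlowNetC's correlation layer to single-pixel patches and unit stride; `correlation`, `bidirectional_correlation` and `max_disp` are illustrative names, not the original implementation.

```python
import numpy as np


def correlation(f1, f2, max_disp):
    """Cost volume of shape (H, W, (2*max_disp+1)**2).

    Channel k holds mean_c f1(x) * f2(x + d) for the k-th displacement d,
    enumerated row-major over dy, dx in [-max_disp, max_disp]; f2 is zero outside.
    """
    h, w, _ = f1.shape
    size = 2 * max_disp + 1
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    out = np.zeros((h, w, size * size))
    k = 0
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shifted[y, x] == f2[y + dy, x + dx] (zero where out of range)
            shifted = padded[max_disp + dy:max_disp + dy + h,
                             max_disp + dx:max_disp + dx + w]
            out[..., k] = np.mean(f1 * shifted, axis=-1)
            k += 1
    return out


def bidirectional_correlation(f1, f2, max_disp):
    """Image 1 -> image 2 correlation plus the second, image 2 -> image 1 one."""
    return np.concatenate(
        [correlation(f1, f2, max_disp), correlation(f2, f1, max_disp)], axis=-1)
```

The two volumes carry the same products at mirrored positions (the 1-to-2 value for displacement d at x equals the 2-to-1 value for -d at x + d), but each is laid out on the pixel grid of its own reference image, which is what a decoder predicting both forward and backward flow needs.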

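In the third variant, Figure 1(c), forward and backward flow are estimated as separate flows, and after each network the two are mutually warped to the other direction. For example, after the first network the estimated backward flow is warped to the coordinates of the first image using the forward flow. The sign of the warped flow is then flipped, which effectively turns it into a forward flow. The next network thus receives the forward flow and the corresponding backward flow at the same pixel positions as input.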
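The mutual warping step of variant (c) can be sketched as follows: the backward flow is sampled at x + F_fw(x), which brings it into the coordinates of image 1, and its sign is flipped so that it reads as a forward flow. This is a minimal NumPy sketch with bilinear sampling; `sample_bilinear` and `backward_flow_in_frame1` are hypothetical names.

```python
import numpy as np


def sample_bilinear(field, x, y):
    """Bilinearly sample an (H, W, C) array at float coordinates, clamped to the border."""
    h, w = field.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def backward_flow_in_frame1(flow_fw, flow_bw):
    """Warp the backward flow into image-1 coordinates and flip its sign."""
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Backward flow at the forward target x + F_fw(x), now indexed by x
    warped_bw = sample_bilinear(flow_bw, xs + flow_fw[..., 0], ys + flow_fw[..., 1])
    # A consistent backward flow points back, so -warped_bw approximates F_fw
    return -warped_bw
```

Where the result agrees with the forward flow, the pixel is likely visible in both images; in occluded areas the warped backward flow stems from the occluding object and disagrees, which is the cue described for variant (b).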

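Finally, the networks are used to build a scene flow extension. For the scene flow task, the disparity at t = 0 is required, and the optical flow is extended by the disparity change (analogous to the change in the third coordinate). To compute the disparity change, one can estimate the disparity at t = 1, warp it to t = 0, and take the difference. However, wherever there are occlusions, the warping will be incorrect or undefined. Therefore, the network shown in Figure 1(d) is added; given the warped disparity, the occlusions and the images, it learns to interpolate these areas in a meaningful way.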
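The naive disparity-change computation that the network in Figure 1(d) improves on can be sketched like this: warp the t = 1 disparity to t = 0 along the optical flow, subtract the t = 0 disparity, and mark pixels that are occluded or leave the image as undefined. The sketch below is a hypothetical NumPy illustration (the function names and the NaN convention are assumptions); in the paper these undefined pixels are instead filled by a learned interpolation network.

```python
import numpy as np


def sample_bilinear(img, x, y):
    """Bilinearly sample a 2D array at float coordinates, clamped to the border."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def naive_disparity_change(disp0, disp1, flow, occ=None):
    """Disparity change d1(x + F(x)) - d0(x); NaN where the warp is unreliable."""
    h, w = disp0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    tx = xs + flow[..., 0]
    ty = ys + flow[..., 1]
    change = sample_bilinear(disp1, tx, ty) - disp0
    # Targets outside the t = 1 image have no disparity to compare against
    undefined = (tx < 0) | (tx > w - 1) | (ty < 0) | (ty > h - 1)
    if occ is not None:
        # Pixels of t = 0 not visible at t = 1: the warped value belongs to another surface
        undefined |= np.asarray(occ, dtype=bool)
    change[undefined] = np.nan
    return change
```

Usage: for a surface at disparity 10 that moves one pixel to the right and comes closer to disparity 12, the result is 2 at valid pixels and NaN in the occluded first column and the last column, whose target leaves the image.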

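In keeping with the style of the FlowNet series, the authors first propose a whole set of networks, shown in (a), (b) and (c) of the figure. Bnd stands for boundaries, Occ for occlusions, Ref for the fusion (refinement) network, and Aux for Img 0 and Warped Img 1.
(a) is the architecture finally adopted. Compared with FlowNet 1.0 and FlowNet 2.0 it has evolved considerably: for example, ResNet-like residual connections now appear between the networks, and the "small flow" (small-displacement) network that used to feed the final fusion network has been removed.
(b) and (c) are mostly there to round out the comparison: they test whether predicting forward and backward flow together helps Occ, or conversely whether Occ helps forward and backward flow prediction. (To be fair, this forward-backward flow idea was later taken up for unsupervised optical flow.)
Reference: https://blog.csdn.net/u012348774/article/details/111171577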

Experiments

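Since this paper focuses on Occ, the authors ran a large number of experiments. There are too many to go through here, so only the key conclusions are listed:

Predicting Occ from the two images with direct supervision from ground-truth occlusions works; additionally feeding in combined information such as forward and backward flow does not make it much better;
Adding Occ when predicting optical flow improves the flow results, and the Occ estimates are good as well; again, additionally bringing in forward and backward flow makes little difference;
Although the three models (a), (b) and (c) above each have their rationale, (b) and (c) are structurally more complex and harder to train, and end up performing worse than (a);
Finally, predicting Occ directly with a CNN works well.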