Paper Reading: "SGM-Net: Semi-global matching with neural networks"

CV科研随想录

Last modified: 2023-12-25 14:26:57


First published: 2022-03-31 17:52:05

Article link: https://blog.csdn.net/weixin_40957452/article/details/123846657




Paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Seki_SGM-Nets_Semi-Global_Matching_CVPR_2017_paper.pdf

Background

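The traditional SGM algorithm works well, but it depends heavily on the user's parameter-tuning experience: the penalty factors have a very large effect on its performance. The paper therefore proposes estimating these parameters with a CNN. SGM-Net takes a small image patch and its position as input and outputs penalty parameters that reflect the 3D object structure. To train the network, the paper proposes a loss function that uses sparsely annotated disparities. It also proposes a new SGM parameterization that adjusts the penalties according to positive and negative disparity changes, which makes it easier to distinguish different object structures.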

Network architecture

[Figure: SGM-Net network architecture]

SGM

For the classical SGM algorithm, the energy function is given by Eq. 1:
$E(D)=\sum_{x}(C(x,d^{x})+\sum_{y\in N_{x}}P_1T[|d^x-d^y|=1]+\sum_{y\in N_{x}}P_2T[|d^x-d^y|>1])\tag1$
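Here $C(x,d^{x})$ is the matching cost at pixel $x = (u, v)$ for disparity $d^x$, and $N_x$ is the neighbourhood of $x$. $P_1$ is a small penalty for neighbours of $x$ whose disparity differs from $d^x$ by exactly 1 (points lying on the same surface); $P_2$ is a large penalty for neighbours with a disparity discontinuity (difference greater than 1). The first term is the data term, the total matching cost; the last two are smoothness terms that keep the disparity map as smooth as possible. SGM minimizes this energy approximately by aggregating costs along one-dimensional paths; along direction $r$ the aggregated cost is given by Eq. 2: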
$L_{r}(\mathrm{x_0}, d)=\mathrm{C}(\mathrm{x_0}, d)+\min \left\{\begin{array}{l} L^\prime_{r}(\mathrm{x_0}-\mathrm{r}, d) \\ L^\prime_{r}(\mathrm{x_0}-\mathrm{r}, d-1)+P_{1} \\ L^\prime_{r}(\mathrm{x_0}-\mathrm{r}, d+1)+P_{1} \\ \underset{i}{\min} L^\prime_{r}(\mathrm{x_0}-\mathrm{r}, i)+P_{2} \end{array}\right\}-\min _{i} L^\prime_{r}(\mathrm{x_0}-\mathrm{r}, i)\tag2$
Finally, the disparity map is computed by winner-takes-all (WTA), as in Eq. 3:
$D(x_0)=\arg\min_{d}\sum_rL_r(x_0,d)\tag3$
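The recursion of Eq. 2 and the WTA step of Eq. 3 can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming a single scanline, constant $P_1$ and $P_2$, and just two path directions (left-to-right and right-to-left); the function name `aggregate_path` and the toy cost values are made up for the example and do not come from the paper.

```python
import numpy as np

def aggregate_path(cost, P1, P2):
    """Aggregate matching costs along one path as in Eq. 2.

    cost[n, d] is the matching cost of the n-th pixel on the path at
    disparity d; pixels are visited in order n = 0, 1, ..., N-1.
    """
    n_pix, n_disp = cost.shape
    L = np.empty((n_pix, n_disp), dtype=np.float64)
    L[0] = cost[0]                       # the first pixel has no predecessor
    for n in range(1, n_pix):
        prev = L[n - 1]
        prev_min = prev.min()
        from_below = np.full(n_disp, np.inf)
        from_above = np.full(n_disp, np.inf)
        from_below[1:] = prev[:-1] + P1  # predecessor at d - 1
        from_above[:-1] = prev[1:] + P1  # predecessor at d + 1
        jump = prev_min + P2             # predecessor at any disparity
        best = np.minimum(np.minimum(prev, from_below),
                          np.minimum(from_above, jump))
        L[n] = cost[n] + best - prev_min # subtract the minimum, as in Eq. 2
    return L

# Toy scanline (hypothetical values): 4 pixels, 3 disparities.
# Pixel 2 has a noisy raw minimum at d = 1.
cost = np.array([[0, 5, 5],
                 [0, 5, 5],
                 [4, 3, 5],
                 [0, 5, 5]], dtype=np.float64)

forward = aggregate_path(cost, P1=2.0, P2=6.0)
backward = aggregate_path(cost[::-1], P1=2.0, P2=6.0)[::-1]

raw_disparity = cost.argmin(axis=1)                  # WTA on raw costs
sgm_disparity = (forward + backward).argmin(axis=1)  # Eq. 3 with two directions
print(raw_disparity, sgm_disparity)
```

On this toy input the raw costs pick disparity 1 at pixel 2, while the aggregated costs pick disparity 0 everywhere, which is the smoothing effect the penalties are meant to produce.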

For a detailed introduction to SGM, see my other post: https://blog.csdn.net/weixin_40957452/article/details/121524482

SGM-Net

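SGM-Net works in two phases. In the training phase, the network learns to predict the penalties $P_1$ and $P_2$ for every pixel by minimizing the path loss and the neighbor loss. In the test phase, SGM uses the penalties predicted by SGM-Net to estimate disparity and produce the final disparity map.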

Standard parameterization

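[Figure: cost aggregation along a path in SGM]
The figure above shows cost aggregation along a path in SGM. From left to right are the pixels along the path; from top to bottom are the candidate disparities of each pixel, together with the selected optimum. The dashed line points to the correct disparity, and the orange and purple lines show two aggregation paths, which separate after $x_2$. By Eq. 2, when the cost at $x_1$ is computed, the cost of taking $d_1^{x_1}$ plus the $P_2$ penalty (the fourth item inside the min braces of Eq. 2) may equal the cost of taking $d_4^{x_1}$ plus the $P_1$ penalty (the third item inside the min braces). Which item is chosen is then undetermined, which shows that aggregation along a path in SGM is ambiguous.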

Path loss

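At pixel $x_0$, looking across the disparity range, the aggregated cost at the ground-truth disparity $d_{gt}^{\mathbf{x}_0}$ should be smaller than the aggregated cost at every other disparity $d_{i}^{\mathbf{x}_{0}}$, that is: $L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{i}^{\mathbf{x}_{0}}\right)>L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)$. Written as a hinge loss, this gives Eq. 4: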
$E_{g}=\sum_{d_{i}^{\mathbf{x}_{0}} \neq d_{g t}^{\mathbf{x}_{0}}} \max \left(0, L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)-L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{i}^{\mathbf{x}_{0}}\right)+m\right)\tag4$
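Eq. 4 says that the loss for a wrong disparity $d_{i}^{\mathbf{x}_{0}}$ is 0 if its aggregated cost exceeds the cost of $d_{gt}^{\mathbf{x}_0}$ by at least the margin $m$; otherwise the loss is $L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)-L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{i}^{\mathbf{x}_{0}}\right)+m$. The cost at $x_0$ is aggregated along the path; for the two paths in the figure it is given by Eq. 5: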
$\begin{aligned} L\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right) &=c\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)+c\left(\mathbf{x}_{1}, d_{1}^{\mathbf{x}_{1}}\right)+c\left(\mathbf{x}_{2}, d_{3}^{\mathbf{x}_{2}}\right) \\ &+c\left(\mathbf{x}_{3}, d_{3}^{\mathbf{x}_{3}}\right)+P_{2}\left(\mathbf{x}_{2}\right)-\beta \\\\ L\left(\mathbf{x}_{0}, d_{5}^{\mathbf{x}_{0}}\right) &=c\left(\mathbf{x}_{0}, d_{5}^{\mathbf{x}_{0}}\right)+c\left(\mathbf{x}_{1}, d_{4}^{\mathbf{x}_{1}}\right)+c\left(\mathbf{x}_{2}, d_{3}^{\mathbf{x}_{2}}\right) \\ &+c\left(\mathbf{x}_{3}, d_{3}^{\mathbf{x}_{3}}\right)+P_{1}\left(\mathbf{x}_{1}\right)+P_{1}\left(\mathbf{x}_{2}\right)-\beta \end{aligned}\tag5$
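Here $L\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)$ corresponds to the orange aggregation path and $L\left(\mathbf{x}_{0}, d_{5}^{\mathbf{x}_{0}}\right)$ to the purple one; $\beta$ denotes the minimum path costs subtracted in Eq. 2. The accumulated path cost can therefore be written in the form of Eq. 6: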
$\begin{array}{r} L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{i}^{\mathbf{x}_{0}}\right)=\gamma+\sum_{n}\left(P_{1, \mathbf{r}}\left(\mathbf{x}_{n}\right) T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{i}^{\mathbf{x}_{0}}}\right|=1\right]\right. \left.+P_{2, \mathbf{r}}\left(\mathbf{x}_{n}\right) T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{i}^{\mathbf{x}_{0}}}\right|>1\right]\right) \end{array}\tag6$
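Here $T[|\delta d^{\mathbf{x}_{n} \leftarrow d_{i}^{\mathbf{x}_{0}}}|>1]$ marks the points on the aggregation path ending at $d_i^{x_0}$ where the disparities of consecutive pixels differ by more than 1. For example, on the orange path from $x_2$ to $x_1$, the disparity at $x_2$ is $d_3$ and the disparity at $x_1$ is $d_1$; the difference is 2, so $T$ equals 1 and a $P_2$ penalty is applied. $\gamma$ is the matching cost accumulated along the path (excluding the per-pixel minimum cost) and does not depend on $P_1$ or $P_2$. Substituting Eq. 6 into Eq. 4 gives the loss $E_g$; taking its partial derivatives with respect to $P_1$ and $P_2$ yields: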
$\begin{array}{l} \frac{\partial E_{g}}{\partial P_{1, \mathbf{r}}}=\sum_{d_{i}^{\mathbf{x}_{0}} \neq d_{g t}^{\mathbf{x}_{0}} }\sum_n\left(T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{g t}^{\mathbf{x}_{0}}}\right|=1\right]-T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{i}^{\mathbf{x}_{0}}}\right|=1\right]\right)\\\\ \frac{\partial E_{g}}{\partial P_{2, \mathbf{r}}}=\sum_{d_{i}^{\mathbf{x}_{0}} \neq d_{g t}^{\mathbf{x}_{0}}} \sum_{n}\left(T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{g t}^{\mathbf{x}_{0}}}\right|>1\right]-T\left[\left|\delta d^{\mathbf{x}_{n} \leftarrow d_{i}^{\mathbf{x}_{0}}}\right|>1\right]\right) \end{array}\tag7$
For example, substituting Eq. 5 into Eq. 4 and differentiating gives:
$\begin{array}{r} \frac{\partial E_{g}}{\partial P_{1}\left(\mathrm{x}_{1}\right)}=-1, \frac{\partial E_{g}}{\partial P_{2}\left(\mathrm{x}_{1}\right)}=0, \frac{\partial E_{g}}{\partial P_{2}\left(\mathrm{x}_{2}\right)}=1, \\\\ \text { when } E_{g}=L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{g t}^{\mathbf{x}_{0}}\right)-L_{\mathbf{r}}\left(\mathbf{x}_{0}, d_{5}^{\mathbf{x}_{0}}\right)+m>0 \end{array}\tag8$
With these gradients available, the network can be trained in the standard way, by backpropagation.
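The worked example of Eq. 5 and Eq. 8 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: each path is given as the list of disparity indices chosen at $x_0, x_1, x_2, x_3$; the ground-truth disparity at $x_0$ is taken to be $d_1$ (Eq. 5 has no penalty between $x_0$ and $x_1$ on that path); and the values of $\gamma$, $P_1$, $P_2$ and $m$ are invented. It evaluates a single term of the sum in Eq. 4, and all function names are hypothetical.

```python
import numpy as np

def transition_indicators(path_disp):
    """Indicators T[|delta d| = 1] and T[|delta d| > 1] of Eq. 6.

    path_disp[n] is the disparity index chosen at x_n; entry n of each
    result refers to the step between x_n and x_{n-1}, whose penalty is
    P(x_n) as in Eq. 5.
    """
    d = np.asarray(path_disp)
    step = np.abs(d[1:] - d[:-1])
    t1 = np.zeros(len(d))
    t2 = np.zeros(len(d))
    t1[1:] = step == 1
    t2[1:] = step > 1
    return t1, t2

def path_cost(gamma, t1, t2, P1, P2):
    """Accumulated path cost of Eq. 6 with per-pixel penalties."""
    return gamma + float(np.sum(P1 * t1 + P2 * t2))

def hinge_term(gamma_gt, path_gt, gamma_i, path_i, P1, P2, m):
    """One term of Eq. 4 and its gradient w.r.t. P1(x_n), P2(x_n) (Eq. 7)."""
    t1_gt, t2_gt = transition_indicators(path_gt)
    t1_i, t2_i = transition_indicators(path_i)
    loss = (path_cost(gamma_gt, t1_gt, t2_gt, P1, P2)
            - path_cost(gamma_i, t1_i, t2_i, P1, P2) + m)
    if loss <= 0:                        # hinge inactive: zero loss, zero gradient
        zeros = np.zeros(len(t1_gt))
        return 0.0, zeros, zeros.copy()
    return loss, t1_gt - t1_i, t2_gt - t2_i

# Paths of Eq. 5 as disparity indices at x_0..x_3 (d_gt at x_0 assumed to be d_1).
path_gt = [1, 1, 3, 3]
path_5 = [5, 4, 3, 3]
P1 = np.full(4, 1.0)                     # hypothetical per-pixel penalties
P2 = np.full(4, 4.0)
loss, dP1, dP2 = hinge_term(10.0, path_gt, 9.0, path_5, P1, P2, m=2.0)
print(loss, dP1, dP2)
```

With these numbers the hinge is active, and the gradient entries at $x_1$ and $x_2$ reproduce Eq. 8: $-1$ for $P_1(x_1)$, $0$ for $P_2(x_1)$ and $+1$ for $P_2(x_2)$.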

Neighbor loss

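As the analysis above shows, aggregation along a path is ambiguous, so a neighbor loss is introduced to reduce this uncertainty. Its core idea is that the correct aggregation path should pass through the correct disparities and have the lowest cost, as illustrated below:
[Figure: correct (red) and incorrect (green) paths at (a) a border, (b) a slanted surface, (c) a flat surface]
In the figure, the red line is the correct path and the green line an incorrect one. Panel (a) shows a border region, where the disparity of neighbouring pixels changes sharply; panel (b) a slanted region, where it changes only slightly; panel (c) a flat region, where neighbouring pixels have the same disparity. In all three panels the cost along the red line, $F_b(\cdot)$, $F_s(\cdot)$ or $F_f(\cdot)$, should be smaller than the cost $N(\cdot)$ along the green line. The neighbor loss is given by Eq. 9: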
$E_{n_{X}}=\sum_{d \neq d_{g t}^{\mathbf{x}_{1}}} \max \left(0, F_{X}\left(\mathbf{x}_{1}, d_{g t}^{\mathbf{x}_{1}}\right)-N\left(\mathbf{x}_{1}, d_{g t}^{\mathbf{x}_{0}}, d\right)+m\right)\tag9$
where $N(\cdot)$ is given by Eq. 10:
$\begin{aligned} N\left(\mathbf{x}_{1}, d_{g t}^{\mathrm{x}_{0}}, d\right)=L_{\mathbf{r}}\left(\mathbf{x}_{1}, d\right) &+P_{1, \mathbf{r}}\left(\mathbf{x}_{1}\right) T\left[\left|d_{g t}^{\mathbf{x}_{0}}-d\right|=1\right] +P_{2, \mathbf{r}}\left(\mathbf{x}_{1}\right) T\left[\left|d_{g t}^{\mathbf{x}_{0}}-d\right|>1\right] \end{aligned}\tag{10}$
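$F_X(\cdot)$ depends on the type of disparity change between the neighbouring pixels: $F_b(\cdot)$ is the border loss, $F_s(\cdot)$ the slanted-surface loss, and $F_f(\cdot)$ the flat-surface loss.
Border loss: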
$F_{b}\left(\mathbf{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right)=L_{\mathbf{r}}\left(\mathbf{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right)+P_{2, \mathbf{r}}\left(\mathbf{x}_{1}\right)\tag{11}$
Slanted-surface loss:
$F_{s}\left(\mathbf{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right)=L_{\mathbf{r}}\left(\mathbf{x}_{1}, d_{g t}^{\mathbf{x}_{1}}\right)+P_{1, \mathbf{r}}\left(\mathbf{x}_{1}\right) \tag{12}$
Flat-surface loss:
$F_{f}\left(\mathbf{x}_{1}, d_{g t}^{\mathbf{x}_{1}}\right)=L_{\mathbf{r}}\left(\mathbf{x}_{1}, d_{g t}^{\mathbf{x}_{1}}\right)\tag{13}$
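Like Eq. 4, Eq. 9 is differentiable.
[Figure 5: disparity maps obtained with the different losses]
Figure 5 compares the different losses. Using the neighbor loss makes the disparity map sharper, but it also adds a lot of noise; combining the path loss with the neighbor loss gives noticeably better results. The total loss is given by Eq. 14: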
$E=\sum_{\mathbf{r} \in R}\left(\sum_{\mathbf{x}_{1}, \mathbf{x}_{0} \in G_{b}} E_{n_{b}}+\sum_{\mathbf{x}_{1}, \mathbf{x}_{0} \in G_{s}} E_{n_{s}}+\sum_{\mathbf{x}_{1}, \mathbf{x}_{0} \in G_{f}} E_{n_{f}}+\xi \sum_{\mathbf{x}_{0} \in G} E_{g}\right)\tag{14}$
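Here $\xi$ is a weighting parameter. For each direction $r$, the same number of border, slant and flat pixels is selected at random. At flat pixels, $x_1$ and $x_0$ have the same disparity; at slant pixels, their disparities differ by 1 (+1 or -1); at border pixels, their disparities differ by more than 1, and one of the multiple disparity candidates is chosen at random.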
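Eqs. 9 to 13 and the flat/slant/border rule above can be put together in a short NumPy sketch. It assumes that the aggregated costs $L_{\mathbf{r}}(\mathbf{x}_1, d)$ are already available as a vector over disparities and that $P_1(\mathbf{x}_1)$, $P_2(\mathbf{x}_1)$ are scalars; the function names and all numbers are hypothetical.

```python
import numpy as np

def pair_type(d_gt_x0, d_gt_x1):
    """Classify a neighbouring pair by its ground-truth disparity change."""
    diff = abs(d_gt_x0 - d_gt_x1)
    if diff == 0:
        return "flat"
    if diff == 1:
        return "slant"
    return "border"

def neighbor_loss(L_x1, P1_x1, P2_x1, d_gt_x0, d_gt_x1, m):
    """Neighbor loss E_n of Eq. 9 for one pair (x_1, x_0).

    L_x1[d] is the aggregated cost L_r(x_1, d).
    """
    kind = pair_type(d_gt_x0, d_gt_x1)
    extra = {"flat": 0.0, "slant": P1_x1, "border": P2_x1}[kind]
    F = L_x1[d_gt_x1] + extra                    # Eqs. 11-13: correct path
    d = np.arange(len(L_x1))
    diff = np.abs(d_gt_x0 - d)
    N = L_x1 + P1_x1 * (diff == 1) + P2_x1 * (diff > 1)   # Eq. 10
    hinge = np.maximum(0.0, F - N + m)
    hinge[d_gt_x1] = 0.0                         # the sum skips d = d_gt at x_1
    return kind, float(hinge.sum())

L_x1 = np.array([3.0, 1.0, 4.0, 2.0])            # hypothetical aggregated costs
flat = neighbor_loss(L_x1, 1.0, 3.0, d_gt_x0=1, d_gt_x1=1, m=1.0)
border = neighbor_loss(L_x1, 1.0, 3.0, d_gt_x0=3, d_gt_x1=0, m=1.0)
print(flat, border)
```

In the flat case the ground-truth disparity already has the lowest aggregated cost, so the loss is zero. In the border case the aggregated cost at the ground-truth disparity of $x_1$ is not the lowest, so every competing disparity contributes to the loss.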

Signed parameterization

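[Figure: signed parameterization]
In the discussion so far, a $P_2$ penalty is applied when the optimal disparities of neighbouring pixels differ by more than 1, and a $P_1$ penalty when they differ by exactly 1. To increase the expressive power of the model, the sign of the disparity difference is now taken into account as well: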

When $d^{x_0}-d^{x_1}=1$, the penalty is $P_1^+$
When $d^{x_0}-d^{x_1}=-1$, the penalty is $P_1^-$
When $d^{x_0}-d^{x_1}>1$, the penalty is $P_2^+$
When $d^{x_0}-d^{x_1}<-1$, the penalty is $P_2^-$

$L_{\mathbf{r}}^{\prime \pm}\left(\mathbf{x}_{0}, d\right)=c\left(\mathbf{x}_{0}, d\right)+\min\begin{Bmatrix} L_{\mathbf{r}}^{\prime \pm}(\mathbf{x}_{1}, d)\\\\ \min _{i=d \pm 1}\left( L_{\mathbf{r}}^{\prime \pm}\left(\mathbf{x}_{1}, i\right)+P_{1, \mathbf{r}}^{+} \underbrace{T[d-i=1]}_{T_{1}^{+}[\cdot]}+P_{1, \mathbf{r}}^{-} \underbrace{T[i-d=1]}_{T_{1}^{-}[\cdot]}\right)\\\\ \min _{i \neq d \pm 1}\left( L_{\mathbf{r}}^{\prime \pm}\left(\mathbf{x}_{1}, i\right)+P_{2, \mathbf{r}}^{+} \underbrace{T[i<d]}_{T_{2}^{+}[\cdot]}+P_{2, \mathbf{r}}^{-} \underbrace{T[i>d]}_{T_{2}^{-}[\cdot]}\right) \end{Bmatrix}\tag{15}$
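One step of the signed recursion in Eq. 15 can be sketched as a minimum over a penalty matrix indexed by the disparity $d$ at $\mathbf{x}_0$ and the disparity $i$ at $\mathbf{x}_1$. This is a sketch that assumes scalar penalties at $\mathbf{x}_1$ and, like Eq. 15, omits the subtraction of the minimum; the function names and penalty values are hypothetical.

```python
import numpy as np

def signed_penalties(n_disp, P1p, P1m, P2p, P2m):
    """Penalty pen[d, i] for moving from disparity i at x_1 to d at x_0."""
    d = np.arange(n_disp)[:, None]       # disparity at x_0
    i = np.arange(n_disp)[None, :]       # disparity at x_1
    diff = d - i
    pen = np.zeros((n_disp, n_disp))
    pen[diff == 1] = P1p                 # T_1^+ : d - i = 1
    pen[diff == -1] = P1m                # T_1^- : i - d = 1
    pen[diff > 1] = P2p                  # T_2^+ : i < d, not adjacent
    pen[diff < -1] = P2m                 # T_2^- : i > d, not adjacent
    return pen

def signed_step(prev, cost_x0, P1p, P1m, P2p, P2m):
    """L'(x_0, d) = c(x_0, d) + min_i (L'(x_1, i) + pen[d, i])."""
    pen = signed_penalties(len(prev), P1p, P1m, P2p, P2m)
    return cost_x0 + (prev[None, :] + pen).min(axis=1)

cost_x0 = np.array([1.0, 1.0, 1.0])
# Cheap positive changes, expensive negative changes (hypothetical values).
up = signed_step(np.array([0.0, 5.0, 5.0]), cost_x0, 1.0, 4.0, 2.0, 8.0)
down = signed_step(np.array([5.0, 5.0, 0.0]), cost_x0, 1.0, 4.0, 2.0, 8.0)
print(up, down)
```

With the same magnitude of disparity change, moving upward from a low-cost disparity 0 adds only 1 or 2 to the cost, while moving downward from a low-cost disparity 2 adds 4 or more, which is the asymmetry that the four signed penalties make possible.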
With the signed parameterization, replacing $L_r$ in Eq. 6 by $L_r^\pm$ gives:
$L_{\mathbf{r}}^{\pm}=\gamma+\sum_{n}\left(P_{1, \mathbf{r}}^{+} T_{1}^{+}[\cdot]+P_{1, \mathbf{r}}^{-} T_{1}^{-}[\cdot]+P_{2, \mathbf{r}}^{+} T_{2}^{+}[\cdot]+P_{2, \mathbf{r}}^{-} T_{2}^{-}[\cdot]\right)\tag{16}$
Eq. 10 becomes:
$\begin{aligned} N^{\pm}\left(\mathrm{x}_{1}, d_{g t}^{\mathbf{x}_{0}}, d\right) &=L_{\mathbf{r}}^{\pm}\left(\mathrm{x}_{1}, d\right) +P_{1, \mathbf{r}}^{+}\left(\mathrm{x}_{1}\right) T[\delta=1]+P_{1, \mathbf{r}}^{-}\left(\mathrm{x}_{1}\right) T[\delta=-1] +P_{2, \mathbf{r}}^{+}\left(\mathrm{x}_{1}\right) T[\delta>1]+P_{2, \mathbf{r}}^{-}\left(\mathrm{x}_{1}\right) T[\delta<-1] \end{aligned}\tag{17}$
where $\delta=d_{g t}^{\mathrm{x}_{0}}-d$.
$F_b$ becomes Eq. 18:
$\begin{aligned} F_{b}^{\pm}\left(\mathrm{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right)=L_{\mathbf{r}}^{\pm}\left(\mathrm{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right) +P_{2, \mathbf{r}}^{+}\left(\mathrm{x}_{1}\right) T\left[d_{g t}^{\mathrm{x}_{0}}>d_{g t}^{\mathrm{x}_{1}}\right] +P_{2, \mathbf{r}}^{-}\left(\mathrm{x}_{1}\right) T\left[d_{g t}^{\mathrm{x}_{0}}<d_{g t}^{\mathrm{x}_{1}}\right] \end{aligned}\tag{18}$
$F_s$ becomes Eq. 19:
$\begin{aligned} F_{s}^{\pm}\left(\mathrm{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right)=L_{\mathbf{r}}^{\pm}\left(\mathrm{x}_{1}, d_{g t}^{\mathrm{x}_{1}}\right) +P_{1, \mathbf{r}}^{+}\left(\mathbf{x}_{1}\right) T\left[d_{g t}^{\mathrm{x}_{0}}-d_{g t}^{\mathrm{x}_{1}}=1\right] +P_{1, \mathbf{r}}^{-}\left(\mathbf{x}_{1}\right) T\left[d_{g t}^{\mathrm{x}_{1}}-d_{g t}^{\mathrm{x}_{0}}=1\right] \end{aligned}\tag{19}$
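The signed terms of Eqs. 17 to 19 differ from Eqs. 10 to 12 only in which of the four penalties is selected, so they can be sketched with one shared helper. The code below is a rough sketch assuming scalar penalties at $\mathbf{x}_1$ and a precomputed vector of aggregated costs; it also assumes that the flat case keeps the form of Eq. 13, since that equation contains no penalty term. Function names and numbers are hypothetical.

```python
import numpy as np

def signed_penalty(delta, P1p, P1m, P2p, P2m):
    """Select the penalty from the signed disparity change delta."""
    delta = np.asarray(delta)
    return np.where(delta == 1, P1p,
           np.where(delta == -1, P1m,
           np.where(delta > 1, P2p,
           np.where(delta < -1, P2m, 0.0))))

def signed_F(L_x1, d_gt_x0, d_gt_x1, P1p, P1m, P2p, P2m):
    """Cost of the correct path: Eq. 18 (border), Eq. 19 (slant), Eq. 13 (flat)."""
    pen = signed_penalty(d_gt_x0 - d_gt_x1, P1p, P1m, P2p, P2m)
    return float(L_x1[d_gt_x1] + pen)

def signed_N(L_x1, d_gt_x0, P1p, P1m, P2p, P2m):
    """Cost of every competing path, Eq. 17, with delta = d_gt^{x_0} - d."""
    delta = d_gt_x0 - np.arange(len(L_x1))
    return L_x1 + signed_penalty(delta, P1p, P1m, P2p, P2m)

L_x1 = np.array([3.0, 1.0, 4.0, 2.0])    # hypothetical aggregated costs at x_1
pens = dict(P1p=1.0, P1m=2.0, P2p=3.0, P2m=5.0)

F_up = signed_F(L_x1, d_gt_x0=3, d_gt_x1=0, **pens)      # positive jump: P2+
N_up = signed_N(L_x1, d_gt_x0=3, **pens)
F_down = signed_F(L_x1, d_gt_x0=0, d_gt_x1=3, **pens)    # negative jump: P2-
N_down = signed_N(L_x1, d_gt_x0=0, **pens)
print(F_up, N_up, F_down, N_down)
```

Plugging `signed_F` and `signed_N` into the hinge of Eq. 9 in place of $F_X$ and $N$ would give the signed neighbor loss.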