【Stereo Matching Cost Volume Taxonomy】《Attention Concatenation Volume for Accurate and Efficient Stereo Matching》: Taxonomy + Code

Cost Volume for Stereo Matching

The content of this article is mainly drawn from the following sources:


【Paper】《Attention Concatenation Volume for Accurate and Efficient Stereo Matching》
【Paper】《Accurate and Efficient Stereo Matching via Attention Concatenation Volume》

【Code】https://github.com/gangweiX/ACVNet
【Code】https://github.com/gangweiX/Fast-ACVNet

1. Introduction to Cost Volume

Recently, convolutional neural networks have exhibited great potential in this field. State-of-the-art CNN stereo models typically consist of four steps: feature extraction, cost volume construction, cost aggregation, and disparity regression. The cost volume, which provides initial similarity measures between left-image pixels and their possible corresponding right-image pixels, is a crucial step of stereo matching. An informative and concise cost volume representation from this step is vital for the final accuracy and computational complexity. Learning-based methods explore different cost volume representations.
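The four steps can be made concrete in a minimal, runnable PyTorch sketch. The tiny conv layers below are placeholder stand-ins for real feature-extraction and aggregation networks, not ACVNet code:

import torch
import torch.nn.functional as F

B, H, W, maxdisp = 1, 64, 128, 48
left, right = torch.randn(B, 3, H, W), torch.randn(B, 3, H, W)
feature_net = torch.nn.Conv2d(3, 32, kernel_size=3, padding=1)  # 1) feature extraction (stand-in)
aggregation = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)   # 3) cost aggregation (stand-in)

feat_l, feat_r = feature_net(left), feature_net(right)          # [B, 32, H, W]

# 2) cost volume construction: single-channel correlation per disparity level
volume = feat_l.new_zeros(B, 1, maxdisp, H, W)
for d in range(maxdisp):
    if d > 0:
        volume[:, :, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(1, keepdim=True)
    else:
        volume[:, :, d] = (feat_l * feat_r).mean(1, keepdim=True)

# 3) aggregation and 4) soft-argmin disparity regression
cost = aggregation(volume).squeeze(1)                           # [B, maxdisp, H, W]
prob = F.softmax(cost, dim=1)                                   # per-pixel disparity distribution
disp_values = torch.arange(maxdisp, dtype=prob.dtype).view(1, maxdisp, 1, 1)
disparity = (prob * disp_values).sum(1)                         # [B, H, W]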

【1】Full correlation volume

DispNetC [3] computes a single-channel full correlation volume between the left and right feature maps at every disparity level. Such a full correlation volume provides an efficient way of measuring similarities, but it loses much content information.
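In symbols (f_l, f_r are the left/right feature maps with C channels and d is the disparity level; this is the standard formulation, up to a normalization constant, rather than a verbatim equation from the paper):

$$ \mathbf{C}_{corr}(d, x, y) = \frac{1}{C}\,\big\langle \mathbf{f}_l(x, y),\ \mathbf{f}_r(x - d, y) \big\rangle $$

Each disparity level is reduced to a single similarity score, so the volume is compact ([D, H, W] plus batch) but discards the per-channel content of the features.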

【2】4D concatenation volume

GC-Net [4] constructs a 4D concatenation volume by concatenating left and right feature maps along all disparity levels to provide abundant content information. However, the concatenation volume does not provide explicit similarity measurements, and thus requires extensive 3D convolutions for cost aggregation to learn similarity measurements from scratch.
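In the same notation, with || denoting channel-wise concatenation, the concatenation volume keeps all 2C feature channels per disparity level instead of reducing them to a similarity score (this matches build_concat_volume in Section 2.2 below):

$$ \mathbf{C}_{concat}(d, x, y) = \mathbf{f}_l(x, y) \,\Vert\, \mathbf{f}_r(x - d, y) $$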

【3】Group-wise correlation volume

To tackle the above drawbacks, GwcNet [5] concatenates the group-wise correlation volume with a compact concatenation volume to encode both matching and content information in the final 4D cost volume. However, the data distributions and characteristics of a correlation volume and a concatenation volume are quite different: the former represents similarity measurements obtained through dot products, while the latter is a concatenation of unary features. Simply concatenating the two volumes and regularizing them via 3D convolutions can hardly exploit the advantages of both to the full. As a result, GwcNet [5] still requires extensive 3D convolutions for cost aggregation.
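In symbols, the C channels are split into N_g groups f^1, ..., f^{N_g}, and each group is reduced to one correlation channel per disparity level. This is exactly what groupwise_correlation in Section 2.3 below computes, since averaging over the C/N_g channels of a group equals the inner product scaled by N_g/C:

$$ \mathbf{C}_{gwc}(g, d, x, y) = \frac{N_g}{C}\,\big\langle \mathbf{f}_l^{\,g}(x, y),\ \mathbf{f}_r^{\,g}(x - d, y) \big\rangle $$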

【4】Attention Concatenation Volume

We propose an Attention Concatenation Volume (ACV) which exploits a lightweight correlation volume to generate attention weights that filter a concatenation volume (see Fig. 1). The ACV can achieve high accuracy while significantly alleviating the burden of cost aggregation. Experimental results show that after replacing the combined volume of GwcNet with our ACV, only four 3D convolutions for cost aggregation can achieve better accuracy than GwcNet, which employs twenty-eight 3D convolutions for cost aggregation. Our ACV is a general cost volume representation that can be seamlessly integrated into various 3D CNN stereo models for performance improvement. Results show that after applying our method, PSMNet [13] and GwcNet [5] achieve an additional 28% and 39% accuracy improvement, respectively.
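Schematically, and paraphrasing Fig. 1 rather than quoting an equation from the paper (in the code of Section 2.4 the attention branch actually takes the group-wise correlation volume processed by patch convolutions as input): a lightweight correlation volume is regularized by a small 3D conv network F_3D and normalized over the disparity dimension to give attention weights A, which then filter every channel of the concatenation volume:

$$ \mathbf{A} = \operatorname{softmax}_{d}\!\big(\mathcal{F}_{3D}(\mathbf{C}_{corr})\big), \qquad \mathbf{C}_{ACV}(:, d, x, y) = \mathbf{A}(d, x, y)\cdot \mathbf{C}_{concat}(:, d, x, y) $$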

【5】Other Cost Volumes

To further reduce memory and computational complexity, several methods [6], [7], [8] employ a cascade cost volume, which builds a cost volume pyramid in a coarse-to-fine manner to progressively narrow down the target disparity range. However, these cascaded methods need to re-construct and re-aggregate a cost volume at each stage without reusing the prior information in the probability volume of the previous stage, yielding low utilization efficiency. In addition, these cascaded methods can suffer from irreversible cumulative errors, as they directly discard disparities that fall beyond the prediction range of previous stages (see the sketch after the references below).

[6] Z. Shen, Y. Dai, and Z. Rao, "CFNet: Cascade and fused cost volume for robust stereo matching," in CVPR, 2021, pp. 13906–13915.
[7] X. Gu, Z. Fan, S. Zhu, Z. Dai, F. Tan, and P. Tan, "Cascade cost volume for high-resolution multi-view stereo and stereo matching," in CVPR, 2020, pp. 2495–2504.
[8] S. Cheng, Z. Xu, S. Zhu, Z. Li, L. E. Li, R. Ramamoorthi, and H. Su, "Deep stereo using adaptive thin volume representation with uncertainty awareness," in CVPR, 2020, pp. 2524–2534.
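A rough, self-contained sketch of the coarse-to-fine idea (hypothetical numbers and helper name, not code from [6], [7], [8]): each stage re-centers the disparity search range on the previous stage's prediction and shrinks it, so a disparity outside an early range can never be recovered later.

import torch

def cascade_search_ranges(init_disp, max_disp=192, num_stages=3, shrink=0.5):
    # init_disp: [H, W] coarse disparity prediction from the first stage
    center, radius = init_disp, max_disp * shrink / 2.0
    ranges = []
    for stage in range(num_stages):
        low = (center - radius).clamp(min=0)          # per-pixel lower bound
        high = (center + radius).clamp(max=max_disp)  # per-pixel upper bound
        ranges.append((low, high))
        # ... build + aggregate a cost volume over [low, high], predict a new center ...
        radius *= shrink                              # narrower range for the next stage
    return ranges

coarse = torch.full((60, 80), 40.0)                   # pretend stage-0 prediction
for k, (lo, hi) in enumerate(cascade_search_ranges(coarse)):
    print(f"stage {k + 1}: mean range width {(hi - lo).mean().item():.1f} px")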

2. Code

2.1 Full correlation volume

import torch

def norm_correlation(fea1, fea2):
    # Cosine-style correlation: L2-normalize each feature vector along the
    # channel dimension (the eps avoids division by zero), then average the
    # element-wise product over channels -> a single-channel similarity map.
    cost = torch.mean(
        (fea1 / (torch.norm(fea1, 2, 1, True) + 1e-05)) *
        (fea2 / (torch.norm(fea2, 2, 1, True) + 1e-05)),
        dim=1, keepdim=True)
    return cost

def build_norm_correlation_volume(refimg_fea, targetimg_fea, maxdisp):
    # Full (single-channel) correlation volume: for each disparity level i,
    # correlate the left features with the right features shifted by i pixels.
    B, C, H, W = refimg_fea.shape
    volume = refimg_fea.new_zeros([B, 1, maxdisp, H, W])
    for i in range(maxdisp):
        if i > 0:
            # columns [0, i) of the left image have no valid match and stay zero
            volume[:, :, i, :, i:] = norm_correlation(refimg_fea[:, :, :, i:], targetimg_fea[:, :, :, :-i])
        else:
            volume[:, :, i, :, :] = norm_correlation(refimg_fea, targetimg_fea)
    volume = volume.contiguous()
    return volume

2.2 4D concatenation volume

def build_concat_volume(refimg_fea, targetimg_fea, maxdisp):
    # 4D concatenation volume (GC-Net style): at each disparity level i, stack
    # the left features on top of the right features shifted by i pixels,
    # giving 2*C channels and no explicit similarity measurement.
    B, C, H, W = refimg_fea.shape
    volume = refimg_fea.new_zeros([B, 2 * C, maxdisp, H, W])
    for i in range(maxdisp):
        if i > 0:
            volume[:, :C, i, :, :] = refimg_fea
            volume[:, C:, i, :, i:] = targetimg_fea[:, :, :, :-i]
        else:
            volume[:, :C, i, :, :] = refimg_fea
            volume[:, C:, i, :, :] = targetimg_fea
    volume = volume.contiguous()
    return volume

2.3 Group-wise correlation volume

def groupwise_correlation(fea1, fea2, num_groups):
    # Split the C channels into num_groups groups and compute the mean dot
    # product within each group -> one correlation channel per group.
    B, C, H, W = fea1.shape
    assert C % num_groups == 0
    channels_per_group = C // num_groups
    cost = (fea1 * fea2).view([B, num_groups, channels_per_group, H, W]).mean(dim=2)
    assert cost.shape == (B, num_groups, H, W)
    return cost

def build_gwc_volume(refimg_fea, targetimg_fea, maxdisp, num_groups):
    # Group-wise correlation volume (GwcNet style): num_groups similarity
    # channels per disparity level instead of a single channel.
    B, C, H, W = refimg_fea.shape
    volume = refimg_fea.new_zeros([B, num_groups, maxdisp, H, W])
    for i in range(maxdisp):
        if i > 0:
            volume[:, :, i, :, i:] = groupwise_correlation(refimg_fea[:, :, :, i:], targetimg_fea[:, :, :, :-i],
                                                           num_groups)
        else:
            volume[:, :, i, :, :] = groupwise_correlation(refimg_fea, targetimg_fea, num_groups)
    volume = volume.contiguous()
    return volume
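A quick shape check for the three builders above, using random feature maps (assumed to already be at the resolution the network operates at, e.g. 1/4):

import torch

B, C, H, W, maxdisp = 1, 32, 60, 80, 48
feat_l, feat_r = torch.randn(B, C, H, W), torch.randn(B, C, H, W)

corr   = build_norm_correlation_volume(feat_l, feat_r, maxdisp)   # [1, 1, 48, 60, 80]
concat = build_concat_volume(feat_l, feat_r, maxdisp)             # [1, 64, 48, 60, 80]
gwc    = build_gwc_volume(feat_l, feat_r, maxdisp, num_groups=8)  # [1, 8, 48, 60, 80]
print(corr.shape, concat.shape, gwc.shape)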

2.4 Attention Concatenation Volume

The following snippet is excerpted from the ACVNet forward pass (see the repository linked above); F refers to torch.nn.functional:

# Attention-weight branch: build a group-wise correlation volume and regularize
# it with "patch" 3D convolutions (three parallel branches over different
# channel groups) plus two aggregation blocks to predict attention weights.
features_left = self.feature_extraction(left)
features_right = self.feature_extraction(right)
gwc_volume = build_gwc_volume(features_left["gwc_feature"], features_right["gwc_feature"], self.maxdisp // 4, self.num_groups)
gwc_volume = self.patch(gwc_volume)
patch_l1 = self.patch_l1(gwc_volume[:, :8])
patch_l2 = self.patch_l2(gwc_volume[:, 8:24])
patch_l3 = self.patch_l3(gwc_volume[:, 24:40])
patch_volume = torch.cat((patch_l1, patch_l2, patch_l3), dim=1)
cost_attention = self.dres1_att_(patch_volume)
cost_attention = self.dres2_att_(cost_attention)
att_weights = self.classif_att_(cost_attention)

# Concatenation branch: compress the features, then build the concatenation volume.
concat_feature_left = self.concatconv(features_left["gwc_feature"])
concat_feature_right = self.concatconv(features_right["gwc_feature"])
concat_volume = build_concat_volume(concat_feature_left, concat_feature_right, self.maxdisp // 4)

# ACV: attention weights, softmax-normalized over the disparity dimension (dim=2),
# filter the concatenation volume; the result is fed to cost aggregation.
ac_volume = F.softmax(att_weights, dim=2) * concat_volume
cost0 = self.dres0(ac_volume)

In short:
- Attention weights: come from gwc_volume, i.e. a correlation volume.
- Initial cost volume: concat_volume, the concatenation volume.
- Final cost volume: F.softmax(att_weights, dim=2) * concat_volume.
