Reading notes: Multi-Scale Geometric Consistency Guided Multi-View Stereo

Octavia_Kong

Article link: https://blog.csdn.net/m0_37923410/article/details/102979927

1. introduction

A popular four-step pipeline consists of random initialization, propagation, view selection, and refinement. This paper's improvements are mainly in propagation and view selection. There are two propagation schemes: sequential propagation and diffusion-like propagation. The former performs better than the latter in challenging cases. "PatchMatch Based Joint View Selection and Depthmap Estimation" points out that this is due to view selection rather than propagation: "Unlike their elaborate view selection, the diffusion-like propagation adopts a simple threshold truncation scheme to determine aggregation view subsets."

4. structured information

Two goals: sample better candidate hypotheses for propagation, and select more credible views when aggregating the multi-view matching costs.

4.1 random initialization

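A homography is used directly to find the corresponding point in the other view, so no rectification is needed. This amounts to assuming that the 3D surface covered by a patch lies on a plane. Within a small window, the costs of a subset of the window's pixels are summed, and the total is used as the cost of the center pixel. The top K best matching costs are aggregated into the initial multi-view matching cost for the subsequent propagation.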

template< typename T >
__device__ FORCEINLINE static float pmCost (
                                           const cudaTextureObject_t &l,
                                           const T * __restrict__ tile_left,
                                           const int2 tile_offset,
                                           const cudaTextureObject_t &r,
                                           const int &x,
                                           const int &y,
                                           const float4 &normal,
                                           const int &vRad,
                                           const int &hRad,
                                           const AlgorithmParameters &algParam,
                                           const CameraParameters_cu &camParams,
                                           const int &camTo )
{
   const int cols = camParams.cols;
   const int rows = camParams.rows;
   const float alpha = algParam.alpha;
   const float tau_color = algParam.tau_color;
   const float tau_gradient = algParam.tau_gradient;
   const float gamma = algParam.gamma;

   float4 pt_c;
   float H[16];

   // Compute the homography matrix used to locate the patch in the other view.
   getHomography_cu ( camParams.cameras[REFERENCE], camParams.cameras[camTo], camParams.cameras[REFERENCE].K_inv, camParams.cameras[camTo].K, normal, normal.w, H );

   // Use the homography to get the position corresponding to this pixel.
   getCorrespondingPoint_cu ( make_int2(x, y), H, &pt_c );

   {
       float cost = 0;
       //float weightSum = 0.0f;
       for ( int i = -hRad; i < hRad + 1; i+=WIN_INCREMENT ) {
           for ( int j = -vRad; j < vRad + 1; j+=WIN_INCREMENT ) {
               const int xTemp = x + i;
               const int yTemp = y + j;
               float4 pt_l;
               pt_l.x = __int2float_rn(xTemp);
               pt_l.y = __int2float_rn(yTemp);
               int2 pt_li = make_int2(xTemp, yTemp);

               float w;

               w = weight_cu<T> ( tex2D<T>(l, pt_l.x + 0.5f, pt_l.y + 0.5f), tex2D<T>(l,x + 0.5f,y + 0.5f), gamma);

               float4 pt;
               getCorrespondingPoint_cu ( make_int2(xTemp, yTemp),
                                          H,
                                          &pt );

               cost = cost + pmCostComputation<T> ( l, tile_left, r, pt_l, pt, rows, cols, tau_color, tau_gradient, alpha,  w );
               //weightSum = weightSum + w;
           }
       }
       return cost;
   }
}
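To make the planar-patch assumption concrete, here is a minimal host-side C++ sketch of a plane-induced homography. It is separate from the kernel above and is not the implementation of `getHomography_cu`. It assumes a pinhole camera with square pixels and no skew, a second camera related to the reference by X2 = R X1 + t, and a plane n·X1 = d in the reference camera frame. The names `Pinhole`, `Mat3`, `planeHomography`, and `warp` are hypothetical, and the sign convention of the real code may differ.

```cpp
#include <cassert>
#include <cmath>

struct Mat3 { double m[3][3]; };
struct Pinhole { double f, cx, cy; };  // assumed intrinsics: square pixels, no skew

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c = {};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c.m[i][j] += a.m[i][k] * b.m[k][j];
    return c;
}

// H = K2 * (R + t * n^T / d) * K1^{-1}, valid for points with n . X1 = d
// under the assumed convention X2 = R * X1 + t.
Mat3 planeHomography(const Pinhole& c1, const Pinhole& c2, const Mat3& R,
                     const double t[3], const double n[3], double d) {
    assert(d != 0.0);
    Mat3 K2 = {{{c2.f, 0, c2.cx}, {0, c2.f, c2.cy}, {0, 0, 1}}};
    Mat3 K1inv = {{{1 / c1.f, 0, -c1.cx / c1.f},
                   {0, 1 / c1.f, -c1.cy / c1.f},
                   {0, 0, 1}}};
    Mat3 A = R;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            A.m[i][j] += t[i] * n[j] / d;
    return mul(mul(K2, A), K1inv);
}

// Map pixel (u, v) of the reference view into the other view.
void warp(const Mat3& H, double u, double v, double& u2, double& v2) {
    const double x = H.m[0][0] * u + H.m[0][1] * v + H.m[0][2];
    const double y = H.m[1][0] * u + H.m[1][1] * v + H.m[1][2];
    const double w = H.m[2][0] * u + H.m[2][1] * v + H.m[2][2];
    u2 = x / w;
    v2 = y / w;
}
```

For a fronto-parallel plane and a purely horizontal baseline this reduces to the familiar disparity f·b/d, which is a quick sanity check on the convention.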

4.2 adaptive checkerboard sampling

Partition the pixels of I_ref into the red-black grid of a checkerboard, then sample 8 good hypotheses from these areas according to the previous multi-view matching costs.

[Figure: checkerboard sampling scheme]

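Q1: The paper says "Each V-shaped area contains 7 samples while every long strip area contains 11 samples." Why not 3 samples per V-shaped area and 3 per long strip area?
A1: Figure 1(c) shows only one sampling pattern. The red areas can be extended outward until each V-shaped area has 7 samples and each long strip has 11. Because the pixels are split into a red-black pattern, the central black pixel only draws samples from the red circles.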
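A minimal sketch of the sampling step, assuming that one hypothesis is taken per area and that it is the candidate with the smallest previous cost. The notes only say that 8 good hypotheses are sampled according to the previous costs, so the per-area minimum is my reading. `Candidate` and `bestPerRegion` are hypothetical names, and how the V-shaped and strip areas are enumerated is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A neighboring pixel and its multi-view matching cost from the previous iteration.
struct Candidate { int x, y; float prevCost; };

// For each region, return the index of the candidate with the smallest previous
// cost, or -1 for an empty region. With 8 regions (4 V-shaped, 4 strips) this
// yields up to 8 hypotheses to propagate to the center pixel.
std::vector<int> bestPerRegion(const std::vector<std::vector<Candidate>>& regions) {
    std::vector<int> best(regions.size(), -1);
    for (std::size_t r = 0; r < regions.size(); ++r) {
        float bestCost = std::numeric_limits<float>::infinity();
        for (std::size_t i = 0; i < regions[r].size(); ++i) {
            if (regions[r][i].prevCost < bestCost) {
                bestCost = regions[r][i].prevCost;
                best[r] = static_cast<int>(i);
            }
        }
    }
    return best;
}
```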

4.3 Multi-Hypothesis joint view selection

Cost matrix: $m_{i,j}$, where i indexes the i-th hypothesis and j the j-th view.
[Equation image: cost matrix]
We adopt the bilateral weighted adaption of normalized cross correlation to compute the matching cost, which describes the photometric consistency between the reference and source patch.

Matching cost boundary:
[Equation image: matching cost boundary]
t is the iteration number, α is a constant, and τ_0 is the initial matching cost threshold.

Q2: How is τ_0 chosen?
A2: Empirically. It is a constant, identical for all patches, set to 0.8.
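The formula for the boundary is not reproduced in these notes, so the sketch below assumes an exponentially decaying form, tau(t) = tau0 * exp(-t^2 / alpha), which matches the roles of t, α, and τ_0 described above. Treat both the form and the name `costThreshold` as assumptions. Only τ_0 = 0.8 comes from the notes; any α value used with it is illustrative.

```cpp
#include <cassert>
#include <cmath>

// Assumed form of the matching cost boundary: tau(t) = tau0 * exp(-t^2 / alpha).
// The threshold starts at tau0 and tightens as the iteration count t grows.
double costThreshold(int t, double tau0, double alpha) {
    assert(alpha > 0.0);
    return tau0 * std::exp(-static_cast<double>(t) * t / alpha);
}
```

A cost below `costThreshold(t, ...)` would then count as "good" at iteration t, so later iterations demand better matches.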

[Equation image: view weights, including Equations (4) and (5)]

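Q3: I don't understand (4). Why is a set in the denominator?
A3: Bars on both sides of a set denote the number of elements in it. The expression sums the confidence C over all patches of the j-th view that belong to S_good and divides by the number of elements in the set.

For (5), we assume that the most important view v_{t−1} of iteration t−1 continues to influence the view selection of the current iteration t. If I_j belongs to S_t, the weight becomes 2w(I_j) when I_j was the most important view in iteration t−1, and stays w(I_j) otherwise. If I_j does not belong to S_t, the weight is 0.2 when I_j was the most important view in iteration t−1, and 0 otherwise.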
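The two rules can be written down directly from the description above. This is a sketch rather than the authors' code: the confidences are taken as inputs because the notes do not define C, membership in S_t is passed in as a flag, and the names `viewWeight` and `updatedWeight` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Equation (4) as described above: mean confidence over the costs of view j
// that fall in S_good. An empty set gets weight 0 (an assumption of this sketch).
double viewWeight(const std::vector<double>& goodConfidences) {
    if (goodConfidences.empty()) return 0.0;
    double sum = 0.0;
    for (double c : goodConfidences) sum += c;
    return sum / static_cast<double>(goodConfidences.size());
}

// Equation (5) as described above: the most important view of the previous
// iteration is boosted if selected again and keeps a small weight otherwise.
double updatedWeight(double w, bool inSelectedSet, bool wasMostImportant) {
    if (inSelectedSet) return wasMostImportant ? 2.0 * w : w;
    return wasMostImportant ? 0.2 : 0.0;
}
```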

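[Equation image: multi-view aggregated matching cost]
In short, this step updates the matching costs; the hypothesis with the minimum multi-view aggregated cost is the best estimate.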
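A small sketch of that last step, assuming the aggregated cost of a hypothesis is the weight-normalized sum of its per-view costs (the exact formula is in an image that did not survive, so the weighted-mean form is an assumption). `aggregatedCost` and `bestHypothesis` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Weighted mean of one hypothesis' per-view costs (assumed aggregation form).
// Returns infinity when every view has zero weight.
double aggregatedCost(const std::vector<double>& costs,
                      const std::vector<double>& weights) {
    assert(costs.size() == weights.size());
    double num = 0.0, den = 0.0;
    for (std::size_t j = 0; j < costs.size(); ++j) {
        num += weights[j] * costs[j];
        den += weights[j];
    }
    return den > 0.0 ? num / den : std::numeric_limits<double>::infinity();
}

// costMatrix[i][j] is the cost of hypothesis i in view j. Returns the index of
// the hypothesis with the minimum aggregated cost, or -1 if none is finite.
int bestHypothesis(const std::vector<std::vector<double>>& costMatrix,
                   const std::vector<double>& weights) {
    int best = -1;
    double bestCost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < costMatrix.size(); ++i) {
        const double c = aggregatedCost(costMatrix[i], weights);
        if (c < bestCost) {
            bestCost = c;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Giving a view zero weight removes it from the comparison entirely, which is how an occluded or unreliable view stops influencing the choice.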

4.4 Refinement

A refinement is applied to enrich the diversity of solution space.
We generate two new hypotheses, one of which is randomly generated and the other is obtained by perturbing the current estimate. We combine these new depths and normals with the current depth and normal, yielding another six new hypotheses to be tested (that is, combinations of the depths and normals from three sources: the current estimate, the perturbed one, and the random one). The hypothesis with the least aggregated cost is chosen as the final estimate for pixel p. The above propagation, view selection and refinement are repeated multiple times to get the final depth map for I_ref. At the end, a median filter of size 5 × 5 is applied to our final depth maps.
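The text does not say which six depth/normal pairings are tested, so the sketch below picks one plausible set: the two new hypotheses themselves plus the four pairings that mix one new component with one current component. That choice is an assumption, as are the names `Hypothesis`, `Normal`, and `refinementCandidates`. The perturbed and random hypotheses are passed in so the function stays deterministic.

```cpp
#include <cassert>
#include <vector>

struct Normal { double x, y, z; };
struct Hypothesis { double depth; Normal normal; };

// Build six candidates from the current (cur), perturbed (prt), and random
// (rnd) hypotheses. Which six pairings are used is an assumption of this sketch.
std::vector<Hypothesis> refinementCandidates(const Hypothesis& cur,
                                             const Hypothesis& prt,
                                             const Hypothesis& rnd) {
    return {
        {prt.depth, prt.normal},  // perturbed depth, perturbed normal
        {rnd.depth, rnd.normal},  // random depth, random normal
        {prt.depth, cur.normal},  // perturbed depth, current normal
        {cur.depth, prt.normal},  // current depth, perturbed normal
        {rnd.depth, cur.normal},  // random depth, current normal
        {cur.depth, rnd.normal},  // current depth, random normal
    };
}
```

Each candidate would then be scored with the aggregated cost and compared against the current estimate.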

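Q4: How many iterations are typical? Is there a stopping criterion? Does it iterate until convergence?
A4: The figures in the Gipuma paper show that PatchMatch usually converges after about 6 iterations; in my own experiments it usually converged after three. The iteration count is chosen by hand, with no convergence check.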

5. Multi-scale Geometric Consistency

Photometric consistency experiences difficulties when applied to optimize these depth estimates at finer scales. In this section, we detail how to leverage geometric consistency guidance to deal with the optimization of these estimates. Also, a detail restorer is presented to correct the errors induced from coarser scales.

5.1 geometric consistency guidance

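The forward-backward reprojection error is used to indicate this consistency.
[Equation image: forward-backward reprojection error]
We use the joint bilateral upsampler to propagate the estimates of the previous scale to the current scale. The upsampled estimates serve as initial seeds at the current scale for the subsequent propagation, view selection, and refinement, as in ACMH. The difference is that here pixel p is updated with geometric consistency rather than photometric consistency. This modification restricts the solution space of the hypothesis update, especially in low-textured areas, which ensures that the reliable estimates obtained for low-textured areas at the coarsest scale can be propagated to the finest scale. Notably, geometric consistency also improves the depth estimates in regions other than low-textured areas.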
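To make the forward-backward reprojection error concrete, here is a small sketch under simplifying assumptions: pinhole cameras with identity rotation that differ only in their centers, and a callable standing in for the source depth map. All names (`Camera`, `backProject`, `project`, `reprojectionError`, `geometricCost`) are hypothetical. The last function shows one way the error might enter the cost, as a λ-weighted and truncated penalty. That combination is my assumption, not a formula taken from these notes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
// Assumed simplified camera: pinhole intrinsics, identity rotation, center c.
struct Camera { double f, cx, cy; Vec3 c; };

Vec3 backProject(const Camera& cam, double u, double v, double depth) {
    return {cam.c.x + (u - cam.cx) / cam.f * depth,
            cam.c.y + (v - cam.cy) / cam.f * depth,
            cam.c.z + depth};
}

void project(const Camera& cam, const Vec3& X, double& u, double& v) {
    const double z = X.z - cam.c.z;
    u = cam.f * (X.x - cam.c.x) / z + cam.cx;
    v = cam.f * (X.y - cam.c.y) / z + cam.cy;
}

// srcDepthAt(u, v) stands in for a lookup into the source view's depth map.
template <typename DepthLookup>
double reprojectionError(const Camera& ref, const Camera& src, double u, double v,
                         double refDepth, DepthLookup srcDepthAt) {
    assert(refDepth > 0.0);
    double us, vs, ub, vb;
    // Forward: reference pixel + hypothesized depth -> source pixel.
    project(src, backProject(ref, u, v, refDepth), us, vs);
    // Backward: source pixel + source depth -> reference pixel.
    project(ref, backProject(src, us, vs, srcDepthAt(us, vs)), ub, vb);
    return std::hypot(ub - u, vb - v);
}

// Assumed combination: photometric cost plus lambda times the truncated error.
double geometricCost(double photoCost, double reprojErr, double lambda, double delta) {
    return photoCost + lambda * std::min(reprojErr, delta);
}
```

A hypothesis that agrees with the neighbor's depth map reprojects onto itself and pays no penalty, while one that disagrees is pushed away in proportion to λ.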

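Q5: Why does introducing the reprojection error have all these effects? Geometric consistency adds only a reprojection-error constraint on top of photometric consistency, so why does that restrict the solution space of the hypotheses?
A5: C_r is the reference camera and C_1, C_2 are neighboring cameras; assume the neighbors' depth maps are reliable. The depth maps of C_1 and C_2 are fixed, and their depths can be back-projected into 3D space. For C_r, take a hypothesis corresponding to the 3D point X_r. Projecting it into C_1 and C_2 gives the points x_1 and x_2. Back-projecting from x_1 and x_2 with the depth maps of C_1 and C_2 forms the 3D constraint region marked by the red circle. Points in this region all have small reprojection errors when projected back into C_r. Another hypothesis, corresponding to the 3D point X_r', produces a very large reprojection error. In other words, the more a hypothesis disagrees with the neighboring depth maps, the larger its reprojection error.
(If ∠C_1 X_r C_r is large, won't the reprojection error be large even when the depth error is small? Should this angle be considered when choosing neighboring cameras, so that it is neither too large nor too small?) A large angle is actually beneficial: it constrains the estimate more tightly in 3D and forces it into that region, because anything outside incurs a large reprojection error. Such a limit usually exists, although some algorithms seem to bound the angle only from below.
(What happens if λ is too large?) The result then depends too heavily on the neighboring depth maps, which are only a reference and not necessarily accurate.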

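The initial depth maps obtained by ACMH are noisy because of ambiguity and occlusion. Photometric consistency, however, hardly reflects these errors, because a large change in depth causes only a small change in cost. We therefore also apply geometric consistency at the coarsest scale to optimize these initial depth maps. Intuitively, the more accurately the neighboring depth maps are estimated, the better the depth map of the reference image becomes. In our experiments, geometric consistency guidance is therefore performed twice to refine the depth maps at each scale.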

5.2 Detail restorer

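Blurred details often occur at thin structures or boundaries. These details can be better estimated at the original image scale with only photometric consistency. We want to detect these regions and apply photometric consistency only there to correct the erroneous estimates. We observe that the difference map of photometric consistency costs between adjacent scales amplifies the errors in details while suppressing the reflection of reliable estimates in low-textured areas.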

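Q6: "The reflection of reliable estimates in low-textured areas"? I don't understand what this means.
A6: Figure 4(e) shows that the difference is large in edge and detail regions, much more pronounced than in low-textured regions (the ground). The clearly black pixels are all detail edges of the scene structure, while the smooth regions are almost entirely white. That is how the difference map suppresses the reflection of reliable estimates in low-textured areas.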
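A minimal sketch of the detail restorer idea, assuming that a pixel is treated as a damaged detail when the photometric cost of the upsampled estimate exceeds the cost of a photometric-only estimate at the current scale by more than a threshold. The direction of the difference, the threshold `xi`, and the name `restoreDetails` are assumptions. Images are flattened to 1D vectors for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// depthUp / costUp: estimate upsampled from the coarser scale and its
// photometric cost. depthPhoto / costPhoto: estimate from photometric
// consistency only at the current scale. Where the cost difference exceeds
// xi (hypothetical threshold), the photometric-only depth replaces the
// upsampled one; everywhere else the upsampled depth is kept.
std::vector<float> restoreDetails(const std::vector<float>& depthUp,
                                  const std::vector<float>& costUp,
                                  const std::vector<float>& depthPhoto,
                                  const std::vector<float>& costPhoto,
                                  float xi) {
    assert(depthUp.size() == costUp.size());
    assert(depthUp.size() == depthPhoto.size());
    assert(depthUp.size() == costPhoto.size());
    std::vector<float> out(depthUp);
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (costUp[i] - costPhoto[i] > xi) out[i] = depthPhoto[i];
    }
    return out;
}
```

In low-textured areas the two costs are close, so the reliable coarse estimate survives; at thin structures the gap is large and the fine-scale estimate wins.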

[Figure 4 of the paper, panels (a) to (f): detail restorer]

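Q7: "(a) Its depth map is obtained by upscaling the estimation of the penultimate scale"? Does this mean that the depth map computed at the downsampled scale is enlarged to the original size by nearest-neighbor interpolation, and then joint-bilateral-filtered against the original image data? How exactly is the joint bilateral filter applied to the original image and the enlarged result?
A7:

For the pixel at position (2x, 2y) in the upsampled high-resolution depth map, S_p is the depth value of that pixel. f(||p−q||) is a function of the distance from each point of the low-resolution depth patch to the patch center, and g(||I_p−I_q||) is a function of the difference between the color of each point in the high-resolution color patch and the color of the center point. f and g are weighting functions, usually Gaussians, and k_p is the normalization term (the sum of all weights). The output is therefore a weighted average of the neighboring depth values, which avoids the jagged artifacts of nearest-neighbor interpolation.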
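The weighted average described in A7 can be sketched as follows for a 2x upsampling with a grayscale guide image. This is a simplified illustration, not the code used in the paper: distances are measured as integer offsets in the low-resolution grid, both f and g are Gaussians, and `jointBilateralUpsample2x` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// lowDepth: h x w, row-major. guide: 2h x 2w, row-major grayscale intensities.
std::vector<float> jointBilateralUpsample2x(const std::vector<float>& lowDepth, int w, int h,
                                            const std::vector<float>& guide, int radius,
                                            float sigmaSpatial, float sigmaRange) {
    const int W = 2 * w, H = 2 * h;
    assert(lowDepth.size() == static_cast<std::size_t>(w) * h);
    assert(guide.size() == static_cast<std::size_t>(W) * H);
    std::vector<float> out(guide.size(), 0.0f);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            const float Ip = guide[y * W + x];
            const int cx = x / 2, cy = y / 2;  // position of p in the low-res grid
            float sum = 0.0f, kp = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int qx = cx + dx, qy = cy + dy;
                    if (qx < 0 || qy < 0 || qx >= w || qy >= h) continue;
                    const float Iq = guide[(2 * qy) * W + 2 * qx];  // guide sampled at q
                    const float f = std::exp(-(dx * dx + dy * dy) /
                                             (2.0f * sigmaSpatial * sigmaSpatial));
                    const float g = std::exp(-(Ip - Iq) * (Ip - Iq) /
                                             (2.0f * sigmaRange * sigmaRange));
                    sum += lowDepth[qy * w + qx] * f * g;
                    kp += f * g;
                }
            }
            // Fall back to the nearest low-res sample if all weights vanish.
            out[y * W + x] = kp > 0.0f ? sum / kp : lowDepth[cy * w + cx];
        }
    }
    return out;
}
```

With a small `sigmaRange`, depth values do not leak across intensity edges in the guide image, which is what keeps boundaries sharp compared with plain interpolation.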

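Q8: I can't see the difference between (f) and (c).
A8: Red indicates the deviation from the ground truth; in the circled areas (f) is better than (c). The ground truth is the true depth map, captured by hardware such as LiDAR. Panel (e) shows the error from Equation (10).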

6. fusion

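Each image is taken in turn as the reference image. Its depth map is converted to 3D points in world coordinates, which are projected into the neighboring views to obtain the corresponding matches. Constraints are placed on the relative depth difference, the angle between normals, and the reprojection error. If there exist n ≥ 2 neighboring views whose matches satisfy these constraints, the depth estimate is accepted. Finally, the 3D points and normal estimates corresponding to these consistent depth estimates are averaged into a unified 3D point.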
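The acceptance test can be sketched as a simple count over neighboring views. The notes give only the n ≥ 2 requirement, so the three thresholds are left as parameters and any values passed in are illustrative. `Match`, `Thresholds`, and `acceptDepth` are hypothetical names, and the per-view measurements are assumed to be computed elsewhere.

```cpp
#include <cassert>
#include <vector>

// Measurements of one reference pixel against one neighboring view.
struct Match { double relDepthDiff; double normalAngleDeg; double reprojError; };
// Upper bounds for each measurement and the required number of consistent views.
struct Thresholds { double relDepthDiff; double normalAngleDeg; double reprojError; int minViews; };

// Accept the depth estimate if at least minViews neighbors satisfy all three constraints.
bool acceptDepth(const std::vector<Match>& matches, const Thresholds& th) {
    assert(th.minViews >= 1);
    int consistent = 0;
    for (const Match& m : matches) {
        if (m.relDepthDiff <= th.relDepthDiff &&
            m.normalAngleDeg <= th.normalAngleDeg &&
            m.reprojError <= th.reprojError) {
            ++consistent;
        }
    }
    return consistent >= th.minViews;
}
```

Accepted estimates from the consistent views would then be averaged into a single 3D point and normal.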
