slam 单目稠密深度详解

蓝羽飞鸟

已于 2022-04-22 08:30:17 修改

阅读量1.1k

点赞数 5

分类专栏： SLAM 文章标签： slam 算法 c++

于 2022-04-21 14:32:41 首次发布

本文链接：https://blog.csdn.net/level_code/article/details/124314743

版权

SLAM 专栏收录该内容

15 篇文章 6 订阅

订阅专栏

本文详细介绍了单目稠密重建中的极线搜索方法和高斯融合策略。通过不同视角的图像匹配，利用三角测量估计像素深度，并采用高斯融合融合多张深度图，最终得到稳定深度图像。内容涵盖极线定义、搜索匹配、三角测量、不确定性计算以及深度滤波器的更新过程。

摘要由CSDN通过智能技术生成

单目计算深度比较复杂，一般可以用RGB-D相机直接得到深度，但还是练习一下。

因为是稠密重建，对每个像素都算深度，所以就不是提取特征了。

仅凭一幅图像无法估计出深度，要用不同视角下的图像来估计。
特征点匹配法中，是根据不同视角中同一特征点的不同位置来进行三角测量估计深度，
但是稠密重建是不用特征点的，要对每个像素都进行匹配，然后进行三角测量。怎么匹配呢，用极线搜索和块匹配。

什么是极线搜索，如下（贴一张别人的图）：
在这里插入图片描述首先说神马是极线，相机1中的某像素p1, 它可能的深度是，比如[0.1, 5]（范围是图上的d), 然后p1的像素坐标结合这个最小和最大深度（d的两头），形成了俩位置，这俩位置映射到相机2的两个点，这两个点形成的线段就是极线。

那搜索呢，就是在极线这条线段上，搜索和p1一样的点，怎么判断一样，就是用各种相似度函数，这里用了NCC（后面说），只搜索一个点鲁棒性不行，就搜索一个块，块的大小自己定义。

搜索到p2点和p1点匹配度高之后，用三角测量估计p的深度，
现把图片1当作p1所在的图片，要估计图片1中所有像素的深度，目前有不同视角下拍的图片2来匹配p2，估计深度。
仅一张图片2，深度肯定是不够稳定的，
那么就来多张不同视角下的图片3,4,5…,
每张都和p1匹配，再估计图片1中所有像素的深度，于是得到了很多张不同的深度图。

那这么多张估计出来的深度图，总要融合成一个深度图吧，毕竟只要估计一张图片1的深度。
下面就是高斯融合了。

高斯融合就是假设深度d是服从高斯分布的，初始均值是 $\mu$ , 方差是 $\sigma$ ,
然后根据图片2，3，4…得到了新的深度均值方差： $\mu_{obs}$ , $\sigma_{obs}$ ,

根据高斯分布的乘积，得到融合公式
$\frac{ \sigma ^{2}_{obs}\mu + \sigma ^{2}\mu_{obs} }{\sigma ^{2} + \sigma ^{2}_{obs}}$
注意，因为每个像素都要算深度，所以这个均值，方差都是图像大小的矩阵。

最后得到的 $\mu$ 就是要得到的深度图。

接下来问题来了， $\mu_{obs}$ 和 $\sigma_{obs}$ 怎么算。
$\mu_{obs}$ 就是本次估计的深度，因为估计的时候就是求的平均值（后面代码会说）
$\sigma_{obs}$ 要根据不确定性公式算，
在这里插入图片描述先上结论： $\sigma_{obs} = \left \| p \right \| - \left \| p' \right \|$ ,
其中p指的是图中的 $O_{1}P$ 向量，p’指的是 $O_{1}P'$ 向量
那么问题回归到P和P’两个点怎么算出来。

首先p1对应的p2点是通过极线搜索出来的，通过p1, p2两点三角测量得到p1的估计位置P，
然后把p2沿极线移动一个像素，得到p2’,
再根据p1, p2’这两个点再估计一个位置P’。

而实际上不需要估计两次，在移动p2得到p2’后，
根据图上标的一些边和角，得到如下关系
$a = p - t$ (向量减）
$\alpha = arccos<p, t>$
$\beta = arccos<p, -t>$
$\beta' = arccos<O_{2}p_{2}', -t>$
$\gamma = \pi - \alpha - \beta$ (三角形内角和=180度）

然后根据正弦定理：
$\frac{\left \| p' \right \| }{sin\beta'} = \frac{\left \| t \right \| }{sin\gamma}$

就得到了
$\left \| p' \right \| = \left \| t \right \|\frac{sin\beta' }{sin\gamma}$

注意这个t是相机位姿变换T的平移部分。

至于三角测量具体怎么算，这个在代码中注释。

下面一步一步完成上面的过程。
首先初始化上面的 $\mu$ 和 $\sigma$ 矩阵

double init_depth = 3.0;  //深度初始值
double init_cov2 = 3.0;   //方差初始值
Mat depth(height, width, CV_64F, init_depth);   //深度图
Mat depth_cov2(height, width, CV_64F, init_cov2);  //深度图方差

由p1估计p2是需要相机的位姿的，这个位姿从data中读取
这是第一幅图的相机位姿。

SE3d pose_ref_TWC = poses_TWC[0];

第一幅图像

Mat ref = imread(color_image_files[0], 0);  //灰度图

接下来是遍历所有的图像（不同位姿下拍的图片）
我们称第一幅图像为ref, TWC指从current相机坐标系到world坐标系的变换矩阵T，
那么p1到world坐标系，再到p2, 就需要一个把ref坐标系变换到current相机坐标系的变换T_C_R。
它是由ref的T_W_C(current to world)左乘相机2的T_C_W(world to current)。

for(int index = 1; index < color_image_files.size(); index ++) {
    cout << "*** loop " << index << " ***" << endl;
    Mat curr = imread(color_image_files[index], 0);
    if(curr.data == nullptr) continue;
    SE3d pose_curr_TWC = poses_TWC[index];   //当前坐标系到世界坐标系的变换矩阵T
    SE3d pose_T_C_R = pose_curr_TWC.inverse() * pose_ref_TWC;  //左乘，ref->world->cur: T_C_W * T_W_R = T_C_R
    update(ref, curr, pose_T_C_R, depth, depth_cov2);
}

好了位姿解决了，下面就是update中需要做的极线搜索和高斯融合了。

Vector2d pt_curr;  //当前位置
Vector2d epipolar_direction;  //极线方向
bool ret = epipolarSearch(
        ref,
        curr,
        T_C_R,
        Vector2d(x, y),
        depth.ptr<double>(y)[x],
        sqrt(depth_cov2.ptr<double>(y)[x]),
        pt_curr,
        epipolar_direction
        );
if(ret == false)  continue;  //匹配失败

//匹配成功，更新深度滤波器
updateDepthFilter(Vector2d(x, y), pt_curr, T_C_R, epipolar_direction, depth, depth_cov2);

极线搜索，具体写在注释里

//极线搜索
bool epipolarSearch(
    const Mat &ref, const Mat &curr,
    const SE3d &T_C_R, const Vector2d &pt_ref,
    const double &depth_mu, const double &depth_cov,
    Vector2d &pt_curr, Vector2d &epipolar_direction) {

    Vector3d f_ref = px2cam(pt_ref);   //转归一化相机坐标（X/Z, Y/Z, 1)
    f_ref.normalize();  //单位向量
    Vector3d P_ref = f_ref * depth_mu;  //(X/Z, Y/Z, 1) * Z = (X, Y, Z), 参考帧的P向量

    Vector2d px_mean_curr = cam2px(T_C_R * P_ref); //投到第2个相机的像素坐标，书上的p2点
    double d_min = depth_mu - 3 * depth_cov;  //小概率事件左边mu - 3*sigma
    double d_max = depth_mu + 3 * depth_cov;  //小概率事件右边mu + 3*sigma
    if(d_min < 0.1) d_min = 0.1;  //实际情况考虑，不会再近了

    //因为已知位姿，把P点按深度范围投影到p2，可得到极线
    Vector2d px_min_curr = cam2px(T_C_R * (f_ref * d_min));  //按深度d_min投影的p2点
    Vector2d px_max_curr = cam2px(T_C_R * (f_ref * d_max));  //按深度d_max投影的p2点

    Vector2d epipolar_line = px_max_curr - px_min_curr;  //极线
    epipolar_direction = epipolar_line;   //向量形式
    epipolar_direction.normalize();   //按长度归一化,得到单位向量

    double half_length = 0.5 * epipolar_line.norm();    //极线线段的半长度
    if(half_length > 100) half_length = 100;    //不需要搜索太多

    // 在极线上搜索，以深度均值点(p2)为中心，左右各取半长度
    double best_ncc = -1.0;
    Vector2d best_px_curr;
    for(double l = -half_length; l <= half_length; l += 0.7) { // l+=sqrt(2)/2
        Vector2d px_curr = px_mean_curr + l * epipolar_direction;  //px_mean_curr:按深度均值计算出的p1点直接投影到p2
        if(!inside(px_curr)) continue;
        //计算NCC
        double ncc = NCC(ref, curr, pt_ref, px_curr);
        if(ncc > best_ncc) {
            best_ncc = ncc;
            best_px_curr = px_curr;
        }
    }
    if(best_ncc < 0.85f) return false;   //大于阈值才认为是匹配
    pt_curr = best_px_curr;  //通过引用形参传递匹配点
    return true;
}

块匹配在NCC函数里

double NCC(
        const Mat &ref, const Mat &curr,
        const Vector2d &pt_ref, const Vector2d &pt_curr) {

    double mean_ref = 0;
    double mean_curr = 0;
    vector<double> values_ref, values_curr;  //参考帧和当前帧的均值
    //以参考点p1, p2为中心，块扫描，像素值归一化
    for(int x = -ncc_window_size; x <= ncc_window_size; x ++) {
        for(int y = -ncc_window_size; y <= ncc_window_size; y ++) {
            double value_ref = double(ref.ptr<uchar>(int(pt_ref(1, 0) + y))[int(pt_ref(0, 0) + x)]) / 255.0;
            mean_ref += value_ref;

            //pt_curr是极线搜索的坐标，不是整数，用到双线性插值
            double value_curr = getBilinearInterpolatedValue(curr, pt_curr + Vector2d(x, y));
            mean_curr += value_curr;

            values_ref.push_back(value_ref);  //块区域内每个像素点的值
            values_curr.push_back(value_curr);
        }
    }
    //块区域内的像素均值
    mean_ref /= ncc_area;
    mean_curr /= ncc_area;

    //计算去均值的NCC
    double numerator = 0, demoniator1 = 0, demoniator2 = 0;
    for(int i = 0; i < values_ref.size(); i++) {
        //套NCC公式，用的是去均值的NCC
        double n = (values_ref[i] - mean_ref) * (values_curr[i] - mean_curr);
        numerator += n;
        demoniator1 += (values_ref[i] - mean_ref) * (values_ref[i] - mean_ref);
        demoniator2 += (values_curr[i] - mean_curr) * (values_curr[i] - mean_curr);
    }
    return numerator / sqrt(demoniator1 * demoniator2 + 1e-10);   //防止分母为0
}

高斯融合，里面包含了三角测量，三角测量公式的推导写在注释里。

bool updateDepthFilter(
    const Vector2d &pt_ref,
    const Vector2d &pt_curr,
    const SE3d &T_C_R,
    const Vector2d &epipolar_direction,
    Mat &depth,
    Mat &depth_cov2) {

    //三角化计算深度
    SE3d T_R_C = T_C_R.inverse();
    Vector3d f_ref = px2cam(pt_ref);   //p1的归一化相机坐标
    f_ref.normalize();  //O1P的单位方向向量
    Vector3d f_curr = px2cam(pt_curr);  //p2的归一化相机坐标
    f_curr.normalize();  //O2P的单位方向向量

    //三角测量法计算深度
    // 方程
    // d_ref * f_ref = d_cur * ( R_RC * f_cur ) + t_RC
    // f2 = R_RC * f_cur
    // 转化成下面这个矩阵方程组
    // => [ f_ref^T f_ref, -f_ref^T f2 ] [d_ref]   [f_ref^T t]
    //    [ f_2^T f_ref, -f2^T f2      ] [d_cur] = [f2^T t   ]
    //这里我要注释一下是怎么转化的，依据是什么
    //三角测量公式为s2x2 = s1Rx1 + t, 也就是两个相机估的p位置一样，s2为p2深度，s1为p1深度，x1,x2为特征点p1,p2的归一化坐标(书上177页)
    //现在把相机2当成ref,倒推相机1下p的深度，
    //所以 d_ref * f_ref = d_cur * ( R_RC * f_cur ) + t_RC
    //等式两边左乘f_ref^T, 得到d_ref * f_ref^T * f_ref = d_cur * f_ref^T * f2 + f_ref^T * t_RC, 其中f2 = R_RC * f_cur
    //等式两边左乘f2^T, 得到d_ref * f2^T * f_ref = d_cur * f2^T * f2 + f2^T * t_RC, 其中f2 = R_RC * f_cur
    //整理一下就得到上面的矩阵方程组
    Vector3d t = T_R_C.translation();   //第2个相机到第1个相机的平移,三角测量公式里的t，同时还是两个相机的平移
    Vector3d f2 = T_R_C.so3() * f_curr;  //O2P的方向旋转到第1个相机，代表三角测量公式里的Rx1
    Vector2d b = Vector2d(t.dot(f_ref), t.dot(f2));
    Matrix2d A;
    A(0, 0) = f_ref.dot(f_ref); //点乘，相当于f_ref^T * f_ref
    A(0, 1) = -f_ref.dot(f2);
    A(1, 0) = -A(0, 1);
    A(1, 1) = -f2.dot(f2);
    Vector2d ans = A.inverse() * b;  //矩阵规模小，直接求逆
    Vector3d xm = ans[0] * f_ref;  //相机1估计的p位置
    Vector3d xn = ans[1] * f2 + t;  //T_R_C.so3() * (ans[1] * f_curr) + t, 相机2估计的p位置变换回相机1的坐标系下
    Vector3d p_esti = (xm + xn) / 2.0;   //p的位置取两者的平均
    double depth_estimation = p_esti.norm();  //深度值,2范数，平方和开方，即mu_obs

    //计算不确定性（以一个像素为误差）
    Vector3d p = f_ref * depth_estimation;  //p的位置,O1P向量
    Vector3d a = p - t;    //O2P向量
    double t_norm = t.norm();
    double a_norm = a.norm();
    double alpha = acos(f_ref.dot(t)/t_norm); //acos要求取值在[-1,1],所以要用单位方向向量来算acos<p,t>
    double beta = acos(-a.dot(t)/(a_norm * t_norm)); //acos<a, -t>
    Vector3d f_curr_prime = px2cam(pt_curr + epipolar_direction);  //p2沿极线方向移动一个像素得到p2'
    f_curr_prime.normalize();  //得到单位方向向量
    double beta_prime = acos(f_curr_prime.dot(-t) / t_norm);   //acos<O2p2', -t>
    double gamma = M_PI - alpha - beta_prime;
    double p_prime =  t_norm * sin(beta_prime) / sin(gamma);   //移动一个像素以后新的p的深度估计
    double d_cov = p_prime - depth_estimation;
    double d_cov2 = d_cov * d_cov;

    //高斯融合
    double mu = depth.ptr<double>(int(pt_ref(1, 0)))[int(pt_ref(0, 0))];  //深度文件中的深度值
    double sigma2 = depth_cov2.ptr<double>(int(pt_ref(1, 0)))[int(pt_ref(0, 0))];

    double mu_fuse = (d_cov2 * mu + sigma2 * depth_estimation) / (sigma2 + d_cov2); //融合后的mu
    double sigma_fuse2 = (sigma2 * d_cov2) / (sigma2 + d_cov2);  //融合后的sigma

    depth.ptr<double>(int(pt_ref(1, 0)))[int(pt_ref(0, 0))] = mu_fuse;
    depth_cov2.ptr<double>(int(pt_ref(1, 0)))[int(pt_ref(0, 0))] = sigma_fuse2;

    return true;
}