Derivation of the Homography in MVSNet

朽一

Last modified 2022-04-12 17:14:59


Category: MVS-DL. Tags: computer vision, homography, MVSnet, slam, plane sweep

First published 2021-05-18 19:56:02

Article link: https://blog.csdn.net/qq_43027065/article/details/116946686


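Almost every network in the MVSNet family uses a differentiable homography, but a detailed derivation is hard to find, so I have written one up here.

Analysis and derivation of the MVSNet homography

I. Derivation
II. Analysis of the formula in the TensorFlow source
III. Analysis of the formula in the PyTorch source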

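I. Derivation

Homography: images of the same planar surface taken by cameras at different viewpoints are related to each other by a homography matrix.

1. From the world coordinate system to the pixel coordinate system

$p_i$, $P_{ci}$ and $P_w$ are coordinates in the pixel, camera and world coordinate systems, respectively.
Camera of the reference image $I_1$: $K_1$, $R_1$, $C_1$, $Z_1$ are the intrinsic matrix, rotation matrix, translation vector and depth.
Camera of the source image $I_i$: $K_i$, $R_i$, $C_i$, $Z_i$ are the intrinsic matrix, rotation matrix, translation vector and depth.
The coordinate transformations of the reference image and the source image give ① and ②:

$$Z_1\, p_1 = K_1 R_1 (P_w - C_1) \quad \text{①}$$

$$Z_i\, p_i = K_i R_i (P_w - C_i) \quad \text{②}$$

2. Building the cost volume along the Z axis of the reference camera coordinate system

The Z axis is the depth direction. A plane of the cost volume at a given depth $d$ can be written as ③, where $n^T = (0, 0, 1)$ (used again later) is the plane normal.

$$n^T P_{c1} = n^T R_1 (P_w - C_1) = d \quad \text{③}$$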

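3. Deriving from equation ②

$p_1$ and $p_i$ are images of the same 3D point, so they share the same $P_w$. First lift $p_1$ to the world point $P_w$ (from ①, $P_w = Z_1 R_1^T K_1^{-1} p_1 + C_1$), then project $P_w$ into the pixel coordinates $p_i$:

$$Z_i\, p_i = K_i R_i \left( Z_1 R_1^T K_1^{-1} p_1 + C_1 - C_i \right)$$

Substituting the plane equation ③, rewritten with $P_w - C_1 = Z_1 R_1^T K_1^{-1} p_1$,

$$\frac{n^T R_1 \, Z_1 R_1^T K_1^{-1} p_1}{d} = 1$$

as a factor on the constant term $C_1 - C_i$ gives

$$Z_i\, p_i = K_i R_i \left( I + \frac{(C_1 - C_i)\, n^T R_1}{d} \right) Z_1 R_1^T K_1^{-1} p_1$$

that is,

$$p_i \sim H_i(d)\, p_1, \qquad H_i(d) = K_i R_i \left( I + \frac{(C_1 - C_i)\, n^T R_1}{d} \right) R_1^T K_1^{-1}$$

Notes:
1. A rotation matrix R is orthogonal, so its transpose equals its inverse.
2. The pixel coordinates $p_i$ and $p_1$ are homogeneous coordinates and do not depend on an overall scale factor, so $Z_1/Z_i$ is dropped. For example, [a, b, z] → [a/z, b/z, 1] = [u, v, 1], and x·[a, b, z] = [xa, xb, xz] → [xa/xz, xb/xz, 1] = [a/z, b/z, 1].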
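The derived homography can be checked numerically. The following is a minimal sketch in plain Python, assuming the translate-then-rotate camera model $Z\,p = K R (P_w - C)$ used in this part; the camera values and all helper names (`rot_y`, `intrinsics`, `project`, `homography`) are made up for illustration and are not taken from the MVSNet code.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rot_y(theta):
    """Rotation about the Y axis (any rotation matrix would do)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix K and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def project(K, R, C, Pw):
    """Z * p = K R (Pw - C); returns the pixel [u, v, 1] and the depth Z."""
    q = matvec(K, matvec(R, [a - b for a, b in zip(Pw, C)]))
    return [q[0] / q[2], q[1] / q[2], 1.0], q[2]

def homography(K1_inv, R1, C1, Ki, Ri, Ci, d):
    """H_i(d) = K_i R_i (I + (C_1 - C_i) n^T R_1 / d) R_1^T K_1^{-1}."""
    n_row = R1[2]  # n^T R_1 with n^T = (0, 0, 1) is the third row of R_1
    diff = [a - b for a, b in zip(C1, Ci)]
    middle = [[(1.0 if r == c else 0.0) + diff[r] * n_row[c] / d
               for c in range(3)] for r in range(3)]
    return matmul(Ki, matmul(Ri, matmul(middle, matmul(transpose(R1), K1_inv))))

# Two made-up cameras: index 1 is the reference view, index i the source view.
K1, K1_inv = intrinsics(500.0, 320.0, 240.0)
Ki, _ = intrinsics(520.0, 300.0, 250.0)
R1, C1 = rot_y(0.1), [0.2, -0.1, 0.0]
Ri, Ci = rot_y(-0.2), [1.0, 0.3, 0.1]

# A world point lying on the plane at depth d of the reference camera.
d = 4.0
Pw = [a + b for a, b in zip(matvec(transpose(R1), [0.5, -0.3, d]), C1)]

p1, Z1 = project(K1, R1, C1, Pw)
pi, Zi = project(Ki, Ri, Ci, Pw)

# Warp p1 with the homography and normalise the homogeneous result.
q = matvec(homography(K1_inv, R1, C1, Ki, Ri, Ci, d), p1)
pi_from_H = [q[0] / q[2], q[1] / q[2], 1.0]
max_err = max(abs(a - b) for a, b in zip(pi, pi_from_H))
print("max pixel error:", max_err)
```

Before normalisation, the third component of `q` equals `Zi / Z1`, which is exactly the scale factor that note 2 says can be dropped.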

II. Analysis of the formula in the TensorFlow source

def get_homographies_inv_depth(left_cam, right_cam, depth_num, depth_start, depth_end):

    with tf.name_scope('get_homographies'):
        # cameras (K, R, t)
        R_left = tf.slice(left_cam, [0, 0, 0, 0], [-1, 1, 3, 3])
        R_right = tf.slice(right_cam, [0, 0, 0, 0], [-1, 1, 3, 3])
        t_left = tf.slice(left_cam, [0, 0, 0, 3], [-1, 1, 3, 1])
        t_right = tf.slice(right_cam, [0, 0, 0, 3], [-1, 1, 3, 1])
        K_left = tf.slice(left_cam, [0, 1, 0, 0], [-1, 1, 3, 3])
        K_right = tf.slice(right_cam, [0, 1, 0, 0], [-1, 1, 3, 3])

        # depth 
        depth_num = tf.reshape(tf.cast(depth_num, 'int32'), [])

        inv_depth_start = tf.reshape(tf.div(1.0, depth_start), [])
        inv_depth_end = tf.reshape(tf.div(1.0, depth_end), [])
        inv_depth = tf.lin_space(inv_depth_start, inv_depth_end, depth_num)
        depth = tf.div(1.0, inv_depth)

        # preparation
        num_depth = tf.shape(depth)[0]
        K_left_inv = tf.matrix_inverse(tf.squeeze(K_left, axis=1))
        R_left_trans = tf.transpose(tf.squeeze(R_left, axis=1), perm=[0, 2, 1])
        R_right_trans = tf.transpose(tf.squeeze(R_right, axis=1), perm=[0, 2, 1])

        fronto_direction = tf.slice(tf.squeeze(R_left, axis=1), [0, 2, 0], [-1, 1, 3])          # (B, D, 1, 3)

        c_left = -tf.matmul(R_left_trans, tf.squeeze(t_left, axis=1))
        c_right = -tf.matmul(R_right_trans, tf.squeeze(t_right, axis=1))                        # (B, D, 3, 1)
        c_relative = tf.subtract(c_right, c_left)        

        # compute
        batch_size = tf.shape(R_left)[0]
        temp_vec = tf.matmul(c_relative, fronto_direction)
        depth_mat = tf.tile(tf.reshape(depth, [batch_size, num_depth, 1, 1]), [1, 1, 3, 3])

        temp_vec = tf.tile(tf.expand_dims(temp_vec, axis=1), [1, num_depth, 1, 1])

        middle_mat0 = tf.eye(3, batch_shape=[batch_size, num_depth]) - temp_vec / depth_mat
        middle_mat1 = tf.tile(tf.expand_dims(tf.matmul(R_left_trans, K_left_inv), axis=1), [1, num_depth, 1, 1])
        middle_mat2 = tf.matmul(middle_mat0, middle_mat1)

        homographies = tf.matmul(tf.tile(K_right, [1, num_depth, 1, 1])
                     , tf.matmul(tf.tile(R_right, [1, num_depth, 1, 1])
                     , middle_mat2))

    return homographies

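Formula implemented by the source (the left camera is the reference view, index 1; the right camera is the source view, index i):

$$H_i(d) = K_i R_i \left( I - \frac{(c_i - c_1)\cdot \text{fronto\_direction}}{d} \right) R_1^T K_1^{-1}, \qquad c = -R^T t$$

Here fronto_direction is the third row of the rotation matrix of the reference image.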

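Comparison

For comparison, the formula derived in Part I is

$$H_i(d) = K_i R_i \left( I + \frac{(C_1 - C_i)\, n^T R_1}{d} \right) R_1^T K_1^{-1}$$

(The original figure highlighted the differing terms of the two formulas in red, blue and green.)

The derived formula differs from the source in the following places:

1. The red and blue parts, i.e. the difference of the translations and the sign in front of it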

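Reason: there are two ways to combine a translation and a rotation.

①: translate first, then rotate

$$P_c = R\,(P_w - C)$$

②: rotate first, then translate

$$P_c = R\,P_w + t$$

Therefore,

$$t = -R\,C, \qquad C = -R^T t$$

It can also be seen from another angle, by looking at the camera origin $P_c = 0$.
For rotate-then-translate:

$$0 = R\,P_w + t \;\Rightarrow\; P_w = -R^T t$$

so $-R^T t$ is the world coordinate of the origin of the camera coordinate system.
For translate-then-rotate:

$$0 = R\,(P_w - C) \;\Rightarrow\; P_w = C$$

so the translation $C$ is the world coordinate of the origin of the camera coordinate system.
Hence the difference of the two translations equals the difference of the world coordinates of the two camera origins.

Conclusion: the dataset provides the translation t (rotate first), so the code converts it to C (translate first).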
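The relation between the two conventions is easy to confirm with numbers. This is a small sketch in plain Python with an arbitrary, made-up rotation and translation; the helper names are illustrative only.

```python
import math

def matvec(A, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Made-up extrinsics in the rotate-then-translate form P_c = R P_w + t.
theta = 0.3
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
t = [0.5, -1.0, 2.0]

# Camera centre in world coordinates: C = -R^T t.
C = [-x for x in matvec(transpose(R), t)]

Pw = [1.0, 2.0, 3.0]
pc_rotate_first = [a + b for a, b in zip(matvec(R, Pw), t)]        # R P_w + t
pc_translate_first = matvec(R, [a - b for a, b in zip(Pw, C)])     # R (P_w - C)
diff = max(abs(a - b) for a, b in zip(pc_rotate_first, pc_translate_first))

# The camera centre itself must land on the camera-frame origin.
origin = [a + b for a, b in zip(matvec(R, C), t)]

print("difference between conventions:", diff)
print("camera centre in camera frame:", origin)
```

Both conventions describe the same rigid transform, so the two camera-frame points agree up to floating-point rounding.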

2. The green part

Since $n^T = (0, 0, 1)$, the product $n^T R_1$ simply picks out the last row of the rotation matrix $R_1$, which is fronto_direction in the code.
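To see that the source form of the formula really maps reference pixels to source pixels, the sketch below repeats the same arithmetic in plain Python for a single depth and a single pair of cameras. It is an illustration under the assumption that the cameras follow $P_c = R P_w + t$; the variable names follow the TensorFlow source, while the camera values and helper functions are invented for the example.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix K and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def project_rt(K, R, t, Pw):
    """Rotate-then-translate model: Z * p = K (R Pw + t)."""
    q = matvec(K, [a + b for a, b in zip(matvec(R, Pw), t)])
    return [q[0] / q[2], q[1] / q[2], 1.0], q[2]

def homography_source_form(K_left_inv, R_left, t_left, K_right, R_right, t_right, depth):
    """K_r R_r (I - (c_r - c_l) * fronto_direction / depth) R_l^T K_l^{-1}."""
    R_left_trans = transpose(R_left)
    c_left = [-x for x in matvec(R_left_trans, t_left)]           # camera centres
    c_right = [-x for x in matvec(transpose(R_right), t_right)]
    c_relative = [a - b for a, b in zip(c_right, c_left)]
    fronto_direction = R_left[2]                                  # third row of R_left
    middle_mat0 = [[(1.0 if r == c else 0.0) - c_relative[r] * fronto_direction[c] / depth
                    for c in range(3)] for r in range(3)]
    middle_mat1 = matmul(R_left_trans, K_left_inv)
    return matmul(K_right, matmul(R_right, matmul(middle_mat0, middle_mat1)))

# Made-up left (reference) and right (source) cameras.
K_left, K_left_inv = intrinsics(500.0, 320.0, 240.0)
K_right, _ = intrinsics(480.0, 310.0, 235.0)
R_left, t_left = rot_x(0.15), [0.1, 0.2, 0.3]
R_right, t_right = rot_x(-0.1), [-0.8, 0.1, 0.2]

# World point whose depth in the left camera is exactly `depth`.
depth = 5.0
pc_left = [0.4, 0.6, depth]
Pw = matvec(transpose(R_left), [a - b for a, b in zip(pc_left, t_left)])

p_left, _ = project_rt(K_left, R_left, t_left, Pw)
p_right, _ = project_rt(K_right, R_right, t_right, Pw)

H = homography_source_form(K_left_inv, R_left, t_left, K_right, R_right, t_right, depth)
q = matvec(H, p_left)
p_right_from_H = [q[0] / q[2], q[1] / q[2], 1.0]
max_err = max(abs(a - b) for a, b in zip(p_right, p_right_from_H))
print("max pixel error:", max_err)
```

The real function additionally batches this over B samples and D depth hypotheses; the sketch keeps one of each so the algebra stays visible.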

III. Analysis of the formula in the PyTorch source

def homo_warping(src_fea, src_proj, ref_proj, depth_values):
    # src_fea: [B, C, H, W]
    # src_proj: [B, 4, 4]
    # ref_proj: [B, 4, 4]
    # depth_values: [B, Ndepth]
    # out: [B, C, Ndepth, H, W]
    batch, channels = src_fea.shape[0], src_fea.shape[1]
    num_depth = depth_values.shape[1]
    height, width = src_fea.shape[2], src_fea.shape[3]

    with torch.no_grad():
        proj = torch.matmul(src_proj, torch.inverse(ref_proj))
        rot = proj[:, :3, :3]  # [B,3,3]
        trans = proj[:, :3, 3:4]  # [B,3,1]

        y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=src_fea.device),
                               torch.arange(0, width, dtype=torch.float32, device=src_fea.device)])
        y, x = y.contiguous(), x.contiguous()
        y, x = y.view(height * width), x.view(height * width)
        xyz = torch.stack((x, y, torch.ones_like(x)))  # [3, H*W]
        xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1)  # [B, 3, H*W]
        rot_xyz = torch.matmul(rot, xyz)  # [B, 3, H*W]
        rot_depth_xyz = rot_xyz.unsqueeze(2).repeat(1, 1, num_depth, 1) * depth_values.view(batch, 1, num_depth,
                                                                                            1)  # [B, 3, Ndepth, H*W]
        proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1, 1)  # [B, 3, Ndepth, H*W]
        proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :]  # [B, 2, Ndepth, H*W]
        proj_x_normalized = proj_xy[:, 0, :, :] / ((width - 1) / 2) - 1
        proj_y_normalized = proj_xy[:, 1, :, :] / ((height - 1) / 2) - 1
        proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3)  # [B, Ndepth, H*W, 2]
        grid = proj_xy

    warped_src_fea = F.grid_sample(src_fea, grid.view(batch, num_depth * height, width, 2), mode='bilinear',
                                   padding_mode='zeros')
    warped_src_fea = warped_src_fea.view(batch, channels, num_depth, height, width)

    return warped_src_fea

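Derivation (assuming each 4×4 projection matrix has the form $\begin{bmatrix} KR & Kt \\ 0 & 1 \end{bmatrix}$, so that rot $= K_i R_i R_1^T K_1^{-1}$ and trans $= K_i (t_i - R_i R_1^T t_1)$):

$$H_i(d)\, p_1 = K_i R_i R_1^T K_1^{-1} p_1 + K_i R_i (C_1 - C_i)\, \frac{n^T R_1 R_1^T K_1^{-1} p_1}{d}$$

With $K_1^{-1} p_1 = P_{c1} / Z_1$ and $Z_1 = d$ (note 1):

$$H_i(d)\, p_1 = K_i R_i R_1^T K_1^{-1} p_1 + K_i R_i (C_1 - C_i)\, \frac{n^T P_{c1}}{d \cdot d}$$

With $n^T P_{c1} = d$ (note 2):

$$H_i(d)\, p_1 = K_i R_i R_1^T K_1^{-1} p_1 + \frac{K_i R_i (C_1 - C_i)}{d}$$

With $C = -R^T t$ (note 3), $K_i R_i (C_1 - C_i) = K_i (t_i - R_i R_1^T t_1)$, so

$$p_i \sim d \cdot \text{rot} \cdot p_1 + \text{trans}$$

which is what the code computes as proj_xyz.

Notes:
1. $Z_1 = d$ is the depth in the reference camera.
2. $n^T P_{c1}$ takes the Z coordinate of $P_{c1}$, i.e. the depth $d$, which cancels with the denominator.
3. Derived earlier.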
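The same check can be done for the rot/trans form used in homo_warping. The sketch below is plain Python for one pixel and one depth, under the assumption that each projection matrix is the 4×4 matrix [[K R, K t], [0, 1]]. The helpers `proj_matrix` and `proj_inverse` are hypothetical, and the inverse is written in closed form instead of calling torch.inverse.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix K and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def proj_matrix(K, R, t):
    """Assumed 4x4 projection matrix [[K R, K t], [0, 1]]."""
    KR, Kt = matmul(K, R), matvec(K, t)
    return [KR[r] + [Kt[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def proj_inverse(K_inv, R, t):
    """Closed-form inverse of proj_matrix: [[R^T K^-1, -R^T t], [0, 1]]."""
    A = matmul(transpose(R), K_inv)
    b = [-x for x in matvec(transpose(R), t)]
    return [A[r] + [b[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Made-up reference (1) and source (i) cameras.
K1, K1_inv = intrinsics(500.0, 320.0, 240.0)
Ki, _ = intrinsics(510.0, 315.0, 245.0)
R1, t1 = rot_y(0.1), [0.1, 0.0, 0.2]
Ri, ti = rot_y(-0.15), [-0.6, 0.05, 0.1]

ref_proj = proj_matrix(K1, R1, t1)
src_proj = proj_matrix(Ki, Ri, ti)
ref_inv = proj_inverse(K1_inv, R1, t1)

# proj = src_proj @ inverse(ref_proj), then split into rot and trans.
proj = matmul(src_proj, ref_inv)
rot = [row[:3] for row in proj[:3]]
trans = [row[3] for row in proj[:3]]

# World point at depth d in the reference camera, and its two projections.
d = 3.0
Pw = matvec(transpose(R1), [a - b for a, b in zip([0.2, -0.1, d], t1)])
q1 = matvec(K1, [a + b for a, b in zip(matvec(R1, Pw), t1)])
p1 = [q1[0] / q1[2], q1[1] / q1[2], 1.0]
qi = matvec(Ki, [a + b for a, b in zip(matvec(Ri, Pw), ti)])
pi, Zi = [qi[0] / qi[2], qi[1] / qi[2]], qi[2]

# proj_xyz = rot @ xyz * depth + trans, then divide by the third component.
proj_xyz = [d * r + s for r, s in zip(matvec(rot, p1), trans)]
proj_xy = [proj_xyz[0] / proj_xyz[2], proj_xyz[1] / proj_xyz[2]]
max_err = max(abs(a - b) for a, b in zip(pi, proj_xy))
print("max pixel error:", max_err)
```

In this form the third component of `proj_xyz` is the depth of the point in the source camera, so no homography matrix has to be built per depth: one rot, one trans, and a multiplication by each depth value are enough.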
