Structure from Motion（附python代码）

ning_ww

于 2021-11-03 15:48:48 发布

阅读量2.6k

点赞数

分类专栏： 3D视觉文章标签： python

本文链接：https://blog.csdn.net/bb_sy_w/article/details/121099603

版权

3D视觉专栏收录该内容

12 篇文章

订阅专栏

本文详细介绍了SFM（Structure from Motion）三维重建算法的后三步骤：构建2D轨迹、解决SfM模型及利用束调整优化模型。涵盖数学原理、代码实现等内容。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

SFM算法的前两步：特征点提取、匹配，可以看我的这篇文章：《sift、surf、orb 特征提取——三维重建》，这里主要详细介绍后三步。
在这里插入图片描述

这个好像有点用，可以作为参考

3. Construct 2D tracks from the matches

当匹配关系建立后,需要生成track列表，指同名点的相片集合，比如第一幅图
的13号点和第二幅的14号点及第五幅的115号点是同名点，则(1,13)、
(2,14)、 (5,115)是属于一个track。据此可以生成一个track 集合,同时生
成track的时候也需要剔除无用匹配:

如果一个track包含同一幅图多次，则应该剔除，这是因为同一幅图的多
个特征点都匹配了同一个点，则匹配关系肯定是错误的。
如果track太少,应该剔除,一般取2（最小值）,是指只有两幅图有同一个点,三维重建的信息过少,容易产生误差。

4. Solve for the SfM model from the 2D tracks

数学原理

第四步找初始化像对,目的是找到相机基线最大的像对,采用RANSC算法四点法计算单应矩阵,满足单应矩阵的匹配点称为内点,不满足单应矩阵的称为外点，根据单应矩阵公式可知当T（~~？这是个什么东西？~~ ）越小时,内点占比越高,也就是低视差现象越明显，详见：Homography 知多少？
因此找到一个内点占比最小的像对就是初始化像对，当然它前提必须满足可重建,这个可以通过匹配点个数保证。
第四步是初始化像对的相对定向,根据RANSC八点法计算本征矩阵,可通过对本征矩阵SVD分解得到第二个图像的R、T,在这一步需要进行畸变校正，然后根据R、T和矫正后的像点坐标三角化计算出三维点（什么是坐标三角化？如何操作？详见：三角化求深度值（求三位坐标））。

Code

from utils import *
import logging


class Baseline:
    """Represents the functions that compute the baseline pose from the initial images of a reconstruction"""

    def __init__(self, view1, view2, match_object):

        self.view1 = view1  # first view
        self.view1.R = np.eye(3, 3)  # identity rotation since the first view is said to be at the origin
        '''
        array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
        '''
        self.view2 = view2  # second view
        self.match_object = match_object  # match object between first and second view

    def get_pose(self, K):
        """Computes and returns the rotation and translation components for the second view"""

        F = remove_outliers_using_F(self.view1, self.view2, self.match_object)#return fundamental matrix
        E = K.T @ F @ K  # compute the essential matrix from the fundamental matrix
        #E：本质矩阵，从F到E？如何变过来的，K是intrinsic matrix
        #python中 @的作用：1.装饰符 2.矩阵乘法
        logging.info("Computed essential matrix")
        logging.info("Choosing correct pose out of 4 solutions")

        return self.check_pose(E, K)

    def check_pose(self, E, K):
        """Retrieves the rotation and translation components from the essential matrix by decomposing it and verifying the validity of the 4 possible solutions"""

        R1, R2, t1, t2 = get_camera_from_E(E)  # decompose E
        #从本质矩阵恢复摄像机参数矩阵
        if not check_determinant(R1):
            R1, R2, t1, t2 = get_camera_from_E(-E)  # change sign of E if R1 fails the determinant test
            #？
        # solution 1
        reprojection_error, points_3D = self.triangulate(K, R1, t1)#筛选误匹配
        #
        # check if reprojection is not faulty and if the points are correctly triangulated in the front of the camera
        if reprojection_error > 100.0 or not check_triangulation(points_3D, np.hstack((R1, t1))):

            # solution 2
            reprojection_error, points_3D = self.triangulate(K, R1, t2)
            if reprojection_error > 100.0 or not check_triangulation(points_3D, np.hstack((R1, t2))):

                # solution 3
                reprojection_error, points_3D = self.triangulate(K, R2, t1)
                if reprojection_error > 100.0 or not check_triangulation(points_3D, np.hstack((R2, t1))):

                    # solution 4
                    return R2, t2

                else:
                    return R2, t1

            else:
                return R1, t2

        else:
            return R1, t1

    def triangulate(self, K, R, t):
        """Triangulate points between the baseline views and calculates the mean reprojection error of the triangulation"""

        K_inv = np.linalg.inv(K)
        P1 = np.hstack((self.view1.R, self.view1.t))
        P2 = np.hstack((R, t))#本质矩阵分解得来的

        # only reconstructs the inlier points filtered using the fundamental matrix
        pixel_points1, pixel_points2 = get_keypoints_from_indices(keypoints1=self.view1.keypoints,
                                                                  keypoints2=self.view2.keypoints,
                                                                  index_list1=self.match_object.inliers1,
                                                                  index_list2=self.match_object.inliers2)

        # convert 2D pixel points to homogeneous coordinates
        pixel_points1 = cv2.convertPointsToHomogeneous(pixel_points1)[:, 0, :]#convertPointsToHomogeneous把点从欧式空间转换到齐次空间，
        pixel_points2 = cv2.convertPointsToHomogeneous(pixel_points2)[:, 0, :]

        reprojection_error = []

        points_3D = np.zeros((0, 3))  # stores the triangulated points

        for i in range(len(pixel_points1)):
            u1 = pixel_points1[i, :]
            u2 = pixel_points2[i, :]

            # convert homogeneous 2D points to normalized device coordinates
            #与内参矩阵进行运算，叫做normalized
            u1_normalized = K_inv.dot(u1)
            u2_normalized = K_inv.dot(u2)

            # calculate 3D point
            point_3D = get_3D_point(u1_normalized, P1, u2_normalized, P2)

            # calculate reprojection error
            error = calculate_reprojection_error(point_3D, u2[0:2], K, R, t)#重建误差
            reprojection_error.append(error)

            # append point
            points_3D = np.concatenate((points_3D, point_3D.T), axis=0)

        return np.mean(reprojection_error), points_3D

5.Refine the SfM model using bundle adjustment

加入更多图像，以第三副图为例，根据第四步生成的三维点和第三副图与前两图的track关系，可以反算第三副图的R、T,然后继续三角化计算出
更多的三维点，这样反复重复第5步，最后就会得到所有像片的POSE (R、T)和三维点，这就是稀疏重建SFM的结果了。
从第四步开始需要进行光束法平差Bundle+Adjustment，是一个非线性优化的过程，目的是使重建误差降低到最小，通过调整POSE和三维点使反向投影差最小，如果相机没有标定,还应该将焦距也参与平差。
Bundle+Adjustment是一个迭代的过程，在一次迭代过后，将所有三维点反向投影到各自相片的像素坐标并分别与初始坐标比对，如果大于某个阈值，则应将其从track中去掉，如果track中已经小于2个了，则整个track也去掉，一直优化到没有点可去为止

完整参考代码：structure-from-motion