Study notes on Section 4 of Photo Tourism: Exploring Photo Collections in 3D

Study notes on the Photo Tourism paper.
These notes draw mainly on two articles: https://blog.csdn.net/NSSC_K/article/details/89217993 and https://blog.csdn.net/kksc1099054857/article/details/77185939
Before getting into the paper itself, I first outline the basic 3D reconstruction pipeline following the second article, and then move on to the notes on Section 4 of Photo Tourism.

The basic 3D reconstruction pipeline

The basic steps are as follows:

  1. Image preprocessing: improve the visual quality of the images to make the later computations easier.
  2. Feature detection and matching: in most cases SIFT is used to extract feature points; their descriptors are then computed and matched between images.
  3. Camera calibration: obtain the intrinsic camera parameters.
  4. Estimating the fundamental and essential matrices: obtain the extrinsic camera parameters and the 3D coordinates of scene points; RANSAC is commonly used here.
  5. Meshing of the dense point cloud.

The above is a brief summary of the 3D reconstruction process; for details see https://blog.csdn.net/kksc1099054857/article/details/77185939
Section 4 of the paper mainly covers steps 2, 3 and 4; a minimal two-view sketch of these steps is given below.
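
A minimal two-view sketch of steps 2-4, using OpenCV as a stand-in toolkit (the paper itself predates these APIs). The image file names and the guessed focal length are placeholders for illustration, not values from the paper.

```python
import cv2
import numpy as np

# Step 2: detect SIFT keypoints and match descriptors between two views.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
           if m.distance < 0.8 * n.distance]           # Lowe's ratio test
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Step 3: intrinsics; here a guessed focal length and the image centre.
f = 800.0                                              # e.g. taken from EXIF
h, w = img1.shape
K = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]])

# Step 4: essential matrix with RANSAC, relative pose, and sparse 3D points.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (X[:3] / X[3]).T                            # one 3D point per match
```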

Section 4 of the paper

Below are notes on each paragraph of the paper. The section is titled Reconstructing Cameras and Sparse Geometry.

Our system requires accurate information about the relative location, orientation, and intrinsic parameters such as focal length for each photograph in a collection, as well as sparse 3D scene geometry. Some features of our system require the absolute locations of the cameras, in a geo-referenced coordinate frame. Some of this information can be provided with GPS devices and electronic compasses, but the vast majority of existing photographs lack such information. Many digital cameras embed focal length and other information in the EXIF tags of image files. These values are useful for initialization, but are sometimes inaccurate.

The first paragraph lists the information needed for reconstruction, and notes that this camera information is sometimes missing or inaccurate.

In our system, we do not rely on the camera or any other piece of equipment to provide us with location, orientation, or geometry. Instead, we compute this information from the images themselves using computer vision techniques. (The key point: the required parameters are not taken from the built-in information discussed in the first paragraph, but are computed with computer vision algorithms and techniques.) We first detect feature points in each image, then match feature points between pairs of images, and finally run an iterative, robust SfM procedure to recover the camera parameters. Because SfM only estimates the relative position of each camera, and we are also interested in absolute coordinates (e.g., latitude and longitude) (this refers to accurately placing the reconstruction on a map and is not the main point here), we use an interactive technique to register the recovered cameras to an overhead map. Each of these steps is described in the following subsections.

The second paragraph follows on from the first and outlines what the subsections below will cover.

4.1 Keypoint detection and matching

The first step is to find feature points in each image. We use the SIFT keypoint detector [Lowe 2004], because of its invariance to image transformations. A typical image contains several thousand SIFT keypoints. Other feature detectors could also potentially be used; several detectors are compared in the work of Mikolajczyk et al. [2005]. (Can be skipped.) In addition to the keypoint locations themselves, SIFT provides a local descriptor for each keypoint. Next, for each pair of images, we match keypoint descriptors between the pair, using the approximate nearest neighbors package of Arya et al. [1998] (smallest Euclidean distance, subject to a threshold), then robustly estimate a fundamental matrix for the pair using RANSAC [Fischler and Bolles 1987]. During each RANSAC iteration, we compute a candidate fundamental matrix (the fundamental matrix between the two images is estimated from their matched keypoints) using the eight-point algorithm [Hartley and Zisserman 2004], followed by non-linear refinement. Finally, we remove matches that are outliers to the recovered fundamental matrix. If the number of remaining matches is less than twenty (a threshold set in the paper), we remove all of the matches from consideration. (If fewer than twenty inlier matches remain, the pair is discarded.)

This first paragraph explains how keypoints are extracted and matched; see the SIFT and RANSAC algorithms for details. A sketch of this pairwise matching stage follows.
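
A sketch of the pairwise matching stage, assuming keypoints and descriptors have already been computed for both images. OpenCV's FLANN matcher stands in for the approximate-nearest-neighbors package of Arya et al., and cv2.findFundamentalMat plays the role of the eight-point-plus-RANSAC step; the 0.8 ratio-test value is an assumption, while the twenty-match cutoff is the paper's threshold.

```python
import cv2
import numpy as np

def match_pair(kp1, des1, kp2, des2, min_matches=20):
    """Return inlier matches between two images, or [] if fewer than 20 survive."""
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    raw = flann.knnMatch(des1, des2, k=2)

    # Lowe's ratio test keeps only distinctive nearest-neighbour matches.
    good = []
    for pair in raw:
        if len(pair) == 2 and pair[0].distance < 0.8 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return []

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Robust fundamental matrix; the mask marks inliers of the epipolar geometry.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    if F is None:
        return []
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]

    # The paper discards pairs with fewer than twenty surviving matches.
    return inliers if len(inliers) >= min_matches else []
```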

After finding a set of geometrically consistent matches between each image pair (the matching described in the first paragraph), we organize the matches into tracks, where a track is a connected set of matching keypoints across multiple images. (This really has to be understood correctly, or the rest becomes hard to follow. What is a track? A point in space may appear in many images, and each time it appears it can be detected as a keypoint; all the matched keypoints that come from different images but correspond to the same 3D point together form one track.) If a track contains more than one keypoint in the same image, it is deemed inconsistent. (Once the definition of a track is clear, this is easy: if two keypoints in the same image are both supposed to be projections of the same 3D point, the track is obviously inconsistent.) We keep consistent tracks containing at least two keypoints for the next phase of the reconstruction procedure. (Only consistent tracks with at least two keypoints are kept; with fewer than two there is nothing to reconstruct later.)

The second paragraph introduces the important concept of a track and implies some of its properties; a sketch of building tracks from pairwise matches follows.
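
A sketch of one way to organize pairwise matches into tracks with a union-find structure. The (image_id, keypoint_index) representation of a keypoint and the input format are assumptions; the consistency rules are the ones stated in the paper.

```python
from collections import defaultdict

def build_tracks(pair_matches):
    """pair_matches: iterable of ((img_a, kp_a), (img_b, kp_b)) matched keypoints."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in pair_matches:
        union(a, b)

    # Group keypoints by their connected component.
    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)

    tracks = []
    for members in groups.values():
        images = [img for img, _ in members]
        # Inconsistent: the same image contributes more than one keypoint.
        if len(images) != len(set(images)):
            continue
        # Keep consistent tracks seen in at least two images.
        if len(members) >= 2:
            tracks.append(members)
    return tracks
```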

4.2 Structure from motion

Next, we recover a set of camera parameters and a 3D location for each track. (From the tracks obtained above, compute the camera parameters and a 3D location for each track.) The recovered parameters should be consistent, in that the reprojection error, i.e., the sum of distances between the projections of each track and its corresponding image features, is minimized. This minimization problem is formulated as a non-linear least squares problem (see Appendix A) and solved with algorithms such as Levenberg-Marquardt [Nocedal and Wright 1999]. Such algorithms are only guaranteed to find local minima, and large-scale SfM problems are particularly prone to getting stuck in bad local minima, so it is important to provide good initial estimates of the parameters. Rather than estimating the parameters for all cameras and tracks at once, we take an incremental approach, adding in one camera at a time. (To avoid bad local minima, an incremental strategy is used: cameras are added one at a time.)

The first paragraph gives the objective used to compute the camera parameters and 3D positions: minimization of the reprojection error. A simplified sketch of this objective follows.
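
A simplified sketch of the reprojection-error objective. The exact parameterization of Appendix A (including radial distortion and per-camera focal lengths) is omitted, a single shared intrinsic matrix K is assumed, and SciPy's Levenberg-Marquardt solver stands in for the one cited in the paper.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, obs):
    """params packs 6 values per camera (rotation vector + translation),
    followed by the 3D points; obs holds the measured 2D keypoint positions."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts3d = params[n_cams * 6:].reshape(n_pts, 3)

    residuals = []
    for c, p, uv in zip(cam_idx, pt_idx, obs):
        R = Rotation.from_rotvec(cams[c, :3]).as_matrix()
        t = cams[c, 3:]
        x = K @ (R @ pts3d[p] + t)           # project the 3D point into camera c
        residuals.append(x[:2] / x[2] - uv)  # 2D reprojection error
    return np.concatenate(residuals)

# Usage (x0 packs the initial camera and point estimates):
# result = least_squares(reprojection_residuals, x0, method="lm",
#                        args=(n_cams, n_pts, K, cam_idx, pt_idx, obs))
```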

We begin by estimating the parameters of a single pair of cameras. This initial pair should have a large number of matches (the images taken by these two cameras should share many matched keypoints), but also have a large baseline (a wide separation between the two viewpoints), so that the 3D locations of the observed points are well-conditioned. We therefore choose the pair of images that has the largest number of matches, subject to the condition that those matches cannot be well-modeled by a single homography, to avoid degenerate cases. (The idea is that many matches make the estimated parameters more representative and robust, while the homography condition rules out degenerate configurations such as a nearly planar scene or almost no baseline, where the two-view geometry cannot be recovered reliably.)

The second paragraph covers what to look out for when choosing the initial pair of cameras; a sketch of the homography test follows.
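
A sketch of one possible test for the "cannot be well-modeled by a single homography" condition: fit a homography with RANSAC to the surviving matches and look at its inlier ratio. The 0.6 ratio threshold is an illustrative assumption, not a value from the paper.

```python
import cv2
import numpy as np

def is_good_initial_pair(pts1, pts2, max_homography_ratio=0.6):
    """pts1, pts2: Nx2 arrays of matched keypoint positions (fundamental-matrix inliers)."""
    if len(pts1) < 4:
        return False
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    if H is None:
        return True   # no homography fits at all, so certainly not degenerate
    ratio = mask.ravel().sum() / float(len(pts1))
    # If most matches are explained by one homography, the pair is close to a
    # degenerate (planar or small-baseline) configuration and is rejected.
    return ratio < max_homography_ratio
```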

Next, we add another camera to the optimization. (After initializing the first two cameras, we start adding the remaining cameras one at a time.) We select the camera that observes the largest number of tracks whose 3D locations have already been estimated, and initialize the new camera's extrinsic parameters using the direct linear transform (DLT) technique [Hartley and Zisserman 2004] inside a RANSAC procedure. The DLT also gives an estimate of the intrinsic parameter matrix K as a general upper-triangular matrix. We use K and the focal length estimated from the EXIF tags of the image to initialize the focal length of the new camera (see Appendix A for more details). (The computed parameters are used to initialize the new camera.)

The third paragraph explains how to choose the next camera to add and how to initialize it, recovering part of its parameters. A sketch of initializing a new camera follows.
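
A sketch of initializing a newly added camera. The paper runs the DLT inside RANSAC and also recovers the intrinsic matrix K; here cv2.solvePnPRansac stands in, assuming K has already been initialized (e.g., from the EXIF focal length), and the RANSAC settings are illustrative.

```python
import cv2
import numpy as np

def initialize_new_camera(pts3d, pts2d, K):
    """pts3d: Nx3 already-reconstructed track positions seen by the new camera;
    pts2d: Nx2 corresponding keypoint positions in the new image."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d), np.float32(pts2d), K, None,
        reprojectionError=4.0, iterationsCount=200)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)    # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers       # extrinsics of the new camera
```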

Finally, we add tracks observed by the new camera into the optimization. (Add the tracks observed by the newly added camera.) A track is added if it is observed by at least one other recovered camera, and if triangulating the track gives a well-conditioned estimate of its location (i.e., its 3D position can be recovered reliably by triangulation). This procedure is repeated, one camera at a time, until no remaining camera observes any reconstructed 3D point. To minimize the objective function at every iteration, we use the sparse bundle adjustment library of Lourakis and Argyros [2004]. After reconstructing a scene, we optionally run a post-processing step to detect 3D line segments in the scene using a line segment reconstruction technique, as in the work of Schmid and Zisserman [1997]. (Optional step.)

The fourth paragraph describes what happens after a camera is added: the tracks it observes that satisfy these conditions are added, and the procedure repeats until no remaining camera observes any reconstructed point. A sketch of triangulating a track follows.
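
A sketch of adding a track: triangulate it from two recovered cameras and keep the 3D point only if the triangulation is well conditioned. Using the angle between the two viewing rays as the conditioning test, and the 2-degree threshold, are assumptions for illustration.

```python
import cv2
import numpy as np

def triangulate_track(K, R1, t1, R2, t2, uv1, uv2, min_angle_deg=2.0):
    t1 = np.asarray(t1, dtype=float).reshape(3)
    t2 = np.asarray(t2, dtype=float).reshape(3)
    P1 = K @ np.hstack([R1, t1[:, None]])
    P2 = K @ np.hstack([R2, t2[:, None]])
    X = cv2.triangulatePoints(P1, P2,
                              np.float32(uv1).reshape(2, 1),
                              np.float32(uv2).reshape(2, 1))
    X = (X[:3] / X[3]).ravel()                # homogeneous -> Euclidean 3D point

    # Well-conditioned check: angle between the two viewing rays.
    c1, c2 = -R1.T @ t1, -R2.T @ t2           # camera centres
    r1 = (X - c1) / np.linalg.norm(X - c1)
    r2 = (X - c2) / np.linalg.norm(X - c2)
    angle = np.degrees(np.arccos(np.clip(r1 @ r2, -1.0, 1.0)))
    return X if angle >= min_angle_deg else None   # reject ill-conditioned points
```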

For increased robustness and speed, we make a few modifications to the basic procedure outlined above. First, after every run of the optimization, we detect outlier tracks that contain at least one keypoint with a high reprojection error, and remove these tracks from the optimization. We then rerun the optimization, rejecting outliers after each run, until no more outliers are detected. Second, rather than adding a single camera at a time into the optimization, we add multiple cameras. We first find the camera with the greatest number of matches, M, to existing 3D points, then add any camera with at least 0.75M matches to existing 3D points.
Figures 2 and 3 show reconstructed cameras (rendered as frusta) and 3D feature points for several famous world sites reconstructed with this method.
The total running time of the SfM procedure for the datasets we experimented with ranged from a few hours (for Great Wall, 120 photos processed and matched, and 82 ultimately registered) to about two weeks (for Notre Dame, 2,635 photos processed and matched, and 597 photos registered). The running time is dominated by the iterative bundle adjustment, which gets slower as more photos are added, and as the amount of coupling between cameras increases (e.g., when many cameras observe the same set of points).

These last paragraphs describe how speed and robustness are improved; a sketch of the batched camera-selection rule follows.
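
A sketch of the batched camera-addition rule from the paragraph above: find the best candidate's match count M against existing 3D points, then add every camera with at least 0.75M such matches. The dictionary-based input format is an assumption.

```python
def select_cameras_to_add(match_counts, ratio=0.75):
    """match_counts: dict mapping an unregistered camera id to the number of
    its keypoints that match already-reconstructed 3D points."""
    if not match_counts:
        return []
    M = max(match_counts.values())
    # Add every camera whose match count reaches 0.75 * M.
    return [cam for cam, count in match_counts.items() if count >= ratio * M]
```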

4.3 Geo-registration

4.4 Scene representation

The remaining two subsections are less central, so I will not cover them here; most of what follows in the paper describes the system itself rather than the core of 3D reconstruction.

That is my understanding of Section 4 of the paper, drawing largely on the other articles referenced above; corrections are welcome.
