1.1 Inlier set maximization

Inlier set maximisation, a.k.a. consensus maximisation, seeks the model with the largest number of inliers.

1.1.1 Formulation

Given a set of outlier-contaminated measurements $\mathcal{D} = \{(a_i, b_i)\}_{i=1}^N$, where $a_i \in \mathbb{R}^d$ and $b_i \in \mathbb{R}$, and an inlier threshold $\epsilon \in \mathbb{R}$, $\epsilon > 0$, find the $x$ that maximises
$$\Psi_\epsilon(x \mid \mathcal{D}) = \sum_{i=1}^{N} \mathbb{I}\left(|a_i^T x - b_i| \leq \epsilon\right),$$
where $x \in \mathbb{R}^d$ is the parameter vector, and the 0/1-valued indicator function $\mathbb{I}$ returns 1 if its input predicate is true and 0 otherwise. The quantity $|a_i^T x - b_i|$ is the residual of the $i$-th measurement with respect to $x$, and the value given by $\Psi_\epsilon(x \mid \mathcal{D})$ is the consensus of $x$ with respect to $\mathcal{D}$. The constant $\epsilon$ is the predefined inlier threshold, and $d$ is called the problem dimension.
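As a minimal sketch of the consensus function above, the following Python snippet counts the inliers of a candidate $x$. The function name `consensus`, the row-stacked matrix `A` (one $a_i^T$ per row) and the toy data are assumptions made for illustration only.

```python
import numpy as np

def consensus(x, A, b, eps):
    """Count measurements whose residual |a_i^T x - b_i| is at most eps."""
    residuals = np.abs(A @ x - b)          # one residual per measurement
    return int(np.sum(residuals <= eps))   # sum of the 0/1 indicator values

# Hypothetical toy data: four points on the line b = 2*a + 1 and one outlier.
A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
b = np.array([1.0, 3.0, 5.0, 7.0, 20.0])
x = np.array([2.0, 1.0])                   # candidate model (slope, intercept)
print(consensus(x, A, b, eps=0.1))
```

Evaluating the consensus is cheap; the difficulty of the problem lies in searching over $x$, since the objective is piecewise constant and non-convex.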

1.2 M-estimation

1.3 Graph algorithms

2. Mathematical tools (Algorithms)

2.1 Sub-optimal methods

2.1.1 Fast heuristics

1. RANSAC: minimal solvers + consensus maximization + outlier-robust
Iteratively fit a model on random minimal samples.
Pros:
1. Good efficiency.
Cons:
1. Often yields a much lower consensus than the maximum achievable;
2. Sometimes unstable, i.e., fails to guarantee the same result every time;
3. No optimality guarantees;
4. Running time grows exponentially with the outlier ratio.
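The RANSAC loop can be sketched as follows for the linear model $a_i^T x = b_i$ of Section 1.1.1. This is an illustrative sketch, not a reference implementation: the function name `ransac_linear`, the fixed iteration count, and the synthetic data are all assumptions.

```python
import numpy as np

def ransac_linear(A, b, eps, num_iters=200, seed=0):
    """RANSAC for the linear model a_i^T x = b_i; returns (best_x, best_consensus)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    best_x, best_count = None, -1
    for _ in range(num_iters):
        idx = rng.choice(n, size=d, replace=False)      # random minimal sample of d measurements
        try:
            x = np.linalg.solve(A[idx], b[idx])         # minimal solver: exact fit to the sample
        except np.linalg.LinAlgError:
            continue                                    # degenerate sample, draw again
        count = int(np.sum(np.abs(A @ x - b) <= eps))   # consensus of this hypothesis
        if count > best_count:
            best_x, best_count = x, count
    return best_x, best_count

# Hypothetical data: line b = 2*a + 1 with small noise and 30% gross outliers.
rng = np.random.default_rng(1)
a = rng.uniform(-5, 5, size=50)
A = np.column_stack([a, np.ones(50)])
b = 2.0 * a + 1.0 + rng.normal(scale=0.01, size=50)
b[:15] += rng.uniform(5, 10, size=15)
x_hat, count = ransac_linear(A, b, eps=0.1)
print(x_hat, count)
```

Changing `seed` may change the returned model, which illustrates the instability listed among the drawbacks.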

2. Locally Optimized RANSAC (LO-RANSAC):
Whenever RANSAC finds a better solution, perform model fitting on non-minimal inlier samples.
3. Graduated non-convexity (GNC): non-minimal solvers + M-estimation + outlier-robust
4. Local optimization: outlier-robust
2.1.2 Deterministic approaches

Optimization on relaxed objective functions.

Pros:
1. Guarantee the same result every time, i.e., good repeatability.
Cons:
1. Require the tuning of smoothing parameters;
2. Fail to guarantee global optimality.

2.2 Globally optimal methods

2.2.1 Analytical solutions

A classical approach involves the computation of all the stationary points (among which there is the global minimum).
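As a small illustration of this idea, the sketch below minimizes a univariate polynomial by computing every real root of its derivative and keeping the one with the lowest cost. The function name `global_min_polynomial` and the example quartic are assumptions chosen for illustration; the polynomial is assumed to be bounded below (even degree, positive leading coefficient).

```python
import numpy as np

def global_min_polynomial(coeffs):
    """Global minimizer of a univariate polynomial that is bounded below.

    coeffs: coefficients in decreasing order of degree.
    """
    f = np.poly1d(coeffs)
    stationary = f.deriv().roots                              # all solutions of f'(x) = 0
    real = stationary[np.abs(stationary.imag) < 1e-9].real    # keep the real stationary points
    values = f(real)                                          # cost at each stationary point
    best = np.argmin(values)
    return real[best], values[best]

# Hypothetical example: f(x) = x^4 - 3x^2 + x has two local minima.
x_star, f_star = global_min_polynomial([1.0, 0.0, -3.0, 1.0, 0.0])
print(x_star, f_star)
```

In geometric vision the first-order optimality conditions are usually multivariate polynomial systems, so the root-finding step is much harder than in this one-dimensional sketch.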

2.2.2 Branch and Bound

Pros:
1. Guarantee global optimality;
2. Good accuracy.
Cons:
1. Effective only on small input sizes (small d, N and/or number of outliers o);
2. Run in worst-case exponential time.

2.2.3 Convex relaxations

2.2.4 Certifiable approaches

Given an optimization problem $\mathbb{P}(\mathbb{D})$ that depends on input data $\mathbb{D}$, we say that an algorithm $\mathbb{A}$ is certifiable if, after solving $\mathbb{P}(\mathbb{D})$, algorithm $\mathbb{A}$ either provides a certificate for the quality of its solution (e.g., a proof of optimality, a finite bound on its sub-optimality, or a finite bound on the distance of the estimate from the optimal solution), or declares failure otherwise.

2. Outlier-robust
2.2.5 Others

Tree search + pseudo-convex (Li, CVPR'07; Chin et al., CVPR'15)

3. Special environments and cases with prior information

4. Minimal/Non-minimal case

4.1 Minimal case

Minimal solvers assume noiseless measurements and use the minimum number of measurements necessary to estimate parameters.

4.2 Non-minimal case

Non-minimal solvers account for measurement noise and estimate parameters via nonlinear least squares (NLS).

4.2.1 Closed form

4.2.2 Polynomial equations from first-order optimality conditions

4.2.3 Nonconvex problem

4.2.4 Uncategorized

5. Robust pose estimation

5.1 Absolute pose estimation

Given a set of 3D points with known position, and corresponding 2D image points, determine the location and pose of the camera.

5.1.1 Formulation

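Given 3D points $\mathbf{X}_i$ and corresponding 2D image points $\mathbf{x}_i$, determine the camera matrix $P$ such that $\mathbf{x}_i = P\mathbf{X}_i$ for all $i$. An algebraic solution to this problem is given by the DLT algorithm (R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision – 2nd Edition. Cambridge University Press, 2003, chapter 7). Writing $\mathbf{x}_i = (x_i, y_i, w_i)^T$, the relation $\mathbf{x}_i \times P\mathbf{X}_i = 0$ gives three equations that are linear in the entries of $P$:
$$\begin{bmatrix} 0^T & -w_i \mathbf{X}_i^T & y_i \mathbf{X}_i^T \\ w_i \mathbf{X}_i^T & 0^T & -x_i \mathbf{X}_i^T \\ -y_i \mathbf{X}_i^T & x_i \mathbf{X}_i^T & 0^T \end{bmatrix} \begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = 0,$$
where each $P^{jT}$ is a 4-vector, the $j$-th row of $P$. Since only two of the three equations are linearly independent, one may choose to use only the first two:
$$\begin{bmatrix} 0^T & -w_i \mathbf{X}_i^T & y_i \mathbf{X}_i^T \\ w_i \mathbf{X}_i^T & 0^T & -x_i \mathbf{X}_i^T \end{bmatrix} \begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = 0.$$
Since $P$ has 12 entries and (ignoring scale) 11 degrees of freedom, 11 equations are needed to solve for $P$. Each point correspondence leads to two equations, so at a minimum $5\frac{1}{2}$ such correspondences are required, i.e., six points with only one of the two equations used from the sixth.
Given this minimum number of correspondences, the solution is exact, i.e., the space points are projected exactly onto their measured images. The solution is obtained by solving $Ap = 0$, where $A$ is an $11 \times 12$ matrix in this case. In general $A$ will have rank 11, and the solution vector $p$ spans the 1-dimensional right null-space of $A$.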
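A minimal sketch of the DLT computation is given below, assuming image points with $w_i = 1$ and at least six correspondences. It stacks the two equations per point into $A$ and takes the right singular vector of the smallest singular value as $p$. The function name `dlt_camera_matrix`, the synthetic camera `P_true` and the random points are illustrative assumptions, and the data normalization usually recommended in practice is omitted for brevity.

```python
import numpy as np

def dlt_camera_matrix(X, x):
    """Estimate the 3x4 camera matrix P (up to scale) from n >= 6 correspondences.

    X: (n, 3) array of 3D points; x: (n, 2) array of image points (w_i = 1).
    """
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])       # homogeneous 3D points, shape (n, 4)
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = x[i]
        A[2 * i, 4:8] = -Xh[i]                 # row [0^T, -X^T, v X^T]
        A[2 * i, 8:12] = v * Xh[i]
        A[2 * i + 1, 0:4] = Xh[i]              # row [X^T, 0^T, -u X^T]
        A[2 * i + 1, 8:12] = -u * Xh[i]
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)                # null vector of A, reshaped row by row

# Hypothetical noise-free test: project random points with a known camera.
rng = np.random.default_rng(0)
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
X = rng.uniform(-1, 1, size=(8, 3))
Xh = np.hstack([X, np.ones((8, 1))])
proj = Xh @ P_true.T
x = proj[:, :2] / proj[:, 2:3]
P = dlt_camera_matrix(X, x)
reproj = Xh @ P.T
x_reproj = reproj[:, :2] / reproj[:, 2:3]     # reprojection with the estimated camera
print(np.max(np.abs(x_reproj - x)))
```

With more than the minimum number of points and noisy data, the same singular vector minimizes $\|Ap\|$ subject to $\|p\| = 1$.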

5.1.2 Algorithms

  Algebraic approaches

Optimize an algebraic cost function.
Cons:
1. The solution may get trapped in a local minimum.

  RANSAC

Hypothesize several poses from sampled correspondences, and retrieve the pose that fits the most inliers.
Cons:
1. Fails to handle high outlier ratios, i.e., limited robustness;
2. Inefficiency, i.e., the number of iterations increases significantly.

  Inlier set maximization

Searches over the parameter space to avoid data sampling, guaranteeing global optimality in terms of maximizing the number of inliers.
Pros:
1. Guarantees global optimality;
2. High robustness.
Cons:
1. Inefficiency.

  Expectation-maximization

Estimate both parameters and inliers, and solve for them iteratively.
Pros:
1. Good efficiency.
Cons:
1. Requires a reliable initial pose and rarely converges without one;
2. Fails to handle high outlier ratios, e.g., >50%.

  Robust losses

Define a cost function and minimize it, given an initial pose.
Cons:
1. Prone to local minima;
2. Fails to handle high outlier ratios, i.e., limited robustness.

  Globally optimal methods

5.2 Relative pose estimation

The relative pose (or relative orientation) problem is to find the relative pose of two cameras, given a set of image correspondences, determined by unknown 3D points. Often the solution to this problem involves finding the positions of the 3D points as well.

5.2.1 Formulation

Given image correspondences $\mathbf{x}_i \leftrightarrow \mathbf{x}'_i$, find two camera matrices $P$ and $P'$, along with 3D points $\mathbf{X}_i$, such that $\mathbf{x}_i = P\mathbf{X}_i$ and $\mathbf{x}'_i = P'\mathbf{X}_i$. This problem is often solved by computing the fundamental or essential matrix; a commonly used algorithm is the 8-point algorithm (R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision – 2nd Edition. Cambridge University Press, 2003, chapter 11).
The fundamental matrix is defined by the equation $\mathbf{x}'^T F \mathbf{x} = 0$ for any pair of matching points in two images. In particular, writing $\mathbf{x} = (x, y, 1)^T$ and $\mathbf{x}' = (x', y', 1)^T$, each point match gives rise to one linear equation in the unknown entries of $F$:
$$x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0.$$
Denote by $f$ the 9-vector made up of the entries of $F$ in row-major order. Then the above equation can be expressed as a vector inner product
$$(x'x, x'y, x', y'x, y'y, y', x, y, 1)\, f = 0.$$
From a set of $n$ point matches, we obtain a set of linear equations of the form
$$Af = \begin{bmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{bmatrix} f = 0.$$
This is a homogeneous set of equations, and $f$ can only be determined up to scale. As a $3 \times 3$ matrix, $F$ has 9 entries and, ignoring scale, 8 free parameters (the rank-2 constraint $\det F = 0$, which the linear method ignores, leaves 7 true degrees of freedom), so 8 equations are needed to solve for $F$ linearly. Since each point correspondence leads to only one equation, at a minimum 8 such correspondences are required. For a solution to exist, matrix $A$ must have rank at most 8, and if the rank is exactly 8, then the solution is unique (up to scale) and can be found by linear methods: the solution is the generator of the right null-space of $A$. The algorithm just described is the essence of a method called the 8-point algorithm for computation of the fundamental matrix.

With more than 8 noisy matches, $f$ is found by minimizing an algebraic error, namely $\|Af\|$ subject to $\|f\| = 1$.
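The linear estimate described in this section can be sketched as follows. The function name `eight_point` and the synthetic two-view setup are assumptions for illustration. The final rank-2 projection is a standard extra step that is not part of the derivation above, and the coordinate normalization usually applied to pixel data is omitted because the example uses normalized image coordinates.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear estimate of F from n >= 8 matches such that x2_h^T F x1_h = 0.

    x1, x2: (n, 2) arrays of matching points in the first and second image.
    """
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    # One row (x'x, x'y, x', y'x, y'y, y', x, y, 1) per match.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                     # minimizes ||A f|| subject to ||f|| = 1
    U, s, Vt_F = np.linalg.svd(F)                # extra step: enforce rank 2
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt_F

# Hypothetical noise-free scene: cameras [I | 0] and [R | t], points in front of both.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(12, 3)) + np.array([0.0, 0.0, 5.0])
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.0])
x1 = X[:, :2] / X[:, 2:3]
Xc2 = X @ R.T + t                                # points in the second camera frame
x2 = Xc2[:, :2] / Xc2[:, 2:3]
F = eight_point(x1, x2)
residuals = np.array([np.append(q, 1.0) @ F @ np.append(p, 1.0) for p, q in zip(x1, x2)])
print(np.max(np.abs(residuals)))
```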

5.2.2 Related work

  Minimal case

The essential matrix has five degrees of freedom (three from 3D rotation and three from 3D translation, minus one for the scale ambiguity); therefore, only five correspondences (except in degenerate cases) are required for its estimation.

  8-point algorithm

The 8-point algorithm can be considered the state-of-the-art initialization for further refinement.

  Iterative optimization on the essential matrix manifold

Minimal solvers or the 8-point algorithm typically provide suboptimal solutions for the non-minimal N-point problem and therefore it is a common practice to refine these initial estimates by local, iterative methods.
In this context, the essential matrix manifold has been characterized via different, yet (almost) equivalent formulations.

  Globally optimal methods

Despite their appeal as fast solvers, the above-mentioned proposals neither guarantee nor certify that the retrieved solution is optimal.

5.3 Multiple view geometry

6. Other types of correspondence

6.1 Affine correspondences

6.2 Semantic correspondences

6.3 Uncategorized

8. Non-central camera system

8.1 Problem formulation


8.2 Relative pose

8.3 Generalized relative pose and scale

8.4 Uncategorized

9. Wahba problem/rotation search

9.1 Problem formulation

Given two sets of vectors $a_i, b_i \in \mathbb{R}^3$, $i = 1, \dots, N$, the Wahba problem is formulated as the least-squares problem
$$\min_{R \in SO(3)} \sum_{i=1}^N w_i^2 \|b_i - R a_i\|^2,$$
which computes the best rotation $R$ that aligns vectors $a_i$ and $b_i$, and where $\{w_i^2\}_{i=1}^N$ are (known) weights associated with each pair of measurements. Here $SO(3) \doteq \{R \in \mathbb{R}^{3 \times 3} : R^T R = R R^T = I_3, \det(R) = 1\}$ is the 3D Special Orthogonal Group containing proper 3D rotation matrices, and $I_d$ denotes the identity matrix of size $d$. This problem is known to be a maximum likelihood estimator for the unknown rotation when the ground-truth correspondences $(a_i, b_i)$ are known and the observations are corrupted with zero-mean isotropic Gaussian noise. In other words, this problem computes an accurate estimate for $R$ when the observations can be written as $b_i = R a_i + \epsilon_i$ $(i = 1, \dots, N)$, where $\epsilon_i$ is isotropic Gaussian noise.
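In the outlier-free case, this least-squares problem has a well-known closed-form solution based on the SVD of $M = \sum_i w_i^2 b_i a_i^T$. The sketch below illustrates it; the function name `solve_wahba` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def solve_wahba(a, b, w):
    """Closed-form minimizer of sum_i w_i^2 ||b_i - R a_i||^2 over SO(3).

    a, b: (N, 3) arrays of vectors; w: (N,) array of weights.
    """
    M = (w[:, None] ** 2 * b).T @ a                     # M = sum_i w_i^2 b_i a_i^T
    U, _, Vt = np.linalg.svd(M)
    # The last diagonal entry forces det(R) = +1, so R is a proper rotation.
    D = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt)])
    return U @ D @ Vt

# Hypothetical noise-free data: rotation of 0.3 rad about the z-axis.
rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
a = rng.normal(size=(10, 3))
b = a @ R_true.T                                        # b_i = R a_i for every i
w = np.ones(10)
R_hat = solve_wahba(a, b, w)
print(np.max(np.abs(R_hat - R_true)))
```

A single gross outlier among the pairs $(a_i, b_i)$ can bias this estimate arbitrarily, which motivates the robust variants in Section 9.3.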

9.2 Outlier-free

9.3 Robust Wahba

9.3.1 Local Methods

9.3.2 Global Methods

9.3.3 Outlier-removal Methods

10. 3D registration

10.1 Simultaneous pose and correspondence

10.1.1 Local methods

10.1.2 Global method

  • 3D registration + high outlier rate + graph matching + approximate vertex cover: O. Enqvist, K. Josephson, and F. Kahl, “Optimal correspondences from pairwise constraints,” in Intl. Conf. on Computer Vision (ICCV), 2009, pp. 1295–1302.

10.1.3 Uncategorized

10.2 Object-model registration


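  1. Geometric completeness
  2. Fine details

Solution: object-model global pose optimization (registration) or 3D model retrieval, i.e., registering the object observed by the camera to a CAD model.


  1. Points on the model are highly structured and smooth, whereas the camera scan of the object is very noisy, and the observation is partial, i.e., incomplete.
  2. The high-level geometric structure is fairly similar, but low-level geometric features can differ considerably.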
10.3 Outdoor-indoor registration

10.4 Day-night registration

10.5 Non-rigid registration

11. 3D shape reconstruction

11.1 Problem formulation

Assume we are given $N$ pixel measurements $Z = [z_1, \dots, z_N] \in \mathbb{R}^{2 \times N}$ (the 2D landmarks), generated from the projection of points belonging to an unknown 3D shape $S \in \mathbb{R}^{3 \times N}$ onto an image. Further assume that the shape $S$ can be represented as a linear combination of $K$ predefined basis shapes $B_k \in \mathbb{R}^{3 \times N}$, i.e., $S = \sum_{k=1}^{K} c_k B_k$, where $\{c_k\}_{k=1}^K$ are (unknown) shape coefficients. Then, the generative model of the 2D landmarks reads
$$z_i = \Pi R \left( \sum_{k=1}^{K} c_k B_{ki} \right) + t + \epsilon_i, \quad i = 1, \dots, N,$$
where $B_{ki}$ denotes the $i$-th 3D point on the $k$-th basis shape, $\epsilon_i \in \mathbb{R}^2$ models the measurement noise, and $\Pi$ is the (known) weak perspective projection matrix
$$\Pi = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \end{bmatrix},$$
with $s_x$ and $s_y$ being constants. $R \in SO(3)$ and $t \in \mathbb{R}^2$ model the (unknown) rotation and translation of the shape $S$ relative to the camera (only a 2D translation can be estimated). The shape reconstruction problem consists in the joint estimation of the shape parameters $\{c_k\}_{k=1}^K$ and the camera pose $(R, t)$.
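The generative model can be sketched in a few lines, here without the noise term $\epsilon_i$. The function name `project_shape`, the array layout of the basis shapes, and the random test data are assumptions for illustration.

```python
import numpy as np

def project_shape(B, c, R, t, s_x=1.0, s_y=1.0):
    """Noise-free landmarks z_i = Pi R (sum_k c_k B_ki) + t, returned as a (2, N) array.

    B: (K, 3, N) basis shapes; c: (K,) coefficients; R: (3, 3) rotation; t: (2,) translation.
    """
    Pi = np.array([[s_x, 0.0, 0.0], [0.0, s_y, 0.0]])   # weak perspective projection matrix
    S = np.tensordot(c, B, axes=1)                      # S = sum_k c_k B_k, shape (3, N)
    return Pi @ R @ S + t[:, None]                      # the same t is added to every landmark

# Hypothetical example: K = 2 basis shapes with N = 4 points each, identity rotation.
rng = np.random.default_rng(0)
B = rng.normal(size=(2, 3, 4))
c = np.array([0.7, 0.3])
t = np.array([1.0, -2.0])
Z = project_shape(B, c, np.eye(3), t)
print(Z.shape)
```

Shape reconstruction inverts this map: given $Z$ and the basis shapes, it estimates $c$, $R$ and $t$ jointly.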

12. Rotation averaging/synchronization

Rotation averaging, a.k.a. multiple rotation averaging or SO(3) synchronisation, is the problem of estimating absolute rotations (orientations w.r.t. a common coordinate system) from a set of distinct estimated relative rotation measurements.

12.1 Problem formulation

The input to rotation averaging is a set of noisy relative rotations $\{\tilde{R}_{ij}\}$, where each $\tilde{R}_{ij}$ is a measurement of the orientation difference between cameras $i$ and $j$, which overlap in view. From the relative rotations, rotation averaging aims to recover the absolute rotations $\{R_i\}_{i=1}^n$, which represent the orientations of the cameras. In the ideal case where there is no noise in the relative rotations, $\tilde{R}_{ij} = R_j R_i^T$. The input relative rotations $\{\tilde{R}_{ij}\}$ define a camera graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \dots, n\}$ is the set of cameras, and $(i, j) \in \mathcal{E}$ is an edge in $\mathcal{G}$ if the relative rotation $\tilde{R}_{ij}$ between cameras $i$ and $j$ is measured.
In the presence of noise, however, rotation averaging is usually posed as a nonlinear optimization problem over a nonconvex domain that minimizes a sum of distances raised to a power $p$:
$$\min_{R_1, \dots, R_n \in SO(3)} \sum_{(i, j) \in \mathcal{E}} d(R_j R_i^T, \tilde{R}_{ij})^p,$$
where $d : SO(3) \times SO(3) \to \mathbb{R}$ is a distance function that measures the deviation from $\tilde{R}_{ij} = R_j R_i^T$ based on measured and estimated quantities.
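To make the objective concrete, the sketch below evaluates it for a small camera graph, taking $d$ to be the geodesic (angular) distance on $SO(3)$. This choice of $d$ is only one common option and is an assumption here, as are the helper names `rot_z`, `geodesic_distance` and `rotation_averaging_cost` and the three-camera example.

```python
import numpy as np

def rot_z(angle):
    """Rotation by `angle` radians about the z-axis (helper for the example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_distance(R1, R2):
    """Angle in radians of the rotation R1 R2^T."""
    cos_angle = (np.trace(R1 @ R2.T) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))     # clip guards against rounding

def rotation_averaging_cost(R, R_rel, p=2):
    """Sum over edges (i, j) of d(R_j R_i^T, R~_ij)^p.

    R: list of absolute rotations; R_rel: dict mapping (i, j) to the measured R~_ij.
    """
    return sum(geodesic_distance(R[j] @ R[i].T, R_ij) ** p
               for (i, j), R_ij in R_rel.items())

# Hypothetical graph with three cameras and noise-free relative rotations.
R = [rot_z(0.0), rot_z(0.4), rot_z(1.1)]
R_rel = {(i, j): R[j] @ R[i].T for (i, j) in [(0, 1), (1, 2), (0, 2)]}
cost_clean = rotation_averaging_cost(R, R_rel)

# Perturb one measurement by 0.1 rad: the cost becomes 0.1^2 for p = 2.
R_rel_noisy = dict(R_rel)
R_rel_noisy[(0, 1)] = rot_z(0.1) @ R_rel[(0, 1)]
cost_noisy = rotation_averaging_cost(R, R_rel_noisy)
print(cost_clean, cost_noisy)
```

A solver would minimize this cost over all $R_i$; the sketch only evaluates it at a given set of rotations.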

12.2 Single rotation averaging

12.3 Multiple rotation averaging

13. Pose graph estimation/optimization

Camera orientations and positions are jointly optimized.

14. Uncategorized

Relative rotation: Kyle Wilson and Noah Snavely. Robust global translations with 1DSfM. In Eur. Conf. Comput. Vis., pages 61–75, 2014.

