Online Stereo Camera Calibration From Scratch (Paper Translation)


https://www.mrt.kit.edu/z/publ/download/2017/rehder_iv17.pdf

Abstract

Stereo cameras are among the most promising sensors for automated driving. For their deployment, however, calibration should be automated and possible in-situ. We propose a restructuring of bundle adjustment into an incremental online calibration system. It allows us to estimate all observable camera parameters on the fly. Both simulations and experiments with real world cameras show its capability to calibrate stereo rigs in real time while driving. With this method, cameras can be employed with almost no calibration overhead. Only the non-observable parameter of scale has to be defined in advance.


How is bundle adjustment restructured into an incremental online calibration system?

For the motion between time t − 1 and t, the reprojection error of tracked landmarks into the images is minimized. Since the change of motion between consecutive frames is assumed to be small, the linearized problem only has to be solved in a single optimization step (Section D). This iteratively improves both the intrinsic and extrinsic calibration; the full bundle adjustment is then only optimized part-wise per small image sequence.

What is the goal of calibration?

The calibration task is to find the set of (stereo camera) intrinsic and extrinsic parameters as well as landmark positions that minimize the total reprojection error of all observed landmarks:

$$\left(\vec{X}_{W}^{N}, \tau, T^{M}, T_{c}, K\right)=\arg \min \sum_{i=0}^{N} \hat{\vec{e}}_{i}^{\top} \hat{\vec{e}}_{i}$$

1. Introduction

Along the road towards automated vehicles, stereo cameras have always played a significant role. To this day, they are widely applied in experimental vehicles for automated driving. Also, quite recently, they have been introduced into series production cars as sensors for driver assistance functions.


While stereo cameras provide rich information for scene understanding, they are extremely sensitive to calibration. Even the smallest errors in calibration degrade performance significantly. These errors may be due to faulty initial calibration but also occur over time, e.g. due to vibrations, thermal stress or aging of materials. Thus, special attention must be paid to the calibration of stereo camera systems.


For accurate calibration, the pattern based method of Zhang et al. [1] lays the foundation of most available tools. They require recordings of known calibration targets which can be identified in the images later. With the known structure of the targets, the calibration parameters can be inferred. A great diversity of different toolboxes is available [3], [4], [5], including the well-known MATLAB toolbox by Bouguet et al. [2]. All these processes require time consuming recording and processing steps as well as expert knowledge for handling the tools. Also, the camera parameters may change over time, making frequent recalibration necessary.


In order to estimate calibration parameters from unknown structure, camera self-calibration has been proposed. For the case of simple, undistorted cameras, self-calibration was introduced in [6], [7]. However, this fails if lens distortions affect the imaging process. To overcome this, joint optimization of scene structure, camera motion and camera parameters was proposed as the so-called bundle-adjustment [8], [9]. Since bundle adjustment requires a joint global optimization of all parameters, it is not feasible to apply these methods with real time constraints.


The works of Dang et al. aim to tackle these problems [10], [11]. They propose a reduced order bundle adjustment as well as parameter tracking using an Extended Kalman Filter. While this method can even cope with active stereo cameras, it requires a precalibration of all distortion parameters, making it unsuitable for calibration from scratch.


For the deployment of stereo cameras, it would be desirable to perform full stereo camera calibration in-situ and in real time. For production line setups, this enables the system to track the calibration parameters, even if they change over its lifetime. Also, in-factory calibration becomes obsolete. In the context of experimental vehicles, frequent reconfiguration of stereo camera setups would no longer entail tedious recalibration.


In this work, we propose a restructuring of the bundle adjustment problem into an incremental calibration process. From in-situ observations in stereo camera images, we iteratively improve the calibration in both intrinsic and extrinsic parameters. For this, we break down the problem into sequentially executed, quickly computable tasks and then only perform part-wise optimization of the full bundle adjustment per small image sequence, resulting in an ever-improving calibration in real time. For a system that performs motion estimation from camera images, i.e. visual odometry, the overhead of simultaneous calibration is negligible.


2. Online Camera Calibration

A. Projection Model

As basis for future derivations, we would like to review the standard pinhole camera model with radial distortions as used in previous works [2].


A three-dimensional point under observation is denoted $\vec{X}_W = (x_W, y_W, z_W)^\top$. We place our camera in this three-dimensional space at a pose $T \in SE(3)$. In a first step, the point $\vec{X}_W$ is transformed from some world-fixed coordinate frame into the camera frame with

$$\vec{X}_C = T^{-1}\,\vec{X}_W \tag{1}$$

Note here that the transform T may be augmented by a series of individual transformations (see Fig. 1).


Fig. 1: Observation of a landmark in multiple camera frames. All camera parameters, transformations and landmark positions have to be estimated online.


For a 3D point (xC,yC,zC)^T, we obtain the normalized direction of its line of sight as

$$\vec{p}_u = \begin{pmatrix} x_u \\ y_u \end{pmatrix} = \frac{1}{z_C}\begin{pmatrix} x_C \\ y_C \end{pmatrix} \tag{2}$$

Lens imperfections introduce distortions to the lines of sight, i.e. they shift the perceived line of sight depending on its position in the image. A common approach to model radial distortion is based on even polynomials. The distorted line of sight $\vec{p}_d$ is derived from $\vec{p}_u$ as

$$\vec{p}_d = \left(1 + \tau_1 r^2 + \tau_2 r^4\right)\vec{p}_u \tag{3}$$

where $r$ is the norm of the undistorted image position, $r = \sqrt{x_u^2 + y_u^2}$. In our experience, we found tangential and higher-order distortions as modeled in [1] to be of little relevance. Instead, the additional degrees of freedom may lead to overfitting and are thus omitted.


As the final stage, the distorted normalized image coordinates are transformed into pixel coordinates by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} f_u\, x_d + c_u \\ f_v\, y_d + c_v \end{pmatrix} \tag{4}$$

where $f_u$, $f_v$ represent the focal length of the lens in horizontal and vertical direction, respectively, and $(c_u, c_v)^\top$ represents the principal point.

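The four stages above (frame transform, line-of-sight normalization, radial distortion, pixel mapping) can be sketched in Python/NumPy. The function name and all parameter values are illustrative, not from the paper:

```python
import numpy as np

def project(X_W, R, t, tau, fu, fv, cu, cv):
    """Project a 3D world point to pixel coordinates with the
    pinhole-plus-radial-distortion model described above."""
    # 1) world frame -> camera frame
    x_C, y_C, z_C = R @ X_W + t
    # 2) normalized (undistorted) image coordinates
    x_u, y_u = x_C / z_C, y_C / z_C
    # 3) radial distortion with an even polynomial, r^2 = x_u^2 + y_u^2
    r2 = x_u**2 + y_u**2
    d = 1.0 + tau[0] * r2 + tau[1] * r2**2
    x_d, y_d = d * x_u, d * y_u
    # 4) focal lengths and principal point give pixel coordinates
    return np.array([fu * x_d + cu, fv * y_d + cv])

# A point on the optical axis maps to the principal point.
p = project(np.array([0.0, 0.0, 10.0]), np.eye(3), np.zeros(3),
            (0.0, 0.0), 500.0, 500.0, 512.0, 256.0)
```

With zero distortion coefficients this reduces to the pinhole camera of the list below.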

Depending on the parameters of the perspective mapping, different camera properties can be modeled, three of which are of greater relevance in this work:


  1. Real Camera: the full model as described above,

  2. Pinhole Camera: while projection and transforms stay the same, distortion has been corrected for, thus setting $\tau_1$ and $\tau_2$ equal to zero,

  3. Rectified Camera: for this setup, two pinhole cameras have been virtually aligned by rotation and scaling such that the transform between the two is only a shift in horizontal direction.


B. Calibration

The goal of this work is to calibrate a stereo camera setup. That is, we would like to estimate all projection parameters of both cameras (intrinsic calibration) and their respective pose (extrinsic calibration). For this purpose, we utilize a set of observed landmarks $\hat{\vec{p}}_{Ci}$ and their reprojections $p_{Ci}$ from their estimated positions $\hat{\vec{X}}_C$. These two measures together make up the well-known reprojection error

$$\hat{\vec{e}}_{i}=\hat{\vec{p}}_{C i}-p_{C i}\left(\hat{\vec{X}}_{C}\right)$$

the error that is made from projecting an uncertain landmark into an image using an uncertain camera model and comparing it to an uncertain observation.


The calibration task is to find the set of intrinsic and extrinsic camera parameters as well as landmark positions that minimize the total reprojection error of all observed landmarks

$$\left(\vec{X}_{W}^{N}, \tau, T^{M}, T_{c}, K\right)=\arg \min \sum_{i=0}^{N} \hat{\vec{e}}_{i}^{\top} \hat{\vec{e}}_{i}$$

where $\vec{X}_W^N$ is a series of $N$ 3D points, $T^M$ a series of $M$ individual camera motion steps, and $(\tau, T_c, K)$ represent all stereo camera parameters, i.e. two independent sets of projection parameters for the two cameras as well as the relative pose. Solely the length of the translation vector between the two cameras is unobservable and thus held constant. This is known as the windowed bundle adjustment problem. It is solved using non-linear optimization.

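As a sketch, the cost that this objective minimizes is just the sum of squared per-landmark residuals between observed and reprojected pixel positions; the numeric values below are made up:

```python
import numpy as np

def total_reprojection_error(p_obs, p_proj):
    """Sum over all landmarks of e_i^T e_i, where e_i is the difference
    between the observed and the reprojected pixel position of landmark i."""
    e = p_obs - p_proj
    return float(np.sum(e * e))

p_obs = np.array([[100.0, 50.0], [200.0, 80.0]])    # observed positions
p_proj = np.array([[101.0, 50.0], [200.0, 78.0]])   # reprojections
cost = total_reprojection_error(p_obs, p_proj)      # 1^2 + 2^2
```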

Unfortunately, the bundle adjustment problem is computationally expensive. Thus, a full calibration in an online application is hardly possible. To overcome this problem, we break it into several parts that can each be solved in real time individually. The following steps are executed to perform full online calibration:


  1. Compute scene structure: with the assumption of a rectified setup, compute 3D landmarks from triangulation of observations.

  2. Guess camera motion: from the initial scene structure, estimate the camera motion between the last two consecutive frames.

  3. Select relevant landmarks: from the motion and scene structure, find the best landmarks for further use.

  4. Improve scene structure and motion chain: reoptimize the entire chain of motion transformations and the 3D landmark positions. The following steps can only be executed if sufficient motion is detected.

  5. Improve projection: again, with the assumption of a pinhole camera, re-estimate the projection parameters.

  6. Improve entire calibration: with all knowledge from Steps 1)-5) as initialization, optimize all real camera parameters.

(Translator's note: Step 2) relates to Section D, Step 3) to Section E, Step 4) to Section F, and Steps 5)-6) to Sections G and H. Step 1) presumably rectifies using the initial parameters and later iterated estimates.)
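The six steps above can be sketched as a per-frame loop. This is a minimal runnable skeleton with stub step functions, not the paper's actual solvers:

```python
# Minimal runnable skeleton of the incremental calibration loop.
# The step functions are illustrative stubs, not the paper's solvers.
def triangulate_rectified(frame):
    return {"landmarks": frame}              # 1) linear triangulation

def guess_motion(landmarks):
    return {"moved": True}                   # 2) one linearized optimization step

def select_landmarks(landmarks, motion):
    return landmarks                         # 3) outliers, bucketing, track length

def refine_structure_and_motion(state):
    state["refined"] = True                  # 4) optimize motion chain + structure
    return state

def refine_projection(state, pinhole):
    # 5) pinhole parameters / 6) full real-camera model
    state.setdefault("stages", []).append("pinhole" if pinhole else "full")
    return state

def process_frame(frame):
    lms = triangulate_rectified(frame)
    motion = guess_motion(lms)
    good = select_landmarks(lms, motion)
    state = refine_structure_and_motion(good)
    if motion["moved"]:                      # steps 5)-6) need sufficient motion
        state = refine_projection(state, pinhole=True)
        state = refine_projection(state, pinhole=False)
    return state

state = process_frame({"id": 0})
```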

While Step 1) can be solved entirely linearly, the initial guess of the camera motion can be approximated with only one iteration of non-linear optimization if the camera motion is small [12]. Step 4) is solved to full convergence with [13].


For runtime reasons, only a few optimization iterations of Steps 5) and 6) are executed. Then, we use the result as initial guess as well as for feature remapping for the next time instance.


C. Feature Re-Mapping

For efficient optimization of the processing steps, we have to switch between different camera models in the course of estimation.


  1. From Rectified to Pinhole Camera: Rectification is achieved by rotation of the lines of sight with a 3D rotation matrix $R_R$. If the rectified camera has projection matrix $K_R$ and the pinhole camera $K_P$, then observations may be remapped with

$$\vec{p}_{P}=\lambda K_{P} R_{R}^{-1} K_{R}^{-1} \vec{p}_{R} \tag{7}$$

where λ is a scaling parameter to normalize to homogeneous coordinates.


  2. From Pinhole to Real Camera: The distortion function (3) is defined in forward manner, i.e. from the undistorted to the distorted case. Again, we utilize the two projection matrices of pinhole camera $K_P$ and of real camera $K_r$. The distorted image point can be computed as

$$\vec{p}_{r}=K_{r}\, d_{r}\left(K_{P}^{-1} \vec{p}_{P}\right) \tag{8}$$

where $d_r(\cdot)$ represents the distortion function (3).


  3. From Real to Rectified Camera: The view from a real camera can be transformed into that of a rectified camera by substituting (7) into (8) and solving for $\vec{p}_R$,

$$\vec{p}_{R}=\lambda K_{R} R_{R}\, d_{r}^{-1}\left(K_{r}^{-1} \vec{p}_{r}\right) \tag{9}$$

where, again, $\lambda$ is needed for normalization. The inverse distortion $d_r^{-1}(\cdot)$ takes the same functional form as (3), where only the parameters differ. In order to obtain the parameter set that undoes distortion, we can simply distort known points and then solve the system of linear equations for the undistortion parameters.


Obviously, (7), (8) and (9) can be concatenated for arbitrary conversions.

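A sketch of this remapping machinery in Python/NumPy: the forward distortion of (3), the linear least-squares fit of the inverse-distortion parameters described above, and the real-to-rectified remap of (9). All numeric values and the fitting range are illustrative assumptions:

```python
import numpy as np

def distort(p, tau1, tau2):
    """Radial distortion d_r(.) of (3) on normalized coordinates."""
    r2 = p[0]**2 + p[1]**2
    return p * (1.0 + tau1 * r2 + tau2 * r2**2)

def fit_inverse_distortion(tau1, tau2, n=50):
    """Distort known radii, then solve a linear system for the parameters
    of the same-form inverse model, as described in the text."""
    r_u = np.linspace(0.05, 0.6, n)                       # undistorted radii
    r_d = r_u * (1.0 + tau1 * r_u**2 + tau2 * r_u**4)     # distorted radii
    # want r_u = r_d (1 + t1 r_d^2 + t2 r_d^4): linear in (t1, t2)
    A = np.stack([r_d**3, r_d**5], axis=1)
    t1, t2 = np.linalg.lstsq(A, r_u - r_d, rcond=None)[0]
    return t1, t2

def remap_real_to_rectified(p_r, K_r, tau_inv, K_R, R_R):
    """p_R = lambda K_R R_R d_r^{-1}(K_r^{-1} p_r), i.e. eq. (9)."""
    q = np.linalg.inv(K_r) @ np.array([p_r[0], p_r[1], 1.0])
    u = distort(q[:2] / q[2], *tau_inv)       # inverse has the same form as (3)
    ray = K_R @ R_R @ np.array([u[0], u[1], 1.0])
    return ray[:2] / ray[2]                   # division normalizes (the lambda)

# Sanity check: the fitted inverse approximately undoes the distortion.
t_inv = fit_inverse_distortion(-0.1, 0.02)
p = np.array([0.3, 0.2])
back = distort(distort(p, -0.1, 0.02), *t_inv)
```

The same-form inverse is only an approximation, which is why it is fitted rather than derived in closed form.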

D. Motion Estimation

The problem of estimating camera motion from subsequent images is known as visual odometry. Since we observe a sequence of motion steps, we may assume that we have obtained all but the last transformation before. For the motion from time t − 1 to t, we minimize the reprojection error of tracked landmarks into the images w.r.t. the motion transform. We assume little change of motion between consecutive frames, thus we only solve the linearized problem in one single optimization step. We evaluate the reprojection error for outlier detection. If the error is too large, we run the RANSAC algorithm for a new initial guess of motion.

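To illustrate the single linearized step, here is a toy Gauss-Newton update for a translation-only motion on normalized image coordinates. The real system solves for a full SE(3) transform; this simplified setup and all numbers are assumptions for illustration:

```python
import numpy as np

def gn_translation_step(X, p_obs, t0=np.zeros(3)):
    """One linearized (Gauss-Newton) update J dt = -e for a translation-only
    camera motion, observing normalized image coordinates of landmarks X."""
    J, e = [], []
    for (x, y, z), (u, v) in zip(X, p_obs):
        xc, yc, zc = x - t0[0], y - t0[1], z - t0[2]
        e.extend([xc / zc - u, yc / zc - v])        # reprojection residuals
        # Jacobian of (xc/zc, yc/zc) w.r.t. the translation t
        J.append([-1.0 / zc, 0.0, xc / zc**2])
        J.append([0.0, -1.0 / zc, yc / zc**2])
    dt = np.linalg.lstsq(np.asarray(J), -np.asarray(e), rcond=None)[0]
    return t0 + dt

rng = np.random.default_rng(0)
X = rng.uniform([-5.0, -5.0, 8.0], [5.0, 5.0, 30.0], size=(30, 3))
t_true = np.array([0.05, -0.02, 0.1])               # small true motion
p_obs = [((x - t_true[0]) / (z - t_true[2]),
          (y - t_true[1]) / (z - t_true[2])) for x, y, z in X]
t_est = gn_translation_step(X, p_obs)
```

Because the motion is small relative to the scene depth, a single linearized step already lands very close to the true translation, which is exactly the assumption the text exploits.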

E. Feature Selection

The result of the calibration highly depends on the quality of the landmark observations. We therefore have multiple criteria for landmark evaluation. First of all, in the visual odometry step, we have already computed reprojection errors per landmark or even performed RANSAC. Thus, we use the current motion estimate for outlier removal.


As a second step, we perform bucketing of landmark observations over the image. This ensures that the landmarks are distributed equally over the image. Without this step, the landmark density would be cumulated in the center of the image, leading to a biased result that preferably minimizes errors in the center.

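Bucketing can be sketched as a grid over the image that keeps only the best few observations per cell; the grid size, scores and per-cell limit below are illustrative choices:

```python
def bucket_features(points, scores, img_w, img_h, nx=8, ny=4, per_cell=2):
    """Keep at most `per_cell` best-scored observations per grid cell so
    that landmarks are spread evenly over the image (illustrative sketch)."""
    cells = {}
    for i, (u, v) in enumerate(points):
        key = (int(u * nx / img_w), int(v * ny / img_h))
        cells.setdefault(key, []).append(i)
    keep = []
    for idx in cells.values():
        idx.sort(key=lambda i: scores[i], reverse=True)  # best score first
        keep.extend(idx[:per_cell])
    return sorted(keep)

# Three clustered points compete for one cell; only the two best survive.
pts = [(10, 10), (12, 11), (15, 12), (500, 300)]
scores = [0.9, 0.5, 0.7, 0.8]
kept = bucket_features(pts, scores, img_w=1024, img_h=512)
```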

The third quality measure is the duration of landmark visibility. Landmarks that can be tracked for a long period of time have the property of being highly discriminative. Thus, they can be located accurately throughout an image sequence.


F. Robustness and Regularization

The detection of landmarks in images is prone to a multitude of errors, among which are random noise and occasional outliers. These errors may degrade the performance of the optimization or, in the case of outliers, even fully break it. In order to cope with the given conditions, we introduce countermeasures to the optimization.


The first countermeasure is a robust loss function that is applied to the reprojection error. We employ a Cauchy Loss Function that reduces the impact of large outliers on the optimization result. Even with the previous outlier rejection, this improves performance greatly.


As a second measure for robustness, we desire that the supposedly static calibration parameters only change slowly over time. For this, we use a regularization term in the error function. The change of each parameter with respect to its last value is introduced to the error. Let, for example, $f_u^-$ be the estimated focal length from the last full optimization step. Then we introduce

$$e_{\Delta f_{u}}=\lambda_{f_{u}}\left(f_{u}-f_{u}^{-}\right)^{2}$$

as a new error term, where $\lambda_{f_u}$ is a non-negative weighting constant. Larger values of $\lambda_{f_u}$ lead to slower change of parameters and more robustness. Smaller values lead to faster calibration but less robustness.


Due to the special field of application in a driving vehicle with wide horizontal field of view, we found it beneficial to also regularize for isotropic projection, i.e. the focal length in horizontal as well as vertical direction should be close to equal. This is achieved by introducing

$$e_{\Delta f}=\lambda_{\Delta f}\left(f_{u}-f_{v}\right)^{2}$$

to the error function. The reason for this is that due to the wide horizontal field of view, errors are more pronounced in horizontal direction.

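Both regularizers (temporal smoothness of $f_u$ and isotropy between $f_u$ and $f_v$) simply add quadratic penalty terms to the cost; the weights below are made-up example values:

```python
def regularization_terms(fu, fv, fu_prev, lam_fu=1.0, lam_df=0.1):
    """Quadratic regularizers from the text; the lambda weights are
    illustrative, not the paper's tuned values."""
    e_delta_fu = lam_fu * (fu - fu_prev) ** 2   # slow change of fu over time
    e_delta_f = lam_df * (fu - fv) ** 2         # isotropy: fu close to fv
    return e_delta_fu + e_delta_f

# (502-501)^2 * 1.0 + (502-500)^2 * 0.1 = 1.0 + 0.4
cost = regularization_terms(fu=502.0, fv=500.0, fu_prev=501.0)
```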

Since the base width of the stereo setup, i.e. scale, cannot be observed, we hold the length of the translation vector between the two cameras constant at all times.


G. Threshold Decay


Over time, the calibration parameters improve and thus, they may need less change but instead can be used for stricter outlier detection. Therefore, we adapt the thresholds for outlier detection over time.


The outlier removal threshold at initialization, $\varepsilon_{\max}$, should be fairly high, since at this time reprojections may not be computed accurately. With better reprojection estimation, the outlier threshold may decline to a lower level $\varepsilon_{\min}$ that still allows for some tolerance. This is achieved with a threshold $\varepsilon$ subject to

$$\varepsilon=\left(\varepsilon_{\max }-\varepsilon_{\min }\right) \exp \left(-\frac{t}{T_{\varepsilon}}\right)+\varepsilon_{\min }$$

where $t$ is the run time and $T_\varepsilon$ is the time constant governing the duration of the threshold decline. We choose an exponential function since the improvement of the reprojection error is large in the beginning, where the extrinsics are corrected quickly. Later, the influence of calibration declines and so should the threshold decay.

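The decaying threshold is a one-liner; $\varepsilon_{\max}$, $\varepsilon_{\min}$ and $T_\varepsilon$ below are illustrative constants, not the paper's values:

```python
import math

def outlier_threshold(t, eps_max=4.0, eps_min=1.0, T_eps=30.0):
    """Exponentially decaying outlier-rejection threshold as in the text."""
    return (eps_max - eps_min) * math.exp(-t / T_eps) + eps_min

start = outlier_threshold(0.0)      # begins at eps_max
late = outlier_threshold(300.0)     # decays towards eps_min
```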

H. Parameter Initialization

The parameters have to be initialized for the optimization to find a valid solution. We have two sets of intrinsic parameters per camera to initialize, namely the distortion and the projection parameters, as well as the extrinsic transformation between the two.


We initialize the principal point as the center of the image and the focal length for a horizontal field of view of 90°, i.e. half the width of the image in pixels. Distortion is initialized with all-zero values. If no information on the base width is available, $b = 1$ can be assumed.

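This initialization rule can be written down directly (a 90° horizontal field of view gives $f = (w/2)/\tan(45°) = w/2$):

```python
import numpy as np

def initial_intrinsics(img_w, img_h):
    """Initial guess as described above: principal point at the image
    center, focal length for a 90-degree horizontal field of view
    (half the image width in pixels), zero distortion."""
    fu = fv = img_w / 2.0             # tan(90deg / 2) = 1  =>  f = w / 2
    cu, cv = img_w / 2.0, img_h / 2.0
    tau = np.zeros(2)                 # tau1 = tau2 = 0
    return fu, fv, cu, cv, tau

fu, fv, cu, cv, tau = initial_intrinsics(1024, 512)
```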

In some cases, the user may know specifications of the camera setup. If this is the case, these parameters can be passed to the calibration in advance. This is especially relevant for the baseline that determines scale. In most cases, it can be measured by hand to give a close approximation of the real value.


3. Experiment

Since a ground truth calibration cannot be obtained for real camera systems, we perform the evaluation in two different ways: first, we use a simulated stereo camera setup to evaluate the accuracy of parameter estimation. For real-world calibration, we quantify quality based on the results of algorithms that require calibration, namely visual odometry [12] and disparity computation [14]. All real-world calibrations use the point feature matching from [12].


A. Simulated Camera Rig

Ground truth calibration parameters do not exist for real camera systems. Thus, we simulate a projection with known parameters to evaluate our online calibration. For the simulation, we use projection of randomly created 3D-landmarks with a sequence of motion taken from the KITTI dataset. As calibration parameters, we also employ the parameters of the rectified KITTI cameras but introduce decalibration manually. We added displacement of the cameras together with mustache distortion (see Fig. 2).


Fig. 2: The simulated mustache distortion of straight lines (left) and after online calibration (right).


Since the parameters have a diverse range of values, we normalize all errors w.r.t. the ground truth (GT) value and evaluate the deviation from that, e.g. for the focal length

$$e=\frac{f_{Est}-f_{GT}}{f_{GT}}$$

Results for the calibration can be seen in Figure 3. For clarity of presentation, we only show parameters of the left camera and also omit the principal point. The later experiments will demonstrate the validity for both cameras and for pose estimation.


Fig. 3: Deviation of intrinsic parameters of the left camera from the simulated values. The value 0 means our calibration and the simulated values coincide.


Most parameters converge to their actual simulated value. Only the 4th-degree distortion parameter is slightly overestimated due to simulated noise. However, this error is barely noticeable in the undistortion (see Fig. 2).


B. Real-World Stereo Cameras

For the real-world evaluation, we test three different stereo camera setups: 1) KITTI: unrectified and unsynchronized raw data from the KITTI dataset [15] together with the provided calibration parameters. 2) Low distortion: a stereo setup with ZEISS DISTAGON T* 15MM F/2.8 ZF.2-I lenses with distortion as low as 0.3%. 3) High distortion: a stereo setup with LENSAGON BM4018S118 lenses with significant distortion and a field of view of more than 120°.

Table I gives an overview of the relevant parameters of the stereo systems. All setups were calibrated offline with checkerboard targets using [3] or [5].


C. Parameter Accuracy

In a first attempt, we compare our calibration results to those of an offline calibration performed with checkerboard patterns [3]. We use the calibration provided with the KITTI raw sequences.

Figure 4 shows how the deviation of the parameters develops over time. While the estimated focal lengths attain the same values, the remaining parameters converge to quite different values. This effect can be seen when we look at the differences of the remapping that is applied to the images for rectification. Figure 5 shows the magnitude and direction of these differences. Since an arbitrary offset can be added to the remapping, we have compensated for the mean of the differences.


Fig. 4: Deviation of intrinsic parameters of the left camera from those obtained from offline calibration [12]. The value 0 means our calibration and the offline results coincide.


[Fig. 5: magnitude and direction of the differences in the rectification remapping]

Due to differences in distortion, the remapping differs in the center as well as in the extreme regions. The consequences of this can be seen in stereo matching results in Figure 8. With the original calibration, the matching in the corners fails while ours allows dense disparity computation throughout the entire image. Thus we assume that pure comparison of parameters is not meaningful to evaluate quality of calibration.


D. Stereo Matching Density

Disparity computation highly depends on the quality of calibration, as seen in the previous section. Since a known calibration is the prerequisite for disparity evaluation, we only evaluate stereo matching in terms of disparity density. The intuition behind this is that a denser stereo image requires a better alignment of epipolar lines and thus, a better calibration (see Fig. 8).


For each setup, we select a set of images from a sequence as test images. We then run online calibration on the same sequences and evaluate both, the disparity density over time as well as in comparison to an offline calibration result.


Figure 6 shows the density of the disparity computation over calibration runtime. It can be seen that the density of disparity images increases over the course of calibration. Density reaches the same level as, and even surpasses, the results achieved with [3] and [5]. The same quality is reached after roughly 30 seconds.


Fig. 6: Disparity image density over online calibration runtime. Dashed lines represent the density achieved with offline calibration.


E. Visual Odometry

Different studies have shown the effect of calibration on the quality of visual odometry [16]. As an additional reference, we compare visual odometry performance with the result of online calibration. For this, we utilize the first sequence from the KITTI odometry raw data. We loop it for five minutes for calibration. We then hold the resulting parameters constant and run visual odometry [12] on that sequence. Figure 7 shows the estimated motion for the different calibration states together with one run for the KITTI calibration parameters.


At initialization, the calibration parameters are far from accurate, thus visual odometry fails completely. After the five minutes of processing, the online calibration achieves good results, so that the quality of the visual odometry computation even exceeds the one with the original KITTI calibration.


F. Runtime

The entire processing chain can be executed in real time. Average runtimes of the individual processing steps are shown in Figure 9. We run it on one core of a 2.9 GHz INTEL I7-3520M CPU at a resolution of 512×1024. Unsurprisingly, the matching of image points and the initial odometry computation take up most of the runtime.


4. Conclusion

In this work, we introduced an online camera calibration scheme based on bundle adjustment. By breaking the bundle adjustment into sequentially executable smaller tasks, real time constraints can be met. The use of observed scene structure for calibration allows us to calibrate all observable camera parameters, that is all intrinsic and extrinsic camera parameters up to scale, in real time. In comparison to other calibration methods, we do not require the use of any calibration pattern. Also, pre-calibration for an initial guess is not necessary.


Calibration is done with only little overhead over standard visual odometry. In the context of automated driving and experimental vehicles, this enables us to exchange the standard visual odometry with visual odometry plus calibration. This way, we do not require time consuming offline calibration in advance. Instead, we even can do alterations of the camera setup on the go without having to worry about sensitive camera parameter adjustments. The code is made publicly available at https://github.com/KIT-MRT.

https://github.com/KIT-MRT/coco
