[Notes] MonoGS camera pose optimization

Section 1

The MonoGS paper uses Lie algebra to derive the minimal Jacobians used in camera pose optimization for SLAM. Here’s a breakdown of how Lie algebra is applied in this process:

1. Understanding the Camera Pose Representation

The camera pose \(T_{cw}\) is the transformation from world coordinates to camera coordinates. It lies in the group of rigid body transformations, denoted \(SE(3)\), which combines rotation and translation.

To optimize the camera pose, you need to compute how small perturbations of the transformation affect the projection of 3D points into the camera’s coordinate frame. These small perturbations are represented using the Lie algebra of \(SE(3)\), denoted \(\mathfrak{se}(3)\).

2. Using Lie Algebra for Minimal Parameterization

The paper emphasizes the use of Lie algebra to derive minimal Jacobians. The motivation is to reduce the dimensionality of the problem and eliminate redundant computations. For example, the \(SE(3)\) group has 6 degrees of freedom (3 for translation and 3 for rotation), so the Jacobians should match this minimal set of degrees of freedom. Lie algebra provides a compact and efficient way to describe small perturbations of the pose.

3. Logarithmic and Exponential Maps

The Lie algebra \(\mathfrak{se}(3)\) is the tangent space of the group \(SE(3)\) at the identity. Small perturbations of the pose are expressed in this tangent space. These perturbations are mapped to the group by the exponential map \(\exp(\cdot)\); conversely, the logarithmic map \(\log(\cdot)\) maps elements of the group back to the Lie algebra.

In the paper, the logarithmic and exponential maps are used to compute the derivatives of the transformation. The key formula here is Eq. (5), which defines the partial derivative on the manifold:

\[ \frac{\mathcal{D} f(T)}{\mathcal{D} T} \triangleq \lim_{\tau \to 0} \frac{\log\big(f(\exp(\tau) \circ T) \circ f(T)^{-1}\big)}{\tau} \]

Here \(\circ\) is the group composition. The formula expresses the derivative of a function \(f\) defined on the Lie group \(SE(3)\) in terms of the Lie algebra \(\mathfrak{se}(3)\): a small tangent vector \(\tau\) perturbs the pose \(T\) from the left, and the derivative measures how \(f(T)\) changes in response.

4. Chain Rule for Jacobians

The paper uses the chain rule to compute the Jacobians with respect to the camera pose \(T_{cw}\). The chain rule is applied to both the 3D position of the Gaussian in camera coordinates and the covariance \(\Sigma\) of the Gaussian.

For example, in Eq. (4), the chain rule is applied to the derivative of \(\Sigma_c\) with respect to \(T_{cw}\):

\[ \frac{\partial \Sigma_c}{\partial T_{cw}} = \frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}} + \frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}} \]

Where:

  • \(\mu_c\) is the 3D position of the Gaussian in camera coordinates.
  • \(W_c\) is the rotational part of \(T_{cw}\), which rotates the world-frame covariance into the camera frame.

5. Minimal Jacobians Using Lie Algebra

Using the Lie algebra representation, the paper derives minimal Jacobians that match the 6 degrees of freedom of \(SE(3)\). Eq. (6) gives \(\frac{\partial \mu_c}{\partial T_{cw}}\), which describes how the position \(\mu_c\) of the Gaussian in the camera frame changes under small changes of \(T_{cw}\), together with the corresponding Jacobian of \(W_c\).

The Jacobian matrices are derived as:
\[ \frac{\partial \mu_c}{\partial T_{cw}} = \begin{bmatrix} I & -[\mu_c]^* \end{bmatrix} \]
\[ \frac{\partial W_c}{\partial T_{cw}} = \begin{bmatrix} 0 & -[W_c^1]^* \\ 0 & -[W_c^2]^* \\ 0 & -[W_c^3]^* \end{bmatrix} \]
Where:

  • \([\mu_c]^*\) denotes the skew-symmetric matrix of the 3D vector \(\mu_c\).
  • \(W_c^i\) refers to the i-th column of \(W_c\), the rotational part of \(T_{cw}\). Each block \(-[W_c^i]^*\) is 3x3, so the stacked Jacobian is 9x6.
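A short first-order expansion shows where the position Jacobian comes from. This is a sketch under two assumptions that are not spelled out in the notes: the perturbation is applied on the left of \(T_{cw}\), and it is split as \(\tau = (\rho, \theta)\) with the translation part \(\rho\) first and the rotation part \(\theta\) second (both symbols are introduced here for illustration only):

\[
\begin{aligned}
\mu_c' &= \exp(\tau)\,\mu_c \approx \mu_c + \rho + \theta \times \mu_c \\
       &= \mu_c + \rho - [\mu_c]^* \theta
\quad\Rightarrow\quad
\frac{\partial \mu_c}{\partial \tau} = \begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}.
\end{aligned}
\]

The same expansion applied to each column of \(W_c\) has no \(\rho\) term, because a translation does not change the rotational part of the pose. That is why the first block of every row in the second Jacobian is zero.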

6. Skew-Symmetric Matrix

The skew-symmetric matrix of a vector \(\mathbf{v} = (v_x, v_y, v_z)\) is defined as:
\[ [\mathbf{v}]^* = \begin{bmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{bmatrix} \]
This matrix appears in the Jacobian because it represents the cross product between the position of the Gaussian and the rotational component of the transformation.
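The cross-product property can be checked numerically. The snippet below is a minimal sketch assuming NumPy; the helper name `skew` and the test vectors are made up for illustration.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]^* of a 3-vector v."""
    vx, vy, vz = v
    return np.array([[0.0, -vz,  vy],
                     [ vz, 0.0, -vx],
                     [-vy,  vx, 0.0]])

v = np.array([1.0, 2.0, 3.0])
w = np.array([-0.5, 0.4, 2.0])

# Multiplying by [v]^* is the same as taking the cross product v x w.
assert np.allclose(skew(v) @ w, np.cross(v, w))

# Skew-symmetry: the transpose equals the negative.
assert np.allclose(skew(v).T, -skew(v))
```

Because \([\mathbf{v}]^* \mathbf{w} = \mathbf{v} \times \mathbf{w}\), an infinitesimal rotation \(\theta\) moves a point \(\mu_c\) by \(\theta \times \mu_c = -[\mu_c]^* \theta\), which is the rotation block of the Jacobian.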

Summary of the Process:

  1. Lie Algebra Representation: The camera pose is represented in the Lie group \(SE(3)\), and small changes in the pose are expressed in the Lie algebra \(\mathfrak{se}(3)\).
  2. Logarithmic and Exponential Maps: The logarithmic and exponential maps are used to map between the group and its Lie algebra, which allows for the computation of minimal Jacobians.
  3. Chain Rule: The chain rule is applied to compute the derivatives of the 3D position and covariance with respect to the camera pose.
  4. Minimal Jacobians: The derived Jacobians match the minimal dimensionality (6 degrees of freedom for \(SE(3)\)), ensuring efficient optimization.

By using Lie algebra, the paper ensures that the Jacobians are both minimal and computationally efficient, which is crucial for real-time SLAM systems.

Section 2

q1. How does Lie Algebra reduce the dimensionality of the problem?

When working with rigid body transformations, we are dealing with the \(SE(3)\) group, which includes both rotations (in 3D) and translations. The Lie algebra \(\mathfrak{se}(3)\) provides a way to represent small perturbations of these transformations in a more compact form.

  • \(SE(3)\) is a 6-dimensional manifold because it describes 3 degrees of freedom for translation and 3 degrees of freedom for rotation. Directly optimizing over \(SE(3)\) would require working with 4x4 transformation matrices, which involves 12 parameters (9 for the rotation matrix and 3 for translation). However, only 6 of these parameters are independent due to the orthogonality constraint on the rotation matrix.

  • The Lie algebra \(\mathfrak{se}(3)\) reduces this dimensionality by working with a minimal representation of \(SE(3)\). Instead of working directly with the transformation matrix, we work with a 6-dimensional vector \(\xi = (\omega^T, v^T)^T\), where:

    • \(\omega \in \mathbb{R}^3\) represents the rotation (in the Lie algebra of \(SO(3)\)).
    • \(v \in \mathbb{R}^3\) represents the translation.

By using this minimal representation, we avoid redundant parameters and constraints (like the orthogonality of the rotation matrix), leading to a more efficient optimization process. The number of parameters drops from 12 to 6, matching the degrees of freedom of \(SE(3)\).
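To make the 6-vector concrete, the standard matrix form of \(\xi\) can be written out. The "hat" notation \(\xi^\wedge\) and the symbols \(R\), \(t\) are assumptions introduced here; they do not appear in the notes above.

\[
\xi^\wedge = \begin{bmatrix} [\omega]^* & v \\ 0^T & 0 \end{bmatrix} \in \mathbb{R}^{4 \times 4},
\qquad
T = \exp(\xi^\wedge) = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \in SE(3).
\]

Only the six numbers in \(\omega\) and \(v\) are free. The constraints \(R^T R = I\) and \(\det R = 1\) hold automatically for any \(\xi\), so the optimizer never has to enforce them. Note that \(t\) equals \(v\) only to first order; for larger rotations the two differ.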


q2. Why can we use the exponential and logarithmic maps to map the Lie algebra to SE(3)?

The exponential map \(\exp(\cdot)\) and logarithmic map \(\log(\cdot)\) are used to move between the Lie algebra \(\mathfrak{se}(3)\) and the Lie group \(SE(3)\).

  • Exponential Map \(\exp(\cdot)\): The exponential map takes an element of the Lie algebra \(\mathfrak{se}(3)\) (which represents a small perturbation or velocity) and maps it to an element of the Lie group \(SE(3)\) (the group of rigid transformations). This is analogous to how the exponential function in calculus maps a tangent vector to a point on a curve.

    • In the case of \(SE(3)\), the exponential map turns a 6-dimensional vector \(\xi = (\omega^T, v^T)^T\) (where \(\omega\) represents rotation and \(v\) represents translation) into a 4x4 transformation matrix in \(SE(3)\).
  • Logarithmic Map \(\log(\cdot)\): The logarithmic map is the inverse of the exponential map. It takes an element of the Lie group \(SE(3)\) and maps it back to the Lie algebra \(\mathfrak{se}(3)\). This operation is used to extract the “small perturbations” that describe how the transformation can be changed.

Why are these maps valid?

These maps are valid because of the structure of Lie groups and Lie algebras:

  • A Lie group is a smooth manifold that has a corresponding Lie algebra, which is the tangent space at the identity element of the group. The exponential map allows us to move from the Lie algebra (tangent space) to the Lie group (manifold), and the logarithmic map allows us to reverse this process.
  • For \(SE(3)\), the exponential map allows us to generate a smooth transformation from a small perturbation in the Lie algebra, and the logarithmic map lets us linearize the problem by mapping group elements back to the tangent space.

In the specific context of SLAM, using these maps allows us to efficiently handle the non-linear nature of rigid body transformations during optimization.
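The exponential map can be tried out numerically. The following is a minimal sketch assuming NumPy and the rotation-first ordering \(\xi = (\omega, v)\) used above. The helper names `skew`, `hat` and `se3_exp` are hypothetical, and the truncated power series is chosen for brevity; it is not claimed to be what the paper's implementation does.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of a 3-vector."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def hat(xi):
    """Map xi = (omega, v) in R^6 to its 4x4 matrix form in se(3)."""
    omega, v = xi[:3], xi[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = skew(omega)
    X[:3, 3] = v
    return X

def se3_exp(xi, terms=30):
    """Exponential map as a truncated power series sum_k X^k / k!."""
    X = hat(xi)
    T = np.eye(4)
    term = np.eye(4)
    for k in range(1, terms):
        term = term @ X / k   # term now holds X^k / k!
        T = T + term
    return T

xi = np.array([0.1, -0.2, 0.3, 1.0, 2.0, 3.0])
T = se3_exp(xi)
R = T[:3, :3]

# The result is a valid rigid transformation without enforcing any constraint.
assert np.allclose(R.T @ R, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
assert np.allclose(T[3], [0.0, 0.0, 0.0, 1.0])
```

Any 6-vector produces a valid element of \(SE(3)\), which is the practical reason an optimizer can take unconstrained steps in the tangent space.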


q3. How many features are needed to define a scene made up of Gaussians?

Scene Representation with Gaussians:

When representing a scene using Gaussian distributions, each Gaussian is defined by:

  1. Mean \(\mu \in \mathbb{R}^3\): This represents the 3D position of the center of the Gaussian.
  2. Covariance \(\Sigma \in \mathbb{R}^{3 \times 3}\): This represents the uncertainty or spread of the Gaussian in 3D space. The covariance matrix encodes how the Gaussian is “stretched” in different directions.

Thus, each Gaussian is fully defined by:

  • 3 parameters for the mean \(\mu\) (position in 3D).
  • 6 parameters for the symmetric covariance matrix \(\Sigma\) (a symmetric 3x3 matrix has 6 independent components).

So, in total, 9 parameters are needed to define the geometry of a single Gaussian in 3D space. In Gaussian splatting, each Gaussian additionally carries appearance attributes (opacity and color) on top of these 9 geometric parameters.
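The count of 3 + 6 can be made concrete by flattening a Gaussian into a parameter vector. This is an illustrative sketch assuming NumPy; `pack_gaussian` and `unpack_gaussian` are hypothetical helpers, and the numbers reuse the example values from later in these notes.

```python
import numpy as np

def pack_gaussian(mu, sigma):
    """Flatten the mean (3 values) and the upper triangle of sigma (6 values)."""
    iu = np.triu_indices(3)
    return np.concatenate([mu, sigma[iu]])

def unpack_gaussian(params):
    """Rebuild the mean and the full symmetric covariance from 9 values."""
    mu = params[:3]
    upper = np.zeros((3, 3))
    upper[np.triu_indices(3)] = params[3:]
    # Mirror the upper triangle; subtract the diagonal so it is not doubled.
    sigma = upper + upper.T - np.diag(np.diag(upper))
    return mu, sigma

mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[1.0, 0.1, 0.2],
                  [0.1, 2.0, 0.3],
                  [0.2, 0.3, 3.0]])

params = pack_gaussian(mu, sigma)
assert params.shape == (9,)

mu2, sigma2 = unpack_gaussian(params)
assert np.allclose(mu2, mu) and np.allclose(sigma2, sigma)
```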

Impact of Covariance on the Scene:

The covariance matrix \(\Sigma\) affects the shape and orientation of the Gaussian:

  • A diagonal covariance matrix indicates that the Gaussian is axis-aligned, with variances along each axis (e.g., the spread in the x, y, and z directions).
  • A non-diagonal covariance matrix indicates that the Gaussian is rotated and possibly stretched in the space. The off-diagonal elements encode the correlations between different axes.
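The difference between the two cases shows up in an eigendecomposition. The sketch below assumes NumPy; the variances and the 45-degree rotation are arbitrary values chosen for illustration.

```python
import numpy as np

# Axis-aligned Gaussian: variances 1, 4, 9 along x, y, z.
sigma_axis = np.diag([1.0, 4.0, 9.0])

# Rotate the same Gaussian by 45 degrees about the z axis.
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
Rz = np.array([[c,  -s, 0.0],
               [s,   c, 0.0],
               [0.0, 0.0, 1.0]])
sigma_rot = Rz @ sigma_axis @ Rz.T

# Rotation introduces off-diagonal (correlation) terms ...
assert not np.isclose(sigma_rot[0, 1], 0.0)

# ... but the eigenvalues, i.e. the variances along the principal axes, are unchanged.
evals, evecs = np.linalg.eigh(sigma_rot)
assert np.allclose(evals, [1.0, 4.0, 9.0])

# Square roots of the eigenvalues give the extent along each principal axis.
axis_lengths = np.sqrt(evals)
```

The eigenvectors in `evecs` give the orientation of the ellipsoid and the eigenvalues give its extent, so the off-diagonal entries carry only orientation information.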

In classical SLAM, a covariance matrix typically models the uncertainty of landmarks or features in the environment. In Gaussian-splatting SLAM such as MonoGS, \(\Sigma\) instead describes the shape of each Gaussian primitive, and it is optimized together with the positions.


Example: Computing Eq. (4)

Let’s walk through Eq. (4) from the paper, which computes the derivative of the covariance matrix with respect to the camera pose:
\[ \frac{\partial \Sigma_c}{\partial T_{cw}} = \frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}} + \frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}} \]

Components of Eq. (4):
  1. \(\mu_c\) is the 3D position of the Gaussian in the camera frame.
  2. \(W_c\) is the rotational part of \(T_{cw}\), which rotates the covariance into the camera frame.

Let’s break it down:

  • The first term \(\frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}}\) captures how the covariance matrix changes with the position \(\mu_c\), and how \(\mu_c\) in turn changes with the camera pose \(T_{cw}\).

    • \(\frac{\partial \mu_c}{\partial T_{cw}}\) is given in Eq. (6) as \(\begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}\), where \([\mu_c]^*\) is the skew-symmetric matrix of \(\mu_c\).
    • \(\frac{\partial \Sigma_c}{\partial \mu_c}\) represents how the covariance matrix depends on the position of the Gaussian (through the projection Jacobian, which is evaluated at \(\mu_c\)).
  • The second term \(\frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}}\) captures how the covariance matrix changes with the matrix \(W_c\), and how \(W_c\) changes with the camera pose \(T_{cw}\).

    • \(\frac{\partial W_c}{\partial T_{cw}}\) is also given in Eq. (6). It stacks one 3x6 block per column of \(W_c\), which gives a 9x6 matrix.

Example Computation:

Suppose we have a Gaussian with:

  • Mean \(\mu_c = (1, 2, 3)\).
  • Covariance matrix \(\Sigma_c = \begin{bmatrix} 1 & 0.1 & 0.2 \\ 0.1 & 2 & 0.3 \\ 0.2 & 0.3 & 3 \end{bmatrix}\).

To compute the Jacobian \(\frac{\partial \Sigma_c}{\partial T_{cw}}\):

  1. First, compute the skew-symmetric matrix of \(\mu_c\):
    \[ [\mu_c]^* = \begin{bmatrix} 0 & -3 & 2 \\ 3 & 0 & -1 \\ -2 & 1 & 0 \end{bmatrix} \]
  2. Use Eq. (6) to compute \(\frac{\partial \mu_c}{\partial T_{cw}} = \begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}\).
  3. Finally, compute the full derivative by applying the chain rule as described in Eq. (4), using the derivatives of \(W_c\) and \(\Sigma_c\) to finish the computation.
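Steps 1 and 2 can be verified numerically for \(\mu_c = (1, 2, 3)\). The sketch below assumes NumPy, a left perturbation of the pose, and a translation-first ordering \(\tau = (\rho, \theta)\) of the perturbation; these conventions and the helper names `skew` and `se3_exp` are assumptions made for this illustration. Since \(\mu_c = T_{cw}\,\mu_w\), a left perturbation acts directly on \(\mu_c\), so \(T_{cw}\) itself is not needed.

```python
import numpy as np

def skew(v):
    vx, vy, vz = v
    return np.array([[0.0, -vz,  vy],
                     [ vz, 0.0, -vx],
                     [-vy,  vx, 0.0]])

def se3_exp(tau, terms=20):
    """Exponential map for tau = (rho, theta), via a truncated power series."""
    rho, theta = tau[:3], tau[3:]
    X = np.zeros((4, 4))
    X[:3, :3] = skew(theta)
    X[:3, 3] = rho
    T, term = np.eye(4), np.eye(4)
    for k in range(1, terms):
        term = term @ X / k
        T = T + term
    return T

mu_c = np.array([1.0, 2.0, 3.0])

# Analytic Jacobian from Eq. (6): [ I | -[mu_c]^* ], a 3x6 matrix.
J_analytic = np.hstack([np.eye(3), -skew(mu_c)])

# Numerical Jacobian by central differences on the perturbed point.
eps = 1e-6
mu_h = np.append(mu_c, 1.0)          # homogeneous coordinates
J_numeric = np.zeros((3, 6))
for i in range(6):
    tau = np.zeros(6)
    tau[i] = eps
    plus = (se3_exp(tau) @ mu_h)[:3]
    minus = (se3_exp(-tau) @ mu_h)[:3]
    J_numeric[:, i] = (plus - minus) / (2 * eps)

assert np.allclose(J_analytic, J_numeric, atol=1e-6)
```

The first three columns are the identity (a translation moves the point one-to-one) and the last three are \(-[\mu_c]^*\). Step 3 would multiply this matrix by \(\frac{\partial \Sigma_c}{\partial \mu_c}\), which depends on the projection model and is not worked out in these notes.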

Section 3

1. Lie Algebra \(\mathfrak{so}(3)\) and Rotation Representation

The group of 3D rotations is the special orthogonal group \(SO(3)\). A rotation matrix \(\mathbf{R} \in SO(3)\) is a 3x3 orthogonal matrix with determinant 1. However, working directly with these matrices is inefficient because they have 9 elements but only 3 degrees of freedom (DOF) due to the orthogonality constraint.

The Lie algebra associated with \(SO(3)\) is denoted \(\mathfrak{so}(3)\). It is the tangent space of \(SO(3)\) at the identity element and can be used to represent small perturbations in rotation. The key idea is that instead of using a full 3x3 matrix to represent the rotation, we use a 3-dimensional vector that encodes the rotation in a more compact form.

Skew-Symmetric Matrix and the Lie Algebra \(\mathfrak{so}(3)\):

In \(\mathfrak{so}(3)\), any element can be represented as a skew-symmetric matrix. Given a 3D vector \(\boldsymbol{\omega} = (\omega_x, \omega_y, \omega_z) \in \mathbb{R}^3\), the corresponding skew-symmetric matrix \([\boldsymbol{\omega}]^*\) is:
\[ [\boldsymbol{\omega}]^* = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix} \]

This 3x3 skew-symmetric matrix belongs to the Lie algebra \(\mathfrak{so}(3)\).

2. Exponential Map from \(\mathfrak{so}(3)\) to \(SO(3)\)

To convert from the Lie algebra \(\mathfrak{so}(3)\) to the Lie group \(SO(3)\) (i.e., to get a rotation matrix from the 3D vector representing the rotation), we use the exponential map. This map is defined as:
\[ \mathbf{R} = \exp([\boldsymbol{\omega}]^*) \in SO(3) \]
The exponential map computes the rotation matrix \(\mathbf{R}\) from the skew-symmetric matrix \([\boldsymbol{\omega}]^*\). In practice, the exponential map for \(SO(3)\) can be computed using Rodrigues’ rotation formula, which converts the 3D rotation vector \(\boldsymbol{\omega}\) into a rotation matrix.

Rodrigues’ Rotation Formula:

Rodrigues’ formula gives the exponential map for \(SO(3)\) as:
\[ \exp([\boldsymbol{\omega}]^*) = \mathbf{I} + \frac{\sin(\theta)}{\theta} [\boldsymbol{\omega}]^* + \frac{1 - \cos(\theta)}{\theta^2} \left([\boldsymbol{\omega}]^*\right)^2 \]
Where:

  • \(\theta = \|\boldsymbol{\omega}\|\) is the magnitude of the rotation (in radians) around the axis defined by \(\boldsymbol{\omega}\).
  • \([\boldsymbol{\omega}]^*\) is the skew-symmetric matrix corresponding to the vector \(\boldsymbol{\omega}\).
  • \(\mathbf{I}\) is the identity matrix.

This formula converts the 3D vector \(\boldsymbol{\omega}\) into the corresponding 3x3 rotation matrix \(\mathbf{R}\).
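Rodrigues’ formula translates almost line by line into code. The following is a minimal sketch assuming NumPy; `skew` and `so3_exp` are hypothetical helper names, and the small-angle branch is an added safeguard (the formula divides by \(\theta\)) rather than something the notes prescribe.

```python
import numpy as np

def skew(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def so3_exp(omega):
    """Rotation matrix exp([omega]^*) via Rodrigues' formula."""
    theta = np.linalg.norm(omega)
    K = skew(omega)
    if theta < 1e-8:
        # Small-angle limits: sin(t)/t -> 1 and (1 - cos(t))/t^2 -> 1/2.
        return np.eye(3) + K + 0.5 * (K @ K)
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * (K @ K)

# A quarter turn about the z axis maps the x axis onto the y axis.
R = so3_exp(np.array([0.0, 0.0, np.pi / 2]))
assert np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0])
assert np.allclose(R.T @ R, np.eye(3))
```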

3. Logarithmic Map from \(SO(3)\) to \(\mathfrak{so}(3)\)

The logarithmic map is the inverse of the exponential map and is used to convert a rotation matrix \(\mathbf{R} \in SO(3)\) back into the Lie algebra \(\mathfrak{so}(3)\), which is represented by the 3D vector \(\boldsymbol{\omega}\).

Given a rotation matrix \(\mathbf{R}\), the logarithmic map is:
\[ [\boldsymbol{\omega}]^* = \log(\mathbf{R}) \]
In this case, \(\boldsymbol{\omega}\) can be extracted from the skew-symmetric matrix \([\boldsymbol{\omega}]^*\). The angle of rotation \(\theta\) can be computed as:
\[ \theta = \cos^{-1} \left( \frac{\text{Tr}(\mathbf{R}) - 1}{2} \right) \]
Where \(\text{Tr}(\mathbf{R})\) is the trace of the rotation matrix \(\mathbf{R}\). After solving for \(\theta\), the vector \(\boldsymbol{\omega}\) is:
\[ \boldsymbol{\omega} = \frac{\theta}{2 \sin(\theta)} \begin{bmatrix} \mathbf{R}_{32} - \mathbf{R}_{23} \\ \mathbf{R}_{13} - \mathbf{R}_{31} \\ \mathbf{R}_{21} - \mathbf{R}_{12} \end{bmatrix} \]
Where \(\mathbf{R}_{ij}\) refers to the elements of the 3x3 rotation matrix.
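The two formulas above can be sketched in code as follows, assuming NumPy. The function name `so3_log` is hypothetical. The clipping and the small-angle branch are added safeguards against rounding and division by \(\sin(\theta) \approx 0\); rotations with \(\theta\) close to \(\pi\) are also ill-conditioned for this formula and are not handled here.

```python
import numpy as np

def so3_log(R):
    """Rotation vector omega such that exp([omega]^*) = R (theta not near pi)."""
    # Clip guards against values slightly outside [-1, 1] due to rounding.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Indices are zero-based here: R[2, 1] is R_32 in the formula above.
    v = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        # Small-angle limit: theta / (2 sin(theta)) -> 1/2.
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

# A quarter turn about the z axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
omega = so3_log(Rz)
assert np.allclose(omega, [0.0, 0.0, np.pi / 2])
```

For this example the trace is 1, so \(\theta = \pi/2\), and the antisymmetric part of \(\mathbf{R}\) picks out the z axis as the rotation axis.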
