[Notes] MonoGS camera pose optimization

Shaun888

Published 2024-09-09 00:19:05

Section 1

The MonoGS paper uses Lie algebra to derive the minimal Jacobians used in camera pose optimization for SLAM. Here’s a breakdown of how Lie algebra is applied in this process:

1. Understanding the Camera Pose Representation

The camera pose $T_{cw}$ refers to the transformation from the world coordinates to the camera coordinates. This transformation lies in the group of rigid body transformations, denoted as $SE (3)$ , which combines rotation and translation.

To optimize the camera pose, you need to compute how small perturbations in the transformation affect the projection of 3D points into the camera’s coordinate frame. These small perturbations are represented using the Lie algebra of the group $SE (3)$ , denoted as $\mathfrak{se}(3)$ .

2. Using Lie Algebra for Minimal Parameterization

The paper emphasizes the use of Lie algebra to derive minimal Jacobians. The motivation behind this is to reduce the dimensionality of the problem and eliminate any redundant computations. For example, the $SE (3)$ group has 6 degrees of freedom (3 for translation and 3 for rotation), so the Jacobians should match this minimal set of degrees of freedom. Using Lie algebra provides a compact and efficient way to describe small perturbations in the pose.

3. Logarithmic and Exponential Maps

The Lie algebra $\mathfrak{se}(3)$ is the tangent space of the group $SE (3)$ at the identity. Small perturbations in the pose are expressed in this tangent space. These perturbations are mapped to the group using the exponential map $\exp(\cdot)$ , and conversely, the logarithmic map $\log(\cdot)$ is used to map elements from the group back to the Lie algebra.

In the paper, the logarithmic and exponential maps are used to define derivatives on the manifold. The key formula is Eq. (5), which defines the partial derivative of a function $f$ acting on $T \in SE(3)$ :

$\frac{\mathcal{D} f(T)}{\mathcal{D} T} \triangleq \lim_{\tau \to 0} \frac{\log\left(f(\exp(\tau) \circ T) \circ f(T)^{-1}\right)}{\tau}$

Here $\tau \in \mathfrak{se}(3)$ is a small perturbation applied on the left of $T$ , and $\circ$ is the group composition. The formula measures, in the tangent space, how $f(T)$ changes when $T$ is perturbed by $\exp(\tau)$ , so the resulting Jacobian has exactly one column per degree of freedom of $\tau$ .

4. Chain Rule for Jacobians

The paper uses the chain rule to compute the Jacobians with respect to the camera pose $T_{cw}$ . The chain rule is applied to both the 3D position of the Gaussian in the camera coordinates and the covariance $\Sigma$ of the Gaussian.

For example, in Eq. (4), the chain rule is applied to the derivative of $\Sigma_c$ with respect to $T_{cw}$ :

$\frac{\partial \Sigma_c}{\partial T_{cw}} = \frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}} + \frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}}$

Where:

$\mu_c$ is the 3D position of the Gaussian in camera coordinates.
$W_c$ is the rotational part of $T_{cw}$ , which rotates the Gaussian's covariance from the world frame into the camera frame.

5. Minimal Jacobians Using Lie Algebra

Using the Lie algebra representation, the paper derives minimal Jacobians that match the 6 degrees of freedom of the $SE (3)$ group. In Eq. (6), the paper presents the Jacobians $\frac{\partial \mu_c}{\partial T_{cw}}$ , which describe how the position of the Gaussian $\mu_c$ in the camera frame changes with respect to small changes in the transformation $T_{cw}$ .

The Jacobian matrices are derived as:
$\frac{\partial \mu_c}{\partial T_{cw}} = \begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}$
$\frac{\partial W_c}{\partial T_{cw}} = \begin{bmatrix} 0 & -[W_c^1]^* \\ 0 & -[W_c^2]^* \\ 0 & -[W_c^3]^* \end{bmatrix}$
Where:

$[\mu_c]^*$ denotes the skew-symmetric matrix of the 3D vector $\mu_c$ .
$W_c^i$ refers to the i-th column of $W_c$ , the rotational part of $T_{cw}$ .
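The two Jacobians in Eq. (6) can be sanity-checked numerically. The sketch below is only an illustration, not the paper's implementation. It assumes the tangent vector is ordered as (translation $\rho$ , rotation $\theta$ ), which is what the block layout $\begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}$ implies. It also assumes each column of $W_c$ enters through its skew-symmetric matrix, so that every block is 3x3. The helper names (`hat`, `so3_exp`, `jac_mu`, `jac_W`, `perturb`) are made up for this example.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix [v]^* so that hat(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3) + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def jac_mu(mu_c):
    """Analytic Jacobian of mu_c w.r.t. the pose: [I, -[mu_c]^*], shape 3x6."""
    return np.hstack([np.eye(3), -hat(mu_c)])

def jac_W(W):
    """Analytic Jacobian of W (columns stacked) w.r.t. the pose, shape 9x6."""
    blocks = [np.hstack([np.zeros((3, 3)), -hat(W[:, i])]) for i in range(3)]
    return np.vstack(blocks)

def perturb(tau, W, mu_c):
    """Apply a left perturbation tau = (rho, theta).

    Using rho directly as the translation is exact when theta == 0 and
    correct to first order otherwise, which is all a Jacobian check needs.
    """
    rho, theta = tau[:3], tau[3:]
    R = so3_exp(theta)
    return R @ W, R @ mu_c + rho

# Illustrative values (assumptions, not taken from the paper)
W = so3_exp(np.array([0.3, -0.2, 0.5]))
mu_c = np.array([1.0, 2.0, 3.0])

eps = 1e-6
num_mu = np.zeros((3, 6))
num_W = np.zeros((9, 6))
for k in range(6):
    tau = np.zeros(6)
    tau[k] = eps
    W_p, mu_p = perturb(tau, W, mu_c)
    num_mu[:, k] = (mu_p - mu_c) / eps
    num_W[:, k] = (W_p - W).T.reshape(-1) / eps  # stack the columns of W

print(np.abs(num_mu - jac_mu(mu_c)).max())
print(np.abs(num_W - jac_W(W)).max())
```

Both printed errors should be on the order of the finite-difference step, which suggests the analytic blocks agree with a left perturbation of the pose.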

6. Skew-Symmetric Matrix

The skew-symmetric matrix of a vector $\mathbf{v} = (v_x, v_y, v_z)$ is defined as:
$[\mathbf{v}]^* = \begin{bmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{bmatrix}$
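This matrix appears in the Jacobian because multiplying by it is the same as taking a cross product, $[\mathbf{v}]^* \mathbf{u} = \mathbf{v} \times \mathbf{u}$ , and a small rotation $\theta$ moves a point $\mu_c$ by approximately $\theta \times \mu_c = -[\mu_c]^* \theta$ .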
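As a quick check of the definition, the following sketch builds the skew-symmetric matrix and compares it with NumPy's cross product. The function name `skew` and the test vectors are chosen only for illustration.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]^* of a 3-vector v."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

v = np.array([1.0, 2.0, 3.0])
u = np.array([-0.5, 4.0, 2.0])

S = skew(v)
print(S @ u)           # matrix form of the cross product
print(np.cross(v, u))  # same vector
```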

Summary of the Process:

Lie Algebra Representation: The camera pose is represented in the Lie group $SE (3)$ , and small changes in the pose are expressed in the Lie algebra $\mathfrak{se}(3)$ .
Logarithmic and Exponential Maps: The logarithmic and exponential maps are used to map between the group and its Lie algebra, which allows for the computation of minimal Jacobians.
Chain Rule: The chain rule is applied to compute the derivatives of the 3D position and covariance with respect to the camera pose.
Minimal Jacobians: The derived Jacobians match the minimal dimensionality (6 degrees of freedom for $SE (3)$ ), ensuring efficient optimization.

By using Lie algebra, the paper ensures that the Jacobians are both minimal and computationally efficient, which is crucial for real-time SLAM systems.

Section 2

q1. How does Lie Algebra reduce the dimensionality of the problem?

When working with rigid body transformations, we are dealing with the SE(3) group, which includes both rotations (in 3D) and translations. The Lie algebra $\mathfrak{se}(3)$ provides a way to represent small perturbations in these transformations in a more compact form.

SE(3) is a 6-dimensional manifold because it describes 3 degrees of freedom for translation and 3 degrees of freedom for rotation. Directly optimizing over SE(3) would require working with 4x4 transformation matrices, which involves 12 parameters (9 for the rotation matrix and 3 for translation). However, only 6 of these parameters are independent due to the orthogonality constraint on the rotation matrix.
Lie algebra $\mathfrak{se}(3)$ provides a way to reduce this dimensionality by working with a minimal representation of $SE (3)$ . Instead of directly working with the transformation matrix, we work with a 6-dimensional vector $\xi = (\omega^T, v^T)^T$ , where:
- $\omega \in \mathbb{R}^3$ represents the rotation (in the Lie algebra of $SO (3)$ ).
- $v \in \mathbb{R}^3$ represents the translation.

By using this minimal representation, we avoid redundant parameters and constraints (like the orthogonality of the rotation matrix), leading to a more efficient optimization process. The dimensionality of the problem is reduced from 12 parameters to 6 parameters, matching the degrees of freedom of $SE (3)$ .
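To see why the redundant parameters are a problem, the sketch below compares two ways of taking the same step on a rotation: adding directly to the 9 matrix entries, and stepping in the tangent space and mapping back to the group. It uses a rotation about the z axis because its exponential has a simple closed form. The names `rot_z` and `orthogonality_error` and the step size are assumptions for illustration.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z axis (closed-form exponential of angle * [e_z]^*)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation matrix."""
    return np.linalg.norm(R.T @ R - np.eye(3))

R = rot_z(0.4)
step = 0.1
# Skew-symmetric generator of rotations about z
G = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

# Over-parameterized update: treat the matrix entries as free variables
R_additive = R + step * (G @ R)

# Minimal update: step in the tangent space, then map back to the group
R_exp = rot_z(step) @ R

print(orthogonality_error(R_additive))  # clearly non-zero
print(orthogonality_error(R_exp))       # zero up to round-off
```

The additive update leaves the manifold and would need re-orthogonalization, while the exponential update stays a valid rotation by construction.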

q2. Why can we use the exponential and logarithmic maps to map the Lie algebra to SE(3)?

The exponential map $\exp(\cdot)$ and logarithmic map $\log(\cdot)$ are used to move between the Lie algebra $\mathfrak{se}(3)$ and the Lie group $SE (3)$ .

Exponential Map $\exp(\cdot)$ : The exponential map takes an element from the Lie algebra $\mathfrak{se}(3)$ (which represents a small perturbation or velocity) and maps it to an element of the Lie group $SE (3)$ (the group of rigid transformations). This is analogous to how $e^{i\theta}$ maps a real angle $\theta$ to a point on the unit circle.
- In the case of $SE (3)$ , the exponential map turns a 6-dimensional vector $\xi = (\omega^T, v^T)^T$ (where $\omega$ represents rotation and $v$ represents translation) into a 4x4 transformation matrix in SE(3).
Logarithmic Map $\log(\cdot)$ : The logarithmic map is the inverse of the exponential map (uniquely defined as long as the rotation angle is below $\pi$ ). It takes an element from the Lie group $SE (3)$ and maps it back to the Lie algebra $\mathfrak{se}(3)$ . This operation is used to express a transformation, such as the difference between two poses, as a 6-dimensional vector in the tangent space.

Why are these maps valid?

These maps are valid because of the structure of Lie groups and Lie algebras:

A Lie group is a smooth manifold that has a corresponding Lie algebra, which is the tangent space at the identity element of the group. The exponential map allows us to move from the Lie algebra (tangent space) to the Lie group (manifold), and the logarithmic map allows us to reverse this process.
For $SE (3)$ , the exponential map allows us to generate a smooth transformation from a small perturbation in the Lie algebra, and the logarithmic map lets us linearize the problem by mapping group elements back to the tangent space.
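A minimal sketch of the $SE (3)$ exponential map, using the standard closed form and the ordering $\xi = (\omega, v)$ from the text. The function names are illustrative and the small-angle branch is a first-order approximation. The result is compared against a truncated power series of the 4x4 matrix exponential, which is the definition of $\exp(\cdot)$ for matrix Lie groups.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def se3_exp(xi):
    """Map xi = (omega, v) in se(3) to a 4x4 matrix in SE(3)."""
    omega, v = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = hat(omega)
    if theta < 1e-8:
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * K + B * (K @ K)   # Rodrigues' formula
        V = np.eye(3) + B * K + C * (K @ K)   # couples rotation and translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def expm_series(M, terms=30):
    """Reference: truncated power series sum_k M^k / k!."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

xi = np.array([0.2, -0.4, 0.3, 1.0, 2.0, -1.0])  # illustrative values
twist = np.zeros((4, 4))
twist[:3, :3] = hat(xi[:3])
twist[:3, 3] = xi[3:]

T = se3_exp(xi)
print(np.abs(T - expm_series(twist)).max())
```

Note that the translation of $T$ is $V v$ rather than $v$ itself, so the two only coincide when the rotation part of $\xi$ is zero.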

In the specific context of SLAM, using these maps allows us to efficiently handle the non-linear nature of rigid body transformations during optimization.

q3. How many features are needed to define a scene made up of Gaussians?

Scene Representation with Gaussians:

When representing a scene using Gaussian distributions, each Gaussian is defined by:

Mean $\mu \in \mathbb{R}^3$ : This represents the 3D position of the center of the Gaussian.
Covariance $\Sigma \in \mathbb{R}^{3 \times 3}$ : This represents the uncertainty or spread of the Gaussian in 3D space. The covariance matrix encodes how the Gaussian is “stretched” in different directions.

Thus, each Gaussian is fully defined by:

3 parameters for the mean $\mu$ (position in 3D).
6 parameters for the symmetric covariance matrix $\Sigma$ (since $\Sigma$ is a 3x3 matrix that is symmetric, it has 6 independent components).

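So, in total, 9 parameters are needed to define the geometry of a single Gaussian in 3D space. For rendering, 3D Gaussian Splatting additionally stores an opacity and a color for each Gaussian.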
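The parameter count can be made concrete by packing a mean and a symmetric covariance into a flat vector. This is only a counting illustration with made-up helper names and example values. It is not how a Gaussian splatting system necessarily stores its Gaussians.

```python
import numpy as np

UPPER = np.triu_indices(3)  # the 6 (row, col) index pairs of the upper triangle

def pack_gaussian(mu, sigma):
    """Flatten (mu, sigma) into 3 + 6 = 9 numbers."""
    return np.concatenate([mu, sigma[UPPER]])

def unpack_gaussian(params):
    """Rebuild the mean and the full symmetric covariance from 9 numbers."""
    mu = params[:3]
    sigma = np.zeros((3, 3))
    sigma[UPPER] = params[3:]
    sigma = sigma + sigma.T - np.diag(np.diag(sigma))  # mirror the upper triangle
    return mu, sigma

mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[1.0, 0.1, 0.2],
                  [0.1, 2.0, 0.3],
                  [0.2, 0.3, 3.0]])

params = pack_gaussian(mu, sigma)
mu_back, sigma_back = unpack_gaussian(params)
print(params.shape)
```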

Impact of Covariance on the Scene:

The covariance matrix $\Sigma$ affects the shape and orientation of the Gaussian:

A diagonal covariance matrix indicates that the Gaussian is axis-aligned, with variances along each axis (e.g., the spread in the x, y, and z directions).
A non-diagonal covariance matrix indicates that the Gaussian is rotated and possibly stretched in the space. The off-diagonal elements encode the correlations between different axes.
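One way to read the shape and orientation out of a covariance matrix is its eigen-decomposition: the eigenvectors give the principal axes and the square roots of the eigenvalues give the spread along them. The sketch below uses `np.linalg.eigh`, which is meant for symmetric matrices. The function name and the example matrices are assumptions for illustration.

```python
import numpy as np

def principal_axes(sigma):
    """Return (std_devs, axes); the columns of axes are unit principal directions."""
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    return np.sqrt(eigvals), eigvecs

# Diagonal covariance: the principal axes are the coordinate axes
axis_aligned = np.diag([1.0, 4.0, 9.0])
std_a, axes_a = principal_axes(axis_aligned)

# Non-diagonal covariance: the principal axes are rotated
rotated = np.array([[1.0, 0.1, 0.2],
                    [0.1, 2.0, 0.3],
                    [0.2, 0.3, 3.0]])
std_r, axes_r = principal_axes(rotated)

print(std_a)
print(np.round(axes_r, 3))
```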

In classical landmark-based SLAM, a covariance matrix usually models the uncertainty of a landmark estimate. In Gaussian-splatting SLAM such as MonoGS, $\Sigma$ instead describes the spatial extent of each Gaussian, and it determines the 2D footprint that the Gaussian leaves in the image once it is projected.

Example: Computing Eq. (4)

Let’s walk through Eq. (4) from the paper, which computes the derivative of the covariance matrix with respect to the camera pose:
$\frac{\partial \Sigma_c}{\partial T_{cw}} = \frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}} + \frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}}$

Components of Eq. (4):

$\mu_c$ is the 3D position of the Gaussian in the camera frame.
$W_c$ is the rotational part of $T_{cw}$ , which rotates the covariance from the world frame into the camera frame.

Let’s break it down:

The first term $\frac{\partial \Sigma_c}{\partial \mu_c} \frac{\partial \mu_c}{\partial T_{cw}}$ computes how the covariance matrix changes with respect to the position $\mu_c$ , and how the position $\mu_c$ changes with respect to the camera pose $T_{cw}$ .
- $\frac{\partial \mu_c}{\partial T_{cw}}$ is given in Eq. (6) as $\begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}$ , where $[\mu_c]^*$ is the skew-symmetric matrix of $\mu_c$ .
- $\frac{\partial \Sigma_c}{\partial \mu_c}$ represents how the covariance matrix depends on the position of the Gaussian. In Gaussian splatting this dependence enters through the Jacobian of the projection, which is evaluated at $\mu_c$ .
The second term $\frac{\partial \Sigma_c}{\partial W_c} \frac{\partial W_c}{\partial T_{cw}}$ computes how the covariance matrix changes with respect to the matrix $W_c$ , and how $W_c$ changes with respect to the camera pose $T_{cw}$ .
- $\frac{\partial W_c}{\partial T_{cw}}$ is also given in Eq. (6) as a 9x6 matrix, with one 3x6 block per column of $W_c$ .

Example Computation:

Suppose we have a Gaussian with:

Mean $\mu_c = (1, 2, 3)$ .
Covariance matrix $\Sigma_c = \begin{bmatrix} 1 & 0.1 & 0.2 \\ 0.1 & 2 & 0.3 \\ 0.2 & 0.3 & 3 \end{bmatrix}$ .

To compute the Jacobian $\frac{\partial \Sigma_c}{\partial T_{cw}}$ :

First, compute the skew-symmetric matrix of $\mu_c$ :
$[\mu_c]^* = \begin{bmatrix} 0 & -3 & 2 \\ 3 & 0 & -1 \\ -2 & 1 & 0 \end{bmatrix}$
Use Eq. (6) to compute $\frac{\partial \mu_c}{\partial T_{cw}} = \begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}$ .
Finally, compute the full derivative by applying the chain rule as described in Eq. (4), using the derivatives of $W_c$ and $\Sigma_c$ to finish the computation.
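Written out for $\mu_c = (1, 2, 3)$ , the position Jacobian from Eq. (6) is the following 3x6 matrix. The perturbation $\tau = (\rho, \theta)$ used afterwards is an illustrative choice, and its ordering (translation first, rotation second) is an assumption read off from the block layout $\begin{bmatrix} I & -[\mu_c]^* \end{bmatrix}$ .

$\frac{\partial \mu_c}{\partial T_{cw}} = \begin{bmatrix} 1 & 0 & 0 & 0 & 3 & -2 \\ 0 & 1 & 0 & -3 & 0 & 1 \\ 0 & 0 & 1 & 2 & -1 & 0 \end{bmatrix}$

For a small perturbation with $\rho = (0.01, 0, 0)$ and $\theta = (0, 0, 0.01)$ , the first-order change of the position is:

$\delta \mu_c \approx \frac{\partial \mu_c}{\partial T_{cw}} \tau = \rho - [\mu_c]^* \theta = (0.01, 0, 0) + (-0.02, 0.01, 0) = (-0.01, 0.01, 0)$

The rotational part agrees with the cross product $\theta \times \mu_c = (-0.02, 0.01, 0)$ .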

Section 3

1. Lie Algebra $\mathfrak{so}(3)$ and Rotation Representation

The group of 3D rotations is represented by the special orthogonal group $SO (3)$ . A rotation matrix $\mathbf{R} \in SO(3)$ is a 3x3 orthogonal matrix with determinant 1. However, directly working with these matrices is inefficient because they have 9 elements but only 3 degrees of freedom (DOF) due to the orthogonality constraint.

The Lie algebra associated with $SO (3)$ is denoted as $\mathfrak{so}(3)$ . This is the tangent space of $SO (3)$ at the identity element and can be used to represent small perturbations in rotation. The key idea is that instead of using a full 3x3 matrix to represent the rotation, we use a 3-dimensional vector that encodes the rotation in a more compact form.

Skew-Symmetric Matrix and the Lie Algebra $\mathfrak{so}(3)$ :

In $\mathfrak{so}(3)$ , any element can be represented as a skew-symmetric matrix. Given a 3D vector $\mathbf{\omega} = (\omega_x, \omega_y, \omega_z) \in \mathbb{R}^3$ , the corresponding skew-symmetric matrix $[\mathbf{\omega}]^*$ is:
$[\mathbf{\omega}]^* = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}$

This 3x3 skew-symmetric matrix belongs to the Lie algebra $\mathfrak{so}(3)$ .

2. Exponential Map from $\mathfrak{so}(3)$ to $SO (3)$

To convert from the Lie algebra $\mathfrak{so}(3)$ to the Lie group $SO (3)$ (i.e., to get a rotation matrix from the 3D vector representing the rotation), we use the exponential map. This map is defined as:
$\mathbf{R} = \exp([\mathbf{\omega}]^*) \in SO(3)$
The exponential map computes the rotation matrix $\mathbf{R}$ from the skew-symmetric matrix $[\mathbf{\omega}]^*$ . In practice, the exponential map for $SO (3)$ can be computed using Rodrigues’ rotation formula, which converts the 3D rotation vector $\mathbf{\omega}$ into a rotation matrix.

Rodrigues’ Rotation Formula:

Rodrigues’ formula gives the exponential map for $SO (3)$ as:
$\exp([\mathbf{\omega}]^*) = \mathbf{I} + \frac{\sin(\theta)}{\theta} [\mathbf{\omega}]^* + \frac{1 - \cos(\theta)}{\theta^2} [\mathbf{\omega}]^{*2}$
Where:

$\theta = \|\mathbf{\omega}\|$ is the magnitude of the rotation (in radians) around the axis defined by $\mathbf{\omega}$ .
$[\mathbf{\omega}]^*$ is the skew-symmetric matrix corresponding to the vector $\mathbf{\omega}$ .
$\mathbf{I}$ is the identity matrix.

This formula converts the 3D vector $\mathbf{\omega}$ into the corresponding 3x3 rotation matrix $\mathbf{R}$ .
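Rodrigues' formula translates almost line by line into code. The sketch below is a minimal version with illustrative function names. The small-angle branch is an added safeguard against dividing by $\theta \approx 0$ and is not part of the formula as stated above.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix [w]^* of a 3-vector."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(w):
    """Rotation vector w (axis * angle, in radians) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion avoids 0/0 for tiny rotations
        return np.eye(3) + K + 0.5 * (K @ K)
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

# A quarter turn about the z axis should send the x axis to the y axis
R = so3_exp(np.array([0.0, 0.0, np.pi / 2]))
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))
```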

3. Logarithmic Map from $SO (3)$ to $\mathfrak{so}(3)$

The logarithmic map is the inverse of the exponential map and is used to convert a rotation matrix $\mathbf{R} \in SO(3)$ back into the Lie algebra $\mathfrak{so}(3)$ , which is represented by the 3D vector $\mathbf{\omega}$ .

Given a rotation matrix $\mathbf{R}$ , the logarithmic map is:
$[\mathbf{\omega}]^* = \log(\mathbf{R})$
In this case, $\mathbf{\omega}$ can be extracted from the skew-symmetric matrix $[\mathbf{\omega}]^*$ . The angle of rotation $\theta$ can be computed as:
$\theta = \cos^{-1} \left( \frac{\text{Tr}(\mathbf{R}) - 1}{2} \right)$
Where $\text{Tr}(\mathbf{R})$ is the trace of the rotation matrix $\mathbf{R}$ . After solving for $\theta$ , and provided $0 < \theta < \pi$ so that $\sin(\theta) \neq 0$ , the vector $\mathbf{\omega}$ is:
$\mathbf{\omega} = \frac{\theta}{2 \sin(\theta)} \begin{bmatrix} \mathbf{R}_{32} - \mathbf{R}_{23} \\ \mathbf{R}_{13} - \mathbf{R}_{31} \\ \mathbf{R}_{21} - \mathbf{R}_{12} \end{bmatrix}$
Where $\mathbf{R}_{ij}$ refers to the elements of the 3x3 rotation matrix.
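The logarithmic map can be sketched the same way. The version below follows the two formulas above and adds a clipping step plus a small-angle branch as numerical safeguards. Both are assumptions of this example. Rotations close to $\theta = \pi$ need separate handling that is not shown here. The helpers `rot_z` and `rot_x` are closed-form test rotations, introduced only to check the result.

```python
import numpy as np

def so3_log(R):
    """Rotation matrix -> rotation vector omega (valid for 0 <= theta < pi)."""
    # Clip to guard against round-off pushing the argument outside [-1, 1]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    off_diag = np.array([R[2, 1] - R[1, 2],
                         R[0, 2] - R[2, 0],
                         R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * off_diag  # limit of theta / (2 sin(theta)) as theta -> 0
    return theta / (2.0 * np.sin(theta)) * off_diag

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

print(so3_log(rot_z(0.7)))
print(so3_log(rot_x(-1.2)))
```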