[Paper Reading] [Quick Read] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Latest recommended article published on 2024-06-21 11:11:26

小白有颗大白梦

Categories: Paper Reading NeRF, NeRF Learning. Tags: NeRF, Gaussian, Computer Vision

Article link: https://blog.csdn.net/weixin_62012485/article/details/139079981

Table of Contents

1. What
2. Preliminary
3. What
- 3.1 Spatial-Temporal Structure Encoder
- 3.2 Multi-head Gaussian Deformation Decoder

1. What

The paper proposes 4D Gaussian Splatting (4D-GS), a holistic representation for dynamic scenes, instead of applying 3D-GS to each individual frame.

It uses an encoder-decoder structure to predict the motion of each Gaussian over time. The core idea is to encode the 4D information $(x,y,z,t)$ into 2D planes (HexPlane), and then use an MLP and a decoder to extract how each Gaussian changes. This allows high-dimensional data to be processed and stored efficiently while preserving the necessary spatiotemporal information.

2. Preliminary

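There are two families of dynamic methods based on NeRF and one based on Gaussians. Fig. 2 of the paper compares them.

[Figure: Fig. 2, comparison of (a) canonical mapping volume rendering, (b) time-aware volume rendering, and the Gaussian-based method]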

All dynamic NeRF algorithms can be formulated as:

$c,\sigma=\mathcal{M}(\mathbf{x},t)$

Canonical mapping volume rendering, Fig. 2 (a), uses a deformation $\phi_{t}:(\mathbf{x},t)\to\Delta\mathbf{x}$ to transform each sampled point into a canonical space, and then calculates the color and density along each ray there:

$c,\sigma=\mathrm{NeRF}(\mathbf{x}+\Delta\mathbf{x}).$

Time-aware volume rendering, Fig. 2 (b), does not change the rendering path. Instead, it directly calculates the features of each point at a given time:

$c,\sigma=\mathrm{NeRF}(\mathbf{x},t).$
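The difference between the two formulations can be made concrete with a toy sketch. Everything below is hypothetical and for illustration only: `toy_static_field`, `toy_deformation`, and `toy_dynamic_field` are hand-written stand-ins for trained networks, chosen so that both routes describe the same moving blob.

```python
import math

def toy_static_field(x):
    """Hypothetical stand-in for a static NeRF: 3D point -> (color, density)."""
    density = math.exp(-sum(v * v for v in x))          # soft blob at the origin
    color = tuple(0.5 + 0.5 * math.tanh(v) for v in x)  # arbitrary smooth color
    return color, density

def toy_deformation(x, t):
    """Hypothetical phi_t: (x, t) -> delta_x, a uniform drift along the x-axis."""
    return (-0.5 * t, 0.0, 0.0)

def canonical_mapping_query(x, t):
    """Fig. 2 (a): warp the sample into canonical space, then query a static field."""
    dx = toy_deformation(x, t)
    x_canonical = tuple(a + b for a, b in zip(x, dx))
    return toy_static_field(x_canonical)

def toy_dynamic_field(x, t):
    """Fig. 2 (b): the field takes (x, t) directly; the sample is not warped."""
    centre = (0.5 * t, 0.0, 0.0)                        # blob centre moves with t
    offset = tuple(a - c for a, c in zip(x, centre))
    density = math.exp(-sum(v * v for v in offset))
    color = tuple(0.5 + 0.5 * math.tanh(v) for v in offset)
    return color, density

sample, time = (1.0, 0.0, 0.0), 2.0
print(canonical_mapping_query(sample, time))
print(toy_dynamic_field(sample, time))
```

In this toy setup both queries agree, because the drift in `toy_deformation` was chosen to undo the motion that `toy_dynamic_field` models explicitly. With learned networks the two routes are different parameterizations and need not agree.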

3. What

The network that learns the Gaussian deformation field consists of an efficient spatial-temporal structure encoder $\mathcal{H}$ and a Gaussian deformation decoder $\mathcal{D}$, which predicts the deformation of each 3D Gaussian.

3.1 Spatial-Temporal Structure Encoder

The input is a 4D point $(x, y, z, t)$. It is represented by six 2D planes, one for each pair of axes: $\{(x,y),(x,z),(y,z),(x,t),(y,t),(z,t)\}$. Each plane is a feature grid with a fixed resolution that covers the canonical space. A space-time plane such as $(x, t)$ holds information about how features along the x-coordinate change at different time points. Similarly, a spatial plane such as $(x, y)$ captures features at different spatial locations (x and y coordinates).

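The 2D planes are also stored at several upsampling levels $l$, similar to mipmapping. To compute the feature of a point, each plane at each level is sampled by interpolation. The samples are multiplied across the six planes, and the per-level results are combined across levels:

$f_{h}=\bigcup_{l}\prod\mathrm{interp}(R_{l}(i,j)),\quad (i,j)\in\{(x,y),(x,z),(y,z),(x,t),(y,t),(z,t)\}.$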

Then, the features gathered from the six planes form a single vector $f_{h}$, and an MLP turns it into the feature $f_{d}$ that is passed to the decoder.
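The plane lookup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`make_plane`, `bilinear`, `query_feature`) are hypothetical, grids are square with the same resolution on every axis, coordinates are normalized to $[0, 1]$, `interp` is taken to be bilinear, and the union over levels is read as concatenation.

```python
from itertools import combinations

# The six axis pairs of (x, y, z, t), written as index pairs into a 4D point.
PLANES = list(combinations(range(4), 2))

def make_plane(res, feat_dim, fill):
    """Build a res x res grid of feature vectors; a real model would learn these."""
    return [[[fill(i, j, k) for k in range(feat_dim)]
             for j in range(res)] for i in range(res)]

def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at normalized coords u, v in [0, 1]."""
    res = len(plane)                       # assumes res >= 2
    gu, gv = u * (res - 1), v * (res - 1)  # continuous grid coordinates
    i0, j0 = min(int(gu), res - 2), min(int(gv), res - 2)
    a, b = gu - i0, gv - j0                # fractional offsets inside the cell
    return [
        (1 - a) * (1 - b) * plane[i0][j0][k]
        + a * (1 - b) * plane[i0 + 1][j0][k]
        + (1 - a) * b * plane[i0][j0 + 1][k]
        + a * b * plane[i0 + 1][j0 + 1][k]
        for k in range(len(plane[0][0]))
    ]

def query_feature(levels, point):
    """levels: one {axis_pair: grid} dict per level; point: (x, y, z, t)."""
    f_h = []
    for planes in levels:                    # combine levels by concatenation
        prod = None
        for (i, j), grid in planes.items():  # multiply across the six planes
            feat = bilinear(grid, point[i], point[j])
            prod = feat if prod is None else [p * f for p, f in zip(prod, feat)]
        f_h.extend(prod)
    return f_h

# Two levels (resolution 2 and 4), three feature channels, constant grids.
feat_dim, base_res = 3, 2
levels = [
    {pair: make_plane(base_res * l, feat_dim, lambda i, j, k: 2.0) for pair in PLANES}
    for l in (1, 2)
]
f_h = query_feature(levels, (0.25, 0.5, 0.75, 1.0))
print(len(f_h), f_h)
```

The length of `f_h` is the number of channels times the number of levels, since each level contributes one product vector. A practical implementation would batch this over all Gaussians with tensor operations rather than nested lists.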

3.2 Multi-head Gaussian Deformation Decoder

Once the features of all 3D Gaussians are encoded, a multi-head Gaussian deformation decoder $\mathcal{D}=\{\phi_{x},\phi_{r},\phi_{s}\}$ computes the deformation of position, rotation, and scaling, with a separate head for each:

$\Delta\mathcal{X}=\phi_{x}(f_{d}),\quad\Delta r=\phi_{r}(f_{d}),\quad\Delta s=\phi_{s}(f_{d}).$

Finally, we obtain the deformed 3D Gaussians:

$(\mathcal X',r',s',\sigma, \mathcal C)=(\mathcal X+\Delta\mathcal X,r+\Delta r,s+\Delta s,\sigma, \mathcal C).$
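As a rough sketch of this last step, the snippet below applies three heads to a feature vector $f_{d}$ and adds the predicted offsets to one Gaussian. It is illustrative only: each head is a single randomly initialized linear layer standing in for a small MLP, the dictionary layout and the names `make_linear`, `apply_linear`, and `deform_gaussian` are hypothetical, rotation is assumed to be a 4-value quaternion as in 3D-GS, and $\sigma$ and $\mathcal{C}$ are stored under the assumed names `opacity` and `color`.

```python
import random

def make_linear(in_dim, out_dim, rng, scale=0.01):
    """Hypothetical single linear layer standing in for one small MLP head."""
    w = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b

def apply_linear(layer, f):
    """Compute W f + b for one feature vector f."""
    w, b = layer
    return [sum(wi * fi for wi, fi in zip(row, f)) + bi for row, bi in zip(w, b)]

def deform_gaussian(gaussian, f_d, heads):
    """Return (X + dX, r + dr, s + ds); opacity and color are passed through."""
    out = dict(gaussian)                      # shallow copy, input stays untouched
    for key in ("position", "rotation", "scale"):
        delta = apply_linear(heads[key], f_d)  # one head per attribute
        out[key] = [v + d for v, d in zip(gaussian[key], delta)]
    return out

rng = random.Random(0)
feat_dim = 8                                   # assumed size of f_d
heads = {
    "position": make_linear(feat_dim, 3, rng),  # phi_x
    "rotation": make_linear(feat_dim, 4, rng),  # phi_r (quaternion offset)
    "scale": make_linear(feat_dim, 3, rng),     # phi_s
}
gaussian = {
    "position": [0.0, 1.0, 2.0],
    "rotation": [1.0, 0.0, 0.0, 0.0],
    "scale": [0.1, 0.1, 0.1],
    "opacity": 0.9,
    "color": [0.2, 0.4, 0.6],
}
f_d = [1.0] * feat_dim                          # placeholder encoder output
deformed = deform_gaussian(gaussian, f_d, heads)
print(deformed)
```

Keeping the heads separate means each attribute gets its own output size while all of them share the same encoded feature.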
