[Paper Reading] Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering

1. What

What does this paper do? (Summarized in one sentence from the abstract and conclusion.)

The paper represents the various objects and elements of dynamic urban scenes with periodic vibration-based temporal dynamics, and introduces a novel temporal smoothing mechanism and a position-aware adaptive control strategy to keep the representation temporally coherent, completing the reconstruction of dynamic scenes.

2. Why

Under what conditions or needs was this work proposed (Introduction)? What core problems or deficiencies does it aim to solve, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This covers the background, the problem, prior work, and the innovations:

Introduction:

  1. NSG: Decomposes dynamic scenes into scene graphs
  2. PNF: Decomposes scenes into objects and backgrounds, incorporating a panoptic segmentation auxiliary task.
  3. SUDS: Uses optical flow (optical flow helps identify which parts of the scene are static and which are dynamic)
  4. EmerNeRF: Uses a self-supervised method to reduce dependence on optical flow.

Related work:

  1. Dynamic scene models
    • In one research direction, certain studies [2, 7, 11, 18, 38] introduce time as an additional input to the radiance field, treating the scene as a 6D plenoptic function. However, this approach couples positional variations induced by temporal dynamics with the radiance field, lacking geometric priors about how time influences the scene.
    • An alternative approach [1, 20, 24–26, 33, 40] focuses on modeling the movement or deformation of specific static structures, assuming that the dynamics arise from these static elements within the scene.
    • Gaussian-based
  2. Urban scene reconstruction
  • One research avenue (NeRF-based) has focused on enhancing the modeling of static street scenes by utilizing scalable representations [19, 28, 32, 34], achieving high-fidelity surface reconstruction [14, 28, 39], and incorporating multi-object composition [43]. However, these methods face difficulties in handling dynamic elements commonly encountered in autonomous driving contexts.
  • Another research direction addresses these challenges directly, but notably these techniques require additional input: PNF leverages panoptic segmentation to refine the dynamic reconstruction, while Street Gaussians and Driving Gaussian decompose the scene into different sets of Gaussian points using bounding boxes. However, they all need manually annotated or predicted bounding boxes and have difficulty reconstructing non-rigid objects.

3. How

The input data consist of camera images $\{\mathcal{I}_{i},t_{i},\mathbf{E}_{i},\mathbf{I}_{i}\mid i=1,2,\ldots,N_{c}\}$, where $t_i$ is the capture timestamp and $\mathbf{E}_i$, $\mathbf{I}_i$ are the camera extrinsics and intrinsics, together with LiDAR point clouds $\{(x_i,y_i,z_i,t_i)\mid i=1,2,\ldots,N_l\}$. The rendering process is written as $\hat{\mathcal{I}}=\mathcal{F}_{\theta}(\mathbf{E}_{o},\mathbf{I}_{o},t)$, i.e., the model produces an image for any timestamp $t$ and camera pose $(\mathbf{E}_{o},\mathbf{I}_{o})$.
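As a minimal illustration of this interface (class and function names are my own, not from the paper's code release), the data and the rendering function could be organized like this:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraSample:
    image: np.ndarray        # I_i, an H x W x 3 RGB image
    time: float              # t_i, capture timestamp
    extrinsics: np.ndarray   # E_i, 4 x 4 camera pose
    intrinsics: np.ndarray   # I_i, 3 x 3 calibration matrix

@dataclass
class LidarPoint:
    xyz: np.ndarray          # (x_i, y_i, z_i)
    time: float              # t_i

def render(theta, extrinsics_o, intrinsics_o, t):
    """F_theta(E_o, I_o, t): render an image for any pose and timestamp.
    Placeholder standing in for the PVG rasterizer described in Sec. 3.1."""
    raise NotImplementedError
```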

3.1 PVG

The motivation behind this concept is to assign a distinct lifespan to each Gaussian
point, defining when it actively contributes and to what degree.

  1. Overall pipeline:
    (figure from the paper: the overall PVG pipeline)

  2. Mathematically,
    the Gaussian model can be denoted as
    $$\mathcal{H}(t)=\{\widetilde{\boldsymbol{\mu}}(t),\boldsymbol{q},\boldsymbol{s},\widetilde{o}(t),\boldsymbol{c}\},$$
    where
    $$\begin{aligned}\widetilde{\boldsymbol{\mu}}(t)&=\boldsymbol{\mu}+\frac{l}{2\pi}\cdot\sin\!\left(\frac{2\pi(t-\tau)}{l}\right)\cdot\boldsymbol{v},\\\widetilde{o}(t)&=o\cdot e^{-\frac{1}{2}(t-\tau)^{2}\beta^{-2}}.\end{aligned}$$
    These equations describe how the mean and the opacity change over time (a code sketch after this list illustrates them together with the staticness coefficient of item 3). The parameters that need to be learned are
    $$\{\boldsymbol{\mu},\boldsymbol{q},\boldsymbol{s},o,\boldsymbol{c},\tau,\beta,\boldsymbol{v}\},$$
    where $\tau$ is the life peak (the time around which the point is most prominent) and $\boldsymbol{v}$ is the vibration direction, equal to the instantaneous velocity at $t=\tau$. Note that the cycle length $l$ is a scene prior that does not need to be learned.

  3. Definition of the staticness coefficient
    We define $\rho=\frac{\beta}{l}$ to quantify the degree of staticness exhibited by a PVG point. Here $\beta$ controls how quickly the opacity decays away from $\tau$ (a larger $\beta$ means a longer lifespan), and the cycle length $l$ is a hyper-parameter serving as a prior. The dynamic parts of a scene are mainly carried by points with small $\rho$; at a specific timestamp $t$, dynamic objects are predominantly represented by points whose $\tau$ is close to $t$.
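The following is a minimal sketch (my own illustration, not the official implementation) of the two time-varying equations from item 2 and the staticness coefficient from item 3, for a single PVG point; the numeric values are made up for demonstration.

```python
import numpy as np

def pvg_mean(mu, v, tau, l, t):
    """Time-varying mean: mu~(t) = mu + (l / 2*pi) * sin(2*pi*(t - tau) / l) * v."""
    return mu + (l / (2.0 * np.pi)) * np.sin(2.0 * np.pi * (t - tau) / l) * v

def pvg_opacity(o, tau, beta, t):
    """Time-varying opacity: o~(t) = o * exp(-0.5 * (t - tau)^2 / beta^2)."""
    return o * np.exp(-0.5 * (t - tau) ** 2 / beta ** 2)

def staticness(beta, l):
    """rho = beta / l; larger values indicate longer-lived, more static points."""
    return beta / l

# Illustrative values (not from the paper): a point peaking at tau = 0.5 s.
mu = np.array([1.0, 2.0, 0.5])   # mean position
v = np.array([2.0, 0.0, 0.0])    # vibration direction = velocity at t = tau
tau, beta, l, o = 0.5, 0.1, 0.3, 0.9

print(pvg_mean(mu, v, tau, l, t=0.52))   # position slightly after the life peak
print(pvg_opacity(o, tau, beta, t=0.52)) # opacity slightly decayed
print(staticness(beta, l))               # small rho -> likely a dynamic point
```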

3.2 Position-aware point adaptive control

The adaptive density control of the original Gaussian Splatting does not fit large urban scenes, so the scale $\boldsymbol{s}$ of each Gaussian, together with its position, is used to control how points are densified and pruned.

That is, a position-dependent scale factor is first defined:

$$\gamma(\boldsymbol{\mu})=\begin{cases}1&\text{if }\|\boldsymbol{\mu}\|_{2}<2r\\\|\boldsymbol{\mu}\|_{2}/r-1&\text{if }\|\boldsymbol{\mu}\|_{2}\geq 2r\end{cases}$$

Then $\max(\boldsymbol{s})$ is compared against the threshold $g$ scaled by $\gamma(\boldsymbol{\mu})$: if $\max(\boldsymbol{s})\leq g\cdot\gamma(\boldsymbol{\mu})$, the PVG point is cloned, and another threshold $b$ decides whether the point needs to be pruned. A short sketch of this rule follows.
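Below is a hedged sketch of this position-aware rule. The clone/split distinction and the exact way $b$ is applied are my reading of the text (following the original Gaussian Splatting convention that small Gaussians are cloned and large ones split), so treat those policy details as assumptions rather than the paper's exact code; $r$ is a scene radius prior, $g$ the densification threshold, and $b$ the pruning threshold.

```python
import numpy as np

def scale_factor(mu, r):
    """gamma(mu): 1 inside radius 2r, growing linearly with distance outside."""
    d = np.linalg.norm(mu)
    return 1.0 if d < 2.0 * r else d / r - 1.0

def adaptive_control(mu, s, r, g, b):
    """Decide how to densify/prune one PVG point from its position and scale."""
    gamma = scale_factor(mu, r)
    if np.max(s) <= g * gamma:
        action = "clone"               # small Gaussian: duplicate it
    else:
        action = "split"               # assumption: large Gaussians are split, as in 3DGS
    prune = np.max(s) > b * gamma      # assumption on how the pruning threshold b is applied
    return action, prune

# Illustrative call with made-up hyper-parameters.
print(adaptive_control(mu=np.array([30.0, 0.0, 1.0]),
                       s=np.array([0.2, 0.1, 0.05]),
                       r=10.0, g=0.01, b=0.5))
```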

3.3 Model training

  1. Temporal smoothing by intrinsic motion
    In PVG, an individual point is only active within a narrow time window, so it receives limited training data and is prone to overfitting.
    To remove the dependence on optical flow estimation, the paper introduces an average velocity:
    $$\bar{\boldsymbol{v}}=\left.\frac{\mathrm{d}\widetilde{\boldsymbol{\mu}}(t)}{\mathrm{d}t}\right|_{t=\tau}\cdot\exp\!\left(-\frac{\rho}{2}\right)=\boldsymbol{v}\cdot\exp\!\left(-\frac{\rho}{2}\right).$$
    Since dynamic objects usually keep an approximately constant speed within a short time interval, the state at a nearby timestamp $t_2=t_1+\Delta t$ is estimated from the state at $t_1$ as
    $$\widehat{\mathcal{H}}(t_2)=\{\widetilde{\boldsymbol{\mu}}(t_1)+\bar{\boldsymbol{v}}\cdot\Delta t,\,\boldsymbol{q},\,\boldsymbol{s},\,\widetilde{o}(t_1),\,\boldsymbol{c}\}.$$
    (A code sketch of the training-time components in this list follows the list.)

  2. Sky refinement
    The color computation is corrected to
    $$C_{f}=C+(1-O)\,C_{\mathrm{sky}},$$
    where $O$ is the rendered (accumulated) opacity. In other words, the opacity not covered by the Gaussians is filled with the sky color.

  3. Loss function
    $$\mathcal{L}=(1-\lambda_{r})\mathcal{L}_{1}+\lambda_{r}\mathcal{L}_{\mathrm{ssim}}+\lambda_{d}\mathcal{L}_{d}+\lambda_{o}\mathcal{L}_{o}+\lambda_{\bar{\boldsymbol{v}}}\mathcal{L}_{\bar{\boldsymbol{v}}}$$
    It combines the image ($\mathcal{L}_{1}$, $\mathcal{L}_{\mathrm{ssim}}$), LiDAR ($\mathcal{L}_{d}$), sky ($\mathcal{L}_{o}$), and velocity ($\mathcal{L}_{\bar{\boldsymbol{v}}}$) terms.
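The sketch below ties the training-time pieces of this list together: the averaged velocity and the state shift from $t_1$ to $t_2$, the sky compositing, and the weighted loss. Variable names and the $\lambda$ values are illustrative assumptions, not the paper's settings; the individual loss terms are placeholders for the paper's definitions.

```python
import numpy as np

def average_velocity(v, rho):
    """v_bar = v * exp(-rho / 2); d(mu~)/dt at t = tau equals v."""
    return v * np.exp(-rho / 2.0)

def shift_state(mu_t1, o_t1, v, rho, t1, t2):
    """Propagate the rendered state from t1 to t2 under a constant-velocity assumption."""
    mu_t2 = mu_t1 + average_velocity(v, rho) * (t2 - t1)
    return mu_t2, o_t1                         # q, s, c are left unchanged

def composite_with_sky(C, O, C_sky):
    """C_f = C + (1 - O) * C_sky; the remaining opacity is filled with sky color."""
    return C + (1.0 - O) * C_sky

def total_loss(l1, l_ssim, l_depth, l_sky, l_vel,
               lam_r=0.2, lam_d=0.1, lam_o=0.05, lam_v=0.01):
    """Weighted sum of image, LiDAR depth, sky opacity, and velocity terms."""
    return ((1.0 - lam_r) * l1 + lam_r * l_ssim
            + lam_d * l_depth + lam_o * l_sky + lam_v * l_vel)

# Illustrative usage with scalar placeholders for the individual loss terms.
print(total_loss(l1=0.05, l_ssim=0.2, l_depth=0.1, l_sky=0.02, l_vel=0.01))
```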
