[Paper Reading] Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering

1. What

What does this paper do? (Summarized in one sentence from the abstract and conclusion.)

The paper represents the various objects and elements of dynamic urban scenes with periodic vibration-based temporal dynamics, and introduces a novel temporal smoothing mechanism and a position-aware adaptive control strategy to keep the representation temporally coherent, completing the reconstruction of dynamic scenes.

2. Why

Under what conditions or needs was this work proposed (Introduction)? What core problems or deficiencies does it aim to solve, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This covers the background, the problem, prior work, and the innovations:

Introduction:

  1. NSG: Decomposes dynamic scenes into scene graphs
  2. PNF: Decomposes scenes into objects and backgrounds, incorporating a panoptic segmentation auxiliary task.
  3. SUDS: Uses optical flow (optical flow helps identify which parts of the scene are static and which are dynamic)
  4. EmerNeRF: Uses a self-supervised method to reduce dependence on optical flow.

Related work:

  1. Dynamic scene models
    • In one research direction, certain studies [2, 7, 11, 18, 38] introduce time as an additional input to the radiance field, treating the scene as a 6D plenoptic function. However, this approach couples positional variations induced by temporal dynamics with the radiance field, lacking geometric priors about how time influences the scene.
    • An alternative approach [1, 20, 24–26, 33, 40] focuses on modeling the movement or deformation of specific static structures, assuming that the dynamics arise from these static elements within the scene.
    • Gaussian-based
  2. Urban scene reconstruction
  • One research avenue (NeRF-based) has focused on enhancing the modeling of static street scenes by utilizing scalable representations [19, 28, 32, 34], achieving high-fidelity surface reconstruction [14, 28, 39], and incorporating multi-object composition [43]. However, these methods face difficulties in handling dynamic elements commonly encountered in autonomous driving contexts.
  • Another research direction addresses these challenges directly, but notably these techniques require additional input: PNF leverages panoptic segmentation to refine the dynamic reconstruction, while Street Gaussians and Driving Gaussian decompose the scene into different sets of Gaussian points using bounding boxes. However, they all need manually annotated or predicted bounding boxes and have difficulty reconstructing non-rigid objects.

3. How

The input data consist of camera images $\{\mathcal{I}_{i},t_{i},\mathbf{E}_{i},\mathbf{I}_{i}\mid i=1,2,\ldots,N_{c}\}$, where $t_i$ is the capture timestamp and $\mathbf{E}_i$, $\mathbf{I}_i$ are the camera extrinsics and intrinsics, together with LiDAR point clouds $\{(x_i,y_i,z_i,t_i)\mid i=1,2,\ldots,N_l\}$. The rendering process is written as $\hat{\mathcal{I}}=\mathcal{F}_{\theta}(\mathbf{E}_{o},\mathbf{I}_{o},t)$, i.e., the model produces an image for any timestamp $t$ and camera pose $(\mathbf{E}_{o},\mathbf{I}_{o})$.
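As a minimal illustration of this interface (class and function names are my own, not from the paper's code release), the data and the rendering function could be organized like this:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraSample:
    image: np.ndarray        # I_i, an H x W x 3 RGB image
    time: float              # t_i, capture timestamp
    extrinsics: np.ndarray   # E_i, 4 x 4 camera pose
    intrinsics: np.ndarray   # I_i, 3 x 3 calibration matrix

@dataclass
class LidarPoint:
    xyz: np.ndarray          # (x_i, y_i, z_i)
    time: float              # t_i

def render(theta, extrinsics_o, intrinsics_o, t):
    """F_theta(E_o, I_o, t): render an image for any pose and timestamp.
    Placeholder standing in for the PVG rasterizer described in Sec. 3.1."""
    raise NotImplementedError
```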

3.1 PVG

The motivation behind this concept is to assign a distinct lifespan to each Gaussian
point, defining when it actively contributes and to what degree.

  1. Overall pipeline:
    (figure from the paper: the overall PVG pipeline)

  2. Mathematically,
    the Gaussian model can be denoted as
    $$\mathcal{H}(t)=\{\widetilde{\boldsymbol{\mu}}(t),\boldsymbol{q},\boldsymbol{s},\widetilde{o}(t),\boldsymbol{c}\},$$
    where
    $$\begin{aligned}\widetilde{\boldsymbol{\mu}}(t)&=\boldsymbol{\mu}+\frac{l}{2\pi}\cdot\sin\!\left(\frac{2\pi(t-\tau)}{l}\right)\cdot\boldsymbol{v},\\\widetilde{o}(t)&=o\cdot e^{-\frac{1}{2}(t-\tau)^{2}\beta^{-2}}.\end{aligned}$$
    These equations describe how the mean and the opacity change over time (a code sketch after this list illustrates them together with the staticness coefficient of item 3). The parameters that need to be learned are
    $$\{\boldsymbol{\mu},\boldsymbol{q},\boldsymbol{s},o,\boldsymbol{c},\tau,\beta,\boldsymbol{v}\},$$
    where $\tau$ is the life peak (the time around which the point is most prominent) and $\boldsymbol{v}$ is the vibration direction, equal to the instantaneous velocity at $t=\tau$. Note that the cycle length $l$ is a scene prior that does not need to be learned.

  3. Definition of the staticness coefficient
    We define $\rho=\frac{\beta}{l}$ to quantify the degree of staticness exhibited by a PVG point. Here $\beta$ controls how quickly the opacity decays away from $\tau$ (a larger $\beta$ means a longer lifespan), and the cycle length $l$ is a hyper-parameter serving as a prior. The dynamic parts of a scene are mainly carried by points with small $\rho$; at a specific timestamp $t$, dynamic objects are predominantly represented by points whose $\tau$ is close to $t$.
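The following is a minimal sketch (my own illustration, not the official implementation) of the two time-varying equations from item 2 and the staticness coefficient from item 3, for a single PVG point; the numeric values are made up for demonstration.

```python
import numpy as np

def pvg_mean(mu, v, tau, l, t):
    """Time-varying mean: mu~(t) = mu + (l / 2*pi) * sin(2*pi*(t - tau) / l) * v."""
    return mu + (l / (2.0 * np.pi)) * np.sin(2.0 * np.pi * (t - tau) / l) * v

def pvg_opacity(o, tau, beta, t):
    """Time-varying opacity: o~(t) = o * exp(-0.5 * (t - tau)^2 / beta^2)."""
    return o * np.exp(-0.5 * (t - tau) ** 2 / beta ** 2)

def staticness(beta, l):
    """rho = beta / l; larger values indicate longer-lived, more static points."""
    return beta / l

# Illustrative values (not from the paper): a point peaking at tau = 0.5 s.
mu = np.array([1.0, 2.0, 0.5])   # mean position
v = np.array([2.0, 0.0, 0.0])    # vibration direction = velocity at t = tau
tau, beta, l, o = 0.5, 0.1, 0.3, 0.9

print(pvg_mean(mu, v, tau, l, t=0.52))   # position slightly after the life peak
print(pvg_opacity(o, tau, beta, t=0.52)) # opacity slightly decayed
print(staticness(beta, l))               # small rho -> likely a dynamic point
```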

3.2 Position-aware point adaptive control

The adaptive density control of the original Gaussian Splatting does not fit large urban scenes, so the scale $\boldsymbol{s}$ of each Gaussian, together with its position, is used to control how points are densified and pruned.

That is, a position-dependent scale factor is first defined:

$$\gamma(\boldsymbol{\mu})=\begin{cases}1&\text{if }\|\boldsymbol{\mu}\|_{2}<2r\\\|\boldsymbol{\mu}\|_{2}/r-1&\text{if }\|\boldsymbol{\mu}\|_{2}\geq 2r\end{cases}$$

Then $\max(\boldsymbol{s})$ is compared against the threshold $g$ scaled by $\gamma(\boldsymbol{\mu})$: if $\max(\boldsymbol{s})\leq g\cdot\gamma(\boldsymbol{\mu})$, the PVG point is cloned, and another threshold $b$ decides whether the point needs to be pruned. A short sketch of this rule follows.
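Below is a hedged sketch of this position-aware rule. The clone/split distinction and the exact way $b$ is applied are my reading of the text (following the original Gaussian Splatting convention that small Gaussians are cloned and large ones split), so treat those policy details as assumptions rather than the paper's exact code; $r$ is a scene radius prior, $g$ the densification threshold, and $b$ the pruning threshold.

```python
import numpy as np

def scale_factor(mu, r):
    """gamma(mu): 1 inside radius 2r, growing linearly with distance outside."""
    d = np.linalg.norm(mu)
    return 1.0 if d < 2.0 * r else d / r - 1.0

def adaptive_control(mu, s, r, g, b):
    """Decide how to densify/prune one PVG point from its position and scale."""
    gamma = scale_factor(mu, r)
    if np.max(s) <= g * gamma:
        action = "clone"               # small Gaussian: duplicate it
    else:
        action = "split"               # assumption: large Gaussians are split, as in 3DGS
    prune = np.max(s) > b * gamma      # assumption on how the pruning threshold b is applied
    return action, prune

# Illustrative call with made-up hyper-parameters.
print(adaptive_control(mu=np.array([30.0, 0.0, 1.0]),
                       s=np.array([0.2, 0.1, 0.05]),
                       r=10.0, g=0.01, b=0.5))
```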

3.3 Model training

  1. Temporal smoothing by intrinsic motion
    In PVG, an individual point is only active within a narrow time window, so it receives limited training data and is prone to overfitting.
    To remove the dependence on optical flow estimation, the paper introduces an average velocity:
    $$\bar{\boldsymbol{v}}=\left.\frac{\mathrm{d}\widetilde{\boldsymbol{\mu}}(t)}{\mathrm{d}t}\right|_{t=\tau}\cdot\exp\!\left(-\frac{\rho}{2}\right)=\boldsymbol{v}\cdot\exp\!\left(-\frac{\rho}{2}\right).$$
    Since dynamic objects usually keep an approximately constant speed within a short time interval, the state at a nearby timestamp $t_2=t_1+\Delta t$ is estimated from the state at $t_1$ as
    $$\widehat{\mathcal{H}}(t_2)=\{\widetilde{\boldsymbol{\mu}}(t_1)+\bar{\boldsymbol{v}}\cdot\Delta t,\,\boldsymbol{q},\,\boldsymbol{s},\,\widetilde{o}(t_1),\,\boldsymbol{c}\}.$$
    (A code sketch of the training-time components in this list follows the list.)

  2. Sky refinement
    The color computation is corrected to
    $$C_{f}=C+(1-O)\,C_{\mathrm{sky}},$$
    where $O$ is the rendered (accumulated) opacity. In other words, the opacity not covered by the Gaussians is filled with the sky color.

  3. Loss function
    $$\mathcal{L}=(1-\lambda_{r})\mathcal{L}_{1}+\lambda_{r}\mathcal{L}_{\mathrm{ssim}}+\lambda_{d}\mathcal{L}_{d}+\lambda_{o}\mathcal{L}_{o}+\lambda_{\bar{\boldsymbol{v}}}\mathcal{L}_{\bar{\boldsymbol{v}}}$$
    It combines the image ($\mathcal{L}_{1}$, $\mathcal{L}_{\mathrm{ssim}}$), LiDAR ($\mathcal{L}_{d}$), sky ($\mathcal{L}_{o}$), and velocity ($\mathcal{L}_{\bar{\boldsymbol{v}}}$) terms.
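The sketch below ties the training-time pieces of this list together: the averaged velocity and the state shift from $t_1$ to $t_2$, the sky compositing, and the weighted loss. Variable names and the $\lambda$ values are illustrative assumptions, not the paper's settings; the individual loss terms are placeholders for the paper's definitions.

```python
import numpy as np

def average_velocity(v, rho):
    """v_bar = v * exp(-rho / 2); d(mu~)/dt at t = tau equals v."""
    return v * np.exp(-rho / 2.0)

def shift_state(mu_t1, o_t1, v, rho, t1, t2):
    """Propagate the rendered state from t1 to t2 under a constant-velocity assumption."""
    mu_t2 = mu_t1 + average_velocity(v, rho) * (t2 - t1)
    return mu_t2, o_t1                         # q, s, c are left unchanged

def composite_with_sky(C, O, C_sky):
    """C_f = C + (1 - O) * C_sky; the remaining opacity is filled with sky color."""
    return C + (1.0 - O) * C_sky

def total_loss(l1, l_ssim, l_depth, l_sky, l_vel,
               lam_r=0.2, lam_d=0.1, lam_o=0.05, lam_v=0.01):
    """Weighted sum of image, LiDAR depth, sky opacity, and velocity terms."""
    return ((1.0 - lam_r) * l1 + lam_r * l_ssim
            + lam_d * l_depth + lam_o * l_sky + lam_v * l_vel)

# Illustrative usage with scalar placeholders for the individual loss terms.
print(total_loss(l1=0.05, l_ssim=0.2, l_depth=0.1, l_sky=0.02, l_vel=0.01))
```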
