1. What
A novel point-based approach with a novel Dual-Domain Deformation Model for dynamic scene reconstruction.
Contribution:
- Gaussian-Flow, a novel point-based differentiable rendering approach for dynamic 3D scene reconstruction, which sets a new state of the art in training speed, rendering FPS, and novel view synthesis quality for 4D scene reconstruction.
- A Dual-Domain Deformation Model for efficient 4D scene training and rendering, which keeps the running speed on par with the original 3DGS at minimal overhead.
- The approach can be used for downstream tasks.
2. Why
2.1 Introduction
- High-fidelity real-time rendering remains a challenge for NeRF.
- 3DGS has been applied to 4D tasks, but existing extensions significantly lower the rendering speed compared with the original 3DGS.
2.2 Related work
Notable work
- Dynamic Neural Radiance Field: dynamic neural scene flow methods have been proposed [27, 30].
- Accelerated Neural Radiance Field
- Differentiable Point-based Rendering: PointRF [41], DSS [39], and 3D Gaussian Splatting (3DGS) [13].
3. How
3.1 Dual-Domain Deformation Model
Assume that only the rotation $q$, radiance $c$, and position $\mu$ of a 3D Gaussian particle change over time, while the scaling $s$ and opacity $\alpha$ remain constant.
Then, we use a time-dependent attribute residual $D(t)$ to model the difference between the base attribute $S_0 \in \{\mu_0, c_0, q_0\}$ and the attribute at time $t$. That is:
$$S(t) = S_0 + D(t),$$
where $D(t) = P_N(t) + F_L(t)$ combines a polynomial $P_N(t)$ with coefficients $a = \{a_n\}_{n=0}^{N}$ and a Fourier series $F_L(t)$ with coefficients $f = \{f_{sin}^{l}, f_{cos}^{l}\}_{l=1}^{L}$. These are respectively defined as:
$$\begin{aligned} P_N(t) &= \sum_{n=0}^{N} a_n t^n, \\ F_L(t) &= \sum_{l=1}^{L}\left(f_{sin}^{l}\cos(lt) + f_{cos}^{l}\sin(lt)\right). \end{aligned}$$
Note that the different dimensions of an attribute are assumed to change independently over time. For instance, $\{D_{\mu_i}(t)\}_{i=1}^{3}$ describes the motion of a 3D position $\mu$.
The two components play different roles: the Fourier series excels at capturing the variations associated with violent motion, while a low-order polynomial fits smooth motion well.
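To make the formulas above concrete, here is a minimal sketch in plain Python of evaluating $S(t) = S_0 + P_N(t) + F_L(t)$ per dimension. The function names, the list-based data layout, and the example values are assumptions made for illustration only; they are not taken from the paper's code, and a practical implementation would presumably be vectorized on the GPU.

```python
import math

def polynomial(coeffs, t):
    """P_N(t) = sum_{n=0}^{N} a_n * t**n, with coeffs = [a_0, ..., a_N]."""
    return sum(a * t**n for n, a in enumerate(coeffs))

def fourier(f_sin, f_cos, t):
    """F_L(t) = sum_{l=1}^{L} (f_sin^l * cos(l t) + f_cos^l * sin(l t)).

    The pairing of f_sin with cos and f_cos with sin follows the formula
    exactly as it is written in this note.
    """
    return sum(
        fs * math.cos(l * t) + fc * math.sin(l * t)
        for l, (fs, fc) in enumerate(zip(f_sin, f_cos), start=1)
    )

def deformed_attribute(base, poly_coeffs, f_sin, f_cos, t):
    """S(t) = S_0 + D(t), with every dimension evaluated independently."""
    return [
        s0 + polynomial(a, t) + fourier(fs, fc, t)
        for s0, a, fs, fc in zip(base, poly_coeffs, f_sin, f_cos)
    ]

# Hypothetical 3D position with N = 2 and L = 1 (all values are made up).
mu_0 = [0.0, 1.0, 2.0]
poly = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.5]]  # one row per dimension
f_sin = [[0.0], [0.2], [0.0]]
f_cos = [[0.0], [0.0], [0.0]]

mu_t = deformed_attribute(mu_0, poly, f_sin, f_cos, t=0.0)
```

Each dimension owns its own coefficient rows, which mirrors the independence assumption stated above: changing the coefficients of one axis never affects the others.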
3.2 Adaptive Timestamp Scaling
In practice, we use the normalized time $t$, ranging from 0 to 1. However, with this uniform time normalization, fitting highly intense movement within a very short time frame would require exceedingly large coefficients. So we use a dilation factor $\lambda$, with scale $\lambda_s$ and bias $\lambda_b$, to scale the temporal input for each Gaussian point:
$$t_s = \lambda_s \cdot t + \lambda_b$$
To summarize, in our dynamic scene setting, a Gaussian particle contains multiple attributes to be optimized: the base attributes $\{\mu_0, q_0, s_0, c_0, \alpha_0\}$ at the reference frame $t_0$, and the polynomial and Fourier coefficients in $\{D_{\boldsymbol{\mu}}(t), D_{\boldsymbol{q}}(t), D_{\boldsymbol{c}}(t)\}$.
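The effect of the time scaling can be illustrated with a small arithmetic sketch. It assumes a purely linear residual $a_1 \cdot t_s$ and made-up parameter values; the helper names `scale_time` and `linear_displacement` are hypothetical and only serve this example.

```python
def scale_time(t, lam_s, lam_b):
    """t_s = lambda_s * t + lambda_b for a single Gaussian."""
    return lam_s * t + lam_b

def linear_displacement(a1, lam_s, lam_b, t0, t1):
    """Change of the linear residual a1 * t_s between times t0 and t1."""
    return a1 * (scale_time(t1, lam_s, lam_b) - scale_time(t0, lam_s, lam_b))

# Hypothetical per-Gaussian dilation parameters (values are made up).
gaussians = [
    {"lam_s": 1.0, "lam_b": 0.0},   # leaves the normalized time unchanged
    {"lam_s": 8.0, "lam_b": -2.0},  # stretches time for a fast-moving point
]
scaled = [scale_time(0.5, g["lam_s"], g["lam_b"]) for g in gaussians]

# Moving by about 1 unit within 1% of the sequence (t from 0.50 to 0.51):
# without scaling this needs a coefficient of 100,
unscaled_disp = linear_displacement(100.0, 1.0, 0.0, 0.50, 0.51)
# while with lambda_s = 100 a coefficient of 1 gives the same displacement.
scaled_disp = linear_displacement(1.0, 100.0, 0.0, 0.50, 0.51)
```

Both displacements come out at roughly 1, which shows how a per-point dilation can absorb fast motion that would otherwise have to be expressed through very large coefficients.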
3.3 Regularizations
- Time Smoothness Loss: To ensure temporal smoothness, the time smoothness term is defined as
$$\mathcal{L}_t = \|D(t) - D(t+\epsilon)\|_2$$
where $\epsilon = 0.1/\mathit{frames}$.
- KNN Rigid Loss: The local rigidity constraint is incorporated in the later stages, and it is defined as
$$\mathcal{L}_s = \sum_{j\in\mathcal{N}_i}\|D(t)_i - D(t)_j\|_2$$
where $\mathcal{N}_i$ denotes the $K$ nearest neighbors of the $i$-th Gaussian.
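The two regularizers can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `frames` is read as the number of frames in the sequence, neighbors are found by brute force on the base positions (the note does not say which positions are used for the neighbor search), and all names and values are hypothetical.

```python
import math

def l2_distance(u, v):
    """Euclidean norm of u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def time_smoothness_loss(residual_fn, t, num_frames):
    """L_t = ||D(t) - D(t + eps)||_2 with eps = 0.1 / num_frames."""
    eps = 0.1 / num_frames
    return l2_distance(residual_fn(t), residual_fn(t + eps))

def k_nearest(positions, i, k):
    """Indices of the k points closest to point i (brute force)."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: l2_distance(positions[i], positions[j]))
    return others[:k]

def knn_rigid_loss(positions, residuals, i, k):
    """L_s = sum over the K nearest neighbors j of ||D(t)_i - D(t)_j||_2."""
    return sum(
        l2_distance(residuals[i], residuals[j])
        for j in k_nearest(positions, i, k)
    )

# Made-up example: three nearby Gaussians and one far away.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [10.0, 10.0, 10.0]]
residuals = [[0.1, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0], [5.0, 5.0, 5.0]]

smooth = time_smoothness_loss(lambda t: [t, 0.0, 0.0], t=0.0, num_frames=10)
rigid = knn_rigid_loss(positions, residuals, i=0, k=2)
```

In this example the distant Gaussian does not contribute to `rigid`, since only the two nearest neighbors of point 0 are compared, so the penalty reflects local rather than global motion differences.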
3.4 Experiments
- Two datasets: the Plenoptic Video dataset and the HyperNeRF dataset
- Ablation study
  - Deformation models (Fourier or polynomial)
  - Regularizations (Time or KNN)
- Quantitative comparisons
- Qualitative comparisons