[Paper Reading] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

1. What

What does this paper set out to do? (From the abstract and conclusion; try to summarize it in one sentence.)

Given a monocular dynamic video as input, this paper uses sparse control points to drive the 3D Gaussians. Each control point carries a time-varying 6-DoF transformation that is predicted by an MLP. The method enables dynamic novel view synthesis and motion editing, but it still struggles when camera poses are inaccurate or the motion is intense.

2. Why

Under what conditions or needs was this research proposed (Intro)? What core problems or deficiencies does it address, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This may cover Background, Question, Others, and Innovation:

NeRF-based methods suffer from low rendering quality, slow rendering speed, and high memory usage. The existing 3D-GS only applies to static scenes. An intuitive extension [47] learns a flow vector for each 3D Gaussian, but it incurs a significant time cost for training and inference. (The author of [47] is also a co-author of this paper.)

Related work:

  • Dynamic NeRF

  • Dynamic Gaussian Splatting

  • 3D Deformation and Editing

    This part is relatively unfamiliar to me. It covers the traditional editing methods in graphics, which focus on preserving the geometric details of 3D objects during deformation and rely on tools such as Laplacian coordinates, the Poisson equation, and cage-based approaches.

    Recently, there have been other approaches that aim to edit the scene geometry learned from 2D images. This paper belongs to this class.

3. How

3.1 Sparse Control Points

We will first introduce the definition of control points, which is the core concept used in this article.

There is a set of sparse control points $\mathcal{P}=\{(p_{i}\in\mathbb{R}^{3},o_{i}\in\mathbb{R}^{+})\},\ i\in\{1,2,\cdots,N_{p}\}$. Here $p_i$ is the position of control point $i$, and $o_i$ is a learnable radius parameter that controls how the influence of the control point on a Gaussian falls off with distance.

Meanwhile, for each control point $p_i$, we learn a time-varying 6-DoF transformation $[R_i^t|T_i^t]\in\mathbf{SE}(3)$, consisting of a local frame rotation matrix $R_i^t\in\mathbf{SO}(3)$ and a translation vector $T_i^t\in\mathbb{R}^3$. Instead of directly optimizing the transformation parameters, we employ an MLP $\Psi$ to learn a time-varying transformation field:

$$\Psi:(p_{i},t)\rightarrow(R_{i}^{t},T_{i}^{t}).$$
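To make the interface of $\Psi$ concrete, here is a minimal NumPy sketch. Everything about the network (one hidden layer, its width, random untrained weights, the class name `TinyDeformMLP`) is a hypothetical stand-in and not the architecture from the paper; the only point is the input/output contract: position plus time in, a unit quaternion (an equivalent form of $R_i^t$) plus a translation $T_i^t$ out.

```python
import numpy as np

class TinyDeformMLP:
    """Hypothetical stand-in for Psi: (p_i, t) -> (r_i^t, T_i^t). Not the paper's network."""

    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(4, hidden))  # input is (x, y, z, t)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, 7))  # 4 quaternion + 3 translation
        self.b2 = np.zeros(7)

    def __call__(self, p, t):
        # p: (N_p, 3) control point positions, t: scalar time
        x = np.concatenate([p, np.full((len(p), 1), t)], axis=1)
        h = np.maximum(x @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        out = h @ self.W2 + self.b2
        # Assumption: predict an offset from the identity quaternion (w, x, y, z),
        # so that a zero network output means "no rotation".
        quat = out[:, :4] + np.array([1.0, 0.0, 0.0, 0.0])
        quat /= np.linalg.norm(quat, axis=1, keepdims=True)
        return quat, out[:, 4:]
```

Calling `TinyDeformMLP()(p, t)` on an `(N_p, 3)` array returns an `(N_p, 4)` array of unit quaternions and an `(N_p, 3)` array of translations, one pair per control point.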

3.2 Dynamic Scene Rendering

With the control points in place, we need to establish the connection between them and the Gaussians.

For each Gaussian $G_j$, we use k-nearest neighbor (KNN) search to obtain its $K$ (= 4) neighboring control points, denoted as $\{p_{k}\mid k\in\mathcal{N}_{j}\}$. The interpolation weight of control point $p_k$ for Gaussian $G_j$ is then defined as:

$$w_{jk}=\frac{\hat w_{jk}}{\sum\limits_{k\in\mathcal{N}_{j}}\hat w_{jk}},\quad\text{where}\quad\hat w_{jk}=\exp\left(-\frac{d_{jk}^{2}}{2o_{k}^{2}}\right),$$

where $d_{jk}$ is the distance between the center of Gaussian $G_j$ and the neighboring control point $p_k$, and $o_k$ is the learned radius parameter of $p_k$.
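The weight formula can be sketched in a few lines of NumPy. This is a brute-force illustration under my own assumptions (dense pairwise distances instead of a real KNN structure; the function name `knn_rbf_weights` is made up), not the authors' implementation.

```python
import numpy as np

def knn_rbf_weights(mu, p, o, K=4):
    """mu: (N_g, 3) Gaussian centers, p: (N_p, 3) control points, o: (N_p,) radii.

    Returns idx (N_g, K), the neighbor sets N_j, and w (N_g, K), the weights w_jk.
    """
    # Squared distances d_jk^2 between every Gaussian and every control point
    d2 = ((mu[:, None, :] - p[None, :, :]) ** 2).sum(-1)
    # K nearest control points per Gaussian, sorted from nearest to farthest
    idx = np.argsort(d2, axis=1)[:, :K]
    d2_knn = np.take_along_axis(d2, idx, axis=1)
    # Unnormalized RBF weights: exp(-d_jk^2 / (2 o_k^2))
    w_hat = np.exp(-d2_knn / (2.0 * o[idx] ** 2))
    # Normalize over the K neighbors (can divide by zero if every w_hat underflows)
    w = w_hat / w_hat.sum(axis=1, keepdims=True)
    return idx, w
```

With equal radii the nearest control point always gets the largest weight; a larger $o_k$ lets a far-away control point keep more influence.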

We then borrow the idea of Linear Blend Skinning (LBS): each vertex of a model is assigned a weight for each bone, indicating how much influence that bone has over the vertex's position. When a bone moves, each vertex moves according to a weighted average of the transformations (rotations and translations) of all the bones that influence it. Treating control points as bones and Gaussians as vertices, we can warp each Gaussian by:

$$\begin{aligned}\mu_j^t&=\sum_{k\in\mathcal{N}_j}w_{jk}\left(R_k^t(\mu_j-p_k)+p_k+T_k^t\right)\\q_j^t&=\Big(\sum_{k\in\mathcal{N}_j}w_{jk}r_k^t\Big)\otimes q_j,\end{aligned}$$

where $\mu_j$ and $q_j$ are the original center and rotation quaternion of Gaussian $G_j$. The new rotation uses the weighted average of the rotations of the neighboring control points, represented as quaternions $r_k^t$ (the same rotation as $R_k^t$, just in a different mathematical form). This averaged quaternion is then multiplied with the original rotation quaternion of the Gaussian using the quaternion product $\otimes$, which is the proper way to compose two 3D rotations.
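The two warping equations can be written out directly for a single Gaussian. The sketch below is my own illustration: it assumes quaternions are stored real-first as (w, x, y, z), and it renormalizes the output quaternion, which the formula itself does not state.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product a (x) b for quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def quat_to_matrix(q):
    """Rotation matrix of a (normalized) quaternion (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def warp_gaussian(mu_j, q_j, p, r, T, w):
    """mu_j: (3,), q_j: (4,); p: (K, 3), r: (K, 4), T: (K, 3), w: (K,) for the K neighbors."""
    mu_t = np.zeros(3)
    for k in range(len(w)):
        R_k = quat_to_matrix(r[k])
        # Rotate around control point p_k, then translate by T_k, weighted by w_jk
        mu_t += w[k] * (R_k @ (mu_j - p[k]) + p[k] + T[k])
    # Blend the control point quaternions linearly, then compose with q_j
    r_blend = (w[:, None] * r).sum(axis=0)
    q_t = quat_mul(r_blend, q_j)
    # Assumption: renormalize so q_t is a valid rotation
    return mu_t, q_t / np.linalg.norm(q_t)
```

A useful sanity check: if all $K$ control points undergo the same rigid motion $x\mapsto Rx+c$ (that is, $T_k=Rp_k-p_k+c$), every term in the sum equals $R\mu_j+c$, so the Gaussian follows that rigid motion exactly.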

3.3 Optimization

We now know how the control points drive the Gaussians. Next, we introduce two small strategies used during optimization.

  1. ARAP Loss

This loss is borrowed from the paper “As-rigid-as-possible surface modeling” and encourages the learned motion to stay locally rigid.

First, it defines the trajectory $p_i^{\text{traj}}$ of each control point over the scene motion as:

$$p_i^{\text{traj}}=\frac{1}{N_t}\,p_i^{t_1}\oplus p_i^{t_2}\oplus\cdots\oplus p_i^{t_{N_t}},$$

where $p_i^{t}$ represents the position of point $i$ at time $t$, and $\oplus$ denotes vector concatenation.

Then, a local neighborhood $\mathcal{N}_{c_i}$ for each control point is determined via ball queries on these trajectories, which means finding all control points within a predefined radius to define a local area of influence. To calculate $\mathcal{L}_{\mathrm{arap}}$, we randomly sample two time steps $t_1$ and $t_2$. For each point $p_k$ within the radius (i.e., $k\in\mathcal{N}_{c_i}$), its transformed locations under the learned translations $T_k^{t_1}$ and $T_k^{t_2}$ are $p_k^{t_1}=p_k+T_k^{t_1}$ and $p_k^{t_2}=p_k+T_k^{t_2}$. The rigid rotation $\hat{R}_{i}$ that best aligns the neighborhood between the two time steps can then be estimated as:

$$\hat{R}_{i}=\arg\min_{R\in\mathbf{SO}(3)}\sum_{k\in\mathcal{N}_{c_{i}}}w_{ik}\,\|(p_{i}^{t_{1}}-p_{k}^{t_{1}})-R(p_{i}^{t_{2}}-p_{k}^{t_{2}})\|^{2}.$$

Finally, $\mathcal{L}_{\mathrm{arap}}$ is the residual that remains after this best rigid alignment:

$$\mathcal{L}_{\mathrm{arap}}(p_{i},t_{1},t_{2})=\sum_{k\in\mathcal{N}_{c_{i}}}w_{ik}\,\|(p_{i}^{t_{1}}-p_{k}^{t_{1}})-\hat{R}_{i}(p_{i}^{t_{2}}-p_{k}^{t_{2}})\|^{2}.$$
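The $\arg\min$ over $\mathbf{SO}(3)$ has a well-known closed form via the SVD of a weighted covariance matrix (the same construction used in the original ARAP surface modeling paper). The sketch below uses that closed form to evaluate $\mathcal{L}_{\mathrm{arap}}$ for one control point. The function names and the way neighbors and weights are passed in are my assumptions; the notes above do not say how the authors solve for $\hat{R}_i$.

```python
import numpy as np

def best_rotation(A, B, w):
    """R in SO(3) minimizing sum_k w_k ||A_k - R B_k||^2, with A, B of shape (K, 3)."""
    # Weighted covariance S = sum_k w_k B_k A_k^T
    S = (w[:, None] * B).T @ A
    U, _, Vt = np.linalg.svd(S)
    # Flip the last axis if needed so that det(R) = +1 (a rotation, not a reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def arap_loss(p, T1, T2, i, nbrs, w):
    """p: (N_p, 3) control points; T1, T2: (N_p, 3) translations at t1 and t2;
    nbrs: indices k in N_{c_i}; w: (len(nbrs),) weights w_ik."""
    p1, p2 = p + T1, p + T2
    A = p1[i] - p1[nbrs]  # edges p_i^{t1} - p_k^{t1}
    B = p2[i] - p2[nbrs]  # edges p_i^{t2} - p_k^{t2}
    R_hat = best_rotation(A, B, w)
    resid = A - B @ R_hat.T  # row k is A_k - R_hat B_k
    return float((w * (resid ** 2).sum(-1)).sum())
```

If the neighborhood moves rigidly between $t_1$ and $t_2$, the loss is zero up to floating-point error; any non-rigid stretching or shearing of the neighborhood leaves a positive residual.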

  2. Adaptive Control Points

The second strategy is very similar to the adaptive density control of Gaussians in 3D-GS.

  • Prune: calculate the overall impact $W_i=\sum_{j\in\tilde{\mathcal{N}}_i}w_{ji}$ of control point $p_i$ on the set of Gaussians $j\in\tilde{\mathcal{N}}_i$ whose $K$ nearest neighbors include $p_i$. Then, we prune $p_i$ if $W_i$ is close to zero, indicating little contribution to the motion of the 3D Gaussians.

  • Clone: calculate the weighted sum of the Gaussian gradient norms as:

    $$g_{i}=\sum_{j\in\tilde{\mathcal{N}}_{i}}\tilde{w}_{j}\left\|\frac{d\mathcal{L}}{d\mu_{j}}\right\|_{2}^{2},\quad\text{where}\quad\tilde{w}_{j}=\frac{w_{ji}}{\sum\limits_{j\in\tilde{\mathcal{N}}_{i}}w_{ji}}.$$

    A large $g_i$ indicates poor reconstruction around $p_i$, so a new control point $p_i^{\prime}$ is added as

    $$p_i^{\prime}=\sum_{j\in\tilde{\mathcal{N}}_i}\tilde{w}_j\mu_j;\quad\sigma_i^{\prime}=\sigma_i.$$

    Here $\sigma$ appears to denote the radius parameter written $o$ above, so the new point inherits the radius of $p_i$.
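The prune and clone statistics both reduce to scatter-adds over the KNN table. Below is a small NumPy sketch under my own assumptions: `idx` and `w` are the $(N_g, K)$ neighbor indices and weights from the KNN step, `grad_mu` holds $d\mathcal{L}/d\mu_j$, and the function names are made up. The thresholds that decide "close to zero" and "large" are not given in these notes, so they are left to the caller.

```python
import numpy as np

def control_point_stats(idx, w, grad_mu, n_ctrl):
    """idx, w: (N_g, K) KNN indices and weights; grad_mu: (N_g, 3) gradients dL/dmu_j.

    Returns W (n_ctrl,), the overall impact W_i, and g (n_ctrl,), the weighted
    gradient statistic g_i.
    """
    g_norm2 = (grad_mu ** 2).sum(-1)  # ||dL/dmu_j||_2^2 per Gaussian
    W = np.zeros(n_ctrl)
    np.add.at(W, idx.ravel(), w.ravel())  # W_i = sum_j w_ji
    num = np.zeros(n_ctrl)
    np.add.at(num, idx.ravel(), (w * g_norm2[:, None]).ravel())
    # g_i = sum_j (w_ji / W_i) * ||dL/dmu_j||^2, defined as 0 where W_i = 0
    g = np.divide(num, W, out=np.zeros(n_ctrl), where=W > 0)
    return W, g

def clone_position(i, idx, w, mu):
    """Position of the new control point: the w_ji-weighted mean of the Gaussian centers."""
    # Weight of control point i for each Gaussian (0 if i is not among its K neighbors)
    w_ji = (w * (idx == i)).sum(axis=1)
    return (w_ji[:, None] * mu).sum(axis=0) / w_ji.sum()
```

A caller would then prune the points where `W` falls below some small threshold and clone the points where `g` exceeds another threshold; both thresholds are hypothetical hyperparameters here.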

Now we can use the pipeline figure to tie every step together.

[Figure: overview of the SC-GS pipeline]

3.4 Motion Editing

Given a set of user-defined handle points $\{h_l\in\mathbb{R}^3\mid l\in\mathcal{H}\subset\{1,2,\cdots,N_p\}\}$, the control graph $\mathcal{P}^{\prime}$ can be deformed by minimizing the ARAP energy formulated as:

$$E(\mathcal{P}^{\prime})=\sum_{i=1}^{N_{p}}\sum_{j\in\mathcal{N}_{i}}w_{ij}\,\|(p_{i}^{\prime}-p_{j}^{\prime})-\hat{R}_{i}(p_{i}-p_{j})\|^{2},$$

with the fixed position condition $p_{l}^{\prime}=h_{l}$ for $l\in\mathcal{H}$.
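One standard way to minimize this kind of energy is the local/global alternation from the original ARAP surface modeling paper: fix the positions and solve each $\hat{R}_i$ by SVD, then fix the rotations and solve a linear system for the positions. The sketch below follows that scheme under several assumptions of mine: the weights are given as a dense symmetric matrix `Wm` (zero where there is no edge), the graph is connected, and handles are imposed as hard constraints. I have not checked how the SC-GS code actually solves this, so treat it as an illustration only.

```python
import numpy as np

def arap_deform(p, Wm, handles, n_iters=10):
    """p: (N, 3) rest positions; Wm: (N, N) symmetric edge weights w_ij, zero diagonal;
    handles: dict {l: h_l} of fixed target positions. Returns deformed positions (N, 3)."""
    N = len(p)
    keys = list(handles)
    h_idx = np.array(keys)
    h_pos = np.array([handles[l] for l in keys], dtype=float)
    # Graph Laplacian; handle rows are replaced by identity rows (hard constraints)
    A = np.diag(Wm.sum(axis=1)) - Wm
    A[h_idx] = 0.0
    A[h_idx, h_idx] = 1.0
    R = np.tile(np.eye(3), (N, 1, 1))  # start from identity rotations
    p_new = p.copy()
    for _ in range(n_iters):
        # Global step: sum_j w_ij (p'_i - p'_j) = sum_j w_ij/2 (R_i + R_j)(p_i - p_j)
        b = np.zeros((N, 3))
        for i in range(N):
            for j in np.nonzero(Wm[i])[0]:
                b[i] += 0.5 * Wm[i, j] * (R[i] + R[j]) @ (p[i] - p[j])
        b[h_idx] = h_pos
        p_new = np.linalg.solve(A, b)
        # Local step: best rotation per control point from the covariance of its edges
        for i in range(N):
            js = np.nonzero(Wm[i])[0]
            E0 = p[i] - p[js]          # rest edges p_i - p_j
            E1 = p_new[i] - p_new[js]  # deformed edges p'_i - p'_j
            S = (Wm[i, js][:, None] * E0).T @ E1
            U, _, Vt = np.linalg.svd(S)
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R[i] = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return p_new
```

For example, dragging a single handle by a translation `t` moves the whole graph by `t`, since a pure translation costs zero energy. Once the control points are deformed, the Gaussians follow through the same LBS interpolation used for rendering.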

4. Self-thoughts

  1. The method only uses one camera. If we had six cameras, as in autonomous driving (AD), how would we combine them?
  2. The editing of Gaussians relies on physically motivated rigid-body motion.
  3. ARAP could be replaced by ARAPReg.
  4. The method is only suitable for small objects with meaningful control points. Could we use just a few control points, as in human pose representations?