Phys4DGen: Physics-Based Image-to-4D Generation

Article link: https://blog.csdn.net/m0_51976564/article/details/144419182

Paper: Lin J, Wang Z, Jiang S, et al. Phys4DGen: A Physics-Driven Framework for Controllable and Efficient 4D Content Generation from a Single Image[J]. arXiv preprint arXiv:2411.16800, 2024.
Introduction: https://jiajinglin.github.io/Phys4DGen/
Code: Unreleased

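Phys4DGen is an image-to-4D method: given an input image and a user-specified external force, it generates a 4D Gaussian Splatting (GS) scene that obeys physical laws. Because the dynamics are produced by physical simulation, the method no longer needs the extensive multi-view supervision that conventional image-to-4D pipelines rely on.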

The Phys4DGen pipeline works as follows:

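First, a static GS scene is generated from the input image;
The Physical Perception Module (PPM) segments the scene into instances, predicts the physical properties and parameters of each part, and attaches them to the Gaussians;
The dynamic scene is simulated from the predicted physical properties and the external force.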


I. 3D Gaussians Generation

A static 3D GS scene is generated with an image-to-3D method.

II. Physical Perception

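The Gaussians in the 3D GS scene are first grouped by the surface material of the object; a large vision model then predicts the physical properties and parameters of each material.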

1. Material Segmentation

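To segment materials, SAM first performs instance segmentation on the input image $\mathbf{I}_0$ and on $N$ images rendered from the 3D GS scene, $\mathcal{I}=\left\{\mathbf{I}_o\right\}_{o=1}^N$; the instances in the $N$ rendered images are then aligned to those of the input image. Specifically, to align the segmented instances of a rendered image $\mathbf{I}_o$ with the input image $\mathbf{I}_0$, the CLIP feature of each of its instance masks is computed and compared against the CLIP features of all instance masks of the input image, and each rendered instance is matched to the input instance with the highest similarity: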
$\begin{gathered} \mathbf{L}_0(m)=\mathbf{V}\left(\mathbf{I}_0 \odot \mathbf{M}_0(m)\right) \\ \mathbf{L}_o(k)=\mathbf{V}\left(\mathbf{I}_o \odot \mathbf{M}_o(k)\right) \\ \mathbf{M}_o(k)=\arg \max \left\{\mathbf{L}_o(k) \cdot \mathbf{L}_0(m)\right\}_m^{\mathcal{M}} \end{gathered}$
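The matching step above can be illustrated with a minimal sketch. It assumes the CLIP features of the masked images have already been extracted, and it uses cosine similarity, which equals the dot product in the formula when the features are L2-normalized. The function names and the toy 3-dimensional vectors are hypothetical and stand in for real CLIP embeddings; this is not the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_instances(rendered_feats, input_feats):
    """For each rendered-view instance k, return the index m of the
    input-image instance whose feature is most similar."""
    matches = []
    for feat_k in rendered_feats:
        sims = [cosine_similarity(feat_k, feat_m) for feat_m in input_feats]
        # arg max over the input-image instances m
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches

# Toy features (made-up values, not real CLIP outputs).
input_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # L_0(m), m = 0, 1
rendered_feats = [[0.1, 0.9, 0.0],                     # L_o(k), k = 0, 1, 2
                  [0.8, 0.2, 0.1],
                  [0.0, 0.7, 0.3]]

matches = match_instances(rendered_feats, input_feats)
print(matches)  # [1, 0, 1]
```

Each rendered instance simply inherits the identity of its best-matching input instance, so instance labels become consistent across all views.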

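Once the instances of all rendered images are aligned with the input image, the material grouping information from the input image and the $N$ rendered images can be projected onto the 3D Gaussians, giving: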
$\mathcal{G}^0=\left(\mathbf{x}_p, \boldsymbol{\Sigma}_p, \alpha_p, \mathbf{c}_p, \theta_p\right)_{p=1}^P$
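The post does not spell out how the 2D grouping is fused onto the Gaussians. One plausible way, shown here purely as an assumption, is a per-Gaussian majority vote over the views in which the Gaussian is visible. The sketch assumes the projection has already been done, so each view is given as a hypothetical mapping from Gaussian index to the aligned instance label it falls on.

```python
from collections import Counter

def fuse_labels(per_view_labels, num_gaussians):
    """Majority-vote a material group for every Gaussian.

    per_view_labels: one dict per view, mapping Gaussian index -> aligned
    instance label seen at that Gaussian's projected pixel (assumed input).
    Gaussians that are never visible get None.
    """
    votes = [Counter() for _ in range(num_gaussians)]
    for view in per_view_labels:
        for p, label in view.items():
            votes[p][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Three toy views over four Gaussians (made-up visibility and labels).
per_view_labels = [
    {0: 0, 1: 1},
    {0: 0, 1: 1, 2: 1},
    {0: 1, 2: 1},
]
groups = fuse_labels(per_view_labels, num_gaussians=4)
print(groups)  # [0, 1, 1, None]
```

Voting makes the result robust to an occasional wrong label in a single view; a Gaussian seen by no view would need a fallback such as copying the label of its nearest labeled neighbor.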

2. Material Reasoning

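GPT-4 is used to predict the material of each part of the input image $\mathbf{I}_0$ together with its physical property parameters. Because the materials in the 2D image correspond one-to-one with the material groups on the 3D Gaussians, the predictions can be projected directly into 3D space.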
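Since the material groups map one-to-one between 2D and 3D, attaching the predicted parameters to the Gaussians reduces to a table lookup. The sketch below is an illustration only: the `Material` fields (Young's modulus, Poisson's ratio, density) are a common choice for MPM simulators but are an assumption here, and the numeric values are placeholders, not predictions reported in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    name: str
    youngs_modulus: float   # Pa (placeholder value)
    poissons_ratio: float   # dimensionless (placeholder value)
    density: float          # kg/m^3 (placeholder value)

def assign_materials(group_ids, table):
    """Give every Gaussian the parameters predicted for its material group."""
    return [table[g] for g in group_ids]

# Hypothetical per-group predictions, keyed by material group id.
table = {
    0: Material("rubber", 1.0e6, 0.45, 1100.0),
    1: Material("metal", 2.0e11, 0.30, 7800.0),
}

group_ids = [0, 1, 1, 0]             # material group of each Gaussian
theta = assign_materials(group_ids, table)
print([m.name for m in theta])       # ['rubber', 'metal', 'metal', 'rubber']
```

Only one prediction per material group is required, regardless of how many Gaussians belong to that group.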

III. 4D Dynamics Generation

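The dynamics in Phys4DGen can be generated with any physics simulation algorithm; the paper uses the Material Point Method (MPM). The Gaussians are treated as the discretization of the continuum in MPM, i.e. as material particles, and are given time and physical attributes: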
$\begin{gathered} \mathbf{x}^{t+1}, \mathbf{F}^{t+1}, \mathbf{v}^{t+1}=\operatorname{MPMSimulation}\left(\mathcal{G}^t\right) \\ \boldsymbol{\Sigma}_p^{t+1}=\left(\mathbf{F}_p^{t+1}\right) \boldsymbol{\Sigma}_p^t\left(\mathbf{F}_p^{t+1}\right)^T \end{gathered}$
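The covariance update in the second line can be checked with a small worked example. The sketch below applies $\mathbf{F} \boldsymbol{\Sigma} \mathbf{F}^T$ exactly as written above to a single Gaussian, using plain Python lists for the 3×3 matrices; the helper names are hypothetical and the MPM solver that would produce $\mathbf{F}$ is not modeled.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

def update_covariance(F, sigma):
    """Deform a Gaussian's covariance: Sigma_next = F @ Sigma @ F^T."""
    return matmul(matmul(F, sigma), transpose(F))

# Deformation gradient that stretches the x axis by a factor of 2.
F = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
# Isotropic unit covariance before deformation.
sigma = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

sigma_next = update_covariance(F, sigma)
print(sigma_next)  # [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Stretching by 2 along x scales the variance along x by 4, so the Gaussian's standard deviation doubles in that direction. The product also keeps the covariance symmetric positive semi-definite, which is what lets the deformed Gaussian still be rendered as a valid splat.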