Neural Body

Daos2020

Tags: computer vision

Article link: https://blog.csdn.net/weixin_44086996/article/details/121242884

Neural Body: Implementation and Source Code Walkthrough

Environment setup

CUDA 8.0

Data preparation

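Download the source code: git clone https://github.com/zju3dv/neuralbody.git
Enter the neuralbody directory and create a data folder there with mkdir data. The data folder holds the datasets and the pretrained models:
[Figure: contents of the data folder]
This post reproduces the project's monocular experiment.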

Dataset download:
Download link for the people_snapshot_public monocular dataset: https://graphics.tu-bs.de/people-snapshot
After unzipping the dataset, symlink it into neuralbody's data folder:

unzip people_snapshot_public
ln -s /home/xds/project/dataset/people_snapshot_public

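Note: /home/xds/project/dataset is the path where the dataset is stored; change it to match your own setup. Run the ln -s command from inside the data folder so that the link is created there.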

Pretrained model download: https://zjueducn-my.sharepoint.com/personal/pengsida_zju_edu_cn/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fpengsida%5Fzju%5Fedu%5Fcn%2FDocuments%2Fneural%5Fbody%2Ftrained%5Fmodel
Put the pretrained models under data/trained_model/

Processing the monocular dataset
The monocular dataset comes from the paper Video Based Reconstruction of 3D People Models.

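Contents of the monocular dataset:
[Figure: file listing of the monocular dataset]
Here keypoints.hdf5 holds the human pose keypoints and reconstructed_poses.hdf5 holds the SMPL parameters. The data from Video Based Reconstruction of 3D People Models has to be converted into the input format of Neural Body; the authors provide a script for this, tools/process_snapshot.py. The processed result looks like this:
[Figure: processed output]
Note: the SMPL model needs to be placed in the tools directory.
Download link: https://smpl.is.tue.mpg.de/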

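Paper

Paper link: https://arxiv.org/pdf/2012.15838.pdf

Overall network diagram from the paper:
[Figure: overall network architecture]
The network has two parts. The code diffusion part can be understood as making every point in space representable by a feature vector; the density and color regression part can be understood as using MLPs to assign a color and a density to each feature vector.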

Below are the parts of the paper that I consider key, as I understand them.

Specifically, we anchor a set of latent codes to the vertices of a deformable human model (SMPL [38] in this work), namely that their spatial locations vary with the human pose. To obtain the 3D representation at a frame, we first transform the code locations based on the human pose, which can be reliably estimated from sparse camera views. Then, a network is designed to regress the density and color for any 3D point based on these latent codes. Both the latent codes and the network are jointly learned from images of all video frames during the reconstruction process. This model is inspired by the latent variable model [36] in statistics, which enables us to effectively integrate observations at different frames.

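[Figure: SMPL video]
This passage lays out the paper's approach. The authors take a standard skinned human body template as input; the template consists of 6890 vertices. Where the template sits in space in each frame (its pose) is determined by the pose estimated from that frame's images. When reconstructing the person in a video, every frame uses the same set of template vertices, which is why observations from different frames can be integrated effectively ("This model is inspired by the latent variable model [36] in statistics, which enables us to effectively integrate observations at different frames"). At this point we can change perspective: rather than building a person from nothing, we assign suitable colors and densities to this body template. The remaining task is to feed these 6890 points into an MLP and learn a color and a density for each of them. It is not quite that simple, though: 6890 vertices are too few, and training directly on them would probably give poor results, which is why sparse convolution comes in later.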
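
To make the "same codes, different positions" idea concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from the neuralbody repository: the class name StructuredLatentCodes, the method at_frame, and the code size of 16 are assumptions, and the codes are random placeholders where a real implementation would use learnable parameters.

```python
import random

NUM_SMPL_VERTICES = 6890  # vertex count of the SMPL template mesh
CODE_DIM = 16             # latent code size; assumed here for illustration


class StructuredLatentCodes:
    """One code per SMPL vertex, shared by every frame of the video (hypothetical helper)."""

    def __init__(self, num_vertices=NUM_SMPL_VERTICES, code_dim=CODE_DIM, seed=0):
        rng = random.Random(seed)
        # In training these would be optimized; here they are random placeholders.
        self.codes = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)]
                      for _ in range(num_vertices)]

    def at_frame(self, posed_vertices):
        """Pair each frame-specific vertex position with its frame-independent code."""
        if len(posed_vertices) != len(self.codes):
            raise ValueError("expected one position per SMPL vertex")
        return list(zip(posed_vertices, self.codes))
```

Calling at_frame with the posed vertices of two different frames returns different positions but the very same code objects, which is what lets every frame contribute to learning the same set of codes.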

We denote the video as $\{I_t^c \mid c = 1, \dots, N_c,\ t = 1, \dots, N_t\}$, where $c$ is the camera index, $N_c$ is the number of cameras, $t$ is the frame index, and $N_t$ is the number of frames. The cameras are pre-calibrated. For each image, we apply [19] to obtain the foreground human mask and set the values of the background image pixels as zero.
The overview of the proposed model is illustrated in Figure 3. Neural Body starts from a set of structured latent codes attached to the surface of a deformable human model (Section 3.1). The latent code at any location around the surface can be obtained with a code diffusion process (Section 3.2) and then decoded to density and color values by neural networks (Section 3.3). The image from any viewpoint can be generated by volume rendering (Section 3.4). The structured latent codes and neural networks are jointly learned by minimizing the difference between the rendered images and input images (Section 3.5).
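The authors use multiple calibrated cameras (intrinsics and extrinsics), some for training and some for testing. Each image is segmented with CIHP_PGN; the segmentation result looks like this:

The actual input, however, cannot be made out with the naked eye.

One thing I am unsure about here: is the purpose of the segmentation to reduce the computation in the volume rendering step?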
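
The masking step from the quoted passage (background pixel values set to zero) can be sketched as follows. This is only an illustration under my own assumptions: the function name apply_foreground_mask is hypothetical, images are nested lists of RGB tuples, and the mask is binary with 1 marking the person. The real pipeline works on image arrays and on the output of CIHP_PGN.

```python
def apply_foreground_mask(image, mask):
    """Zero out background pixels.

    image: H x W nested list of (r, g, b) tuples.
    mask:  H x W nested list of 0/1 values, 1 = person (assumed binary here).
    """
    if len(image) != len(mask) or any(
            len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)):
        raise ValueError("image and mask must have the same height and width")
    return [[pixel if keep else (0, 0, 0)
             for pixel, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]
```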

3.2
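Section 3.1, Structured latent codes, was already covered above, so the focus here is Section 3.2, Code diffusion. As noted, 6890 points are very sparse for the task of reconstructing a human body, so we need to construct more points and assign colors and densities to them. The new points are, of course, still derived from the 6890 vertices of the body template.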

Specifically, based on the SMPL parameters, we compute the 3D bounding box of the human and divide the box into small voxels with voxel size of 5mm × 5mm × 5mm. The latent code of a non-empty voxel is the mean of latent codes of SMPL vertices inside this voxel.

For any point $\mathbf{x}$ in 3D space, we query its latent code from the latent code volume. Specifically, the point $\mathbf{x}$ is first transformed to the SMPL coordinate system, which aligns the point and the latent code volume in 3D space. Then, the latent code is computed using the trilinear interpolation. For the SMPL parameters $S_t$, we denote the latent code at point $\mathbf{x}$ as $\psi(\mathbf{x}, \mathcal{Z}, S_t)$. The code vector is passed into MLP networks to predict the density and color for point $\mathbf{x}$.

[Figure: simplified diagram of code diffusion]
I drew a simple diagram to make this easier to understand.

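The authors handle the sparse input with a sparse convolutional network. First, the 6890 vertices (the SMPL latent codes) are partitioned into voxels. In the diagram above, the black square is one 5 mm voxel and the orange points are SMPL latent codes. Averaging the SMPL latent codes inside a voxel gives that voxel's latent code (the blue point). The input to the sparse convolutional network is then the set of voxel latent codes (blue points). Through convolution, each blue point's receptive field spreads to its surroundings, and the latent code (blue bar) of any point x in space (red point) is expressed by trilinear interpolation of the voxel latent codes.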
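
The voxel averaging described above (orange points to blue point) can be sketched in plain Python. This is my own simplified version, not the repository's code: build_code_volume is a hypothetical name, coordinates are assumed to be in metres, the bounding box is taken tightly around the vertices, and only non-empty voxels are stored, in a dict keyed by integer voxel index.

```python
import math
from collections import defaultdict

VOXEL_SIZE = 0.005  # 5 mm, assuming coordinates are in metres


def build_code_volume(vertices, codes, voxel_size=VOXEL_SIZE):
    """Return (origin, {voxel index: mean latent code}) for the non-empty voxels."""
    # Lower corner of the axis-aligned bounding box of the posed vertices.
    origin = tuple(min(v[axis] for v in vertices) for axis in range(3))

    # Collect the codes of all vertices that fall into the same voxel.
    buckets = defaultdict(list)
    for vertex, code in zip(vertices, codes):
        index = tuple(int(math.floor((vertex[axis] - origin[axis]) / voxel_size))
                      for axis in range(3))
        buckets[index].append(code)

    # A non-empty voxel's code is the per-dimension mean of its members.
    volume = {index: [sum(column) / len(members) for column in zip(*members)]
              for index, members in buckets.items()}
    return origin, volume
```

Storing only the occupied voxels mirrors why a sparse convolution fits here: almost all voxels in the bounding box contain no SMPL vertex.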

TODO: which rule does the trilinear interpolation follow here, that is, which voxel latent codes does it interpolate between?
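
On this open question, my working assumption is that it is ordinary trilinear interpolation: the code at x is a weighted blend of the 8 grid nodes surrounding x, with weights given by the fractional offsets along each axis. I have not confirmed this against the repository, so treat the sketch below as the textbook rule rather than the authors' exact implementation. The function name trilinear_interpolate and the dense nested-list grid are assumptions for illustration, and the query point is given in grid units.

```python
import math


def trilinear_interpolate(grid, point):
    """Interpolate a vector-valued grid at a continuous (x, y, z) in grid units.

    grid[i][j][k] is the code stored at integer node (i, j, k).
    """
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    if any(c < 0 or c > s - 1 for c, s in zip(point, sizes)):
        raise ValueError("point lies outside the grid")

    base = [int(math.floor(c)) for c in point]   # lower corner node
    frac = [c - b for c, b in zip(point, base)]  # offset inside the cell, in [0, 1)
    result = [0.0] * len(grid[0][0][0])

    # Blend the 8 corner nodes of the cell that contains the point.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((frac[0] if di else 1.0 - frac[0]) *
                          (frac[1] if dj else 1.0 - frac[1]) *
                          (frac[2] if dk else 1.0 - frac[2]))
                if weight == 0.0:
                    continue  # also avoids indexing past the upper boundary
                node = grid[base[0] + di][base[1] + dj][base[2] + dk]
                for n, value in enumerate(node):
                    result[n] += weight * value
    return result
```

A useful sanity check is that this rule reproduces any function that is linear in the coordinates exactly, and returns the stored code unchanged when x coincides with a grid node.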

3.3
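With the work of Section 3.2 done, the next step is to assign colors and densities to these points. The approach follows NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (https://arxiv.org/pdf/2003.08934.pdf).
[Figure: side-by-side comparison with NeRF]
Comparing the left and right sides: NeRF's input is built by sampling points along the camera ray directions (constructing latent codes), and Section 3.2 has already done this step, so the points only need to be fed into the MLP to be assigned colors and densities.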
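
Once the MLP has produced a density and a color for each sample along a ray, the pixel color comes from volume rendering. The sketch below uses the quadrature rule from the NeRF paper, C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where T_i is the transmittance accumulated before sample i. The function name composite_ray and the plain-list inputs are my own choices for illustration; in practice this runs on batched tensors.

```python
import math


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray, ordered from near to far.

    densities: sigma_i for each sample.
    colors:    (r, g, b) for each sample.
    deltas:    distance between consecutive samples.
    Returns (pixel color, remaining transmittance).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

Samples with zero density contribute nothing, and a very dense sample hides everything behind it, which is how the learned density ends up describing the body's shape.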