CVPR 2019 (Part 1)

1. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

paper

code


Abstract: To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8× over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

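At its core, SMPLify-X is an optimization loop: detect 2D features, then minimize a confidence-weighted re-projection loss plus priors over the SMPL-X parameters. Below is a minimal PyTorch sketch of that loop; `smplx_model`, `project`, and `pose_prior` are hypothetical stand-ins for the differentiable body model, camera projection, and learned pose prior (not the released API), and the interpenetration penalty and gender detection are omitted for brevity.

```python
# Minimal sketch of a SMPLify-style fitting loop; all callables are
# illustrative placeholders, not the authors' released code.
import torch

def fit_smplx(keypoints_2d, conf, smplx_model, project, pose_prior, steps=200):
    """Optimize pose/shape so projected 3D joints match detected 2D features."""
    pose = torch.zeros(1, smplx_model.num_pose_params, requires_grad=True)
    betas = torch.zeros(1, 10, requires_grad=True)   # shape coefficients
    opt = torch.optim.Adam([pose, betas], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        joints_3d = smplx_model(pose, betas)          # (1, J, 3) model joints
        joints_2d = project(joints_3d)                # (1, J, 2) image plane
        # Confidence-weighted 2D re-projection data term
        data = (conf * (joints_2d - keypoints_2d).pow(2).sum(-1)).mean()
        # Learned pose prior keeps the pose kinematically plausible
        loss = data + 0.1 * pose_prior(pose)
        loss.backward()
        opt.step()
    return pose.detach(), betas.detach()
```

Because every step is differentiable, the same loop extends directly to the face, hand, and foot feature terms the paper adds on top of the body joints.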

2. Combining 3D Morphable Models: A Large Scale Face-and-Head Model

paper

lsfm-code

Abstract: Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D surfaces of an object class. In this context, we identify an interesting question that has previously not received research attention: is it possible to combine two or more 3DMMs that (a) are built using different templates that perhaps only partly overlap, (b) have different representation capabilities and (c) are built from different datasets that may not be publicly available? In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new face-and-head shape model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. The resulting combined shape model achieves state-of-the-art performance and outperforms existing head models by a large margin. Finally, as an application experiment, we reconstruct full head representations from single, unconstrained images by utilizing our proposed large-scale model in conjunction with the FaceWarehouse blendshapes for handling expressions.


Main contributions: 1) A method for fusing shape-based 3DMMs, demonstrated on the face and head. In particular, the authors propose a regression approach based on latent shape parameters, and a covariance-combination approach used within a Gaussian Process framework. 2) A combined large-scale statistical model of the human head covering variation in ethnicity, age, and gender. The model is considerably more accurate than any other existing head morphable model, and it is released to the research community, including versions with eyes and without teeth. 3) An application experiment: full-head reconstruction from single unconstrained images using the combined 3DMM, with FaceWarehouse blendshapes handling facial expressions.

The core idea: integrate existing 3DMMs to enhance their representational power.
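To make method i concrete, here is a rough NumPy sketch of the regressor-based completion: samples drawn from the full-head model teach a ridge regressor to predict the head-only region from the face region, which can then complete face-only shapes. All names are illustrative assumptions, not the paper's code, and the paper's latent-parameter formulation is simplified here to raw vertex coordinates.

```python
# Rough sketch of regressor-based 3DMM completion (method i), assuming
# both models are sampled on a shared template; names are illustrative.
import numpy as np

def fit_completion_regressor(head_samples, overlap_idx, missing_idx, lam=1e-3):
    """Ridge-regress missing-region coordinates from overlapping-region ones.

    head_samples: (N, V*3) full-head shapes sampled from the head 3DMM.
    overlap_idx / missing_idx: flat coordinate indices of the two regions.
    """
    X = head_samples[:, overlap_idx]   # region both models describe (face)
    Y = head_samples[:, missing_idx]   # region only the head model covers
    # Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return W

def complete_face_sample(face_verts, W):
    """Predict the cranium/neck region for a face-only sample."""
    return face_verts @ W
```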

3. Self-Supervised 3D Hand Pose Estimation Through Training by Fitting

paper

code

Abstract: We present a self-supervision method for 3D hand pose estimation from depth maps. We begin with a neural network initialized with synthesized data and fine-tune it on real but unlabelled depth maps by minimizing a set of data-fitting terms. By approximating the hand surface with a set of spheres, we design a differentiable hand renderer to align estimates by comparing the rendered and input depth maps. In addition, we place a set of priors including a data-driven term to further regulate the estimate's kinematic feasibility. Our method makes highly accurate estimates comparable to current supervised methods which require large amounts of labelled training samples, thereby advancing the state-of-the-art in unsupervised learning for hand pose estimation.

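The key ingredient is the sphere-based differentiable renderer: each sphere's rendered depth is a smooth function of its center and radius, so the rendered depth map can be compared against the input depth map and gradients flow back to the pose. The toy sketch below assumes an orthographic camera and uses a soft-min over spheres; it illustrates the idea only and is not the authors' implementation.

```python
# Toy sketch of a differentiable depth renderer over a sphere set,
# under an orthographic camera assumption (illustrative, not the paper's code).
import torch

def render_sphere_depth(centers, radii, H, W, far=2.0, sharp=50.0):
    """Render a depth map from spheres; centers (S, 3) in pixel/depth units."""
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    px = torch.stack([xs, ys], -1).view(-1, 1, 2)        # (H*W, 1, 2)
    d2 = ((px - centers[None, :, :2]) ** 2).sum(-1)      # (H*W, S) sq. distance
    inside = (radii ** 2 - d2).clamp(min=0.0)
    # Depth of each sphere's front surface along the viewing axis
    z = centers[:, 2] - (inside + 1e-8).sqrt()           # (H*W, S)
    z = torch.where(inside > 0, z, torch.full_like(z, far))
    # Soft-min over spheres keeps the rendering differentiable
    depth = -torch.logsumexp(-z * sharp, dim=1) / sharp
    return depth.view(H, W)

def data_term(pred_depth, input_depth, mask):
    """Masked L1 data-fitting term between rendered and observed depth."""
    return ((pred_depth - input_depth).abs() * mask).mean()
```

The soft-min (log-sum-exp) replaces the hard z-buffer minimum so that gradients reach every sphere near the surface, not just the closest one.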

4. Monocular Total Capture: Posing Face, Body and Hands in the Wild

paper

code

Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs with the aid of the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations including comparison with the existing body-specific and hand-specific methods, and performance analysis on camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos.

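A POF stores, at every pixel near a limb's 2D projection, the unit 3D direction of that limb, so a 2D fully convolutional network can output 3D orientation evidence directly in image space. The sketch below builds such a training target for a single bone; the line width and the brute-force pixel loop are illustrative guesses, not the paper's exact recipe.

```python
# Condensed sketch of building a Part Orientation Field (POF) target for one
# limb; parameters and layout are illustrative, not the paper's exact recipe.
import numpy as np

def build_pof(j2d_a, j2d_b, j3d_a, j3d_b, H, W, width=4.0):
    """Write the unit 3D bone direction into pixels near the 2D bone segment."""
    pof = np.zeros((H, W, 3), dtype=np.float32)
    direction = (j3d_b - j3d_a) / (np.linalg.norm(j3d_b - j3d_a) + 1e-8)
    seg = j2d_b - j2d_a
    seg_len = np.linalg.norm(seg) + 1e-8
    for y in range(H):
        for x in range(W):
            p = np.array([x, y], dtype=np.float32) - j2d_a
            t = np.clip(p @ seg / seg_len**2, 0.0, 1.0)   # projection onto bone
            dist = np.linalg.norm(p - t * seg)            # distance to segment
            if dist <= width:
                pof[y, x] = direction                     # unit 3D orientation
    return pof
```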
