[Face Alignment] LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood

john_bh

Article link: https://blog.csdn.net/john_bh/article/details/105416543

Please credit the author and source when reposting: http://blog.csdn.net/john_bh/

Paper link: LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood
arXiv link: arXiv
Authors and affiliations: University of Utah & Mitsubishi & University of Manchester
Conference and date: CVPR 2020

Contents

1. Main contributions

Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method’s estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks.
1. Introduction

Modern methods for face alignment (facial landmark localization) perform quite well most of the time, but all of them fail some percentage of the time. Unfortunately, almost all of the state-of-the-art (SOTA) methods simply output predicted landmark locations, with no assessment of whether (or how much) downstream tasks should trust these landmark locations. This is concerning, as face alignment is a key pre-processing step in numerous safety-critical applications, including advanced driver assistance systems (ADAS), driver monitoring, and remote measurement of vital signs [57]. As deep neural networks are notorious for producing overconfident predictions [33], similar concerns have been raised for other neural network technologies [46], and they become even more acute in the era of adversarial machine learning where adversarial images may pose a great threat to a system [14]. However, previous work in face alignment (and landmark localization in general) has largely ignored the area of uncertainty estimation.

To address this need, we propose a method to jointly estimate facial landmark locations and a parametric probability distribution representing the uncertainty of each estimated location. Our model also jointly estimates the visibility of landmarks, which predicts whether each landmark is occluded due to extreme head pose.

We find that the choice of methods for calculating mean and covariance is crucial. Landmark locations are best obtained using heatmaps, rather than by direct regression. To estimate landmark locations in a differentiable manner using heatmaps, we do not select the location of the maximum (argmax) of each landmark’s heatmap, but instead propose to use the spatial mean of the positive elements of each heatmap. Unlike landmark locations, uncertainty distribution parameters are best obtained by direct regression rather than from heatmaps. To estimate the uncertainty of the predicted locations, we add a Cholesky Estimator Network (CEN) branch to estimate the covariance matrix of a multivariate Gaussian or Laplacian probability distribution. To estimate visibility of each landmark, we add a Visibility Estimator Network (VEN). We combine these estimates using a joint loss function that we call the Location, Uncertainty and Visibility Likelihood (LUVLi) loss. Our primary goal in designing this model was to estimate uncertainty in landmark localization. In the process, not only does our method yield accurate uncertainty estimation, but it also produces SOTA landmark localization results on several face alignment datasets.
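The two estimation ideas in the paragraph above can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `spatial_mean_of_positive` and `covariance_from_cholesky` are hypothetical, the heatmap is a toy 5x5 array, and using `exp` to keep the Cholesky diagonal positive is an assumption made here for simplicity (the paper may use a different mapping).

```python
import numpy as np


def spatial_mean_of_positive(heatmap):
    """Mean (x, y) pixel location, weighted by the positive heatmap values."""
    weights = np.maximum(heatmap, 0.0)  # discard non-positive elements
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("heatmap has no positive elements")
    height, width = heatmap.shape
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) index grids
    return np.array([(weights * xs).sum(), (weights * ys).sum()]) / total


def covariance_from_cholesky(l11_raw, l21, l22_raw):
    """Build a 2x2 covariance matrix Sigma = L @ L.T from three regressed values."""
    # Assumption: exp() keeps the diagonal of L positive; the paper may use
    # a different positivity mapping.
    chol = np.array([[np.exp(l11_raw), 0.0],
                     [l21, np.exp(l22_raw)]])
    return chol @ chol.T


# Toy heatmap: two positive peaks in column 2, one negative value elsewhere.
heatmap = np.zeros((5, 5))
heatmap[1, 2] = 1.0
heatmap[3, 2] = 1.0
heatmap[0, 0] = -5.0  # ignored by the positive-part weighting
mean_xy = spatial_mean_of_positive(heatmap)
sigma = covariance_from_cholesky(0.0, 0.5, 0.0)
```

Unlike argmax, the weighted mean is a smooth function of the heatmap values, so gradients can flow through it. Building the covariance from a lower-triangular factor with a positive diagonal guarantees that the result is symmetric and positive definite, whatever values the regression branch outputs.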

Uncertainty can be broadly classified into two categories [41]: epistemic uncertainty is related to a lack of knowledge about the model that generated the observed data, and aleatoric uncertainty is related to the noise inherent in the observations, e.g., sensor or labelling noise. The ground-truth landmark locations marked on an image by human labelers would vary across multiple labelings of an image by different human labelers (or even by the same human labeler). Furthermore, this variation will itself vary across different images and landmarks (e.g., it will vary more for occluded landmarks and poorly lit images). The goal of our method is to estimate this aleatoric uncertainty.

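The fact that each image has only one ground-truth labeled location per landmark makes estimating this uncertainty distribution difficult, but not impossible. To do so, we use a parametric model for the uncertainty distribution. We train a neural network to estimate the parameters of the model for each landmark of each input face image so as to maximize the likelihood, under the model, of the ground-truth location of that landmark (summed across all landmarks of all training faces).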
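As a worked illustration of this maximum-likelihood idea, consider the two-dimensional Gaussian case. The notation is assumed here rather than taken from the source: p_j is the ground-truth location of landmark j, and mu_j and Sigma_j are the location and covariance predicted by the network. Maximizing the likelihood is equivalent to minimizing the negative log-likelihood below, summed over all landmarks of all training faces. This sketch omits the visibility term and is not necessarily the exact form of the LUVLi loss.

```latex
-\log \mathcal{N}\!\left(p_j \mid \mu_j, \Sigma_j\right)
  = \tfrac{1}{2}\,\log\lvert\Sigma_j\rvert
  + \tfrac{1}{2}\,(p_j - \mu_j)^{\top}\,\Sigma_j^{-1}\,(p_j - \mu_j)
  + \log 2\pi
```

The first term penalizes large predicted uncertainty, while the second penalizes localization error scaled by the predicted uncertainty. The network can therefore lower the loss on hard landmarks by predicting a larger covariance, but only at the cost of the log-determinant term, which is what makes a single label per landmark sufficient for training.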
The main contributions of this work are as follows: