Computer English Study Notes & Deep Learning

Study material: Robust Face Recognition via Multimodal Deep Face Representation, by Changxing Ding, Student Member, IEEE, and Dacheng Tao, Fellow, IEEE, 2015
Words:
CNNs: convolutional neural networks
SAE: three-layer stacked auto-encoder
multimodal data
complementary
high-dimensional feature vector
face recognition
Local Binary Patterns (LBP)
Local Phase Quantization (LPQ)
DualCross Patterns (DCP)
Binarised Statistical Image Features (BSIF)
elaborately
optimize: to make optimal or as effective as possible
ReLU nonlinearity
multiple modalities
aggressive data augmentation
multi-stage training
L2 normalization
gray-level image
RGB image (additive color model)
gradient map
convolutional layers
max-pooling layers
average-pooling layers
fully-connected layers
orthogonal projection
Joint Bayesian (JB) model
the nearest neighbor (NN) classifier
supervised paradigm

Sentences (the parts worth studying were marked in bold/italics):
All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, 98.43% verification rate is achieved on the LFW database. Benefited from the complementary information contained in multimodal data, our small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set.

…, therefore faces in these images usually exhibit rich variations in pose, illumination, expression, and occlusion, as illustrated in Fig. 1.

Accurate face recognition depends on high quality face representations. Good face representation should be discriminative to the change of face identity while remaining robust to intra-personal variations. Conventional face representations are built on local descriptors, e.g., Local Binary Patterns (LBP) [3], Local Phase Quantization (LPQ) [4], [5], DualCross Patterns (DCP) [6], and Binarised Statistical Image Features (BSIF) [7]. However, the representation composed by local descriptors is too shallow to differentiate the complex nonlinear facial appearance variations. To handle this problem, recent works turn to Convolutional Neural Networks (CNNs) [8], [9] to automatically learn effective features that are robust to the nonlinear appearance variation of face images. However, the existing works of CNN on face recognition extract features from limited modalities; the complementary information contained in more modalities is not well studied.
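To make the idea of a hand-crafted local descriptor concrete, below is a minimal NumPy sketch of the basic 8-neighbour LBP operator. The image size and the per-image histogram are illustrative choices, not the exact configuration used in the cited works.

```python
import numpy as np

def lbp_8neighbour(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 Local Binary Patterns: each pixel is encoded by
    thresholding its 8 neighbours against the centre value."""
    img = image.astype(np.float32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    # offsets of the 8 neighbours, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbour >= centre).astype(np.uint8) << bit)
    return codes

# a shallow face representation is then a concatenation of per-block LBP histograms
gray = np.random.randint(0, 256, (100, 100), dtype=np.uint8)  # stand-in for a gray-level face crop
hist, _ = np.histogram(lbp_8neighbour(gray), bins=256, range=(0, 256))
```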

Inspired by the complementary information contained in multi-modalities and the recent progress of deep learning on various fields of computer vision, we present a novel face representation framework that adopts an ensemble of CNNs to leverage the multimodal information. The performance of the proposed multimodal system is optimized from two perspectives. First, the architecture for single CNN is elaborately designed and optimized with extensive experimentations. Second, a set of CNNs is designed to extract complementary information from multiple modalities, i.e., the holistic face image, the rendered frontal face image by 3D model, and uniformly sampled face patches. Besides, we design different structures for different modalities, i.e., a complex structure is designed for the modality that contains the richest information while a simple structure is proposed for the modalities with less information. In this way, we strike a balance between recognition performance and efficiency. The capacity of each modality for face recognition is also compared and discussed.
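As a rough illustration of "different structures for different modalities" (not the paper's actual NN1/NN2 configurations, whose layer settings are given in its tables), a PyTorch sketch might pair a deeper net with the holistic image and a lighter net with each patch. All layer widths and input sizes here are placeholders.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # convolution + ReLU nonlinearity + max-pooling, the basic unit of both nets
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True),
                         nn.MaxPool2d(2))

# hypothetical "complex" net for the information-rich holistic face image
holistic_net = nn.Sequential(conv_block(1, 32), conv_block(32, 64),
                             conv_block(64, 128), conv_block(128, 256),
                             nn.Flatten(), nn.LazyLinear(512))

# hypothetical "simple" net for a modality with less information (e.g. one face patch)
patch_net = nn.Sequential(conv_block(1, 16), conv_block(16, 32),
                          conv_block(32, 64),
                          nn.Flatten(), nn.LazyLinear(256))

x_holistic = torch.randn(1, 1, 128, 128)  # gray-level holistic image (illustrative size)
x_patch = torch.randn(1, 1, 64, 64)       # one uniformly sampled patch
f1, f2 = holistic_net(x_holistic), patch_net(x_patch)  # per-modality feature vectors
```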

As shown in Fig. 2, MM-DFR is essentially composed of two steps: multimodal feature extraction using a set of CNNs, and feature-level fusion of the set of CNN features using SAE.
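A minimal sketch of the fusion step, assuming each CNN outputs a 512-dimensional feature (a placeholder width): concatenate the per-modality features into one high-dimensional vector and compress it with a stacked auto-encoder, whose encoder output serves as the fused face signature. The layer widths and the sigmoid nonlinearity are illustrative, not the paper's exact choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# suppose the eight CNNs each output a 512-d feature (placeholder width)
cnn_features = [torch.randn(1, 512) for _ in range(8)]
x = torch.cat(cnn_features, dim=1)  # high-dimensional multimodal feature vector

# three-layer stacked auto-encoder; the encoder output is the fused signature
encoder = nn.Sequential(nn.Linear(8 * 512, 2048), nn.Sigmoid(),
                        nn.Linear(2048, 1024), nn.Sigmoid(),
                        nn.Linear(1024, 512))
decoder = nn.Sequential(nn.Linear(512, 1024), nn.Sigmoid(),
                        nn.Linear(1024, 2048), nn.Sigmoid(),
                        nn.Linear(2048, 8 * 512))

signature = encoder(x)                                   # compact fused representation
reconstruction_loss = F.mse_loss(decoder(signature), x)  # unsupervised training objective
```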

Different from previous works that randomly sample a large number of image patches…
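A minimal sketch of uniform (grid-based) patch sampling, as opposed to random sampling: patch centres are spaced evenly over the aligned face image, so the same patches are extracted for every face. The grid and patch sizes below are illustrative.

```python
import numpy as np

def uniform_patches(image: np.ndarray, patch: int = 64, grid: int = 3) -> list:
    """Crop a fixed grid x grid arrangement of patches whose top-left corners are
    uniformly spaced over the image, instead of sampling patch locations at random."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h - patch, grid).astype(int)
    xs = np.linspace(0, w - patch, grid).astype(int)
    return [image[y:y + patch, x:x + patch] for y in ys for x in xs]

face = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in for an aligned face image
patches = uniform_patches(face)  # 9 deterministic, evenly spaced patches
```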

In this section, the face matching problem is addressed based on the proposed MM-DFR framework. Two evaluation modes are adopted: the unsupervised mode and the supervised mode. Suppose two features produced by MM-DFR for two images are denoted as y1 and y2, respectively. In the unsupervised mode, the cosine distance is employed to measure the similarity s between y1 and y2.
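In the unsupervised mode the similarity is simply the cosine of the angle between the two MM-DFR features, i.e. s = (y1 . y2) / (||y1|| * ||y2||). A direct NumPy version, with placeholder 512-dimensional features:

```python
import numpy as np

def cosine_similarity(y1: np.ndarray, y2: np.ndarray) -> float:
    """Similarity s between two face signatures; larger means more likely the same identity."""
    return float(np.dot(y1, y2) / (np.linalg.norm(y1) * np.linalg.norm(y2)))

y1, y2 = np.random.randn(512), np.random.randn(512)  # placeholder MM-DFR features
s = cosine_similarity(y1, y2)
# verification thresholds s; the nearest neighbor (NN) classifier picks the gallery face with the largest s
```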

Five sets of experiments are conducted. First, we empirically justify the advantage of dense features for face recognition by excluding two ReLU nonlinearities compared with previous works. The performance of the proposed single CNN model is also compared against the state-of-the-art CNN models on the LFW database. Next, the performance of the eight CNNs contained within the MM-DFR framework is compared on face verification task on LFW. Then, the fusion of the eight CNNs by SAE is conducted and different nonlinearities are also compared. We also test the performance of MM-DFR followed with the supervised classifier JB. Lastly, face identification experiment is conducted on the CASIA-WebFace database with our own defined evaluation protocol.

In this experiment, we evaluate the role of ReLU nonlinearity using CNN-H1 as an example. For fast evaluation, the comparison is conducted with the simple NN1 structure described in Table I and only the softmax loss is employed for model training. Performance of CNN-H1 using the NN2 structure can be found in Table IV. Two paradigms are followed: 1) the unsupervised paradigm that directly calculates the similarity between two CNN features using the cosine distance metric; 2) the supervised paradigm that uses JB to calculate the similarity between two CNN features. For the supervised paradigm, we concatenate the CNN features of the original face image and its horizontally flipped version as the raw representation of each test sample. Then, we adopt PCA for dimension reduction and JB for similarity calculation. The dimension of the PCA subspace is tuned on the View 1 data of LFW and applied to the View 2 data. Both PCA and JB are trained on the CASIA-WebFace database. For PCA, to boost performance, we also re-evaluate the mean of CNN features using the 9 training folds of LFW in 10-fold cross validation.
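The supervised pipeline described above (flip-augmented feature concatenation, PCA for dimension reduction, then a learned metric) can be outlined roughly as below. scikit-learn's PCA stands in for the paper's PCA step, the training features are random placeholders, and the Joint Bayesian scoring is only indicated, since it requires its own training procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

def raw_representation(feat_original: np.ndarray, feat_flipped: np.ndarray) -> np.ndarray:
    # concatenate CNN features of the face image and its horizontally flipped version
    return np.concatenate([feat_original, feat_flipped])

# placeholder training features (in the paper these come from CASIA-WebFace)
train = np.random.randn(1000, 1024)

pca = PCA(n_components=256)  # the subspace dimension would be tuned on LFW View 1 data
pca.fit(train)

probe = pca.transform(
    raw_representation(np.random.randn(512), np.random.randn(512)).reshape(1, -1))
# probe would then be scored against a gallery feature with the trained Joint Bayesian (JB) model
```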

The above three experiments have justified the advantage of the proposed CNN structures. In this experiment, we further promote the performance of the proposed framework. We show the performance of MM-DFR with JB, where the output of MM-DFR is utilized as the signature of the face image. We term this face recognition pipeline as MM-DFR-JB. For comparison, the performance achieved by CNN-H1 with the JB classifier is also presented, denoted as "CNN-H1 + JB". The performance of the two systems is tabulated in Table V and the ROC curves are illustrated in Fig. 9. It is shown that MM-DFR considerably outperforms the single modal-based approach, which indicates the fusion of multimodal information is important to promote the performance of face recognition systems. By excluding the five labeling errors in LFW, the actual performance of MM-DFR-JB reaches 99.10%. Our simple 8-net based ensemble system also outperforms DeepID2 [9], which includes as many as 25 CNNs. Some more recent approaches that were published after the submission of this paper, e.g. [38], [31], achieve better performance than MM-DFR. However, they either employ significantly larger private training dataset or considerably larger number of CNN models. In comparison, we employ only 8 nets and train the models using a relatively small training set.
