论文学习《Face Recognition: From Traditional to Deep Learning Methods》

介绍

基于人工设计的特征和传统的机器学习技术的传统方法目前已经被使用大规模数据集训练的深度神经网络所取代。在这篇论文中,提出了全面的最新的文献综述包括传统的(基于几何的,整体的,基于特征的和混合的方法)和深度学习方法的流行人脸识别方法。

人脸识别问题面临的挑战

人脸识别面临诸多技术挑战,比如人头部姿态变化,跨年龄人面部变化,光照变化,表情变化,人脸被遮挡等。

人脸识别系统模块

在这里插入图片描述
1)人脸检测Face Detection

从图像中找到人脸并返回人脸包围框坐标。

2)人脸对齐Face Alignment

检测人脸特征点,并据此进行仿射变换,对人脸进行尺度和角度的归一化。最新的技术甚至在这一步将人脸正面化。

3)人脸表示Face Representation(人脸特征提取)

从人脸图像像素中计算提取人脸紧凑且具鉴别性的特征向量,或者称为模板(template)。

理想的特征是能够从同一个体的人脸不同图像中提取相似的特征向量。

4)人脸匹配Face Matching

将两幅图像的特征向量进行比较,得到相似分数,用于表示这两幅人脸图像属于同一个人的似然性。

人脸特征提取方法

1)基于几何的方法Geometry-based Methods

早期的人脸识别方法使用特定的边缘和轮廓检测找到人脸特征点,并据此计算特征点之间相互位置和距离,用来衡量两幅人脸图像的相似程度。
这些方法往往在极少个体(10-20个人)的人脸数据库中进行实验,但在早期使得计算机来识别人脸称为可能。

2)基于整体的方法Holistic Methods

这类方法对图像整体进行投影操作提取特征。
包括我们熟悉的主分量分析(PCA)、线性鉴别分析(LDA),通过寻找一组投影向量将人脸图像降维,再将低维特征送入类似SVM等机器学习分类器进行人脸分类。

局部保持投影(LPP)是这个方向另一个重要算法,实践证明LPP往往优于PCA、LDA。虽然PCA和LDA保留了图像空间的整体结构(分别最大化了方差和判别信息),但LPP旨在保留图像空间的局部结构。 这意味着LPP学习的映射将具有类似局部信息的图像映射到LPP子空间中的相邻点。
这个方向还有一项重要工作是图像稀疏表示(sparse representation),及由此衍生的稀疏表示分类器,以重建误差最小衡量分类结果。
以LFW数据集(Labeled Faces in the Wild)评估为衡量标准,这一类基于整体变换的方法中,取得最高精度的是joint Bayesian方法,达到92.4%的精度。

3)基于特征的方法Feature-based Methods

在人脸图像中的不同位置提取局部特征的方法,这种方法往往比基于整体的方法更具鲁棒性。
较早的基于特征的方法比如模块特征脸(modular eigenfaces),还有类似在图像块中提取HOG、LBP、SIFT、SURF特征(这些特征更具鉴别性),将各模块局部特征的向量串联,作为人脸表示。

4)混合方法Hybrid Methods

先使用基于特征的方法(比如LBP、SIFT)提取局部特征,再使用子空间方法(比如PCA、LDA)投影获取低维、鉴别特征,将基于整体和基于特征的方法相结合的方法。
这一类方法中有不少基于Gabor+子空间方法。
大量文献中LBP特征是这一类方法中重要的局部特征。
由于此处文献很多,基本涉及到使用不同的分块方法、使用不同的局部特征(Gabor、LBP、SIFT、LTP、LPQ等)、使用不同的子空间方法(PCA、LDA、MFDA、Laplacian PCA、Kernel PCA、kernel LDA等)的不同排列组合。
在这一类方法中,GaussianFace在LFW上获得了最好的精度98.52%,几乎匹敌很多后来出现的深度学习方法。

5)基于深度学习的方法Deep Learning Methods

深度学习尤其是深度卷积神经网络方法最大的优势是可以从数据集中学习特征,如果数据集能够覆盖人脸识别中常遇到的各种情况,则系统能够自动学习克服各种挑战的特征。在早期的神经网络研究中也有用于人脸识别的报道,但由于数据集不够大,网络不够深,效果也一般,没能吸引大家的注意力。在CNN改进人脸识别的文献中,DeepFace和DeepID是先驱,成功吸引众多学者研究该方向。DeepFace的出现将LFW上state-of-the-art人脸识别方法误差降低了27%!深度学习成功用于人脸识别三大要素:大规模数据集、先进的网络架构残差块的设计、有针对性的损失函数triplet loss、centre loss。

总结

几年前基于手工特征的传统方法已经被基于CNN的深度学习方法取代。
实际上,基于CNN的人脸识别系统已经成为标准,这是因为与其他类型的方法相比,其准确性有了显着提高。

Convolutional Neural Networks for Image Recognition Abstract: Convolutional Neural Networks (CNNs) have recently shown outstanding performance in many computer vision tasks, especially in image recognition. This paper presents an overview of CNNs, including their architecture, training, and applications. We first introduce the basic concepts of CNNs, including convolution, pooling, and nonlinearity, and then discuss the popular CNN models, such as LeNet-5, AlexNet, VGGNet, and ResNet. The training methods of CNNs, such as backpropagation, dropout, and data augmentation, are also discussed. Finally, we present several applications of CNNs in image recognition, including object detection, face recognition, and scene understanding. Introduction: Image recognition is a fundamental task in computer vision, which aims to classify the objects and scenes in images. Traditional methods for image recognition relied on handcrafted features, such as SIFT, HOG, and LBP, which were then fed into classifiers, such as SVM, boosting, and random forests, to make the final prediction. However, these methods suffered from several limitations, such as the need for manual feature engineering, sensitivity to image variations, and difficulty in handling large-scale datasets. Convolutional Neural Networks (CNNs) have emerged as a powerful technique for image recognition, which can automatically learn the features from raw images and make accurate predictions. CNNs are inspired by the structure and function of the visual cortex in animals, which consists of multiple layers of neurons that are sensitive to different visual features, such as edges, corners, and colors. The neurons in each layer receive inputs from the previous layer and apply a set of learned filters to extract the relevant features. The output of each layer is then fed into the next layer, forming a hierarchical representation of the input image. Architecture of CNNs: The basic building blocks of CNNs are convolutional layers, pooling layers, and fully connected layers. Convolutional layers apply a set of filters to the input image, which convolves the filters with the input to produce a set of activation maps. Each filter is responsible for detecting a specific feature, such as edges, corners, or blobs, at different locations in the input image. Pooling layers reduce the dimensionality of the activation maps by subsampling them, which makes the network more robust to spatial translations and reduces the computational cost. Fully connected layers connect all the neurons in one layer to all the neurons in the next layer, which enables the network to learn complex nonlinear relationships between the features. Training of CNNs: The training of CNNs involves minimizing a loss function, which measures the difference between the predicted labels and the true labels. The most common loss function for image classification is cross-entropy, which penalizes the predicted probabilities that deviate from the true probabilities. Backpropagation is used to compute the gradients of the loss function with respect to the network parameters, which are then updated using optimization algorithms, such as stochastic gradient descent (SGD), Adam, and RMSprop. Dropout is a regularization technique that randomly drops out some neurons during training, which prevents overfitting and improves generalization. Data augmentation is a technique that generates new training examples by applying random transformations to the original images, such as rotation, scaling, and flipping, which increases the diversity and quantity of the training data. Applications of CNNs: CNNs have been successfully applied to various image recognition tasks, including object detection, face recognition, and scene understanding. Object detection aims to localize and classify the objects in images, which is a more challenging task than image classification. The popular object detection frameworks based on CNNs include R-CNN, Fast R-CNN, and Faster R-CNN, which use region proposal methods to generate candidate object locations and then classify them using CNNs. Face recognition aims to identify the individuals in images, which is important for security and surveillance applications. The popular face recognition methods based on CNNs include DeepFace, FaceNet, and VGGFace, which learn the deep features of faces and then use metric learning to compare the similarity between them. Scene understanding aims to interpret the semantic meaning of images, which is important for autonomous driving and robotics applications. The popular scene understanding methods based on CNNs include SegNet, FCN, and DeepLab, which perform pixel-wise classification of the image regions based on the learned features. Conclusion: CNNs have revolutionized the field of computer vision and achieved state-of-the-art performance in many image recognition tasks. The success of CNNs can be attributed to their ability to learn the features from raw images and their hierarchical structure that captures the spatial and semantic relationships between the features. CNNs have also inspired many new research directions, such as deep learning, transfer learning, and adversarial learning, which are expected to further improve the performance and scalability of image recognition systems.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值