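Face Capture
Face recognition is currently one of the hottest topics in AI. For almost any face-related task, whatever method is used to solve it, the mainstream preprocessing pipeline is the same: face region extraction -> facial landmark alignment -> affine transformation. Faces have very distinctive image features. Before training any network, we therefore want to place features such as the eyes, nose and mouth at the same relative positions in every image as far as possible, which makes the model much easier to train.
There are many methods for face region extraction. Different networks suit different environments, ethnicities and speed requirements, and some are designed for faces wearing masks. MTCNN is one of the most common of these networks and is highly practical today (for an introduction, see the article "MTCNN Algorithm and Code Notes"). The key reason we use it here is that it does more than extract the face region: it also locates facial landmarks. Those landmarks help a great deal with aligning facial features across images, and even with head-pose regression. Note, however, that MTCNN predicts only 5 landmarks (the two eyes, the nose tip and the two mouth corners). Aligning with more points requires other networks, methods and datasets.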
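MTCNN implementations differ in their output format. The minimal sketch below assumes the dictionary format of the `mtcnn` Python package: a `box` of `[x, y, width, height]` and a `keypoints` dict with keys `left_eye`, `right_eye`, `nose`, `mouth_left` and `mouth_right`. Treat those key names as an assumption and check them against the implementation you actually use. The hypothetical helper `mtcnn_to_arrays` converts one detection into two things:

- corner-format box coordinates, the `(x1, y1, x2, y2)` layout that `find_five` below expects for its `boxes` argument;
- a 5 x 2 landmark array.

```python
import numpy as np

# Landmark order used for alignment (assumed `mtcnn` package key names):
# left eye, right eye, nose tip, left mouth corner, right mouth corner
KEYPOINT_ORDER = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def mtcnn_to_arrays(detection):
    """Hypothetical helper: convert one MTCNN-style detection dict into
    (x1, y1, x2, y2) box corners and a 5 x 2 float array of landmarks."""
    x, y, w, h = detection["box"]
    box = (x, y, x + w, y + h)  # [x, y, width, height] -> corner format
    landmarks = np.array([detection["keypoints"][k] for k in KEYPOINT_ORDER],
                         dtype=np.float64)
    return box, landmarks


# Example with made-up numbers
det = {"box": [40, 60, 100, 120], "confidence": 0.99,
       "keypoints": {"left_eye": (70, 100), "right_eye": (110, 100),
                     "nose": (90, 125), "mouth_left": (75, 150),
                     "mouth_right": (105, 150)}}
box, pts = mtcnn_to_arrays(det)
print(box)        # (40, 60, 140, 180)
print(pts.shape)  # (5, 2)
```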
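Facial Landmark Alignment
There are many landmark alignment schemes as well. The 68-point layout is the most common, but there are also SenseTime's 98-point and Megvii's 106-point layouts; the more points, the more detailed the description of the face. Landmark alignment can also be divided into 2D and 3D. 2D landmarks are better suited to head-pose estimation and affine computation, while 3D landmarks are better suited to face reconstruction, shape regression and similar tasks.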
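For reference, the widely used 68-point layout follows the iBUG 300-W convention, which the `find_five` code below also assumes. Its index groups are listed in the sketch. The sketch also shows one common way to reduce the 68 points to five MTCNN-style points:

- average each eye contour;
- take the nose tip (index 30);
- take the two mouth corners (48 and 54).

"Left" and "right" here refer to sides of the image, not of the subject. The group names and the helper `reduce_68_to_5` are illustrative, not part of any library.

```python
import numpy as np

# 0-based index ranges of the 68-point (iBUG 300-W) layout,
# named by their side in the image
LANDMARK_GROUPS_68 = {
    "jaw": range(0, 17),
    "eyebrow_image_left": range(17, 22),
    "eyebrow_image_right": range(22, 27),
    "nose_bridge": range(27, 31),
    "nose_lower": range(31, 36),
    "eye_image_left": range(36, 42),
    "eye_image_right": range(42, 48),
    "mouth_outer": range(48, 60),
    "mouth_inner": range(60, 68),
}


def reduce_68_to_5(landmarks):
    """Reduce a 68 x 2 landmark array to 5 points:
    two eye centres, nose tip, and the two mouth corners."""
    landmarks = np.asarray(landmarks, dtype=np.float64)
    if landmarks.shape != (68, 2):
        raise ValueError("expected a 68 x 2 landmark array")
    return np.stack([
        landmarks[36:42].mean(axis=0),  # eye on the image left
        landmarks[42:48].mean(axis=0),  # eye on the image right
        landmarks[30],                  # nose tip
        landmarks[48],                  # mouth corner on the image left
        landmarks[54],                  # mouth corner on the image right
    ])


# Sanity check: the groups cover all 68 indices exactly once
covered = sorted(i for r in LANDMARK_GROUPS_68.values() for i in r)
print(covered == list(range(68)))  # True
```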
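Landmark alignment still has many unsolved problems. One is left-right symmetry. Because most faces in the datasets are left-right symmetric, the trained model easily ends up symmetric as well. When only one side of the face moves (for example, one eye open and the other closed), the model has trouble telling the two sides apart, and its predictions become inaccurate. Other problems include inaccurate landmarks on profile (side-view) faces and on wide-open mouths.
Choosing a good network for training landmark alignment matters a lot. I recommend face-alignment, a 68-point 3D face alignment method. Its authors also released a dataset of 230,000 3D facial landmarks, which is a real boon for anyone after high-quality alignment: How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks).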
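Affine Transformation
Next comes the affine transformation. Put simply, it rotates a tilted face into an upright one. Strictly speaking, the code below estimates a similarity transform (rotation, uniform scale and translation), which is a special case of an affine transform. This transform maps the detected landmarks onto fixed template positions. In a demo video I recorded, however I turned my own face, the eyes, nose and mouth of the aligned face stayed in essentially the same place.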
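A similarity transform can be written as a 2 x 3 matrix, which makes "turning a tilted face upright" concrete. The minimal sketch below assumes only the two eye centres are known. It builds the matrix that rotates the image about the midpoint between the eyes so that the eye line becomes horizontal. The layout should match `cv2.getRotationMatrix2D(center, np.degrees(theta), scale)`, but the code uses plain NumPy so it runs without OpenCV. `level_eyes_matrix` and `apply_affine` are hypothetical helpers.

```python
import numpy as np


def level_eyes_matrix(left_eye, right_eye, scale=1.0):
    """Hypothetical helper: 2 x 3 similarity matrix that rotates the image
    about the eye midpoint so that the eye line becomes horizontal."""
    left_eye = np.asarray(left_eye, dtype=np.float64)
    right_eye = np.asarray(right_eye, dtype=np.float64)
    dx, dy = right_eye - left_eye
    theta = np.arctan2(dy, dx)               # tilt of the eye line (image coords, y down)
    c, s = np.cos(theta), np.sin(theta)
    R = scale * np.array([[c, s], [-s, c]])  # maps the eye direction onto the x axis
    center = (left_eye + right_eye) / 2.0
    t = center - R @ center                  # keep the eye midpoint fixed
    return np.hstack([R, t[:, None]])


def apply_affine(M, points):
    """Apply a 2 x 3 matrix to an N x 2 array of (x, y) points."""
    points = np.asarray(points, dtype=np.float64)
    return points @ M[:, :2].T + M[:, 2]


# A face tilted by 30 degrees: the right eye sits lower in the image than the left eye
left = (100.0, 100.0)
right = (100.0 + 40 * np.cos(np.pi / 6), 100.0 + 40 * np.sin(np.pi / 6))
M = level_eyes_matrix(left, right)
print(apply_affine(M, [left, right]))  # both rows now share the same y coordinate
```

With OpenCV installed, the resulting `M` can be passed directly to `cv2.warpAffine(image, M, (width, height))`. Aligning to a full five-point template, as in the code below, generalizes this idea. There the rotation, scale and translation are fitted to all landmarks at once by least squares instead of being derived from the eyes alone.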
Finally, here is the Python and C++ source code for five-point face alignment:
```python
import cv2
import numpy as np

# Output size of the aligned face: (height, width)
imgSize2 = (224, 224)

# Target positions of the five landmarks in the 224x224 aligned image:
# left eye, right eye, nose tip, left mouth corner, right mouth corner
coord5point2 = [[(30.2946 + 8.0000) * 2, (51.6963 - 8.0000) * 2],
                [(65.5318 + 8.0000) * 2, (51.6963 - 8.0000) * 2],
                [(48.0252 + 8.0000) * 2, (71.7366 - 8.0000) * 2],
                [(33.5493 + 8.0000) * 2, (92.3655 - 8.0000) * 2],
                [(62.7299 + 8.0000) * 2, (92.3655 - 8.0000) * 2]]


def transformation_from_points(points1, points2):
    """Similarity transform (ordinary Procrustes analysis) mapping points1
    onto points2. Both are N x 2 arrays; returns a 3 x 3 homogeneous matrix."""
    points1 = np.asarray(points1, dtype=np.float64)
    points2 = np.asarray(points2, dtype=np.float64)
    # Remove translation
    c1 = np.mean(points1, axis=0)
    c2 = np.mean(points2, axis=0)
    points1 = points1 - c1
    points2 = points2 - c2
    # Remove scale
    s1 = np.std(points1)
    s2 = np.std(points2)
    points1 = points1 / s1
    points2 = points2 / s2
    # Optimal rotation from the SVD of the correlation matrix
    U, S, Vt = np.linalg.svd(points1.T @ points2)
    R = (U @ Vt).T
    scale = s2 / s1
    t = c2 - scale * R @ c1
    return np.vstack([np.hstack([scale * R, t.reshape(2, 1)]),
                      [0.0, 0.0, 1.0]])


def warp_im(img_im, orgi_landmarks, tar_landmarks):
    """Warp img_im so that orgi_landmarks land on tar_landmarks."""
    pts1 = np.asarray(orgi_landmarks, dtype=np.float64)
    pts2 = np.asarray(tar_landmarks, dtype=np.float64)
    M = transformation_from_points(pts1, pts2)
    # warpAffine takes the top 2x3 block, and dsize as (width, height)
    dst = cv2.warpAffine(img_im, M[:2], (imgSize2[1], imgSize2[0]))
    return dst


def find_five(image, boxes, landmark):
    """Align one face to 224x224.

    image: H x W (x C) image; boxes: (x1, y1, x2, y2, ...) face box;
    landmark: 68 x 2 array of facial landmarks in image coordinates.
    """
    height, width = image.shape[:2]
    landmark = np.asarray(landmark, dtype=np.float64)
    x1, y1, x2, y2 = int(boxes[0]), int(boxes[1]), int(boxes[2]), int(boxes[3])
    # Enlarge the box by 100% (half its width/height on each side)
    # so that the aligned face has no black borders
    new_x1 = max(int(1.50 * x1 - 0.50 * x2), 0)
    new_x2 = min(int(1.50 * x2 - 0.50 * x1), width - 1)
    new_y1 = max(int(1.50 * y1 - 0.50 * y2), 0)
    new_y2 = min(int(1.50 * y2 - 0.50 * y1), height - 1)
    # Five keypoints in the original image: eye centres (mean of each
    # eye contour), nose tip, and the two mouth corners
    five = np.array([np.mean(landmark[36:42], axis=0),
                     np.mean(landmark[42:48], axis=0),
                     landmark[30],
                     landmark[48],
                     landmark[54]])
    # Shift the keypoints into the coordinates of the enlarged crop
    face_landmarks = five - np.array([new_x1, new_y1], dtype=np.float64)
    face = image[new_y1:new_y2, new_x1:new_x2]  # enlarged face region
    dst2 = warp_im(face, face_landmarks, coord5point2)  # aligned, 224x224
    crop_im2 = dst2[0:imgSize2[0], 0:imgSize2[1]]
    return crop_im2
```
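`find_five` returns only the aligned image. Sometimes you also need to know where the landmarks end up in the aligned 224 x 224 crop, for example to check alignment quality or to carry all 68 points along. In that case, apply the same 2 x 3 matrix to them. The minimal sketch below assumes `M` holds the top two rows of the matrix returned by `transformation_from_points`. Because `find_five` first shifts the landmarks into the enlarged crop (subtracting `new_x1`, `new_y1`), map the shifted coordinates. `map_points` and `alignment_rmse` are hypothetical helper names.

```python
import numpy as np


def map_points(M, points):
    """Map an N x 2 array of (x, y) points through an affine matrix M
    (2 x 3, or 3 x 3 homogeneous), i.e. the forward mapping that
    cv2.warpAffine applies to source pixels."""
    points = np.asarray(points, dtype=np.float64)
    M = np.asarray(M, dtype=np.float64)[:2]
    return points @ M[:, :2].T + M[:, 2]


def alignment_rmse(M, src_points, template):
    """Root-mean-square distance, in pixels of the aligned image, between
    the mapped landmarks and the template they were aligned to."""
    mapped = map_points(M, src_points)
    diff = mapped - np.asarray(template, dtype=np.float64)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Example: scale by 2 and shift by (10, 5), so the mapping is exact
M = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 5.0]])
src = np.array([[1.0, 1.0], [3.0, 2.0]])
print(map_points(M, src))
print(alignment_rmse(M, src, [[12, 7], [16, 9]]))  # 0.0
```

A low RMSE against `coord5point2` means the five landmarks were brought close to the template. A large value usually points to a bad detection or an unusual pose.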
```cpp
#ifndef FACE_DEMO_FACEPREPROCESS_H
#define FACE_DEMO_FACEPREPROCESS_H

#include <opencv2/opencv.hpp>

// Five-point face alignment helpers. All matrices are CV_32F, one point per row.
// Typical use:
//   cv::Mat T = FacePreprocess::similarTransform(src_pts, dst_pts);  // 5x2 each
//   cv::warpAffine(face, aligned, T.rowRange(0, 2), cv::Size(224, 224));
namespace FacePreprocess {

// Column-wise mean (numpy: src.mean(axis=0)); returns a 1 x dim row vector.
inline cv::Mat meanAxis0(const cv::Mat& src)
{
    const int num = src.rows;
    const int dim = src.cols;
    cv::Mat output(1, dim, CV_32F);
    for (int i = 0; i < dim; i++)
    {
        float sum = 0;
        for (int j = 0; j < num; j++)
        {
            sum += src.at<float>(j, i);
        }
        output.at<float>(0, i) = sum / num;
    }
    return output;
}

// Subtract the 1 x cols row vector B from every row of A (numpy: A - B).
inline cv::Mat elementwiseMinus(const cv::Mat& A, const cv::Mat& B)
{
    CV_Assert(B.cols == A.cols);
    cv::Mat output(A.rows, A.cols, A.type());
    for (int i = 0; i < A.rows; i++)
    {
        for (int j = 0; j < A.cols; j++)
        {
            output.at<float>(i, j) = A.at<float>(i, j) - B.at<float>(0, j);
        }
    }
    return output;
}

// Column-wise population variance (numpy: src.var(axis=0)).
inline cv::Mat varAxis0(const cv::Mat& src)
{
    cv::Mat temp_ = elementwiseMinus(src, meanAxis0(src));
    cv::multiply(temp_, temp_, temp_);
    return meanAxis0(temp_);
}

// Numerical rank: number of singular values above 1e-4.
inline int MatrixRank(const cv::Mat& M)
{
    cv::Mat w, u, vt;
    cv::SVD::compute(M, w, u, vt);
    cv::Mat1b nonZeroSingularValues = w > 0.0001;
    return cv::countNonZero(nonZeroSingularValues);
}

// Least-squares similarity transform from src to dst (Umeyama 1991), following
// scikit-image's _umeyama. Returns a (dim+1) x (dim+1) matrix T such that
// dst is approximately T * [src; 1], or an empty Mat if the points are degenerate.
inline cv::Mat similarTransform(const cv::Mat& src, const cv::Mat& dst)
{
    const int num = src.rows;
    const int dim = src.cols;
    cv::Mat src_mean = meanAxis0(src);
    cv::Mat dst_mean = meanAxis0(dst);
    cv::Mat src_demean = elementwiseMinus(src, src_mean);
    cv::Mat dst_demean = elementwiseMinus(dst, dst_mean);

    // Cross-covariance matrix
    cv::Mat A = (dst_demean.t() * src_demean) / static_cast<float>(num);

    // d = diag(1, ..., 1, -1) when needed, so the result is a rotation, not a reflection
    cv::Mat d = cv::Mat::ones(dim, 1, CV_32F);
    if (cv::determinant(A) < 0)
    {
        d.at<float>(dim - 1, 0) = -1.0f;
    }

    // cv::SVD returns Vt (V transposed), the same convention as numpy.linalg.svd
    cv::Mat S, U, Vt;
    cv::SVD::compute(A, S, U, Vt);

    cv::Mat R;
    const int rank = MatrixRank(A);
    if (rank == 0)
    {
        return cv::Mat();  // all points coincide: no transform can be estimated
    }
    else if (rank == dim - 1)
    {
        if (cv::determinant(U) * cv::determinant(Vt) > 0)
        {
            R = U * Vt;
        }
        else
        {
            cv::Mat d_flip = d.clone();  // flip the last axis without modifying d
            d_flip.at<float>(dim - 1, 0) = -1.0f;
            cv::Mat D = cv::Mat::diag(d_flip);
            R = U * D * Vt;
        }
    }
    else
    {
        cv::Mat D = cv::Mat::diag(d);
        R = U * D * Vt;
    }

    // Scale = (S . d) / total variance of the source points
    const double src_var = cv::sum(varAxis0(src_demean))[0];
    const float scale = static_cast<float>(S.dot(d) / src_var);

    // Translation = dst_mean - scale * R * src_mean
    cv::Mat dst_mean_col = dst_mean.t();
    cv::Mat R_src_mean = R * src_mean.t();
    cv::Mat t = dst_mean_col - scale * R_src_mean;

    cv::Mat T = cv::Mat::eye(dim + 1, dim + 1, CV_32F);
    cv::Mat scaled_R = scale * R;
    scaled_R.copyTo(T(cv::Rect(0, 0, dim, dim)));
    t.copyTo(T(cv::Rect(dim, 0, 1, dim)));
    return T;
}

}  // namespace FacePreprocess

#endif  // FACE_DEMO_FACEPREPROCESS_H
```
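`similarTransform` follows the Umeyama (1991) least-squares similarity estimate. scikit-image uses the same estimate in the `estimate` method of `skimage.transform.SimilarityTransform`. The NumPy sketch below spells out that math, which makes it easier to follow and gives something to compare the C++ output against. It was written for this note rather than copied from scikit-image, and `umeyama` is a hypothetical name. The sketch chooses the reflection guard from the sign of det(U)·det(Vᵀ), which covers both the full-rank and rank-deficient cases.

```python
import numpy as np


def umeyama(src, dst, estimate_scale=True):
    """Least-squares similarity transform mapping src onto dst (Umeyama 1991).
    src, dst: N x dim arrays. Returns a (dim+1) x (dim+1) homogeneous matrix."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    num, dim = src.shape
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_demean, dst_demean = src - src_mean, dst - dst_mean

    # Cross-covariance and its SVD (numpy returns V transposed, like cv::SVD)
    A = dst_demean.T @ src_demean / num
    U, S, Vt = np.linalg.svd(A)

    # Flip the last axis if needed so that R is a rotation, not a reflection
    d = np.ones(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt

    # Optimal scale: (S . d) / total variance of the source points
    scale = (S @ d) / src_demean.var(axis=0).sum() if estimate_scale else 1.0

    T = np.eye(dim + 1)
    T[:dim, :dim] = scale * R
    T[:dim, dim] = dst_mean - scale * R @ src_mean
    return T


# Recover a known transform: rotate 20 degrees, scale 1.5, shift (10, -4)
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(5, 2))
theta = np.deg2rad(20)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
dst = 1.5 * src @ R_true.T + np.array([10.0, -4.0])
T = umeyama(src, dst)
print(np.allclose(T[:2, :2], 1.5 * R_true), np.allclose(T[:2, 2], [10.0, -4.0]))  # True True
```

Feeding the same five source and template points to this function and to `FacePreprocess::similarTransform` should give matching matrices, up to float32 rounding on the C++ side. If scikit-image is installed, the `params` of a `SimilarityTransform` estimated on the same points provide a third reference.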