In this tutorial, we discuss an interesting deep learning application for faces: estimating a person's age and gender from a single image. We briefly cover the main ideas of the underlying paper and give step-by-step instructions for running gender and age classification with OpenCV.
1. Gender and Age Classification Using CNNs
The authors used a fairly simple convolutional neural network architecture, similar to CaffeNet and AlexNet. The network has 3 convolutional layers, 2 fully connected layers, and a final output layer. The layer details are given below.
- Conv1: the first convolutional layer has 96 kernels of size 7.
- Conv2: the second convolutional layer has 256 kernels of size 5.
- Conv3: the third convolutional layer has 384 kernels of size 3.
- The two fully connected layers each have 512 nodes.
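To make the layer sizes concrete, here is a small sketch (not from the original article) that computes rough weight counts for the three convolutional layers, assuming RGB input and a 3 → 96 → 256 → 384 channel progression:

```python
# Hypothetical sketch: approximate parameter counts for the conv layers,
# assuming RGB input and the 3 -> 96 -> 256 -> 384 channel progression.
layers = [
    ("Conv1", 3,   96,  7),  # (name, in_channels, filters, kernel_size)
    ("Conv2", 96,  256, 5),
    ("Conv3", 256, 384, 3),
]

params = {}
for name, cin, cout, k in layers:
    # each filter has k*k*cin weights plus one bias
    params[name] = cout * (k * k * cin + 1)

print(params)  # {'Conv1': 14208, 'Conv2': 614656, 'Conv3': 885120}
```

Most of the network's capacity actually sits in the fully connected layers, which is typical of AlexNet-style architectures.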
They used the Adience dataset to train the models.
1.1 Gender Prediction
We frame gender prediction as a classification problem. The output layer of the gender network is a softmax with 2 nodes, representing the classes "Male" and "Female".
1.2 Age Prediction
Ideally, age prediction would be treated as a regression problem, since we expect a real number as output. However, estimating age accurately with regression is challenging. Even humans cannot pin down someone's exact age just by looking at them; we can, however, tell whether they are in their 20s or 30s. For this reason, it is sensible to frame the problem as classification, where we estimate the age group the person falls into. For example, ages 0-2 form one class, 4-6 another, and so on.
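As a small illustration of this framing (a sketch, not part of the original code), mapping a real-valued age to the nearest Adience-style bucket could look like this; `age_to_class` is a hypothetical helper name:

```python
# Adience-style age buckets, matching the list used later in the tutorial
AGE_BUCKETS = [(0, 2), (4, 6), (8, 12), (15, 20),
               (25, 32), (38, 43), (48, 53), (60, 100)]

def age_to_class(age):
    """Return the index of the bucket containing `age`,
    or the bucket with the closest boundary if age falls in a gap."""
    for i, (lo, hi) in enumerate(AGE_BUCKETS):
        if lo <= age <= hi:
            return i
    return min(range(len(AGE_BUCKETS)),
               key=lambda i: min(abs(age - AGE_BUCKETS[i][0]),
                                 abs(age - AGE_BUCKETS[i][1])))

print(age_to_class(1), age_to_class(30), age_to_class(70))  # 0 4 7
```

Note that the buckets do not tile the whole range (e.g. 13-14 is uncovered), which is an artifact of how Adience was labelled.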
The Adience dataset has 8 classes covering the age groups [(0 – 2), (4 – 6), (8 – 12), (15 – 20), (25 – 32), (38 – 43), (48 – 53), (60 – 100)]. The age prediction network therefore has 8 nodes in its final softmax layer, one for each of these ranges.
Keep in mind that predicting age from a single image is not an easy problem: perceived age depends on many factors, and people of the same age can look very different around the world. Moreover, people try very hard to hide their real age!
2. Code Tutorial
The code can be divided into four parts:
- 1. Detect faces
- 2. Detect gender
- 3. Detect age
- 4. Display the output
2.1 The Code
Let's look at the code for gender and age prediction using the DNN module in OpenCV.
Model files: https://pan.baidu.com/s/1eVd4kEczt4diGApc6BbQFQ
Extraction code: 123a
(1)Python
# Usage
# python AgeGender.py --input sample1.jpg
# Import required modules
import cv2 as cv
import math
import time
import argparse
def getFaceBox(net, frame, conf_threshold=0.7):
    frameOpencvDnn = frame.copy()
    frameHeight = frameOpencvDnn.shape[0]
    frameWidth = frameOpencvDnn.shape[1]
    blob = cv.dnn.blobFromImage(frameOpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)
    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            bboxes.append([x1, y1, x2, y2])
            cv.rectangle(frameOpencvDnn, (x1, y1), (x2, y2), (0, 255, 0), int(round(frameHeight/150)), 8)
    return frameOpencvDnn, bboxes
parser = argparse.ArgumentParser(description='Use this script to run age and gender recognition using OpenCV.')
parser.add_argument('--input', help='Path to input image or video file. Skip this argument to capture frames from a camera.', default="people.jpg")
parser.add_argument("--device", default="cpu", help="Device to run inference on")
args = parser.parse_args()
faceProto = "opencv_face_detector.pbtxt"
faceModel = "opencv_face_detector_uint8.pb"
ageProto = "age_deploy.prototxt"
ageModel = "age_net.caffemodel"
genderProto = "gender_deploy.prototxt"
genderModel = "gender_net.caffemodel"
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['Male', 'Female']
# Load the networks
ageNet = cv.dnn.readNet(ageModel, ageProto)
genderNet = cv.dnn.readNet(genderModel, genderProto)
faceNet = cv.dnn.readNet(faceModel, faceProto)
if args.device == "cpu":
    ageNet.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
    genderNet.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
    faceNet.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
    print("Using CPU device")
elif args.device == "gpu":
    ageNet.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    ageNet.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
    genderNet.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    genderNet.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
    faceNet.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    faceNet.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA)
    print("Using GPU device")
# Open a video file, an image file, or a camera stream
cap = cv.VideoCapture(args.input if args.input else 0)
padding = 20
while cv.waitKey(1) < 0:
    # Read a frame
    t = time.time()
    hasFrame, frame = cap.read()
    if not hasFrame:
        cv.waitKey()
        break
    frameFace, bboxes = getFaceBox(faceNet, frame)
    if not bboxes:
        print("No face Detected, Checking next frame")
        continue
    for bbox in bboxes:
        # print(bbox)
        face = frame[max(0,bbox[1]-padding):min(bbox[3]+padding,frame.shape[0]-1),max(0,bbox[0]-padding):min(bbox[2]+padding, frame.shape[1]-1)]
        blob = cv.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blob)
        genderPreds = genderNet.forward()
        gender = genderList[genderPreds[0].argmax()]
        # print("Gender Output : {}".format(genderPreds))
        print("Gender : {}, conf = {:.3f}".format(gender, genderPreds[0].max()))
        ageNet.setInput(blob)
        agePreds = ageNet.forward()
        age = ageList[agePreds[0].argmax()]
        print("Age Output : {}".format(agePreds))
        print("Age : {}, conf = {:.3f}".format(age, agePreds[0].max()))
        label = "{},{}".format(gender, age)
        cv.putText(frameFace, label, (bbox[0], bbox[1]-10), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2, cv.LINE_AA)
        cv.imshow("Age Gender Demo", frameFace)
        # cv.imwrite("age-gender-out-{}".format(args.input),frameFace)
    print("time : {:.3f}".format(time.time() - t))
# Example cmake configuration used to build OpenCV with CUDA support (paths are machine-specific):
# cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=~/opencv_gpu -DINSTALL_PYTHON_EXAMPLES=OFF -DINSTALL_C_EXAMPLES=OFF -DOPENCV_ENABLE_NONFREE=ON -DOPENCV_EXTRA_MODULES_PATH=~/cv2_gpu/opencv_contrib/modules -DPYTHON_EXECUTABLE=~/env/bin/python3 -DBUILD_EXAMPLES=ON -DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENCV_DNN_CUDA=ON -DENABLE_FAST_MATH=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.2 -DOpenCL_LIBRARY=/usr/local/cuda-10.2/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-10.2/include/ ..
(2)C++
// Usage
//./AgeGender sample1.jpg
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>
#include <tuple>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <iterator>
using namespace cv;
using namespace cv::dnn;
using namespace std;
tuple<Mat, vector<vector<int>>> getFaceBox(Net net, Mat &frame, double conf_threshold)
{
    Mat frameOpenCVDNN = frame.clone();
    int frameHeight = frameOpenCVDNN.rows;
    int frameWidth = frameOpenCVDNN.cols;
    double inScaleFactor = 1.0;
    Size size = Size(300, 300);
    // std::vector<int> meanVal = {104, 117, 123};
    Scalar meanVal = Scalar(104, 117, 123);
    cv::Mat inputBlob;
    inputBlob = cv::dnn::blobFromImage(frameOpenCVDNN, inScaleFactor, size, meanVal, true, false);
    net.setInput(inputBlob, "data");
    cv::Mat detection = net.forward("detection_out");
    cv::Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());
    vector<vector<int>> bboxes;
    for(int i = 0; i < detectionMat.rows; i++)
    {
        float confidence = detectionMat.at<float>(i, 2);
        if(confidence > conf_threshold)
        {
            int x1 = static_cast<int>(detectionMat.at<float>(i, 3) * frameWidth);
            int y1 = static_cast<int>(detectionMat.at<float>(i, 4) * frameHeight);
            int x2 = static_cast<int>(detectionMat.at<float>(i, 5) * frameWidth);
            int y2 = static_cast<int>(detectionMat.at<float>(i, 6) * frameHeight);
            vector<int> box = {x1, y1, x2, y2};
            bboxes.push_back(box);
            cv::rectangle(frameOpenCVDNN, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(0, 255, 0), 2, 4);
        }
    }
    return make_tuple(frameOpenCVDNN, bboxes);
}
int main(int argc, char** argv)
{
    string faceProto = "opencv_face_detector.pbtxt";
    string faceModel = "opencv_face_detector_uint8.pb";
    string ageProto = "age_deploy.prototxt";
    string ageModel = "age_net.caffemodel";
    string genderProto = "gender_deploy.prototxt";
    string genderModel = "gender_net.caffemodel";
    Scalar MODEL_MEAN_VALUES = Scalar(78.4263377603, 87.7689143744, 114.895847746);
    vector<string> ageList = {"(0-2)", "(4-6)", "(8-12)", "(15-20)", "(25-32)",
                              "(38-43)", "(48-53)", "(60-100)"};
    vector<string> genderList = {"Male", "Female"};
    cout << "USAGE : ./AgeGender <videoFile> " << endl;
    cout << "USAGE : ./AgeGender <device> " << endl;
    cout << "USAGE : ./AgeGender <videoFile> <device>" << endl;
    string device = "cpu";
    string videoFile = "0";
    // Read arguments from the command line
    if (argc == 2)
    {
        if((string)argv[1] == "gpu")
            device = "gpu";
        else if((string)argv[1] == "cpu")
            device = "cpu";
        else
            videoFile = argv[1];
    }
    else if (argc == 3)
    {
        videoFile = argv[1];
        if((string)argv[2] == "gpu")
            device = "gpu";
    }
    // Load the network models
    Net ageNet = readNet(ageModel, ageProto);
    Net genderNet = readNet(genderModel, genderProto);
    Net faceNet = readNet(faceModel, faceProto);
    if (device == "cpu")
    {
        cout << "Using CPU device" << endl;
        ageNet.setPreferableTarget(DNN_TARGET_CPU);
        genderNet.setPreferableTarget(DNN_TARGET_CPU);
        faceNet.setPreferableTarget(DNN_TARGET_CPU);
    }
    else if (device == "gpu")
    {
        cout << "Using GPU device" << endl;
        ageNet.setPreferableBackend(DNN_BACKEND_CUDA);
        ageNet.setPreferableTarget(DNN_TARGET_CUDA);
        genderNet.setPreferableBackend(DNN_BACKEND_CUDA);
        genderNet.setPreferableTarget(DNN_TARGET_CUDA);
        faceNet.setPreferableBackend(DNN_BACKEND_CUDA);
        faceNet.setPreferableTarget(DNN_TARGET_CUDA);
    }
    VideoCapture cap;
    if (videoFile.length() > 1)
        cap.open(videoFile);
    else
        cap.open(0);
    int padding = 20;
    while(waitKey(1) < 0) {
        // Read a frame
        Mat frame;
        cap.read(frame);
        if (frame.empty())
        {
            waitKey();
            break;
        }
        vector<vector<int>> bboxes;
        Mat frameFace;
        tie(frameFace, bboxes) = getFaceBox(faceNet, frame, 0.7);
        if(bboxes.size() == 0) {
            cout << "No face detected, checking next frame." << endl;
            continue;
        }
        for (auto it = begin(bboxes); it != end(bboxes); ++it) {
            Rect rec(it->at(0) - padding, it->at(1) - padding, it->at(2) - it->at(0) + 2*padding, it->at(3) - it->at(1) + 2*padding);
            rec &= Rect(0, 0, frame.cols, frame.rows); // clamp the padded box to the frame
            Mat face = frame(rec); // take the ROI of the box on the frame
            Mat blob;
            blob = blobFromImage(face, 1, Size(227, 227), MODEL_MEAN_VALUES, false);
            genderNet.setInput(blob);
            vector<float> genderPreds = genderNet.forward();
            // Find the index of the maximum element;
            // std::distance plays the role of argmax() in C++
            int max_index_gender = std::distance(genderPreds.begin(), max_element(genderPreds.begin(), genderPreds.end()));
            string gender = genderList[max_index_gender];
            cout << "Gender: " << gender << endl;
            /* // Uncomment to iterate over the genderPreds vector
            for(auto it = genderPreds.begin(); it != genderPreds.end(); ++it) {
                cout << *it << endl;
            }
            */
            ageNet.setInput(blob);
            vector<float> agePreds = ageNet.forward();
            /* // Uncomment to iterate over the agePreds vector
            cout << "PRINTING AGE_PREDS" << endl;
            for(auto it = agePreds.begin(); it != agePreds.end(); ++it) {
                cout << *it << endl;
            }
            */
            // Find the index of the maximum value in agePreds
            int max_indice_age = std::distance(agePreds.begin(), max_element(agePreds.begin(), agePreds.end()));
            string age = ageList[max_indice_age];
            cout << "Age: " << age << endl;
            string label = gender + ", " + age; // label
            cv::putText(frameFace, label, Point(it->at(0), it->at(1) - 15), cv::FONT_HERSHEY_SIMPLEX, 0.9, Scalar(0, 255, 255), 2, cv::LINE_AA);
            imshow("Frame", frameFace);
            imwrite("out.jpg", frameFace);
        }
    }
}
2.2 Code Walkthrough
2.2.1 Face Detection
We will use the DNN face detector for face detection. The model is only 2.7 MB and is very fast even on a CPU. Face detection is done with the function getFaceBox, shown below.
tuple<Mat, vector<vector<int>>> getFaceBox(Net net, Mat &frame, double conf_threshold)
{
    Mat frameOpenCVDNN = frame.clone();
    int frameHeight = frameOpenCVDNN.rows;
    int frameWidth = frameOpenCVDNN.cols;
    double inScaleFactor = 1.0;
    Size size = Size(300, 300);
    // std::vector<int> meanVal = {104, 117, 123};
    Scalar meanVal = Scalar(104, 117, 123);
    cv::Mat inputBlob;
    cv::dnn::blobFromImage(frameOpenCVDNN, inputBlob, inScaleFactor, size, meanVal, true, false);
    net.setInput(inputBlob, "data");
    cv::Mat detection = net.forward("detection_out");
    cv::Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());
    vector<vector<int>> bboxes;
    for(int i = 0; i < detectionMat.rows; i++)
    {
        float confidence = detectionMat.at<float>(i, 2);
        if(confidence > conf_threshold)
        {
            int x1 = static_cast<int>(detectionMat.at<float>(i, 3) * frameWidth);
            int y1 = static_cast<int>(detectionMat.at<float>(i, 4) * frameHeight);
            int x2 = static_cast<int>(detectionMat.at<float>(i, 5) * frameWidth);
            int y2 = static_cast<int>(detectionMat.at<float>(i, 6) * frameHeight);
            vector<int> box = {x1, y1, x2, y2};
            bboxes.push_back(box);
            cv::rectangle(frameOpenCVDNN, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(0, 255, 0), 2, 4);
        }
    }
    return make_tuple(frameOpenCVDNN, bboxes);
}
def getFaceBox(net, frame, conf_threshold=0.7):
    frameOpencvDnn = frame.copy()
    frameHeight = frameOpencvDnn.shape[0]
    frameWidth = frameOpencvDnn.shape[1]
    blob = cv.dnn.blobFromImage(frameOpencvDnn, 1.0, (300, 300), [104, 117, 123], True, False)
    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            bboxes.append([x1, y1, x2, y2])
            cv.rectangle(frameOpencvDnn, (x1, y1), (x2, y2), (0, 255, 0), int(round(frameHeight/150)), 8)
    return frameOpencvDnn, bboxes
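To see how this loop decodes the detector output without running a model, here is a small sketch using a fabricated `detections` array in the same `(1, 1, N, 7)` layout OpenCV's SSD-style face detector returns (all values below are made up for illustration):

```python
import numpy as np

# Fabricated detections tensor: shape (1, 1, N, 7); columns 2..6 hold
# [confidence, x1, y1, x2, y2] with coordinates normalized to [0, 1].
detections = np.zeros((1, 1, 2, 7), dtype=np.float32)
detections[0, 0, 0, 2:7] = [0.95, 0.1, 0.2, 0.4, 0.6]  # a confident face
detections[0, 0, 1, 2:7] = [0.30, 0.5, 0.5, 0.6, 0.7]  # below the threshold

frameWidth, frameHeight = 640, 480
conf_threshold = 0.7
bboxes = []
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > conf_threshold:
        # scale the normalized coordinates back to pixel positions
        x1 = int(detections[0, 0, i, 3] * frameWidth)
        y1 = int(detections[0, 0, i, 4] * frameHeight)
        x2 = int(detections[0, 0, i, 5] * frameWidth)
        y2 = int(detections[0, 0, i, 6] * frameHeight)
        bboxes.append([x1, y1, x2, y2])

print(bboxes)  # [[64, 96, 256, 288]]
```

Only the first row survives the 0.7 confidence threshold, and its normalized coordinates are scaled back to pixel positions.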
2.2.2 Gender Prediction
We load the gender network into memory and pass the detected face through the network. The forward pass gives the probabilities, or confidences, for the two classes. We take the maximum of the two outputs and use it as the final gender prediction.
string genderProto = "gender_deploy.prototxt";
string genderModel = "gender_net.caffemodel";
Net genderNet = readNet(genderModel, genderProto);
vector<string> genderList = {"Male", "Female"};
blob = blobFromImage(face, 1, Size(227, 227), MODEL_MEAN_VALUES, false);
genderNet.setInput(blob);
vector<float> genderPreds = genderNet.forward();
// Find the index of the maximum element;
// std::distance plays the role of argmax() in C++
int max_index_gender = std::distance(genderPreds.begin(), max_element(genderPreds.begin(), genderPreds.end()));
string gender = genderList[max_index_gender];
genderProto = "gender_deploy.prototxt"
genderModel = "gender_net.caffemodel"
genderNet = cv.dnn.readNet(genderModel, genderProto)
genderList = ['Male', 'Female']
blob = cv.dnn.blobFromImage(face, 1, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
genderNet.setInput(blob)
genderPreds = genderNet.forward()
gender = genderList[genderPreds[0].argmax()]
print("Gender Output : {}".format(genderPreds))
print("Gender : {}".format(gender))
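The argmax step can be illustrated in isolation with a fabricated two-class output (the values below are made up; the real `genderPreds` comes from the network's forward pass):

```python
import numpy as np

genderList = ['Male', 'Female']
# Fabricated softmax output in the same (1, 2) shape the network returns
genderPreds = np.array([[0.82, 0.18]], dtype=np.float32)
gender = genderList[genderPreds[0].argmax()]
print("Gender : {}, conf = {:.3f}".format(gender, genderPreds[0].max()))  # Gender : Male, conf = 0.820
```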
2.2.3 Age Prediction
We load the age network and use the forward pass to get the output. Since the network architecture is similar to the gender network, we can take the maximum over all outputs to get the predicted age group.
string ageProto = "age_deploy.prototxt";
string ageModel = "age_net.caffemodel";
Net ageNet = readNet(ageModel, ageProto);
vector<string> ageList = {"(0-2)", "(4-6)", "(8-12)", "(15-20)", "(25-32)", "(38-43)", "(48-53)", "(60-100)"};
ageNet.setInput(blob);
vector<float> agePreds = ageNet.forward();
int max_indice_age = distance(agePreds.begin(), max_element(agePreds.begin(), agePreds.end()));
string age = ageList[max_indice_age];
ageProto = "age_deploy.prototxt"
ageModel = "age_net.caffemodel"
ageNet = cv.dnn.readNet(ageModel, ageProto)
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
ageNet.setInput(blob)
agePreds = ageNet.forward()
age = ageList[agePreds[0].argmax()]
print("Age Output : {}".format(agePreds))
print("Age : {}".format(age))
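The same pattern applies to the 8-way age output. With a fabricated distribution (values made up for illustration), argmax picks the most likely bucket even when the probability mass is spread across neighbouring groups:

```python
import numpy as np

ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
# Fabricated softmax output: mass spread over neighbouring buckets
agePreds = np.array([[0.01, 0.01, 0.02, 0.20, 0.55, 0.15, 0.04, 0.02]], dtype=np.float32)
age = ageList[agePreds[0].argmax()]
print("Age : {}, conf = {:.3f}".format(age, agePreds[0].max()))  # Age : (25-32), conf = 0.550
```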
2.3 Display the Output
We will draw the network's outputs on the input image and display them using the imshow function.
string label = gender + ", " + age; // label
cv::putText(frameFace, label, Point(it->at(0), it->at(1) -20), cv::FONT_HERSHEY_SIMPLEX, 0.9, Scalar(0, 255, 255), 2, cv::LINE_AA);
imshow("Frame", frameFace);
label = "{}, {}".format(gender, age)
cv.putText(frameFace, label, (bbox[0], bbox[1]-20), cv.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 3, cv.LINE_AA)
cv.imshow("Age Gender Demo", frameFace)
2.4 Results
3. Conclusion
Although the gender prediction network performed well, the age prediction network fell short of our expectations. We looked for an answer in the paper and found the following confusion matrix for the age prediction model.
From this confusion matrix, we can make the following observations:
- Prediction accuracy is relatively high for the 0-2, 4-6, 8-13, and 25-32 age groups (see the diagonal elements).
- The output is heavily biased toward the 25-32 age group (see the row for that group). This means the network easily confuses ages between 15 and 43: even when the actual age is in the 15-20 or 38-43 range, there is a high chance the predicted age will be 25-32. This is also evident in the Results section.
Apart from this, we observed that the models' accuracy improves if we use padding around the detected face. This may be because the training inputs were standard face images rather than the tightly cropped faces we get after face detection.
We also analysed using face alignment before making predictions and found that predictions improved for some examples but, at the same time, got worse for others. Using alignment may be a good idea if you are working mostly with non-frontal faces.
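The padded crop, clamped to the frame boundaries (the same logic the Python loop above performs inline), can be factored out like this. This is a sketch; `crop_with_padding` is a hypothetical helper name, not from the original code:

```python
import numpy as np

def crop_with_padding(frame, bbox, padding=20):
    """Crop bbox = [x1, y1, x2, y2] expanded by `padding` pixels on
    every side, with the slice clamped to stay inside the frame."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    return frame[max(0, y1 - padding):min(y2 + padding, h - 1),
                 max(0, x1 - padding):min(x2 + padding, w - 1)]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
face = crop_with_padding(frame, [200, 200, 300, 300])
print(face.shape)  # (140, 140, 3)
```

For boxes near the image border, the crop is simply truncated at the edge, e.g. `crop_with_padding(frame, [10, 10, 100, 100])` yields a `(120, 120, 3)` region.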
References
https://learnopencv.com/age-gender-classification-using-opencv-deep-learning-c-python/