Object Detection: Face Detection with MTCNN

Author: toocy7

Categories: Image Generation, Deep Learning, Face Recognition | Tags: deep learning, pytorch, neural networks

Article link: https://blog.csdn.net/qq_40321214/article/details/106360880

Abstract

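MTCNN is an algorithm that combines face detection with facial landmark detection, and its cascade structure has had a strong influence on modern face recognition. This post explains how MTCNN works and gives some training tips, then walks through the code and a demo. Paper link.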

I. Principles

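Face detection answers two questions:

1) Is there a face in the image?

2) If so, where is it? This is why face detection is the foundation of many face applications, such as face recognition and facial attribute analysis.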

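MTCNN (Multi-task Cascaded Convolutional Neural Networks) performed very well when it was published, although it is no longer state of the art. Its main contributions were being one of the first methods to combine face detection and facial landmark localization in one model, cascading three networks, and using an image pyramid to handle faces of different sizes.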

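(1) Stage 1 of MTCNN detection
P-Net (Proposal Network) produces candidate windows together with bounding-box regression values. The candidates are calibrated with these regression values, and NMS (non-maximum suppression) then removes overlapping candidates.
(2) Stage 2 of MTCNN detection
The candidates kept after P-Net are passed to R-Net (Refine Network), which examines the image patch behind each candidate again. After the last convolutional layer, R-Net uses fully connected layers for face classification (and box regression); NMS is applied again to remove duplicate candidates, leaving a smaller set.
(3) Stage 3 of MTCNN detection
O-Net (Output Network) examines the output of R-Net once more, again applies NMS to overlapping candidates, and finally outputs a confidence score for each remaining box together with the 5 facial landmarks inside it.
This is only an outline of how the three cascaded networks are used; the full pipeline involves many image transformations as well as IoU and NMS computations. The accuracy required of each network increases along the cascade: O-Net > R-Net > P-Net. Here is the classic pipeline figure from the paper: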

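[Figure: MTCNN detection pipeline]
As the figure shows, when MTCNN detects a single face it starts with many candidate boxes and ends with one face box plus 5 landmarks.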

II. Network Architecture

The architectures are as follows:
[Figure: P-Net, R-Net and O-Net architectures]
Code:

import torch
import torch.nn as nn
# Architecture modified from the paper: grouped / depthwise convolutions (Depthwise Conv) reduce computation

class PNet(nn.Module):
    
    def __init__(self):
        super(PNet, self).__init__()
        
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 8, 3, 1),
            nn.PReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(8, 16, 3, 1, groups=4),
            nn.PReLU(),
            nn.Conv2d(16, 32, 3, 1, groups=8),
            nn.PReLU())
        
        self.conv4_1 = nn.Conv2d(32, 1, kernel_size=1, stride=1)
        self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1, stride=1)
    
    def forward(self, x):
        x = self.pre_layer(x)
        cls = torch.sigmoid(self.conv4_1(x))
        offset = self.conv4_2(x)
        return cls, offset


class RNet(nn.Module):
    def __init__(self):
        super(RNet, self).__init__()
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),  # 22*22*32
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2,padding=1),  # 11*11*32
            nn.Conv2d(32, 32, kernel_size=3, stride=1,groups=32),  # depthwise 3x3: 9*9*32
            nn.Conv2d(32,48,kernel_size=1,stride=1), # pointwise 1x1: 9*9*48
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 4*4*48
            nn.Conv2d(48, 64, kernel_size=2, stride=1),  # 3*3*64
            nn.PReLU()

        )
        self.conv4 = nn.Linear(64 * 3 * 3, 128)
        self.prelu4 = nn.PReLU()
        self.conv5_1 = nn.Linear(128, 1)
        self.conv5_2 = nn.Linear(128, 4)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv4(x)
        x = self.prelu4(x)
        cls = torch.sigmoid(self.conv5_1(x))
        offset = self.conv5_2(x)
        return cls, offset


class ONet(nn.Module):
    def __init__(self):
        super(ONet, self).__init__()
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),  # 46*46*32
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 23*23*32
            nn.Conv2d(32, 64, kernel_size=3, stride=1),  # 21*21*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 10*10*64
            nn.Conv2d(64, 64, kernel_size=3, stride=1, groups=64),
            nn.Conv2d(64, 64, 1, ),  # 8*8*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 4*4*64
            nn.Conv2d(64, 128, kernel_size=2, stride=1),  # 3*3*128
            nn.PReLU()
        )
        self.conv5 = nn.Linear(128 * 3 * 3, 256)
        self.prelu5 = nn.PReLU()
        self.conv6_1 = nn.Linear(256, 1)
        self.conv6_2 = nn.Linear(256, 4)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv5(x)
        x = self.prelu5(x)
        cls = torch.sigmoid(self.conv6_1(x))
        offset = self.conv6_2(x)
        return cls, offset


# ONet variant with an additional landmark head (5 points = 10 values)
class ONet_(nn.Module):
    def __init__(self):
        super(ONet_, self).__init__()
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),  # 46*46*32
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 23*23*32
            nn.Conv2d(32, 64, kernel_size=3, stride=1),  # 21*21*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 10*10*64
            nn.Conv2d(64, 64, kernel_size=3, stride=1, groups=64),
            nn.Conv2d(64, 64, 1, 1),  # 8*8*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 4*4*64
            nn.Conv2d(64, 128, kernel_size=2, stride=1),  # 3*3*128
            nn.PReLU()
        )
        self.conv5 = nn.Linear(128 * 3 * 3, 256)
        self.prelu5 = nn.PReLU()
        self.conv6_1 = nn.Linear(256, 1)
        self.conv6_2 = nn.Linear(256, 4)
        self.conv6_3 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv5(x)
        x = self.prelu5(x)
        cls = torch.sigmoid(self.conv6_1(x))
        offset = self.conv6_2(x)
        landmark = self.conv6_3(x)
        return cls, offset, landmark

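See the code for the details of each architecture; the networks can also be modified further. What matters most is each network's input size. P-Net takes a 12x12x3 input, i.e. the array of a 12x12 three-channel RGB image. But a face in a real photo is almost always larger than 12x12. MTCNN therefore builds an image pyramid: it repeatedly scales the image down, producing a series of images whose shorter side is at least 12, and feeds each of them to P-Net. Because P-Net is fully convolutional, it runs on the whole image at every level, and each cell of its output map corresponds to one 12x12 window (stride 2) of that level. Each forward pass is fast because the network is tiny, but the large number of pyramid levels makes detection slower overall; we won't go into that here.
From this, one might conclude that R-Net can only detect images as small as 24x24 and O-Net 48x48, so that MTCNN cannot detect a face smaller than 48x48 in the original image. That is a misunderstanding.
The image pyramid rescales the image: a 200x200 image, for example, is scaled all the way down to about 12x12. At some level of this process, a face of any size (at least 12x12) fits a 12x12 window and can be detected by P-Net. R-Net and O-Net receive crops taken from the original image and resized to 24x24 and 48x48, so their input sizes do not limit the face size either. In this implementation, the smallest detectable face is about 12x12, set by P-Net's window at scale 1.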
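The pyramid loop can be sketched as follows. `pyramid_scales` and `pnet_map_size` are hypothetical helper names; the factor 0.709 and the minimum side of 12 are taken from the detection code later in this post, and the map-size arithmetic assumes the PNet defined above (a 3x3 conv, a 2x2 max-pool, then two more 3x3 convs).

```python
def pyramid_scales(width, height, factor=0.709, min_size=12):
    """List (scale, scaled_width, scaled_height) for every pyramid level.

    Starts at scale 1 and shrinks by `factor` until the shorter side
    would fall below `min_size` (P-Net's 12x12 input).
    """
    levels = []
    scale = 1.0
    w, h = width, height
    while min(w, h) >= min_size:
        levels.append((scale, w, h))
        scale *= factor
        w, h = int(width * scale), int(height * scale)
    return levels


def pnet_map_size(side):
    """Output-map side of the PNet above for a given input side.

    Equivalent to (side - 12) // 2 + 1: one cell per 12x12 window, stride 2.
    """
    side = side - 2      # 3x3 conv, stride 1
    side = side // 2     # 2x2 max-pool, stride 2 (floor)
    side = side - 2 - 2  # two more 3x3 convs
    return side


for scale, w, h in pyramid_scales(100, 100):
    print(f"scale={scale:.3f} size={w}x{h} pnet_map={pnet_map_size(w)}x{pnet_map_size(h)}")
```

At scale s, output cell (i, j) covers roughly the window from (2j/s, 2i/s) to ((2j+12)/s, (2i+12)/s) in the original image, which is the mapping the detector performs before NMS.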

III. Network Training

1. Loss Function

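MTCNN describes a face through three tasks:
(1) face vs. non-face classification;
(2) bounding-box regression;
(3) facial landmark regression.
Face classification: BCELoss (binary cross-entropy)
Bounding-box regression: MSELoss
Regression of the 5 facial landmarks: MSELoss
(The paper uses cross-entropy and Euclidean losses; BCELoss and MSELoss are the PyTorch counterparts used in this post.)
The combined loss is

$$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^j \, L_i^j$$

where $L_i^j$ is the loss of task $j$ on sample $i$, $\alpha_j$ is the weight of task $j$, and $\beta_i^j \in \{0, 1\}$ indicates whether task $j$ applies to sample $i$ (for example, no box regression is computed for negative samples).

The original paper sets α_det = 1, α_box = 0.5, α_landmark = 0.5 for P-Net and R-Net, and α_landmark = 1 for O-Net with the other weights unchanged. This is a common way to define a joint multi-task loss.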
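To make the role of β concrete, here is a NumPy sketch of the weighted loss. The function and constant names are hypothetical. The label convention (0 = negative, 1 = positive, 2 = part) is taken from the training code later in this post, and the landmark mask follows that code (positives and part samples); in the paper, the landmark loss is computed only on landmark-annotated faces. Each mask must select at least one sample, otherwise the corresponding mean is NaN.

```python
import numpy as np

# Task weights from the paper.
ALPHA_PNET_RNET = {"det": 1.0, "box": 0.5, "landmark": 0.5}
ALPHA_ONET = {"det": 1.0, "box": 0.5, "landmark": 1.0}


def multitask_loss(cls_prob, box_pred, lm_pred, labels, box_gt, lm_gt, alpha):
    """Weighted sum of the three MTCNN losses with beta-style masking.

    cls_prob: (N,) face probability after sigmoid
    box_pred, box_gt: (N, 4) box offsets
    lm_pred, lm_gt: (N, 10) landmark offsets
    labels: (N,) integers, 0 = negative, 1 = positive, 2 = part (assumed)
    """
    eps = 1e-7
    det_mask = labels < 2  # beta_det: negatives and positives
    box_mask = labels > 0  # beta_box: positives and part samples
    lm_mask = labels > 0   # beta_landmark: same choice as the training code

    # Binary cross-entropy (what nn.BCELoss computes)
    p = np.clip(cls_prob[det_mask], eps, 1 - eps)
    y = labels[det_mask].astype(float)
    det_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Mean squared error over all selected coordinates (what nn.MSELoss computes)
    box_loss = np.mean((box_pred[box_mask] - box_gt[box_mask]) ** 2)
    lm_loss = np.mean((lm_pred[lm_mask] - lm_gt[lm_mask]) ** 2)

    return alpha["det"] * det_loss + alpha["box"] * box_loss + alpha["landmark"] * lm_loss


labels = np.array([0, 1, 2])
box_gt = np.zeros((3, 4))
box_gt[1] = 1.0
loss = multitask_loss(np.array([0.5, 0.5, 0.9]), np.zeros((3, 4)), np.zeros((3, 10)),
                      labels, box_gt, np.zeros((3, 10)), ALPHA_ONET)
print(loss)
```

The part sample (label 2) never enters the classification term, and the negative sample (label 0) never enters the regression terms; that is exactly what β expresses.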

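2. Dataset Generation

I used the CelebA dataset, which needs to be preprocessed before use.
Following the paper, training samples are generated in a ratio of positive : negative : part = 1 : 3 : 1.
Samples are assigned by IoU, the ratio of the intersection area to the union area of two overlapping boxes (here, a crop and the ground-truth face box): IoU > 0.65 gives a positive sample, 0.4 < IoU < 0.65 a part sample, and IoU < 0.3 a negative sample; crops with IoU between 0.3 and 0.4 are not used. These ranges can be adjusted according to how the generated data looks. The requirement is that a positive sample contains a complete face, a part sample is still recognizable as a face at a glance, and a negative sample does not look like a face.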
[Figure: IoU]
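The thresholds above can be turned into a small labeling function. This is a sketch, not the author's generation code; `iou` and `label_crop` are hypothetical names, and discarding crops with IoU between 0.3 and 0.4 follows the paper's convention.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box (x1, y1, x2, y2) and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def label_crop(crop, gt_box, pos_thr=0.65, part_thr=0.4, neg_thr=0.3):
    """Return 'positive', 'part', 'negative', or None (ambiguous crop, discard)."""
    score = iou(np.asarray(gt_box, float), np.asarray([crop], float))[0]
    if score > pos_thr:
        return "positive"
    if score > part_thr:
        return "part"
    if score < neg_thr:
        return "negative"
    return None


gt = (0, 0, 100, 100)
for crop in [(10, 0, 110, 100), (40, 0, 140, 100), (50, 0, 150, 100), (200, 200, 300, 300)]:
    print(crop, label_crop(crop, gt))
```

Shifting a 100x100 crop by 10 px keeps IoU at about 0.82 (positive), a 40 px shift gives about 0.43 (part), and a 50 px shift gives exactly 1/3, which falls in the discarded gap.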
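The dataset generation code (**code link**) works as follows: take the center coordinates, width, and height of the ground-truth face box, shift the center by some amount, cut crops of different sizes (W*H), compute each crop's IoU with the ground-truth box, and assign it to the matching sample class. The crops are then resized to 12x12, 24x24, and 48x48 to build the three sample sets. The TXT files store, for each crop, the offset of its center relative to the ground-truth center as well as the offsets of the top-left and bottom-right corners; for O-Net they also store the offsets of the 5 landmarks. With these offsets we can map predictions back to coordinates in the original image.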
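The generation code itself is only linked, so the following is an assumption rather than the author's code: it encodes corner offsets normalized by the crop's side length, which is consistent with how the detection code later in this post decodes them (x1 = crop_x1 + side * offset[0], with landmarks measured from the crop's top-left corner). `encode_offsets` and `decode_offsets` are hypothetical names, and the exact TXT layout is not reproduced.

```python
import numpy as np


def encode_offsets(crop, gt_box, landmarks=None):
    """Normalized regression targets for one square crop.

    crop, gt_box: (x1, y1, x2, y2) in image coordinates; crop is assumed square.
    landmarks: optional (5, 2) array of (x, y) points in image coordinates.
    """
    crop = np.asarray(crop, float)
    side = crop[2] - crop[0]
    box_off = (np.asarray(gt_box, float) - crop) / side
    if landmarks is None:
        return box_off, None
    # Landmarks are encoded relative to the crop's top-left corner.
    lm_off = (np.asarray(landmarks, float) - crop[:2]) / side
    return box_off, lm_off.reshape(-1)


def decode_offsets(crop, box_off, lm_off=None):
    """Inverse of encode_offsets: recover image coordinates from the targets."""
    crop = np.asarray(crop, float)
    side = crop[2] - crop[0]
    box = crop + side * np.asarray(box_off, float)
    if lm_off is None:
        return box, None
    lm = crop[:2] + side * np.asarray(lm_off, float).reshape(5, 2)
    return box, lm


crop = (10, 20, 58, 68)  # a 48x48 crop
gt = (12, 25, 60, 70)
lms = np.array([[20, 30], [40, 30], [30, 45], [22, 55], [38, 55]])
box_off, lm_off = encode_offsets(crop, gt, lms)
print(box_off, lm_off.shape)
```

Normalizing by the crop side keeps the targets roughly scale-independent, so the same network can regress offsets for faces of any size after the crop is resized.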

3. Utility Code

The utility code needed for MTCNN training and detection has three parts: IoU, NMS, and converting boxes to squares. Code:

import numpy as np

def iou(box, boxes, isMin=False):
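    """
    :param box: [4+] -> x1, y1, x2, y2 (extra columns such as conf are ignored)
    :param boxes: [N, 4+] -> one x1, y1, x2, y2 per row
    :param isMin: if True, divide by the smaller of the two areas instead of the union
    :return: [N] IoU between box and each row of boxes
    """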
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    xx1 = np.maximum(boxes[:, 0], box[0])
    yy1 = np.maximum(boxes[:, 1], box[1])
    xx2 = np.minimum(boxes[:, 2], box[2])
    yy2 = np.minimum(boxes[:, 3], box[3])
    w = np.maximum(0, xx2 - xx1)
    h = np.maximum(0, yy2 - yy1)
    inter = w * h
    if isMin:
        ious = np.true_divide(inter, np.minimum(box_area, area))
    else:
        ious = np.true_divide(inter, (box_area+area-inter))
    
    return ious


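def nms(boxes, thresh=0.3, isMin=False):
    """
    Greedy non-maximum suppression.
    :param boxes: [N, 5+] -> x1, y1, x2, y2, conf (extra columns are carried along)
    :param thresh: boxes whose IoU with a kept box is >= thresh are removed
    :param isMin: False for P-Net/R-Net, True for O-Net
    :return: np.array of kept boxes ([x1, y1, x2, y2, conf, ...] * N), highest conf first
    """
    if boxes.shape[0] == 0:
        return np.array([])
    a_boxes = boxes[(-boxes[:, 4]).argsort()]  # sort by confidence, highest first
    r_boxes = []
    while a_boxes.shape[0] > 1:  # more than one box left
        a_box = a_boxes[0]
        b_boxes = a_boxes[1:]
        r_boxes.append(a_box)  # always keep the current highest-confidence box
        # np.where returns the indices of the boxes that overlap it less than thresh
        index = np.where(iou(a_box, b_boxes, isMin) < thresh)
        a_boxes = b_boxes[index]
    if a_boxes.shape[0] > 0:  # keep the last remaining box (was "> 1", which dropped it)
        r_boxes.append(a_boxes[0])

    return np.stack(r_boxes)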


def nms2(boxes, thresh=0.3, isMin=False):
    """
    NMS with the IoU computation inlined. Only boxes whose overlap height with the
    kept box is < 50 and whose intersection area is < 5000 remain candidates;
    all other boxes are suppressed together with the high-IoU ones.
    :param boxes: [N, 5+] -> x1, y1, x2, y2, conf
    :param thresh: IoU condition
    :param isMin: False for P-Net/R-Net, True for O-Net
    :return: np.array of kept boxes ([x1, y1, x2, y2, conf, ...] * N)
    """
    if boxes.shape[0] == 0:
        return np.array([])
    a_boxes = boxes[(-boxes[:, 4]).argsort()]  # sort by confidence, highest first
    r_boxes = []
    while a_boxes.shape[0] > 1:  # more than one box left
        a_box = a_boxes[0]
        b_boxes = a_boxes[1:]
        r_boxes.append(a_box)  # always keep the current highest-confidence box
        box_area = (a_box[2] - a_box[0]) * (a_box[3] - a_box[1])
        area = (b_boxes[:, 2] - b_boxes[:, 0]) * (b_boxes[:, 3] - b_boxes[:, 1])
        xx1 = np.maximum(b_boxes[:, 0], a_box[0])
        yy1 = np.maximum(b_boxes[:, 1], a_box[1])
        xx2 = np.minimum(b_boxes[:, 2], a_box[2])
        yy2 = np.minimum(b_boxes[:, 3], a_box[3])
        w = np.maximum(0, xx2 - xx1)
        h = np.maximum(0, yy2 - yy1)
        inter = w * h
        # One boolean mask instead of chained np.where indices, so that
        # b_boxes, area and inter all refer to the same subset
        candidate = (h < 50) & (inter < 5000)
        b_boxes, area, inter = b_boxes[candidate], area[candidate], inter[candidate]
        if isMin:
            ious = np.true_divide(inter, np.minimum(box_area, area))
        else:
            ious = np.true_divide(inter, (box_area + area - inter))
        a_boxes = b_boxes[ious < thresh]

    if a_boxes.shape[0] > 0:  # keep the last remaining box
        r_boxes.append(a_boxes[0])

    return np.stack(r_boxes)


def convert_to_squre(bbox):
    """
    :param bbox: x1,y1,x2,y2
    :return: square box of x1,y1,x2,y2
    """
    sq_box = bbox.copy()
    if bbox.shape[0] == 0:
        return np.array([])
    
    h = bbox[:, 3] - bbox[:, 1]
    w = bbox[:, 2] - bbox[:, 0]
    
    max_len = np.maximum(h, w)
    
    sq_box[:, 0] = bbox[:, 0] + (w - max_len) * 0.5
    sq_box[:, 1] = bbox[:, 1] + (h - max_len) * 0.5
    sq_box[:, 2] = sq_box[:, 0] + max_len
    sq_box[:, 3] = sq_box[:, 1] + max_len
    
    return sq_box

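Why are these needed?

IoU: the area where two candidate boxes overlap, divided by the area of their union. With isMin=True, the iou function above divides by the smaller box's area instead, so a small box lying inside a larger one gets a high score and is suppressed; the detector uses this variant after O-Net.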

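For an image to be examined, the networks output confidences and box coordinates. Because detection runs on an image pyramid, the networks generate many candidate boxes around each face region. Passing all of them to the next network would only increase detection time, and we want exactly one best box per face, so every candidate that overlaps that best box can be filtered out.
The larger the overlap, the more likely the two boxes cover the same face; the smaller the overlap, the less likely. The IoU threshold is the hyperparameter that enforces this condition. IoU is also used when generating the training samples, with the following ranges:
IoU > 0.65: the crop's center is shifted somewhat, but the shift is small and the crop still contains a complete face, so it can be used as a positive sample.
[Figure: positive sample]
Part samples use 0.4 < IoU < 0.65. As with positives, the crop is still recognizably a face, but it is shifted further. Positive and part samples together are used to train the offsets of all three networks; this is the regression task.
[Figure: part sample]
Negative samples: IoU < 0.3. These crops are certainly not faces. Negative and positive samples are used to train the networks to decide whether a face is present; this is the classification task.
[Figure: negative sample]
NMS (non-maximum suppression): suppress elements that are not maxima, i.e. search for local maxima.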

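Assume there are 6 boxes, ranked by the classifier's face probability from high to low:
probability: blue > red > green
Almost every pair of boxes intersects; every intersecting pair has an IoU value, and a non-intersecting pair has IoU 0.
1. Sort all boxes by confidence: blue 0 > red 5 > greens (numbered 1-4);
2. Keep blue 0 (the top box) fixed and compute the IoU of every other box with it, giving the order green 3 > green 4 > green 2 > red 5 > green 1 = 0 (this can be seen in the figure);
3. Remove the boxes whose IoU exceeds a threshold, e.g. 0.3;
4. Green 3 and green 4 are removed because they overlap blue 0 too much; green 1, green 2, and red 5 remain, and blue 0 is kept;
5. Sort green 1, green 2, and red 5 by confidence: red 5 > green 2 > green 1;
6. Keep red 5 fixed and compute the IoU of the others with it: green 2 > green 1 > 0.3;
7. Green 1 and green 2 are removed as well.
Result: blue and red, exactly what we want.
[Figure: NMS example]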
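The walk-through can be reproduced with a small, self-contained greedy NMS. It returns indices rather than boxes, and the coordinates below are made up to roughly mimic the six-box example (blue 0, greens 1-4, red 5); they are not taken from the original figure.

```python
import numpy as np


def nms(dets, thresh=0.3):
    """Greedy NMS. dets: (N, 5) array of x1, y1, x2, y2, score.

    Returns the indices of the kept boxes, highest score first.
    """
    order = np.argsort(-dets[:, 4])  # step 1: sort by confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))       # the top box always survives
        rest = order[1:]
        # IoU of every remaining box with the top box
        x1 = np.maximum(dets[best, 0], dets[rest, 0])
        y1 = np.maximum(dets[best, 1], dets[rest, 1])
        x2 = np.minimum(dets[best, 2], dets[rest, 2])
        y2 = np.minimum(dets[best, 3], dets[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (dets[best, 2] - dets[best, 0]) * (dets[best, 3] - dets[best, 1])
        area_rest = (dets[rest, 2] - dets[rest, 0]) * (dets[rest, 3] - dets[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou < thresh]   # drop boxes that overlap the kept one too much
    return keep


dets = np.array([
    [0, 0, 100, 100, 0.98],     # 0 blue
    [190, 0, 290, 100, 0.40],   # 1 green
    [210, 10, 310, 110, 0.50],  # 2 green
    [5, 5, 105, 105, 0.60],     # 3 green
    [10, 0, 110, 100, 0.55],    # 4 green
    [200, 0, 300, 100, 0.90],   # 5 red
])
print(nms(dets, 0.3))  # [0, 5]
```

Raising the threshold to 0.9 keeps all six boxes, since no pair here overlaps that much; the threshold alone decides how aggressively duplicates are merged.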
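Converting to squares: MTCNN's cascade runs P-Net -> R-Net -> O-Net, and what flows between the stages is the image plus the boxes detected by the previous network.

However, the boxes (x1, y1, x2, y2) predicted by the networks are not square, while the next network can only examine them accurately if every input has the same fixed size. So we convert each box to a square.
[Figure: converting a box to a square]
Ordinary resize/reshape operations generally change the image's actual resolution and the spatial relationships between pixels, which is not what we want. Moreover, at this point we only have a set of **coordinates of the face regions, not the image itself.** So we turn each box into a square region: the corresponding area of the original image still contains the original box and also captures more complete information around it. In the figure, the red box is a box detected by P-Net and the blue box is the same box after conversion to a square.
[Figure: detection output]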
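In the author's pipeline this step is done with convert_to_squre followed by PIL's crop and resize. The sketch below shows the same idea on a NumPy array so that it stays dependency-free; `to_square` and `crop_square` are hypothetical names, out-of-image areas are zero-padded (as far as we know, PIL's Image.crop also fills them with black), and the nearest-neighbour resize is a simplification of the interpolation PIL uses.

```python
import numpy as np


def to_square(box):
    """Expand (x1, y1, x2, y2) to a square with the same center."""
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2


def crop_square(image, box, out_size):
    """Crop the square around `box` from an H x W (x C) array, resize to out_size."""
    sx, sy, ex, _ = to_square(box)
    side = max(int(round(ex - sx)), 1)
    x1, y1 = int(round(sx)), int(round(sy))
    x2, y2 = x1 + side, y1 + side
    h, w = image.shape[:2]
    canvas = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    # Copy only the part of the square that lies inside the image.
    ix1, iy1 = max(x1, 0), max(y1, 0)
    ix2, iy2 = min(x2, w), min(y2, h)
    if ix2 > ix1 and iy2 > iy1:
        canvas[iy1 - y1:iy2 - y1, ix1 - x1:ix2 - x1] = image[iy1:iy2, ix1:ix2]
    # Nearest-neighbour resize: the aspect ratio is 1:1, so nothing is distorted.
    idx = np.arange(out_size) * side // out_size
    return canvas[idx][:, idx]


img = np.full((100, 80, 3), 255, dtype=np.uint8)
print(to_square((10, 20, 30, 60)))                    # a 20x40 box becomes 40x40
print(crop_square(img, (10, 20, 30, 60), 24).shape)   # ready for R-Net
```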
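4. Training the Networks
P-Net loss: 0.01
R-Net loss: 0.002
O-Net loss: 0.0005
Training time: P-Net 1-2 hours, R-Net 1 hour, O-Net 1.5 hours.
The PyTorch code below trains only O-Net. To train the other two networks, import them instead, change Mydataset to return img_data, cls, offset, and drop landmark_loss from the training loss.
Code: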

import os
import torch
import torch.nn as nn
import torch.optim as opt
from torch.utils.data import DataLoader, Dataset
from PIL import Image
from torchvision import transforms


class Mydataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.dataset = []
        self.dataset.extend(open(os.path.join(path, "positive.txt")).readlines())
        self.dataset.extend(open(os.path.join(path, "negative.txt")).readlines())
        self.dataset.extend(open(os.path.join(path, "part.txt")).readlines())
        self.transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Each line: image path, class label (0 negative / 1 positive / 2 part),
        # 4 box offsets, 10 landmark offsets
        strs = self.dataset[index].strip().split(' ')
        img_path = os.path.join(self.path, strs[0])
        cls = torch.tensor([int(strs[1])], dtype=torch.float32)
        offset = torch.tensor([float(v) for v in strs[2:6]])
        land = torch.tensor([float(v) for v in strs[6:16]])
        img = Image.open(img_path).convert('RGB')  # guard against grayscale/RGBA files
        img_data = self.transform(img)
        return img_data, cls, offset, land


class ONet(nn.Module):
    def __init__(self):
        super(ONet, self).__init__()
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),  # 46*46*32
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 23*23*32
            nn.Conv2d(32, 64, kernel_size=3, stride=1),  # 21*21*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 10*10*64
            nn.Conv2d(64, 64, kernel_size=3, stride=1, groups=64),
            nn.Conv2d(64, 64, 1, 1),  # 8*8*64
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 4*4*64
            nn.Conv2d(64, 128, kernel_size=2, stride=1),  # 3*3*128
            nn.PReLU()
        )
        self.conv5 = nn.Linear(128 * 3 * 3, 256)
        self.prelu5 = nn.PReLU()
        self.conv6_1 = nn.Linear(256, 1)
        self.conv6_2 = nn.Linear(256, 4)
        self.conv6_3 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv5(x)
        x = self.prelu5(x)
        cls = torch.sigmoid(self.conv6_1(x))
        offset = self.conv6_2(x)
        landmark = self.conv6_3(x)
        return cls, offset, landmark


class Trainer:
    def __init__(self, net, save_path, dataset_path):
        if torch.cuda.is_available():
            self.device = torch.device('cuda')
        else:
            self.device = torch.device('cpu')
        print(self.device)
        self.net = net.to(self.device)
        self.save_path = save_path
        self.data_path = dataset_path
        self.cls_fc = nn.BCELoss()
        self.offset_fc = nn.MSELoss()
        self.opt = opt.Adam(self.net.parameters())
        if os.path.exists(self.save_path):
            net.load_state_dict(torch.load(self.save_path, map_location=self.device))
        else:
            print('No parameters')

    def train(self, stop_value, batch_size, epochs=100):
        data_load = DataLoader(Mydataset(self.data_path), batch_size=batch_size, shuffle=True)
        for e in range(epochs):
            for i, (img_data_, cls_, offset_, landmark_) in enumerate(data_load):
                # Labels: cls_ is 0 (negative), 1 (positive) or 2 (part), shape [B, 1]
                img_data_ = img_data_.to(self.device)
                cls_ = cls_.to(self.device)
                offset_ = offset_.to(self.device)
                landmark_ = landmark_.to(self.device)
                # Network outputs
                x, y, z = self.net(img_data_)
                x = x.view(-1, 1)
                y = y.view(-1, 4)
                z = z.view(-1, 10)
                # Classification loss: negatives and positives only (label < 2)
                mask_cls = torch.lt(cls_, 2)
                cls = torch.masked_select(cls_, mask_cls)
                out_cls = torch.masked_select(x, mask_cls)
                cls_loss = self.cls_fc(out_cls, cls)
                # Box offset loss: positives and part samples (label > 0);
                # the [B, 1] mask broadcasts over the 4 offset columns
                mask_off = torch.gt(cls_, 0)
                offset = torch.masked_select(offset_, mask_off)
                out_offset = torch.masked_select(y, mask_off)
                offset_loss = self.offset_fc(out_offset, offset)
                # Landmark loss: same samples as the offset loss
                landmark = torch.masked_select(landmark_, mask_off)
                out_landmark = torch.masked_select(z, mask_off)
                landmark_loss = self.offset_fc(out_landmark, landmark)

                # O-Net weights: alpha_det = 1, alpha_box = 0.5, alpha_landmark = 1
                loss = cls_loss + 0.5 * offset_loss + landmark_loss
                # loss = cls_loss + offset_loss  (P-Net/R-Net, no landmark head)
                self.opt.zero_grad()
                loss.backward()
                self.opt.step()
                print('e:{} i:{} loss:{:.6f} -- cls loss:{:.6f} -- offset loss:{:.6f} -- landmark loss:{:.6f}'.format(
                    e, i, loss.item(), cls_loss.item(), offset_loss.item(), landmark_loss.item()))
                if e % 4 == 0 and i % 500 == 0:
                    torch.save(self.net.state_dict(), self.save_path)
                    print('Saved')
                if loss.item() < stop_value:
                    torch.save(self.net.state_dict(), self.save_path)
                    print('Model parameters saved')
                    return  # stop training entirely, not just the current epoch


if __name__ == '__main__':
    net = ONet()
    t = Trainer(net, './params/Onet_Landmark.pth', r'C:\MTCNN_data\48')
    t.train(0.0005, 256)

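IV. Detection Pipeline
The detection flowchart I drew:
[Figure: detection flowchart]

During detection, the boxes produced by P-Net are offsets defined on a rescaled pyramid level and on P-Net's feature map (after convolution and pooling), so before NMS we must map them back to their true coordinates in the original image.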
[Figure: mapping box coordinates back to the original image]
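The mapping can be isolated in a few lines of NumPy. This sketch mirrors what the detector's __box method does; `pnet_boxes_to_image` is a hypothetical name, while the stride of 2, the 12-pixel window, and the 0.75 threshold come from the detection code below.

```python
import numpy as np


def pnet_boxes_to_image(cls_map, offset_map, scale, thresh=0.75, stride=2, cell=12):
    """Map P-Net outputs at one pyramid level back to original-image boxes.

    cls_map: (H, W) face probabilities; offset_map: (4, H, W) offsets.
    Returns a (K, 5) array of x1, y1, x2, y2, score.
    """
    rows, cols = np.nonzero(cls_map > thresh)
    # Window covered by each output cell in the scaled image,
    # divided by `scale` to undo the pyramid resize.
    x1 = cols * stride / scale
    y1 = rows * stride / scale
    x2 = (cols * stride + cell) / scale
    y2 = (rows * stride + cell) / scale
    side = x2 - x1  # = cell / scale
    off = offset_map[:, rows, cols]
    return np.stack([
        x1 + side * off[0],
        y1 + side * off[1],
        x2 + side * off[2],
        y2 + side * off[3],
        cls_map[rows, cols],
    ], axis=1)


cls_map = np.zeros((3, 3))
cls_map[1, 2] = 0.9
offset_map = np.zeros((4, 3, 3))
offset_map[:, 1, 2] = [0.1, 0.0, -0.1, 0.0]
print(pnet_boxes_to_image(cls_map, offset_map, scale=0.5))
```

At scale 0.5, cell (1, 2) covers pixels (4, 2) to (16, 14) of the scaled image, i.e. (8, 4) to (32, 28) in the original; the offsets then shift the left and right edges by 10% of the 24-pixel side.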
Detection code:

import numpy as np
import time
import torch
from MTCNN_last_version import Tools
from MTCNN_last_version.Tools import convert_to_squre
from MTCNN_last_version.Module import RNet, PNet
from MTCNN_last_version.Train_Onet_Landmark import ONet  # Landmark
from PIL import Image
from PIL import ImageDraw
from torchvision import transforms


class Detector:
    def __init__(self, pp=r'./params/Pnet2048.pth', rp=r'./params/Rnet.pth',
                 op=r'./params/Onet_Landmark.pth', isCuda=torch.cuda.is_available()):
        self.isCuda = isCuda
        self.pnet = PNet()
        self.rnet = RNet()
        self.onet = ONet()
        
        if self.isCuda:
            self.pnet.cuda()
            self.rnet.cuda()
            self.onet.cuda()
        
        self.pnet.eval()
        self.rnet.eval()
        self.onet.eval()
        
        self.pnet.load_state_dict(torch.load(pp, map_location='cpu'))
        self.rnet.load_state_dict(torch.load(rp, map_location='cpu'))
        self.onet.load_state_dict(torch.load(op, map_location='cpu'))
        self.__transfrom = transforms.Compose([transforms.ToTensor(),
                                               transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                                    std=[0.229, 0.224, 0.225])])
    
    def detect(self, image):
        empty = np.array([])
        t1 = time.time()
        p_boxes = self.__pnet_detect(image)
        print('Pbox', p_boxes.shape)
        if p_boxes.shape[0] == 0:
            return empty
        t2 = time.time()
        t_p = t2 - t1
        t3 = time.time()
        r_boxes = self.__rnet_detect(image, p_boxes)
        print('Rbox', r_boxes.shape)
        if r_boxes.shape[0] == 0:
            return empty
        t4 = time.time()
        t_r = t4 - t3
        t5 = time.time()
        o_boxes = self.__onet_detect(image, r_boxes)
        print('Obox', o_boxes.shape)
        if o_boxes.shape[0] == 0:
            return empty
        t6 = time.time()
        t_o = t6 - t5
        t_sum = t_p + t_r + t_o
        print('t_sum :{} t_p: {} t_r: {} t_o: {}'.format(t_sum, t_p, t_r, t_o))
        return o_boxes
    
    def __pnet_detect(self, x):
        boxes = []
        img = x
        w, h = img.size
        min_side = min(w, h)
        scale = 1
        while min_side >= 12:  # P-Net needs at least a 12x12 input
            img_data = self.__transfrom(img)
            if self.isCuda:
                img_data = img_data.cuda()

            img_data.unsqueeze_(0)
            _cls, _offset = self.pnet(img_data)
            # Vectorized selection (faster than looping over positions):
            # _cls is [1, 1, H, W] and _offset is [1, 4, H, W]
            cls = _cls[0, 0].cpu().data      # [H, W]
            offset = _offset[0].cpu().data   # [4, H, W]
            mask = torch.gt(cls, 0.75)
            idxs = torch.nonzero(mask)       # [K, 2] (row, col)
            offset = offset[:, idxs[:, 0], idxs[:, 1]]  # [4, K]
            cls = cls[mask]                  # [K]
            boxes.append(self.__box(idxs, offset, cls, scale))
            scale *= 0.709  # pyramid scale factor
            _w = int(w * scale)
            _h = int(h * scale)
            img = img.resize((_w, _h))
            min_side = min(_w, _h)

        if not boxes:  # image smaller than 12x12
            return np.array([])
        return Tools.nms(np.concatenate(boxes, axis=0), thresh=0.3)
    
    def __box(self, start_index, offset, cls, scale, stride=2, side_len=12):
        # Vectorized: map every selected (row, col) of the P-Net map back to the original image
        _x1 = (start_index[:, 1].float() * stride) / scale  # x (width direction)
        _y1 = (start_index[:, 0].float() * stride) / scale  # y (height direction)
        _x2 = (start_index[:, 1].float() * stride + side_len) / scale
        _y2 = (start_index[:, 0].float() * stride + side_len) / scale
        ow = _x2 - _x1  # 12 / scale
        oh = _y2 - _y1
        x1 = _x1 + ow * offset[0]
        y1 = _y1 + oh * offset[1]
        x2 = _x2 + ow * offset[2]
        y2 = _y2 + oh * offset[3]
        return torch.stack((x1, y1, x2, y2, cls), dim=1).numpy()  # [K, 5]

    def __rnet_detect(self, image, y):
        _img_dataset = []
        pnet_boxes = convert_to_squre(y)
        for _box in pnet_boxes:
            _x1 = int(_box[0])
            _y1 = int(_box[1])
            _x2 = int(_box[2])
            _y2 = int(_box[3])

            img = image.crop((_x1, _y1, _x2, _y2))
            img = img.resize((24, 24))
            img_data = self.__transfrom(img)
            _img_dataset.append(img_data)

        img_dataset = torch.stack(_img_dataset)
        if self.isCuda:
            img_dataset = img_dataset.cuda()
        _cls, _offset = self.rnet(img_dataset)
        _cls = _cls[:, 0].cpu().data.numpy()  # [N]
        offset = _offset.cpu().data.numpy()   # [N, 4]

        # Vectorized: keep every box whose confidence exceeds 0.95
        idx = np.where(_cls > 0.95)[0]
        _box = pnet_boxes[idx]
        _x1 = _box[:, 0]
        _y1 = _box[:, 1]
        _x2 = _box[:, 2]
        _y2 = _box[:, 3]
        ow = _x2 - _x1
        oh = _y2 - _y1

        x1 = _x1 + ow * offset[idx, 0]
        y1 = _y1 + oh * offset[idx, 1]
        x2 = _x2 + ow * offset[idx, 2]
        y2 = _y2 + oh * offset[idx, 3]
        cls = _cls[idx]
        boxes = np.stack((x1, y1, x2, y2, cls), axis=1)

        return Tools.nms(boxes, 0.3)
    
    def __onet_detect(self, image, _rnet_box):
        _img_data = []
        rnet_box = convert_to_squre(_rnet_box)
        for _box in rnet_box:
            _x1 = int(_box[0])
            _y1 = int(_box[1])
            _x2 = int(_box[2])
            _y2 = int(_box[3])
            img = image.crop((_x1, _y1, _x2, _y2))
            img = img.resize((48, 48))
            img_data = self.__transfrom(img)
            _img_data.append(img_data)
        
        img_dataset = torch.stack(_img_data)
        if self.isCuda:
            img_dataset = img_dataset.cuda()
        _cls, _offset, fll_ = self.onet(img_dataset)
        # _cls = _cls[:, 0]
        # _offset_4 = _offset_4[:, :]
        # _offset_10 = _offset_10[:, :]
        _cls = _cls.cpu().data.numpy()
        offset = _offset.cpu().data.numpy()
        fll = fll_.cpu().data.numpy()
        boxes = []
        # idxs = np.where(cls > self.opt.o_cls)[0]
        indexes, _ = np.where(_cls > 0.95)
        # _x1 = (_box[:, 0])
        # _y1 = (_box[:, 1])
        # _x2 = (_box[:, 2])
        # _y2 = (_box[:, 3])
        for idx in indexes:
            _box = rnet_box[idx]
            _x1 = int(_box[0])
            _y1 = int(_box[1])
            _x2 = int(_box[2])
            _y2 = int(_box[3])
            
            ow = _x2 - _x1
            oh = _y2 - _y1
            
            x1 = _x1 + ow * offset[idx][0]
            y1 = _y1 + oh * offset[idx][1]
            x2 = _x2 + ow * offset[idx][2]
            y2 = _y2 + oh * offset[idx][3]
            cls = _cls[idx][0]
            fllx1 = _x1 + ow * fll[idx][0]
            flly1 = _y1 + oh * fll[idx][1]
            fllx2 = _x1 + ow * fll[idx][2]
            flly2 = _y1 + oh * fll[idx][3]
            fllx3 = _x1 + ow * fll[idx][4]
            flly3 = _y1 + oh * fll[idx][5]
            fllx4 = _x1 + ow * fll[idx][6]
            flly4 = _y1 + oh * fll[idx][7]
            fllx5 = _x1 + ow * fll[idx][8]
            flly5 = _y1 + oh * fll[idx][9]
            boxes.append([x1, y1, x2, y2, cls, fllx1, flly1, fllx2, flly2, fllx3, flly3, fllx4, flly4, fllx5, flly5])
        return Tools.nms(np.array(boxes), 0.3, isMin=True)


if __name__ == '__main__':
    t01 = time.time()
    with torch.no_grad() as grad:
        image_file = r'C:\Projects\MTCNN_last_version\211.jpg'
        detect = Detector()
        with Image.open(image_file) as img:
            boxes = detect.detect(img)
            print(boxes.shape)
            imgDraw = ImageDraw.Draw(img)
            for box in boxes:
                x1 = int(box[0])
                y1 = int(box[1])
                x2 = int(box[2])
                y2 = int(box[3])
                '''5 - 14'''
                fllx1 = int(box[5])
                flly1 = int(box[6])
                fllx2 = int(box[7])
                flly2 = int(box[8])
                fllx3 = int(box[9])
                flly3 = int(box[10])
                fllx4 = int(box[11])
                flly4 = int(box[12])
                fllx5 = int(box[13])
                flly5 = int(box[14])
                box2 = [fllx1, flly1, fllx2, flly2, fllx3, flly3, fllx4, flly4, fllx5, flly5]
                for i in range(1, 6):
                    imgDraw.chord((int(box[i*2+3]), int(box[i*2+4]), int(box[i*2+3])+2, int(box[i*2+4])+2),start=0,end=360, width=5, fill='red')
                imgDraw.text((x1, y1-15), str(round(box[4], 5)), fill='red')
                imgDraw.rectangle((x1, y1, x2, y2), outline='red', width=3)
            t02 = time.time()
            print(t02 - t01)
            img.show()

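Improvements:
1. Replacing the for loop with vectorized slicing

# Vectorized version (faster than looping over every position)
cls = _cls[0, 0, :, :]        # [1, 1, H, W] -> [H, W] confidence map
offset = _offset[0, :, :, :]  # [1, 4, H, W] -> [4, H, W] box offsets
mask = torch.gt(cls, 0.75)
idxs = torch.nonzero(mask)    # [K, 2]: (row, col) of every position above the threshold
offset = offset[:, idxs[:, 0], idxs[:, 1]]  # [4, K]
cls = cls[mask]               # [K]

2. Depthwise separable convolution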

class RNet(nn.Module):
    def __init__(self):
        super(RNet, self).__init__()
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),  # 22*22*32
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2,padding=1),  # 11*11*32
            nn.Conv2d(32, 32, kernel_size=3, stride=1,groups=32),  # depthwise 3x3: 9*9*32
            nn.Conv2d(32,48,kernel_size=1,stride=1), # pointwise 1x1: 9*9*48
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # 4*4*48
            nn.Conv2d(48, 64, kernel_size=2, stride=1),  # 3*3*64
            nn.PReLU()

        )
        self.conv4 = nn.Linear(64 * 3 * 3, 128)
        self.prelu4 = nn.PReLU()
        self.conv5_1 = nn.Linear(128, 1)
        self.conv5_2 = nn.Linear(128, 4)

    def forward(self, x):
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv4(x)
        x = self.prelu4(x)
        cls = torch.sigmoid(self.conv5_1(x))
        offset = self.conv5_2(x)
        return cls, offset
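The RNet above replaces a standard 3x3 convolution (32 -> 48 channels) with a depthwise 3x3 convolution (groups=32) followed by a pointwise 1x1 convolution; both map the 11x11 input to 9x9. A quick parameter count shows the saving. `conv_params` is a hypothetical helper that follows nn.Conv2d's weight shape (out_channels, in_channels / groups, k, k) plus one bias per output channel.

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Number of parameters of an nn.Conv2d-style layer."""
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


standard = conv_params(32, 48, 3)                  # one 3x3 conv, 32 -> 48
separable = conv_params(32, 32, 3, groups=32) + conv_params(32, 48, 1)
print(standard, separable)  # 13872 1904
```

The separable pair needs roughly 7x fewer parameters (and proportionally fewer multiply-adds per output pixel), at the cost of mixing channels only in the 1x1 step.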

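I have also attached a PPT that I made.
MTCNN really isn't hard; the process is just tedious, so it is hard to get everything straight at once.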
