Multi-Task Cascaded Neural Network (MTCNN)

sc9876543210

Categories: Deep Learning, pytorch

Article link: https://blog.csdn.net/sc9876543210/article/details/117433456

I: Detection stage

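[Figure: overview of the MTCNN inference pipeline]
As the figure above shows, MTCNN inference is split into the steps below.
MTCNN can only detect multiple objects of a single class; it cannot detect several classes.
Note: all code below is for reference only.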

1. P_net

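Build an image pyramid from the original image. A scale factor of about 0.7 usually
works well: a larger factor increases inference time, a smaller one may miss small
boxes, so tune it for your own case. Keep scaling until the shorter side is smaller
than 12 (the minimum input of P_net is 12*12), then stop. Feed the image at each scale
into P_net in turn to get a classification confidence and position offsets, filter by
the classification threshold, and map the coordinates back to get the candidate boxes
in the original image. Finally, run NMS to remove some of the overlapping boxes.
(Deduplication can also be done per scale, inside the pyramid loop. That keeps more
candidate boxes and avoids missed detections, but increases inference time.)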
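
The stopping rule for the pyramid can be checked in isolation. The sketch below is only an illustration: `pyramid_scales` is a hypothetical helper, not part of the author's code, and it compares the floating-point side length against 12, whereas the detection loop further down truncates the resized width and height to integers first.

```python
def pyramid_scales(width, height, factor=0.7, min_side=12):
    """Return the scale factors an image pyramid would use.

    Scaling stops once the shorter side would drop below min_side,
    the smallest input P_net accepts.
    """
    scales = []
    scale = 1.0
    while min(width, height) * scale >= min_side:
        scales.append(scale)
        scale *= factor
    return scales


# A 100x60 image: the shorter side (60) limits how far we can shrink.
scales = pyramid_scales(100, 60)
print(len(scales), [round(s, 4) for s in scales])
```

A factor closer to 1 yields more pyramid levels (slower, finer coverage of face sizes); a smaller factor yields fewer levels and larger gaps between the face sizes P_net gets to see.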

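Note: in the original image the horizontal axis is w (the x used below) and the vertical axis is h (y), but a feature map is laid out as N, C, H, W. To recover the x coordinate in the original image, take the last dimension of the feature map, W; to recover the y coordinate, take the second-to-last dimension, H.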

---------------------P_net detection code--------------------------

import numpy as np
import torch
from torchvision import transforms

import tool  # the author's helper module (nms, convert_sqaure, iou), see section 4


def box(indexs, off_set, cond, scale, side_len=12, stride=2):
    # Back-projection: map feature-map hits to boxes in the original image.
    # Note: the stride here is the product of the strides of all layers in the
    # network (it is easiest to think of P_net as a single 12*12 convolution
    # kernel with stride 2).
    return_box = []
    indexs = indexs.detach().numpy()
    off_set = off_set.detach().numpy()
    cond = cond.detach().numpy()

    # indexs[:, 1] is the W index (x); indexs[:, 0] is the H index (y)
    x1 = indexs[:, 1] * stride / scale
    y1 = indexs[:, 0] * stride / scale
    x2 = (indexs[:, 1] * stride + side_len) / scale
    y2 = (indexs[:, 0] * stride + side_len) / scale

    w = x2 - x1
    h = y2 - y1
    offset = off_set[:, indexs[:, 0], indexs[:, 1]]

    _x1 = x1 + w * offset[0]
    _y1 = y1 + h * offset[1]
    _x2 = x2 + w * offset[2]
    _y2 = y2 + h * offset[3]

    for i in range(_x1.shape[0]):
        return_box.append([_x1[i], _y1[i], _x2[i], _y2[i], cond[i]])
    return return_box


transft = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])


def p_detect(self, image):
    # P_net detection (a method of the detector class, which holds
    # self.p_net and self.isCuda)
    boxes = []
    w, h = image.size
    min_side = min(w, h)
    scale = 1.
    while min_side > 12:
        transf_img = transft(image)  # image preprocessing
        if self.isCuda:
            transf_img = transf_img.cuda()
        transf_img.unsqueeze_(0)
        with torch.no_grad():
            _cond, _off_set = self.p_net(transf_img)
        cond = _cond[0][0].cpu()
        off_set = _off_set[0].cpu()

        indexs = torch.nonzero(torch.gt(cond, 0.92))
        # Call the back-projection function above. The result must not be
        # named `box`, or it would shadow that function inside this method.
        scale_boxes = box(indexs, off_set, cond[indexs[:, 0], indexs[:, 1]], scale)
        boxes.extend(scale_boxes)

        # shrink the image for the next pyramid level
        scale *= 0.7
        _w = int(scale * w)
        _h = int(scale * h)
        image = image.resize((_w, _h))
        min_side = min(_w, _h)
    return tool.nms(np.array(boxes), 0.2)

2. R_net

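Take the output of P_net and convert every candidate box into a square (this avoids
distortion and keeps more of the detail around the face). Crop each square from the
original image, resize it to 24*24, and feed it to R_net. After filtering by
classification confidence, back-project the offsets to get the remaining candidate
boxes, then remove duplicates with NMS. Unlike P_net, whose back-projection is relative
to the original image, the back-projection here is relative to the candidate boxes
output by P_net.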
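
Because the offsets are fractions of the candidate box's width and height, the refinement step can also be written for all boxes at once. The following is a minimal NumPy sketch of that idea; `refine_boxes` is a hypothetical name used for illustration only, and the numbers are made up.

```python
import numpy as np


def refine_boxes(boxes, offsets):
    """Apply predicted offsets to the candidate boxes they were predicted for.

    boxes:   (N, 4) array of x1, y1, x2, y2 (the squares that were cropped)
    offsets: (N, 4) array, each value a fraction of the box width or height
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    offsets = np.asarray(offsets, dtype=np.float64)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    # x offsets scale with the width, y offsets with the height
    size = np.stack([w, h, w, h], axis=1)
    return boxes + size * offsets


candidates = [[10, 10, 50, 50]]        # one 40x40 candidate box
predicted = [[0.1, -0.05, 0.0, 0.2]]   # illustrative network output
print(refine_boxes(candidates, predicted))
```

The loop version in the code below does the same arithmetic one box at a time.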

---------------------R_net detection code--------------------------

def r_detect(self, p_box, image):
    # R_net detection (a method of the same detector class; it uses
    # self.r_net and self.isCuda)
    r_data_box = []
    pnet_box = tool.convert_sqaure(p_box)
    if len(pnet_box) == 0:  # nothing survived P_net; torch.stack would fail
        return np.array([])
    for _box in pnet_box:
        x1 = int(_box[0])
        y1 = int(_box[1])
        x2 = int(_box[2])
        y2 = int(_box[3])
        img = image.crop((x1, y1, x2, y2))

        img = img.resize((24, 24))
        img_transft = transft(img)
        r_data_box.append(img_transft)
    _r_data_box = torch.stack(r_data_box)
    if self.isCuda:
        _r_data_box = _r_data_box.cuda()
    with torch.no_grad():
        cond, off_set = self.r_net(_r_data_box)
    r_boxes = []
    cond = cond.cpu().detach().numpy()
    off_set = off_set.cpu().detach().numpy()

    # cond has shape (N, 1); keep the rows above the confidence threshold
    idxs, _ = np.where(cond > 0.99)
    for idx in idxs:
        _boxes = pnet_box[idx]
        _x1 = int(_boxes[0])
        _y1 = int(_boxes[1])
        _x2 = int(_boxes[2])
        _y2 = int(_boxes[3])

        w = _x2 - _x1
        h = _y2 - _y1

        # offsets are relative to the P_net candidate box, not the whole image
        x1 = _x1 + w * off_set[idx][0]
        y1 = _y1 + h * off_set[idx][1]
        x2 = _x2 + w * off_set[idx][2]
        y2 = _y2 + h * off_set[idx][3]
        r_boxes.append([x1, y1, x2, y2, cond[idx][0]])
    return tool.nms(np.array(r_boxes), 0.2)

3. O_net

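O_net inference is similar to R_net, except that its NMS removes the case of a small box nested inside a large one (the overlap is divided by the smaller box's area instead of by the union).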
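
To see why the plain IoU is not enough for nested boxes, compare the two overlap measures on one pair. This is a small self-contained sketch; `overlap` is a hypothetical single-pair helper written for this illustration, mirroring the `contain_box` switch of the `iou` utility in section 4.

```python
def overlap(box_a, box_b, contain_box=False):
    """Overlap between two boxes given as (x1, y1, x2, y2).

    contain_box=False: intersection over union (IoU)
    contain_box=True:  intersection over the smaller box's area
    """
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    inter_w = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0)
    inter_h = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0)
    inter = inter_w * inter_h
    if contain_box:
        return inter / min(area_a, area_b)
    return inter / (area_a + area_b - inter)


outer = (0, 0, 10, 10)   # area 100
inner = (2, 2, 6, 6)     # area 16, fully inside `outer`
print(overlap(outer, inner))                    # 0.16
print(overlap(outer, inner, contain_box=True))  # 1.0
```

With the NMS threshold of 0.2 used in the code here, the IoU of 0.16 would let the nested box survive, while the intersection-over-minimum value of 1.0 removes it.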

---------------------O_net detection code--------------------------

def o_detect(self, r_box, image):
    # O_net detection (a method of the same detector class; it uses
    # self.o_net and self.isCuda)
    o_data_box = []
    rnet_boxes = tool.convert_sqaure(r_box)
    if len(rnet_boxes) == 0:  # nothing survived R_net; torch.stack would fail
        return np.array([])
    for _box in rnet_boxes:
        x1 = int(_box[0])
        y1 = int(_box[1])
        x2 = int(_box[2])
        y2 = int(_box[3])

        img = image.crop((x1, y1, x2, y2))
        img = img.resize((48, 48))
        img_transft = transft(img)
        o_data_box.append(img_transft)
    _o_data_box = torch.stack(o_data_box)
    if self.isCuda:
        _o_data_box = _o_data_box.cuda()
    with torch.no_grad():
        cond, off_set = self.o_net(_o_data_box)
    cond = cond.cpu().detach().numpy()
    off_set = off_set.cpu().detach().numpy()
    indexs, _ = np.where(cond > 0.9)  # tune this threshold for your own case
    onet_box = []
    for idx in indexs:
        _box = rnet_boxes[idx]
        _x1 = int(_box[0])
        _y1 = int(_box[1])
        _x2 = int(_box[2])
        _y2 = int(_box[3])

        h = _y2 - _y1
        w = _x2 - _x1

        # offsets are relative to the R_net candidate box
        x1 = _x1 + w * off_set[idx][0]
        y1 = _y1 + h * off_set[idx][1]
        x2 = _x2 + w * off_set[idx][2]
        y2 = _y2 + h * off_set[idx][3]

        onet_box.append([x1, y1, x2, y2, cond[idx][0]])

    # contain_box=True also removes small boxes nested inside larger ones
    return tool.nms(np.array(onet_box), 0.2, contain_box=True)

4. Detection utilities

The detection stage also relies on three common utilities: nms, convert_sqaure, and iou.

##########iou###########

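import numpy as np


def iou(box, boxes, contain_box=False):
    # box: one box (x1, y1, x2, y2, ...); boxes: (N, 4+) array of boxes
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    # corners of the intersection rectangle
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    w = np.maximum(x2 - x1, 0)
    h = np.maximum(y2 - y1, 0)
    inter = w * h
    if contain_box:
        # intersection over the smaller area: catches a box nested in another
        over = np.true_divide(inter, np.minimum(area, areas))
    else:
        # standard intersection over union
        over = np.true_divide(inter, area + areas - inter)
    return over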

################nms##########

def nms(boxes,thresh,contain_box=False):
    if boxes.shape[0]==0:
        return np.array([])
    boxs=boxes[(-boxes[:,4]).argsort()]
    box=[]
    while boxs.shape[0]>1:
        a_box=boxs[0]
        box.append(a_box)
        b_box=boxs[1:]
        boxs=b_box[np.where(iou(a_box,b_box,contain_box)<thresh)]
    if boxs.shape[0]==1:
        box.append(boxs[0])

    return np.stack(box)

##########convert_sqaure##########

def convert_sqaure(boxes):
    # Expand each box to a square around its centre, with side max(w, h).
    if boxes.shape[0] == 0:
        return np.array([])
    sqaure_box = boxes.copy()
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    w = x2 - x1
    h = y2 - y1
    side = np.maximum(w, h)
    # top-left corner: centre minus half the side
    sqaure_box[:, 0] = x1 + 0.5 * w - 0.5 * side
    sqaure_box[:, 1] = y1 + 0.5 * h - 0.5 * side
    # bottom-right corner: top-left plus the side (centre plus half the side)
    sqaure_box[:, 2] = sqaure_box[:, 0] + side
    sqaure_box[:, 3] = sqaure_box[:, 1] + side
    return sqaure_box
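
A quick way to sanity-check a square conversion is to express it through the box centre and side length and run it on one box by hand. The sketch below does that; `to_square` is a hypothetical stand-alone variant written for this check, not the author's `convert_sqaure`.

```python
import numpy as np


def to_square(boxes):
    """Turn (x1, y1, x2, y2, ...) boxes into squares with the same centre.

    The side of each square is max(width, height); extra columns such as the
    confidence are copied through unchanged.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    side = np.maximum(w, h)
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    square = boxes.copy()
    square[:, 0] = cx - 0.5 * side
    square[:, 1] = cy - 0.5 * side
    square[:, 2] = cx + 0.5 * side
    square[:, 3] = cy + 0.5 * side
    return square


# A 20x40 box centred at (20, 40) becomes a 40x40 square with the same centre.
result = to_square([[10, 20, 30, 60, 0.9]])
print(result)
```

Note that the square can extend past the image border (here x1 becomes 0, and a box nearer the edge would go negative), which the cropping step then has to tolerate.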

II: Training data

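Before training, we first need to build the training dataset.
Three kinds of samples are needed: positive samples, negative samples, and part samples.
Positive face data: crops whose box (top-left and bottom-right coordinates) has IOU > 0.7 with the label
Part face data: crops whose box has 0.3 < IOU < 0.7 with the label
Negative face data: crops whose box has IOU < 0.3 with the label
When making the labels, positive samples get category 1, part samples 2, and negative samples 0 (other values work too; the point is to make it easy to pick out each category when computing the loss). Positive and negative samples are used to train the face / non-face classification; positive and part samples are used to regress the coordinates. To keep the format uniform, the offsets of negative samples are all recorded as 0. The offsets are always relative to the candidate box, that is, to the absolute position in the original image of the crop fed to the network. For example, if the label coordinates are (x1, y1, x2, y2) and the crop's absolute coordinates in the original image are (x1', y1', x2', y2'), the offsets are [(x1-x1')/|x2'-x1'|, (y1-y1')/|y2'-y1'|, (x2-x2')/|x2'-x1'|, (y2-y2')/|y2'-y1'|].
Note: the boundaries between the sample types depend on the actual data. Broadly, a positive sample should show a fairly complete target; a part sample must not contain the complete target but must not be empty of it either (otherwise the box regression trains poorly); a negative sample must not show the target at all.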
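
The labelling rule above can be sketched for a single crop as follows. `iou_single` and `make_label` are hypothetical helpers introduced for illustration. The thresholds and the category codes (1, 2, 0) come from the text; how a crop with IOU exactly equal to 0.3 or 0.7 is treated is an assumption of this sketch (it is counted as a part sample).

```python
def iou_single(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def make_label(gt, crop, pos_thresh=0.7, neg_thresh=0.3):
    """Return (category, offsets) for one crop.

    category: 1 = positive, 2 = part, 0 = negative
    offsets:  label corners relative to the crop, divided by crop width/height
    """
    score = iou_single(gt, crop)
    if score < neg_thresh:
        # negatives carry all-zero offsets so every row has the same format
        return 0, (0.0, 0.0, 0.0, 0.0)
    cw = crop[2] - crop[0]
    ch = crop[3] - crop[1]
    offsets = (
        (gt[0] - crop[0]) / cw,
        (gt[1] - crop[1]) / ch,
        (gt[2] - crop[2]) / cw,
        (gt[3] - crop[3]) / ch,
    )
    category = 1 if score > pos_thresh else 2
    return category, offsets


label = (10, 10, 50, 50)
print(make_label(label, (12, 8, 52, 48)))      # slightly shifted crop
print(make_label(label, (100, 100, 140, 140))) # crop far from the face
```

At inference time these offsets are exactly what the networks predict, so applying them to the candidate box recovers the label box.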

III: Training stage

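With detection covered, we turn to training. All three networks have the same task: output a classification label and position offsets. The three networks are independent of one another during training and can be trained separately, with accuracy increasing from one stage to the next.
The inputs of P_net, R_net, and O_net are images of size 12*12*3, 24*24*3, and 48*48*3 respectively. Each crop is obtained by slightly rescaling the label's width and height, shifting the crop centre up, down, left, or right around the label centre, and then resizing to the fixed size: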

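import random

import numpy as np

# label_line: list of label rows; fields 1-4 of each row are read as x1 y1 w h
for i in range(len(label_line)):
    # ground-truth label coordinates
    fields = label_line[i].split()
    x1 = float(fields[1])
    y1 = float(fields[2])
    w = float(fields[3])
    h = float(fields[4])
    x2 = x1 + w
    y2 = y1 + h
    # centre of the ground-truth label
    x0 = (x1 + x2) / 2
    y0 = (y1 + y2) / 2
    # centre of the new crop: shifted by up to 20% of the label size
    x_mid_off = random.randint(int(x0 - 0.2 * w), int(x0 + 0.2 * w))
    y_mid_off = random.randint(int(y0 - 0.2 * h), int(y0 + 0.2 * h))
    # side length of the new crop
    stide = random.randint(int(0.9 * min(w, h)), int(1.1 * max(w, h)))
    # coordinates of the new crop, clamped so they do not go below 0
    x_1 = np.maximum(x_mid_off - 0.5 * stide, 0)
    y_1 = np.maximum(y_mid_off - 0.5 * stide, 0)
    x_2 = np.maximum(x_mid_off + 0.5 * stide, 0)
    y_2 = np.maximum(y_mid_off + 0.5 * stide, 0)
    # offsets of the ground-truth label relative to the new crop (the
    # candidate box); these are the regression targets
    x1_off = (x1 - x_1) / stide
    y1_off = (y1 - y_1) / stide
    x2_off = (x2 - x_2) / stide
    y2_off = (y2 - y_2) / stide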
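
Given the labelling convention from section II (1 = positive, 2 = part, 0 = negative), the two training losses can be computed on different subsets of a batch. The sketch below shows only the masking idea, in NumPy to keep it free of framework details. The choice of binary cross-entropy and mean squared error is an assumption of this sketch (the article does not name the loss functions), and `masked_losses` is a hypothetical helper.

```python
import numpy as np


def masked_losses(category, pred_conf, pred_offset, target_offset):
    """Classification loss on positives + negatives, regression loss on
    positives + part samples.

    category:      (N,) labels, 1 = positive, 2 = part, 0 = negative
    pred_conf:     (N,) predicted face probability in (0, 1)
    pred_offset:   (N, 4) predicted offsets
    target_offset: (N, 4) label offsets
    """
    category = np.asarray(category)
    pred_conf = np.asarray(pred_conf, dtype=np.float64)
    pred_offset = np.asarray(pred_offset, dtype=np.float64)
    target_offset = np.asarray(target_offset, dtype=np.float64)

    cls_mask = category < 2   # positives (1) and negatives (0)
    reg_mask = category > 0   # positives (1) and part samples (2)

    # binary cross-entropy (assumed), with clipping to avoid log(0)
    eps = 1e-7
    p = np.clip(pred_conf[cls_mask], eps, 1 - eps)
    y = category[cls_mask].astype(np.float64)
    cls_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # mean squared error (assumed) on the offsets
    reg_loss = np.mean((pred_offset[reg_mask] - target_offset[reg_mask]) ** 2)
    return cls_loss, reg_loss


cats = [1, 0, 2]
conf = [0.9, 0.1, 0.5]
pred = np.zeros((3, 4))
target = [[0.1] * 4, [0.0] * 4, [0.2] * 4]
cls_loss, reg_loss = masked_losses(cats, conf, pred, target)
print(cls_loss, reg_loss)
```

The part sample's confidence (0.5 here) has no effect on the classification loss, and the negative sample's offsets have no effect on the regression loss. A batch with no sample in one of the masks would need separate handling, since the mean over an empty selection is undefined.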

IV: Drawbacks of MTCNN

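1: The false-positive rate is fairly high (many non-faces are accepted as faces). The main reason is that the networks are shallow and do not fuse image features well enough, though the dataset also plays a part. For this reason a deeper network is usually attached afterwards for recognition. (Detection is rarely needed on its own; recognition is usually required anyway. If detection alone is the goal, a somewhat deeper network can be added after O_net.)
2: Because the proposal boxes are all square, detection of non-square targets is weak.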
