Single-Stage Face Mask Detection with Paddle

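Paddle ships an official face mask detection example, but it is a two-stage pipeline: it first detects faces, then runs a binary classifier on each cropped face.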

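facemask implemented single-stage mask detection on top of YOLO V2, but it is not fast enough on mobile devices and is only practical for server-side deployment. Because training samples were scarce, its measured accuracy is also not very high.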

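FaceMaskDetection is a small mask detection model open-sourced by AIZOO. It takes a 260x260 input, its backbone has only 8 layers, and it has five localization and classification layers, for a total of just 28 convolutional layers. Every convolutional layer has 32, 64, or 128 channels, so the whole model has only about 1.015 million parameters. In practice it is very fast and runs in real time on a CPU, and because the network structure is open, inference time can be reduced further. It comes with models for caffe, pytorch, tensorflow, onnx, and mxnet, covering nearly every platform; the one thing missing was paddle. After some persistent effort I have filled that gap; the code is in FaceMaskDetection.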

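This article takes the simplest route, converting the caffe model to paddle. The conversion is straightforward: after pip install x2paddle, run the following commands.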

    x2paddle --framework=caffe --prototxt=models/face_mask_detection.prototxt --weight=models/face_mask_detection.caffemodel --save_dir=./ --params_merge
    mv inference_model models/paddle

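Next comes adapting the inference code. Paddle offers two ways to feed data: zero-copy and copying. Zero-copy hands the data pointer directly to the network input and skips one copy, so it is somewhat faster. The other thing to watch is preprocessing: the image must be in RGB order, scaled to floats in the range 0-1, and resized to 260x260. See paddle_infer.py for details.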

        # OpenCV loads images as BGR; the model expects RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # original image size
        height, width, _ = img.shape
        # resize to the network input size (260x260)
        image_resized = cv2.resize(img, target_shape)
        # scale pixel values to [0, 1]
        image_np = image_resized / 255.0
        # HWC -> CHW, then add a batch dimension to get NCHW
        image_np = image_np.transpose(2,0,1)
        img = np.expand_dims(image_np,axis=0).copy()
        img = img.astype("float32")
        # feed the input tensor and run inference
        input_names = predictor.get_input_names()
        input_tensor = predictor.get_input_tensor(input_names[0])
        input_tensor.copy_from_cpu(img)
        predictor.zero_copy_run()
        # first output: box regressions, second output: class scores
        output_names = predictor.get_output_names()
        y_bboxes_output = predictor.get_output_tensor(output_names[0])
        y_cls_output = predictor.get_output_tensor(output_names[1])
        y_bboxes_output = y_bboxes_output.copy_to_cpu()
        y_cls_output = y_cls_output.copy_to_cpu()

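Analyzing the model with the method from the article "Deep Learning: Counting the Parameters and FLOPs of Each Layer in a Model" shows about 1M parameters and about 759M FLOPs, which is still somewhat heavy. Optimization should focus on conv2d_1 and conv2d_2, which together account for over 60% of the computation. The hotspots are listed below: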

face_mask_detection.prototxt
layer name           Filter Shape     Output Size        Params   Flops        Ratio
conv2d_0             (32, 3, 3, 3)    (1, 32, 260, 260)  864      58406400     7.693%
conv2d_1             (64, 32, 3, 3)   (1, 64, 130, 130)  18432    311500800    41.029%
conv2d_2             (64, 64, 3, 3)   (1, 64, 65, 65)    36864    155750400    20.514%
conv2d_3             (64, 64, 3, 3)   (1, 64, 33, 33)    36864    40144896     5.288%
conv2d_4             (128, 64, 3, 3)  (1, 128, 17, 17)   73728    21307392     2.806%
conv2d_5             (128, 128, 3, 3) (1, 128, 9, 9)     147456   11943936     1.573%
conv2d_6             (64, 128, 3, 3)  (1, 64, 5, 5)      73728    1843200      0.243%
conv2d_7             (64, 64, 3, 3)   (1, 64, 3, 3)      36864    331776       0.044%
cls_0_insert_conv2d  (64, 64, 3, 3)   (1, 64, 33, 33)    36864    40144896     5.288%
cls_1_insert_conv2d  (64, 128, 3, 3)  (1, 64, 17, 17)    73728    21307392     2.806%
cls_2_insert_conv2d  (64, 128, 3, 3)  (1, 64, 9, 9)      73728    5971968      0.787%
cls_3_insert_conv2d  (64, 64, 3, 3)   (1, 64, 5, 5)      36864    921600       0.121%
cls_4_insert_conv2d  (64, 64, 3, 3)   (1, 64, 3, 3)      36864    331776       0.044%
loc_0_insert_conv2d  (64, 64, 3, 3)   (1, 64, 33, 33)    36864    40144896     5.288%
loc_1_insert_conv2d  (64, 128, 3, 3)  (1, 64, 17, 17)    73728    21307392     2.806%
loc_2_insert_conv2d  (64, 128, 3, 3)  (1, 64, 9, 9)      73728    5971968      0.787%
loc_3_insert_conv2d  (64, 64, 3, 3)   (1, 64, 5, 5)      36864    921600       0.121%
loc_4_insert_conv2d  (64, 64, 3, 3)   (1, 64, 3, 3)      36864    331776       0.044%
cls_0_conv           (8, 64, 3, 3)    (1, 8, 33, 33)     4608     5018112      0.661%
cls_1_conv           (8, 64, 3, 3)    (1, 8, 17, 17)     4608     1331712      0.175%
cls_2_conv           (8, 64, 3, 3)    (1, 8, 9, 9)       4608     373248       0.049%
cls_3_conv           (8, 64, 3, 3)    (1, 8, 5, 5)       4608     115200       0.015%
cls_4_conv           (8, 64, 3, 3)    (1, 8, 3, 3)       4608     41472        0.005%
loc_0_conv           (16, 64, 3, 3)   (1, 16, 33, 33)    9216     10036224     1.322%
loc_1_conv           (16, 64, 3, 3)   (1, 16, 17, 17)    9216     2663424      0.351%
loc_2_conv           (16, 64, 3, 3)   (1, 16, 9, 9)      9216     746496       0.098%
loc_3_conv           (16, 64, 3, 3)   (1, 16, 5, 5)      9216     230400       0.03%
loc_4_conv           (16, 64, 3, 3)   (1, 16, 3, 3)      9216     82944        0.011%
Layers num: 28
Total number of parameters:  1010016
Total number of FLOPs:  759223296
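
The per-layer numbers in this table can be reproduced by hand. The sketch below assumes, as the figures suggest, that the tool counts only convolution weights (no bias) and reports one FLOP per multiply-accumulate at batch size 1. conv_cost is a helper name made up for this example, not part of any tool mentioned here.

```python
def conv_cost(filter_shape, output_size):
    """Return (params, flops) for a bias-free convolution at batch size 1.

    filter_shape: (out_channels, in_channels, kernel_h, kernel_w)
    output_size:  (batch, out_channels, out_h, out_w)
    """
    out_c, in_c, k_h, k_w = filter_shape
    _, _, out_h, out_w = output_size
    params = out_c * in_c * k_h * k_w
    # every weight is used once per output position (one multiply-accumulate)
    flops = params * out_h * out_w
    return params, flops


TOTAL_FLOPS = 759223296  # "Total number of FLOPs" from the table

# the three heaviest backbone layers, copied from the table
layers = {
    "conv2d_0": ((32, 3, 3, 3), (1, 32, 260, 260)),
    "conv2d_1": ((64, 32, 3, 3), (1, 64, 130, 130)),
    "conv2d_2": ((64, 64, 3, 3), (1, 64, 65, 65)),
}

for name, (filter_shape, output_size) in layers.items():
    params, flops = conv_cost(filter_shape, output_size)
    print(f"{name}: params={params} flops={flops} ratio={flops / TOTAL_FLOPS:.3%}")
```

This makes the hotspot visible: conv2d_1 and conv2d_2 combine fairly wide filters with large feature maps, so they dominate the total even though later layers hold more parameters.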

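The author also provides the dataset he compiled, on Baidu Netdisk (extraction code: eyfz). Its distribution is shown in the table below.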

Split (images)    From WIDER Face   From MAFA   Total
Training set      3114              3006        6120
Validation set    780               1059        1839

I wrote a script to visualize the annotations:

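import cv2
import os
import xml.etree.ElementTree as ET
from tqdm import tqdm

def show(split="train", togt=True):
    """Draw the annotated boxes of every image in the `split` folder.

    togt=True writes the result to gt/; togt=False shows it in a window.
    """
    if togt:
        # cv2.imwrite fails silently when the output folder is missing
        os.makedirs("gt", exist_ok=True)
    files = [f for f in os.listdir(split) if f.endswith(".jpg")]
    for file in tqdm(files):
        img = cv2.imread(os.path.join(split, file))
        # the annotation shares the image's base name
        xml_path = os.path.join(split, os.path.splitext(file)[0] + ".xml")
        root = ET.parse(xml_path).getroot()
        for obj in root.findall('object'):
            name = obj.find('name').text
            bbox = obj.find('bndbox')
            xmin = int(bbox.find('xmin').text)
            ymin = int(bbox.find('ymin').text)
            xmax = int(bbox.find('xmax').text)
            ymax = int(bbox.find('ymax').text)
            # box in blue, class label in red (colors are BGR)
            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (255, 0, 0))
            cv2.putText(img, name, (xmin, ymin), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255))
        if togt:
            cv2.imwrite(os.path.join("gt", file), img)
        else:
            cv2.imshow("img", img)
            cv2.waitKey()

show()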

As the output shows, the data covers many angles, and a face covered by a hand is not treated as wearing a mask.
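
Besides looking at the drawn boxes, it can help to count how many boxes each class has. The sketch below parses one annotation with the same ElementTree calls as the script above. The XML string and the class names face and face_mask are made-up sample data for illustration, not copied from the dataset, and count_labels is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotation in the layout the visualization script reads
SAMPLE_XML = """
<annotation>
  <object>
    <name>face_mask</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
  <object>
    <name>face</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>280</xmax><ymax>130</ymax></bndbox>
  </object>
  <object>
    <name>face</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>360</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>
"""


def count_labels(xml_text):
    """Count the <object> entries of one annotation, grouped by class name."""
    root = ET.fromstring(xml_text.strip())
    return Counter(obj.find("name").text for obj in root.findall("object"))


counts = count_labels(SAMPLE_XML)
print(dict(counts))
```

Summing these counters over every XML file in a split gives a quick view of the class balance before training.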

The original author did not release training code, but it can be found in ssd-models.
