Paddle officially provides a face-mask detection example, but it is two-stage: a face detector runs first, and each cropped face is then binary-classified as masked or not.
facemask previously implemented single-stage mask detection based on YOLOv2, but it is not fast enough on mobile and is really only suitable for server-side deployment; due to a lack of training samples, its measured accuracy was also unimpressive.
FaceMaskDetection is a small mask-detection network open-sourced by AIZOO. The input is 260x260, the backbone has only 8 layers, and there are five localization/classification heads, for a total of only 28 convolutional layers. Each conv layer uses 32, 64, or 128 channels, so the whole model has only about 1.015 million parameters. In practice it is very fast, running in real time on a CPU, and since the network structure is open-sourced the latency can be reduced further. The authors provide near-universal framework support: caffe, pytorch, tensorflow, onnx, and mxnet. The one thing missing was paddle; fortunately, after some effort that gap has now been filled, with code available at FaceMaskDetection.
This article takes the simplest route: converting from caffe to paddle. The conversion is trivial; after `pip install x2paddle`, just run the commands below.
x2paddle --framework=caffe --prototxt=models/face_mask_detection.prototxt --weight=models/face_mask_detection.caffemodel --save_dir=./ --params_merge
mv inference_model models/paddle
The next step is adapting the inference code. Paddle offers two input modes, zero-copy and data copy; zero-copy hands the data pointer directly to the network input, saving one copy, and is therefore faster. The other thing to watch is preprocessing: the image must be in RGB, converted to floats in [0, 1], and resized to 260x260. See paddle_infer.py for details:
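The snippet below assumes a `predictor` has already been created with the feed/fetch ops disabled, which is required for `zero_copy_run`. A sketch using the Paddle 1.x `AnalysisConfig` API (the model/params file names are assumptions and depend on the x2paddle version used for the conversion above):

```python
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor

# Load the x2paddle-converted model (file names may vary by x2paddle version)
config = AnalysisConfig("models/paddle/__model__", "models/paddle/__params__")
config.disable_gpu()                     # CPU inference
config.switch_use_feed_fetch_ops(False)  # required for zero_copy_run
predictor = create_paddle_predictor(config)
```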
import cv2
import numpy as np

# Preprocess: BGR -> RGB, resize to 260x260, scale to [0, 1], NCHW float32
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
height, width, _ = img.shape
image_resized = cv2.resize(img, target_shape)  # target_shape = (260, 260)
image_np = image_resized / 255.0
image_np = image_np.transpose(2, 0, 1)  # HWC -> CHW
img = np.expand_dims(image_np, axis=0).copy()
img = img.astype("float32")

# Feed the input tensor and run with zero-copy
input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0])
input_tensor.copy_from_cpu(img)
predictor.zero_copy_run()

# Fetch the anchor-encoded boxes and the class scores
output_names = predictor.get_output_names()
y_bboxes_output = predictor.get_output_tensor(output_names[0])
y_cls_output = predictor.get_output_tensor(output_names[1])
y_bboxes_output = y_bboxes_output.copy_to_cpu()
y_cls_output = y_cls_output.copy_to_cpu()
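The two outputs are still anchor-encoded box regressions and raw class scores: they must be decoded against the anchor grid and filtered with non-maximum suppression (the AIZOO repo ships decode and NMS utilities for this). As an illustration of the NMS step only, a minimal self-contained numpy sketch, independent of the repo's helpers:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.4):
    """Greedy NMS on [xmin, ymin, xmax, ymax] boxes; returns kept indices."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes overlapping the kept one too much
        order = order[1:][iou <= iou_thresh]
    return keep
```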
A per-layer analysis of parameter counts and FLOPs shows the model has about 1M parameters and 759M FLOPs, which is still a bit heavy; conv2d_1 and conv2d_2 should be the focus of optimization. The compute hotspots are listed below:
face_mask_detection.prototxt
| layer name | Filter Shape | Output Size | Params | FLOPs | Ratio |
|---|---|---|---|---|---|
| conv2d_0 | (32, 3, 3, 3) | (1, 32, 260, 260) | 864 | 58406400 | 7.693% |
| conv2d_1 | (64, 32, 3, 3) | (1, 64, 130, 130) | 18432 | 311500800 | 41.029% |
| conv2d_2 | (64, 64, 3, 3) | (1, 64, 65, 65) | 36864 | 155750400 | 20.514% |
| conv2d_3 | (64, 64, 3, 3) | (1, 64, 33, 33) | 36864 | 40144896 | 5.288% |
| conv2d_4 | (128, 64, 3, 3) | (1, 128, 17, 17) | 73728 | 21307392 | 2.806% |
| conv2d_5 | (128, 128, 3, 3) | (1, 128, 9, 9) | 147456 | 11943936 | 1.573% |
| conv2d_6 | (64, 128, 3, 3) | (1, 64, 5, 5) | 73728 | 1843200 | 0.243% |
| conv2d_7 | (64, 64, 3, 3) | (1, 64, 3, 3) | 36864 | 331776 | 0.044% |
| cls_0_insert_conv2d | (64, 64, 3, 3) | (1, 64, 33, 33) | 36864 | 40144896 | 5.288% |
| cls_1_insert_conv2d | (64, 128, 3, 3) | (1, 64, 17, 17) | 73728 | 21307392 | 2.806% |
| cls_2_insert_conv2d | (64, 128, 3, 3) | (1, 64, 9, 9) | 73728 | 5971968 | 0.787% |
| cls_3_insert_conv2d | (64, 64, 3, 3) | (1, 64, 5, 5) | 36864 | 921600 | 0.121% |
| cls_4_insert_conv2d | (64, 64, 3, 3) | (1, 64, 3, 3) | 36864 | 331776 | 0.044% |
| loc_0_insert_conv2d | (64, 64, 3, 3) | (1, 64, 33, 33) | 36864 | 40144896 | 5.288% |
| loc_1_insert_conv2d | (64, 128, 3, 3) | (1, 64, 17, 17) | 73728 | 21307392 | 2.806% |
| loc_2_insert_conv2d | (64, 128, 3, 3) | (1, 64, 9, 9) | 73728 | 5971968 | 0.787% |
| loc_3_insert_conv2d | (64, 64, 3, 3) | (1, 64, 5, 5) | 36864 | 921600 | 0.121% |
| loc_4_insert_conv2d | (64, 64, 3, 3) | (1, 64, 3, 3) | 36864 | 331776 | 0.044% |
| cls_0_conv | (8, 64, 3, 3) | (1, 8, 33, 33) | 4608 | 5018112 | 0.661% |
| cls_1_conv | (8, 64, 3, 3) | (1, 8, 17, 17) | 4608 | 1331712 | 0.175% |
| cls_2_conv | (8, 64, 3, 3) | (1, 8, 9, 9) | 4608 | 373248 | 0.049% |
| cls_3_conv | (8, 64, 3, 3) | (1, 8, 5, 5) | 4608 | 115200 | 0.015% |
| cls_4_conv | (8, 64, 3, 3) | (1, 8, 3, 3) | 4608 | 41472 | 0.005% |
| loc_0_conv | (16, 64, 3, 3) | (1, 16, 33, 33) | 9216 | 10036224 | 1.322% |
| loc_1_conv | (16, 64, 3, 3) | (1, 16, 17, 17) | 9216 | 2663424 | 0.351% |
| loc_2_conv | (16, 64, 3, 3) | (1, 16, 9, 9) | 9216 | 746496 | 0.098% |
| loc_3_conv | (16, 64, 3, 3) | (1, 16, 5, 5) | 9216 | 230400 | 0.03% |
| loc_4_conv | (16, 64, 3, 3) | (1, 16, 3, 3) | 9216 | 82944 | 0.011% |
Layers num: 28
Total number of parameters: 1010016
Total number of FLOPs: 759223296
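The Params and FLOPs columns follow directly from the layer shapes: a k x k convolution with C_in input and C_out output channels has C_out * C_in * k * k weights (bias ignored here), and the FLOPs counted in the table are weights * H_out * W_out, i.e. multiplications only. A quick sketch checking this against the first two rows:

```python
def conv_stats(c_out, c_in, k, h_out, w_out):
    """Parameter and multiply counts for a k x k conv, matching the table above."""
    params = c_out * c_in * k * k
    flops = params * h_out * w_out
    return params, flops

print(conv_stats(32, 3, 3, 260, 260))   # conv2d_0 -> (864, 58406400)
print(conv_stats(64, 32, 3, 130, 130))  # conv2d_1 -> (18432, 311500800)
```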
The author also provides a curated dataset on Baidu Netdisk (extraction code: eyfz), split as shown in the table:
| | From WIDER Face | From MAFA | Total |
|---|---|---|---|
| Training set (images) | 3114 | 3006 | 6120 |
| Validation set (images) | 780 | 1059 | 1839 |
I wrote a small script to visualize the annotations:
import cv2
import os
import xml.etree.ElementTree as ET
from tqdm import tqdm

def show(split="train", togt=True):
    files = os.listdir(split)
    files = [f for f in files if f.endswith("jpg")]
    os.makedirs("gt", exist_ok=True)  # output directory for annotated images
    for file in tqdm(files):
        img = cv2.imread(split + "/" + file)
        # Each image has a Pascal VOC style xml with the same basename
        xml_path = split + "/" + file.replace("jpg", "xml")
        tree = ET.parse(xml_path)
        root = tree.getroot()
        for obj in root.findall('object'):
            name = obj.find('name').text
            bbox = obj.find('bndbox')
            xmin = int(bbox.find('xmin').text)
            ymin = int(bbox.find('ymin').text)
            xmax = int(bbox.find('xmax').text)
            ymax = int(bbox.find('ymax').text)
            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (255, 0, 0))
            cv2.putText(img, name, (xmin, ymin), 1, 1, (0, 0, 255))
        #cv2.imshow("img", img)
        #cv2.waitKey()
        cv2.imwrite("gt/" + file, img)

show()
The annotations cover a wide range of head poses, and even a hand covering the face is not labeled as wearing a mask.
The original author did not release training code, but a training setup can be found in ssd-models.