PaddleOCR 文字检测部分源码学习(4)-模型(2)

本文链接：https://blog.csdn.net/shy2218/article/details/121465110

2021SC@SDUSC
骨干网络
代码位置：ppocr->modeling->backbones->det_mobilenet_v3.py

def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

这各函数的作用是使可除，相当于对数据的一个预处理操作。

#init函数的片段
        if model_name == "large":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, False, 'relu', 1],
                [3, 64, 24, False, 'relu', 2],
                [3, 72, 24, False, 'relu', 1],
                [5, 72, 40, True, 'relu', 2],
                [5, 120, 40, True, 'relu', 1],
                [5, 120, 40, True, 'relu', 1],
                [3, 240, 80, False, 'hardswish', 2],
                [3, 200, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 480, 112, True, 'hardswish', 1],
                [3, 672, 112, True, 'hardswish', 1],
                [5, 672, 160, True, 'hardswish', 2],
                [5, 960, 160, True, 'hardswish', 1],
                [5, 960, 160, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 960
        elif model_name == "small":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, True, 'relu', 2],
                [3, 72, 24, False, 'relu', 2],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', 2],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', 2],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("mode[" + model_name +
                                      "_model] is not implemented!")

这里是对参数进行赋值，根据选择的参数进行赋值。
这是MobileNetV3-Large网络结构
在这里插入图片描述
MobileNetV3-Small

上面两张图是MobileNetV2和MobileNetV3的网络块结构。可以看出，MobileNetV3是综合了以下三种模型的思想：MobileNetV1的深度可分离卷积（depthwise separable convolutions）、MobileNetV2的具有线性瓶颈的逆残差结构(the inverted residual with linear bottleneck)和MnasNet的基于squeeze and excitation结构的轻量级注意力模型。
在这里插入图片描述
MobileNetV2模型中反转残差结构和变量利用了11卷积来构建最后层，以便于拓展到高维的特征空间，虽然对于提取丰富特征进行预测十分重要，但却引入了二外的计算开销与延时。为了在保留高维特征的前提下减小延时，将平均池化前的层移除并用11卷积来计算特征图。特征生成层被移除后，先前用于瓶颈映射的层也不再需要了，这将为减少10ms的开销，在提速15%的同时减小了30m的操作数。
h-swish激活函数，计算公式如下：
在这里插入图片描述
作者发现swish激活函数能够有效提高网络的精度，然而，swish的计算量太大了。作者提出h-swish（hard version of swish）如下所示