II. Model Quantization
We finished pruning the model in the previous part; now we quantize the pruned model.
The code again follows https://github.com/midasklr/yolov5prune/tree/v6.0; the quantization logic lives in export.py. We export in the tflite format so the model can be deployed on Android. TensorFlow Lite is TensorFlow's solution for mobile and IoT edge devices: it provides Java, Python, and C++ API libraries and runs on Android, iOS, Raspberry Pi, and similar hardware.
We use dynamic quantization. It happens after training: the model weights are quantized up front, and the runtime then decides during inference whether to quantize the activations as well, which is why it is called dynamic (quantization may occur at prediction time). See the official post-training quantization guide (reference 2 below) for details.
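As a minimal illustration of what dynamic-range quantization looks like in the TFLite converter API (a standalone sketch with a toy Keras model, not the YOLOv5 code), setting Optimize.DEFAULT without a representative dataset and without supported_types is all it takes:

import tensorflow as tf
from tensorflow import keras

# Toy model, just to keep the sketch self-contained
keras_model = keras.Sequential([keras.layers.Dense(10, input_shape=(4,))])

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range: int8 weights, float activations
tflite_model = converter.convert()
with open('model-dynamic.tflite', 'wb') as fp:
    fp.write(tflite_model)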
To enable dynamic quantization, comment out this line in export_tflite in export.py:
converter.target_spec.supported_types = [tf.float16]
The export_tflite function is shown below. The int8 argument is False here, so the if block is skipped (with int8=True you would get full-integer quantization instead):
def export_tflite(keras_model, im, file, int8, data, ncalib, prefix=colorstr('TensorFlow Lite:')):
    # YOLOv5 TensorFlow Lite export
    try:
        import tensorflow as tf
        LOGGER.info(f'\n{prefix} starting export with tensorflow {tf.__version__}...')
        batch_size, ch, *imgsz = list(im.shape)  # BCHW
        f = str(file).replace('.pt', '-fp16.tflite')
        # Build a converter from the Keras model
        converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
        # Restrict the graph to TensorFlow Lite built-in ops
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
        # float16 weight quantization; commented out so dynamic-range (int8 weight) quantization is used instead
        # converter.target_spec.supported_types = [tf.float16]
        # Enable the default post-training optimization
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        if int8:  # full-integer quantization
            from models.tf import representative_dataset_gen
            dataset = LoadImages(check_dataset(data)['train'], img_size=imgsz, auto=False)  # representative data
            converter.representative_dataset = lambda: representative_dataset_gen(dataset, ncalib)
            converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
            converter.target_spec.supported_types = []
            # Quantize the input and output tensors as well
            converter.inference_input_type = tf.uint8  # or tf.int8
            converter.inference_output_type = tf.uint8  # or tf.int8
            converter.experimental_new_quantizer = False
            f = str(file).replace('.pt', '-int8.tflite')

        tflite_model = converter.convert()
        open(f, "wb").write(tflite_model)
        LOGGER.info(f'{prefix} export success, saved as {f} ({file_size(f):.1f} MB)')
        return f
    except Exception as e:
        LOGGER.info(f'\n{prefix} export failure: {e}')
The arguments in export.py are set as follows: the model file is the pruned last.pt, the input image size is reduced to 320 to cut down computation, and the default for --include is changed to tflite.
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default=ROOT / 'Traffic_Lights_Dataset_Domestic/Traffic_Lights_Dataset_Domestic/Traffic_Lights_Dataset_Domestic.yaml', help='dataset.yaml path')
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'last.pt', help='model.pt path(s)')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[320, 320], help='image (h, w)')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--half', action='store_true', help='FP16 half-precision export')
    parser.add_argument('--inplace', action='store_true', help='set YOLOv5 Detect() inplace=True')
    parser.add_argument('--train', action='store_true', help='model.train() mode')
    parser.add_argument('--optimize', action='store_true', help='TorchScript: optimize for mobile')
    parser.add_argument('--int8', action='store_true', help='CoreML/TF INT8 quantization')
    parser.add_argument('--dynamic', action='store_true', help='ONNX/TF: dynamic axes')
    parser.add_argument('--simplify', action='store_true', help='ONNX: simplify model')
    parser.add_argument('--opset', type=int, default=12, help='ONNX: opset version')
    parser.add_argument('--verbose', action='store_true', help='TensorRT: verbose log')
    parser.add_argument('--workspace', type=int, default=4, help='TensorRT: workspace size (GB)')
    parser.add_argument('--nms', action='store_true', help='TF: add NMS to model')
    parser.add_argument('--agnostic-nms', action='store_true', help='TF: add agnostic NMS to model')
    parser.add_argument('--topk-per-class', type=int, default=100, help='TF.js NMS: topk per class to keep')
    parser.add_argument('--topk-all', type=int, default=100, help='TF.js NMS: topk for all classes to keep')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='TF.js NMS: IoU threshold')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='TF.js NMS: confidence threshold')
    parser.add_argument('--include', nargs='+',
                        default=['tflite'],
                        help='torchscript, onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs')
    opt = parser.parse_args()
    print_args(FILE.stem, opt)
    return opt
Running export.py fails 😵 with: 'NoneType' object has no attribute 'call'.
After a lot of debugging, the error was traced to export.py -> run -> export_saved_model -> tf_model = TFModel(...). TFModel is defined in tf.py, and the failing function is parse_model. The reason is roughly clear: parse_model in tf.py parses the model from its yaml and builds a Keras model (note that it differs from parse_model in yolo.py, which builds a PyTorch model), and the tflite file is exported from that. The original code parses the original architecture, not our pruned one, so the stock parse_model cannot be used directly. Further debugging revealed the root cause:
m_str should normally hold the name of the current module, i.e. a string. When parsing the pruned model, however, m_str turns out to be a class. The failing statement is the one below, which strips the 'nn.' prefix from m_str and prepends TF to obtain the TensorFlow module:
tf_m = eval('TF' + m_str.replace('nn.', ''))
Because m_str is not a string here, replace blows up. The format is wrong because when the yaml was regenerated during pruning, the modules were written out as classes rather than strings.
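For what it's worth, a generic way around this (a hypothetical alternative, not the fix used below) would be to derive the name from the class itself, since Conv.__name__ is 'Conv' and nn.Upsample.__name__ is 'Upsample':

import torch.nn as nn

class Conv:  # stand-in for YOLOv5's Conv module class
    pass

for m in (Conv, nn.Upsample, 'nn.Upsample'):
    m_str = m if isinstance(m, str) else m.__name__  # works for both classes and strings
    print('TF' + m_str.replace('nn.', ''))  # TFConv, TFUpsample, TFUpsample

Instead, the fix below sets m_str explicitly in each branch, which also makes the mapping to the TF* classes easy to read.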
Solution:
The original parse_model cannot parse the pruned model. Fortunately, when the ModelPruned class was defined during pruning, it used the function parse_pruned_model, which can. With a few modifications it goes into tf.py:
def parse_pruned_model(maskbndict, d, ch, model, imgsz):  # model_dict, input_channels(3)
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10} {'module':<40}{'arguments':<30}")
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    ch = [3]
    fromlayer = []  # last module bn layer name
    from_to_map = {}
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
            except NameError:
                pass
        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain
        named_m_base = "model.{}".format(i)
        if m in [Conv]:
            named_m_bn = named_m_base + ".bn"
            bnc = int(maskbndict[named_m_bn].sum())
            c1, c2 = ch[f], bnc
            args = [c1, c2, *args[1:]]
            layertmp = named_m_bn
            if i > 0:
                from_to_map[layertmp] = fromlayer[f]
            fromlayer.append(named_m_bn)
            m_str = 'Conv'  # set m_str by hand
        elif m in [C3Pruned]:
            named_m_cv1_bn = named_m_base + ".cv1.bn"
            named_m_cv2_bn = named_m_base + ".cv2.bn"
            named_m_cv3_bn = named_m_base + ".cv3.bn"
            from_to_map[named_m_cv1_bn] = fromlayer[f]
            from_to_map[named_m_cv2_bn] = fromlayer[f]
            fromlayer.append(named_m_cv3_bn)
            cv1in = ch[f]
            cv1out = int(maskbndict[named_m_cv1_bn].sum())
            cv2out = int(maskbndict[named_m_cv2_bn].sum())
            cv3out = int(maskbndict[named_m_cv3_bn].sum())
            args = [cv1in, cv1out, cv2out, cv3out, n, args[-1]]
            bottle_args = []
            chin = [cv1out]
            c3fromlayer = [named_m_cv1_bn]
            for p in range(n):
                named_m_bottle_cv1_bn = named_m_base + ".m.{}.cv1.bn".format(p)
                named_m_bottle_cv2_bn = named_m_base + ".m.{}.cv2.bn".format(p)
                bottle_cv1in = chin[-1]
                bottle_cv1out = int(maskbndict[named_m_bottle_cv1_bn].sum())
                bottle_cv2out = int(maskbndict[named_m_bottle_cv2_bn].sum())
                chin.append(bottle_cv2out)
                bottle_args.append([bottle_cv1in, bottle_cv1out, bottle_cv2out])
                from_to_map[named_m_bottle_cv1_bn] = c3fromlayer[p]
                from_to_map[named_m_bottle_cv2_bn] = named_m_bottle_cv1_bn
                c3fromlayer.append(named_m_bottle_cv2_bn)
            args.insert(4, bottle_args)
            c2 = cv3out
            n = 1
            from_to_map[named_m_cv3_bn] = [c3fromlayer[-1], named_m_cv2_bn]
            m_str = 'C3Pruned'  # set m_str by hand
        elif m in [SPPFPruned]:
            named_m_cv1_bn = named_m_base + ".cv1.bn"
            named_m_cv2_bn = named_m_base + ".cv2.bn"
            cv1in = ch[f]
            from_to_map[named_m_cv1_bn] = fromlayer[f]
            from_to_map[named_m_cv2_bn] = [named_m_cv1_bn] * 4
            fromlayer.append(named_m_cv2_bn)
            cv1out = int(maskbndict[named_m_cv1_bn].sum())
            cv2out = int(maskbndict[named_m_cv2_bn].sum())
            args = [cv1in, cv1out, cv2out, *args[1:]]
            c2 = cv2out
            m_str = 'SPPFPruned'  # set m_str by hand
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
            inputtmp = [fromlayer[x] for x in f]
            fromlayer.append(inputtmp)
            m_str = 'Concat'  # set m_str by hand
        elif m is Detect:
            from_to_map[named_m_base + ".m.0"] = fromlayer[f[0]]
            from_to_map[named_m_base + ".m.1"] = fromlayer[f[1]]
            from_to_map[named_m_base + ".m.2"] = fromlayer[f[2]]
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
            args.append(imgsz)
            m_str = 'Detect'  # set m_str by hand
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]
            fromtmp = fromlayer[-1]
            fromlayer.append(fromtmp)
            m_str = 'Upsample'  # set m_str by hand

        tf_m = eval('TF' + m_str.replace('nn.', ''))
        m_ = keras.Sequential([tf_m(*args, w=model.model[i][j]) for j in range(n)]) if n > 1 \
            else tf_m(*args, w=model.model[i])  # module
        torch_m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum(x.numel() for x in torch_m_.parameters())  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        LOGGER.info(f'{i:>3}{str(f):>18}{str(n):>3}{np:>10} {t:<40}{str(args):<30}')  # print
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return keras.Sequential(layers), sorted(save), from_to_map
Because the current module m is now a class rather than a string, we simply set m_str by hand in every branch so that the final conversion to the corresponding TensorFlow module succeeds.
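To make the channel bookkeeping concrete, here is a short trace of how the ch list evolves over the first few layers. The channel counts are taken from the model summary further down; this is commentary, not code from the repo:

# i = 0, Conv:     c1 = ch[-1] = 3,  c2 = int(maskbndict['model.0.bn'].sum()) = 30  -> ch = [30]
# i = 1, Conv:     c1 = ch[-1] = 30, c2 = 64                                        -> ch = [30, 64]
# i = 2, C3Pruned: cv1in = ch[-1] = 64, c2 = cv3out = 55                            -> ch = [30, 64, 55]
# ... and so on; a Concat layer's c2 is the sum of its source layers' channels.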
Next, the pruned modules C3Pruned, SPPFPruned, and BottleneckPruned each need a TensorFlow counterpart in tf.py, named TFC3Pruned, TFSPPFPruned, and TFBottleneckPruned. (Keras tensors are NHWC, which is why the concatenations below use axis 3 where the PyTorch originals use dim 1.)
class TFBottleneckPruned(keras.layers.Layer):
    # Pruned standard bottleneck
    def __init__(self, cv1in, cv1out, cv2out, shortcut=True, g=1, w=None):  # ch_in, ch_out, shortcut, groups
        super().__init__()
        self.cv1 = TFConv(cv1in, cv1out, 1, 1, w=w.cv1)
        self.cv2 = TFConv(cv1out, cv2out, 3, 1, g=g, w=w.cv2)
        self.add = shortcut and cv1in == cv2out

    def call(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class TFC3Pruned(keras.layers.Layer):
    # Pruned CSP Bottleneck with 3 convolutions
    def __init__(self, cv1in, cv1out, cv2out, cv3out, bottle_args, n=1, shortcut=True, g=1, w=None):
        super().__init__()
        cv3in = bottle_args[-1][-1]
        self.cv1 = TFConv(cv1in, cv1out, 1, 1, w=w.cv1)
        self.cv2 = TFConv(cv1in, cv2out, 1, 1, w=w.cv2)
        self.cv3 = TFConv(cv3in + cv2out, cv3out, 1, w=w.cv3)
        self.m = keras.Sequential([TFBottleneckPruned(*bottle_args[k], shortcut, g, w=w.m[k]) for k in range(n)])

    def call(self, inputs):
        return self.cv3(tf.concat((self.m(self.cv1(inputs)), self.cv2(inputs)), axis=3))


class TFSPPFPruned(keras.layers.Layer):
    # Pruned Spatial Pyramid Pooling - Fast (SPPF) layer
    def __init__(self, cv1in, cv1out, cv2out, k=5, w=None):
        super().__init__()
        self.cv1 = TFConv(cv1in, cv1out, 1, 1, w=w.cv1)
        self.cv2 = TFConv(cv1out * 4, cv2out, 1, 1, w=w.cv2)
        self.m = keras.layers.MaxPool2D(pool_size=k, strides=1, padding='SAME')

    def call(self, x):
        x = self.cv1(x)
        y1 = self.m(x)
        y2 = self.m(y1)
        return self.cv2(tf.concat([x, y1, y2, self.m(y2)], 3))
TFModel's initializer also needs to change: it takes one extra parameter, maskbndict (the dictionary of pruning masks saved earlier), and calls our modified parse_pruned_model instead of parse_model.
class TFModel:
    def __init__(self, maskbndict, cfg='yolov5s.yaml', ch=3, nc=None, model=None, imgsz=(640, 640)):  # model, channels, classes
        super().__init__()
        self.maskbndict = maskbndict
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict
        else:  # is *.yaml
            import yaml  # for torch hub
            self.yaml_file = Path(cfg).name
            with open(cfg) as f:
                self.yaml = yaml.load(f, Loader=yaml.FullLoader)  # model dict

        # Define model
        if nc and nc != self.yaml['nc']:
            LOGGER.info(f"Overriding {cfg} nc={self.yaml['nc']} with nc={nc}")
            self.yaml['nc'] = nc  # override yaml value
        # self.model, self.savelist = parse_model(deepcopy(self.yaml), ch=[ch], model=model, imgsz=imgsz)
        self.model, self.savelist, self.from_to_map = parse_pruned_model(self.maskbndict, deepcopy(self.yaml), ch=[ch], model=model, imgsz=imgsz)
        # ... rest of the code omitted
Back in export.py, add a maskbndict parameter to export_saved_model and pass it through when constructing TFModel.
def export_saved_model(maskbndict, model, im, file, dynamic,
                       tf_nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45,
                       conf_thres=0.25, keras=False, prefix=colorstr('TensorFlow SavedModel:')):
    # YOLOv5 TensorFlow SavedModel export
    try:
        import tensorflow as tf
        from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
        from models.tf import TFDetect, TFModel

        LOGGER.info(f'\n{prefix} starting export with tensorflow {tf.__version__}...')
        f = str(file).replace('.pt', '_saved_model')
        batch_size, ch, *imgsz = list(im.shape)  # BCHW

        # tf_model = TFModel(cfg=model.yaml, model=model, nc=model.nc, imgsz=imgsz)
        tf_model = TFModel(maskbndict, cfg=model.yaml, model=model, nc=model.nc, imgsz=imgsz)
        # ... rest of the code omitted
Back in the run function where everything starts, the call to export_saved_model likewise gains the maskbndict argument.
# ... preceding code omitted
# TensorFlow Exports
if any((saved_model, pb, tflite, edgetpu, tfjs)):
    if int8 or edgetpu:  # TFLite --int8 bug https://github.com/ultralytics/yolov5/issues/5707
        check_requirements(('flatbuffers==1.12',))  # required before `import tensorflow`
    assert not (tflite and tfjs), 'TFLite and TF.js models must be exported separately, please pass only one type.'
    # model, f[5] = export_saved_model(model, im, file, dynamic, tf_nms=nms or agnostic_nms or tfjs,
    #                                  agnostic_nms=agnostic_nms or tfjs, topk_per_class=topk_per_class,
    #                                  topk_all=topk_all, conf_thres=conf_thres, iou_thres=iou_thres)  # keras model
    model, f[5] = export_saved_model(maskbndict, model, im, file, dynamic, tf_nms=nms or agnostic_nms or tfjs,
                                     agnostic_nms=agnostic_nms or tfjs, topk_per_class=topk_per_class,
                                     topk_all=topk_all, conf_thres=conf_thres, iou_thres=iou_thres)  # keras model
# ... rest of the code omitted
Finally, after the PyTorch model is loaded in run, two lines are added to pull maskbndict out of the checkpoint:
# ... preceding code omitted
# Load PyTorch model
device = select_device(device)
assert not (device.type == 'cpu' and half), '--half only compatible with GPU export, i.e. use --device 0'
model = attempt_load(weights, map_location=device, inplace=True, fuse=True)  # load FP32 model
ckpt = torch.load(weights, map_location=device)  # load checkpoint
maskbndict = ckpt['model'].maskbndict
# ... rest of the code omitted
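For reference, maskbndict is the dictionary the pruning step attached to the model: it maps each BatchNorm layer name to a 0/1 channel mask, and parse_pruned_model recovers the kept channel counts from it. A toy illustration (hypothetical values, not real masks):

import torch

# Hypothetical example: each entry is a 0/1 mask over a BN layer's channels;
# int(mask.sum()) is the number of channels kept after pruning.
maskbndict = {'model.0.bn': torch.tensor([1., 1., 0., 1.])}
kept = {name: int(mask.sum()) for name, mask in maskbndict.items()}
print(kept)  # {'model.0.bn': 3}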
With all the changes in place, export.py runs through: the model parses successfully and we obtain the TFLite model last-fp16.tflite. (The '-fp16' suffix is just the file name hardcoded in export_tflite; with the float16 line commented out, the weights are actually dynamic-range quantized.)
The model summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(1, 320, 320, 3)] 0 []
tf_conv (TFConv) (1, 160, 160, 30) 3360 ['input_1[0][0]']
tf_conv_1 (TFConv) (1, 80, 80, 64) 17536 ['tf_conv[0][0]']
tfc3_pruned (TFC3Pruned) (1, 80, 80, 55) 16743 ['tf_conv_1[0][0]']
tf_conv_7 (TFConv) (1, 40, 40, 86) 42914 ['tfc3_pruned[0][0]']
tfc3_pruned_1 (TFC3Pruned) (1, 40, 40, 67) 93731 ['tf_conv_7[0][0]']
tf_conv_15 (TFConv) (1, 20, 20, 34) 20638 ['tfc3_pruned_1[0][0]']
tfc3_pruned_2 (TFC3Pruned) (1, 20, 20, 39) 504681 ['tf_conv_15[0][0]']
tf_conv_25 (TFConv) (1, 10, 10, 10) 3550 ['tfc3_pruned_2[0][0]']
tfc3_pruned_3 (TFC3Pruned) (1, 10, 10, 13) 664399 ['tf_conv_25[0][0]']
tfsppf_pruned (TFSPPFPruned) (1, 10, 10, 7) 388 ['tfc3_pruned_3[0][0]']
tf_conv_33 (TFConv) (1, 10, 10, 6) 66 ['tfsppf_pruned[0][0]']
tf_upsample (TFUpsample) (1, 20, 20, 6) 0 ['tf_conv_33[0][0]']
tf_concat (TFConcat) (1, 20, 20, 45) 0 ['tf_upsample[0][0]',
'tfc3_pruned_2[0][0]']
tfc3_pruned_4 (TFC3Pruned) (1, 20, 20, 11) 902 ['tf_concat[0][0]']
tf_conv_39 (TFConv) (1, 20, 20, 14) 210 ['tfc3_pruned_4[0][0]']
tf_upsample_1 (TFUpsample) (1, 40, 40, 14) 0 ['tf_conv_39[0][0]']
tf_concat_1 (TFConcat) (1, 40, 40, 81) 0 ['tf_upsample_1[0][0]',
'tfc3_pruned_1[0][0]']
tfc3_pruned_5 (TFC3Pruned) (1, 40, 40, 113) 26350 ['tf_concat_1[0][0]']
tf_conv_45 (TFConv) (1, 20, 20, 43) 43903 ['tfc3_pruned_5[0][0]']
tf_concat_2 (TFConcat) (1, 20, 20, 57) 0 ['tf_conv_45[0][0]',
'tf_conv_39[0][0]']
tfc3_pruned_6 (TFC3Pruned) (1, 20, 20, 165) 31018 ['tf_concat_2[0][0]']
tf_conv_51 (TFConv) (1, 10, 10, 44) 65516 ['tfc3_pruned_6[0][0]']
tf_concat_3 (TFConcat) (1, 10, 10, 50) 0 ['tf_conv_51[0][0]',
'tf_conv_33[0][0]']
tfc3_pruned_7 (TFC3Pruned) (1, 10, 10, 266) 36772 ['tf_concat_3[0][0]']
tf_detect (TFDetect) ((1, 6300, 9), 14769 ['tfc3_pruned_5[0][0]',
[(1, 3, 1600, 9), 'tfc3_pruned_6[0][0]',
(1, 3, 400, 9), 'tfc3_pruned_7[0][0]']
(1, 3, 100, 9)])
==================================================================================================
Total params: 1,587,446
Trainable params: 0
Non-trainable params: 1,587,446
__________________________________________________________________________________________________
The final model comes out at 1.65 MB, which lines up with roughly 1.59 M parameters stored as 8-bit weights under dynamic-range quantization.
The quantized model can be exercised with the detect_pruned.py script; we ran CPU inference on a few test images to spot-check the detections.
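If you want to poke at the .tflite file directly instead of going through detect_pruned.py, a minimal interpreter sketch looks like this (file name and input shape assumed from the export above; no letterboxing or NMS, so it only checks that the graph runs):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='last-fp16.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy NHWC image matching the export size (1, 320, 320, 3)
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])
interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()
pred = interpreter.get_tensor(out['index'])
print(pred.shape)  # expected (1, 6300, 9) per the model summary above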
That completes dynamic quantization of the pruned model.
References:
1.https://blog.csdn.net/qq_41128383/article/details/107112387
2.https://tensorflow.google.cn/lite/performance/post_training_quant?hl=zh-cn