Porting YOLOv7 to the BM1684

流浪诗人Zz

Tags: docker, linux, operations

Original link: https://blog.csdn.net/captain_wangnb/article/details/126178134

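1. Background

To make it easier for customers to use YOLOv7 on the BM1684 platform later on, this post adapts the official, unmodified YOLOv7 model to that platform.

Official repository: https://github.com/WongKinYiu/yolov7
Model download: https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt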

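2. Environment

The BM1684 environment follows the official BMNNSDK2 development manual. For the setup itself, see my other write-up, BMNNSDK2实战记录 (BMNNSDK2 hands-on notes); this post builds directly on that environment. It uses the official docker, configured roughly as follows:

Docker image: bmnnsdk2-bm1684-ubuntu-docker-py37
BMNNSDK2 package: bmnnsdk2_bm1684_v2.7.0_20220531patched.zip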

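3. Migration

3.1 Model migration

Create the following directory structure:

# Directory structure
YOLOv7_object/
`-- model
    |-- download_yolov7_model.sh     # downloads the original model
    |-- gen_bmodel.sh                # generates the fp32 bmodel
    |-- gen_umodel_int8bmodel.sh     # generates the int8 bmodel
    |-- out                          # output directory for bmodels and other artifacts
    |   `-- YOLOv7
    `-- yolov7.pt                    # original model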

3.1.1 Generating the fp32 bmodel

3.1.1.1 Script implementation

The conversion uses the bmnetp tool. The script is as follows:

#!/bin/bash

# Resolve the directory that contains this script.
model_dir=$(dirname "$(readlink -f "$0")")
echo "model path: ${model_dir}"
top_dir=$model_dir/../../..
sdk_dir=$top_dir

export LD_LIBRARY_PATH=${sdk_dir}/lib/bmcompiler:${sdk_dir}/lib/bmlang:${sdk_dir}/lib/thirdparty/x86:${sdk_dir}/lib/bmnn/cmodel
export PATH=$PATH:${sdk_dir}/bmnet/bmnetp

# Create the output directory (the same path that is passed to --outdir below).
outdir=output/YOLOv7
mkdir -p "${outdir}"

# Compile the PyTorch model into an fp32 bmodel with bmnetp.
echo "start model transform......"
python3 -m bmnetp \
       --net_name=yolov7 \
       --target=BM1684 \
       --opt=1 \
       --cmp=true \
       --shapes="[1,3,640,640]" \
       --model="${model_dir}/yolov7.pt" \
       --outdir="${outdir}" \
       --dyn=false
if [ $? -eq 0 ]; then
    echo "Congratulation! Everything is OK!"
else
    echo "Something is wrong, pleae have a check!"
    # Exit codes must be in 0..255; use 1 to signal failure.
    exit 1
fi

Running the script fails with the following error:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
model path: /workspace/examples/YOLOv7_object/model
start model transform......
Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir='', log_prefix=True, mode='compile', model='/workspace/examples/YOLOv7_object/model/yolov7.pt', net_name='yolov7', op_list=False, opt=1, outdir='output/YOLOv7', seed=42, shapes=[[1, 3, 640, 640]], target='BM1684', v=3)
python3 -m bmnetp --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=compile --seed=42
/root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=0 --seed=42
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at inline_container.cc:222] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f990e0c80e7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::string const&) + 0xed (0x7f98fe93accd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::string const&) + 0x21 (0x7f98fe93ad41 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::string const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0x62 (0x7f98ffe322f2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x348b8b4 (0x7f98ffe328b4 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x348d193 (0x7f98ffe34193 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x193 (0x7f98ffe352f3 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #7: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0xad (0x7f98ffe376bd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #8: torch::jit::load(std::string const&, c10::optional<c10::Device>) + 0x54 (0x7f98ffe37794 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x177ed2 (0x7f9911793ed2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
frame #10: bm::check(std::string const&) + 0x2f (0x7f9911797df7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
frame #11: main + 0xdc (0x445195 in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)
frame #12: __libc_start_main + 0xf0 (0x7f98fb57a840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: _start + 0x2a (0x4418aa in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)

Aborted (core dumped)
compile failed, exit code=134
<class 'SystemExit'> 134 <traceback object at 0x7fa01fcd1448>
Something is wrong, pleae have a check!

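My first suspicion was a torch version problem. According to the official YOLOv7 requirements, YOLOv7 needs torch>=1.7.0,!=1.12.0 and torchvision>=0.8.1,!=0.13.0, while the packages installed in the current docker do not satisfy these constraints: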

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# pip list | grep torch
torch                         1.5.0+cpu
torchvision                   0.6.0+cpu
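
The mismatch can also be checked programmatically. The following is a minimal sketch in plain Python, assuming simple dotted release numbers with an optional local suffix such as +cpu. The helper names parse_release and satisfies are hypothetical, and only the >= and != operators that appear in the requirement strings above are handled.

```python
def parse_release(version):
    """Turn '1.5.0+cpu' into (1, 5, 0); the local suffix after '+' is ignored."""
    public = version.split("+", 1)[0]
    numbers = [int(part) for part in public.split(".") if part.isdigit()]
    numbers += [0] * (3 - len(numbers))  # pad '1.12' to (1, 12, 0)
    return tuple(numbers)


def satisfies(version, spec):
    """Check a version against a comma-separated spec using only >= and !=."""
    installed = parse_release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if installed < parse_release(clause[2:]):
                return False
        elif clause.startswith("!="):
            if installed == parse_release(clause[2:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    # Versions reported by `pip list` inside the stock docker.
    print("torch ok:", satisfies("1.5.0+cpu", ">=1.7.0,!=1.12.0"))
    print("torchvision ok:", satisfies("0.6.0+cpu", ">=0.8.1,!=0.13.0"))
```

For anything beyond this quick check, a full version parser would be a safer choice than this simplified comparison.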

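Later, after consulting a colleague, I learned that the original model must first be converted to TorchScript before it can be migrated. This is consistent with the stack trace above, where the failure occurs inside torch::jit::load while looking for archive/constants.pkl. Because of this, the docker environment has to be reconfigured.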
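
A quick way to tell the two kinds of .pt file apart before running the conversion is to look inside the file. The sketch below relies on an assumption derived from the error message: a TorchScript archive is a zip file that contains a constants.pkl record, while an ordinary checkpoint does not. The function name looks_like_torchscript is hypothetical.

```python
import sys
import zipfile


def looks_like_torchscript(path):
    """Return True if `path` is a zip archive that contains a constants.pkl record.

    Assumption: TorchScript archives contain '<archive_name>/constants.pkl',
    which is the record the bmnetp error reports as missing.
    """
    if not zipfile.is_zipfile(path):
        # Some checkpoints are not zip archives at all.
        return False
    with zipfile.ZipFile(path) as archive:
        return any(name.split("/")[-1] == "constants.pkl"
                   for name in archive.namelist())


if __name__ == "__main__":
    # Usage: python check_pt.py yolov7.pt yolov7.torchscript.pt
    for model_path in sys.argv[1:]:
        kind = "TorchScript" if looks_like_torchscript(model_path) else "not TorchScript"
        print(f"{model_path}: {kind}")
```

This only inspects the container layout; it does not load the model or validate its contents.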

3.1.1.2 Docker environment configuration

Below, conda is used to manage the environments inside the docker. First, install conda in the docker described above:

[2022-07-14 20:02:54]  root@bitmain-SYS-4028GR-TR2:/workspace# mkdir miniconda
[2022-07-14 20:03:29]  root@bitmain-SYS-4028GR-TR2:/workspace# cd miniconda/
[2022-07-14 20:03:31]  root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
......
[2022-07-14 20:04:07]  2022-07-14 20:04:15 (3.83 MB/s) - 'Miniconda3-latest-Linux-x86_64.sh' saved [76607678/76607678]
[2022-07-14 20:04:37]  root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# bash Miniconda3-latest-Linux-x86_64.sh 
......
[2022-07-14 20:05:34]  conda config --set auto_activate_base false
[2022-07-14 20:05:34]  
[2022-07-14 20:05:34]  Thank you for installing Miniconda3!

After exiting the docker and entering it again, conda is enabled by default. The (base) prefix on the last line of the log below shows that conda is active:

[2022-07-14 20:05:53]  root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# exit
[2022-07-14 20:06:09]  exit
[2022-07-14 20:06:09]  (base) ningbo.wang@bitmain-SYS-4028GR-TR2:~$docker exec -it ubuntu16.0-py37-wnb bash
[2022-07-14 20:06:12]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda

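For convenience, conda is configured not to activate automatically at startup. The package sources are also switched to mirrors in China, which makes downloads noticeably faster; the Tsinghua mirrors are used here. The configuration commands are: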

[2022-07-14 20:06:18]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set auto_activate_base false
[2022-07-14 20:08:11]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
[2022-07-14 20:08:34]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
[2022-07-14 20:08:45]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set show_channel_urls yes
[2022-07-14 20:09:02]  (base) root@bitmain-SYS-4028GR-TR2:/workspace# pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

Once the configuration above is complete, exit the docker, enter it again, and create and activate the yolov7 environment:

[2022-07-14 20:14:09]  root@bitmain-SYS-4028GR-TR2:/workspace# conda create -n yolov7 python=3.7
......
[2022-07-14 20:16:16]  # To activate this environment, use
[2022-07-14 20:16:16]  #
[2022-07-14 20:16:16]  #     $ conda activate yolov7
[2022-07-14 20:16:16]  #
[2022-07-14 20:16:16]  # To deactivate an active environment, use
[2022-07-14 20:16:16]  #
[2022-07-14 20:16:16]  #     $ conda deactivate
[2022-07-14 20:16:16]  root@bitmain-SYS-4028GR-TR2:/workspace# conda activate yolov7
[2022-07-14 20:17:01]  (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace#

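3.1.1.3 Preparing the YOLOv7 model

Clone the official repository, download the original model, and then convert it to a TorchScript model. The export command and the resulting directory layout are as follows: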

[2022-07-14 21:00:58]  (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# python models/export.py --weights yolov7.pt
......
[2022-07-14 21:01:24]  Export complete (10.26s). Visualize with https://github.com/lutzroeder/netron.

root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# tree -L 1
.
|-- LICENSE.md
|-- README.md
|-- cfg
|-- data
|-- detect.py
|-- figure
|-- hubconf.py
|-- inference
|-- models
|-- requirements.txt
|-- scripts
|-- test.py
|-- tools
|-- train.py
|-- train_aux.py
|-- utils
|-- yolov7.onnx
|-- yolov7.pt
`-- yolov7.torchscript.pt

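3.1.1.4 Model conversion

Next, the model is migrated starting from yolov7.torchscript.pt. This step has to run in the official stock docker environment, i.e. the environment from [3.1.1.1], so stop the docker environment configured above first. Note that gen_bmodel.sh as listed in [3.1.1.1] points --model at yolov7.pt; for this run it presumably has to point at yolov7.torchscript.pt instead. The commands are as follows: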

root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cp yolov7.torchscript.pt ../../examples/YOLOv7_object/model/
root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cd ../../examples/YOLOv7_object/model
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
model path: /workspace/examples/YOLOv7_object/model
start model transform......
......
BMLIB Send Quit Message
Compiling succeeded.
Congratulation! Everything is OK!

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree output/YOLOv7/
output/YOLOv7/
|-- compilation.bmodel
|-- input_ref_data.dat
|-- io_info.dat
`-- output_ref_data.dat

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bm_model.bin --info ./output/YOLOv7/compilation.bmodel
bmodel version: B.2.2
chip: BM1684
create time: Fri Jul 15 10:40:10 2022

==========================================
net 0: [yolov7]  static
------------
stage 0:
input: x.1, [1, 3, 640, 640], float32, scale: 1
output: 756, [1, 3, 80, 80, 85], float32, scale: 1
output: 757, [1, 3, 40, 40, 85], float32, scale: 1
output: 758, [1, 3, 20, 20, 85], float32, scale: 1
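
The three output shapes follow directly from the input size. The sketch below reproduces them under assumptions that are not stated in the post but match the log: three detection heads with strides 8, 16 and 32, three anchors per head, and 80 classes, so that each prediction carries 4 box values, 1 objectness score and 80 class scores (85 in total). The function name yolo_head_shapes is hypothetical.

```python
def yolo_head_shapes(height, width, num_classes=80, num_anchors=3,
                     strides=(8, 16, 32), batch=1):
    """Return the expected raw output shape of each detection head."""
    shapes = []
    for stride in strides:
        if height % stride or width % stride:
            raise ValueError(f"input {height}x{width} is not a multiple of stride {stride}")
        shapes.append((batch, num_anchors, height // stride, width // stride,
                       num_classes + 5))  # 4 box values + 1 objectness + classes
    return shapes


if __name__ == "__main__":
    for shape in yolo_head_shapes(640, 640):
        print(shape)
```

Comparing these values with the bm_model.bin --info output is a cheap sanity check that the exported graph still ends at the raw detection heads.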

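3.1.1.5 Accuracy regression

Next, the official bmrt_test tool is used to run an accuracy regression on the converted model. The comparison succeeds, so the accuracy is as expected: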

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./output/YOLOv7/
[BMRT][deal_with_options:1412] INFO:Loop num: 1
bmcpu init: skip cpu_user_defined
......
[BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
[BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.004891
[BMRT][bmrt_test:1064] INFO:calculate  time(s): 0.084028
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.007568
[BMRT][bmrt_test:1066] INFO:compare    time(s): 0.027697
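
When several bmodels are compared, it can help to pull the timing figures out of the bmrt_test log automatically. The following sketch parses the four timing lines shown above. The regular expression is written against the log format of this particular run and may need adjusting for other SDK versions; the function name parse_bmrt_timings is hypothetical.

```python
import re

# Matches lines such as "... INFO:calculate  time(s): 0.084028".
TIMING_RE = re.compile(
    r"INFO:(load input|calculate|get output|compare)\s+time\(s\):\s*([0-9.]+)"
)


def parse_bmrt_timings(log_text):
    """Return a dict mapping each timing label to its value in seconds."""
    return {name: float(value) for name, value in TIMING_RE.findall(log_text)}


if __name__ == "__main__":
    sample_log = """\
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.004891
[BMRT][bmrt_test:1064] INFO:calculate  time(s): 0.084028
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.007568
[BMRT][bmrt_test:1066] INFO:compare    time(s): 0.027697
"""
    timings = parse_bmrt_timings(sample_log)
    print(timings)
    # Rough single-batch throughput based on the compute time alone.
    print(f"{1.0 / timings['calculate']:.1f} inferences per second")
```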

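At this point the fp32 bmodel has been generated.

3.1.2 Generating the int8 bmodel

Producing the int8 quantized model is somewhat more involved than the fp32 case. Roughly, it takes the steps covered in the subsections below: preparing the quantization dataset, generating the fp32 umodel, generating the int8 umodel, generating the int8 bmodel, and running the accuracy regression.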

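3.1.2.1 Preparing the quantization dataset

The calibration data is based on the coco128 dataset. The preprocessing must stay consistent with YOLOv7's own preprocessing, which mainly means aspect-preserving resizing with border padding (letterboxing) and normalization. The dataset is then converted to an lmdb file. The command runs as follows: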

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# python3 convert_imageset.py --imageset_rootfolder ./coco128/images/train2017/ --imageset_lmdbfolder ./ --image_size  640 --bgr2rgb True --gray False
remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb
remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb Ok!

reading image /workspace/examples/YOLOv7_object/data/coco128/images/train2017/000000000472.jpg
original shape: (226, 640, 3)
save test.jpg done
......

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# tree -L 1
.
|-- coco128
|-- convert_imageset.py
|-- data.mdb
`-- download_coco128.sh
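
The aspect-preserving padding mentioned above can be sketched with plain arithmetic. The code below is an illustration only: convert_imageset.py is not listed in this post, so the centered padding and the rounding used here are assumptions modelled on the usual YOLO letterbox, and letterbox_params is a hypothetical name.

```python
def letterbox_params(src_h, src_w, dst_h=640, dst_w=640):
    """Compute the resize scale and border padding for a letterboxed image."""
    # One scale for both axes keeps the aspect ratio unchanged.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    top, left = pad_h // 2, pad_w // 2  # center the image in the canvas
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        "top": top,
        "bottom": pad_h - top,
        "left": left,
        "right": pad_w - left,
    }


if __name__ == "__main__":
    # The first image in the log above has shape (226, 640, 3).
    print(letterbox_params(226, 640))
```

For the 226x640 image from the log, no resizing is needed and the padding goes entirely above and below the image, which is the kind of result one would check for in the saved test.jpg.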

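To make it easy to check whether the preprocessing is correct, the padded image is also saved to disk (the "save test.jpg done" line in the log above) and compared with the original. The comparison confirmed that the preprocessing is correct.

[Figure: original image compared with the padded image; image not available]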

3.1.2.2 Generating the fp32 umodel

The fp32 umodel is generated with the ufw.tools.pt_to_umodel tool. The command runs as follows:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_fp32umodel.sh
/workspace/examples/YOLOv7_object/model
/usr/local/lib/python3.7/runpy.py:125: RuntimeWarning: 'ufw.tools.pt_to_umodel' found in sys.modules after import of package 'ufw.tools', but prior to execution of 'ufw.tools.pt_to_umodel'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
python3 -m bmnetp --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=GenUmodel
/root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=1
All ops supported.
......
Compiling succeeded.
####################################
Converting Process Done Sucessfully
####################################
fp32umodel done

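3.1.2.3 Generating the int8 umodel

Next, the int8 umodel is produced from the dataset and the fp32 umodel generated in the previous sections. This involves two parts:

(1) Graph optimization of the input floating-point network. This is already included in [3.1.2.2], but it can also be done here.
(2) Quantization of the floating-point network, which yields the int8 network and weight files.

Here only the int8 quantization is performed, without graph optimization, using 200 iterations. The command runs as follows: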

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8umodel.sh
/workspace/examples/YOLOv7_object/model
I0718 10:21:30.706017  6933 common.cpp:62] ufw version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
I0718 10:21:30.706442  6933 calibration_use_pb.cpp:171] calibration-tools version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
......
/usr/bin/dot
I0718 10:46:52.879577  6933 cali_core.cpp:1474] used time=0 hour:25 min:22 sec
I0718 10:46:52.879654  6933 cali_core.cpp:1476] int8 calibration done.
Congratulation! Everything is OK!

# Directory structure
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree compilation_fp32umodel/
compilation_fp32umodel/
|-- io_info.dat
|-- yolov7.fp32umodel -> yolov7.torchscript_bmnetp.fp32umodel
|-- yolov7.int8umodel
|-- yolov7.prototxt -> yolov7.torchscript_bmnetp_test_fp32.prototxt
|-- yolov7.torchscript_bmnetp.fp32umodel
|-- yolov7.torchscript_bmnetp_test_fp32.prototxt
|-- yolov7_deploy_fp32_unique_top.prototxt
`-- yolov7_deploy_int8_unique_top.prototxt

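Note that the official tool has some issues: its actual behavior does not match what the manual describes.

Setting -winograd to either false or true produces no error and exits with status 0. Experiments show that merely passing -winograd enables the option (true), and omitting it leaves the option disabled (false). Both the official manual and the tool's own help text are wrong on this point and need to be updated. Probing -save_test_proto and -graph_transform showed the same behavior.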
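
Given this observed behavior, a wrapper that assembles the command line should emit these switches only when they are meant to be on, and never pass "-winograd false". The sketch below illustrates that idea. The switch names come from the text above, the helper name build_switch_args is hypothetical, and the behavior it encodes is the one observed here, not the documented one.

```python
def build_switch_args(options):
    """Emit '-name' for enabled switches and nothing at all for disabled ones."""
    args = []
    for name, enabled in options.items():
        if not isinstance(enabled, bool):
            raise TypeError(f"{name} must be True or False, got {enabled!r}")
        if enabled:
            args.append(f"-{name}")
    return args


if __name__ == "__main__":
    switches = {"winograd": False, "save_test_proto": True, "graph_transform": False}
    # Only the enabled switch appears in the resulting argument list.
    print(build_switch_args(switches))
```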

3.1.2.4 Generating the int8 bmodel

Next, generate the bmodel that is deployed on the board. The script is roughly as follows:

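#!/bin/bash
# Build the 1-batch int8 bmodel from the int8 umodel.
# -p: do not fail if the directory is left over from an earlier run.
mkdir -p int8model
bmnetu \
    -model compilation_fp32umodel/yolov7_deploy_int8_unique_top.prototxt \
    -weight compilation_fp32umodel/yolov7.int8umodel \
    -outdir=./int8model \
    -cmp true

if [ $? -eq 0 ]; then
    # Keep a copy under a descriptive name for deployment.
    cp ./int8model/compilation.bmodel ./int8model/yolov7_int8_1b.bmodel
    echo "Congratulation! Everything is OK!"
else
    echo "Something is wrong, pleae have a check!"
    # Exit codes must be in 0..255; use 1 to signal failure.
    exit 1
fi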

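Run the script. The command output and the location of the final artifacts are shown below. The mkdir "File exists" message in this captured run is harmless; the int8model directory was left over from an earlier run.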

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8bmodel.sh
/workspace/examples/YOLOv7_object/model
mkdir: cannot create directory 'int8model': File exists
......
============================================================
*** Store bmodel of BMCompiler...
============================================================
BMLIB Send Quit Message
Congratulation! Everything is OK!
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree -L 1 int8model/
int8model/
|-- compilation.bmodel
|-- input_ref_data.dat
|-- io_info.dat
|-- output_ref_data.dat
`-- yolov7_int8_1b.bmodel  # final 1-batch artifact

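3.1.2.5 Accuracy regression

Next, bmrt_test is used again to run an accuracy regression on the converted model. The comparison succeeds, so the accuracy is as expected: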

root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./int8model/
[BMRT][deal_with_options:1412] INFO:Loop num: 1
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1018] INFO:Loading bmodel from [./int8model//compilation.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:982] INFO:pre net num: 0, load net num: 1
[BMRT][show_net_info:1336] INFO: ########################
[BMRT][show_net_info:1337] INFO: NetName: yolov7, Index=0
[BMRT][show_net_info:1339] INFO: ---- stage 0 ----
[BMRT][show_net_info:1347] INFO:   Input 0) 'x.1' shape=[ 1 3 640 480 ] dtype=INT8 scale=127.031
[BMRT][show_net_info:1356] INFO:   Output 0) '756' shape=[ 1 3 80 60 85 ] dtype=INT8 scale=0.198189
[BMRT][show_net_info:1356] INFO:   Output 1) '757' shape=[ 1 3 40 30 85 ] dtype=INT8 scale=0.202178
[BMRT][show_net_info:1356] INFO:   Output 2) '758' shape=[ 1 3 20 15 85 ] dtype=INT8 scale=0.169756
[BMRT][show_net_info:1359] INFO: ########################
[BMRT][bmrt_test:770] INFO:==> running network #0, name: yolov7, loop: 0
[BMRT][bmrt_test:834] INFO:reading input #0, bytesize=921600
[BMRT][bmrt_test:987] INFO:reading output #0, bytesize=1224000
[BMRT][bmrt_test:987] INFO:reading output #1, bytesize=306000
[BMRT][bmrt_test:987] INFO:reading output #2, bytesize=76500
[BMRT][bmrt_test:1019] INFO:net[yolov7] stage[0], launch total time is 32659 us (npu 32530 us, cpu 129 us)
[BMRT][bmrt_test:1022] INFO:+++ The network[yolov7] stage[0] output_data +++
[BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
[BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.000951
[BMRT][bmrt_test:1064] INFO:calculate  time(s): 0.032664
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.001572
[BMRT][bmrt_test:1066] INFO:compare    time(s): 0.005961
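
The byte sizes reported in this log can be cross-checked against the tensor shapes, which is a simple way to confirm which input resolution and data type a bmodel was compiled for. Note that the log shows a 640x480 input for this int8 bmodel, unlike the 640x640 fp32 bmodel above. The sketch below is an illustration; the dtype table and the function name tensor_bytes are assumptions, not part of the SDK.

```python
# Bytes per element for the data types that appear in the logs.
DTYPE_BYTES = {"INT8": 1, "float32": 4}


def tensor_bytes(shape, dtype):
    """Return the buffer size in bytes for a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * DTYPE_BYTES[dtype]


if __name__ == "__main__":
    tensors = {
        "input x.1": (1, 3, 640, 480),
        "output 756": (1, 3, 80, 60, 85),
        "output 757": (1, 3, 40, 30, 85),
        "output 758": (1, 3, 20, 15, 85),
    }
    for name, shape in tensors.items():
        print(name, tensor_bytes(shape, "INT8"))
```

The computed sizes can be compared line by line with the "reading input" and "reading output" bytesize values printed by bmrt_test.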