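1. Background
To make it easier for customers to use YOLOv7 on the BM1684 platform, this post adapts the official native YOLOv7 model.
Official repository: https://github.com/WongKinYiu/yolov7
Model URL: https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt
2. Environment
The BM1684 environment follows the official BMNNSDK2 development manual. For the setup itself, see my separate write-up on BMNNSDK2 in practice; this post builds directly on that environment and uses the official docker image. The rough configuration is:
Docker environment: bmnnsdk2-bm1684-ubuntu-docker-py37
BMNNSDK2 package: bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
3. Migration
3.1 Model migration
Create the following directory structure: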
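# Directory structure
YOLOv7_object/
`-- model
    |-- download_yolov7_model.sh    # download the original model
    |-- gen_bmodel.sh               # generate the fp32 bmodel
    |-- gen_umodel_int8bmodel.sh    # generate the int8 bmodel
    |-- out                         # output directory for bmodels and related files
    |   `-- YOLOv7
    `-- yolov7.pt                   # native model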
3.1.1 Generating the fp32 bmodel
3.1.1.1 Script implementation
The bmnetp tool is used. The script is as follows:
#!/bin/bash
# Resolve the directory that contains this script
model_dir=$(dirname "$(readlink -f "$0")")
echo "model path: ${model_dir}"
top_dir=$model_dir/../../..
sdk_dir=$top_dir
export LD_LIBRARY_PATH=${sdk_dir}/lib/bmcompiler:${sdk_dir}/lib/bmlang:${sdk_dir}/lib/thirdparty/x86:${sdk_dir}/lib/bmnn/cmodel
export PATH=$PATH:${sdk_dir}/bmnet/bmnetp
# Create the output directory (must match --outdir below)
mkdir -p output/YOLOv7
# Convert the model with bmnetp
echo "start model transform......"
python3 -m bmnetp \
    --net_name=yolov7 \
    --target=BM1684 \
    --opt=1 \
    --cmp=true \
    --shapes="[1,3,640,640]" \
    --model="${model_dir}/yolov7.pt" \
    --outdir=output/YOLOv7 \
    --dyn=false
if [ $? -eq 0 ]; then
    echo "Congratulation! Everything is OK!"
else
    echo "Something is wrong, pleae have a check!"
    # Use a positive exit status; "exit -1" is not portable
    exit 1
fi
Running the script fails with the following error:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
model path: /workspace/examples/YOLOv7_object/model
start model transform......
Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir='', log_prefix=True, mode='compile', model='/workspace/examples/YOLOv7_object/model/yolov7.pt', net_name='yolov7', op_list=False, opt=1, outdir='output/YOLOv7', seed=42, shapes=[[1, 3, 640, 640]], target='BM1684', v=3)
python3 -m bmnetp --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=compile --seed=42
/root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=0 --seed=42
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:222] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f990e0c80e7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::string const&) + 0xed (0x7f98fe93accd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::string const&) + 0x21 (0x7f98fe93ad41 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::string const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0x62 (0x7f98ffe322f2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x348b8b4 (0x7f98ffe328b4 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x348d193 (0x7f98ffe34193 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x193 (0x7f98ffe352f3 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #7: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0xad (0x7f98ffe376bd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #8: torch::jit::load(std::string const&, c10::optional<c10::Device>) + 0x54 (0x7f98ffe37794 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x177ed2 (0x7f9911793ed2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
frame #10: bm::check(std::string const&) + 0x2f (0x7f9911797df7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
frame #11: main + 0xdc (0x445195 in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)
frame #12: __libc_start_main + 0xf0 (0x7f98fb57a840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: _start + 0x2a (0x4418aa in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)
Aborted (core dumped)
compile failed, exit code=134
<class 'SystemExit'> 134 <traceback object at 0x7fa01fcd1448>
Something is wrong, pleae have a check!
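My first suspicion was a torch version problem. According to the official YOLOv7 requirements, YOLOv7 needs torch>=1.7.0,!=1.12.0 and torchvision>=0.8.1,!=0.13.0, while the packages in the current docker image do not satisfy them: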
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# pip list | grep torch
torch 1.5.0+cpu
torchvision 0.6.0+cpu
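After consulting a colleague, I learned that the native model must first be converted to TorchScript before it can be migrated. This matches the stack trace above: bmnetp loads the file through torch::jit::load and aborts because archive/constants.pkl is missing from it. Producing the TorchScript model requires a newer torch, so the docker environment has to be reconfigured.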
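Before handing a .pt file to bmnetp, a quick pre-check can save a failed run. The sketch below is only a heuristic derived from the error message above: it treats a file as a TorchScript archive if it is a zip file containing a constants.pkl entry. The function name is my own and is not part of the SDK.

```python
import zipfile


def is_torchscript_archive(path):
    """Heuristic: True if `path` is a zip archive with a constants.pkl entry.

    Based only on the bmnetp error above ("file not found:
    archive/constants.pkl"); this is an assumption, not an official check.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Entries are stored as "<archive name>/constants.pkl"
        return any(name.split("/")[-1] == "constants.pkl"
                   for name in archive.namelist())
```

Calling `is_torchscript_archive("yolov7.pt")` on the native checkpoint would be expected to return False, which is the case that triggered the abort above.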
3.1.1.2 Docker environment configuration
conda is used below to manage environments inside docker. First, install conda in the docker container described above:
[2022-07-14 20:02:54] root@bitmain-SYS-4028GR-TR2:/workspace# mkdir miniconda
[2022-07-14 20:03:29] root@bitmain-SYS-4028GR-TR2:/workspace# cd miniconda/
[2022-07-14 20:03:31] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Minicond...
......
[2022-07-14 20:04:07] 2022-07-14 20:04:15 (3.83 MB/s) - 'Miniconda3-latest-Linux-x86_64.sh' saved [76607678/76607678]
[2022-07-14 20:04:37] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# bash Miniconda3-latest-Linux-x86_64.sh
......
[2022-07-14 20:05:34] conda config --set auto_activate_base false
[2022-07-14 20:05:34]
[2022-07-14 20:05:34] Thank you for installing Miniconda3!
After exiting and re-entering docker, conda is enabled by default. The (base) prefix on the last line of the log below shows that conda is active:
[2022-07-14 20:05:53] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# exit
[2022-07-14 20:06:09] exit
[2022-07-14 20:06:09] (base) ningbo.wang@bitmain-SYS-4028GR-TR2:~$docker exec -it ubuntu16.0-py37-wnb bash
[2022-07-14 20:06:12] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda
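For convenience, conda is configured not to activate automatically at startup. The package sources are also switched to mirrors in China (the Tsinghua mirror here), which makes downloads noticeably faster. The configuration commands are: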
[2022-07-14 20:06:18] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set auto_activate_base false
[2022-07-14 20:08:11] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
[2022-07-14 20:08:34] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
[2022-07-14 20:08:45] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set show_channel_urls yes
[2022-07-14 20:09:02] (base) root@bitmain-SYS-4028GR-TR2:/workspace# pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Once this is configured, exit docker and enter it again, then create and activate a yolov7 environment:
[2022-07-14 20:14:09] root@bitmain-SYS-4028GR-TR2:/workspace# conda create -n yolov7 python=3.7
......
[2022-07-14 20:16:16] # To activate this environment, use
[2022-07-14 20:16:16] #
[2022-07-14 20:16:16] # $ conda activate yolov7
[2022-07-14 20:16:16] #
[2022-07-14 20:16:16] # To deactivate an active environment, use
[2022-07-14 20:16:16] #
[2022-07-14 20:16:16] # $ conda deactivate
[2022-07-14 20:16:16] root@bitmain-SYS-4028GR-TR2:/workspace# conda activate yolov7
[2022-07-14 20:17:01] (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace#
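3.1.1.3 YOLOv7 model preparation
Clone the official repository and download the native model, then convert the native model to a TorchScript model. The export command and the resulting directory structure are:
[2022-07-14 21:00:58] (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# python models/export.py --weights yolov7.pt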
......
[2022-07-14 21:01:24] Export complete (10.26s). Visualize with https://github.com/lutzroeder/netron.
root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# tree -L 1
.
|-- LICENSE.md
|-- README.md
|-- cfg
|-- data
|-- detect.py
|-- figure
|-- hubconf.py
|-- inference
|-- models
|-- requirements.txt
|-- scripts
|-- test.py
|-- tools
|-- train.py
|-- train_aux.py
|-- utils
|-- yolov7.onnx
|-- yolov7.pt
`-- yolov7.torchscript.pt
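3.1.1.4 Model conversion
The migration below is based on yolov7.torchscript.pt. Docker has to be stopped here and the official native docker environment, i.e. the one from [3.1.1.1], used instead. Note that gen_bmodel.sh as listed in [3.1.1.1] passes yolov7.pt, so its --model argument needs to point at yolov7.torchscript.pt for this run. The commands are: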
root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cp yolov7.torchscript.pt ../../examples/YOLOv7_object/model/
root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cd ../../examples/YOLOv7_object/model
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
model path: /workspace/examples/YOLOv7_object/model
start model transform......
......
BMLIB Send Quit Message
Compiling succeeded.
Congratulation! Everything is OK!
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree output/YOLOv7/
output/YOLOv7/
|-- compilation.bmodel
|-- input_ref_data.dat
|-- io_info.dat
`-- output_ref_data.dat
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bm_model.bin --info ./output/YOLOv7/compilation.bmodel
bmodel version: B.2.2
chip: BM1684
create time: Fri Jul 15 10:40:10 2022
==========================================
net 0: [yolov7] static
------------
stage 0:
input: x.1, [1, 3, 640, 640], float32, scale: 1
output: 756, [1, 3, 80, 80, 85], float32, scale: 1
output: 757, [1, 3, 40, 40, 85], float32, scale: 1
output: 758, [1, 3, 20, 20, 85], float32, scale: 1
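The three output shapes reported by bm_model.bin follow directly from the input size. As a sanity check, the sketch below recomputes them, assuming the usual YOLOv7 COCO configuration (strides 8, 16 and 32, 3 anchors per scale, 80 classes plus 4 box coordinates and 1 objectness score). These values are assumptions on my part; the post itself only shows the resulting shapes.

```python
def yolo_head_shapes(height, width, num_classes=80, anchors_per_scale=3,
                     strides=(8, 16, 32), batch=1):
    """Expected raw detection-head shapes for a given input size.

    Assumes each head outputs [batch, anchors, H/stride, W/stride, classes+5].
    """
    per_anchor = num_classes + 5  # 4 box values + 1 objectness + class scores
    return [[batch, anchors_per_scale, height // s, width // s, per_anchor]
            for s in strides]


for shape in yolo_head_shapes(640, 640):
    print(shape)
```

For a 640x640 input this yields [1, 3, 80, 80, 85], [1, 3, 40, 40, 85] and [1, 3, 20, 20, 85], matching the bmodel info above.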
3.1.1.5 Accuracy regression
The official tool is used below to run an accuracy regression on the converted model. The result is as expected:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./output/YOLOv7/
[BMRT][deal_with_options:1412] INFO:Loop num: 1
bmcpu init: skip cpu_user_defined
......
[BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
[BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.004891
[BMRT][bmrt_test:1064] INFO:calculate time(s): 0.084028
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.007568
[BMRT][bmrt_test:1066] INFO:compare time(s): 0.027697
At this point the fp32 bmodel has been generated.
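When this check is run repeatedly, it can help to pull the verdict and timings out of the bmrt_test output automatically. The following is a minimal sketch that relies only on the line formats visible in the log above ("cmp success" and "... time(s): value"); the function name is hypothetical and other SDK versions may print different text.

```python
import re


def summarize_bmrt_log(log_text):
    """Return (passed, timings) parsed from captured bmrt_test output."""
    # The log above prints "cmp success" when outputs match the reference data
    passed = "cmp success" in log_text
    timings = {}
    # Matches lines such as "INFO:load input time(s): 0.004891"
    for match in re.finditer(r"INFO:([a-z ]+?)\s+time\(s\):\s*([0-9.]+)", log_text):
        timings[match.group(1).strip()] = float(match.group(2))
    return passed, timings
```

The log text could be captured with, for example, `bmrt_test --context_dir=./output/YOLOv7/ 2>&1 | tee bmrt.log` and then read from bmrt.log.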
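3.1.2 Generating the int8 bmodel
Producing the int8 quantized model is somewhat more involved than the fp32 case. It roughly consists of the steps covered in the subsections below: preparing a quantization dataset, generating the fp32 umodel, generating the int8 umodel, generating the int8 bmodel, and running an accuracy regression.
3.1.2.1 Quantization dataset preparation
The coco128 dataset is used here. Its preprocessing must stay consistent with the YOLOv7 preprocessing, mainly aspect-ratio-preserving letterbox padding and normalization, and the dataset is converted into an lmdb file. The command is: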
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# python3 convert_imageset.py --imageset_rootfolder ./coco128/images/train2017/ --imageset_lmdbfolder ./ --image_size 640 --bgr2rgb True --gray False
remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb
remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb Ok!
reading image /workspace/examples/YOLOv7_object/data/coco128/images/train2017/000000000472.jpg
original shape: (226, 640, 3)
save test.jpg done
......
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# tree -L 1
.
|-- coco128
|-- convert_imageset.py
|-- data.mdb
`-- download_coco128.sh
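To make it easy to check that the preprocessing is correct, the letterboxed image is written to disk (test.jpg in the log above) and compared with the original. The saved image confirms that the preprocessing is correct.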
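To make the letterbox step concrete, here is a small sketch of the geometry only: how much the image is scaled and how much padding lands on each side. It assumes the image is padded to a full square of the target size with the padding split evenly; convert_imageset.py is not listed in this post, so this is an illustration rather than its actual code.

```python
def letterbox_geometry(orig_h, orig_w, target=640):
    """Scale factor, resized size and per-side padding for a square letterbox."""
    # Keep the aspect ratio: the limiting side determines the scale
    scale = min(target / orig_h, target / orig_w)
    new_h, new_w = round(orig_h * scale), round(orig_w * scale)
    pad_h, pad_w = target - new_h, target - new_w
    top, left = pad_h // 2, pad_w // 2
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        # (top, bottom, left, right)
        "pad": (top, pad_h - top, left, pad_w - left),
    }


print(letterbox_geometry(226, 640))
```

For the 226x640 image from the log above, the scale is 1.0 and 207 rows of padding are added at both the top and the bottom.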
3.1.2.2 fp32 umodel generation
The ufw.tools.pt_to_umodel tool is used to generate the fp32 umodel. The command is:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_fp32umodel.sh
/workspace/examples/YOLOv7_object/model
/usr/local/lib/python3.7/runpy.py:125: RuntimeWarning: 'ufw.tools.pt_to_umodel' found in sys.modules after import of package 'ufw.tools', but prior to execution of 'ufw.tools.pt_to_umodel'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
python3 -m bmnetp --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=GenUmodel
/root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=1
All ops supported.
......
Compiling succeeded.
####################################
Converting Process Done Sucessfully
####################################
fp32umodel done
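3.1.2.3 int8 umodel generation
Next, the int8 umodel is produced from the dataset and the fp32 umodel generated in the previous sections. This involves two parts:
- Graph optimization of the input floating-point network. This was already done in [3.1.2.2], but it can also be done here.
- Quantization of the floating-point network, which yields the int8 network and its weight file.
Here only the int8 quantization is performed, without graph optimization, using 200 iterations. The command is: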
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8umodel.sh
/workspace/examples/YOLOv7_object/model
I0718 10:21:30.706017 6933 common.cpp:62] ufw version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
I0718 10:21:30.706442 6933 calibration_use_pb.cpp:171] calibration-tools version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
......
/usr/bin/dot
I0718 10:46:52.879577 6933 cali_core.cpp:1474] used time=0 hour:25 min:22 sec
I0718 10:46:52.879654 6933 cali_core.cpp:1476] int8 calibration done.
Congratulation! Everything is OK!
# Directory structure
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree compilation_fp32umodel/
compilation_fp32umodel/
|-- io_info.dat
|-- yolov7.fp32umodel -> yolov7.torchscript_bmnetp.fp32umodel
|-- yolov7.int8umodel
|-- yolov7.prototxt -> yolov7.torchscript_bmnetp_test_fp32.prototxt
|-- yolov7.torchscript_bmnetp.fp32umodel
|-- yolov7.torchscript_bmnetp_test_fp32.prototxt
|-- yolov7_deploy_fp32_unique_top.prototxt
`-- yolov7_deploy_int8_unique_top.prototxt
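Note that the official tool has some problems: its actual behavior does not match the manual.
- Setting -winograd to either false or true produces no error and exits with status 0. Experiments show that the mere presence of -winograd enables it (true), and leaving the option out means false. Both the official manual and the tool's own help text are wrong on this point and need to be updated. Probing shows that -save_test_proto and -graph_transform behave the same way.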
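Given this presence-only behavior, one way to avoid accidentally enabling an option is to build the command line so that a switch is emitted only when it should be on, and never as "-name false". The helper below is a sketch of that workaround; its name is hypothetical and it covers only the three switches mentioned above.

```python
def build_switch_args(switches):
    """Turn {"name": bool} into a list containing only the enabled "-name" flags."""
    return [f"-{name}" for name, enabled in switches.items() if enabled]


args = build_switch_args({
    "winograd": True,
    "save_test_proto": False,   # omitted entirely instead of "-save_test_proto false"
    "graph_transform": False,
})
print(args)
```

The resulting list can be appended to the rest of the calibration command in whatever script drives the tool.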
3.1.2.4 int8 bmodel generation
Next, generate the bmodel used for on-board deployment. The script is roughly as follows:
#!/bin/bash
# 1-batch bmodel
# -p avoids a "File exists" error when the directory is already there
mkdir -p int8model
bmnetu \
    -model compilation_fp32umodel/yolov7_deploy_int8_unique_top.prototxt \
    -weight compilation_fp32umodel/yolov7.int8umodel \
    -outdir=./int8model \
    -cmp true
if [ $? -eq 0 ]; then
    # Keep a descriptively named copy of the compiled model
    cp ./int8model/compilation.bmodel ./int8model/yolov7_int8_1b.bmodel
    echo "Congratulation! Everything is OK!"
else
    echo "Something is wrong, pleae have a check!"
    # Use a positive exit status; "exit -1" is not portable
    exit 1
fi
Run the script. The command output and the location of the final artifacts are:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8bmodel.sh
/workspace/examples/YOLOv7_object/model
mkdir: cannot create directory 'int8model': File exists
......
============================================================
*** Store bmodel of BMCompiler...
============================================================
BMLIB Send Quit Message
Congratulation! Everything is OK!
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree -L 1 int8model/
int8model/
|-- compilation.bmodel
|-- input_ref_data.dat
|-- io_info.dat
|-- output_ref_data.dat
`-- yolov7_int8_1b.bmodel # final 1-batch artifact
3.1.2.5 Accuracy regression
The official tool is used below to run an accuracy regression on the converted model. The result is as expected:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./int8model/
[BMRT][deal_with_options:1412] INFO:Loop num: 1
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
[BMRT][load_bmodel:1018] INFO:Loading bmodel from [./int8model//compilation.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:982] INFO:pre net num: 0, load net num: 1
[BMRT][show_net_info:1336] INFO: ########################
[BMRT][show_net_info:1337] INFO: NetName: yolov7, Index=0
[BMRT][show_net_info:1339] INFO: ---- stage 0 ----
[BMRT][show_net_info:1347] INFO: Input 0) 'x.1' shape=[ 1 3 640 480 ] dtype=INT8 scale=127.031
[BMRT][show_net_info:1356] INFO: Output 0) '756' shape=[ 1 3 80 60 85 ] dtype=INT8 scale=0.198189
[BMRT][show_net_info:1356] INFO: Output 1) '757' shape=[ 1 3 40 30 85 ] dtype=INT8 scale=0.202178
[BMRT][show_net_info:1356] INFO: Output 2) '758' shape=[ 1 3 20 15 85 ] dtype=INT8 scale=0.169756
[BMRT][show_net_info:1359] INFO: ########################
[BMRT][bmrt_test:770] INFO:==> running network #0, name: yolov7, loop: 0
[BMRT][bmrt_test:834] INFO:reading input #0, bytesize=921600
[BMRT][bmrt_test:987] INFO:reading output #0, bytesize=1224000
[BMRT][bmrt_test:987] INFO:reading output #1, bytesize=306000
[BMRT][bmrt_test:987] INFO:reading output #2, bytesize=76500
[BMRT][bmrt_test:1019] INFO:net[yolov7] stage[0], launch total time is 32659 us (npu 32530 us, cpu 129 us)
[BMRT][bmrt_test:1022] INFO:+++ The network[yolov7] stage[0] output_data +++
[BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
[BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.000951
[BMRT][bmrt_test:1064] INFO:calculate time(s): 0.032664
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.001572
[BMRT][bmrt_test:1066] INFO:compare time(s): 0.005961
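The byte sizes in this log can be cross-checked against the reported shapes, which is a cheap way to confirm that the tensors really are int8. The sketch below parses a shape from a show_net_info line and multiplies it out, assuming 1 byte per INT8 element; the helper names are my own.

```python
import re


def parse_shape(line):
    """Extract the shape list from a 'shape=[ 1 3 640 480 ]' log fragment."""
    match = re.search(r"shape=\[([0-9 ]+)\]", line)
    if match is None:
        raise ValueError("no shape found in line")
    return [int(value) for value in match.group(1).split()]


def tensor_bytes(shape, itemsize=1):
    """Total size in bytes; itemsize=1 corresponds to INT8 elements."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


line = "[BMRT][show_net_info:1347] INFO: Input 0) 'x.1' shape=[ 1 3 640 480 ] dtype=INT8 scale=127.031"
print(tensor_bytes(parse_shape(line)))
```

For the input line above this gives 921600, the same value bmrt_test reports as the input bytesize. The three outputs work out to 1224000, 306000 and 76500 bytes in the same way.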