Notes on porting Yolo-FastestV2 to MNN on the Raspberry Pi 4B

weixin_39266208


Original article: https://blog.csdn.net/weixin_39266208/article/details/122131303


Acknowledgments

Yolo-FastestV2: https://github.com/dog-qiuqiu/Yolo-FastestV2/. Many thanks to the author for sharing it!

Model preparation

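First, download the code and either train a model as the repository describes or use the author's pretrained model. Then export an ONNX model following the author's documentation.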

Building MNN

Download the latest MNN source code.

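Building MNNConvert
Build MNNConvert first. This is an x86_64 build; conveniently, CMake can build in separate directories, so the host build and the cross build do not get in each other's way. Following the official documentation did not work out of the box with my version, 1.2.1. The build steps are: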

cd MNN/
./schema/generate.sh
mkdir build
cd build
# My machine has 16 cores; adjust -j to suit yours
cmake .. -DMNN_BUILD_CONVERTER=true && make -j16

It fails with:

/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp: In function ‘void cxxopts::values::detail::check_signed_range(bool, U, const string&)’:
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:25: error: ‘numeric_limits’ is not a member of ‘std’
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                         ^~~~~~~~~~~~~~
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:25: error: ‘numeric_limits’ is not a member of ‘std’
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:41: error: template argument 2 is invalid
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                                         ^
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:53: error: qualified-id in declaration before ‘>’ token
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                                                     ^

The fix:

vim /tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp
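# Add the include line below to this file. Mind the position: the header has an include guard, and the line must go inside it. In vim, press i to enter insert mode, ESC when done, then :wq to save and quit. If you are not comfortable with vim, use gedit or nano instead, so you do not get stuck inside!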
#include <limits>
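
The error most likely comes from cxxopts.hpp using std::numeric_limits without including <limits> itself, relying on another standard header to pull it in. A minimal sketch of the same pattern with the include in place follows; the helper name fitsIn is made up for illustration and is not part of cxxopts or MNN:

```cpp
#include <limits>   // declares std::numeric_limits; omitting it can trigger the error above

// Hypothetical helper (not from cxxopts): returns true if `value` fits in the
// integer type T. Intended for signed integer types narrower than long long.
template <typename T>
bool fitsIn(long long value) {
    return value >= static_cast<long long>(std::numeric_limits<T>::min()) &&
           value <= static_cast<long long>(std::numeric_limits<T>::max());
}
```

With the include removed, whether this still compiles depends on which other headers happen to be included, which is why the same source can build with one compiler version and fail with another.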

If you see this error:

Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR)

installing the following packages fixes it.

sudo apt-get install protobuf-compiler libprotobuf-dev

You can also download a prebuilt converter; see the end of https://www.yuque.com/mnn/cn/model_convert.

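Model conversion:
I did not pass --bizCode biz. I do not know what it does and found nothing about it online, although it appears in the documentation, and I have not had time to dig through the MNN source. If you know, please tell me, thanks.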

# First convert to ONNX with the Python script provided by the Yolo-FastestV2 author
python3 pytorch2onnx.py --data data/coco.data --weights modelzoo/coco2017-0.241078ap-model.pth --output yolo-fastestv2.onnx
# In practice this simplification made little difference: over 100 iterations the gain was on the order of milliseconds, which is within the noise of the timing measurements
python3 -m onnxsim yolo-fastestv2.onnx yolo-fastestv2-opt.onnx
# Convert and keep the result for use once the port is done
./MNNConvert -f ONNX --modelFile /home/yiifburj/code/Yolo-FastestV2/yolo-fastestv2-opt.onnx --MNNModel yolofastestv2-opt.mnn

MNN officially states that conversion via TORCH is supported as of version 1.2.0, but in my tests it still was not. The PyTorch export method given by MNN is:

import torch
# ...
#  model is exported model
model.eval()
# trace
model_trace = torch.jit.trace(model, torch.rand(1, 3, 1200, 1200))
model_trace.save('model_trace.pt')
# script
model_script = torch.jit.script(model)
model_script.save('model_script.pt')

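This can go in pytorch2onnx.py. These are two different methods. From what I found online, the second one (script) fails if the model contains unsupported operations. With the first one (trace), every warning has to be dealt with to be sure the result works correctly, and the traced model is tied to the device in use at the time, such as CPU or GPU. The advice I found online is to prefer the second method unless it fails or you need the extra speed. Since MNN did not support this path for me, I have not tested it.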

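Download a cross compiler. Binaries built with a newer cross compiler can have problems on an older system, so I downloaded a relatively old version: gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu.tar.xz, found at https://releases.linaro.org/components/toolchain/binaries/latest-7/aarch64-linux-gnu/.
If the system on the Raspberry Pi is recent enough, the cross compiler from the Ubuntu repositories should work directly. My desktop Ubuntu is too new, while the Ubuntu on my Raspberry Pi is old (18.04). In my tests, libraries built with the repository compiler were unusable on the Pi, with errors about missing symbols such as: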

/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `pthread_create@GLIBC_2.34'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `__libc_single_threaded@GLIBC_2.32'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `std::_Sp_make_shared_tag::_S_eq(std::type_info const&)@GLIBCXX_3.4.26'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `exp@GLIBC_2.29'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `std::__throw_bad_array_new_length()@GLIBCXX_3.4.29'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to `log@GLIBC_2.29'
collect2: error: ld returned 1 exit status
CMakeFiles/yolodepth.dir/build.make:161: recipe for target 'yolodepth' failed
make[2]: *** [yolodepth] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/yolodepth.dir/all' failed
make[1]: *** [CMakeFiles/yolodepth.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

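The missing symbols all carry version tags, so the problem is that libc and libstdc++ on the Pi are too old: anything built against the newer versions will not work there.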
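
The version tags in those messages can be read as minimum library versions. As a rough illustration, the sketch below compares a required GLIBC tag against the libc version on the target; Ubuntu 18.04 ships glibc 2.27, so a library that needs GLIBC_2.34 cannot be satisfied there. The names SymVer, parseSymVer and targetProvides are made up for this example, and it only handles two-part versions (it ignores the third component of tags such as GLIBCXX_3.4.26):

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type: a two-part library version such as 2.27.
struct SymVer { int major; int minor; };

// Parse "GLIBC_2.34" or "2.27" into {2, 34} / {2, 27}; {-1, -1} on failure.
inline SymVer parseSymVer(const std::string& tag) {
    const std::size_t pos = tag.rfind('_');
    const std::string num = (pos == std::string::npos) ? tag : tag.substr(pos + 1);
    SymVer v{-1, -1};
    if (std::sscanf(num.c_str(), "%d.%d", &v.major, &v.minor) != 2) {
        return SymVer{-1, -1};
    }
    return v;
}

// True if a target whose libc is `targetLibc` is new enough for `required`.
inline bool targetProvides(const std::string& required, const std::string& targetLibc) {
    const SymVer need = parseSymVer(required);
    const SymVer have = parseSymVer(targetLibc);
    if (need.major < 0 || have.major < 0) return false;
    return have.major > need.major ||
           (have.major == need.major && have.minor >= need.minor);
}
```

For example, targetProvides("GLIBC_2.34", "2.27") is false, which matches the pthread_create@GLIBC_2.34 failure above.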

To use the cross compiler from the repositories:

sudo apt install  gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

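In my tests, neither of the following two options is supported on the Raspberry Pi, so it does not matter whether they are on or off. (With -DMNN_OPENCL=ON the build succeeded, and the Ubuntu on my Pi has the OpenCL packages installed, but at run time it was still reported as unsupported; I do not know why.)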

MNN_OPENCL  MNN_VULKAN

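Enabling MNN_OPENMP has no effect either, because MNN_USE_THREAD_POOL takes priority; the cmake .. step prints a notice about this. You can turn it off after the cmake step with ccmake .., or with -D at cmake time. The two threading modes probably do not differ much, though.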

All of that came later. First, try the steps from the official documentation:

# Run from the source directory
# Required by the documentation
./schema/generate.sh

mkdir aarch64build
cd aarch64build

# The last option generates compile_commands.json for editor navigation and completion; it can be omitted
cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_VERSION=1 -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_C_COMPILER=<cross-compiler-path>/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=<cross-compiler-path>/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
# My machine has 16 cores; adjust -j to suit yours
make -j 16

It fails with:

/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp: In function ‘void MNN::TRANS_4x4(MNN::VecType&, MNN::VecType&, MNN::VecType&, MNN::VecType&)’:
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:39:48: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
     auto m0 = vtrn1q_s32(vec0.value, vec1.value), m1 = vtrn2q_s32(vec0.value, vec1.value);
                                                ^
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:39:48: error: cannot convert ‘int8x16_t {aka __vector(16) signed char}’ to ‘int32x4_t {aka __vector(4) int}’ for argument ‘1’ to ‘int32x4_t vtrn1q_s32(int32x4_t, int32x4_t)’
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:40:48: error: cannot convert ‘int8x16_t {aka __vector(16) signed char}’ to ‘int32x4_t {aka __vector(4) int}’ for argument ‘1’ to ‘int32x4_t vtrn1q_s32(int32x4_t, int32x4_t)’
     auto m2 = vtrn1q_s32(vec2.value, vec3.value), m3 = vtrn2q_s32(vec2.value, vec3.value);
                                                ^
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:42:29: error: ‘m1’ was not declared in this scope
     vec1.value = vtrn1q_s64(m1, m3);
                             ^~
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:42:33: error: ‘m3’ was not declared in this scope
     vec1.value = vtrn1q_s64(m1, m3);
                                 ^~

The fix is given in the error output itself: add the -flax-vector-conversions flag. There are two ways to do it:

ccmake ..
# Press t to toggle advanced mode
# Add -flax-vector-conversions to CMAKE_CXX_FLAGS
# Press Enter, c to apply the change, and q to quit; every step is prompted on screen, so just follow the prompts

cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_VERSION=1 -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_C_COMPILER=<cross-compiler-path>/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=<cross-compiler-path>/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++ -DCMAKE_CXX_FLAGS='-flax-vector-conversions'

Then build again; this time it succeeds.
Collect the installed files (headers and libraries):

# Collect the installed files: headers and libraries
mkdir installdir
make DESTDIR=installdir install
# Then copy them to the Raspberry Pi for building against, via NFS, scp, or any other means

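There is another way: use the build script shipped in the source tree. Because my compiler's name differs from the one in the script, some changes are needed. My compiler is aarch64-linux-gnu-xxx, whereas the script uses aarch64-linux-gnueabihf-gcc-xxx. I am not yet clear on whether the two differ. I was used to the gnueabihf-style names, which distinguish Linux toolchains from the elf ones used for bare-metal programs, and now there are aarch64-linux-gnu-xxx names as well. (As far as the naming convention goes, the gnueabihf suffix belongs to 32-bit ARM toolchains such as arm-linux-gnueabihf, while aarch64-linux-gnu is the usual 64-bit name.) If you know more, please leave a comment, thanks.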

Note that after changing the CMake files you need to delete the temporary build directory and start over; make clean alone is usually not enough.

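# If you do not know vim, use another editor such as gedit
# Replace every aarch64-gnueabihf with aarch64-linux-gnu
# vim: on the first line press VG to select everything, then run s/aarch64-gnueabihf/aarch64-linux-gnu/g to replace all occurrences
# Find the target's position in namelist, then edit the entry at the matching position; note that counting starts at 0
# As before, to avoid the error described above, add the option -DCMAKE_CXX_FLAGS="-flax-vector-conversions"
# Use make -j 16 to speed things up, and make DESTDIR=installdir install to collect the installed files, as described above.
vim project/cross-compile/build.sh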

# As above, use aarch64-linux-gnu
# The lines set( CMAKE_C_COMPILER aarch64-linux-gnueabihf-gcc) and set( CMAKE_CXX_COMPILER aarch64-linux-gnueabihf-g++) must also be changed to your compiler's name. If the compiler is in PATH, use the bare name; otherwise use the absolute path, i.e. <absolute-path>/aarch64-linux-gnu-gcc and <absolute-path>/aarch64-linux-gnu-g++
vim project/cross-compile/arm.toolchain.cmake

# The default output directory is build-aarch64-linux-gnu; all files end up there
project/cross-compile/build.sh aarch64-linux-gnu

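You can also tune the build for the CPU. The Raspberry Pi 4B uses a Cortex-A72, but I saw no obvious gain from targeting it. Likewise, BF16, officially supported since 1.2.0, is enabled with -DMNN_SUPPORT_BF16=ON and also has to be enabled in the code. It gave a tiny speedup, but on my test image two fewer boxes were detected. This is an extremely slimmed-down model that is not very precise to begin with, so an optimization that trades away more precision may not suit it well. In addition, enabling -DCMAKE_CXX_FLAGS="-mcpu=cortex-a72" together with -DMNN_SUPPORT_BF16=ON was slightly slower than -DMNN_SUPPORT_BF16=ON alone. I do not know why; it is quite puzzling!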

Change -DCMAKE_CXX_FLAGS="-flax-vector-conversions" to
-DCMAKE_CXX_FLAGS="-flax-vector-conversions -mcpu=cortex-a72"
that is, add -mcpu=cortex-a72.
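
To get a feel for how much precision BF16 gives up, recall that a bfloat16 value keeps only the top 16 bits of a 32-bit float: the sign, the 8 exponent bits, and 7 mantissa bits. The sketch below emulates that by truncation. It is only an illustration under that assumption; MNN's actual conversion may round rather than truncate, and truncateToBf16 is a made-up name:

```cpp
#include <cstdint>
#include <cstring>

// Emulate storing a float32 as bfloat16 by zeroing the low 16 bits.
// Assumption for illustration: truncation, not round-to-nearest.
inline float truncateToBf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bit pattern
    bits &= 0xFFFF0000u;                   // keep sign, exponent, top 7 mantissa bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}
```

Under this scheme 0.3f becomes a slightly smaller value, so a score that sits right at a 0.3 threshold can fall below it. That is one plausible reason a couple of borderline boxes could disappear, though I have not verified that this is what happened here.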

Porting the code

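This part took quite a while. One reason was unfamiliarity with the model's inputs and outputs. Reading Yolo-FastestV2's training and test programs and viewing the exported ONNX graph in Netron clears this up. The main work is post-processing, and you need to understand what each part of the output means. Yolo-FastestV2 behaves differently depending on whether it is exporting to ONNX, chiefly in whether the sigmoid and softmax computations are included, and this changes the post-processing; in practice it simplifies it. The other reason was unfamiliarity with MNN: this was my first time using it, so I had to explore step by step. The input channel-order conversion in particular got mixed up with other related bugs, led to wrong diagnoses, and cost a lot of time during analysis.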
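
For readers new to these two points, here is a small standalone sketch of the operations involved: the sigmoid and softmax that post-processing would have to apply if the exported model did not already include them, and a conversion from OpenCV's interleaved HWC pixel layout to a planar CHW float buffer. It does not use MNN, the function names are my own, and the 1/255 scaling is an assumption for illustration rather than something taken from this port:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Logistic sigmoid for a single raw output value.
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Numerically stable softmax over one vector of class logits.
inline std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    if (logits.empty()) return out;
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - maxLogit);  // subtract max to avoid overflow
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}

// Interleaved HWC bytes -> planar CHW floats scaled to [0, 1] (assumed scaling).
inline std::vector<float> hwcToChw(const std::vector<std::uint8_t>& hwc,
                                   int height, int width, int channels) {
    std::vector<float> chw(hwc.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                const std::size_t src = (static_cast<std::size_t>(y) * width + x) * channels + c;
                const std::size_t dst = (static_cast<std::size_t>(c) * height + y) * width + x;
                chw[dst] = hwc[src] / 255.0f;
            }
        }
    }
    return chw;
}
```

If the layout conversion is skipped or done twice, the network still runs but sees scrambled pixels, which produces plausible-looking yet wrong boxes. That is the kind of symptom that is easy to blame on the post-processing instead.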

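The code is adapted from Yolo-FastestV2's, mainly by replacing the ncnn-related code and data structures with their MNN counterparts. For MNN's input and output channel handling, see the comments. Images are read with OpenCV rather than MNN's own image facilities. The code is as follows: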

yolo-fastestv2.h:

#ifndef YOLO_FASTEST_V2_H_
#define YOLO_FASTEST_V2_H_

#include <vector>
#include <opencv2/opencv.hpp>
#include <MNN/Interpreter.hpp>

class TargetBox
{
private:
    float getWidth() { return (x2 - x1); };
    float getHeight() { return (y2 - y1); };

public:
    int x1;
    int y1;
    int x2;
    int y2;

    int cate;
    float score;

    float area() { return getWidth() * getHeight(); };
};

class yoloFastestv2
{
private:
    std::shared_ptr<MNN::Interpreter> net=nullptr;
    // Per the documentation, the session is released when net is released
    MNN::Session* session=nullptr;
    // I have not read the source, but I guess this is used only once, so a stack variable should be enough
    MNN::ScheduleConfig config;
    // Must not live on the stack
    MNN::BackendConfig backendConfig;
    std::vector<float> anchor;

    const char *inputName;
    const char *outputName1;
    const char *outputName2;
    const char *outputNames[2];

    int numAnchor;
    int numOutput;
    int numThreads;
    int numCategory;
    int inputWidth, inputHeight;

    float nmsThresh;

    int nmsHandle(std::vector<TargetBox> &tmpBoxes, std::vector<TargetBox> &dstBoxes);
    int getCategory(const float *values, int index, int &category, float &score);

	int predHandle(std::unique_ptr<MNN::Tensor>*outs, std::vector<TargetBox> &dstBoxes,
			const float scaleW, const float scaleH, const float thresh);

public:
    yoloFastestv2();
    ~yoloFastestv2();

    int loadModel(const char* binPath);
    int detection(const cv::Mat srcImg, std::vector<TargetBox> &dstBoxes, 
                  const float thresh = 0.3);
};

#endif
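
The header declares nmsHandle, whose body is not shown here. As a rough idea of what such a step usually does, below is a generic non-maximum suppression sketch. It is not the author's implementation: Box is a stand-in for TargetBox, the function names are hypothetical, and suppressing only within the same category is an assumption made for this example:

```cpp
#include <algorithm>
#include <vector>

// Stand-in for TargetBox, kept as a plain aggregate for the example.
struct Box { int x1, y1, x2, y2; int cate; float score; };

inline float boxArea(const Box& b) {
    return static_cast<float>(std::max(0, b.x2 - b.x1)) *
           static_cast<float>(std::max(0, b.y2 - b.y1));
}

// Intersection over union of two axis-aligned boxes.
inline float iou(const Box& a, const Box& b) {
    const int w = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const int h = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (w <= 0 || h <= 0) return 0.0f;
    const float inter = static_cast<float>(w) * static_cast<float>(h);
    const float uni = boxArea(a) + boxArea(b) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes by descending score and drop any box that overlaps
// an already kept box of the same category by more than iouThresh.
inline std::vector<Box> nms(std::vector<Box> boxes, float iouThresh) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& cand : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (k.cate == cand.cate && iou(k, cand) > iouThresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(cand);
    }
    return kept;
}
```

The threshold plays the role of the nmsThresh member above: a lower value removes more overlapping boxes, a higher value keeps more of them.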

yolo-fastestv2.cpp:

#include <cmath>
#include <algorithm>
#include "MNN/Tensor.hpp"
#include "yolo-fastestv2.h"
#include <opencv2/core/matx.hpp>
#include <vector>
#include <memory>

using namespace std;

// Model parameter configuration
yoloFastestv2
