Steps
Install git and cmake
$ sudo apt-get install git
$ sudo apt-get install cmake
Install protobuf and other dependencies
These commands are taken from "Testing the ncnn framework on a Raspberry Pi 3B" (《树莓派3B完成ncnn框架测试》).
$ sudo apt-get install -y gfortran
$ sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
$ sudo apt-get install --no-install-recommends libboost-all-dev
$ sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev
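I originally wanted to build protobuf from source, but the build kept failing, which meant the ncnn model conversion tools could not be built either. The commands above install prebuilt packages instead.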
Download the ncnn source
$ git clone https://github.com/Tencent/ncnn.git
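$ cd ncnn
We want the example programs built along with the library, so CMakeLists.txt needs a small change before compiling: remove the leading # from the following two lines.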
add_subdirectory(examples)
add_subdirectory(benchmark)
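The same edit can be scripted. This is a minimal sketch, assuming the two lines are present in CMakeLists.txt and are commented out with a single leading # (optionally followed by spaces). The function name enable_subdirs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: uncomment add_subdirectory(examples) and
# add_subdirectory(benchmark) in the CMakeLists.txt passed as $1.
# Assumes lines of the form "# add_subdirectory(examples)" or "#add_subdirectory(examples)".
enable_subdirs() {
    file="$1"
    # -i.bak edits the file in place and leaves a backup copy named <file>.bak
    sed -i.bak -E 's/^([[:space:]]*)#[[:space:]]*(add_subdirectory\((examples|benchmark)\))/\1\2/' "$file"
}
```

From the ncnn root directory this would be called as enable_subdirs CMakeLists.txt. Other commented-out add_subdirectory lines are left alone.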
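Enable OpenMP support
Building on the Raspberry Pi as described in the wiki did not give me OpenMP support, even though I had installed the OpenMP-related libraries earlier.
The fix is in src/CMakeLists.txt at around line 41, which controls whether OpenMP gets linked. For some reason the OpenMP_CXX_FOUND check there never takes effect, so comment out line 41 and replace it with a condition that does not depend on that variable.
When running the benchmark later, you can open top and press H to switch to the thread view and see whether multithreaded acceleration is actually in use.
With this change, both the benchmark and the examples ran with OpenMP enabled, which suggests the fix is reliable.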
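The thread view in top is interactive. For a quick non-interactive check, the number of threads of a running process can be read from /proc. This is a small sketch for Linux only; the function name thread_count is hypothetical.

```shell
#!/bin/sh
# Print the number of threads of the process with PID $1
# by counting the entries in /proc/<pid>/task (Linux only).
thread_count() {
    ls "/proc/$1/task" | wc -l | tr -d ' '
}
```

While the benchmark is running in another terminal, something like thread_count "$(pgrep -n benchncnn)" should report more than one thread if OpenMP is really in use, and 1 if the build fell back to single-threaded code.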
Build
Reference: how to build
$ cd <ncnn-root-dir>
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/pi3.toolchain.cmake -DPI3=ON ..
$ make -j4 # produces ./src/libncnn.a
$ sudo make install # installs into ./install under the current directory
Convert the model
See "Testing the ncnn framework on a Raspberry Pi 3B" (《树莓派3B完成ncnn框架测试》).
Download a trained YOLO model:
https://github.com/eric612/MobileNet-YOLO/tree/master/models/yolov2
Here we download these two files:
mobilenet_yolo_deploy_iter_80000.caffemodel
mobilenet_yolo_deploy.prototxt
Use the tool built earlier to convert them into files the ncnn framework can load:
caffe2ncnn mobilenet_yolo_deploy.prototxt mobilenet_yolo_deploy_iter_80000.caffemodel mobilenet_yolo.param mobilenet_yolo.bin
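A missing input file is an easy mistake at this step, so the call can be wrapped in a small check. This is only a sketch: convert_model is a hypothetical helper, and the path of the converter binary is passed in as the first argument because it depends on where your build placed it. The argument order follows the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: verify both inputs exist, then call the converter as
#   <tool> <prototxt> <caffemodel> <out>.param <out>.bin
convert_model() {
    tool="$1"; prototxt="$2"; caffemodel="$3"; out="$4"
    for f in "$prototxt" "$caffemodel"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    "$tool" "$prototxt" "$caffemodel" "$out.param" "$out.bin"
}
```

For the model above this would be convert_model caffe2ncnn mobilenet_yolo_deploy.prototxt mobilenet_yolo_deploy_iter_80000.caffemodel mobilenet_yolo.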
Once the conversion finishes, copy mobilenet_yolo.param and mobilenet_yolo.bin into the build/examples directory, upload a test image there as well, and run:
./yolov2 zoo.jpeg
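For some reason it labeled the rocks and the elephant as cows and sheep. It turned out that the classes the model was trained on do not include elephant at all. Person detection worked reasonably well.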
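Benchmark: speed test
Running the benchmark that ships with ncnn, I first found that multithreading made no difference. After I found a way to enable OpenMP and rebuilt, multithreading worked.
With a single thread, turning off NEON optimization made the run time about 5 times longer.
With both NEON and OpenMP enabled, I ran the benchmark program on a Raspberry Pi 3B+ with 1, 2, 4, and 8 threads. The raw output is in the appendix.
Conclusions:
- NEON optimization gives roughly a 5x speedup in single-threaded runs.
- OpenMP helps at 2 to 4 threads, while 8 threads is slower again (the 3B+ has 4 cores, so running more threads than cores likely adds thread-switching overhead).
So by default, use as many threads as there are CPU cores. The Raspberry Pi 3B+ has 4 cores, so 4 threads is the best choice.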
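To put numbers on the OpenMP conclusion, the speedup between two runs is just the ratio of their avg times. A minimal sketch using awk for the floating-point division; the function name speedup is made up for illustration.

```shell
#!/bin/sh
# Print baseline_ms / candidate_ms with two decimals.
# A value above 1 means the candidate run is faster than the baseline.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

Using the squeezenet avg values from the appendix, speedup 193.71 164.43 (1 thread vs 4 threads) prints 1.18, and speedup 164.43 181.92 (4 threads vs 8 threads) prints 0.90, which matches the regression at 8 threads. To follow the "one thread per core" rule without hard-coding the count, the benchmark can be started as ./benchncnn 30 "$(nproc)" 0, assuming nproc from GNU coreutils is available.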
Note 1
The cmake command above specifies toolchains/pi3.toolchain.cmake. Its contents are:
SET(CMAKE_SYSTEM_NAME Android)
SET(CMAKE_SYSTEM_PROCESSOR "armv7l")
SET(ANDROID_ARCH_NAME "arm")
SET(UNIX true)
SET(CMAKE_C_COMPILER "gcc")
SET(CMAKE_CXX_COMPILER "g++")
The first line sets the system name to Android because NEON optimization is only enabled for Android and iOS targets.
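Independently of the toolchain file, it can be worth confirming that the CPU itself reports NEON. A small sketch that looks for the neon flag in the Features line of /proc/cpuinfo; has_neon is a hypothetical name, and the file path is a parameter so the check can also be run against a saved copy. On 64-bit ARM kernels the flag is reported as asimd instead, so this check applies to a 32-bit (armv7l) system like the one targeted here.

```shell
#!/bin/sh
# Succeed (exit status 0) if the Features line of a cpuinfo file lists "neon".
# $1 is the file to inspect; defaults to /proc/cpuinfo.
has_neon() {
    grep -E '^Features' "${1:-/proc/cpuinfo}" | grep -qw neon
}
```

A typical use is has_neon && echo "NEON available".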
Note 2
ncnn's model conversion tools support quantization. For lack of time, I did not enable it in the steps above.
References
protocolbuffers/protobuf on GitHub
Appendix
The benchmark output:
$ ./benchncnn 30 1 0 ; ./benchncnn 30 2 0 ; ./benchncnn 30 4 0 ; ./benchncnn 30 8 0
loop_count = 30
num_threads = 1
powersave = 0
squeezenet min = 191.15 max = 197.95 avg = 193.71
mobilenet min = 332.52 max = 712.73 avg = 373.83
mobilenet_v2 min = 278.99 max = 567.85 avg = 334.95
shufflenet min = 134.46 max = 290.25 avg = 191.72
mnasnet min = 228.93 max = 488.14 avg = 343.62
proxylessnasnet min = 274.45 max = 587.73 avg = 327.98
googlenet min = 778.38 max = 1707.05 avg = 920.31
resnet18 min = 919.95 max = 1686.31 avg = 1067.66
alexnet min = 1401.70 max = 1413.93 avg = 1404.25
vgg16 min = 3814.86 max = 3831.69 avg = 3822.89
squeezenet-ssd min = 495.19 max = 499.31 avg = 496.61
mobilenet-ssd min = 634.35 max = 1335.46 avg = 807.15
mobilenet-yolo min = 1575.70 max = 2885.69 avg = 1740.90
mobilenet-yolov3 min = 1496.59 max = 2566.92 avg = 1555.01
loop_count = 30
num_threads = 2
powersave = 0
squeezenet min = 120.47 max = 235.58 avg = 173.24
mobilenet min = 230.05 max = 406.12 avg = 367.83
mobilenet_v2 min = 214.18 max = 348.14 avg = 290.10
shufflenet min = 89.34 max = 170.82 avg = 140.35
mnasnet min = 155.09 max = 283.80 avg = 230.67
proxylessnasnet min = 189.55 max = 339.81 avg = 251.46
googlenet min = 459.43 max = 919.78 avg = 708.38
resnet18 min = 641.69 max = 1031.05 avg = 684.19
alexnet min = 750.21 max = 1625.59 avg = 1068.26
vgg16 min = 2618.68 max = 4212.59 avg = 3354.56
squeezenet-ssd min = 361.47 max = 584.30 avg = 461.95
mobilenet-ssd min = 417.30 max = 754.31 avg = 611.25
mobilenet-yolo min = 1074.40 max = 1840.71 avg = 1579.28
mobilenet-yolov3 min = 969.24 max = 1726.64 avg = 1507.29
loop_count = 30
num_threads = 4
powersave = 0
squeezenet min = 130.88 max = 224.45 avg = 164.43
mobilenet min = 227.88 max = 347.24 avg = 290.05
mobilenet_v2 min = 256.39 max = 338.31 avg = 280.18
shufflenet min = 72.37 max = 188.53 avg = 121.16
mnasnet min = 134.82 max = 262.01 avg = 197.08
proxylessnasnet min = 173.73 max = 311.15 avg = 246.68
googlenet min = 503.19 max = 624.17 avg = 571.78
resnet18 min = 642.78 max = 816.13 avg = 745.53
alexnet min = 668.78 max = 921.54 avg = 839.75
vgg16 min = 3040.07 max = 3749.61 avg = 3270.55
squeezenet-ssd min = 420.24 max = 537.55 avg = 475.67
mobilenet-ssd min = 426.47 max = 599.20 avg = 528.55
mobilenet-yolo min = 1284.95 max = 1469.43 avg = 1395.46
mobilenet-yolov3 min = 1172.91 max = 1368.70 avg = 1283.90
loop_count = 30
num_threads = 8
powersave = 0
squeezenet min = 155.76 max = 212.55 avg = 181.92
mobilenet min = 230.69 max = 368.30 avg = 308.16
mobilenet_v2 min = 239.35 max = 357.53 avg = 315.37
shufflenet min = 122.54 max = 237.19 avg = 172.18
mnasnet min = 188.97 max = 299.02 avg = 236.72
proxylessnasnet min = 207.24 max = 344.90 avg = 285.54
googlenet min = 544.46 max = 669.89 avg = 619.41
resnet18 min = 573.18 max = 824.10 avg = 757.05
alexnet min = 689.93 max = 920.11 avg = 833.42
vgg16 min = 3394.19 max = 3854.80 avg = 3639.00
squeezenet-ssd min = 438.48 max = 568.54 avg = 513.22
mobilenet-ssd min = 462.58 max = 628.85 avg = 556.35
mobilenet-yolo min = 1397.14 max = 1609.71 avg = 1496.27
mobilenet-yolov3 min = 1245.85 max = 1445.50 avg = 1363.50
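The concatenated output above is easier to compare once it is reduced to one row per model and thread count. A minimal sketch, assuming the output format shown above (a "num_threads = N" line before each run, and result lines ending in "avg = X"); bench_table is a hypothetical name.

```shell
#!/bin/sh
# Read concatenated benchncnn output on stdin and print
# "model threads avg_ms" for every result line.
bench_table() {
    awk '
        # Remember the thread count of the current run, e.g. "num_threads = 4"
        $1 == "num_threads" { threads = $3; next }
        # Result lines look like "<model> min = A max = B avg = C"
        $2 == "min" && $(NF-2) == "avg" { print $1, threads, $NF }
    '
}
```

For example, saving the output to a file and running bench_table < bench.log | sort would group the rows by model, so the 1, 2, 4, and 8 thread results for each network sit next to each other.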