A VGG16 Model Written with ARM Compute Library

ppplinday

Article link: https://blog.csdn.net/u014432647/article/details/73613642

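This post shows how to implement a VGG16 model on a phone CPU using ARM Compute Library instead of TensorFlow or Caffe. First get the library from the official GitHub and build it, then build an OpenCL stub library. The main code is in `vgg16_model_arm_compute_library_NEON.cpp`; cross-compile it with the Android NDK toolchain and run it on an Android device. Note that the phone needs at least 2 GB of available memory, otherwise it may crash from running out of memory. The GitHub repo linked below includes the weight files and scripts for running the example.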

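PS: My previous post was too brief, and a lot of the steps probably weren't clear, so I've revised it heavily this time and written everything up in more detail. By the way, the ARM Compute Library I used is version 17.05. Version 17.06 came out a few days ago and probably fixes quite a few bugs; for example, softmax in 17.05 can overflow, and there are also some typos. You can try 17.06, and there will likely be another release in a few months anyway. Also, ARM Compute Library can't do training for now, so training has to run in TensorFlow or Caffe, and the trained weights are then loaded here; it looks like ARM Compute Library is only optimized for inference at the moment. As for how to write a model with ARM Compute Library, check out this blog post of mine!!! Here's the portal!!!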

http://blog.csdn.net/u014432647/article/details/74738947
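A bit of context on the softmax overflow mentioned in the PS: a naive FP32 softmax computes exp(x_i) directly, and for any logit above roughly 88.7 that already overflows float to inf, so you end up with inf/inf = NaN. The usual remedy is to subtract the largest logit before exponentiating. The sketch below is plain C++ showing that trick; it is not ARM Compute Library's kernel, I'm only assuming (not claiming) that 17.06 fixed it this way, and the name stable_softmax is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtracting the maximum logit before exp()
// keeps every exponent <= 0, so std::exp never overflows to inf.
std::vector<float> stable_softmax(const std::vector<float>& logits)
{
    std::vector<float> out(logits.size());
    if (logits.empty()) return out;

    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);  // value in (0, 1]
        sum += out[i];
    }
    // The max element contributes exp(0) = 1, so sum >= 1 for finite inputs.
    for (float& v : out) v /= sum;
    return out;
}
```

For example, stable_softmax({1000.0f, 1000.0f}) returns {0.5f, 0.5f}, whereas the naive version would return NaNs.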

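Main content starts here!!!
This model isn't set up through TensorFlow or Caffe; it's written directly with ARM Compute Library, and it was tested on a phone CPU.
First, here's the ARM Compute Library GitHub: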

https://github.com/ARM-software/ComputeLibrary

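The detailed build steps and the basic examples are all described there. It really is a hassle; it took me quite a while to get working.
Take a look at that repo first. Here's my own GitHub: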

https://github.com/ppplinday/vgg16-by-ARM-Compute-Library

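In this repo I've already downloaded ARM Compute Library and put my source file and the weights inside! This post basically follows the steps in that repo. Since I didn't explain the process in detail last time, I'll walk through it now! Of course, if you want to be lazy you can just run my scripts directly, which I'll also cover in the wrap-up below!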

Step one is to build the library:

scons Werror=1 -j8 debug=0 asserts=1 neon=1 opencl=1 embed_kernels=1 os=android arch=arm64-v8a

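The example in the ARM Compute Library documentation uses neon=0, but since we're running on the CPU, we set neon=1 here (NEON is the SIMD extension of ARM CPUs). Also note that I'm building for 64-bit (arch=arm64-v8a), everyone!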

Once the build finishes, there will be a new build folder containing a bunch of .so libraries. Then build an empty (stub) OpenCL library:

aarch64-linux-android-gcc -o libOpenCL.so -Iinclude -shared opencl-1.2-stubs/opencl_stubs.c -fPIC -shared

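This gives you a libOpenCL.so.
My main cpp file is vgg16_model_arm_compute_library_NEON.cpp in the examples folder.
Next, compile it (you need the aarch64-linux-android-g++ cross compilers, download them yourself!). The -L path below points to the build directory on my machine, so change it to your own: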

aarch64-linux-android-g++ examples/vgg16_model_arm_compute_library_NEON.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -larm_compute-static -lOpenCL -L/home/zhoupeilin/vgg16-by-ARM-Compute-Library/build/arm_compute -L. -o main -static-libstdc++ -pie

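The compiled binary is main. Since I run it on an Android phone, you need to install the NDK. Installing the NDK and setting up the environment variables are beyond the scope of this post; Google it!
Then run my script to push all the weights onto the phone: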

sh pushweight.sh

The weights are in vgg16_weight: about 1.8 GB of CSV files!
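Since training happens elsewhere, the weights arrive as text. The code below parses individual numbers with a StringToFloat helper built on std::istringstream; as a rough sketch of the same idea for a whole file, here is a hypothetical load_csv_floats that reads comma- or newline-separated values into one flat buffer. The exact layout of the CSV files in vgg16_weight (one tensor per file, value ordering) is an assumption here, so check it against the repo before relying on it.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every comma- or newline-separated number from a
// CSV stream into a flat float buffer. The real layout of the files in
// vgg16_weight/ is an assumption and may differ.
std::vector<float> load_csv_floats(std::istream& in)
{
    std::vector<float> values;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string cell;
        while (std::getline(row, cell, ',')) {
            std::istringstream cell_stream(cell);
            float v;
            if (cell_stream >> v) values.push_back(v);  // skip empty or non-numeric cells
        }
    }
    return values;
}
```

Before copying the values into a tensor, it's worth checking that the count matches the tensor shape: for example, weights_1_1 is 3 × 3 × 3 × 64 = 1728 floats and biases_1_1 is 64.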
Finally, push the compiled program to the phone and you can run it!!!

adb push main /data/local/tmp/
adb shell /data/local/tmp/main

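My phone is a Redmi Note 4X with a Qualcomm Snapdragon 625 CPU and 3 GB of RAM, of which about 2 GB is available. The phone should have more than 2 GB of available memory for this program; otherwise it may crash because there isn't enough memory to allocate.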
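Some back-of-the-envelope arithmetic on why the free memory matters: VGG16 has 13 3×3 conv layers and 3 fully connected layers, and stored as F32 (as every TensorInfo in the code below is) its parameters alone take about 528 MiB. The program also keeps separate out_ and act_ tensors for every layer, plus whatever scratch buffers the library functions allocate internally, so the real footprint is larger than the weights alone. A small sketch of the parameter count (the function names are illustrative):

```cpp
#include <cstdint>
#include <vector>

// VGG16: 13 conv layers (3x3 kernels), then 3 fully connected layers.
// Each conv layer has k*k*in_c*out_c weights plus out_c biases.
std::uint64_t vgg16_param_count()
{
    const std::vector<std::uint64_t> conv_out = {64, 64, 128, 128, 256, 256, 256,
                                                 512, 512, 512, 512, 512, 512};
    std::uint64_t total = 0;
    std::uint64_t in_c = 3;                      // RGB input
    for (std::uint64_t out_c : conv_out) {
        total += 3 * 3 * in_c * out_c + out_c;   // weights + biases
        in_c = out_c;
    }
    // FC layers: 7*7*512 = 25088 -> 4096 -> 4096 -> 1000
    const std::uint64_t fc_dims[] = {7 * 7 * 512, 4096, 4096, 1000};
    for (int i = 0; i < 3; ++i)
        total += fc_dims[i] * fc_dims[i + 1] + fc_dims[i + 1];
    return total;
}

// Bytes needed to hold all parameters as FP32 (DataType::F32 in the code).
std::uint64_t vgg16_fp32_bytes() { return vgg16_param_count() * sizeof(float); }
```

vgg16_param_count() gives 138,357,544 and vgg16_fp32_bytes() gives 553,430,176 bytes (about 528 MiB); as CSV text the same numbers take about 1.8 GB. fc6 alone (25088 × 4096 + 4096) accounts for about 102.8 million of those parameters.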
If you want to be lazy, just clone my GitHub repo and run the two scripts! Remember to plug in your phone!

sh pushweight.sh
sh main.sh

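Finally, VGG16 itself needs no introduction; anyone doing deep learning knows it. I cross-compiled on Linux and then ran the program on an Android phone.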
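In case a refresher helps when reading the shape code below: in VGG16 every 3×3 conv uses stride 1 and padding 1, so it keeps width and height and only changes the channel count (which is why each conv_X_Y_out_shape copies x() and y() from its input and takes z from weights_shape[3]), and each 2×2, stride-2 max pool halves width and height (the .set(0, x() / 2) and .set(1, y() / 2) calls). A minimal sketch that walks the 224×224×3 input through the feature layers; Shape and vgg16_feature_shapes are hypothetical names, not ARM Compute Library types.

```cpp
#include <vector>

struct Shape { unsigned w, h, c; };

// 'M' marks a 2x2, stride-2 max pool; other entries are the output channels
// of a 3x3, stride-1, pad-1 conv (spatial size unchanged), as in VGG16.
std::vector<Shape> vgg16_feature_shapes(Shape in)
{
    const int M = -1;
    const int cfg[] = {64, 64, M, 128, 128, M, 256, 256, 256, M,
                       512, 512, 512, M, 512, 512, 512, M};
    std::vector<Shape> shapes;
    for (int v : cfg) {
        if (v == M) { in.w /= 2; in.h /= 2; }            // pooling halves W and H
        else        { in.c = static_cast<unsigned>(v); }  // 'same' conv keeps W and H
        shapes.push_back(in);
    }
    return shapes;
}
```

The last entry is 7 × 7 × 512, which flattens to the 25088 inputs of fc6.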
Here's my VGG code!!! In my next post I'll explain in detail how to write a model with ARM Compute Library!!!

Here comes the code!!!

#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/CL/CLFunctions.h"

#include "arm_compute/core/Types.h"
#include "test_helpers/Utils.h"
#include <iostream>
#include <sstream>
#include <fstream>
#include <ostream>
#include <sys/time.h>
#include <map>

using namespace arm_compute;
using namespace test_helpers;

float StringToFloat(const std::string & str){
    std::istringstream iss(str);
    float number;
    iss >> number;
    return number;
}

void main_neon_vgg16(int argc, const char **argv)
{
    /*----------------------------------[init_model_vgg16]-----------------------------------*/

    /*----------------------------------BEGIN:[init_Tensor]----------------------------------*/
    //init_input_tensor
    Tensor input;

    //init_conv_1_tensor
    Tensor weights_1_1;
    Tensor biases_1_1;
    Tensor out_1_1;
    Tensor act_1_1;

    Tensor weights_1_2;
    Tensor biases_1_2;
    Tensor out_1_2;
    Tensor act_1_2;

    Tensor pool_1;

    //init_conv_2_tensor
    Tensor weights_2_1;
    Tensor biases_2_1;
    Tensor out_2_1;
    Tensor act_2_1;

    Tensor weights_2_2;
    Tensor biases_2_2;
    Tensor out_2_2;
    Tensor act_2_2;

    Tensor pool_2;

    //init_conv_3_tensor
    Tensor weights_3_1;
    Tensor biases_3_1;
    Tensor out_3_1;
    Tensor act_3_1;

    Tensor weights_3_2;
    Tensor biases_3_2;
    Tensor out_3_2;
    Tensor act_3_2;

    Tensor weights_3_3;
    Tensor biases_3_3;
    Tensor out_3_3;
    Tensor act_3_3;

    Tensor pool_3;

    //init_conv_4_tensor
    Tensor weights_4_1;
    Tensor biases_4_1;
    Tensor out_4_1;
    Tensor act_4_1;

    Tensor weights_4_2;
    Tensor biases_4_2;
    Tensor out_4_2;
    Tensor act_4_2;

    Tensor weights_4_3;
    Tensor biases_4_3;
    Tensor out_4_3;
    Tensor act_4_3;

    Tensor pool_4;

    //init_conv_5_tensor
    Tensor weights_5_1;
    Tensor biases_5_1;
    Tensor out_5_1;
    Tensor act_5_1;

    Tensor weights_5_2;
    Tensor biases_5_2;
    Tensor out_5_2;
    Tensor act_5_2;

    Tensor weights_5_3;
    Tensor biases_5_3;
    Tensor out_5_3;
    Tensor act_5_3;

    Tensor pool_5;

    //init_fc_6
    Tensor weights_6;
    Tensor biases_6;
    Tensor out_6;
    Tensor act_6;

    //init_fc_7
    Tensor weights_7;
    Tensor biases_7;
    Tensor out_7;
    Tensor act_7;

    //init_fc_8
    Tensor weights_8;
    Tensor biases_8;
    Tensor out_8;

    Tensor softmax_tensor;

    //init_tensor
    constexpr unsigned int input_width  = 224;
    constexpr unsigned int input_height = 224;
    constexpr unsigned int input_fm     = 3;

    const TensorShape input_shape(input_width, input_height, input_fm);
    input.allocator() -> init(TensorInfo(input_shape, 1, DataType::F32));

    //init_conv_1_1
    constexpr unsigned int conv_1_1_kernel_x = 3;
    constexpr unsigned int conv_1_1_kernel_y = 3;
    constexpr unsigned int conv_1_1_fm       = 64;

    const TensorShape conv_1_1_weights_shape(conv_1_1_kernel_x, conv_1_1_kernel_y, input_shape.z(), conv_1_1_fm);
    const TensorShape conv_1_1_biases_shape(conv_1_1_weights_shape[3]);
    const TensorShape conv_1_1_out_shape(input_shape.x(), input_shape.y(), conv_1_1_weights_shape[3]);

    weights_1_1.allocator() -> init(TensorInfo(conv_1_1_weights_shape, 1, DataType::F32));
    biases_1_1.allocator() -> init(TensorInfo(conv_1_1_biases_shape, 1, DataType::F32));
    out_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));

    act_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));

    //init_conv_1_2
    constexpr unsigned int conv_1_2_kernel_x = 3;
    constexpr unsigned int conv_1_2_kernel_y = 3;
    constexpr unsigned int conv_1_2_fm       = 64;

    const TensorShape conv_1_2_weights_shape(conv_1_2_kernel_x, conv_1_2_kernel_y, conv_1_1_out_shape.z(), conv_1_2_fm);
    const TensorShape conv_1_2_biases_shape(conv_1_2_weights_shape[3]);
    const TensorShape conv_1_2_out_shape(conv_1_1_out_shape.x(), conv_1_1_out_shape.y(), conv_1_2_weights_shape[3]);

    weights_1_2.allocator() -> init(TensorInfo(conv_1_2_weights_shape, 1, DataType::F32));
    biases_1_2.allocator() -> init(TensorInfo(conv_1_2_biases_shape, 1, DataType::F32));
    out_1_2.allocator() -> init(TensorInfo(conv_1_2_out_shape, 1, DataType::F32));

    act_1_2.allocator() -> init(TensorInfo(conv_1_2_out_shape, 1, DataType::F32));

    TensorShape conv_1_pool = conv_1_2_out_shape;
    conv_1_pool.set(0, conv_1_pool.x() / 2);
    conv_1_pool.set(1, conv_1_pool.y() / 2);
    pool_1.allocator() -> init(TensorInfo(conv_1_pool, 1, DataType::F32));

    //init_conv_2_1
    constexpr unsigned int conv_2_1_kernel_x = 3;
    constexpr unsigned int conv_2_1_kernel_y = 3;
    constexpr unsigned int conv_2_1_fm       = 128;

    const TensorShape conv_2_1_weights_shape(conv_2_1_kernel_x, conv_2_1_kernel_y, conv_1_pool.z(), conv_2_1_fm);
    const TensorShape conv_2_1_biases_shape(conv_2_1_weights_shape[3]);
    const TensorShape conv_2_1_out_shape(conv_1_pool.x(), conv_1_pool.y(), conv_2_1_weights_shape[3]);

    weights_2_1.allocator() -> init(TensorInfo(conv_2_1_weights_shape, 1, DataType::F32));
    biases_2_1.allocator() -> init(TensorInfo(conv_2_1_biases_shape, 1, DataType::F32));
    out_2_1.allocator() -> init(TensorInfo(conv_2_1_out_shape, 1, DataType::F32));

    act_2_1.allocator() -> init(TensorInfo(conv_2_1_out_shape, 1, DataType::F32));

    //init_conv_2_2
    constexpr unsigned int conv_2_2_kernel_x = 3;
    constexpr unsigned int conv_2_2_kernel_y = 3;
    constexpr unsigned int conv_2_2_fm       = 128;

    const TensorShape conv_2_2_weights_shape(conv_2_2_kernel_x, conv_2_2_kernel_y, conv_2_1_out_shape.z(), conv_2_2_fm);
    const TensorShape conv_2_2_biases_shape(conv_2_2_weights_shape[3]);
    const TensorShape conv_2_2_out_shape(conv_2_1_out_shape.x(), conv_2_1_out_shape.y(), conv_2_2_weights_shape[3]);

    weights_2_2.allocator() -> init(TensorInfo(conv_2_2_weights_shape, 1, DataType::F32));
    biases_2_2.allocator() -> init(TensorInfo(conv_2_2_biases_shape, 1, DataType::F32));
    out_2_2.allocator() -> init(TensorInfo(conv_2_2_out_shape, 1, DataType::F32));

    act_2_2.allocator() -> init(TensorInfo(conv_2_2_out_shape, 1, DataType::F32));

    TensorShape conv_2_pool = conv_2_2_out_shape;
    conv_2_pool.set(0, conv_2_pool.x() / 2);
    conv_2_pool.set(1, conv_2_pool.y() / 2);
    pool_2.allocator() -> init(TensorInfo(conv_2_pool, 1, DataType::F32));

    //init_conv_3_1
    constexpr unsigned int conv_3_1_kernel_x = 3;
    constexpr unsigned int conv_3_1_kernel_y = 3;
    constexpr unsigned int conv_3_1_fm       = 256;

    const TensorShape conv_3_1_weights_shape(conv_3_1_kernel_x, conv_3_1_kernel_y, conv_2_pool.z(), conv_3_1_fm);
    const TensorShape conv_3_1_biases_shape(conv_3_1_weights_shape[3]);
    const TensorShape conv_3_1_out_shape(conv_2_pool.x(), conv_2_pool.y(), conv_3_1_weights_shape[3]);

    weights_3_1.allocator() -> init(TensorInfo(conv_3_1_weights_shape, 1, DataType::F32));
    biases_3_1.allocator() -> init(TensorInfo(conv_3_1_biases_shape, 1, DataType::F32));
    out_3_1.allocator() -> init(TensorInfo(conv_3_1_out_shape, 1, DataType::F32));

    act_3_1.allocator() -> init(TensorInfo(conv_3_1_out_shape, 1, DataType::F32));

    //init_conv_3_2
    constexpr unsigned int conv_3_2_kernel_x = 3;
    constexpr unsigned int conv_3_2_kernel_y = 3;
    constexpr unsigned int conv_3_2_fm       = 256;

    const TensorShape conv_3_2_weights_shape(conv_3_2_kernel_x, conv_3_2_kernel_y, conv_3_1_out_shape.z(), conv_3_2_fm);
    const TensorShape conv_3_2_biases_shape(conv_3_2_weights_shape[3]);
    const TensorShape conv_3_2_out_shape(conv_3_1_out_shape.x(), conv_3_1_out_shape.y(), conv_3_2_weights_shape[3]);

    weights_3_2.allocator() -> init(TensorInfo(conv_3_2_weights_shape, 1, DataType::F32));
    biases_3_2.allocator() -> init(TensorInfo(conv_3_2_biases_shape, 1, DataType::F32));
    out_3_2.allocator() -> init(TensorInfo(conv_3_2_out_shape, 1, DataType::F32));

    act_3_2.allocator() -> init(TensorInfo(conv_3_2_out_shape, 1, DataType::F32));

    //init_conv_3_3
    constexpr unsigned int conv_3_3_kernel_x = 3;
    constexpr unsigned int conv_3_3_kernel_y = 3;
    constexpr unsigned int conv_3_3_fm       = 256;

    const TensorShape conv_3_3_weights_shape(conv_3_3_kernel_x, conv_3_3_kernel_y, conv_3_2_out_shape.z(), conv_3_3_fm);
    const TensorShape conv_3_3_biases_shape(conv_3_3_weights_shape[3]);
    const TensorShape conv_3_3_out_shape(conv_3_2_out_shape.x(), conv_3_2_out_shape.y(), conv_3_3_weights_shape[3]);

    weights_3_3.allocator() -> init(TensorInfo(conv_3_3_weights_shape, 1, DataType::F32));
    biases_3_3.allocator() -> init(TensorInfo(conv_3_3_biases_shape, 1, DataType::F32));
    out_3_3.allocator() -> init(TensorInfo(conv_3_3_out_shape, 1, DataType::F32));

    act_3_3.allocator() -> init(TensorInfo(conv_3_3_out_shape, 1, DataType::F32));

    TensorShape conv_3_pool = conv_3_3_out_shape;
    conv_3_pool.set(0, conv_3_pool.x() / 2);
    conv_3_pool.set(1, conv_3_pool.y() / 2);
    pool_3.allocator() -> init(TensorInfo(conv_3_pool, 1, DataType::F32));

    //init_conv_4_1
    constexpr unsigned int conv_4_1_kernel_x = 3;
    constexpr unsigned int conv_4_1_kernel_y = 3;
    constexpr unsigned int conv_4_1_fm       = 512;

    const TensorShape conv_4_1_weights_shape(conv_4_1_kernel_x, conv_4_1_kernel_y, conv_3_pool.z(), conv_4_1_fm);
    const TensorShape conv_4_1_biases_shape(conv_4_1_weights_shape[3]);
    const TensorShape conv_4_1_out_shape(conv_3_pool.x(), conv_3_pool.y(), conv_4_1_weights_shape[3]);

    weights_4_1.allocator() -> init(TensorInfo(conv_4_1_weights_shape, 1, DataType::F32));
    biases_4_1.allocator() -> init(TensorInfo(conv_4_1_biases_shape, 1, DataType::F32));
    out_4_1.allocator() -> init(TensorInfo(conv_4_1_out_shape, 1, DataType::F32));

    act_4_1.allocator() -> init(TensorInfo(conv_4_1_out_shape, 1, DataType::F32));

    //init_conv_4_2
    constexpr unsigned int conv_4_2_kernel_x = 3;
    constexpr unsigned int conv_4_2_kernel_y = 3;
    constexpr unsigned int conv_4_2_fm       = 512;

    const TensorShape conv_4_2_weights_shape(conv_4_2_kernel_x, conv_4_2_kernel_y, conv_4_1_out_shape.z(), conv_4_2_fm);
    const TensorShape conv_4_2_biases_shape(conv_4_2_weights_shape[3]);
    const TensorShape conv_4_2_out_shape(conv_4_1_out_shape.x(), conv_4_1_out_shape.y(), conv_4_2_weights_shape[3]);

    weights_4_2.allocator() -> init(TensorInfo(conv_4_2_weights_shape, 1, DataType::F32));
    biases_4_2.allocator() -> init(TensorInfo(conv_4_2_biases_shape, 1, DataType::F32));
    out_4_2.allocator() -> init(TensorInfo(conv_4_2_out_shape, 1, DataType::F32));

    act_4_2.allocator() -> init(TensorInfo(conv_4_2_out_shape, 1, DataType::F32));

    //init_conv_4_3
    constexpr unsigned int conv_4_3_kernel_x = 3;
    constexpr unsigned int conv_4_3_kernel_y = 3;
    constexpr unsigned int conv_4_3_fm       = 512;

    const TensorShape conv_4_3_weights_shape(conv_4_3_kernel_x, conv_4_3_kernel_y, conv_4_2_out_shape.z(), conv_4_3_fm);
    const TensorShape conv_4_3_biases_shape(conv_4_3_weights_shape[3]);
    const TensorShape conv_4_3_out_shape(conv_4_2_out_shape.x(), conv_4_2_out_shape.y(), conv_4_3_weights_shape[3]);

    weights_4_3.allocator() -> init(TensorInfo(conv_4_3_weights_shape, 1, DataType::F32));
    biases_4_3.allocator() -> init(TensorInfo(conv_4_3_biases_shape, 1, DataType::F32));
    out_4_3.allocator() -> init(TensorInfo(conv_4_3_out_shape, 1, DataType::F32));

    act_4_3.allocator() -> init(TensorInfo(conv_4_3_out_shape, 1, DataType::F32));

    TensorShape conv_4_pool = conv_4_3_out_shape;
    conv_4_pool.set(0, conv_4_pool.x() / 2);
    conv_4_pool.set(1, conv_4_pool.y() / 2);
    pool_4.allocator() -> init(TensorInfo(conv_4_pool, 1, DataType::F32));

    //init_conv_5_1
    constexpr unsigned int conv_5_1_kernel_x = 3;
    constexpr unsigned int conv_5_1_kernel_y = 3;
    constexpr unsigned int conv_5_1_fm       = 512;

    const TensorShape conv_5_1_weights_shape(conv_5_1_kernel_x, conv_5_1_kernel_y, conv_4_pool.z(), conv_5_1_fm);
    const TensorShape conv_5_1_biases_shape(conv_5_1_weights_shape[3]);
    const TensorShape conv_5_1_out_shape(conv_4_pool.x(), conv_4_pool.y(), conv_5_1_weights_shape[3]);

    weights_5_1.allocator() -> init(TensorInfo(conv_5_1_weights_shape, 1, DataType::F32));
    biases_5_1.allocator
