In the previous blog post I already covered quite a lot about the Arm Compute Library.
The portal is here!!!
This post is mainly about how to write a deep learning model with the Arm Compute Library, using VGG16 as the example. Once again, here is my own VGG16 implementation on GitHub!!! (The VGG16 code sits in the example folder, in the file vgg16_model_arm_compute_library_NEON.cpp.)
- Initialize the Tensors
A Tensor is a multi-dimensional array that holds the data for an operation. For example, after the input tensor goes through an activation or a convolution, the new data is written into an output tensor. This matters: essentially the whole Arm Compute Library revolves around these tensors. Here is a bit of the code:
//init_input_tensor
Tensor input;
//init_conv_1_tensor
Tensor weights_1_1;
Tensor biases_1_1;
Tensor out_1_1;
Tensor act_1_1;
Tensor weights_1_2;
Tensor biases_1_2;
Tensor out_1_2;
Tensor act_1_2;
Tensor pool_1;
These declare the tensors for the input and for the first two convolutional layers of VGG.
TensorShape: describes how many dimensions a tensor has and how large each dimension is. The methods x(), y(), z() correspond to the array indices [0], [1], [2]; for higher dimensions you fall back to the array-style operator[].
TensorInfo: wraps a TensorShape together with the rest of the tensor's metadata (number of channels and data type); it is what gets passed to the allocator's init().
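As a minimal sketch of the TensorShape accessors (the shape below is just the VGG16 input from this post):
TensorShape shape(224U, 224U, 3U); // width 224, height 224, 3 feature maps
unsigned int w = shape.x(); // 224, same as shape[0]
unsigned int h = shape.y(); // 224, same as shape[1]
unsigned int c = shape.z(); // 3, same as shape[2]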
//init_tensor
constexpr unsigned int input_width = 224;
constexpr unsigned int input_height = 224;
constexpr unsigned int input_fm = 3;
const TensorShape input_shape(input_width, input_height, input_fm);
input.allocator() -> init(TensorInfo(input_shape, 1, DataType::F32));
//init_conv_1_1
constexpr unsigned int conv_1_1_kernel_x = 3;
constexpr unsigned int conv_1_1_kernel_y = 3;
constexpr unsigned int conv_1_1_fm = 64;
const TensorShape conv_1_1_weights_shape(conv_1_1_kernel_x, conv_1_1_kernel_y, input_shape.z(), conv_1_1_fm);
const TensorShape conv_1_1_biases_shape(conv_1_1_weights_shape[3]);
const TensorShape conv_1_1_out_shape(input_shape.x(), input_shape.y(), conv_1_1_weights_shape[3]);
weights_1_1.allocator() -> init(TensorInfo(conv_1_1_weights_shape, 1, DataType::F32));
biases_1_1.allocator() -> init(TensorInfo(conv_1_1_biases_shape, 1, DataType::F32));
out_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));
act_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));
First look at the input tensor. VGG's input is 224 x 224 x 3, so its TensorShape fixes those dimensions: const TensorShape input_shape(input_width, input_height, input_fm);
The tensor is then initialized through its allocator: input.allocator() -> init(TensorInfo(input_shape, 1, DataType::F32));
The init method takes the TensorInfo describing the tensor. For the convolution kernels you likewise define the sizes first and then call the allocator according to the model; all the later tensor initializations follow this same pattern.
- Configure Functions
This step defines the operations, that is, the layers with their different functions; tensors flow into these layers, get processed, and flow out.
NEConvolutionLayer conv_1_1;
NEConvolutionLayer conv_1_2;
NEConvolutionLayer conv_2_1;
NEActivationLayer Nact_1_1;
NEActivationLayer Nact_1_2;
NEPoolingLayer Npool_1;
These declare the layer function objects.
//conv_1
//in: 224 * 224 * 3, kernel: 3 * 3 * 3 * 64, out: 224 * 224 * 64
conv_1_1.configure(&input, &weights_1_1, &biases_1_1, &out_1_1, PadStrideInfo(1, 1, 1, 1));
//in: 224 * 224 * 64, out: 224 * 224 * 64
Nact_1_1.configure(&out_1_1, &act_1_1, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU));
//in: 224 * 224 * 64, kernel: 3 * 3 * 64 * 64, out: 224 * 224 * 64
conv_1_2.configure(&act_1_1, &weights_1_2, &biases_1_2, &out_1_2, PadStrideInfo(1, 1, 1, 1));
//in: 224 * 224 * 64, out: 224 * 224 * 64
Nact_1_2.configure(&out_1_2, &act_1_2, ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::RELU));
//in: 224 * 224 * 64, out: 112 * 112 * 64
Npool_1.configure(&act_1_2, &pool_1, PoolingLayerInfo(PoolingType::MAX, 2, PadStrideInfo(2, 2)));
These are the layers' configure calls. Take the first convolutional layer: input flows in, and through the weights and biases the result flows out into the out tensor; the last argument is a PadStrideInfo, which carries the strides and padding. The kernel size itself is encoded in the dimensions of the weights tensor.
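As a side note, PadStrideInfo takes the strides first and then the padding, and the output size follows the usual convolution arithmetic. A minimal sketch (the formula is the standard one; the numbers are conv_1_1 from above):
// PadStrideInfo(stride_x, stride_y, pad_x, pad_y)
const PadStrideInfo conv_1_1_info(1, 1, 1, 1);
// out = (in + 2 * pad - kernel) / stride + 1
// conv_1_1: (224 + 2 * 1 - 3) / 1 + 1 = 224, so the 3x3 kernel with pad 1 keeps the size.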
- Allocate tensors
This is effectively the memory allocation. This step eats a large amount of the phone's memory, so you probably want at least 2 GB free. It is simple and easy to understand.
//input
input.allocator() -> allocate();
//conv_1
weights_1_1.allocator() -> allocate();
biases_1_1.allocator() -> allocate();
out_1_1.allocator() -> allocate();
act_1_1.allocator() -> allocate();
weights_1_2.allocator() -> allocate();
biases_1_2.allocator() -> allocate();
out_1_2.allocator() -> allocate();
act_1_2.allocator() -> allocate();
pool_1.allocator() -> allocate();
- Load the weights
This is a fairly important part: the input, weights and biases are all loaded this way, and the final result is also read back out with the same method. Example code first:
//conv_1_1
//W: 3 * 3 * 3 * 64
x = y = z = k = 0;
std::ifstream conv_1_1_Wfile("/data/local/tmp/conv1_1_W.csv");
line = "";
while(getline(conv_1_1_Wfile, line)){
std::stringstream strstr(line);
std::string word = "";
while(getline(strstr, word, ',')){
float temp = StringToFloat(word);
*reinterpret_cast<float *>(weights_1_1.buffer() + weights_1_1.info()->offset_element_in_bytes(Coordinates(x, y, z, k))) = temp;
++ x;
if(x == conv_1_1_weights_shape.x()) ++ y, x = 0;
if(y == conv_1_1_weights_shape.y()) ++ z, y = 0;
if(z == conv_1_1_weights_shape.z()) ++ k, z = 0;
}
}
conv_1_1_Wfile.close();
Here conv_1_1's weights serve as the example. First the weights are read from the file: the outer getline fetches one line at a time, and the inner getline splits it on commas to get each weight as a string. Then this line does the actual write:
*reinterpret_cast<float *>(weights_1_1.buffer() + weights_1_1.info()->offset_element_in_bytes(Coordinates(x, y, z, k))) = temp;
Here weights_1_1.buffer() finds the start address of the weights tensor, and weights_1_1.info()->offset_element_in_bytes(Coordinates(x, y, z, k)) finds the byte offset of the current element. The tensor's memory is not contiguous like a plain array (reading it that way would go wrong and some values would not land where they should), so the lines below just keep updating x, y, z, k by hand (exactly as if four nested for loops were driving them).
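Equivalently, if the weights already sat in a flat std::vector<float> in the same x-fastest order, the copy could be written as four explicit loops. A hypothetical sketch (flat_weights and its ordering are my assumption, not part of the original code):
// Hypothetical: flat_weights holds 3 * 3 * 3 * 64 floats, x varying fastest.
std::vector<float> flat_weights(3 * 3 * 3 * 64); // assume filled elsewhere
size_t i = 0;
for(unsigned int k = 0; k < conv_1_1_weights_shape[3]; ++ k)
    for(unsigned int z = 0; z < conv_1_1_weights_shape.z(); ++ z)
        for(unsigned int y = 0; y < conv_1_1_weights_shape.y(); ++ y)
            for(unsigned int x = 0; x < conv_1_1_weights_shape.x(); ++ x)
                *reinterpret_cast<float *>(weights_1_1.buffer() + weights_1_1.info()->offset_element_in_bytes(Coordinates(x, y, z, k))) = flat_weights[i ++];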
- Execute the functions
This step actually gets the code running!!!
//conv_1
conv_1_1.run();
Nact_1_1.run();
conv_1_2.run();
Nact_1_2.run();
Npool_1.run();
Once the run finishes, you can read the final output tensor's memory back out with the same method shown above, and there are your class probabilities!!!
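For example, reading the 1000 softmax probabilities back out could look like this minimal sketch (softmax_tensor is the output tensor in the full listing below; the bound of 1000 assumes the usual ImageNet class count):
// Read each class probability out of the 1-D softmax output tensor.
for(unsigned int i = 0; i < 1000; ++ i){
    float prob = *reinterpret_cast<float *>(softmax_tensor.buffer() + softmax_tensor.info()->offset_element_in_bytes(Coordinates(i)));
    std::cout << "class " << i << ": " << prob << std::endl;
}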
That is roughly the whole flow. The code above shows conv block 1 as the reference; the other blocks are much the same. To close, here is the complete VGG16 code. If you run into any problems, feel free to leave a comment hahaha!!!
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/CL/CLFunctions.h"
#include "arm_compute/core/Types.h"
#include "test_helpers/Utils.h"
#include <iostream>
#include <sstream>
#include <fstream>
#include <ostream>
#include <sys/time.h>
#include <map>
using namespace arm_compute;
using namespace test_helpers;
static float StringToFloat(const std::string & str){
std::istringstream iss(str);
float number;
iss >> number;
return number;
}
void main_neon_dnn(int argc, const char **argv)
{
/*----------------------------------[init_model_vgg16]-----------------------------------*/
/*----------------------------------BEGIN:[init_Tensor]----------------------------------*/
//init_input_tensor
Tensor input;
//init_conv_1_tensor
Tensor weights_1_1;
Tensor biases_1_1;
Tensor out_1_1;
Tensor act_1_1;
Tensor weights_1_2;
Tensor biases_1_2;
Tensor out_1_2;
Tensor act_1_2;
Tensor pool_1;
//init_conv_2_tensor
Tensor weights_2_1;
Tensor biases_2_1;
Tensor out_2_1;
Tensor act_2_1;
Tensor weights_2_2;
Tensor biases_2_2;
Tensor out_2_2;
Tensor act_2_2;
Tensor pool_2;
//init_conv_3_tensor
Tensor weights_3_1;
Tensor biases_3_1;
Tensor out_3_1;
Tensor act_3_1;
Tensor weights_3_2;
Tensor biases_3_2;
Tensor out_3_2;
Tensor act_3_2;
Tensor weights_3_3;
Tensor biases_3_3;
Tensor out_3_3;
Tensor act_3_3;
Tensor pool_3;
//init_conv_4_tensor
Tensor weights_4_1;
Tensor biases_4_1;
Tensor out_4_1;
Tensor act_4_1;
Tensor weights_4_2;
Tensor biases_4_2;
Tensor out_4_2;
Tensor act_4_2;
Tensor weights_4_3;
Tensor biases_4_3;
Tensor out_4_3;
Tensor act_4_3;
Tensor pool_4;
//init_conv_5_tensor
Tensor weights_5_1;
Tensor biases_5_1;
Tensor out_5_1;
Tensor act_5_1;
Tensor weights_5_2;
Tensor biases_5_2;
Tensor out_5_2;
Tensor act_5_2;
Tensor weights_5_3;
Tensor biases_5_3;
Tensor out_5_3;
Tensor act_5_3;
Tensor pool_5;
//init_fc_6
Tensor weights_6;
Tensor biases_6;
Tensor out_6;
Tensor act_6;
//init_fc_7
Tensor weights_7;
Tensor biases_7;
Tensor out_7;
Tensor act_7;
//init_fc_8
Tensor weights_8;
Tensor biases_8;
Tensor out_8;
Tensor softmax_tensor;
//init_tensor
constexpr unsigned int input_width = 224;
constexpr unsigned int input_height = 224;
constexpr unsigned int input_fm = 3;
const TensorShape input_shape(input_width, input_height, input_fm);
input.allocator() -> init(TensorInfo(input_shape, 1, DataType::F32));
//init_conv_1_1
constexpr unsigned int conv_1_1_kernel_x = 3;
constexpr unsigned int conv_1_1_kernel_y = 3;
constexpr unsigned int conv_1_1_fm = 64;
const TensorShape conv_1_1_weights_shape(conv_1_1_kernel_x, conv_1_1_kernel_y, input_shape.z(), conv_1_1_fm);
const TensorShape conv_1_1_biases_shape(conv_1_1_weights_shape[3]);
const TensorShape conv_1_1_out_shape(input_shape.x(), input_shape.y(), conv_1_1_weights_shape[3]);
weights_1_1.allocator() -> init(TensorInfo(conv_1_1_weights_shape, 1, DataType::F32));
biases_1_1.allocator() -> init(TensorInfo(conv_1_1_biases_shape, 1, DataType::F32));
out_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));
act_1_1.allocator() -> init(TensorInfo(conv_1_1_out_shape, 1, DataType::F32));
//init_conv_1_2
constexpr unsigned int conv_1_2_kernel_x = 3;
constexpr unsigned int conv_1_2_kernel_y = 3;
constexpr unsigned int conv_1_2_fm = 64;
const TensorShape conv_1_2_weights_shape(conv_1_2_kernel_x, conv_1_2_kernel_y, conv_1_1_out_shape.z(), conv_1_2_fm);
const TensorShape conv_1_2_biases_shape(conv_1_2_weights_shape[3]);
const TensorShape conv_1_2_out_shape(conv_1_1_out_shape.x(), conv_1_1_out_shape.y(), conv_1_2_weights_shape[3]);
weights_1_2.allocator() -> init(TensorInfo(conv_1_2_weights_shape, 1, DataType::F32));
biases_1_2.allocator() -> init(TensorInfo(conv_1_2_biases_shape, 1, DataType::F32));
out_1_2.allocator() -> init(TensorInfo(conv_1_2_out_shape, 1, DataType::F32));
act_1_2.allocator() -> init(TensorInfo(conv_1_2_out_shape, 1, DataType::F32));
TensorShape conv_1_pool = conv_1_2_out_shape;
conv_1_pool.set(0, conv_1_pool.x() / 2);
conv_1_pool.set(1, conv_1_pool.y() / 2);
pool_1.allocator() -> init(TensorInfo(conv_1_pool, 1, DataType::F32));
//init_conv_2_1
constexpr unsigned int conv_2_1_kernel_x = 3;
constexpr unsigned int conv_2_1_kernel_y = 3;
constexpr unsigned int conv_2_1_fm = 128;
const TensorShape conv_2_1_weights_shape(conv_2_1_kernel_x, conv_2_1_kernel_y, conv_1_pool.z(), conv_2_1_fm);
const TensorShape conv_2_1_biases_shape(conv_2_1_weights_shape[3]);
const TensorShape conv_2_1_out_shape(conv_1_pool.x(), conv_1_pool.y(), conv_2_1_weights_shape[3]);
weights_2_1.allocator() -> init(TensorInfo(conv_2_1_weights_shape, 1, DataType::F32));
biases_2_1.allocator() -> init(TensorInfo(conv_2_1_biases_shape, 1, DataType::F32));
out_2_1.allocator() -> init(TensorInfo(conv_2_1_out_shape, 1, DataType::F32));
act_2_1.allocator() -> init(TensorInfo(conv_2_1_out_shape, 1, DataType::F32));
//init_conv_2_2
constexpr unsigned int conv_2_2_kernel_x = 3;
constexpr unsigned int conv_2_2_kernel_y = 3;
constexpr unsigned int conv_2_2_fm = 128;
const TensorShape conv_2_2_weights_shape(conv_2_2_kernel_x, conv_2_2_kernel_y, conv_2_1_out_shape.z(), conv_2_2_fm);
const TensorShape conv_2_2_biases_shape(conv_2_2_weights_shape[3]);
const TensorShape conv_2_2_out_shape(conv_2_1_out_shape.x(), conv_2_1_out_shape.y(), conv_2_2_weights_shape[3]);
weights_2_2.allocator() -> init(TensorInfo(conv_2_2_weights_shape, 1, DataType::F32));
biases_2_2.allocator() -> init(TensorInfo(conv_2_2_biases_shape, 1, DataType::F32));
out_2_2.allocator() -> init(TensorInfo(conv_2_2_out_shape, 1, DataType::F32));
act_2_2.allocator() -> init(TensorInfo(conv_2_2_out_shape, 1, DataType::F32));
TensorShape conv_2_pool = conv_2_2_out_shape;
conv_2_pool.set(0, conv_2_pool.x() / 2);
conv_2_pool.set(1, conv_2_pool.y() / 2);
pool_2.allocator() -> init(TensorInfo(conv_2_pool, 1, DataType::F32));
//init_conv_3_1
constexpr unsigned int conv_3_1_kernel_x = 3;
constexpr unsigned int conv_3_1_kernel_y = 3;
constexpr unsigned int conv_3_1_fm = 256;
const TensorShape conv_3_1_weights_shape(conv_3_1_kernel_x, conv_3_1_kernel_y, conv_2_pool.z(), conv_3_1_fm);
const TensorShape conv_3_1_biases_shape(conv_3_1_weights_shape[3]);
const TensorShape conv_3_1_out_shape(conv_2_pool.x(), conv_2_pool.y(), conv_3_1_weights_shape[3]);
weights_3_1.allocator() -> init(TensorInfo(conv_3_1_weights_shape, 1, DataType::F32));
biases_3_1.allocator() -> init(TensorInfo(conv_3_1_biases_shape, 1, DataType::F32));
out_3_1.allocator() -> init(TensorInfo(conv_3_1_out_shape, 1, DataType::F32));
act_3_1.allocator() -> init(TensorInfo(conv_3_1_out_shape, 1, DataType::F32));
//init_conv_3_2
constexpr unsigned int conv_3_2_kernel_x = 3;
constexpr unsigned int conv_3_2_kernel_y = 3;
constexpr unsigned int conv_3_2_fm = 256;
const TensorShape conv_3_2_weights_shape(conv_3_2_kernel_x, conv_3_2_kernel_y, conv_3_1_out_shape.z(), conv_3_2_fm);
const TensorShape conv_3_2_biases_shape(conv_3_2_weights_shape[3]);
const TensorShape conv_3_2_out_shape(conv_3_1_out_shape.x(), conv_3_1_out_shape.y(), conv_3_2_weights_shape[3]);
weights_3_2.allocator() -> init(TensorInfo(conv_3_2_weights_shape, 1, DataType::F32));
biases_3_2.allocator() -> init(TensorInfo(conv_3_2_biases_shape, 1, DataType::F32));
out_3_2.allocator() -> init(TensorInfo(conv_3_2_out_shape, 1, DataType::F32));
act_3_2.allocator() -> init(TensorInfo(conv_3_2_out_shape, 1, DataType::F32));
//init_conv_3_3
constexpr unsigned int conv_3_3_kernel_x = 3;
constexpr unsigned int conv_3_3_kernel_y = 3;
constexpr unsigned int conv_3_3_fm = 256;
const TensorShape conv_3_3_weights_shape(conv_3_3_kernel_x, conv_3_3_kernel_y, conv_3_2_out_shape.z(), conv_3_3_fm);
const TensorShape conv_3_3_biases_shape(conv_3_3_weights_shape[3]);
const TensorShape conv_3_3_out_shape(conv_3_2_out_shape.x(), conv_3_2_out_shape.y(), conv_3_3_weights_shape[3]);
weights_3_3.allocator() -> init(TensorInfo(conv_3_3_weights_shape, 1, DataType::F32));
biases_3_3.allocator() -> init(TensorInfo(conv_3_3_biases_shape, 1, DataType::F32));
out_3_3.allocator() -> init(TensorInfo(conv_3_3_out_shape, 1, DataType::F32));
act_3_3.allocator() -> init(TensorInfo(conv_3_3_out_shape, 1, DataType::F32));
TensorShape conv_3_pool = conv_3_3_out_shape;
conv_3_pool.set(0, conv_3_pool.x() / 2);
conv_3_pool.set(1, conv_3_pool.y() / 2);
pool_3.allocator() -> init(TensorInfo(conv_3_pool, 1, DataType::F32));
//init_conv_4_1
constexpr unsigned int conv_4_1_kernel_x = 3;
constexpr unsigned int conv_4_1_kernel_y = 3;
constexpr unsigned int conv_4_1_fm = 512;
const TensorShape conv_4_1_weights_shape(conv_4_1_kernel_x, conv_4_1_kernel_y, conv_3_pool.z(), conv_4_1_fm);
const TensorShape conv_4_1_biases_shape(conv_4_1_weights_shape[3]);
const TensorShape conv_4_1_out_shape(conv_3_pool.x(), conv_3_pool.y(), conv_4_1_weights_shape[3]);
weights_4_1.alloca