Run your PyTorch model on Android GPU using libMACE


In recent years, there has been a trend towards using GPU inference on mobile phones. In TensorFlow, GPU support on mobile devices is built into the standard library, but PyTorch does not yet provide it, so we need to use third-party libraries. In this article, we will look at the PyTorch➔ONNX➔libMACE pipeline.

Installation of libMACE

Next, we will look at the steps for installing libMACE.


Let’s create a separate virtual environment that will contain the library and everything it needs to work.


python3 -m virtualenv ./VENV
source ./VENV/bin/activate

Then we need to install the bazel build system, according to the official installation guide.


sudo apt install g++ unzip zip
wget https://github.com/bazelbuild/bazel/releases/download/0.13.0/bazel-0.13.0-installer-linux-x86_64.sh
chmod +x ./bazel-0.13.0-installer-linux-x86_64.sh
./bazel-0.13.0-installer-linux-x86_64.sh --prefix=/home/user/VENV/opt/bazel
export PATH="${PATH}:/home/user/VENV/opt/bazel/bin/"

Building the MACE library requires Android NDK. We need the r15c version, which we will download to the ~/VENV/opt/android-ndk-r15c/ directory.


wget -q https://dl.google.com/android/repository/android-ndk-r15c-linux-x86_64.zip
unzip -q android-ndk-r15c-linux-x86_64.zip

Next, we need to install additional libraries.


sudo apt install cmake android-tools-adb
pip install numpy scipy jinja2 pyyaml sh==1.12.14 pycodestyle==2.4.0 filelock
pip install onnx

At the last stage, we will create a script, android_env.sh, for configuring the environment variables (shown below).

export ANDROID_NDK_VERSION=r15c
export ANDROID_NDK=/home/user/VENV/opt/android-ndk-r15c/
export ANDROID_NDK_HOME=${ANDROID_NDK}
export PATH=${PATH}:${ANDROID_NDK_HOME}
export PATH=${PATH}:/home/user/VENV/opt/bazel/bin/

Converting a model

MACE uses its own format for representing neural networks, so we need to transform the original model. The conversion process consists of several stages. We will walk through it using ResNet-50 from the torchvision library as an example.

At the first stage, we convert the PyTorch model to ONNX format.


import torch
from torchvision import models

model = models.resnet50(pretrained=True)
model.eval()  # put batch norm / dropout into inference mode before exporting

data = torch.rand(1, 3, 256, 256)
input_names = ["input"]
output_names = ["output"]
torch.onnx.export(model, data, "model.onnx", verbose=True,
                  input_names=input_names, output_names=output_names,
                  opset_version=11, export_params=True,
                  keep_initializers_as_inputs=True)

After conversion, the contents of the folder should look like this.


~/VENV/opt$ ls
android_env.sh  android-ndk-r15c  mace  model.onnx  resnet.ipynb

In the second stage, we need to save the model in libMACE's own format. Let's create a configuration file according to the guide.

library_name: resnet_model
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
  resnet_model: 
    platform: onnx
    model_file_path: /home/user/VENV/opt/model.onnx
    model_sha256_checksum: 1cafd297ee8ac70dd6e2e5644d4c29e8121f6af0b3f42d652888c18cf73314ee
    subgraphs:
      - input_tensors:
          - input
        input_shapes:
          - 1,3,256,256
        output_tensors:
          - output
        output_shapes:
          - 1,1000
        input_data_formats:
          - NCHW
        backend: pytorch
    runtime: cpu+gpu
    winograd: 1

The file must specify the absolute path to the ONNX file (model_file_path), the SHA256 checksum (model_sha256_checksum), the shapes and names of the input and output tensors (input_tensors, input_shapes, output_tensors, and output_shapes), and the data format, in our case NCHW.

The checksum can be calculated using the sha256sum utility.


sha256sum ../model.onnx

After creating the configuration file, we can run the model conversion script.


source ./android_env.sh
cd ./mace/
python ./tools/converter.py convert --config ../resnet_model.yml

If the conversion was completed without errors, the following text will be displayed.


********************************************
       Model resnet_model converted          
********************************************


--------------------------------------------
                  Library                   
--------------------------------------------
|      key       |          value          |
============================================
| MACE Model Path| build/resnet_model/model|
--------------------------------------------

The result of the conversion will be the files resnet_model.data and resnet_model.pb in ~/VENV/opt/mace/build/resnet_model/model/.

Configuring the Android Studio project

For our app in Android Studio, we need to select the C++ Native Application project type.

[Image: creating a new Android Studio project]

Next we need a binary build of the MACE library from the repository.


wget https://github.com/XiaoMi/mace/releases/download/v0.13.0/libmace-v0.13.0.tar.gz

Let's create an app/libmace/ directory with arm64-v8a and armeabi-v7a subfolders, into which we copy the cpu_gpu versions of libmace.so from libmace-v0.13.0/lib/.

In CMakeLists.txt, we need to add the MACE include directory.

include_directories(/home/user/VENV/opt/libmace-v0.13.0/include/)

Next, in CMakeLists.txt, we need to create a lib_mace library and add it to the target_link_libraries list. We also need to add -ljnigraphics to this list for JNI Bitmap support.

set(LIBMACE_DIR ${CMAKE_SOURCE_DIR}/../../../libmace/)
add_library(lib_mace SHARED IMPORTED)
set_target_properties(lib_mace PROPERTIES IMPORTED_LOCATION ${LIBMACE_DIR}/${CMAKE_ANDROID_ARCH_ABI}/libmace.so)


target_link_libraries( native-lib lib_mace -ljnigraphics
                       ${log-lib} )

In the app/build.gradle file, we need to add the abiFilters and externalNativeBuild subsections to the defaultConfig.

defaultConfig {
    ...
    ndk {
        abiFilters 'arm64-v8a', 'armeabi-v7a'
    }

    externalNativeBuild {
        cmake {
            cppFlags ""
            arguments "-DANDROID_ARM_NEON=TRUE", "-DANDROID_STL=c++_shared"
        }
    }
}

In the android section, we need to add the sourceSets entry.

android部分,我们需要添加sourceSets条目。

android {
    ...
    sourceSets {
        main {
            jniLibs.srcDirs = ['libmace/']
        }
    }
}

In AndroidManifest.xml, we add the permission to read external storage.

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>

Next, let's create an assets folder (New➔Folder➔Assets Folder) and copy resnet_model.pb and resnet_model.data to it.

Model loading

First, we will add the model loading function to MainActivity.java.


public native long loadModel(String cache_dir, String model_data, String model_pb);

We will also add its implementation to native_lib.cpp.


extern "C" JNIEXPORT jlong JNICALL
Java_com_example_resnetm_MainActivity_loadModel(JNIEnv *env, jobject thiz, jstring cache_dir, jstring model_data, jstring model_pb)
{…}
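
The loading code that follows runs inside this loadModel implementation. The later snippets use a storage_path string and the yoga_classifier_data_ / yoga_classifier_pb_ file paths without showing where they come from; a minimal sketch of deriving them from the jstring arguments (an assumption, not the repository's exact code) is:

const char *cache_dir_c = env->GetStringUTFChars(cache_dir, nullptr);
const char *model_data_c = env->GetStringUTFChars(model_data, nullptr);
const char *model_pb_c = env->GetStringUTFChars(model_pb, nullptr);

// storage_path: directory where MACE can cache compiled OpenCL binaries;
// the other two are the paths of the converted model files.
std::string storage_path(cache_dir_c);
std::string yoga_classifier_data_(model_data_c);
std::string yoga_classifier_pb_(model_pb_c);

env->ReleaseStringUTFChars(cache_dir, cache_dir_c);
env->ReleaseStringUTFChars(model_data, model_data_c);
env->ReleaseStringUTFChars(model_pb, model_pb_c);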

The MACE library requires a special startup configuration, which is shown below.


#if USE_GPU
    DeviceType device_type = DeviceType::GPU;
    MaceStatus status;
    MaceEngineConfig config(device_type);
    std::shared_ptr<GPUContext> gpu_context;
    gpu_context = GPUContextBuilder()
            .SetStoragePath(storage_path)
            .Finalize();
    config.SetGPUContext(gpu_context);
    config.SetGPUHints(
            static_cast<GPUPerfHint>(GPUPerfHint::PERF_NORMAL),
            static_cast<GPUPriorityHint>(GPUPriorityHint::PRIORITY_LOW));
#else
    DeviceType device_type = DeviceType::CPU;
    MaceStatus status;
    MaceEngineConfig config(device_type);
#endif

Let's look at the code in more detail. First, we need to select a device for computing (device_type). In the case of the GPU, MACE prepares OpenCL binaries, so it requires a storage_path directory where the library can save them. Next, we specify the priority of the task (GPUPerfHint and GPUPriorityHint). If we select a high priority, the user interface may freeze.
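
The USE_GPU switch used above is not defined in any of the listings; how it is set is an assumption here. One simple option is a compile-time define at the top of native_lib.cpp (it could equally be passed from Gradle via cppFlags, e.g. "-DUSE_GPU=1").

// Assumed compile-time switch selecting the GPU branch of the configuration above.
// Set to 0 to fall back to CPU inference.
#define USE_GPU 1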

After creating the startup configuration, we load the neural network into memory and create a MaceEngine.


std::ifstream yoga_classifier_data_in( yoga_classifier_data_, std::ios::binary );
std::vector<unsigned char> yoga_classifier_data_buffer((std::istreambuf_iterator<char>(yoga_classifier_data_in)),  (std::istreambuf_iterator<char>( )));
std::ifstream yoga_classifier_pb_in( yoga_classifier_pb_, std::ios::binary );
std::vector<unsigned char> yoga_classifier_pb_buffer((std::istreambuf_iterator<char>(yoga_classifier_pb_in)),  (std::istreambuf_iterator<char>( )));


size_t data_size = yoga_classifier_data_buffer.size();
size_t pb_size = yoga_classifier_pb_buffer.size();
std::shared_ptr<mace::MaceEngine> engine;


std::vector<std::string> input_names = {"input"};
std::vector<std::string> output_names = {"output"};


MaceStatus create_engine_status =
  CreateMaceEngineFromProto(&yoga_classifier_pb_buffer[0],
                            pb_size,
                            &yoga_classifier_data_buffer[0],
                            data_size,
                            input_names,
                            output_names,
                            config,
                            &engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
    return 0;
}

The neural network is returned as a shared_ptr, which we can't directly pass to MainActivity, so we'll introduce an intermediate class ModelData (shown below).

class ModelData
{
public:
  std::shared_ptr<mace::MaceEngine> engine;
};

The result of loadModel is a pointer to an object of this type.


ModelData *result = new ModelData();
result->engine = engine;

We return the pointer as a long.

return (jlong) result;
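
Since loadModel hands the Java side an owning pointer to a heap-allocated ModelData, it is worth freeing it once the model is no longer needed. The article does not show this part; a hypothetical releaseModel counterpart (declared in MainActivity.java as public native void releaseModel(long model_ptr);) might look like this.

extern "C" JNIEXPORT void JNICALL
Java_com_example_resnetm_MainActivity_releaseModel(JNIEnv *env, jobject thiz, jlong model_ptr)
{
    // Deleting the ModelData releases the shared_ptr and, with it, the MaceEngine.
    delete (ModelData *) model_ptr;
}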

Model inference

For model inference, we declare the classification function in MainActivity.

public native int classifyImage(long model_ptr, Bitmap bitmap);

We also add its definition to native_lib.cpp.

extern "C" JNIEXPORT jint JNICALL
Java_com_example_resnetm_MainActivity_classifyImage(JNIEnv *env, jobject thiz, jlong model_ptr, jobject bitmap) 
{…}

Later in this section, you will find a step-by-step description of this function.


First, we restore the model pointer from the long value.

ModelData *modelData = (ModelData*)model_ptr;

Next, we need to prepare the data in NCHW format (sample code is provided below).

for (int i = 0; i < height; i++)
{
    for (int j = 0; j < width; j++)
    {
        for (int k = 0; k < n_channels; k++) {
            float c = src_input[i*width*bitmap_step + j*bitmap_step + k];
            c = (c / 255 - mean[k]) / std[k];
            dst_input[k*width*height + i*width + j] = c;
        }
    }
}
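
The loop reads interleaved pixel bytes from src_input and writes normalized NCHW floats to dst_input, but the article does not show where these buffers come from; mean and std would be the ImageNet normalization constants used when training ResNet-50. Since the project links -ljnigraphics, the pixels can be taken from the Bitmap passed into classifyImage. The sketch below is an assumption (an RGBA_8888 bitmap with stride equal to width*4, and dst_input pointing at the float array later wrapped as buffer_in), not the repository's exact code.

#include <android/bitmap.h>   // at the top of native_lib.cpp

AndroidBitmapInfo info;
void *pixels = nullptr;
// Query the Bitmap geometry and lock its pixel buffer.
if (AndroidBitmap_getInfo(env, bitmap, &info) != ANDROID_BITMAP_RESULT_SUCCESS ||
    AndroidBitmap_lockPixels(env, bitmap, &pixels) != ANDROID_BITMAP_RESULT_SUCCESS) {
    return -1;
}

int width = info.width;               // expected to be 256, matching the model input
int height = info.height;
int n_channels = 3;                   // only RGB is fed to the network
int bitmap_step = 4;                  // RGBA_8888: 4 bytes per pixel
unsigned char *src_input = (unsigned char *) pixels;

// ... run the NCHW conversion loop shown above, filling dst_input ...

AndroidBitmap_unlockPixels(env, bitmap);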

After loading the data, we need to form the parameters for the neural network as a dictionary of tensors, as in the code below. The tensor shapes must match the ones specified in the resnet_model.yml file.

std::vector<std::string> input_names = {"input"};
std::vector<std::string> output_names = {"output"};
vector<vector<int64_t>> input_shapes;
input_shapes.push_back(vector<int64_t> {1, 3, 256, 256});
vector<vector<int64_t>> output_shapes;
output_shapes.push_back(vector<int64_t> {1, 1000});


std::map<std::string, mace::MaceTensor> inputs;
inputs["input"] = mace::MaceTensor(input_shapes[0], buffer_in, DataFormat::NCHW);
auto buffer_out = std::shared_ptr<float>(new float[1*1000],
                                         std::default_delete<float[]>());
std::map<std::string, mace::MaceTensor> outputs;
outputs["output"] = mace::MaceTensor(output_shapes[0], buffer_out);

Now the data is ready. Let's launch the model inference.

run_status = modelData->engine->Run(inputs, &outputs);


auto code = run_status.code();
string inference_info = run_status.information();


float *res = outputs["output"].data().get();

The predicted class is the index of the output with the maximum value.

int result = 0;
float best_score = res[0];
for (int i = 1; i < 1000; i++)
{
    float score = res[i];
    if (score > best_score)
    {
        best_score = score;
        result = i;
    }
}

Now the main part of the application is created. We also need some Java logic related to user interaction; it is not described in this article, but it can be found in the project repository. After adding it, we can build the program and run it on a mobile phone.

The full source code can be downloaded here.


Translated from: https://medium.com/@v.hramchenko/run-your-pytorch-model-on-android-gpu-using-libmace-7e43f623d95c
