rk3568-1.6.0
1. Install RKNN-Toolkit2 on any Ubuntu system
1.1 Download
Create a Projects folder
mkdir Projects
Enter the directory
cd Projects
Clone the RKNN-Toolkit2 repository
git clone https://github.com/airockchip/rknn-toolkit2.git --depth 1
Clone the RKNN Model Zoo repository
git clone https://github.com/airockchip/rknn_model_zoo.git --depth 1
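Notes:
1. The --depth 1 option clones only the most recent commit.
2. If git clone fails, download the archive from GitHub instead and extract it into this directory.
1.2 Install
First install the dependencies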
pip install -r doc/requirements_cpxx.txt
Then install rknn-toolkit2
pip install packages/rknn_toolkit2-x.x.x+xxxxxxxx-cpxx-cpxx-linux_x86_64.whl
To verify the installation, start the Python interpreter
python
Then import the package; if no error is reported, the installation succeeded
from rknn.api import RKNN
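The cpxx placeholders in the requirements file and wheel names above stand for the CPython version tag of the interpreter you install into (cp38 for Python 3.8, for instance). As a small sketch, the tag can be derived from the running interpreter. The helper name cpython_tag is made up for illustration, and which tags the repository actually ships has to be checked in the repository itself.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython tag, e.g. 'cp38' for Python 3.8 (hypothetical helper)."""
    return "cp{}{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    # Only the tag is printed. The wheel name also contains a release number
    # and a build hash, which are not guessed here.
    print("look for files containing:", cpython_tag())
```

Pick the requirements file and the wheel whose names contain this tag.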
2. On the device (board) side, check the RKNPU2 driver and install or update the RKNPU2 environment
dmesg | grep -i rknp
Run uname -a to check the system architecture
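The push commands further down use a BOARD_ARCH directory that is either aarch64 (64-bit) or armhf (32-bit). The sketch below maps a machine string, as reported by uname -m, to that directory name. The function name board_arch and the list of recognized machine strings are assumptions for illustration and may be incomplete.

```python
import platform

# Machine strings (as printed by `uname -m`) mapped to the runtime directory.
# The two directory names come from the article; the keys are an assumption.
_ARCH_DIRS = {
    "aarch64": "aarch64",
    "armv7l": "armhf",
}


def board_arch(machine):
    """Return 'aarch64' or 'armhf' for a machine string, or raise ValueError."""
    key = machine.strip().lower()
    if key not in _ARCH_DIRS:
        raise ValueError("unrecognized machine string: {!r}".format(machine))
    return _ARCH_DIRS[key]


if __name__ == "__main__":
    # When run on the board itself, this is the same value as `uname -m`.
    print(platform.machine())
```

For example, board_arch("aarch64") returns "aarch64", which can then be exported as BOARD_ARCH before running the adb push commands.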
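Next, set up the NPU environment on the board, which mainly means copying the relevant files to it:
# Enter the rknpu2 directory
cd Projects/rknn-toolkit2/rknpu2
# Push rknn_server to the board
# Note: on 64-bit Linux, BOARD_ARCH is the aarch64 directory; on 32-bit systems it is the armhf directory.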
adb push runtime/Linux/rknn_server/${BOARD_ARCH}/usr/bin/* /usr/bin
# Push librknnrt.so
adb push runtime/Linux/librknn_api/${BOARD_ARCH}/librknnrt.so /usr/lib
# Open a shell on the board
adb shell
# Grant execute permission
chmod +x /usr/bin/rknn_server
chmod +x /usr/bin/start_rknn.sh
chmod +x /usr/bin/restart_rknn.sh
# Restart the rknn_server service
restart_rknn.sh
# Query the rknn_server version
strings /usr/bin/rknn_server | grep -i "rknn_server version"
# Query the librknnrt.so version
strings /usr/lib/librknnrt.so | grep -i "librknnrt version"
# Output like the following means rknn_server is version x.x.x and librknnrt.so is version x.x.x.
#rknn_server version: x.x.x
#librknnrt version: x.x.x
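If the output of the two strings commands is saved, the version numbers can be extracted and compared in a few lines. This is only a sketch: parse_version and versions_match are hypothetical helpers, and it assumes you want the two components to report the same version, which is generally advisable.

```python
import re


def parse_version(text, component):
    """Extract the dotted number after '<component> version:' or return None."""
    pattern = r"{}\s+version:\s*(\d+(?:\.\d+)*)".format(re.escape(component))
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1) if match else None


def versions_match(server_output, runtime_output):
    """True when both versions were found and are identical."""
    server = parse_version(server_output, "rknn_server")
    runtime = parse_version(runtime_output, "librknnrt")
    return server is not None and server == runtime


if __name__ == "__main__":
    # Made-up sample lines in the format shown above, not real board output.
    print(versions_match("rknn_server version: 1.6.0",
                         "librknnrt version: 1.6.0"))
```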
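3. Using RKNN
3.1 Model conversion
RKNN-Toolkit2 provides model conversion, performance analysis, deployment debugging, and more. This section focuses on model conversion, one of its core functions: it converts deep learning models from various frameworks into the RKNN format so that they can run on the RKNPU. Users can refer to the model conversion flowchart to understand the process.
PyTorch, TensorFlow, Caffe, and ONNX models can be converted to the RKNN format.
a. RKNN initialization and release
rknn = RKNN(verbose=True, verbose_file='./mobilenet_build.log')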
rknn.release()
b. RKNN config
rknn.config(
    mean_values=[[103.94, 116.78, 123.68]],
    std_values=[[58.82, 58.82, 58.82]],
    quant_img_RGB2BGR=False,
    target_platform='rk3566')
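mean_values and std_values set the mean and the normalization (scale) values of the input. They are used during quantization, and because they are stored with the model, images passed to the C API at inference time need no separate mean subtraction or normalization step, which reduces deployment overhead.
Note, therefore, that these values serve quantization as well as normalization.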
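quant_img_RGB2BGR controls whether calibration images are converted from RGB to BGR when they are loaded for quantization; the default is False. The setting only takes effect while the quantization dataset is processed. It has no effect at inference time on the deployed model, so the user must perform the conversion in the input preprocessing. Note: with quant_img_RGB2BGR = True, the RGB2BGR conversion is applied first and the mean_values and std_values operations afterwards. See section 10.3 of the official documentation for details.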
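The arithmetic described above, an optional channel swap followed by (value - mean) / std per channel, can be written out for a single pixel. This is a plain-Python sketch of the formula, not RKNN's implementation; preprocess_pixel is a hypothetical name.

```python
def preprocess_pixel(rgb, mean_values, std_values, rgb2bgr=False):
    """Apply the optional RGB->BGR swap, then (value - mean) / std per channel."""
    channels = list(reversed(rgb)) if rgb2bgr else list(rgb)
    return [(v - m) / s for v, m, s in zip(channels, mean_values, std_values)]


if __name__ == "__main__":
    pixel = (255, 128, 0)  # arbitrary RGB pixel, used only for illustration
    print(preprocess_pixel(pixel, [0, 0, 0], [255, 255, 255]))
    print(preprocess_pixel(pixel, [0, 0, 0], [255, 255, 255], rgb2bgr=True))
```

Because the swap happens first, mean_values and std_values have to be listed in the channel order that results after the swap.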
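target_platform specifies the target platform of the RKNN model. Supported values are RK3568, RK3566, RK3562, RK3588, RV1106, and RV1103.
quantized_algorithm selects the algorithm used to compute each layer's quantization parameters: normal, mmse, or kl_divergence. The default is normal. See sections 3.1.7, 6.1, and 6.2 of the official documentation.
quantized_method is either layer or channel and determines whether all weights of a layer share one set of quantization parameters (layer) or each channel gets its own (channel). The default is channel. See sections 3.1.7, 6.1, and 6.2 of the official documentation.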
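To see why the layer/channel choice matters, the sketch below runs a symmetric int8 round trip on made-up weights, once with a single scale for the whole layer and once with one scale per channel. It illustrates the general idea only; it is not the algorithm RKNN-Toolkit2 uses, and all names and numbers are invented for the example.

```python
def quantize_dequantize(values, scale):
    """Symmetric int8 round trip: quantize with `scale`, then dequantize."""
    restored = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        restored.append(q * scale)
    return restored


def max_abs(values):
    return max(abs(v) for v in values)


def roundtrip_errors(channels, per_channel):
    """Return the largest absolute round-trip error of each channel."""
    flat = [v for ch in channels for v in ch]
    layer_scale = (max_abs(flat) / 127.0) or 1.0
    errors = []
    for ch in channels:
        # Either one scale per channel or the shared per-layer scale.
        scale = ((max_abs(ch) / 127.0) or 1.0) if per_channel else layer_scale
        restored = quantize_dequantize(ch, scale)
        errors.append(max(abs(a - b) for a, b in zip(ch, restored)))
    return errors


if __name__ == "__main__":
    # Made-up weights: one channel with large values, one with small values.
    weights = [[10.0, -8.0, 4.0], [0.02, -0.01, 0.03]]
    print("layer  :", roundtrip_errors(weights, per_channel=False))
    print("channel:", roundtrip_errors(weights, per_channel=True))
```

With a shared scale the small-valued channel is rounded to zero entirely, while a per-channel scale preserves it, which is one reason channel is a sensible default.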
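optimization_level sets the model optimization level and can disable some or all of the optimization rules applied during conversion. The default is 3, which enables all optimizations. A value of 2 or 1 disables some optimizations that may affect the accuracy of certain models, and 0 disables all of them.
This configuration step is critical, and the converted model is closely coupled to the C code that runs it.
c. Load the model
ret = rknn.load_onnx(model='./arcface.onnx')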
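d. Build the model
After loading the original model, the next step is to build the RKNN model with rknn.build(). During the build you can choose whether to quantize the model; quantization helps reduce the model size and improve performance on the RKNPU.
rknn.build() takes the following parameters:
do_quantization controls whether the model is quantized; setting it to True is recommended.
dataset supplies the calibration dataset for quantization, given as a text file.
dataset.txt example:
./imgs/ILSVRC2012_val_00000665.JPEG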
./imgs/ILSVRC2012_val_00001123.JPEG
./imgs/ILSVRC2012_val_00001129.JPEG
./imgs/ILSVRC2012_val_00001284.JPEG
./imgs/ILSVRC2012_val_00003026.JPEG
./imgs/ILSVRC2012_val_00005276.JPEG
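A dataset.txt in the format above can be generated from a folder of calibration images. The following is a minimal sketch: write_dataset_file is a hypothetical helper and the set of accepted file extensions is an assumption.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of extensions


def write_dataset_file(image_dir, dataset_path):
    """Write one image path per line, sorted by name; return the line count."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    lines = [p.as_posix() for p in images]
    Path(dataset_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)


if __name__ == "__main__":
    # './imgs' and './dataset.txt' mirror the names used in the example above.
    if Path("./imgs").is_dir():
        print(write_dataset_file("./imgs", "./dataset.txt"), "images listed")
```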
Example code:
ret = rknn.build(do_quantization=True, dataset='./dataset.txt')
e. Export the model
rknn.export_rknn() saves the RKNN model to a file with the .rknn suffix for later deployment. It takes the following parameters:
export_path is the path of the exported model file.
cpp_gen_cfg selects whether a C++ deployment example is generated.
ret = rknn.export_rknn(export_path='./mobilenet_v1.rknn')
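Steps b to e can be chained into one function that stops at the first non-zero return code and always releases the object. This is a sketch under assumptions: convert_onnx_to_rknn is a hypothetical helper, the mean/std values and the rk3568 platform are example choices, and the rknn argument is expected to be an RKNN instance created as in step a. It is passed in rather than constructed so that the control flow can be tried without the toolkit installed.

```python
def convert_onnx_to_rknn(rknn, onnx_path, dataset_path, rknn_path,
                         platform="rk3568"):
    """Run config -> load_onnx -> build -> export_rknn on `rknn`.

    Returns the name of the step that failed, or None on success.
    """
    # Example preprocessing values (assumption): map uint8 input to 0..1.
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
                target_platform=platform)
    steps = [
        ("load_onnx", lambda: rknn.load_onnx(model=onnx_path)),
        ("build", lambda: rknn.build(do_quantization=True,
                                     dataset=dataset_path)),
        ("export_rknn", lambda: rknn.export_rknn(export_path=rknn_path)),
    ]
    failed = None
    try:
        for name, call in steps:
            if call() != 0:  # each step reports success with 0
                failed = name
                break
    finally:
        rknn.release()  # release resources whether or not a step failed
    return failed
```

A call would look like convert_onnx_to_rknn(RKNN(verbose=True), './arcface.onnx', './dataset.txt', './arcface.rknn'), where the output file name is just an example.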
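f. Conversion tools
3.2 Exporting a model with Python. The downloaded repositories contain many examples that can be used as references.
/home/tony/nndeploy/mymodel/scripts/export2onnx.py: exports the model to pth or onnx
/home/tony/nndeploy/rk3568/projects/rknn-toolkit2/rknn-toolkit2/examples/onnx/unet8: converts pth or onnx to rknn
/home/tony/nndeploy/rk3568/projects/rknn-toolkit2/rknpu2/examples/rknn_mobilenet_demo_copy: builds the executable, which is then copied to the rk3568 board with scp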
import numpy as np
import cv2
from rknn.api import RKNN
import os
def export_pytorch_model():
    # Trace torchvision's quantized ResNet-18 and save it as TorchScript
    import torch
    import torchvision.models as models
    net = models.quantization.resnet18(pretrained=True, quantize=True)
    net.eval()
    trace_model = torch.jit.trace(net, torch.Tensor(1, 3, 224, 224))
    trace_model.save('./resnet18_i8.pt')


def show_outputs(output):
    # Indices of the scores, sorted from highest to lowest
    index = sorted(range(len(output)), key=lambda k : output[k], reverse=True)
    with open('./labels.txt', 'r') as fp:
        labels = fp.readlines()
    top5_str = 'resnet18\n-----TOP 5-----\n'
    for i in range(5):
        value = output[index[i]]
        if value > 0:
            topi = '[{:>3d}] score:{:.6f} class:"{}"\n'.format(index[i], value, labels[index[i]].strip().split(':')[-1])
        else:
            topi = '[ -1]: 0.0\n'
        top5_str += topi
    print(top5_str.strip())


def show_perfs(perfs):
    perfs = 'perfs: {}\n'.format(perfs)
    print(perfs)


def softmax(x):
    return np.exp(x)/sum(np.exp(x))


def torch_version():
    # Return the torch version as a list of ints, e.g. [1, 9, 0]
    import torch
    torch_ver = torch.__version__.split('.')
    torch_ver[2] = torch_ver[2].split('+')[0]
    return [int(v) for v in torch_ver]
if __name__ == '__main__':
    if torch_version() < [1, 9, 0]:
        import torch
        print("Your torch version is '{}', in order to better support the Quantization Aware Training (QAT) model,\n"
              "Please update the torch version to '1.9.0' or higher!".format(torch.__version__))
        exit(0)

    model = './resnet18_i8.pt'
    if not os.path.exists(model):
        export_pytorch_model()

    input_size_list = [[1, 3, 224, 224]]

    # Create RKNN object
    rknn = RKNN(verbose=True)

    # Pre-process config
    print('--> Config model')
    rknn.config(mean_values=[123.675, 116.28, 103.53], std_values=[58.395, 58.395, 58.395], target_platform='rk3566')
    print('done')

    # Load model
    print('--> Loading model')
    ret = rknn.load_pytorch(model=model, input_size_list=input_size_list)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=False)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn('./resnet_18.rknn')
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Set inputs
    img = cv2.imread('./space_shuttle_224.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.expand_dims(img, 0)

    # Init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[img], data_format=['nhwc'])
    np.save('./pytorch_resnet18_qat_0.npy', outputs[0])
    show_outputs(softmax(np.array(outputs[0][0])))
    print('done')

    rknn.release()
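One detail of the script above: softmax computes np.exp(x) directly, which overflows to inf (and then yields nan) when a logit is very large. A common remedy is to subtract the maximum before exponentiating. The pure-Python sketch below shows the idea; stable_softmax is a name chosen for this example and is not part of the original script.

```python
import math


def stable_softmax(logits):
    """Softmax that subtracts the maximum first, so exp() never overflows."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # math.exp(1000.0) alone would raise OverflowError.
    print(stable_softmax([1000.0, 1000.0]))
```

The result is mathematically identical to the plain formula, because the common factor exp(-max) cancels between numerator and denominator.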
The second script traces a PyTorch U-Net model and converts it to RKNN:
import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
from rknn.api import RKNN
import torch
Q = 0
IMG_SIZE = 224
# Model from https://github.com/airockchip/rknn_model_zoo
PYTORCH_MODEL = 'checkpoint_10000_plain_m8c16.pth'
ONNX_MODEL = 'checkpoint_00000id_stu_n2n_400.onnx'
RKNN_MODEL = PYTORCH_MODEL[:-4] + '_'+str(IMG_SIZE) + '_quant' + str(Q)+'_.rknn'
IMG_PATH = './frame_400x400_3084.png'
DATASET = '/home/tony/nndeploy/mydataset/400/dataset_small.txt'
from model import Network, UNetSeeInDark_prune, PlainSR
import onnxruntime
def onnx_infer():
    img = cv2.imread(IMG_PATH)
    img = img.transpose(2,0,1) / 255.0
    x = img.astype(np.float32)[None, ...]
    print('x shape', x.shape)
    ort_session = onnxruntime.InferenceSession(ONNX_MODEL,providers=['AzureExecutionProvider', 'CPUExecutionProvider'])

    # Convert a tensor to an ndarray
    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    # Build the input dictionary and compute the outputs
    ort_inputs = {ort_session.get_inputs()[0].name: x}
    ort_outs = ort_session.run(None, ort_inputs)
    # Compare the accuracy of the PyTorch and ONNX Runtime results
    # np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)
    out = ort_outs[0].squeeze().transpose(1,2,0)
    out_im = np.clip(out * 255 + 0.5, 0, 255).astype(np.uint8)
    # post process
    print(out_im.shape, out_im.dtype)
    print("Exported model has been tested with ONNXRuntime, and the result looks good!")
    cv2.imwrite(IMG_PATH[:-4] + '_onnx.png', out_im)
if __name__ == '__main__':
    #onnx_infer()
    raw = 1
    if raw:
        ch = 4
        #model = PlainSR(4, 8, colors=ch)
        model = UNetSeeInDark_prune(8, 4, 4)
        IMG_SIZE = 224
        PYTORCH_MODEL = 'plainsr_unet4.pth'
        # checkpoint = torch.load(PYTORCH_MODEL, map_location='cpu')
        # model.load_state_dict({k.replace('module.', ''): v for k, v in checkpoint.items()})
    model.eval()
    traced_model = torch.jit.trace(model, torch.randn(1, ch, IMG_SIZE, IMG_SIZE))
    PYTORCH_MODEL2 = PYTORCH_MODEL[:-4] + str(IMG_SIZE) +'_ch4.pt'
    torch.jit.save(traced_model, PYTORCH_MODEL2)
    max_value = 1023

    # Create RKNN object
    rknn = RKNN(verbose=True)

    # pre-process config
    print('--> Config model')
    #rknn.config( target_platform='rk3568')
    rknn.config(mean_values=[[0, 0, 0, 0]], std_values=[[max_value, max_value, max_value,max_value]], target_platform='rk3568', quantized_algorithm='kl_divergence',optimization_level=2)
    print('done')

    # Load the traced PyTorch model
    print('--> Loading model')
    #ret = rknn.load_onnx(model=ONNX_MODEL)
    ret = rknn.load_pytorch(model=PYTORCH_MODEL2,
                            input_size_list=[[1, ch, IMG_SIZE, IMG_SIZE]])
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    #ret = rknn.build(do_quantization=True, dataset=DATASET)
    ret = rknn.build(do_quantization=(Q==1))
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export RKNN model
    print('--> Export rknn model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')
# print('--> Init runtime environment')
# ret = rknn.init_runtime()
# if ret != 0:
# print('Init runtime environment failed!')
# exit(ret)
# print('done')
# #perf_detail = rknn.eval_perf()
# # Set inputs
# img = cv2.imread(IMG_PATH)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# # need set in rknn.config and can be quant
# # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
# #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
# print('input shape: ', img.shape)
# # Inference
# print('--> Running model')
# outputs = rknn.inference(inputs=[img], data_format=['nhwc'])
# print(len(outputs))
# print(outputs[0].shape)
# # np.save('./onnx_yolov5_0.npy', outputs[0])
# # np.save('./onnx_yolov5_1.npy', outputs[1])
# # np.save('./onnx_yolov5_2.npy', outputs[2])
# print('done')
# out = outputs[0].squeeze().transpose(1,2,0)
# out_im = np.clip(out * 255 + 0.5, 0, 255).astype(np.uint8)
# # post process
# print(out_im.shape, out_im.dtype)
# img_1 = out_im #cv2.cvtColor(out_im, cv2.COLOR_RGB2BGR)
# cv2.imwrite(IMG_PATH[:-4] + '_rknn_m4c8_noqu.png', img_1[...,::-1])
# rknn.release()
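4. C++ deployment
Take a denoising model as an example. The input is a uint8 image, and the zero-copy API is used.
For this reason the Python conversion script sets mean = [0, 0, 0] and std = [255, 255, 255], which amounts to normalization. Normalization, quantization, and dequantization are then applied automatically during C++ inference, so the C++ code needs no normalization code of its own.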
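The quantization and dequantization mentioned here can be illustrated with the common affine scheme, q = round(x / scale) + zero_point, whose parameters correspond to the zp and scale fields that the C++ code prints for each tensor. The sketch below assumes an int8 range and uses made-up parameters; it is not taken from the RKNN runtime.

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization of a float to an integer in [qmin, qmax]."""
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping from the integer back to an approximate float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    # Made-up parameters that cover the normalized input range 0..1.
    scale, zero_point = 1.0 / 255.0, -128
    x = 128 / 255.0  # a uint8 pixel of 128 after the mean/std normalization
    q = quantize(x, scale, zero_point)
    print(q, dequantize(q, scale, zero_point))
```

The round trip loses at most about half a quantization step for values inside the representable range; values outside it are clamped.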
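The data flow is therefore as follows: the uint8 image data is fed in and normalized to 0-1, and because outputs[0].want_float = 1 is set, model inference returns float data in the range 0-1. The author describes all of this as happening in the NPU, that is, inside rknn_run, because the zero-copy API is used. Note that the listing below calls rknn_inputs_set and rknn_outputs_get, which are the general API calls; the zero-copy API instead works with rknn_create_mem and rknn_set_io_mem.
Finally, the CPU converts NCHW to HWC and 0-1 to 0-255 uint8, after which the image can be saved.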
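The final CPU step, CHW to HWC plus scaling 0-1 to 0-255, boils down to an index remapping. The sketch below spells it out on flat lists in plain Python; chw_to_hwc_uint8 is a hypothetical helper, and the rounding (add 0.5, truncate, clamp) follows the np.clip(out * 255 + 0.5, 0, 255) expression used in the Python scripts above.

```python
def chw_to_hwc_uint8(chw, channels, height, width):
    """Convert a flat CHW float list in [0, 1] to a flat HWC list of 0..255 ints."""
    hwc = [0] * (channels * height * width)
    for c in range(channels):
        for h in range(height):
            for w in range(width):
                src = c * height * width + h * width + w  # index in CHW layout
                dst = (h * width + w) * channels + c      # index in HWC layout
                value = int(chw[src] * 255.0 + 0.5)
                hwc[dst] = max(0, min(255, value))
    return hwc


if __name__ == "__main__":
    # A 1x2 image with 3 channels: [R0, R1, G0, G1, B0, B1] in CHW order.
    print(chw_to_hwc_uint8([0.0, 1.0, 0.5, 0.0, 1.0, 0.25], 3, 1, 2))
```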
buildxxx.sh must point to the gcc and g++ cross compilers, and CMakeLists.txt needs attention as well, so that the compiled program can run on the rk3568 platform.
C++ code
/home/tony/nndeploy/rk3568/projects/rknn-toolkit2/rknpu2/examples/rknn_mobilenet_demo_copy/src/main.cc
// Copyright (c) 2021 by Rockchip Electronics Co., Ltd. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/*-------------------------------------------
Includes
-------------------------------------------*/
#include "opencv2/core/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "rknn_api.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <fstream>
#include <iostream>
using namespace std;
using namespace cv;
/*-------------------------------------------
Functions
-------------------------------------------*/
static void dump_tensor_attr(rknn_tensor_attr* attr)
{
  printf(" index=%d, name=%s, n_dims=%d, dims=[%d, %d, %d, %d], n_elems=%d, size=%d, fmt=%s, type=%s, qnt_type=%s, "
         "zp=%d, scale=%f\n",
         attr->index, attr->name, attr->n_dims, attr->dims[0], attr->dims[1], attr->dims[2], attr->dims[3],
         attr->n_elems, attr->size, get_format_string(attr->fmt), get_type_string(attr->type),
         get_qnt_type_string(attr->qnt_type), attr->zp, attr->scale);
}

// Read the whole model file into a malloc'ed buffer; the caller frees it
static unsigned char* load_model(const char* filename, int* model_size)
{
  FILE* fp = fopen(filename, "rb");
  if (fp == nullptr) {
    printf("fopen %s fail!\n", filename);
    return NULL;
  }
  fseek(fp, 0, SEEK_END);
  int            model_len = ftell(fp);
  unsigned char* model     = (unsigned char*)malloc(model_len);
  fseek(fp, 0, SEEK_SET);
  if ((size_t)model_len != fread(model, 1, model_len, fp)) {
    printf("fread %s fail!\n", filename);
    free(model);
    fclose(fp); // close the file on the error path as well
    return NULL;
  }
  *model_size = model_len;
  fclose(fp);
  return model;
}

static int rknn_GetTop(float* pfProb, float* pfMaxProb, uint32_t* pMaxClass, uint32_t outputCount, uint32_t topNum)
{
  uint32_t i, j;

#define MAX_TOP_NUM 20
  if (topNum > MAX_TOP_NUM)
    return 0;

  memset(pfMaxProb, 0, sizeof(float) * topNum);
  memset(pMaxClass, 0xff, sizeof(uint32_t) * topNum);

  for (j = 0; j < topNum; j++) {
    for (i = 0; i < outputCount; i++) {
      // Skip classes that were already selected
      if ((i == *(pMaxClass + 0)) || (i == *(pMaxClass + 1)) || (i == *(pMaxClass + 2)) || (i == *(pMaxClass + 3)) ||
          (i == *(pMaxClass + 4))) {
        continue;
      }
      if (pfProb[i] > *(pfMaxProb + j)) {
        *(pfMaxProb + j) = pfProb[i];
        *(pMaxClass + j) = i;
      }
    }
  }
  return 1;
}

double __get_us(struct timeval t) { return (t.tv_sec * 1000000 + t.tv_usec); }
/*-------------------------------------------
Main Function
-------------------------------------------*/
int main(int argc, char** argv)
{
  const int MODEL_IN_WIDTH    = 224;
  const int MODEL_IN_HEIGHT   = 224;
  const int MODEL_IN_CHANNELS = 3;

  rknn_context   ctx = 0;
  int            ret;
  int            model_len = 0;
  unsigned char* model;

  // Check the argument count before reading argv[1] and argv[2]
  if (argc != 3) {
    printf("Usage: %s <rknn model> <image_path> \n", argv[0]);
    return -1;
  }
  const char* model_path = argv[1];
  const char* img_path   = argv[2];

  // Load image
  cv::Mat orig_img = imread(img_path, cv::IMREAD_COLOR);
  if (!orig_img.data) {
    printf("cv::imread %s fail!\n", img_path);
    return -1;
  }
  //cv::Mat orig_img_rgb;
  // The rknn model in the original demo comes from the examples/tflite/mobilenet_v1 example of RKNN-Toolkit2;
  // keep the input channel order consistent with the Python code
  //cv::cvtColor(orig_img, orig_img_rgb, cv::COLOR_BGR2RGB);
  cv::Mat img = orig_img.clone();
  if (img.cols != MODEL_IN_WIDTH || img.rows != MODEL_IN_HEIGHT) {
    printf("resize %d %d to %d %d\n", img.cols, img.rows, MODEL_IN_WIDTH, MODEL_IN_HEIGHT);
    cv::resize(img, img, cv::Size(MODEL_IN_WIDTH, MODEL_IN_HEIGHT), 0, 0, cv::INTER_LINEAR);
  }

  // Load RKNN Model
  model = load_model(model_path, &model_len);
  if (model == NULL) {
    return -1;
  }
  ret = rknn_init(&ctx, model, model_len, 0, NULL);
  if (ret < 0) {
    printf("rknn_init fail! ret=%d\n", ret);
    return -1;
  }
  // Get Model Input Output Info
  rknn_input_output_num io_num;
  ret = rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));
  if (ret != RKNN_SUCC) {
    printf("rknn_query fail! ret=%d\n", ret);
    return -1;
  }
  printf("model input num: %d, output num: %d\n", io_num.n_input, io_num.n_output);

  printf("input tensors:\n");
  rknn_tensor_attr input_attrs[io_num.n_input];
  memset(input_attrs, 0, sizeof(input_attrs));
  for (int i = 0; i < io_num.n_input; i++) {
    input_attrs[i].index = i;
    ret = rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
    if (ret != RKNN_SUCC) {
      printf("rknn_query fail! ret=%d\n", ret);
      return -1;
    }
    dump_tensor_attr(&(input_attrs[i]));
  }

  printf("output tensors:\n");
  rknn_tensor_attr output_attrs[io_num.n_output];
  memset(output_attrs, 0, sizeof(output_attrs));
  for (int i = 0; i < io_num.n_output; i++) {
    output_attrs[i].index = i;
    ret = rknn_query(ctx, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
    if (ret != RKNN_SUCC) {
      printf("rknn_query fail! ret=%d\n", ret);
      return -1;
    }
    dump_tensor_attr(&(output_attrs[i]));
  }
  // Set Input Data
  rknn_input inputs[1];
  memset(inputs, 0, sizeof(inputs));
  inputs[0].index = 0;
  inputs[0].type  = RKNN_TENSOR_UINT8;
  inputs[0].size  = img.cols * img.rows * img.channels() * sizeof(uint8_t);
  inputs[0].fmt   = RKNN_TENSOR_NHWC;
  inputs[0].buf   = img.data;

  ret = rknn_inputs_set(ctx, io_num.n_input, inputs);
  if (ret < 0) {
    printf("rknn_input_set fail! ret=%d\n", ret);
    return -1;
  }

  // Run
  printf("rknn_run\n");
  ret = rknn_run(ctx, nullptr);
  if (ret < 0) {
    printf("rknn_run fail! ret=%d\n", ret);
    return -1;
  }

  // Get Output
  rknn_output outputs[1];
  memset(outputs, 0, sizeof(outputs));
  outputs[0].want_float = 1;
  ret = rknn_outputs_get(ctx, 1, outputs, NULL);
  if (ret < 0) {
    printf("rknn_outputs_get fail! ret=%d\n", ret);
    return -1;
  }
// Post Process
// for (int i = 0; i < io_num.n_output; i++) {
// uint32_t MaxClass[5];
// float fMaxProb[5];
// float* buffer = (float*)outputs[i].buf;
// uint32_t sz = outputs[i].size / 4;
// rknn_GetTop(buffer, fMaxProb, MaxClass, sz, 5);
// printf(" --- Top5 ---\n");
// for (int i = 0; i < 5; i++) {
// printf("%3d: %8.6f\n", MaxClass[i], fMaxProb[i]);
// }
// }
  int height = img.rows;
  int width  = img.cols;
  // outputs[0].buf holds float data in CHW order (see the text above);
  // out and out1 are raw views of that buffer, not yet a displayable HWC image
  cv::Mat out(height, width, CV_32FC3, outputs[0].buf);
  // get uint8 result
  cv::Mat out1;
  out.convertTo(out1, CV_8UC3, 255, 0);
  std::string out_path1 = "./out1.jpg";
  imwrite(out_path1, out1);

  // chw->hwc
  cv::Mat  output(height, width, CV_8UC3);
  uint8_t* img_data = (uint8_t*)output.data;
  for (int h = 0; h < height; h++) {
    for (int w = 0; w < width; w++) {
      for (int c = 0; c < 3; c++) {
        int hwc_index = h * width * 3 + w * 3 + c;         // destination, HWC layout
        int chw_index = c * width * height + h * width + w; // source, CHW layout
        img_data[hwc_index] = out1.data[chw_index];
      }
    }
  }
  output.convertTo(output, CV_8UC3, 1, 0);
  std::string out_path = "./out2.jpg";
  imwrite(out_path, output);

  // Release rknn_outputs
  rknn_outputs_release(ctx, 1, outputs);
  // Timing
  struct timeval start_time, stop_time;
  int            test_count = 10;
  gettimeofday(&start_time, NULL);
  for (int i = 0; i < test_count; ++i) {
    rknn_inputs_set(ctx, io_num.n_input, inputs);
    ret = rknn_run(ctx, NULL);
    ret = rknn_outputs_get(ctx, io_num.n_output, outputs, NULL);
#if PERF_WITH_POST
    cv::Mat out(height, width, CV_32FC3, outputs[0].buf);
    // Scale by 255 with zero offset, matching the conversion used above
    out.convertTo(out, CV_8UC3, 255, 0);
#endif
    ret = rknn_outputs_release(ctx, io_num.n_output, outputs);
  }
  gettimeofday(&stop_time, NULL);
  printf("loop count = %d , average run %f ms\n", test_count,
         (__get_us(stop_time) - __get_us(start_time)) / 1000.0 / test_count);

  // Release
  if (ctx > 0) {
    rknn_destroy(ctx);
  }
  if (model) {
    free(model);
  }
  return 0;
}
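5. Other topics for further study
Quantization
Multi-core
Quality evaluation
Time and memory evaluation
Improving inference speed
These topics need further testing.
The official documentation in the repository (/doc/xx.pdf) covers the material above in detail; refer to it whenever possible.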