Deploying PP-Human on the RK3588

Kun Li

Last modified 2023-08-06 11:56:02

First published 2023-03-01 09:33:52

Article link: https://blog.csdn.net/u012193416/article/details/129158595

https://github.com/leeguandong/Yolov5_rknnlite2 This is my adaptation of the official PaddleDetection and rknn code: YOLOv5 on the RK3588, deployed with rknnlite2.

Reference links:
Step by step: deploying Baidu PaddlePaddle PP-YOLOE on the Rockchip RK3588 (布衣神棍, CSDN): https://blog.csdn.net/buyishengun/article/details/127653529
Board overview, TB-RK3588x 0.1 documentation: https://t.rock-chips.com/wiki/CN/tb-rk3588x/01%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B.html
NPU deployment on the Rockchip RK3588 and other AI hardware (paddle) (Vertira, CSDN): https://blog.csdn.net/Vertira/article/details/127811686
Training a YOLOv5 pt model, converting it to an rknn model, and deploying it on an RK3588 board, from training to deployment (Billy_zz, CSDN): https://blog.csdn.net/m0_57315535/article/details/128250096?spm=1001.2101.3001.6650.5&utm_medium=distribute.pc_relevant.none-task-blog-2~default~CTRLIST~Rate-5-128250096-blog-126153227.pc_relevant_3mothn_strategy_and_data_recovery&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2~default~CTRLIST~Rate-5-128250096-blog-126153227.pc_relevant_3mothn_strategy_and_data_recovery&utm_relevant_index=9
rockchip-linux (GitCode mirror): https://gitcode.net/mirrors/rockchip-linux
Firefly: https://www.t-firefly.com/doc/download/106.html
1. NPU usage, Firefly Wiki (EC-R3588SPC): https://wiki.t-firefly.com/zh_CN/EC-R3588SPC/usage_npu.html
[FastDeploy + Rockchip] RV1126 and RK3588 full-quantization deployment explained (飞桨PaddlePaddle, bilibili): https://www.bilibili.com/video/BV1Xg411x7MV/?spm_id_from=333.999.top_right_bar_window_history.content.click&vd_source=4aed82e35f26bb600bc5b46e65e25c22
examples/vision/detection/paddledetection/rknpu2/README_CN.md, FastDeploy develop branch (GitCode mirror): https://gitcode.net/mirrors/paddlepaddle/fastdeploy/-/blob/develop/examples/vision/detection/paddledetection/rknpu2/README_CN.md
Training YOLOv5, generating an rknn model, and deploying on the 3588 platform (新鑫信心, CSDN): https://blog.csdn.net/m0_51714298/article/details/125916417

1. Installing bfloat16

Building a Python extension module with distutils: https://www.656463.com/wenda/sydistutilsgjPythonkzmk_284
https://www.cnblogs.com/goldsunshine/p/8872623.html

Build it from source. Installing directly with sudo pip install ran into problems, so I built from source with python setup.py install --user. Whenever sudo pip install does not work, use the --user approach. When third-party libraries cannot be linked and python setup.py install does not work either, try python setup.py sdist; remember that distutils is only a packaging mechanism. In addition, the upgraded gcc 5.4 may cause problems when compiling from source; in that case add:

module.extra_compile_args = ['--std=c++0x']

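2. rknpu

250 frames, 640x640.

Some of the quantization operations built into training frameworks can be read by rknn.

rknpu is the rknn runtime mentioned above.

The RK3588's CPU is fairly strong, so compiling on either the PC or the board works.

Run dmesg | grep Galcore to check the version.

Files can be pushed to the board with wget, adb, or ssh.

The RK3588 has optimizations for input handling.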

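3. Deploying PP-Human from PaddleDetection on the RK3588

Deploying with rknn has two parts. The first part happens on a Linux PC: quantizing and building the rknn model with rknn-toolkit2. Unless you use a framework such as FastDeploy, rknn-toolkit2 must be installed on the PC, because it is the tool that converts the model. The second part runs on the board, called from Python: rknn-lite from rknn-toolkit2 can handle the deployment and can call the NPU interface. If you need more speed, use rknpu for a C++ deployment. The C++ side optimizes preprocessing and similar operations, and preprocessing is in fact quite time-consuming.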

3.1 paddle2rknn

The weights Paddle provides are already the model produced by export_model, so setting exclude_nms=True before running export_model.py is not an option.

The approach in the following link works:
examples/vision/detection/paddledetection/rknpu2/README_CN.md, FastDeploy develop branch (GitCode mirror): https://gitcode.net/mirrors/paddlepaddle/fastdeploy/-/blob/develop/examples/vision/detection/paddledetection/rknpu2/README_CN.md

Code example:

paddle2onnx --model_dir /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline \
--model_filename /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline/model.pdmodel \
--params_filename /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline/model.pdiparams \
--save_file /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_l_36e_pipeline.onnx \
--enable_dev_version True \
--opset_version 12 \
--enable_onnx_checker True

python -m paddle2onnx.optimize --input_model /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_l_36e_pipeline.onnx \
--output_model /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_l_36e_pipeline_opt.onnx \
--input_shape_dict "{'image':[1,3,640,640], 'scale_factor':[1,2]}"

python export_rknn.py --config_path /home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_opt.yaml --target_platform rk3588

This approach uses rknn to prune the static ONNX graph, which avoids pruning on the ONNX or Paddle side.
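
The export step is driven by a YAML file. As a rough sketch, the keys that the conversion script in section 3.2 reads from it (`mean`, `std`, `model_path`, `outputs_nodes`, `do_quantization`, `dataset`, `output_folder`) can be checked before rknn-toolkit2 is invoked. The helper name `validate_export_config` and every value in `example_cfg` are illustrative assumptions, not the contents of the real `mot_ppyoloe_opt.yaml`.

```python
# Keys looked up via yaml_config[...] in the conversion script of section 3.2.
REQUIRED_KEYS = (
    "mean", "std", "model_path", "outputs_nodes",
    "do_quantization", "dataset", "output_folder",
)


def validate_export_config(cfg):
    """Hypothetical helper: fail early if the export config is incomplete."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    # Quantization needs a calibration list (a text file of image paths).
    if cfg["do_quantization"] and not cfg["dataset"]:
        raise ValueError("do_quantization is set but no dataset file is given")
    return cfg


# Placeholder values only; node names and paths are NOT from the real config.
example_cfg = {
    "mean": [[0, 0, 0]],
    "std": [[255, 255, 255]],
    "model_path": "./model_opt.onnx",
    "outputs_nodes": ["hypothetical_box_node", "hypothetical_score_node"],
    "do_quantization": True,
    "dataset": "./dataset.txt",
    "output_folder": "./rknn_out",
}

validate_export_config(example_cfg)
print("config OK")
```

Setting `outputs_nodes` to the tensors just before NMS is what performs the pruning; the node names depend on the exported graph and have to be looked up in the ONNX file.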

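3.2 paddleinference -> rknn inference

After the nodes are pruned, the output is a pair of arrays with shapes [(8400, 4), 8400]. The first holds 8400 boxes, each described by two corner points; the second holds the 8400 matching confidence scores. This is the raw output of the decoupled head.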
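
The figure 8400 follows from the input size. Assuming the usual PP-YOLOE head with strides 8, 16 and 32 and one prediction per feature-map cell (an assumption about this exported model, not something verified here), a 640x640 input gives:

```python
def num_predictions(img_size=640, strides=(8, 16, 32)):
    """Count one prediction per cell on each feature map."""
    return sum((img_size // s) ** 2 for s in strides)


# 80*80 + 40*40 + 20*20 = 6400 + 1600 + 400
print(num_predictions())  # 8400
```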

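PP-Human deploys a PP-YOLOE model. It is really a tracking pipeline, but its first stage is still PP-YOLOE, followed by a Kalman filter and the Hungarian algorithm, so the core task is still deploying PP-YOLOE on the RK3588.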
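
To illustrate the association step, here is a small sketch that pairs predicted track boxes with new detections by maximizing total IoU. It is not the Hungarian algorithm itself: it brute-forces every assignment, which gives the same optimal pairing but is only usable for a handful of boxes. The function names and the `iou_min` threshold are illustrative assumptions.

```python
from itertools import permutations


def iou_xyxy(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, dets, iou_min=0.3):
    """Return (track_index, det_index) pairs that maximize the total IoU."""
    if len(tracks) == 0 or len(dets) == 0:
        return []
    # Enumerate assignments of the smaller set into the larger one.
    if len(tracks) <= len(dets):
        candidates = (list(zip(range(len(tracks)), p))
                      for p in permutations(range(len(dets)), len(tracks)))
    else:
        candidates = (list(zip(p, range(len(dets))))
                      for p in permutations(range(len(tracks)), len(dets)))
    best_total, best_pairs = -1.0, []
    for pairs in candidates:
        total = sum(iou_xyxy(tracks[t], dets[d]) for t, d in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    # Drop pairs that barely overlap; those detections would start new tracks.
    return sorted((t, d) for t, d in best_pairs
                  if iou_xyxy(tracks[t], dets[d]) >= iou_min)


tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]  # e.g. boxes predicted by the Kalman filter
dets = [[21, 21, 31, 31], [1, 1, 11, 11]]    # boxes from the detector
print(match_tracks(tracks, dets))  # [(0, 1), (1, 0)]
```

A real tracker would replace the brute-force search with a proper Hungarian solver, since the number of permutations grows factorially with the number of boxes.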

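I wrote two versions of NMS for post-processing here, and the boxes from both were wrong. Even without quantization, the output boxes were still wrong.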

import os
import urllib
import traceback
import time
import sys
import yaml
import numpy as np
import cv2
from rknn.api import RKNN

ONNX_MODEL = 'yolov5s.onnx'
RKNN_MODEL = '/home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_l_36e_pipeline_opt_rk3588_quantized.rknn'
IMG_PATH = './test.png'
DATASET = './dataset.txt'

QUANTIZE_ON = True

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
IMG_SIZE = 640

CLASSES = ["player"]

def draw_results(result, image, draw_thresh=0.5):
#     plt.figure(figsize=(10, 10))
#     im = imread(filename)
#     plt.imshow(im)
#     currentAxis=plt.gca()
#     colors = ['r', 'g', 'b', 'k', 'y', 'pink', 'purple']
    for item in result:
        top, left, right, bottom= item[2:6]
        label = int(item[0])
        score = item[1]
        name = CLASSES[label]
        if item[1] > draw_thresh:
            top = int(top)
            left = int(left)
            right = int(right)
            bottom = int(bottom)

            cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
            cv2.putText(image, '{0} {1:.2f}'.format(name, score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)
#             draw_rectangle(currentAxis, box, edgecolor = colors[label])
#             plt.text(box[0], box[1], name, fontsize=12, color=colors[label])
#     plt.savefig('/home/aistudio/external-libraries/PaddleDection/outout_img/output_pic.png')
    
#     plt.show()
    

def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(top, left, right, bottom))
        top = int(top)
        left = int(left)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)

        
# Compute IoU for boxes in xyxy format; this function is saved in box_utils.py
def box_iou_xyxy(box1, box2):
    # Top-left and bottom-right coordinates of box1
    x1min, y1min, x1max, y1max = box1[0], box1[1], box1[2], box1[3]
    # Area of box1
    s1 = (y1max - y1min + 1.) * (x1max - x1min + 1.)
    # Top-left and bottom-right coordinates of box2
    x2min, y2min, x2max, y2max = box2[0], box2[1], box2[2], box2[3]
    # Area of box2
    s2 = (y2max - y2min + 1.) * (x2max - x2min + 1.)

    # Coordinates of the intersection rectangle
    xmin = np.maximum(x1min, x2min)
    ymin = np.maximum(y1min, y2min)
    xmax = np.minimum(x1max, x2max)
    ymax = np.minimum(y1max, y2max)
    # Height, width, and area of the intersection rectangle
    inter_h = np.maximum(ymax - ymin + 1., 0.)
    inter_w = np.maximum(xmax - xmin + 1., 0.)
    intersection = inter_h * inter_w
    # Area of the union
    union = s1 + s2 - intersection
    # Intersection over union
    iou = intersection / union
    return iou
    
def nms(bboxes, scores, score_thresh, nms_thresh, pre_nms_topk):
    """
    nms
    """
    inds = np.argsort(scores)
    inds = inds[::-1]
    inds = inds[:pre_nms_topk]
    keep_inds = []
    while(len(inds) > 0):
        cur_ind = inds[0]
        cur_score = scores[cur_ind]
        # if score of the box is less than score_thresh, just drop it
        if cur_score < score_thresh:
            break

        keep = True
        for ind in keep_inds:
            current_box = bboxes[cur_ind]
            remain_box = bboxes[ind]
            # import pdb;pdb.set_trace()
            iou = box_iou_xyxy(current_box, remain_box)
            if iou > nms_thresh:
                keep = False
                break
        if keep:
            keep_inds.append(cur_ind)
        inds = inds[1:]

    return np.array(keep_inds)

def multiclass_nms(bboxes, scores, score_thresh=0.05, nms_thresh=0.5, pre_nms_topk=1000, pos_nms_topk=100):
    """
    This is for multiclass_nms
    """
    batch_size = bboxes.shape[0]
    class_num = scores.shape[1]
    rets = []
    for i in range(batch_size):
        bboxes_i = bboxes[i]
        scores_i = scores[i]
        ret = []
        for c in range(class_num):
            scores_i_c = scores_i[c]
            keep_inds = nms(bboxes_i, scores_i_c, score_thresh, nms_thresh, pre_nms_topk)
            if len(keep_inds) < 1:
                continue
            keep_bboxes = bboxes_i[keep_inds]
            keep_scores = scores_i_c[keep_inds]
            keep_results = np.zeros([keep_scores.shape[0], 6])
            keep_results[:, 0] = c
            keep_results[:, 1] = keep_scores[:]
            keep_results[:, 2:6] = keep_bboxes[:, :]
            ret.append(keep_results)
        if len(ret) < 1:
            rets.append(ret)
            continue
        ret_i = np.concatenate(ret, axis=0)
        scores_i = ret_i[:, 1]
        if len(scores_i) > pos_nms_topk:
            inds = np.argsort(scores_i)[::-1]
            inds = inds[:pos_nms_topk]
            ret_i = ret_i[inds]

        rets.append(ret_i)

    return rets

def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def ppyolo_nms(pred_bboxes, pred_scores):
#     pred_bboxes: (1, 8400, 4), pred_scores: (1, 1, 8400)
    boxes = pred_bboxes.reshape(-1,4)
    box_class_probs = pred_scores.reshape(pred_scores.shape[-1]*pred_scores.shape[0],-1)
    box_confidences = np.ones(boxes.shape[0]).reshape(-1,)
    
    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]
    
    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score* box_confidences)[_class_pos]    
    
#     boxes = np.concatenate(boxes)
#     classes = np.concatenate(classes)
#     scores = np.concatenate(scores)
    nboxes, nclasses, nscores = [], [], []
    
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


    
def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


if __name__ == '__main__':   
    with open("/home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/ppyoloeonnx/mot_ppyoloe_opt.yaml") as file:
        file_data = file.read()
        yaml_config = yaml.safe_load(file_data)
    print(yaml_config)
    model = RKNN(True)

    # Config
    mean_values = yaml_config["mean"]
    std_values = yaml_config["std"]
    model.config(
        mean_values=mean_values,
        std_values=std_values,
        target_platform="rk3588")

    # Load ONNX model
    if yaml_config["outputs_nodes"] is None:
        ret = model.load_onnx(model=yaml_config["model_path"])
    else:
        ret = model.load_onnx(
            model=yaml_config["model_path"],
            outputs=yaml_config["outputs_nodes"])
    assert ret == 0, "Load model failed!"

    # Build model
    ret = model.build(
        do_quantization=yaml_config["do_quantization"],
        dataset=yaml_config["dataset"])
    assert ret == 0, "Build model failed!"

    # Init Runtime
    ret = model.init_runtime()
    assert ret == 0, "Init runtime environment failed!"

    # Export
    if not os.path.exists(yaml_config["output_folder"]):
        os.mkdir(yaml_config["output_folder"])

    model_base_name = os.path.basename(yaml_config["model_path"]).split(".")[0]
#     model_device_name = config.target_platform.lower()
    model_device_name = 'rk3588'
    if yaml_config["do_quantization"]:
        model_save_name = model_base_name + "_" + model_device_name + "_quantized" + ".rknn"
    else:
        model_save_name = model_base_name + "_" + model_device_name + "_unquantized" + ".rknn"
    ret = model.export_rknn(
        os.path.join(yaml_config["output_folder"], model_save_name))
    assert ret == 0, "Export rknn model failed!"
    print("Export OK!")
    
    # Set inputs
    img = cv2.imread(IMG_PATH)
    img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Inference
    print('--> Running model')
    outputs = model.inference(inputs=[img])
    
    print('done')
#     import pdb;pdb.set_trace()
#     pred_bboxes = outputs[0]
#     pred_scores = outputs[1]
#     bbox_pred = multiclass_nms(pred_bboxes,pred_scores)
#     bbox_pred = bbox_pred[0]
    
#     import pdb;pdb.set_trace()
#     img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
#     if bbox_pred is not None:
#         draw_results(bbox_pred, img_1, draw_thresh=0.1)
#     cv2.imwrite("1.png",img_1)
#     draw_results(result, image, draw_thresh=0.5):
    
#     import pdb;pdb.set_trace()
    pred_bboxes = outputs[0]
    pred_scores = outputs[1]
#     pred_bboxes = pred_bboxes.numpy()
#     pred_scores = pred_scores.numpy()
    boxes, classes, scores = ppyolo_nms(pred_bboxes,pred_scores)
    img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if boxes is not None:
        draw(img_1, boxes, scores, classes)
    cv2.imwrite("1.png",img_1)

    model.release()

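4. Deploying with FastDeploy

FastDeploy can be deployed on the RK3588, and conveniently it also has a Python interface for quick experiments. I again ran detection with PP-YOLOE and then added the tracking code on top. On the RK3588, FastDeploy's Python interface defaults to the onnxruntime inference backend. The Python side does not seem to support rknpu2; I could not get it to run.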

'''
@Time    : 2023/2/27 14:37
@Author  : leeguandon@gmail.com
'''
import os
import fastdeploy as fd
import cv2
import numpy as np
from ocsort_tracker import OCSORTTracker
from collections import defaultdict
from pathlib import Path
import copy

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
CLASSES = ["player"]
visual = True


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(top, left, right, bottom))
        top = int(top)
        left = int(left)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def ppyolo_nms(pred_bboxes, pred_scores):
    boxes = pred_bboxes.reshape(-1, 4)
    box_confidences = np.ones(pred_bboxes.shape[0]).reshape(-1, )
    box_class_probs = pred_scores.reshape(pred_scores.shape[-1], -1)

    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score * box_confidences)[_class_pos]

    nboxes, nclasses, nscores = [], [], []

    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


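def postprocess(boxes, classes, scores):
    # Pack detections as rows of [class_id, score, x1, y1, x2, y2] for the tracker.
    nboxes = []
    if boxes is not None:
        for i, box in enumerate(boxes):
            classes_scores = np.append(classes[i], scores[i])
            boxes_ = np.append(classes_scores, box)
            nboxes.append(boxes_.tolist())
    # Keep every box; an empty frame yields an array of shape (0, 6).
    dets = np.array(nboxes, dtype=np.float64).reshape(-1, 6)
    result = {"boxes": dets, 'boxes_num': np.array([len(nboxes)])}
    return result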


class SDE_Detector(object):
    def __init__(self):
        use_byte = False
        det_thresh = 0.4
        max_age = 30
        min_hits = 3
        iou_threshold = 0.3
        delta_t = 3
        inertia = 0.2
        min_box_area = 0
        vertical_ratio = 0

        self.tracker = OCSORTTracker(
            det_thresh=det_thresh,
            max_age=max_age,
            min_hits=min_hits,
            iou_threshold=iou_threshold,
            delta_t=delta_t,
            inertia=inertia,
            min_box_area=min_box_area,
            vertical_ratio=vertical_ratio,
            use_byte=use_byte)

    def tracking(self, det_results):
        pred_dets = det_results['boxes']
        pred_embs = det_results.get('embeddings', None)

        online_targets = self.tracker.update(pred_dets, pred_embs)
        online_tlwhs = defaultdict(list)
        online_scores = defaultdict(list)
        online_ids = defaultdict(list)
        for t in online_targets:
            tlwh = [t[0], t[1], t[2] - t[0], t[3] - t[1]]  # top,left,w,h
            tscore = float(t[4])
            tid = int(t[5])
            if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: continue
            if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
                3] > self.tracker.vertical_ratio:
                continue
            if tlwh[2] * tlwh[3] > 0:
                online_tlwhs[0].append(tlwh)
                online_ids[0].append(tid)
                online_scores[0].append(tscore)
        tracking_outs = {
            'online_tlwhs': online_tlwhs,  # box coordinates
            'online_scores': online_scores,  # >0.4
            'online_ids': online_ids,  # [10,9,8,7,6,5,4,3,2,1]
        }
        return tracking_outs


def get_color(idx):
    idx = idx * 3
    color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
    return color


def plot_tracking_dict(image,
                       num_classes,
                       tlwhs_dict,
                       obj_ids_dict,
                       scores_dict,
                       frame_id=0,
                       fps=0.,
                       ids2names=[]):
    im = np.ascontiguousarray(np.copy(image))  # shape: 480,854,3
    im_h, im_w = im.shape[:2]
    text_scale = max(0.5, image.shape[1] / 3000.)
    text_thickness = 2
    line_thickness = max(1, int(image.shape[1] / 500.))

    for cls_id in range(num_classes):
        tlwhs = tlwhs_dict[cls_id]
        obj_ids = obj_ids_dict[cls_id]
        scores = scores_dict[cls_id]
        cv2.putText(
            im,
            'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)),
            (0, int(15 * text_scale) + 5),
            cv2.FONT_ITALIC,
            text_scale, (0, 0, 255),
            thickness=text_thickness)

        record_id = set()
        for i, tlwh in enumerate(tlwhs):
            x1, y1, w, h = tlwh
            intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h)))
            center = tuple(map(int, (x1 + w / 2., y1 + h / 2.)))
            obj_id = int(obj_ids[i])

            id_text = '{}'.format(int(obj_id))
            if ids2names != []:
                id_text = '{}_{}'.format(ids2names[cls_id], id_text)
            else:
                id_text = 'class{}_{}'.format(cls_id, id_text)

            _line_thickness = 1 if obj_id <= 0 else line_thickness

            in_region = False
            color = get_color(abs(obj_id)) if in_region == False else (0, 0,
                                                                       255)
            cv2.rectangle(
                im,
                intbox[0:2],
                intbox[2:4],
                color=color,
                thickness=line_thickness)
            cv2.putText(
                im,
                id_text, (intbox[0], intbox[1] - 25),
                cv2.FONT_ITALIC,
                text_scale,
                color,
                thickness=text_thickness)
    return im


option = fd.RuntimeOption()
# option.use_cpu()
# option.use_openvino_backend() # switch to OpenVINO deployment with a single line

model = fd.vision.detection.PPYOLOE(
    "/home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline/model.pdmodel",
    "/home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline/model.pdiparams" ,
"/home/sniss/local_disk/rknn-toolkit2-master/examples/onnx/ppyoloe/paddle_ori/mot_ppyoloe_l_36e_pipeline/infer_cfg.yml")

tracker = SDE_Detector()
mot_results = []

video_file = "kitch.mp4"
output_dir = "results"
capture = cv2.VideoCapture(video_file)

# Get Video info : resolution, fps, frame count
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(capture.get(cv2.CAP_PROP_FPS))
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print("video fps: %d, frame_count: %d" % (fps, frame_count))

video_out_name = Path(video_file).stem
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
out_path = os.path.join(output_dir, video_out_name + ".mp4")
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

frame_id = 0

while (1):
    # frame_id = 0
    if frame_id % 10 == 0:
        print('frame id: {}'.format(frame_id))

    ret, frame = capture.read()
    if not ret:
        break
    # img = cv2.imread("test.png")
    img = frame
    result = model.predict(copy.deepcopy(img))

    pred_bboxes = np.array(result.boxes)
    pred_scores = np.array(result.scores)

    boxes, classes, scores = ppyolo_nms(pred_bboxes, pred_scores)

    # boxes
    # array([[ 618.63458252,  172.54750061, 1023.77459717,  781.89233398]])
    # classes
    # array([0])
    #  scores
    # array([0.95259225])

    det_result = postprocess(boxes, classes, scores)
    tracking_outs = tracker.tracking(det_result)
    online_tlwhs = tracking_outs['online_tlwhs']
    online_scores = tracking_outs['online_scores']
    online_ids = tracking_outs['online_ids']
    mot_results.append([online_tlwhs, online_scores, online_ids])

    if visual:
        im = plot_tracking_dict(
            frame,
            1,
            online_tlwhs,
            online_ids,
            online_scores,
            frame_id=frame_id,
            ids2names=CLASSES)
        cv2.imwrite(
            os.path.join(output_dir, '{:05d}.jpg'.format(frame_id)), im)

    frame_id += 1

    if visual:
        # im only exists when visualization is enabled
        writer.write(im)

    # img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    # if boxes is not None:
    #     draw(img_1, boxes, scores, classes)
    #
    # img_2 = cv2.cvtColor(img_1, cv2.COLOR_BGR2RGB)
    # cv2.imwrite("fastdeploy_1.png", img_2)


writer.release()
capture.release()

Trying the C++ version of FastDeploy.

5. rknn-lite2: the Python interface to the NPU

rknn-lite2 can drive the RK3588's NPU, and the speed is quite good.
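
A minimal sketch of on-board inference with rknn-lite2, assuming the rknn_toolkit_lite2 wheel is installed on the board and a `.rknn` file has already been exported. The function names `to_nhwc_batch` and `run_on_npu` are illustrative, and whether a batch dimension is required depends on the toolkit version, so treat the input layout as an assumption.

```python
import numpy as np


def to_nhwc_batch(img_rgb):
    """Add a batch axis to an HxWx3 image, giving a uint8 array of shape (1, H, W, 3)."""
    if img_rgb.ndim != 3 or img_rgb.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    return np.expand_dims(img_rgb.astype(np.uint8), axis=0)


def run_on_npu(rknn_path, img_rgb):
    """Load a .rknn model and run one inference on the RK3588 NPU."""
    # Imported lazily: rknnlite is normally only installed on the board.
    from rknnlite.api import RKNNLite

    rknn_lite = RKNNLite()
    try:
        if rknn_lite.load_rknn(rknn_path) != 0:
            raise RuntimeError("load_rknn failed")
        # NPU_CORE_0 pins the model to a single NPU core.
        if rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0) != 0:
            raise RuntimeError("init_runtime failed")
        return rknn_lite.inference(inputs=[to_nhwc_batch(img_rgb)])
    finally:
        rknn_lite.release()


# Shape check that runs anywhere, without the NPU.
dummy = np.zeros((640, 640, 3), dtype=np.uint8)
print(to_nhwc_batch(dummy).shape)  # (1, 640, 640, 3)
```

The outputs returned by `run_on_npu` would then go through the same NMS post-processing as in section 3.2.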

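Summary: overall, use rknn-toolkit2 on the PC to convert the model to rknn; the key outcome of this step is the .rknn model itself. rknpu also ships an onnx2rknn, but if you want to test in Python on the PC, rknn-toolkit2 is the route to take. Once you have the rknn model, you can run inference on the RK3588 with rknn-lite2 or use rknpu's C++ interface. The C++ interface handles data preprocessing efficiently, and since that step is quite time-consuming, it is somewhat faster.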