Running YOLOv5 with RKNN on the RK3588 Embedded Board: The Basics

The RK3588 SoC has a 3-core NPU with 2 TOPS per core, which is enough for typical deep learning workloads. Its deep learning framework is RKNN, and both the official documentation and online resources are sparse, so you largely have to work things out yourself. This is still a long way behind NVIDIA Jetson: on NVIDIA's embedded platforms you just install the matching torch build and you are up and running, and you can then optimize further with frameworks such as DeepStream. RKNN is much rougher in this respect; you generally have to convert and wire in models yourself.

The main RKNN resources are:

Official repo: https://github.com/airockchip/rknn-toolkit2

RKNN model zoo: https://github.com/airockchip/rknn_model_zoo

Official forum: Toybrick open-source community (AI section)

Beyond those, there are some third-party vendor docs:

Bingda: https://bingda.yuque.com/staff-hckvzc/ai5gkn/wtmxz9zv0y21asec

LubanCat: "7. YOLOv5" in the [Embedfire] Python Application Development Guide for LubanCat-RK series boards

And you can search Bilibili for "3588" or "rknn".

First, environment setup. Note that the official rknn-toolkit2 repository actually ships two different environments in the same git project: rknn-toolkit2 itself, which runs on an x86 PC and converts models to the .rknn format, and rknn-toolkit-lite2, which runs on the board and only does inference.

Developers coming from a pure software background easily mix the two up, and apart from the git repo there is not even an official website.

We will install directly on the board: just plug in a monitor and keyboard and use it as a desktop, or ssh in. Board OSes are generally Ubuntu 20.04, with minor differences between vendors.

For the development environment, I recommend skipping conda and using the system Python directly; conda tends to cause problems here.

sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-pip gcc

sudo apt-get install -y python3-opencv
sudo apt-get install -y python3-numpy

Then install RKNN Toolkit Lite2. The wheels live in the rknn-toolkit2 GitHub repo, under the rknn-toolkit-lite2 packages directory. Clone the repo on the board and install the wheel matching your Python version:

# Python 3.7
pip3 install rknn_toolkit_lite2-1.x.0-cp37-cp37m-linux_aarch64.whl
# Python 3.8
pip3 install rknn_toolkit_lite2-1.x.0-cp38-cp38-linux_aarch64.whl
# Python 3.9
pip3 install rknn_toolkit_lite2-1.x.0-cp39-cp39-linux_aarch64.whl
# Python 3.10
pip3 install rknn_toolkit_lite2-1.x.0-cp310-cp310-linux_aarch64.whl

That completes the environment setup.
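A quick sanity check, my own addition, is to import the package on the board; if this succeeds, the wheel matches your Python version and architecture:

# If this runs without error, rknn-toolkit-lite2 is installed correctly.
from rknnlite.api import RKNNLite
rknn_lite = RKNNLite()
print('rknn-toolkit-lite2 OK')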

Next, test a basic YOLOv5 model. Per the official flow you would convert the model yourself (a rough sketch of that flow follows the download link below); here I simply downloaded a pre-converted model from the rknn model zoo. The model plus test image and video are on Baidu Pan:

链接:https://pan.baidu.com/s/1l70QscGAB8Jkxzy3PkyoGw?pwd=587t

提取码:587t

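For reference, the conversion itself is done on an x86 PC with rknn-toolkit2 (not the lite package). A rough sketch of the flow, assuming you have exported yolov5s.onnx and have a dataset.txt listing calibration images; the mean/std values here are an assumption and must match how the ONNX model was exported:

# Runs on an x86 PC with rknn-toolkit2 installed, not on the board.
from rknn.api import RKNN

rknn = RKNN()
# Normalization assumes the standard yolov5 export (pixel / 255).
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
            target_platform='rk3588')
rknn.load_onnx(model='yolov5s.onnx')
rknn.build(do_quantization=True, dataset='./dataset.txt')  # INT8 quantization
rknn.export_rknn('yolov5s_for_rk3588.rknn')
rknn.release()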

The basic inference code looks like this:

import cv2
import numpy as np
from rknnlite.api import RKNNLite

INPUT_SIZE = 640


if __name__ == '__main__':
    #rknn_model = "resnet18_for_rk3588.rknn"
    rknn_model = "yolov5s_for_rk3588.rknn"

    rknn_lite = RKNNLite()

    # load RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(rknn_model)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)

    ori_img = cv2.imread('./space_shuttle_224.jpg')
    img = cv2.cvtColor(ori_img, cv2.COLOR_BGR2RGB)            # model expects RGB
    img_convert = cv2.resize(img, (INPUT_SIZE, INPUT_SIZE))   # resize to the model input size

    # init runtime environment
    print('--> Init runtime environment')
    # run on RK356x/RK3588 with Debian OS, do not need specify target.
    ret = rknn_lite.init_runtime()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)

    # Inference
    print('--> Running model')
    outputs = rknn_lite.inference(inputs=[img_convert])
    #print(outputs)  # raw outputs; post-processing is up to you
    print('done')

    rknn_lite.release()

As you can see, RKNN really only provides the bare minimum of a deep learning runtime: it runs inference on a converted model, and everything else, image post-processing included, is up to you. You can of course borrow and adapt that code from existing torch-based projects.
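A useful first step, continuing directly from the script above, is to print the raw output shapes so you know what your post-processing has to consume (the shapes in the comment are what a typical yolov5s export produces, not something the toolkit guarantees):

# Inspect the raw outputs to plan the post-processing.
for i, out in enumerate(outputs):
    print('output {}: shape={}, dtype={}'.format(i, out.shape, out.dtype))
# A typical yolov5s export yields three feature maps, e.g.
# (1, 255, 80, 80), (1, 255, 40, 40), (1, 255, 20, 20):
# 3 anchors x (4 box + 1 objectness + 80 classes) = 255 channels per cell.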

A complete video-processing pipeline looks like this:

import os
import urllib
import traceback
import time
import datetime as dt
import sys
import numpy as np
import cv2
from rknnlite.api import RKNNLite

RKNN_MODEL = 'yolov5s_for_rk3588.rknn'

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
IMG_SIZE = 640

CLASSES = ("person", "bicycle", "car", "motorbike ", "aeroplane ", "bus ", "train", "truck ", "boat", "traffic light",
           "fire hydrant", "stop sign ", "parking meter", "bench", "bird", "cat", "dog ", "horse ", "sheep", "cow",
           "elephant",
           "bear", "zebra ", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis",
           "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
           "fork", "knife ",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza ", "donut",
           "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet ", "tvmonitor", "laptop	", "mouse	", "remote ",
           "keyboard ", "cell phone", "microwave ",
           "oven ", "toaster", "sink", "refrigerator ", "book", "clock", "vase", "scissors ", "teddy bear ",
           "hair drier", "toothbrush ")


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def xywh2xyxy(x):
    # Convert [x, y, w, h] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def process(input, mask, anchors):
    # Decode one YOLOv5 output head: recover box centers/sizes from grid
    # offsets and anchor priors (assumes a square grid, true for 640x640 input).
    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    # Box center relative to its grid cell, then shifted onto the full grid
    box_xy = sigmoid(input[..., :2]) * 2 - 0.5

    col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    grid = np.concatenate((col, row), axis=-1)
    box_xy += grid
    box_xy *= int(IMG_SIZE / grid_h)  # scale from grid units to pixels

    # Box width/height from the anchor priors
    box_wh = pow(sigmoid(input[..., 2:4]) * 2, 2)
    box_wh = box_wh * anchors

    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs


def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes with box threshold. It's a bit different with origin yolov5 post process!
    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.
    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    boxes = boxes.reshape(-1, 4)
    box_confidences = box_confidences.reshape(-1)
    box_class_probs = box_class_probs.reshape(-1, box_class_probs.shape[-1])

    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score * box_confidences)[_class_pos]

    return boxes, classes, scores


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.
    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.
    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov5_post_process(input_data):
    masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
               [59, 119], [116, 90], [156, 198], [373, 326]]

    boxes, classes, scores = [], [], []
    for input, mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    boxes = xywh2xyxy(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes, fps):
    """Draw the detection boxes on the image.
    # Arguments:
        image: original image.
        boxes: ndarray, boxes of objects in [x1, y1, x2, y2] order.
        scores: ndarray, scores of objects.
        classes: ndarray, classes of objects.
        fps: int, frames per second (drawn separately by the caller).
    """
    for box, score, cl in zip(boxes, scores, classes):
        x1, y1, x2, y2 = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate x1,y1,x2,y2: [{}, {}, {}, {}]'.format(x1, y1, x2, y2))
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

        cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (x1, y1 - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


# ==================================
# The changes from the official demo start here: the model-conversion code is
# removed, the .rknn model is loaded directly, and the RKNN class is replaced
# with the RKNNLite class from rknn-toolkit-lite2.
# ==================================

rknn = RKNNLite()

# load RKNN model
print('--> Load RKNN model')
ret = rknn.load_rknn(RKNN_MODEL)
if ret != 0:
    print('Load RKNN model failed!')
    exit(ret)

# Init runtime environment
print('--> Init runtime environment')
# use NPU
ret = rknn.init_runtime()
if ret != 0:
    print('Init runtime environment failed!')
    exit(ret)
print('done')

# Create a VideoCapture object and read from input file
# If the input is the camera, pass 0 instead of the video file name
#cap = cv2.VideoCapture(0)
video_path = "sample_720p.mp4"
cap = cv2.VideoCapture(video_path)


# Check if the capture opened successfully
if not cap.isOpened():
    print("Error opening video stream or file")

# Read until video is completed
while cap.isOpened():
    start = dt.datetime.utcnow()
    # Capture frame-by-frame
    ret, img = cap.read()
    if not ret:
        break

    # A plain resize distorts the aspect ratio; the letterbox() helper above
    # preserves it, but boxes would then need rescaling before drawing.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Inference
    print('--> Running model')
    outputs = rknn.inference(inputs=[img])
    print('done')

    # post process
    input0_data = outputs[0]
    input1_data = outputs[1]
    input2_data = outputs[2]

    input0_data = input0_data.reshape([3, -1] + list(input0_data.shape[-2:]))
    input1_data = input1_data.reshape([3, -1] + list(input1_data.shape[-2:]))
    input2_data = input2_data.reshape([3, -1] + list(input2_data.shape[-2:]))

    input_data = list()
    input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))

    boxes, classes, scores = yolov5_post_process(input_data)
    duration = dt.datetime.utcnow() - start
    fps = round(1.0 / duration.total_seconds())  # end-to-end fps for this frame

    # draw process result and fps
    img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2.putText(img_1, f'fps: {fps}',
                (20, 20),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.6, (0, 125, 125), 2)
    if boxes is not None:
        draw(img_1, boxes, scores, classes, fps)

    # show output
    cv2.imshow("post process result", img_1)

    # Press Q on keyboard to exit
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# When everything done, release the video capture object
cap.release()

# Closes all the frames
cv2.destroyAllWindows()

The result video from this run is a 17-second clip (not reproduced here).

Since the 3588's NPU has no equivalent of nvidia-smi, you can read NPU utilization from a debugfs node:

sudo cat /sys/kernel/debug/rknpu/load

# Sample output:
NPU load:  Core0: 39%, Core1:  0%, Core2:  0%,
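To watch the load while a model runs, a small polling script works; this is my own sketch, the debugfs node needs root and its path may vary across kernel versions:

# Poll NPU utilization once per second (run as root).
import time

def read_npu_load(path='/sys/kernel/debug/rknpu/load'):
    with open(path) as f:
        return f.read().strip()

while True:
    print(read_npu_load())  # e.g. NPU load:  Core0: 39%, Core1:  0%, Core2:  0%,
    time.sleep(1)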

As the output shows, the 3588 has three NPU cores and YOLO only uses one of them. You can borrow from the multi-threading projects below to run all three cores in parallel and raise the frame rate (a sketch follows the links):

https://github.com/leafqycc/rknn-multi-threaded (a simple yolov5s demo in Python on rk3588/3588s, about 72 fps)

https://github.com/leafqycc/rknn-cpp-Multithreading

https://github.com/airockchip/rknn-toolkit2
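The core idea in those projects is to create several RKNNLite instances and pin each one to a different NPU core through the core_mask argument of init_runtime(). A minimal sketch, reusing the yolov5s model from above; verify the core-mask constants against your installed rknn-toolkit-lite2 version:

# Three RKNNLite instances, one per NPU core, each served by its own thread.
from queue import Queue
from threading import Thread
from rknnlite.api import RKNNLite

CORE_MASKS = [RKNNLite.NPU_CORE_0, RKNNLite.NPU_CORE_1, RKNNLite.NPU_CORE_2]

def worker(core_mask, in_q, out_q):
    rknn = RKNNLite()
    rknn.load_rknn('yolov5s_for_rk3588.rknn')
    rknn.init_runtime(core_mask=core_mask)  # pin this instance to one core
    while True:
        frame = in_q.get()
        if frame is None:  # sentinel: stop the worker
            break
        out_q.put(rknn.inference(inputs=[frame]))
    rknn.release()

in_q, out_q = Queue(maxsize=6), Queue()
threads = [Thread(target=worker, args=(m, in_q, out_q), daemon=True)
           for m in CORE_MASKS]
for t in threads:
    t.start()
# Feed preprocessed 640x640 RGB frames into in_q and collect raw outputs from
# out_q, then run yolov5_post_process() as in the script above.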

If this is for learning, I suggest the Orange Pi 5B dev board: around 600-700 RMB from the official Pinduoduo store, noticeably cheaper than elsewhere; it is worth buying a case for it too.
