quickrun: multi-model detection deployment on the RK3588

杨善锦

Last modified 2024-04-02 12:19:41

Category: Vision AI. Tags: rknn, rk3588, multi-model concurrent inference, c++, AI vision development board

First published 2023-12-22 14:35:15

Article link: https://blog.csdn.net/oYangShanJin/article/details/135151394

quickrun is a deployment tool for running multiple RKNN models efficiently and with high concurrency on the RK3588.

Software architecture

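The design is built around sessions: you can define multiple sessions to meet the business requirements of different models. For example, charging station detection, garbage classification, and cliff detection can share one camera, with each model being a YOLOv5 model.
RKNN models are deployed with multi-threaded inference.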
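
The session idea can be sketched in a few lines of Python. This is only an illustration, not quickrun's actual code (which targets C++ and the RKNN runtime): the `Session` class, `run_sessions`, and the stand-in `infer` callables are all hypothetical names, and the sketch assumes each session owns a bounded frame queue and one worker thread.

```python
import queue
import threading


class Session(threading.Thread):
    """One model = one session = one worker thread with its own frame queue."""

    def __init__(self, name, infer, maxsize=8):
        super().__init__(name=name, daemon=True)
        self.frames = queue.Queue(maxsize=maxsize)  # bounded buffer of pending frames
        self.infer = infer  # hypothetical callable: frame -> result
        self.results = []

    def submit(self, frame):
        # Blocks while the queue is full, so frames are delayed rather than dropped.
        self.frames.put(frame)

    def close(self):
        self.frames.put(None)  # sentinel: no more frames

    def run(self):
        while True:
            frame = self.frames.get()
            if frame is None:
                break
            self.results.append(self.infer(frame))


def run_sessions(frames, models):
    """Fan every camera frame out to all sessions and collect their results."""
    sessions = [Session(name, fn) for name, fn in models.items()]
    for s in sessions:
        s.start()
    for frame in frames:  # single shared camera source
        for s in sessions:
            s.submit(frame)
    for s in sessions:
        s.close()
    for s in sessions:
        s.join()
    return {s.name: s.results for s in sessions}


if __name__ == "__main__":
    # Stand-in "models"; a real session would call into the RKNN runtime here.
    models = {
        "charging_station": lambda f: ("charging_station", f),
        "garbage": lambda f: ("garbage", f),
        "cliff": lambda f: ("cliff", f),
    }
    out = run_sessions(range(5), models)
    print({name: len(res) for name, res in out.items()})
```

Because each session has its own queue and thread, a slow model only delays its own results. The bounded queue makes the producer wait instead of discarding frames when a session falls behind.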

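A message queue holds the image data to prevent dropped frames and to allow efficient concurrency. Typically frames are captured at 25 fps, and pre-processing, inference, and post-processing together take about 40 ms per frame, which is also 25 fps, so messages are consumed at roughly the rate they are produced.
The model takes 640×640 input while the camera delivers 640×480 frames. Frames are decoded with cv::imdecode, passed to the model in RGB format, and resized with the aspect ratio preserved, using RGA acceleration.
Each of the three models runs in its own session thread, so the sessions are independent and do not interfere with one another.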
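
The aspect-preserving resize mentioned above (640×480 camera frame into a 640×640 model input) comes down to a little arithmetic. The sketch below computes the scale and padding and maps a detected box back to camera coordinates. The function names are hypothetical, and centering the image inside the padded input is an assumption; the source does not say where quickrun places the padding.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Assumption: the resized image is centered and the remainder is padded.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def box_to_source(box, scale, pad_left, pad_top):
    """Map an (x1, y1, x2, y2) box from model-input pixels back to camera pixels."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_left) / scale,
        (y1 - pad_top) / scale,
        (x2 - pad_left) / scale,
        (y2 - pad_top) / scale,
    )


if __name__ == "__main__":
    scale, new_w, new_h, pad_left, pad_top = letterbox_params(640, 480, 640, 640)
    print(scale, new_w, new_h, pad_left, pad_top)
    print(box_to_source((100, 180, 200, 280), scale, pad_left, pad_top))
```

For a 640×480 frame the scale is 1.0, so under the centering assumption only 80 rows of padding are added above and below the image.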

Model output

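For YOLOv5 on the RK3588, when converting the model to ONNX, remove the cat operation from the Detect forward pass so that the model directly outputs the three feature maps: 20×20, 40×40, and 80×80. Modify the model output in yolo.py: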

index fa05fcf..cfd4883 100644
@@ -53,28 +53,10 @@ class Detect(nn.Module):
         self.inplace = inplace  # use inplace ops (e.g. slice assignment)
 
     def forward(self, x):
-        z = []  # inference output
+        z = []
         for i in range(self.nl):
-            x[i] = self.m[i](x[i])  # conv
-            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
-            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
-
-            if not self.training:  # inference
-                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
-                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
-
-                y = x[i].sigmoid()
-                if self.inplace:
-                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
-                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
-                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
-                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  # torch 1.8.0
-                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
-                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
-                    y = torch.cat((xy, wh, conf), 4)
-                z.append(y.view(bs, -1, self.no))
-
-        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)
+            x[i] = self.m[i](x[i])
+        return x
 
     def _make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):
         d = self.anchors[i].device
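
With the sigmoid and grid decoding removed from forward, the same arithmetic has to run in post-processing on the raw feature maps. The sketch below mirrors the deleted lines in plain Python. The function names are hypothetical, and it assumes the grid offset of -0.5 that `_make_grid` applies in this YOLOv5 version, anchors given in input pixels, and float (already dequantized) outputs.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, gx, gy, stride, anchor_w, anchor_h):
    """Decode one anchor at one grid cell.

    raw: 5 + nc raw head values (tx, ty, tw, th, obj, cls...).
    Returns (cx, cy, w, h, score, class_id) in model-input pixels.
    """
    s = [sigmoid(v) for v in raw]
    cx = (s[0] * 2 - 0.5 + gx) * stride  # xy: (y * 2 + grid) * stride, grid = index - 0.5
    cy = (s[1] * 2 - 0.5 + gy) * stride
    w = (s[2] * 2) ** 2 * anchor_w       # wh: (y * 2) ** 2 * anchor_grid
    h = (s[3] * 2) ** 2 * anchor_h
    class_id = max(range(len(s) - 5), key=lambda c: s[5 + c])
    score = s[4] * s[5 + class_id]       # objectness * best class probability
    return cx, cy, w, h, score, class_id


def decode_feature_map(fmap, stride, anchors, conf_thres=0.25):
    """Decode a raw (na * no, ny, nx) map given as nested lists; channel = a * no + k."""
    na = len(anchors)
    no = len(fmap) // na
    ny, nx = len(fmap[0]), len(fmap[0][0])
    dets = []
    for a, (aw, ah) in enumerate(anchors):
        for gy in range(ny):
            for gx in range(nx):
                raw = [fmap[a * no + k][gy][gx] for k in range(no)]
                det = decode_cell(raw, gx, gy, stride, aw, ah)
                if det[4] >= conf_thres:
                    dets.append(det)
    return dets
```

The surviving detections from all three scales would then go through non-maximum suppression, which is omitted here.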

Modify export.py:

index 4d0144a..f9310e6 100644
@@ -56,7 +56,7 @@ import pandas as pd
 import torch
 import yaml
 from torch.utils.mobile_optimizer import optimize_for_mobile
-
+import numpy as np
 FILE = Path(__file__).resolve()
 ROOT = FILE.parents[0]  # YOLOv5 root directory
 if str(ROOT) not in sys.path:
@@ -496,7 +496,7 @@ def run(
         y = model(im)  # dry runs
     if half and not coreml:
         im, model = im.half(), model.half()  # to FP16
-    shape = tuple((y[0] if isinstance(y, tuple) else y).shape)  # model output shape
+    shape = tuple(np.array((y[0] if isinstance(y, tuple) else y)).shape)  # model output shape
     LOGGER.info(f"\n{colorstr('PyTorch:')} starting from {file} with output shape {shape} ({file_size(file):.1f} MB)")
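
After these changes the exported model should expose three raw outputs instead of one concatenated tensor. A quick way to check an export is to compare its output shapes against the expected ones. The helper below is a hypothetical sketch that assumes the standard YOLOv5 strides of 8, 16, and 32 and 3 anchors per scale.

```python
def expected_head_shapes(img_size=640, nc=80, na=3, strides=(8, 16, 32), batch=1):
    """Shapes of the raw Detect conv outputs: (batch, na * (nc + 5), H / s, W / s)."""
    no = nc + 5  # box (4) + objectness (1) + class scores
    return [(batch, na * no, img_size // s, img_size // s) for s in strides]


if __name__ == "__main__":
    for shape in expected_head_shapes():
        print(shape)
```

For a 640×640 input and 80 classes this gives 255 channels at 80×80, 40×40, and 20×20, which matches the `x(bs,255,20,20)` comment in the original forward.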

Performance

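One model uses NPU: 1.2T and CPU: 40% (pre-processing, inference, and post-processing with box drawing), with an inference time of 20 ms. Use perf top -p <pid> to inspect CPU usage; it breaks usage down to individual functions.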

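AI vision development board: openamv (Almachinevision) is a cost-effective miniature Linux development board based on the Rockchip RV1103 chip, intended to give developers a simple and efficient development platform. It supports a range of interfaces, including MIPI CSI, GPIO, UART, SPI, I2C, and USB, for fast development and debugging. For AI vision in particular, it can run neural networks at low cost, providing cost-effective visual monitoring.
The in-house camera supports night vision, with an optional wide-angle lens, over a MIPI interface. It supports 3 megapixels, offers high sensitivity and a high signal-to-noise ratio in low light, produces finer and more true-to-color night-vision images, adapts better to changes in ambient light, and is inexpensive in bulk.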