嵌入式AI---在华为昇腾推理自己的yolov5目标检测模型

李华_

已于 2024-07-29 17:03:27 修改

阅读量1.6k

点赞数 34

文章标签：人工智能华为 YOLO

于 2024-07-29 16:36:16 首次发布

本文链接：https://blog.csdn.net/weixin_45816353/article/details/140766932

版权

文章目录

前言
一、MindStudio环境搭建
二、板端推理工程
- 1.工程创建
- 2.代码编写
总结

文章阅读顺序：
（1）训练自己的yolov5目标检测模型
 （2）yolov5模型转化为华为昇腾om模型
 （3）在华为昇腾推理自己的yolov5目标检测模型（本文）

前言

根据（1）（2），我们已经得到了yolov5的om模型文件，为了将自己训练的yolo模型应用至现实场景，接下来需要完成昇腾Atlas 200I DK A2的部署推理工作，开发语言采用Python。主要分为以下步骤：
一、Mindstudio环境搭建
二、推理代码编写

一、MindStudio环境搭建

为了完成板端推理工作，需要基于转换得到om模型文件进行推理代码编写。由于在边缘计算设备上安装IDE进行代码开发很不方便（需要外接输入设备，显示屏等或进行vnc远程操作），所以需要在MindStudio这款IDE上开发，使用起来和pycharm差不多，好处在于可以在采用远程板端环境的同时，在PC上完成代码编写，无需再在PC上安装一系列python环境。

（1）下载链接在这里插入图片描述
（2）安装教程
安装步骤较简单，此处不赘述

二、板端推理工程

1.工程创建

（一）由于PC端只负责代码编辑，环境来自于板端，故第一步需要确保PC已经与板端建立了硬件连接。
最简单的测试方法就是用MobaXterm远程连一下板子试试，能连通则说明硬件连接没问题。
在这里插入图片描述
（二）打开装好的MindStudio，点击New Project创建工程
1.选择Ascend App工程
2.选择python工程
3.选择CANN版本，由于CANN版本需要确保与板端相对应，故需要点击change

4.选择板端的CANN路径。点击Finish后，PC会和板端进行同步，同步加载需要一定时间
（若Remote Connection为空，则点击右边的+号，新建与板子的SSH连接）
在这里插入图片描述
5.同步完成后，点击Next

6.工程创建成功，接下来将远程的Python解释器添加到工程中

7.将远程的Python解释器作为本项目的Python解释器

再配置此处，然后点击OK

8.在板端创建一个文件夹om_detect_test
在这里插入图片描述
PC端点击File->settings->Tools->Ascend Deployment将映射同步路径设置为刚创建的om_detect_test

至此，环境搭建完成。CANN版本已与板端相对应，Python解释器采用板端的，项目文件夹与板端的om_detect_test映射同步，PC端修改代码会同步映射到板端的om_detect_test文件夹

2.代码编写

（一）在工程目录中：新建detect.py、det_utils.py、labels.txt；将转换好的om模型复制到工程中；复制一张测试图片到工程中
在这里插入图片描述
然后在MindStudio就可以看见相应文件

（二）填充labels.txt，因为（1）训练的是车辆检测模型，故labels.txt需按照训练的模型进行填充

car
bus
van
others

（三）编写detect.py(检测)和det_util.py(后处理)
根据官方demo改写即可，本文给出改写后的代码
det_utils.py

import time

import cv2
import numpy as np
import torch
import torchvision


def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)


def xyxy2xywh(x):
    # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = (x[:, 0] + x[:, 2]) / 2  # x center
    y[:, 1] = (x[:, 1] + x[:, 3]) / 2  # y center
    y[:, 2] = x[:, 2] - x[:, 0]  # width
    y[:, 3] = x[:, 3] - x[:, 1]  # height
    return y


def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    mi = 5 + nc  # mask start index
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = x[:, 5:mi].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence
        else:
            x = x[x[:, 4].argsort(descending=True)]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output


def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    # Rescale coords (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain
    clip_coords(coords, img0_shape)

    return coords

def clip_coords(boxes, shape):
    # Clip bounding xyxy bounding boxes to image shape (height, width)
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[:, 0].clamp_(0, shape[1])  # x1
        boxes[:, 1].clamp_(0, shape[0])  # y1
        boxes[:, 2].clamp_(0, shape[1])  # x2
        boxes[:, 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])  # y1, y2


def nms(box_out, conf_thres=0.4, iou_thres=0.5):
    try:
        boxout = non_max_suppression(box_out, conf_thres=conf_thres, iou_thres=iou_thres, multi_label=True)
    except:
        boxout = non_max_suppression(box_out, conf_thres=conf_thres, iou_thres=iou_thres)
    return boxout

detect.py

import cv2
import numpy as np
import torch
import os
from det_utils import letterbox, nms, scale_coords
from ais_bench.infer.interface import InferSession
from time import time

model_path = "./model.om"  # om格式模型文件
label_path = './labels.txt'  # 标签

detect = './test.jpg'  # 输入文件or目录
result = './result.jpg'  # 输出文件or目录

def preprocess_image(image, cfg, bgr2rgb=True):  # 图片预处理
    img, scale_ratio, pad_size = letterbox(image, new_shape=cfg['input_shape'])  # image尺度不定，故需调整尺寸适配模型输入
    if bgr2rgb:
        img = img[:, :, ::-1]
    img = img.transpose(2, 0, 1)  # HWC2CHW
    img = np.ascontiguousarray(img, dtype=np.float32)  # 将输入数组转换为连续存储数组，加速运算效率
    return img, scale_ratio, pad_size


def draw_bbox(bbox, img0, color, wt, names):
    """在图片上画预测框"""
    det_result_str = ''
    for idx, class_id in enumerate(bbox[:, 5]):
        if float(bbox[idx][4] < float(0.05)):
            continue
        img0 = cv2.rectangle(img0, (int(bbox[idx][0]), int(bbox[idx][1])), (int(bbox[idx][2]), int(bbox[idx][3])),
                             color, wt)
        img0 = cv2.putText(img0, str(idx) + ' ' + names[int(class_id)], (int(bbox[idx][0]), int(bbox[idx][1] + 16)),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        img0 = cv2.putText(img0, '{:.4f}'.format(bbox[idx][4]), (int(bbox[idx][0]), int(bbox[idx][1] + 32)),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        det_result_str += '{} {} {} {} {} {}\n'.format(
            names[bbox[idx][5]], str(bbox[idx][4]), bbox[idx][0], bbox[idx][1], bbox[idx][2], bbox[idx][3])
    return img0


def get_labels_from_txt(path):
    """从txt文件获取图片标签"""
    labels_dict = dict()
    with open(path) as f:
        for cat_id, label in enumerate(f.readlines()):
            labels_dict[cat_id] = label.strip()
    return labels_dict


def detect_img(model, detect_path, result_path):
    raw_img = cv2.imread(detect_path)  # 载入原始图片
    labels = get_labels_from_txt(label_path)
    # 预处理
    cfg = {
        'conf_thres': 0.4,  # 模型置信度阈值，阈值越低，得到的预测框越多
        'iou_thres': 0.5,  # IOU阈值，重叠率过低的框会被过滤
        'input_shape': [640, 640],  # 输入尺寸
    }
    img, scale_ratio, pad_size = preprocess_image(raw_img, cfg)
    img = img / 255.0  # 训练模型时将0~255值域转化为了0~1，故推理阶段也需同样处理

    # 检测
    t1 = time()
    output = model.infer([img])[0]
    output = torch.tensor(output)

    # 非极大值抑制后处理
    boxout = nms(output, conf_thres=cfg["conf_thres"], iou_thres=cfg["iou_thres"])
    pred_all = boxout[0].numpy()
    # 预测坐标转换
    scale_coords(cfg['input_shape'], pred_all[:, :4], raw_img.shape, ratio_pad=(scale_ratio, pad_size))
    t2 = time()
    print("detect time: %fs" % (t2 - t1))

    # 跟踪结果保存
    draw_bbox(pred_all, raw_img, (0, 255, 0), 2, labels)
    cv2.imwrite(result_path, raw_img)


if __name__ == "__main__":
    model = InferSession(0, model_path)
    detect_img(model, detect, result)
    print('Detect OK!')

板端运行ls也可以看见这些文件
（若没看见，则右键工程目录->Ascend Development->upload to->选择传输节点，将文件传输到板端）
在这里插入图片描述
板端调用python detect.py进行推理，运行结果如图所示

在板端执行ls命令可以看到检测结果result.jpg。