Hands-On AI with the NXP i.MX8M Plus: Quickly Train and Deploy a YOLOv5s Drowsy Driver Monitoring System (DMS)
1. Overview
In recent years, the field of Computer Vision has gone through a major transformation: instead of processing images pixel by pixel, applications now rely on Deep Learning, in which a "Model" is learned statistically from "Big Data". With Generative AI and ChatGPT reshaping what we thought was possible, the era of Artificial Intelligence has truly arrived, and making everyday devices intelligent is no longer a distant dream. Systems that countless scholars, researchers, and engineers once spent years developing can now be built with a few simple steps, and with NXP's i.MX 8M Plus platform and its built-in Neural Processing Unit (NPU) they can quickly be turned into end products that follow the principle of Edge Computing, reacting to inference results with minimal latency.
To set up the NXP embedded development environment, readers can refer to 【ATU Book - i.MX8 Series - OS】NXP i.MX Linux BSP Development Environment Setup to quickly deploy the environment for the NXP i.MX8 series. With that post, or the rest of the ATU team's blog series, any i.MX8 environment setup can be completed with ease. Alternatively, to get started on the NXP platform even faster, you can download the officially released Linux image from the NXP website.
Embedded Linux for i.MX Applications Processors | NXP Semiconductors
Driver Monitoring System (DMS) - software architecture diagram
This article uses Google Colab as the training platform, together with the ultralytics YOLOv5 framework and a dataset hosted on Roboflow, to quickly build a Driver Monitoring System (DMS) that detects whether the driver is drowsy, with the NXP i.MX8M Plus as the final hardware platform.
Note : Models trained with the ultralytics framework are not automatically licensed for commercial use; commercial use must comply with the terms of the AGPL-3.0 license.
2. Building the DMS Model
Before running the steps below, please register a Roboflow account and open Colab to perform the following operations and retrain the model:
(1) Open the Colab cloud computing platform and select GPU as the runtime accelerator :
For a detailed introduction to Colab, see 【ATU Book - i.MX8 Series - Colab】Introduction to Colab. A quick check that the GPU runtime is really active is shown below.
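The following minimal sketch assumes the default Colab environment, where PyTorch is preinstalled; it simply confirms that a CUDA device is visible before you spend time training.
# Quick sanity check of the Colab runtime (assumes the default Colab environment with PyTorch preinstalled)
import torch
print("CUDA available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU device     :", torch.cuda.get_device_name(0))  # e.g. a Tesla T4 on a free Colab session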
(2) Download the ultralytics source code (the YOLOv5 framework) :
!git clone https://github.com/ultralytics/yolov5
(3) Install the required packages :
%cd yolov5
%pip install -qr requirements.txt # install dependencies
%pip install -q roboflow
(4) Install Roboflow :
Roboflow is an online data management platform that provides image annotation, data augmentation, and other features.
!pip install roboflow
(5) Download the dataset :
from roboflow import Roboflow
rf = Roboflow(api_key="********")
project = rf.workspace("drowsiness-detection-uhhz5").project("drownsy-dectetion-v2")
version = project.version(5)
dataset = version.download("yolov5pytorch")
For this step, first go to the drownsy-dectetion-v2 project on the Roboflow website and follow the figure below to obtain the corresponding API Key. You can optionally inspect the downloaded dataset configuration as shown after this step.
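As an optional check, the short sketch below prints the data.yaml that train.py will consume, using the dataset.location path returned by Roboflow in the previous cell.
# Optional: inspect the downloaded dataset configuration (class names, train/val paths)
with open(f"{dataset.location}/data.yaml") as f:
    print(f.read())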
(6) Train the AI model
!python train.py --img 256 --batch 16 --epochs 20 --data {dataset.location}/data.yaml --weights yolov5s.pt --cache
Note : Check whether the mAP50 and mAP50-95 accuracy meets your expectations (a quick way to view the training curves is shown below).
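Besides watching the console log, you can display the training curves that YOLOv5 writes to the experiment folder. This is a minimal sketch assuming the default runs/train/exp output directory.
# Optional: show the training curves produced by YOLOv5 (loss, precision, recall, mAP)
from IPython.display import Image, display
display(Image(filename='/content/yolov5/runs/train/exp/results.png', width=900))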
(7) Validate the AI model
!python detect.py --weights /content/yolov5/runs/train/exp/weights/best.pt \
--img 256 --conf 0.1 --source {dataset.location}/test/images
Note : Check the test results under /content/yolov5/runs/detect. A short snippet for previewing a few of them in the notebook follows.
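The sketch below assumes the default runs/detect/exp output folder and simply shows the first few annotated test images inline.
# Optional: preview a few annotated images from the detection run
import glob
from IPython.display import Image, display
for img_path in sorted(glob.glob('/content/yolov5/runs/detect/exp/*.jpg'))[:3]:
    display(Image(filename=img_path, width=400))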
(8) Export the AI model
Convert the PyTorch model to TensorFlow Lite format.
!python export.py --weights /content/yolov5/runs/train/exp/weights/best.pt --img-size 256 --include tflite
(9) Export the AI model (integer quantization)
Convert the TensorFlow model to a TensorFlow Lite model and quantize the weights to integer form.
# Tensorflow to TF Lite (INT)
import tensorflow as tf
import numpy as np

def representative_dataset_gen():  # calibration dataset (random data is used here for simplicity; a set of real images gives better quantization accuracy)
    for _ in range(250):
        yield [np.random.uniform(0.0, 1.0 , size=(1,256,256,3)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("/content/yolov5/runs/train/exp/weights/best_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.allow_custom_ops = True
converter.inference_input_type = tf.float32
converter.inference_output_type = tf.float32
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

with tf.io.gfile.GFile("DrownsyDectetion_v2-quant.tflite", 'wb') as f:
    f.write(tflite_model)
print("Quantization complete! - DrownsyDectetion_v2-quant.tflite")
3. Running on the NXP i.MX 8M Plus Platform
Finally, simply deploy the model to the NXP platform to realize the edge computing application, a Driver Monitoring System (DMS). Run the following commands on the NXP i.MX8M Plus platform :
Note : Readers who want to get the complete code quickly can also find it on GitHub.
(1) Create the folders
$ mkdir YOLOv5s_DrownsyDectetion_v2
$ cd YOLOv5s_DrownsyDectetion_v2
$ mkdir model
$ mkdir img
$ mkdir output
(2) Copy the test image and the model
$ cp DrownsyDectetion_v2-quant.tflite /home/root/YOLOv5s_DrownsyDectetion_v2/model
$ cp test.jpg /home/root/YOLOv5s_DrownsyDectetion_v2/img
Note : Copy a test image from /content/yolov5/Drownsy-Dectetion-v2-5/test/images. A transfer example is shown below.
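The two files above first have to reach the board. One common way, assuming the i.MX8M Plus is reachable over the network (the IP address below is only a placeholder), is to download them from Colab to the host PC and then copy them over with scp before running the cp commands above:
$ scp DrownsyDectetion_v2-quant.tflite test.jpg root@192.168.1.100:/home/root/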
(3) Create plots.py
import cv2
import torch
import numpy as np
from PIL import Image, ImageDraw, ImageFont
FONT = 'Arial.ttf'
class Colors:
    # Ultralytics color palette https://ultralytics.com/
    def __init__(self):
        # hex = matplotlib.colors.TABLEAU_COLORS.values()
        hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',
                '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')
        self.palette = [self.hex2rgb(f'#{c}') for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):  # rgb order (PIL)
        return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))

colors = Colors()  # create instance for 'from utils.plots import colors'
def check_pil_font(font=FONT, size=10):
    # Return a PIL TrueType font if available, otherwise fall back to the default bitmap font
    # (the original yolov5 utils would try to download the font; that helper is not included in this standalone file)
    try:
        return ImageFont.truetype(str(font), size)
    except Exception:  # font file not found on the target board
        return ImageFont.load_default()
def is_ascii(s=''):
    # Is string composed of all ASCII (no UTF) characters? (note str().isascii() introduced in python 3.7)
    s = str(s)  # convert list, tuple, None, etc. to str
    return len(s.encode().decode('ascii', 'ignore')) == len(s)
class Annotator:
    # YOLOv5 Annotator for train/val mosaics and jpgs and detect/hub inference annotations
    def __init__(self, im, line_width=None, font_size=None, font='Arial.ttf', pil=False, example='abc'):
        assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to Annotator() input images.'
        non_ascii = not is_ascii(example)  # non-latin labels, i.e. asian, arabic, cyrillic
        self.pil = pil or non_ascii
        if self.pil:  # use PIL
            self.im = im if isinstance(im, Image.Image) else Image.fromarray(im)
            self.draw = ImageDraw.Draw(self.im)
            self.font = check_pil_font(font='Arial.Unicode.ttf' if non_ascii else font,
                                       size=font_size or max(round(sum(self.im.size) / 2 * 0.035), 12))
        else:  # use cv2
            self.im = im
            self.lw = line_width or max(round(sum(im.shape) / 2 * 0.003), 2)  # line width

    def box_label(self, box, label='', color=(128, 128, 128), txt_color=(255, 255, 255)):
        # Add one xyxy box to image with label
        if self.pil or not is_ascii(label):
            self.draw.rectangle(box, width=self.lw, outline=color)  # box
            if label:
                w, h = self.font.getsize(label)  # text width, height (WARNING: deprecated) in 9.2.0
                # _, _, w, h = self.font.getbbox(label)  # text width, height (New)
                outside = box[1] - h >= 0  # label fits outside box
                self.draw.rectangle(
                    (box[0], box[1] - h if outside else box[1], box[0] + w + 1,
                     box[1] + 1 if outside else box[1] + h + 1),
                    fill=color,
                )
                # self.draw.text((box[0], box[1]), label, fill=txt_color, font=self.font, anchor='ls')  # for PIL>8.0
                self.draw.text((box[0], box[1] - h if outside else box[1]), label, fill=txt_color, font=self.font)
        else:  # cv2
            p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
            cv2.rectangle(self.im, p1, p2, color, thickness=self.lw, lineType=cv2.LINE_AA)
            if label:
                tf = max(self.lw - 1, 1)  # font thickness
                w, h = cv2.getTextSize(label, 0, fontScale=self.lw / 3, thickness=tf)[0]  # text width, height
                outside = p1[1] - h >= 3
                p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
                cv2.rectangle(self.im, p1, p2, color, -1, cv2.LINE_AA)  # filled
                cv2.putText(self.im,
                            label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                            0,
                            self.lw / 3,
                            txt_color,
                            thickness=tf,
                            lineType=cv2.LINE_AA)

    def masks(self, masks, colors, im_gpu, alpha=0.5, retina_masks=False):
        """Plot masks at once.
        Args:
            masks (tensor): predicted masks on cuda, shape: [n, h, w]
            colors (List[List[Int]]): colors for predicted masks, [[r, g, b] * n]
            im_gpu (tensor): img is in cuda, shape: [3, h, w], range: [0, 1]
            alpha (float): mask transparency: 0.0 fully transparent, 1.0 opaque
        """
        if self.pil:
            # convert to numpy first
            self.im = np.asarray(self.im).copy()
        if len(masks) == 0:
            self.im[:] = im_gpu.permute(1, 2, 0).contiguous().cpu().numpy() * 255
        colors = torch.tensor(colors, device=im_gpu.device, dtype=torch.float32) / 255.0
        colors = colors[:, None, None]  # shape(n,1,1,3)
        masks = masks.unsqueeze(3)  # shape(n,h,w,1)
        masks_color = masks * (colors * alpha)  # shape(n,h,w,3)
        inv_alph_masks = (1 - masks * alpha).cumprod(0)  # shape(n,h,w,1)
        mcs = (masks_color * inv_alph_masks).sum(0) * 2  # mask color summand shape(n,h,w,3)
        im_gpu = im_gpu.flip(dims=[0])  # flip channel
        im_gpu = im_gpu.permute(1, 2, 0).contiguous()  # shape(h,w,3)
        im_gpu = im_gpu * inv_alph_masks[-1] + mcs
        im_mask = (im_gpu * 255).byte().cpu().numpy()
        self.im[:] = im_mask if retina_masks else scale_image(im_gpu.shape, im_mask, self.im.shape)
        if self.pil:
            # convert im back to PIL and update draw
            self.fromarray(self.im)

    def rectangle(self, xy, fill=None, outline=None, width=1):
        # Add rectangle to image (PIL-only)
        self.draw.rectangle(xy, fill, outline, width)

    def text(self, xy, text, txt_color=(255, 255, 255), anchor='top'):
        # Add text to image (PIL-only)
        if anchor == 'bottom':  # start y from font bottom
            w, h = self.font.getsize(text)  # text width, height
            xy[1] += 1 - h
        self.draw.text(xy, text, fill=txt_color, font=self.font)

    def fromarray(self, im):
        # Update self.im from a numpy array
        self.im = im if isinstance(im, Image.Image) else Image.fromarray(im)
        self.draw = ImageDraw.Draw(self.im)

    def result(self):
        # Return annotated image as array
        return np.asarray(self.im)
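As a quick standalone check of plots.py, the short sketch below (a hypothetical test_plots.py with made-up box coordinates) draws one box on a blank image using the same Annotator/colors API that app.py relies on.
# test_plots.py - minimal usage example of the Annotator helper (hypothetical box coordinates)
import cv2
import numpy as np
from plots import Annotator, colors

img = np.full((256, 256, 3), 114, dtype=np.uint8)   # blank gray image
annotator = Annotator(img, line_width=3)
annotator.box_label([40, 60, 200, 220], 'drowsy 0.87', color=colors(0, True))  # xyxy box + label
cv2.imwrite('annotator_demo.jpg', annotator.result())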
(4) Create app.py
import sys
import cv2
import torch
import torchvision
import time
import argparse
import numpy as np
import tflite_runtime.interpreter as tflite
from plots import Annotator, colors
V4L2_YUV2_720p = "v4l2src device=/dev/video2 ! video/x-raw,format=YUY2,width=1280,height=720, pixel-aspect-ratio=1/1, framerate=30/1! videoscale!videoconvert ! appsink"
# --------------------------------------------------------------------------------------------------------------
# API
# --------------------------------------------------------------------------------------------------------------
def InferenceDelegate( model, delegate ):
    if (delegate=="vx") :
        interpreter = tflite.Interpreter(model, experimental_delegates=[ tflite.load_delegate("/usr/lib/libvx_delegate.so") ])
    elif(delegate=="ethosu"):
        interpreter = tflite.Interpreter(model, experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
    elif(delegate=="xnnpack"):
        interpreter = tflite.Interpreter(model)
    else :
        print("ERROR : Delegate Input Fault")
        return 0
    return interpreter

def Label_name(x):
    names = {0:'drowsy', 1:'normal'}
    return names[x]

def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y
def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections
    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """
    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    mi = 5 + nc  # mask start index
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = x[:, 5:mi].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            break  # time limit exceeded

    return output
def clip_boxes(boxes, shape):
    # Clip boxes (xyxy) to image shape (height, width)
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[..., 0].clamp_(0, shape[1])  # x1
        boxes[..., 1].clamp_(0, shape[0])  # y1
        boxes[..., 2].clamp_(0, shape[1])  # x2
        boxes[..., 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2

def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
    # Rescale boxes (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]
    boxes[..., [0, 2]] -= pad[0]  # x padding
    boxes[..., [1, 3]] -= pad[1]  # y padding
    boxes[..., :4] /= gain
    clip_boxes(boxes, img0_shape)
    return boxes
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)
# --------------------------------------------------------------------------------------------------------------
# Main program
# --------------------------------------------------------------------------------------------------------------
def main():

    # Parse command-line arguments
    APP_NAME = "YOLOv5s_DrownsyDectetion_v2"
    parser = argparse.ArgumentParser()
    parser.add_argument( '-c' ,"--camera", default="0")
    parser.add_argument( '-d' ,"--display", default="0")
    parser.add_argument("--save", default="1")
    parser.add_argument( '-t', "--time", default="0")
    parser.add_argument('--delegate' , default="vx", help = 'Please Input vx or xnnpack or ethosu')
    parser.add_argument( '-m', '--model' , default="model/DrownsyDectetion_v2-quant.tflite", help='File path of .tflite file.')
    parser.add_argument("--test_img", default="img/test.jpg")
    args = parser.parse_args()

    # Set up the TFLite interpreter and query the input tensor information
    interpreter = InferenceDelegate(args.model,args.delegate)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    width = input_details[0]['shape'][2]
    height = input_details[0]['shape'][1]
    nChannel = input_details[0]['shape'][3]
    scale, zero_point = input_details[0]['quantization']
    #print(scale)

    # Warm up the interpreter (the first invoke prepares the graph for the delegate)
    if (input_details[0]['dtype']==np.uint8) :
        interpreter.set_tensor(input_details[0]['index'], np.zeros((1,height,width,nChannel)).astype("uint8") )
    else :
        interpreter.set_tensor(input_details[0]['index'], np.zeros((1,height,width,nChannel)).astype("float32") )
    interpreter.invoke()

    # Enable the camera if requested
    if args.camera =="True" or args.camera == "1" :
        cap = cv2.VideoCapture(V4L2_YUV2_720p)
        if(cap.isOpened()==False) :
            print( "Open Camera Failure !!")
            sys.exit()
        else :
            print( "Open Camera Success !!")

    # Main loop / repeated inference
    while(True):

        # Video / image data source
        if args.camera =="True" or args.camera == "1" :
            ret, frame = cap.read()
        else :
            frame = cv2.imread(args.test_img)

        frame_resized = letterbox(frame, (width,height), stride=32, auto=False)[0]  # padded resize
        #frame_resized = cv2.cvtColor(frame_resized, cv2.COLOR_RGB2GRAY)
        #frame_resized = cv2.cvtColor(frame_resized, cv2.COLOR_GRAY2RGB)
        frame_resized = frame_resized.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        frame_resized = np.ascontiguousarray(frame_resized)

        if (input_details[0]['dtype']==np.uint8) :
            frame_resized = (frame_resized/scale) + zero_point
        else :
            frame_resized = (frame_resized/255.0)

        # Feed the input data to the interpreter
        if (input_details[0]['dtype']==np.uint8) :
            input_data = np.expand_dims(frame_resized , axis=0).astype(np.uint8)
        else :
            input_data = np.expand_dims(frame_resized , axis=0).astype(np.float32)
        input_data = input_data.swapaxes(1,2)
        input_data = input_data.swapaxes(2,3)
        interpreter.set_tensor(input_details[0]['index'], input_data )

        # Run inference
        interpreter_time_start = time.time()
        interpreter.invoke()
        interpreter_time_end = time.time()
        if args.time =="True" or args.time == "1" :
            print( APP_NAME + " Inference Time = ", (interpreter_time_end - interpreter_time_start)*1000 , " ms" )

        # Fetch the prediction results from the interpreter
        y = []
        for output in output_details:
            x = interpreter.get_tensor(output['index'])
            #print(x)
            if (input_details[0]['dtype']==np.uint8) :
                scale, zero_point = output['quantization']
                x = (x.astype(np.float32) - zero_point) * scale  # re-scale
            y.append(x)
        y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
        y[0][..., :4] *= [width, height, width, height]

        # Post-processing (NMS and small-box filtering)
        pred = non_max_suppression(torch.from_numpy(y[0]), 0.2, 0.4, None, False, max_det=1000)
        pred[0] = pred[0][(pred[0][:, 0] - pred[0][:, 2]).abs() >= 10]  # W-Filter (drop boxes narrower than 10 px)
        pred[0] = pred[0][(pred[0][:, 1] - pred[0][:, 3]).abs() >= 10]  # H-Filter (drop boxes shorter than 10 px)

        # Build the output result
        for idx, det in enumerate(pred):
            annotator = Annotator(frame, line_width=3)
            if len(det):
                det[:, :4] = scale_boxes([width, height], det[:, :4], frame.shape).round()
                # Add bbox to image
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)  # integer class
                    label = Label_name(c)
                    annotator.box_label(xyxy, label, color=colors(c, True))
            # update show-out
            frame = annotator.result()

        # Display / save the output result
        if args.save == "True" or args.save == "1" :
            cv2.imwrite( "output/" + APP_NAME + "-" + args.test_img.split("/")[-1][:-4] +'_result.jpg', frame.astype("uint8"))
            print("Save Result Image Success , " + APP_NAME + "-" + args.test_img.split("/")[-1][:-4] + '_result.jpg')

        if args.display =="True" or args.display == "1" :
            cv2.imshow('frame', frame.astype('uint8'))
            if cv2.waitKey(1) & 0xFF == ord('q'): break

        if (args.display =="False" or args.display == "0") and( args.camera =="False" or args.camera == "0" ) : sys.exit()

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
(5) Run the app
$ python3 app.py -c 0 -d 1 --save 0
Note : To switch the input source to the camera, change -c 0 to -c 1 (and check whether the camera device really is /dev/video2; see the commands below).
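If the camera does not open, first check which /dev/video node the camera was enumerated as, then adjust the device= entry in the V4L2_YUV2_720p pipeline string inside app.py accordingly. The second command assumes the v4l-utils package is included in the Yocto image.
$ ls /dev/video*
$ v4l2-ctl --list-devices   # only if v4l-utils is installed in the image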
4. Conclusion
In recent years countless scholars, researchers, and companies have devoted themselves to object detection applications. Today a simple "YOLOv5 object detector" can be completed in just a few steps, and the model can be trained in only a few hours, a world of difference compared with the past. How to deploy such a model to each hardware platform has therefore become one of the key indicators for bringing a solution to market. This article uses the NXP i.MX 8M Plus as the edge computing device and leverages its NPU to accelerate the model, completing one object detection inference in roughly 22 milliseconds (ms). Combined with a video streaming pipeline, real-time performance can easily be achieved, allowing the edge device to perceive and react to the actual situation immediately. Readers interested in porting this technology can keep following the ATU team's blog series or contact the ATU team directly. Thank you!
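The inference time on your own board can be reproduced with the application's --time option; the hedged commands below (run from the YOLOv5s_DrownsyDectetion_v2 folder) compare the NPU path via the VX delegate against the CPU reference via XNNPACK.
$ python3 app.py -c 0 -d 0 --save 1 -t 1 --delegate vx       # NPU via the VX delegate
$ python3 app.py -c 0 -d 0 --save 1 -t 1 --delegate xnnpack  # CPU reference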
5. References
[1] i.MX 8 Series Applications Processors Multicore Arm® Cortex® Processors
[2] NXP Document - i.MX Yocto Project User's Guide
[3] Welcome to the Yocto Project Documentation
[4] NXP Document - i.MX Linux Release Note
[5] NXP Document - i.MX Machine Learning User's Guide
[6] Roboflow
[7] Ultralytics
If you have any questions about Machine Learning, feel free to leave a comment below this post!
More Machine Learning articles are on the way. Stay tuned for 【ATU Book - i.MX Series - ML】!