Thermal-Imaging-Based Inspection and Its AidLux Implementation

Author: 雪色与月色之间

Tags: computer vision, artificial intelligence

Article link: https://blog.csdn.net/weixin_52430401/article/details/130379023

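Summary (auto-generated by CSDN): The article describes converting a PyTorch-based RetinaNet object detection model to ONNX and TFLite so that it can process video in real time on mobile devices (for example, on the AidLux platform). The code examples show the image preprocessing steps: resizing, normalization, color space conversion, and layout conversion. The model runs on the video stream captured by the phone camera, detects text lines, and draws bounding boxes around the detected text.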

Main algorithm: the RetinaNet object detection network

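This solution first requires a model conversion step. The conversion path used is pt → onnx → tflite (the tflite format is what gets deployed to the mobile device).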

After conversion, the model is deployed to the AidLux platform for real-time video detection. Part of the code is shown below:

import cv2
import numpy as np
import torch


def process_img(img, target_size=640, max_size=2000, multiple=32, keep_ratio=True, NCHW=True, ToTensor=True):
    # Crop a fixed region of the frame, then resize it to 640x512 (width x height).
    img = img[128:512, 0:480]
    img = cv2.resize(img, (640, 512), interpolation=cv2.INTER_LINEAR)
    im_shape = img.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    # resize with keep_ratio
    if keep_ratio:
        # Scale the short side to target_size, but never let the long side exceed max_size.
        im_scale = float(target_size) / float(im_size_min)
        if np.round(im_scale * im_size_max) > max_size:
            im_scale = float(max_size) / float(im_size_max)
        # Snap both output sides down to a multiple of `multiple`.
        im_scale_x = np.floor(img.shape[1] * im_scale / multiple) * multiple / img.shape[1]
        im_scale_y = np.floor(img.shape[0] * im_scale / multiple) * multiple / img.shape[0]
        image_resized = cv2.resize(img, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=cv2.INTER_LINEAR)
        im_scales = np.array([im_scale_x, im_scale_y, im_scale_x, im_scale_y])
        # Convert BGR to RGB before normalizing, so that the RGB statistics below
        # are applied to the matching channels (the original normalized first).
        im = cv2.cvtColor(image_resized, cv2.COLOR_BGR2RGB)
        im = (im / 255.0).astype(np.float32)
        PIXEL_MEANS = (0.485, 0.456, 0.406)  # per-channel means, RGB order
        PIXEL_STDS = (0.229, 0.224, 0.225)   # per-channel standard deviations, RGB order
        im -= np.array(PIXEL_MEANS, dtype=np.float32)
        im /= np.array(PIXEL_STDS, dtype=np.float32)
        if NCHW:
            # PyTorch expects CHW; TensorFlow/TFLite uses NHWC, so pass NCHW=False there.
            im = np.transpose(im, (2, 0, 1)).astype(np.float32)
        im = im[np.newaxis, ...]  # add the batch dimension
        if ToTensor:
            im = torch.from_numpy(im)
        return im, im_scales
    else:
        # Resizing without keep_ratio is not implemented.
        return None

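This code preprocesses an input image so that the neural network can consume it. The image is first cropped and resized to a fixed size, then rescaled so that both sides are multiples of 32. The pixel values are scaled to [0, 1] and normalized with the RGB statistics (subtract the per-channel mean, divide by the per-channel standard deviation), and the channel order is converted from BGR to RGB. Finally, the layout can be changed to an NCHW tensor (N is the batch size, C the number of channels, H the height, W the width) and the result converted to a PyTorch Tensor; with NCHW=False and ToTensor=False the function returns an NHWC NumPy array instead. With keep_ratio=True the aspect ratio of the image is preserved, and the function returns the processed image together with the scale factors. With keep_ratio=False it returns None.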
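
To make the keep-ratio resize concrete, the scale computation can be isolated into a few lines of plain Python. This is only a sketch: the helper name `keep_ratio_scales` is hypothetical and not part of the original code, and it mirrors the arithmetic of `process_img` without touching any image data. For the 512x640 (height x width) image produced by the first resize, it yields the 640x800 input size that the deployment script assumes.

```python
import math


def keep_ratio_scales(height, width, target_size=640, max_size=2000, multiple=32):
    """Return (scale_x, scale_y, out_height, out_width) for a keep-ratio resize."""
    short_side = min(height, width)
    long_side = max(height, width)
    # Scale the short side to target_size ...
    scale = target_size / short_side
    # ... unless that would push the long side past max_size.
    if round(scale * long_side) > max_size:
        scale = max_size / long_side
    # Snap each output side down to a multiple of `multiple`.
    out_w = math.floor(width * scale / multiple) * multiple
    out_h = math.floor(height * scale / multiple) * multiple
    return out_w / width, out_h / height, out_h, out_w


scale_x, scale_y, out_h, out_w = keep_ratio_scales(512, 640)
print(scale_x, scale_y, out_h, out_w)  # 1.25 1.25 640 800
```

Because each side is snapped to a multiple of 32 independently, the two scale factors can differ slightly for other input sizes, which is why `process_img` returns them separately in `im_scales`.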

import aidlite_gpu
from cvs import *
# Anchors, decoder, sort_corners and rbox_2_quad come from the R-RetinaNet code base
# (their imports are not shown in the original article).

if __name__=="__main__":
    tflite_model = '/home/R-RetinaNet/models/r-retinanet.tflite'
    # Input and output buffer sizes in bytes
    in_shape = [1 * 640 * 800 * 3 * 4]  # 1 x 640 x 800 x 3 (NHWC), float32
    out_shape = [1 * 53325 * 8 * 4]  # 53325 anchors x 8 values per anchor, float32

    # Initialize AidLite
    aidlite = aidlite_gpu.aidlite()
    # Load the R-RetinaNet model
    res = aidlite.ANNModel(tflite_model, in_shape, out_shape, 4, -1) # Infer on -1: cpu, 0: gpu, 1: mixed, 2: dsp
    # print(res)

    # Read live frames from the phone camera
    cap = cvs.VideoCapture(0)
    frame_id = 0
    while True:
        frame = cap.read()
        if frame is None:
            continue
        # Run inference on every third frame only; skip the others before preprocessing
        frame_id += 1
        if frame_id % 3 != 0:
            continue
        im, im_scales = process_img(frame, NCHW=False, ToTensor=False)  # im: NHWC

        aidlite.setInput_Float32(im, 800, 640)
        # Inference
        aidlite.invoke()
        preds = aidlite.getOutput_Float32(0)
        preds = preds.reshape(1, 8, preds.shape[0] // 8)
        # Post-processing: (1, 8, N) -> (1, N, 8)
        output = np.transpose(preds, (0, 2, 1))

        # Create the anchors
        im_anchor = np.transpose(im, (0, 3, 1, 2)).astype(np.float32)
        anchors_list = []
        anchor_generator = Anchors(ratios = np.array([0.2, 0.5, 1, 2, 5]))
        original_anchors = anchor_generator(im_anchor)   # (bs, num_all_anchors, 5)
        anchors_list.append(original_anchors)

        # Decode the output: columns 5:8 are class scores, columns 0:5 are box regressions
        decode_output = decoder(im_anchor, anchors_list[-1], output[..., 5:8], output[..., 0:5], thresh=0.2, nms_thresh=0.2, test_conf=None)

        # Reassemble the output
        scores = decode_output[0].reshape(-1, 1)
        classes = decode_output[1].reshape(-1, 1)
        boxes = decode_output[2]
        boxes[:, :4] = boxes[:, :4] / im_scales
        if boxes.shape[1] > 5:
            boxes[:, 5:9] = boxes[:, 5:9] / im_scales
        dets = np.concatenate([classes, scores, boxes], axis=1)

        # Filter by class
        keep = np.where(classes > 0)[0]
        dets = dets[keep, :]

        # Convert coordinates ('xywha' -> 'xyxyxyxy')
        quads = sort_corners(rbox_2_quad(dets[:, 2:]))

        # Draw each quadrilateral with OpenCV, one edge at a time
        for k in range(dets.shape[0]):
            pts = [(int(quads[k, 2 * i]), int(quads[k, 2 * i + 1])) for i in range(4)]
            for i in range(4):
                cv2.line(frame, pts[i], pts[(i + 1) % 4], (0, 255, 0), 3)

        cvs.imshow(frame)

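This code detects text lines in real time in images from the phone camera. It first initializes AidLite and loads the R-RetinaNet model, then enters the camera read-and-process loop: each frame is preprocessed with `process_img`, the preprocessed image is fed to the model for inference, the model output is decoded, filtered, and converted to corner coordinates, and finally the text-line boxes are drawn on the original frame and displayed. OpenCV is used to draw the boxes, and the frame is displayed through `cvs.imshow`.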
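
The buffer sizes and the reshape in the script can be sanity-checked with a short NumPy sketch. The function names `count_anchors` and `split_head_output` are hypothetical, and the pyramid strides (8 to 128), the ceil-rounded feature-map sizes, and the five anchors per cell (one per entry in the script's `ratios` array) are assumptions; the article does not state them, although together they reproduce the 53325 anchors used in `out_shape`. The 5 + 3 column split follows the slices `output[..., 0:5]` and `output[..., 5:8]` in the script.

```python
import math

import numpy as np


def count_anchors(height, width, strides=(8, 16, 32, 64, 128), anchors_per_cell=5):
    """Total anchor count over a feature pyramid (strides are an assumption)."""
    cells = sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)
    return cells * anchors_per_cell


def split_head_output(flat, num_values=8, num_box_params=5):
    """Reshape a flat (num_values * N,) buffer to (1, N, num_values) and split it."""
    preds = flat.reshape(1, num_values, flat.shape[0] // num_values)
    output = np.transpose(preds, (0, 2, 1))       # (1, N, num_values)
    box_deltas = output[..., :num_box_params]     # (1, N, 5) box regressions
    class_scores = output[..., num_box_params:]   # (1, N, 3) class scores
    return box_deltas, class_scores


num_anchors = count_anchors(640, 800)
print(num_anchors)  # 53325

# Stand-in for the flat float32 buffer returned by the inference engine.
flat = np.zeros(num_anchors * 8, dtype=np.float32)
box_deltas, class_scores = split_head_output(flat)
print(box_deltas.shape, class_scores.shape)  # (1, 53325, 5) (1, 53325, 3)
print(flat.nbytes)  # 1706400, i.e. 1 * 53325 * 8 * 4 bytes
```

If the input resolution changes, both the anchor count and the byte sizes passed to the model loader change with it, so the hard-coded values in the script only fit a 640x800 input.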

A video of the implementation is shown below: