YOLOv8-OBB Inference Explained and Deployment Implementation

爱听歌的周童鞋

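Category: Model Deployment. Tags: YOLOv8-OBB, high performance, CUDA, TensorRT, rotated object detection

Original articles on this blog may not be used for commercial purposes without the author's permission. Please credit the source when reposting; otherwise the author reserves the right to pursue legal action.

Article link: https://blog.csdn.net/qq_40672115/article/details/135713830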

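Preface

This post walks through the preprocessing and postprocessing of YOLOv8-OBB and, along the way, makes tensorRT_Pro support YOLOv8.

Note: to avoid unnecessary errors, everything below is demonstrated on a fixed YOLOv8 release, v8.1.0.

Reference: https://github.com/shouxieai/tensorRT_Pro

Implementation: https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8

I. YOLOv8-OBB Inference (Python)

1. YOLOv8-OBB Prediction

First, let's run the official pretrained weights on one image and save the result, to check that everything works.

Create a prediction script predict-obb.py in the YOLOv8 root directory with the following content: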

import cv2
import torch
import numpy as np
from ultralytics import YOLO

def xywhr2xyxyxyxy(center):
    # reference: https://github.com/ultralytics/ultralytics/blob/v8.1.0/ultralytics/utils/ops.py#L545
    is_numpy = isinstance(center, np.ndarray)
    cos, sin = (np.cos, np.sin) if is_numpy else (torch.cos, torch.sin)

    ctr = center[..., :2]
    w, h, angle = (center[..., i : i + 1] for i in range(2, 5))
    cos_value, sin_value = cos(angle), sin(angle)
    vec1 = [w / 2 * cos_value, w / 2 * sin_value]
    vec2 = [-h / 2 * sin_value, h / 2 * cos_value]
    vec1 = np.concatenate(vec1, axis=-1) if is_numpy else torch.cat(vec1, dim=-1)
    vec2 = np.concatenate(vec2, axis=-1) if is_numpy else torch.cat(vec2, dim=-1)
    pt1 = ctr + vec1 + vec2
    pt2 = ctr + vec1 - vec2
    pt3 = ctr - vec1 - vec2
    pt4 = ctr - vec1 + vec2
    return np.stack([pt1, pt2, pt3, pt4], axis=-2) if is_numpy else torch.stack([pt1, pt2, pt3, pt4], dim=-2)

def hsv2bgr(h, s, v):
    h_i = int(h * 6)
    f = h * 6 - h_i
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)
    
    r, g, b = 0, 0, 0

    if h_i == 0:
        r, g, b = v, t, p
    elif h_i == 1:
        r, g, b = q, v, p
    elif h_i == 2:
        r, g, b = p, v, t
    elif h_i == 3:
        r, g, b = p, q, v
    elif h_i == 4:
        r, g, b = t, p, v
    elif h_i == 5:
        r, g, b = v, p, q

    return int(b * 255), int(g * 255), int(r * 255)

def random_color(id):
    h_plane = (((id << 2) ^ 0x937151) % 100) / 100.0
    s_plane = (((id << 3) ^ 0x315793) % 100) / 100.0
    return hsv2bgr(h_plane, s_plane, 1)

if __name__ == "__main__":

    model = YOLO("yolov8s-obb.pt")

    img = cv2.imread("P0032.jpg")
    results = model(img)[0]
    names   = results.names
    boxes   = results.obb.data.cpu()
    confs   = boxes[..., 5].tolist()
    classes = list(map(int, boxes[..., 6].tolist()))
    boxes   = xywhr2xyxyxyxy(boxes[..., :5])
    
    for i, box in enumerate(boxes):
        confidence = confs[i]
        label = classes[i]
        color = random_color(label)
        cv2.polylines(img, [np.asarray(box, dtype=int)], True, color, 2)
        caption = f"{names[label]} {confidence:.2f}"
        w, h = cv2.getTextSize(caption, 0 ,1, 2)[0]
        left, top = [int(b) for b in box[0]]
        cv2.rectangle(img, (left - 3, top - 33), (left + w + 10, top), color, -1)
        cv2.putText(img, caption, (left, top - 5), 0, 1, (0, 0, 0), 2, 16)

    cv2.imwrite("predict-obb.jpg", img)
    print("save done")

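In the code above we read an image with OpenCV and feed it to the model to obtain the output results. results holds the outputs of the different tasks; since this is a rotated object detection task, we only need the rotated boxes, boxes (results.obb.data), where each row is [cx, cy, w, h, angle, confidence, class].

With boxes in hand, we convert each box to its four corner points with xywhr2xyxyxyxy, then draw the rotated box together with the predicted class and confidence on the image and save it.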
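To make the corner conversion concrete, here is a minimal pure-Python sketch of the same arithmetic as xywhr2xyxyxyxy for a single box. The function name xywhr_to_corners is hypothetical and only used for illustration; the angle is assumed to be in radians.

```python
import math

def xywhr_to_corners(cx, cy, w, h, angle):
    """Return the 4 corners of a rotated box (hypothetical helper, angle in radians)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Half-extent vectors along the box's width axis and height axis
    v1 = (w / 2 * cos_a, w / 2 * sin_a)
    v2 = (-h / 2 * sin_a, h / 2 * cos_a)
    return [
        (cx + v1[0] + v2[0], cy + v1[1] + v2[1]),
        (cx + v1[0] - v2[0], cy + v1[1] - v2[1]),
        (cx - v1[0] - v2[0], cy - v1[1] - v2[1]),
        (cx - v1[0] + v2[0], cy - v1[1] + v2[1]),
    ]

# A 4x2 box centered at the origin with no rotation
corners = xywhr_to_corners(0.0, 0.0, 4.0, 2.0, 0.0)
print(corners)

# The same box rotated by 90 degrees: width now runs along the y axis
rotated = xywhr_to_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
print(rotated)
```

With angle 0 the result is simply the axis-aligned rectangle; rotating by pi/2 swaps the roles of width and height.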

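The visualization code follows the implementation in tensorRT_Pro; see app_yolo.cpp#L95

The random color code also follows tensorRT_Pro; see ilogger.cpp#L90

The result image saved after inference is shown below:

[Figure: YOLOv8-OBB prediction result]

2. YOLOv8-OBB Preprocessing

Now that the model predicts correctly, we write the YOLOv8-OBB preprocessing and postprocessing ourselves, which makes the later C++ implementation easier. Let's start with preprocessing.

Stepping through the code in a debugger shows that YOLOv8-OBB preprocessing lives in ultralytics/engine/predictor.py; see predictor.py#L113

The code is as follows: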

def preprocess(self, im):
    """
    Prepares input image before inference.

    Args:
        im (torch.Tensor | List(np.ndarray)): BCHW for tensor, [(HWC) x B] for list.
    """
    not_tensor = not isinstance(im, torch.Tensor)
    if not_tensor:
        im = np.stack(self.pre_transform(im))
        im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
        im = np.ascontiguousarray(im)  # contiguous
        im = torch.from_numpy(im)

    im = im.to(self.device)
    im = im.half() if self.model.fp16 else im.float()  # uint8 to fp16/32
    if not_tensor:
        im /= 255  # 0 - 255 to 0.0 - 1.0
    return im

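It consists of the following steps:

self.pre_transform: letterbox, i.e. resize and pad with gray borders
im[..., ::-1]: BGR → RGB
transpose((0, 3, 1, 2)): BHWC → BCHW (the batch dimension was already added by np.stack)
torch.from_numpy: to Tensor
im /= 255: divide by 255 to normalize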
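The layout changes in these steps can be traced on a tiny image with plain nested lists. This is only an illustrative sketch: the helper name preprocess_pixels is hypothetical, and the letterbox step is deliberately left out.

```python
def preprocess_pixels(image_hwc_bgr):
    """image_hwc_bgr: nested list [H][W][3] of 0-255 values in BGR order."""
    height, width = len(image_hwc_bgr), len(image_hwc_bgr[0])
    # One output plane per channel, in RGB order (CHW layout)
    chw = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y in range(height):
        for x in range(width):
            b, g, r = image_hwc_bgr[y][x]
            for c, value in enumerate((r, g, b)):  # BGR -> RGB
                chw[c][y][x] = value / 255.0       # 0-255 -> 0.0-1.0
    return [chw]  # add the batch dimension: [1][3][H][W]

tiny = [[[255, 0, 0], [0, 51, 102]]]  # 1x2 image, pixels given as (B, G, R)
batch = preprocess_pixels(tiny)
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))  # 1 3 1 2
```

The first pixel is pure blue in BGR order, so after the conversion its value 1.0 ends up in the last (B) plane of the RGB-ordered output.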

If you are familiar with YOLOv5 preprocessing, you will notice that YOLOv8-OBB preprocessing is identical, so the corresponding code is easy to write:

def preprocess_warpAffine(image, dst_width=1024, dst_height=1024):
    scale = min((dst_width / image.shape[1], dst_height / image.shape[0]))
    ox = (dst_width  - scale * image.shape[1]) / 2
    oy = (dst_height - scale * image.shape[0]) / 2
    M = np.array([
        [scale, 0, ox],
        [0, scale, oy]
    ], dtype=np.float32)
    
    img_pre = cv2.warpAffine(image, M, (dst_width, dst_height), flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_CONSTANT, borderValue=(114, 114, 114))
    IM = cv2.invertAffineTransform(M)
    img_pre = (img_pre[...,::-1] / 255.0).astype(np.float32)
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre, IM

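The letterbox step (adding gray borders) can be implemented with the affine transform warpAffine, which is well suited to CUDA acceleration. For details on warpAffine, see the post YOLOv5推理详解及预处理高性能实现; they are not repeated here. The remaining steps are the same as the official ones.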
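As a rough sketch of what M and IM contain, the snippet below builds the same scale-plus-offset matrix as preprocess_warpAffine and writes its inverse in closed form instead of calling cv2.invertAffineTransform. The helper names letterbox_affine and apply_affine are hypothetical.

```python
def letterbox_affine(src_w, src_h, dst_w=1024, dst_h=1024):
    """Build the 2x3 matrix M (source -> network input) and its inverse IM."""
    scale = min(dst_w / src_w, dst_h / src_h)
    ox = (dst_w - scale * src_w) / 2   # horizontal gray border
    oy = (dst_h - scale * src_h) / 2   # vertical gray border
    M = [[scale, 0.0, ox], [0.0, scale, oy]]
    # Inverse of a uniform scale followed by a translation
    IM = [[1 / scale, 0.0, -ox / scale], [0.0, 1 / scale, -oy / scale]]
    return M, IM

def apply_affine(A, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    return A[0][0] * x + A[0][1] * y + A[0][2], A[1][0] * x + A[1][1] * y + A[1][2]

M, IM = letterbox_affine(2425, 1689)   # width 2425, height 1689
u, v = apply_affine(M, 1212.5, 844.5)  # source image center
x, y = apply_affine(IM, u, v)          # map back to the source image
print(u, v, x, y)
```

The source center lands on the center of the 1024x1024 canvas, and IM maps it back. This is the same mapping postprocessing relies on when it moves predicted boxes back to the original image.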

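Note that letterbox first scales the long side to 1024 and scales the short side proportionally, then pads the short side up to the next multiple of 32. warpAffine instead fixes the resolution at 1024x1024 and fills the rest with gray. The author compared the results of the two preprocessing methods on a 1689x2425 (height x width) image, as shown below:

Figure 1-1: image after LetterBox preprocessing

Figure 1-2: image after warpAffine preprocessing

The difference is clear. With letterbox only a small part is gray: once the long side is scaled to 1024, the short side scales to 713 and is then padded up to the next multiple of 32, giving 736. warpAffine fixes the resolution at 1024x1024, so the entire remainder along the short side is filled with gray.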
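The 713 → 736 arithmetic can be reproduced in a few lines. This is a sketch of the sizing rule as described above (scale the long side to 1024, pad up to a multiple of 32), not a copy of the LetterBox class; the function name letterbox_shape is hypothetical.

```python
def letterbox_shape(src_w, src_h, new_size=1024, stride=32):
    """Return (width, height) after letterbox with minimal stride padding."""
    r = min(new_size / src_w, new_size / src_h)
    unpad_w, unpad_h = round(src_w * r), round(src_h * r)
    # Only pad up to the next multiple of the stride, not to the full square
    pad_w = (new_size - unpad_w) % stride
    pad_h = (new_size - unpad_h) % stride
    return unpad_w + pad_w, unpad_h + pad_h

print(letterbox_shape(2425, 1689))  # (1024, 736)
```

For the 2425-wide, 1689-high image the short side scales to 713, and 23 rows of padding bring it to 736.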

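Fixing the resolution at 1024x1024 in the warpAffine preprocessing is motivated by the following points (from ChatGPT):

Simpler processing logic: all preprocessed images share the same resolution, which simplifies the parallel processing logic in CUDA and makes the code easier to write and maintain.
Better memory access: on a GPU, contiguous memory access patterns are usually more efficient than non-contiguous ones. If all images have the same size and layout, memory access is easier to optimize and processing is faster.
No dynamic memory allocation: dynamic allocation and deallocation are expensive, especially on a GPU. A fixed resolution means enough memory can be allocated once up front instead of being resized for every image.

The two preprocessing methods produce network inputs of different shapes: letterbox gives torch.Size([1, 3, 736, 1024]) and warpAffine gives torch.Size([1, 3, 1024, 1024]). Different input shapes lead to different output shapes: the letterbox output is torch.Size([1, 20, 15456]), only 15456 boxes, while the warpAffine output is torch.Size([1, 20, 21504]), 21504 boxes. Keep this difference in mind.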
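Both box counts follow from the three detection scales. A small sketch, assuming the usual YOLOv8 strides of 8, 16 and 32 (the function name num_predictions is hypothetical):

```python
def num_predictions(height, width, strides=(8, 16, 32)):
    """Number of predicted boxes: one per cell of each feature map."""
    return sum((height // s) * (width // s) for s in strides)

print(num_predictions(736, 1024))   # 15456
print(num_predictions(1024, 1024))  # 21504
```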

3. YOLOv8-OBB Postprocessing

Next, the postprocessing.

Debugging shows that YOLOv8-OBB postprocessing lives in ultralytics/models/yolo/obb/predict.py; see obb/predict.py#L10

class OBBPredictor(DetectionPredictor):
    """
    A class extending the DetectionPredictor class for prediction based on an Oriented Bounding Box (OBB) model.

    Example:
        ```python
        from ultralytics.utils import ASSETS
        from ultralytics.models.yolo.obb import OBBPredictor

        args = dict(model='yolov8n-obb.pt', source=ASSETS)
        predictor = OBBPredictor(overrides=args)
        predictor.predict_cli()
        ```
    """

    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        """Initializes OBBPredictor with optional model and data configuration overrides."""
        super().__init__(cfg, overrides, _callbacks)
        self.args.task = "obb"

    def postprocess(self, preds, img, orig_imgs):
        """Post-processes predictions and returns a list of Results objects."""
        preds = ops.non_max_suppression(
            preds,
            self.args.conf,
            self.args.iou,
            agnostic=self.args.agnostic_nms,
            max_det=self.args.max_det,
            nc=len(self.model.names),
            classes=self.args.classes,
            rotated=True,
        )

        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

        results = []
        for i, (pred, orig_img, img_path) in enumerate(zip(preds, orig_imgs, self.batch[0])):
            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape, xywh=True)
            # xywh, r, conf, cls
            obb = torch.cat([pred[:, :4], pred[:, -1:], pred[:, 4:6]], dim=-1)
            results.append(Results(orig_img, path=img_path, names=self.model.names, obb=obb))
        return results

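It consists of the following steps:

ops.non_max_suppression: non-maximum suppression (NMS), here called with rotated=True
ops.scale_boxes: box decoding (decode boxes), i.e. mapping boxes from the network input size back to the original image

If you are familiar with YOLOv5 postprocessing, you will find YOLOv8-OBB postprocessing largely similar. Only "largely", because YOLOv8-OBB works with rotated boxes, so the IoU computation and the box decoding differ slightly. The corresponding postprocessing code is therefore easy to write: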

def probiou(obb1, obb2, eps=1e-7):
    # Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    def covariance_matrix(obb):
        # Extract elements
        w, h, r = obb[2:5]
        a = (w ** 2) / 12
        b = (h ** 2) / 12

        cos_r = torch.cos(torch.tensor(r))
        sin_r = torch.sin(torch.tensor(r))
        
        # Calculate covariance matrix elements
        a_val = a * cos_r ** 2 + b * sin_r ** 2
        b_val = a * sin_r ** 2 + b * cos_r ** 2
        c_val = (a - b) * sin_r * cos_r

        return a_val, b_val, c_val

    a1, b1, c1 = covariance_matrix(obb1)
    a2, b2, c2 = covariance_matrix(obb2)

    x1, y1 = obb1[:2]
    x2, y2 = obb2[:2]

    t1 = ((a1 + a2) * ((y1 - y2) ** 2) + (b1 + b2) * ((x1 - x2) ** 2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t3 = torch.log(((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2) / (4 * torch.sqrt(a1 * b1 - c1 ** 2) * torch.sqrt(a2 * b2 - c2 ** 2) + eps) + eps)

    bd = 0.25 * t1 + 0.5 * t2 + 0.5 * t3
    hd = torch.sqrt(1.0 - torch.exp(-torch.clamp(bd, eps, 100.0)) + eps)
    return 1 - hd

def NMS(boxes, iou_thres):

    remove_flags = [False] * len(boxes)

    keep_boxes = []
    for i, ibox in enumerate(boxes):
        if remove_flags[i]:
            continue

        keep_boxes.append(ibox)
        for j in range(i + 1, len(boxes)):
            if remove_flags[j]:
                continue

            jbox = boxes[j]
            if(ibox[6] != jbox[6]):
                continue
            if probiou(ibox, jbox) > iou_thres:
                remove_flags[j] = True
    return keep_boxes

def postprocess(pred, IM=[], conf_thres=0.25, iou_thres=0.45):

    # Input: the raw model output, i.e. 21504 predicted boxes
    # 1,21504,20 [cx,cy,w,h,class*15,rotated]
    boxes = []
    for item in pred[0]:
        cx, cy, w, h = item[:4]
        angle = item[-1]
        label = item[4:-1].argmax()
        confidence = item[4 + label]
        if confidence < conf_thres:
            continue
        boxes.append([cx, cy, w, h, angle, confidence, label])

    boxes = np.array(boxes)
    cx = boxes[:, 0]
    cy = boxes[:, 1]
    wh = boxes[:, 2:4]
    boxes[:, 0] = IM[0][0] * cx + IM[0][2]
    boxes[:, 1] = IM[1][1] * cy + IM[1][2]
    boxes[:, 2:4] = IM[0][0] * wh
    boxes = sorted(boxes.tolist(), key=lambda x:x[5], reverse=True)
    
    return NMS(boxes, iou_thres)

Box decoding is done with the inverse affine matrix IM; for details on IM see the post YOLOv5推理详解及预处理高性能实现. The NMS code follows the implementation in tensorRT_Pro: yolo.cpp#L119

Note that for the IoU computation, YOLOv8 officially uses ProbIoU to measure the similarity of two rotated boxes. For details see the paper: Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection

For a 1024x1024 image, YOLOv8-OBB produces 21504 predicted boxes in total, each with 20 values (for the 15 classes of the DOTAv1 dataset):
$\begin{aligned} 21504\times20&=128\times128\times20+64\times64\times20+32\times32\times20\\ &=128\times128\times(4+15+1)+64\times64\times(4+15+1)+32\times32\times(4+15+1) \end{aligned}$
Here the 4 corresponds to cx, cy, w, h, the center coordinates and the width and height of the box; the 15 corresponds to the confidences of the 15 DOTAv1 classes; the 1 corresponds to the rotation angle of the box, whose value lies in [-pi/4, 3pi/4].
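To make the 4 + 15 + 1 layout concrete, the sketch below decodes one 20-value row the same way the postprocess loop does. The row values and the helper name parse_row are made up for illustration.

```python
import math

NUM_CLASSES = 15  # DOTAv1

def parse_row(row, conf_thres=0.25):
    """Split one prediction row [cx, cy, w, h, 15 class scores, angle]."""
    assert len(row) == 4 + NUM_CLASSES + 1
    cx, cy, w, h = row[:4]
    scores = row[4:4 + NUM_CLASSES]
    angle = row[-1]  # the angle is the last element
    label = max(range(NUM_CLASSES), key=lambda i: scores[i])
    confidence = scores[label]
    if confidence < conf_thres:
        return None  # filtered out before NMS
    return [cx, cy, w, h, angle, confidence, label]

row = [512.0, 300.0, 80.0, 20.0] + [0.01] * NUM_CLASSES + [math.pi / 6]
row[4 + 3] = 0.9  # hypothetical score for class index 3
box = parse_row(row)
print(box)
```

The returned order [cx, cy, w, h, angle, confidence, label] matches the box layout that NMS consumes in the code above.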

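4. YOLOv8-OBB Inference

With the preprocessing and postprocessing analyzed, the whole inference pipeline becomes clear. YOLOv8-OBB inference has three parts: image preprocessing, model inference, and postprocessing of the predictions. Preprocessing is mainly the warpAffine transform; postprocessing is mainly decode and NMS.

The complete inference code is as follows: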

import cv2
import torch
import numpy as np
from ultralytics.data.augment import LetterBox
from ultralytics.nn.autobackend import AutoBackend

def preprocess_letterbox(image):
    letterbox = LetterBox(new_shape=1024, stride=32, auto=True)
    image = letterbox(image=image)
    image = (image[..., ::-1] / 255.0).astype(np.float32) # BGR to RGB, 0 - 255 to 0.0 - 1.0
    image = image.transpose(2, 0, 1)[None]  # HWC to CHW, then add batch dim: (1, 3, h, w)
    image = torch.from_numpy(image)
    return image

def preprocess_warpAffine(image, dst_width=1024, dst_height=1024):
    scale = min((dst_width / image.shape[1], dst_height / image.shape[0]))
    ox = (dst_width  - scale * image.shape[1]) / 2
    oy = (dst_height - scale * image.shape[0]) / 2
    M = np.array([
        [scale, 0, ox],
        [0, scale, oy]
    ], dtype=np.float32)
    img_pre = cv2.warpAffine(image, M, (dst_width, dst_height), flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_CONSTANT, borderValue=(114, 114, 114))
    IM = cv2.invertAffineTransform(M)
    img_pre = (img_pre[...,::-1] / 255.0).astype(np.float32)
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre, IM

def xywhr2xyxyxyxy(center):
    # reference: https://github.com/ultralytics/ultralytics/blob/v8.1.0/ultralytics/utils/ops.py#L545
    is_numpy = isinstance(center, np.ndarray)
    cos, sin = (np.cos, np.sin) if is_numpy else (torch.cos, torch.sin)

    ctr = center[..., :2]
    w, h, angle = (center[..., i : i + 1] for i in range(2, 5))
    cos_value, sin_value = cos(angle), sin(angle)
    vec1 = [w / 2 * cos_value, w / 2 * sin_value]
    vec2 = [-h / 2 * sin_value, h / 2 * cos_value]
    vec1 = np.concatenate(vec1, axis=-1) if is_numpy else torch.cat(vec1, dim=-1)
    vec2 = np.concatenate(vec2, axis=-1) if is_numpy else torch.cat(vec2, dim=-1)
    pt1 = ctr + vec1 + vec2
    pt2 = ctr + vec1 - vec2
    pt3 = ctr - vec1 - vec2
    pt4 = ctr - vec1 + vec2
    return np.stack([pt1, pt2, pt3, pt4], axis=-2) if is_numpy else torch.stack([pt1, pt2, pt3, pt4], dim=-2)

def probiou(obb1, obb2, eps=1e-7):
    # Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    def covariance_matrix(obb):
        # Extract elements
        w, h, r = obb[2:5]
        a = (w ** 2) / 12
        b = (h ** 2) / 12

        cos_r = torch.cos(torch.tensor(r))
        sin_r = torch.sin(torch.tensor(r))
        
        # Calculate covariance matrix elements
        a_val = a * cos_r ** 2 + b * sin_r ** 2
        b_val = a * sin_r ** 2 + b * cos_r ** 2
        c_val = (a - b) * sin_r * cos_r

        return a_val, b_val, c_val

    a1, b1, c1 = covariance_matrix(obb1)
    a2, b2, c2 = covariance_matrix(obb2)

    x1, y1 = obb1[:2]
    x2, y2 = obb2[:2]

    t1 = ((a1 + a2) * ((y1 - y2) ** 2) + (b1 + b2) * ((x1 - x2) ** 2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t3 = torch.log(((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2) / (4 * torch.sqrt(a1 * b1 - c1 ** 2) * torch.sqrt(a2 * b2 - c2 ** 2) + eps) + eps)

    bd = 0.25 * t1 + 0.5 * t2 + 0.5 * t3
    hd = torch.sqrt(1.0 - torch.exp(-torch.clamp(bd, eps, 100.0)) + eps)
    return 1 - hd

def NMS(boxes, iou_thres):

    remove_flags = [False] * len(boxes)

    keep_boxes = []
    for i, ibox in enumerate(boxes):
        if remove_flags[i]:
            continue

        keep_boxes.append(ibox)
        for j in range(i + 1, len(boxes)):
            if remove_flags[j]:
                continue

            jbox = boxes[j]
            if(ibox[6] != jbox[6]):
                continue
            if probiou(ibox, jbox) > iou_thres:
                remove_flags[j] = True
    return keep_boxes

def postprocess(pred, IM=[], conf_thres=0.25, iou_thres=0.45):

    # Input: the raw model output, i.e. 21504 predicted boxes
    # 1,21504,20 [cx,cy,w,h,class*15,rotated]
    boxes = []
    for item in pred[0]:
        cx, cy, w, h = item[:4]
        angle = item[-1]
        label = item[4:-1].argmax()
        confidence = item[4 + label]
        if confidence < conf_thres:
            continue
        boxes.append([cx, cy, w, h, angle, confidence, label])

    boxes = np.array(boxes)
    cx = boxes[:, 0]
    cy = boxes[:, 1]
    wh = boxes[:, 2:4]
    boxes[:, 0] = IM[0][0] * cx + IM[0][2]
    boxes[:, 1] = IM[1][1] * cy + IM[1][2]
    boxes[:, 2:4] = IM[0][0] * wh
    boxes = sorted(boxes.tolist(), key=lambda x:x[5], reverse=True)
    
    return NMS(boxes, iou_thres)

def hsv2bgr(h, s, v):
    h_i = int(h * 6)
    f = h * 6 - h_i
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)
    
    r, g, b = 0, 0, 0

    if h_i == 0:
        r, g, b = v, t, p
    elif h_i == 1:
        r, g, b = q, v, p
    elif h_i == 2:
        r, g, b = p, v, t
    elif h_i == 3:
        r, g, b = p, q, v
    elif h_i == 4:
        r, g, b = t, p, v
    elif h_i == 5:
        r, g, b = v, p, q

    return int(b * 255), int(g * 255), int(r * 255)

def random_color(id):
    h_plane = (((id << 2) ^ 0x937151) % 100) / 100.0
    s_plane = (((id << 3) ^ 0x315793) % 100) / 100.0
    return hsv2bgr(h_plane, s_plane, 1)

if __name__ == "__main__":

    img = cv2.imread("P0032.jpg")

    # img_pre = preprocess_letterbox(img)
    img_pre, IM = preprocess_warpAffine(img)
    model  = AutoBackend(weights="yolov8s-obb.pt")
    names  = model.names
    result = model(img_pre)[0].transpose(-1, -2)  # 1,21504,20

    boxes   = postprocess(result, IM)
    confs   = [box[5] for box in boxes]
    classes = [int(box[6]) for box in boxes]
    boxes   = xywhr2xyxyxyxy(np.array(boxes)[..., :5])

    for i, box in enumerate(boxes):
        confidence = confs[i]
        label = classes[i]
        color = random_color(label)
        cv2.polylines(img, [np.asarray(box, dtype=int)], True, color, 2)
        caption = f"{names[label]} {confidence:.2f}"
        w, h = cv2.getTextSize(caption, 0 ,1, 2)[0]
        left, top = [int(b) for b in box[0]]
        cv2.rectangle(img, (left - 3, top - 33), (left + w + 10, top), color, -1)
        cv2.putText(img, caption, (left, top - 5), 0, 1, (0, 0, 0), 2, 16)
    
    cv2.imwrite("infer-obb.jpg", img)
    print("save done")

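The inference result is shown below:

[Figure: inference result of the hand-written pipeline]

This completes the whole YOLOv8-OBB inference pipeline in Python. Next we implement it in C++.

II. YOLOv8-OBB Inference (C++)

For the C++ implementation we again use the tensorRT_Pro repo, and now complete YOLOv8-OBB inference in C++ on top of it.

1. ONNX Export

First we need to export the YOLOv8-OBB model to ONNX. To fit tensorRT_Pro, a few modifications are needed:

Rename the output node to output, and make only the batch dimension of the input and output dynamic, with fixed width and height
Add a transpose node that swaps the 2nd and 3rd dimensions of the output

The specific changes are:

1. One change in ultralytics/engine/exporter.py

Line 353: rename the output node to output
Line 356: make only the batch dimension of the input dynamic, not width and height
Line 361: make only the batch dimension of the output dynamic, not width and height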

# ========== exporter.py ==========

# ultralytics/engine/exporter.py, line 353
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
#         dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
# Change to:

output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)

2. One change in ultralytics/nn/modules/head.py

Line 141: add a transpose node that swaps the 2nd and 3rd dimensions of the output

# ========== head.py ==========

# ultralytics/nn/modules/head.py, line 141, forward function
# return torch.cat([x, angle], 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# Change to:

return torch.cat([x, angle], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))

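Those are the code changes needed to fit tensorRT_Pro. After making them, put the pretrained weights yolov8s-obb.pt in the ultralytics-main root directory and create an export script export.py with the following content: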

from ultralytics import YOLO

model = YOLO("yolov8s-obb.pt")

success = model.export(format="onnx", dynamic=True, simplify=True)

Run the following command in a terminal to export the ONNX model:

python export.py

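The export process is shown below:

[Figure: ONNX export log]

The exported PyTorch model has input shape 1x3x1024x1024 and output shape 1x21504x20, as expected.

After a successful export, yolov8s-obb.onnx is generated in the current directory. We can inspect it with the Netron visualization tool, as shown below:

[Figure: the ONNX model viewed in Netron]

The input node is named images with shape batchx3x1024x1024, so only the batch dimension is dynamic. The output node is named output with shape batchxTransposeoutput_dim_1xTransposeoutput_dim_2, again with only the batch dimension dynamic, which matches the format tensorRT_Pro expects.

Do not take Transposeoutput_dim_1 and Transposeoutput_dim_2 to mean that these dimensions are dynamic as well. The output shape is determined by the input shape and the model structure, and these extra names stand for the output dimensions of an operation in the graph, here the Transpose (dimension permutation), rather than dimensions that were declared dynamic. Since the input width and height are fixed, they are normally static and do not change at inference time (here 21504 and 20).

2. YOLOv8-OBB Preprocessing

As mentioned earlier, YOLOv8-OBB preprocessing is identical to YOLOv5, so in tensorRT_Pro the YOLOv8-OBB model can reuse the YOLOv5 preprocessing directly.

The preprocessing code in tensorRT_Pro is as follows: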

__global__ void warp_affine_bilinear_and_normalize_plane_kernel(uint8_t* src, int src_line_size, int src_width, int src_height, float* dst, int dst_width, int dst_height, 
	uint8_t const_value_st, float* warp_affine_matrix_2_3, Norm norm, int edge){

	int position = blockDim.x * blockIdx.x + threadIdx.x;
	if (position >= edge) return;

	float m_x1 = warp_affine_matrix_2_3[0];
	float m_y1 = warp_affine_matrix_2_3[1];
	float m_z1 = warp_affine_matrix_2_3[2];
	float m_x2 = warp_affine_matrix_2_3[3];
	float m_y2 = warp_affine_matrix_2_3[4];
	float m_z2 = warp_affine_matrix_2_3[5];

	int dx      = position % dst_width;
	int dy      = position / dst_width;
	float src_x = m_x1 * dx + m_y1 * dy + m_z1;
	float src_y = m_x2 * dx + m_y2 * dy + m_z2;
	float c0, c1, c2;

	if(src_x <= -1 || src_x >= src_width || src_y <= -1 || src_y >= src_height){
		// out of range
		c0 = const_value_st;
		c1 = const_value_st;
		c2 = const_value_st;
	}else{
		int y_low = floorf(src_y);
		int x_low = floorf(src_x);
		int y_high = y_low + 1;
		int x_high = x_low + 1;

		uint8_t const_value[] = {const_value_st, const_value_st, const_value_st};
		float ly    = src_y - y_low;
		float lx    = src_x - x_low;
		float hy    = 1 - ly;
		float hx    = 1 - lx;
		float w1    = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
		uint8_t* v1 = const_value;
		uint8_t* v2 = const_value;
		uint8_t* v3 = const_value;
		uint8_t* v4 = const_value;
		if(y_low >= 0){
			if (x_low >= 0)
				v1 = src + y_low * src_line_size + x_low * 3;

			if (x_high < src_width)
				v2 = src + y_low * src_line_size + x_high * 3;
		}
		
		if(y_high < src_height){
			if (x_low >= 0)
				v3 = src + y_high * src_line_size + x_low * 3;

			if (x_high < src_width)
				v4 = src + y_high * src_line_size + x_high * 3;
		}
		
		// same to opencv
		c0 = floorf(w1 * v1[0] + w2 * v2[0] + w3 * v3[0] + w4 * v4[0] + 0.5f);
		c1 = floorf(w1 * v1[1] + w2 * v2[1] + w3 * v3[1] + w4 * v4[1] + 0.5f);
		c2 = floorf(w1 * v1[2] + w2 * v2[2] + w3 * v3[2] + w4 * v4[2] + 0.5f);
	}

	if(norm.channel_type == ChannelType::Invert){
		float t = c2;
		c2 = c0;  c0 = t;
	}

	if(norm.type == NormType::MeanStd){
		c0 = (c0 * norm.alpha - norm.mean[0]) / norm.std[0];
		c1 = (c1 * norm.alpha - norm.mean[1]) / norm.std[1];
		c2 = (c2 * norm.alpha - norm.mean[2]) / norm.std[2];
	}else if(norm.type == NormType::AlphaBeta){
		c0 = c0 * norm.alpha + norm.beta;
		c1 = c1 * norm.alpha + norm.beta;
		c2 = c2 * norm.alpha + norm.beta;
	}

	int area = dst_width * dst_height;
	float* pdst_c0 = dst + dy * dst_width + dx;
	float* pdst_c1 = pdst_c0 + area;
	float* pdst_c2 = pdst_c1 + area;
	*pdst_c0 = c0;
	*pdst_c1 = c1;
	*pdst_c2 = c2;
}

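Preprocessing simply calls the CUDA kernel above to implement warpAffine. Because the kernel processes each pixel individually, operations such as BGR → RGB and /255.0 are easy to fold in. For a detailed walkthrough of the code see the post YOLOv5推理详解及预处理高性能实现.

3. YOLOv8-OBB Postprocessing

As mentioned earlier, YOLOv8-OBB postprocessing is largely similar to YOLOv5. Because YOLOv8-OBB carries an extra angle, however, the decode part needs a small adjustment, and the IoU computation has to be changed to ProbIoU. See: yolo.cu#L129

The decode part for YOLOv8-OBB can then be written as follows: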

static __global__ void decode_kernel(float* predict, int num_bboxes, int num_classes, float confidence_threshold, float* invert_affine_matrix, float* parray, int max_objects){  
    // cx, cy, w, h, cls, angle
    int position = blockDim.x * blockIdx.x + threadIdx.x;
    if (position >= num_bboxes) return;

    float* pitem            = predict + (5 + num_classes) * position;
    float* class_confidence = pitem + 4;
    float confidence        = *class_confidence++;
    int label               = 0;
    for(int i = 1; i < num_classes; ++i, ++class_confidence){
        if(*class_confidence > confidence){
            confidence = *class_confidence;
            label      = i;
        }
    }

    if(confidence < confidence_threshold)
        return;

    int index = atomicAdd(parray, 1);
    if(index >= max_objects)
        return;

    float cx         = *pitem++;
    float cy         = *pitem++;
    float width      = *pitem++;
    float height     = *pitem++;
    float angle      = *(pitem + num_classes);
    affine_project(invert_affine_matrix, cx, cy, width, height, &cx, &cy, &width, &height);

    float* pout_item = parray + 1 + index * NUM_BOX_ELEMENT;
    *pout_item++ = cx;
    *pout_item++ = cy;
    *pout_item++ = width;
    *pout_item++ = height;
    *pout_item++ = angle;
    *pout_item++ = confidence;
    *pout_item++ = label;
    *pout_item++ = 1; // 1 = keep, 0 = ignore
}

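The decode kernel launches many threads, each decoding one box, and maps the coordinates back to the original image with the inverse affine matrix IM. Note that the angle is the last element and the class scores sit in the middle, so the 20 values of one rotated box are [cx, cy, w, h, cls*15, angle].

As for NMS, since YOLOv8-OBB uses ProbIoU to measure the similarity of two rotated boxes, it also needs adjusting. The adjusted NMS code is as follows: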

static __device__ void convariance_matrix(float w, float h, float r, float& a, float& b, float& c){
    float a_val = w * w / 12.0f;
    float b_val = h * h / 12.0f;
    float cos_r = cosf(r); 
    float sin_r = sinf(r);

    a = a_val * cos_r * cos_r + b_val * sin_r * sin_r;
    b = a_val * sin_r * sin_r + b_val * cos_r * cos_r;
    c = (a_val - b_val) * sin_r * cos_r;
}

static __device__ float box_probiou(
    float cx1, float cy1, float w1, float h1, float r1,
    float cx2, float cy2, float w2, float h2, float r2,
    float eps = 1e-7
){

    // Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    float a1, b1, c1, a2, b2, c2;
    convariance_matrix(w1, h1, r1, a1, b1, c1);
    convariance_matrix(w2, h2, r2, a2, b2, c2);

    float t1 = ((a1 + a2) * powf(cy1 - cy2, 2) + (b1 + b2) * powf(cx1 - cx2, 2)) / ((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2) + eps);
    float t2 = ((c1 + c2) * (cx2 - cx1) * (cy1 - cy2)) / ((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2) + eps);
    float t3 = logf(((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2)) / (4 * sqrtf(fmaxf(a1 * b1 - c1 * c1, 0.0f)) * sqrtf(fmaxf(a2 * b2 - c2 * c2, 0.0f)) + eps) + eps); 
    float bd = 0.25f * t1 + 0.5f * t2 + 0.5f * t3;
    bd = fmaxf(fminf(bd, 100.0f), eps);
    float hd = sqrtf(1.0f - expf(-bd) + eps);
    return 1 - hd;    
}

static __global__ void nms_kernel(float* bboxes, int max_objects, float threshold){

    int position = (blockDim.x * blockIdx.x + threadIdx.x);
    int count = min((int)*bboxes, max_objects);
    if (position >= count) 
        return;
    
    // cx, cy, w, h, angle, confidence, class_label, keepflag
    float* pcurrent = bboxes + 1 + position * NUM_BOX_ELEMENT;
    for(int i = 0; i < count; ++i){
        float* pitem = bboxes + 1 + i * NUM_BOX_ELEMENT;
        if(i == position || pcurrent[6] != pitem[6]) continue;

        if(pitem[5] >= pcurrent[5]){
            if(pitem[5] == pcurrent[5] && i < position)
                continue;

            float iou = box_probiou(
                pcurrent[0], pcurrent[1], pcurrent[2], pcurrent[3], pcurrent[4],
                pitem[0],    pitem[1],    pitem[2],    pitem[3],    pitem[4]
            );

            if(iou > threshold){
                pcurrent[7] = 0;  // 1=keep, 0=ignore
                return;
            }
        }
    }
}

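NMS likewise launches many threads, each handling one box. For every other box of the same class whose confidence is at least that of the current thread's box, the thread computes the ProbIoU of the two boxes and drops its own box if the value exceeds the threshold. Compared with the CPU NMS there is one fewer loop level, because the outer loop is replaced by the parallel CUDA threads. The code follows: yolo_decode.cu#L81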
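The per-thread logic of nms_kernel can be emulated on the CPU to see what each thread decides. The sketch below is an illustration, not the repo's code: nms_thread mirrors the body of one CUDA thread, and to keep it short a plain axis-aligned IoU (the hypothetical axis_aligned_iou, which ignores the angle) stands in for ProbIoU.

```python
def axis_aligned_iou(a, b):
    """Stand-in similarity on (cx, cy, w, h); the angle is ignored. NOT ProbIoU."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms_thread(boxes, position, threshold, similarity):
    """What one CUDA thread does: return the keep flag of boxes[position]."""
    current = boxes[position]
    for i, other in enumerate(boxes):
        if i == position or other[6] != current[6]:  # skip self and other classes
            continue
        if other[5] >= current[5]:
            if other[5] == current[5] and i < position:  # tie-break on index
                continue
            if similarity(current, other) > threshold:
                return 0  # suppressed by a box that is at least as confident
    return 1

def nms_parallel_emulated(boxes, threshold=0.45, similarity=axis_aligned_iou):
    # On the GPU each call below is one thread; here they simply run in a loop
    flags = [nms_thread(boxes, p, threshold, similarity) for p in range(len(boxes))]
    return [box for box, keep in zip(boxes, flags) if keep]

# Rows are [cx, cy, w, h, angle, confidence, label]
boxes = [
    [100.0, 100.0, 40.0, 20.0, 0.0, 0.9, 0],
    [102.0, 101.0, 40.0, 20.0, 0.0, 0.8, 0],  # overlaps the first box, same class
    [102.0, 101.0, 40.0, 20.0, 0.0, 0.7, 1],  # same place, different class
    [300.0, 300.0, 40.0, 20.0, 0.0, 0.6, 0],  # far away
]
kept = nms_parallel_emulated(boxes)
print([box[5] for box in kept])  # [0.9, 0.7, 0.6]
```

One consequence of this design is that a box is dropped whenever any more confident box of the same class overlaps it, even if that box is itself dropped by another thread. The sequential CPU version only compares against boxes it has already kept, so the two can differ slightly on chains of overlapping boxes.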

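4. YOLOv8-OBB Inference

With the preprocessing and postprocessing analyzed, the whole inference pipeline becomes clear. In C++, YOLOv8-OBB preprocessing can reuse the YOLOv5 preprocessing directly, while the decode and NMS parts of postprocessing need small modifications.

Run the following command in a terminal to perform inference (note: the complete workflow is described later; this is only a quick demo)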

make yolo_obb

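The build output is shown below:

[Figure: build output]

The inference result is shown below:

[Figure: C++ inference result]

This completes the YOLOv8-OBB inference pipeline in C++. Next we walk through the complete workflow.

III. YOLOv8-OBB Deployment

The author created a new repository, tensorRT_Pro-YOLOv8, which is based on shouxieai/tensorRT_Pro and adjusted to support the various YOLOv8 tasks. It currently supports classification, detection, segmentation, pose estimation, and rotated object detection.

Let's see how to use the tensorRT_Pro-YOLOv8 repo to run YOLOv8-OBB inference.

1. Source Download

The tensorRT_Pro-YOLOv8 code can be downloaded directly from GitHub at https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8. On Linux, clone it with: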

git clone https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8

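You can also download it manually: click the Code button at the top right and download the code. The whole project is then ready. Alternatively, click here【pwd:yolo】 to download the source prepared by the author (note: this copy was downloaded on 2024/1/21; if the repo has changed, refer to the latest version)

2. Environment Setup

The required software is TensorRT, CUDA, cuDNN, OpenCV, and Protobuf. For installation see Ubuntu20.04软件安装大全; it is not repeated here, so please set up the environment yourself 😄. Because access to overseas sites is slow, the author provides a download link for the installation packages used: Baidu Drive【pwd:yolo】🚀🚀🚀

tensorRT_Pro-YOLOv8 can be built with either CMakeLists.txt or Makefile; choose one of the two

2.1 Configuring CMakeLists.txt

Five changes are needed

1. Line 13: set the OpenCV path

set(OpenCV_DIR   "/usr/local/include/opencv4/")

2. Line 15: set the CUDA path

set(CUDA_TOOLKIT_ROOT_DIR     "/usr/local/cuda-11.6")

3. Line 16: set the cuDNN path

set(CUDNN_DIR    "/usr/local/cudnn8.4.0.27-cuda11.6")

4. Line 17: set the TensorRT path

set(TENSORRT_DIR "/opt/TensorRT-8.4.1.5")

5. Line 20: set the protobuf path

set(PROTOBUF_DIR "/home/jarvis/protobuf")

2.2 Configuring Makefile

Five changes are needed

1. Line 4: set the protobuf path

lean_protobuf  := /home/jarvis/protobuf

2. Line 5: set the TensorRT path

lean_tensor_rt := /opt/TensorRT-8.4.1.5

3. Line 6: set the cuDNN path

lean_cudnn     := /usr/local/cudnn8.4.0.27-cuda11.6

4. Line 7: set the OpenCV path

lean_opencv    := /usr/local

5. Line 8: set the CUDA path

lean_cuda      := /usr/local/cuda-11.6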

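3. ONNX Export

See the earlier section for the export details. Remember to place the exported ONNX model in the tensorRT_Pro-YOLOv8/workspace folder.

4. Source Modification

To run inference with your own trained model you also need to modify the source code. The YOLOv8-OBB inference code is mainly in app_yolo_obb.cpp, and only this file needs to change. The modifications are simple:

1. app_yolo_obb.cpp line 249: change "yolov8s-obb" to the name of your exported ONNX model
2. app_yolo_obb.cpp line 10: change the class names in the dotalabels array to the classes you trained

An example of the changes:

test(TRT::Mode::FP32, "best");	// Change 1: line 249, "yolov8s-obb" changed to "best"

static const char *dotalabels[] = {"car", "airplane"};	// Change 2: line 10, use the class names of your own trained model

5. Run

OK! With the source modified, the Makefile configured, and the ONNX model ready, we can build and run. Run the following command in a terminal: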

make yolo_obb

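The build process is shown below:

[Figure: build process]

After a successful build and run, the engine file yolov8s-obb.FP32.trtmodel is generated in the workspace folder for model inference, along with the folder yolov8s-obb_YoloV8-OBB_FP32_result, which holds the inference images.

The inference result is shown below:

[Figure: inference result]

OK! That is the general workflow for running YOLOv8-OBB inference with tensorRT_Pro-YOLOv8. If you find any problems, corrections are welcome.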

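IV. Extension: ProbIoU

A brief look at ProbIoU and at how the formulas in the paper map to the code

Paper: https://arxiv.org/abs/2106.06072

The following is copied from 论文解读系列十九：用于目标检测的高斯检测框与ProbIoU; reading the original is recommended

[Figure]

Current directions for improving object detection mainly include: training on larger datasets (LVIS dataset), handling class imbalance, better backbones, modeling long-range interactions (Transformers, LambdaNetworks), and analyzing the trade-off between classification and bounding boxes. The representation of the bounding box itself has been studied relatively little. Existing detection tasks mainly use horizontal boxes (HBB) and oriented boxes (OBB), whose shapes are rectangular or near-rectangular. Existing distance and similarity measures between objects include IoU (Intersection over Union), GIoU (Generalized IoU), DIoU (Distance IoU), PIoU (Pixel IoU), and Gaussian Wasserstein Distance (GWD).

Existing OBB methods improve on HBB methods for elongated and rotated objects, but still fit the object's semantic segmentation poorly. The paper therefore proposes a box representation that is closer to the segmentation shape, together with a matching similarity measure.

The contributions of the paper are:

A new elliptical bounding box (Gaussian Bounding Boxes, GBB)
- GBB is closer to the shape of the object's semantic segmentation mask, fits non-rectangular objects better, and outperforms HBB and OBB on non-rectangular objects
A new object similarity measure (Probabilistic IoU, ProbIoU)
- ProbIoU, based on the Hellinger Distance, takes the properties of 2D Gaussian distributions into account, satisfies all the requirements of a distance metric, represents the true distance between distributions, is differentiable everywhere, and can improve OBB and HBB detection results.

[Figure]

1. Gaussian Bounding Boxes (GBB)

To determine a 2D Gaussian distribution over a 2D region, we need its mean $\mu$ and covariance matrix $\Sigma$, where $\mu = (x_0,y_0)^T$ and $\Sigma$ is computed with the following formula:

[Formula image: covariance matrix $\Sigma$]

In object detection we can directly use $(x_0,y_0,a,b,c)$ as the regression parameters, or express them as $(x_0,y_0,a',b',\theta)$; the latter is closer to the output form of existing rotated box detectors

Converting horizontal and rotated boxes to Gaussian boxes relies on the following assumption: the object region is a 2D binary region $\Omega$ with a uniform probability distribution over it. The mean $\mu$ and covariance matrix $\Sigma$ of that distribution are then computed as:

[Formula image: mean and covariance of the uniform distribution over $\Omega$]

where $N$ is the area of the region $\Omega$

HBB to GBB

For an HBB, the binary region $\Omega$ is the rectangle centered at $(x_0,y_0)$ with height $H$ and width $W$, so $\mu = (x_0,y_0)^T$ and its covariance matrix $\Sigma$ is computed as:

[Formula image: covariance matrix of an HBB]

This gives $a = \frac{W^2}{12}$, $b = \frac{H^2}{12}$, $c = 0$. As the formula shows, the resulting Gaussian box can also be converted back to a horizontal box, so the conversion is invertible

OBB to GBB

Converting an OBB to a GBB requires $(a',b',\theta)$, as shown below. The variances $a'$ and $b'$ can be computed by treating the rotated box as a horizontal box, and its covariance matrix is computed as:

[Formula image: covariance matrix of an OBB]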
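The formula images are not preserved in this copy. As a reconstruction that is consistent with the covariance_matrix function used throughout this post (and therefore an assumption about what the images showed, not a quote from them), the HBB covariance and its rotated OBB form can be written as:

$$
\Sigma_{HBB}=\frac{1}{WH}\int_{-H/2}^{H/2}\int_{-W/2}^{W/2}\begin{bmatrix}x^2 & xy\\ xy & y^2\end{bmatrix}dx\,dy=\begin{bmatrix}\frac{W^2}{12} & 0\\ 0 & \frac{H^2}{12}\end{bmatrix}
$$

$$
\Sigma=\begin{bmatrix}a & c\\ c & b\end{bmatrix}=R_\theta\begin{bmatrix}a' & 0\\ 0 & b'\end{bmatrix}R_\theta^{T},\qquad R_\theta=\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}
$$

$$
a=a'\cos^2\theta+b'\sin^2\theta,\qquad b=a'\sin^2\theta+b'\cos^2\theta,\qquad c=(a'-b')\sin\theta\cos\theta
$$

with $a'=\frac{W^2}{12}$ and $b'=\frac{H^2}{12}$. These three expressions are term for term the a_val, b_val and c_val lines of covariance_matrix.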

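2. ProbIoU

Bhattacharyya Distance (BD)

To measure the similarity between GBBs, the paper first uses the Bhattacharyya Coefficient (BC). The BC between two probability density functions $p (x)$ and $q (x)$ is computed as:

[Formula image: Bhattacharyya Coefficient]

where $B_{C}(p,q)\in[0,1]$, and $B_{C}(p,q)=1$ if and only if the two distributions are identical

From $B_{C}(p,q)$ we obtain the Bhattacharyya Distance (BD) between distributions. The BD between two probability density functions $p (x)$ and $q (x)$ is computed as:

[Formula image: Bhattacharyya Distance]

When $p\sim\mathcal{N}(\boldsymbol{\mu}_{1},\Sigma_{1})$ and $q\sim\mathcal{N}(\boldsymbol{\mu}_{2},\Sigma_{2})$, and given that in object detection the vectors and matrices are two-dimensional, the Bhattacharyya Distance can be computed with the following formula:

[Formula image: closed-form Bhattacharyya Distance for 2D Gaussians]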
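Since the formula images are missing here, the following is a reconstruction from the standard definitions and from the probiou code in this post; treat it as an assumption about the original images rather than a quote from them:

$$
B_C(p,q)=\int_{\mathbb{R}^2}\sqrt{p(x)\,q(x)}\,dx,\qquad B_D(p,q)=-\ln B_C(p,q)
$$

$$
B_D=\frac{1}{8}(\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\det\Sigma_2}},\qquad \Sigma=\frac{1}{2}(\Sigma_1+\Sigma_2)
$$

Expanding this for $\Sigma_i=\begin{bmatrix}a_i & c_i\\ c_i & b_i\end{bmatrix}$ and $\mu_i=(x_i,y_i)^T$ gives $B_D=B_1+B_2$ with

$$
B_1=\frac{1}{4}\cdot\frac{(a_1+a_2)(y_1-y_2)^2+(b_1+b_2)(x_1-x_2)^2+2(c_1+c_2)(x_2-x_1)(y_1-y_2)}{(a_1+a_2)(b_1+b_2)-(c_1+c_2)^2}
$$

$$
B_2=\frac{1}{2}\ln\frac{(a_1+a_2)(b_1+b_2)-(c_1+c_2)^2}{4\sqrt{(a_1b_1-c_1^2)(a_2b_2-c_2^2)}}
$$

The factor $\frac{1}{4}$ in $B_1$ and the factor $4$ in the denominator of $B_2$ both come from the $\frac{1}{2}$ in $\Sigma=\frac{1}{2}(\Sigma_1+\Sigma_2)$.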

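Hellinger Distance (HD)

The Bhattacharyya Distance does not satisfy the triangle inequality, so it is not a true distance. For this reason the Hellinger Distance (HD) is used instead, computed as:

[Formula image: Hellinger Distance]

where $H_{D}(p,q)\in[0,1]$, and $H_{D}(p,q)=0$ if and only if the two distributions are identical

Based on the Hellinger Distance (HD), the paper proposes ProbIoU as a similarity measure for Gaussian distributions, computed as:

[Formula image: ProbIoU]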
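Reconstructed in the same way (an assumption consistent with the last two lines of the probiou code, not a quote from the missing images), the two formulas read:

$$
H_D(p,q)=\sqrt{1-B_C(p,q)}=\sqrt{1-e^{-B_D(p,q)}}
$$

$$
\text{ProbIoU}(p,q)=1-H_D(p,q)=1-\sqrt{1-e^{-B_D(p,q)}}
$$

Identical distributions give $B_D=0$, hence $H_D=0$ and a ProbIoU of 1, while distributions that are far apart give a large $B_D$, hence $H_D\to1$ and a ProbIoU close to 0.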

3. Code

Let's look at the code that computes ProbIoU:

def probiou(obb1, obb2, eps=1e-7):
    # Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    def covariance_matrix(obb):
        # Extract elements
        w, h, r = obb[2:5]
        a = (w ** 2) / 12
        b = (h ** 2) / 12

        cos_r = torch.cos(torch.tensor(r))
        sin_r = torch.sin(torch.tensor(r))
        
        # Calculate covariance matrix elements
        a_val = a * cos_r ** 2 + b * sin_r ** 2
        b_val = a * sin_r ** 2 + b * cos_r ** 2
        c_val = (a - b) * sin_r * cos_r

        return a_val, b_val, c_val

    a1, b1, c1 = covariance_matrix(obb1)
    a2, b2, c2 = covariance_matrix(obb2)

    x1, y1 = obb1[:2]
    x2, y2 = obb2[:2]

    t1 = ((a1 + a2) * ((y1 - y2) ** 2) + (b1 + b2) * ((x1 - x2) ** 2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t3 = torch.log(((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2) / (4 * torch.sqrt(a1 * b1 - c1 ** 2) * torch.sqrt(a2 * b2 - c2 ** 2) + eps) + eps)

    bd = 0.25 * t1 + 0.5 * t2 + 0.5 * t3
    hd = torch.sqrt(1.0 - torch.exp(-torch.clamp(bd, eps, 100.0)) + eps)
    return 1 - hd

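First we convert the OBB to a GBB, using the following formula:

[Formula image: covariance matrix of an OBB]

Note that $a^{'}$ and $b^{'}$ here are computed in the same way as $a$ and $b$ for an HBB:

[Formula image: covariance matrix of an HBB]

That is, $a' = \frac{W^2}{12}$, $b' = \frac{H^2}{12}$

This covariance formula is exactly what the covariance_matrix function implements

From covariance_matrix we obtain $a_1$, $b_1$, $c_1$, $a_2$, $b_2$, $c_2$

Next we compute the Bhattacharyya Distance $B_D$ between the two distributions, with the following formula:

[Formula image: closed-form Bhattacharyya Distance]

In the code, 0.25 * t1 + 0.5 * t2 corresponds to $B_1$, 0.5 * t3 corresponds to $B_2$, and bd is $B_D$

Once $B_D$ is computed, the last step is to compute ProbIoU, with the following formula:

[Formula image: ProbIoU]

In the code, hd is $H_D$, and the returned 1 - hd is the ProbIoU value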
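To get a feel for the values ProbIoU produces, here is a dependency-free sketch of the same computation using only the math module. It follows the probiou function above step by step, except that the two determinants are clamped at zero before the square root, as the CUDA version does. The box values are made up for illustration.

```python
import math

def covariance(w, h, r):
    """Covariance entries (a, b, c) of the Gaussian box for width, height, angle."""
    a, b = w * w / 12.0, h * h / 12.0
    cos_r, sin_r = math.cos(r), math.sin(r)
    return (a * cos_r ** 2 + b * sin_r ** 2,
            a * sin_r ** 2 + b * cos_r ** 2,
            (a - b) * sin_r * cos_r)

def probiou(box1, box2, eps=1e-7):
    """ProbIoU of two boxes given as (cx, cy, w, h, angle in radians)."""
    x1, y1, w1, h1, r1 = box1
    x2, y2, w2, h2, r2 = box2
    a1, b1, c1 = covariance(w1, h1, r1)
    a2, b2, c2 = covariance(w2, h2, r2)
    denom = (a1 + a2) * (b1 + b2) - (c1 + c2) ** 2
    t1 = ((a1 + a2) * (y1 - y2) ** 2 + (b1 + b2) * (x1 - x2) ** 2) / (denom + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / (denom + eps)
    det1 = max(a1 * b1 - c1 * c1, 0.0)  # clamp against rounding below zero
    det2 = max(a2 * b2 - c2 * c2, 0.0)
    t3 = math.log(denom / (4 * math.sqrt(det1) * math.sqrt(det2) + eps) + eps)
    bd = min(max(0.25 * t1 + 0.5 * t2 + 0.5 * t3, eps), 100.0)  # B_D = B_1 + B_2
    hd = math.sqrt(1.0 - math.exp(-bd) + eps)                   # H_D
    return 1.0 - hd

base = (100.0, 100.0, 40.0, 20.0, 0.3)
same = probiou(base, base)
near = probiou(base, (110.0, 100.0, 40.0, 20.0, 0.3))
far = probiou(base, (1100.0, 100.0, 40.0, 20.0, 0.3))
print(same, near, far)
```

Identical boxes score just under 1 and distant boxes score approximately 0. Because of the eps terms the value for very distant boxes can even come out marginally below zero, which is harmless when it is only compared against the NMS threshold.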

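Conclusion

In this post the author gave a brief analysis of YOLOv8-OBB preprocessing and postprocessing and shared the C++ implementation workflow, with the aim of clarifying the overall approach so that the later deployment work goes more smoothly 😄. Thanks for reading to the end. Writing this took effort, so if you found it useful, please leave a 👍⭐️

Finally, if the tensorRT_Pro-YOLOv8 repo is helpful to you, please consider giving it a ⭐️. It means a lot to the author. Thank you 🙏.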