1. Main Content of This Post
This post focuses on the inference module of multimodal YOLOv8 detection. By walking through the structure of the multimodal inference module, its code implementation, and the FPS calculation workflow, it completes the full technical chain of multimodal detection. Since this post is already long, the underlying principles of multimodal inference, heatmap generation, and FPS calculation will be covered in detail in the next post; if you are interested, follow along for updates.
2. Notes
If you are not yet familiar with the training procedure, code principles, and core architecture of multimodal YOLOv8, I recommend reading the first two posts in this series to build a complete picture. Also note that this post only covers inference with horizontal bounding boxes; rotated boxes will be covered in a later post once I have had time to study them.
A Complete Guide to YOLOv8 Multimodal Object Detection (Visible + Infrared Images, Based on the Official Ultralytics Code)
YOLOv8 Multimodal Detection Core Techniques: A Layer-by-Layer Breakdown of the Cross-Modal Feature Fusion Mechanism (with Exclusive Architecture Diagrams)
3. Multimodal YOLOv8 Inference Module in Detail
In this section, I first present my reworked multimodal inference code (easier to follow and better decoupled), together with the accompanying heatmap visualization and FPS calculation tooling, plus notes on the key points to modify. This lowers the cost of porting the code: a few simple edits and you are up and running.
3.1 Multimodal Inference Code (Quick Start)
In the multimodal YOLOv8 source code (readers who don't know the source repository should see the first post recommended above, which contains the relevant links), the horizontal-box inference lives in detect/hbbDetect.py (the core inference module) and detect/hbbHeapmap.py (the heatmap generation module); the former performs multimodal inference and the latter draws the heatmaps. Below I go through these two files in order, with explanations. Note that I have modified both: the original code was tightly coupled and hard to follow, so I refactored it, and you can simply copy and paste my versions.
3.1.1 hbbDetect Inference Code Modifications
OK, here is the modified multimodal inference file, hbbDetect.py. A few parts of this file need to be adapted:
1. In the main function (the same applies below; I will say so when a different function is involved), first define path, the root directory of your dataset. I use the DroneVehicle dataset here.
2. Next, define weights_path, the weight file produced by your multimodal training run; I pick the best.pt file.
3. Then define rgb_dir: the part joined onto path by os.path.join is the root directory of the visible-light test images. (Important: the root directory of your infrared (IR) test images must mirror this one exactly, with imgRGB in the path replaced by imgIR; be sure to organize your data this way.)
4. Then define save_dir: the part joined onto path is the directory where the inference result images are saved. It does not need to exist; it is created automatically.
5. Further down you will find two output directories, rgb_save_dir and ir_save_dir, built under the save_dir from point 4. Since this is a multimodal setup, the inference outputs come in two parts, so the rgb and ir subfolders hold the results for each modality. The directory layout all these paths assume is sketched right after this list.
6. Finally, in the process_image function I have added the FPS calculation code; you do not need to change anything there, it runs as-is.
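For reference, here is the directory layout that the paths above assume (a sketch based on the DroneVehicle setup used in this post; the exact file names are placeholders):

DroneVehicle/
├── imgRGB/
│   └── test/
│       └── images/    # visible-light test images, e.g. 00001.jpg
├── imgIR/
│   └── test/
│       └── images/    # infrared images with identical file names
└── results/           # created automatically by the script
    ├── rgb/           # RGB images with drawn detections
    └── ir/            # IR images with drawn detections

The full modified file follows.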
import os
import cv2
import numpy as np
import torch
from ultralytics import YOLO
from tqdm import tqdm
class MultimodalDetector:
    def __init__(self, weights_path, device='cpu'):
        self.device = device
        self.model = self._load_model(weights_path)
        self.colors = [
            [165, 0, 255],
            [0, 255, 0],
            [102, 255, 255],
            [255, 165, 0],
            [255, 255, 0]
        ]
        self.class_names = ["car", "truck", "bus", "van", "freight"]

    def _load_model(self, weights_path):
        if not os.path.exists(weights_path):
            raise FileNotFoundError("Model weights not found!")
        model = YOLO(weights_path).to(self.device)
        model.fuse()
        model.info(verbose=False)
        return model

    def _draw_boxes(self, image, pt1, pt2, cls_value, cls_conf, thickness=4):
        # Pick the color assigned to this class
        color = self.colors[int(cls_value)]
        # Draw the detection box
        cv2.rectangle(image, tuple(pt1), tuple(pt2), color, thickness)

    def _draw_text(self, image, text, pt1, cls_value, font_scale=1.6, thickness=4):
        color = self.colors[int(cls_value)]
        text_x, text_y = pt1[0], pt1[1]
        # Clamp the text anchor so labels near the border stay inside the image
        text_x = max(text_x, 0)
        text_y = max(text_y, 0)
        cv2.putText(image, text, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, color, thickness)
    def process_image(self, rgb_path, ir_path):
        rgb_image = cv2.imread(rgb_path)
        ir_image = cv2.imread(ir_path)
        if rgb_image is None or ir_image is None:
            raise ValueError("Failed to load images")
        # Stack the RGB and IR images along the channel axis (3 + 3 = 6 channels)
        combined_image = np.concatenate((rgb_image, ir_image), axis=2)
        # Model inference
        result = self.model.predict(combined_image, save=False, imgsz=640, visualize=False, obb=False)
        boxes = result[0].boxes
        preprocess_time = result[0].speed["preprocess"]    # preprocessing time (ms)
        inference_time = result[0].speed["inference"]      # inference time (ms)
        postprocess_time = result[0].speed["postprocess"]  # post-processing time (ms)
        # Compute FPS from the per-image latency
        total_time = preprocess_time + inference_time + postprocess_time
        fps = 1000 / total_time
        print(f"FPS: {fps:.1f}")
        # Draw the results
        for box in boxes:
            cls_value = int(box.cls)
            conf = box.conf.item()
            xywh = box.xywh.cpu().numpy().flatten()  # flatten to a 1-D array
            x_center, y_center, width, height = xywh
            pt1 = (int(x_center - width / 2), int(y_center - height / 2))
            pt2 = (int(x_center + width / 2), int(y_center + height / 2))
            # Draw the box on both modalities
            self._draw_boxes(rgb_image, pt1, pt2, cls_value, conf)
            self._draw_boxes(ir_image, pt1, pt2, cls_value, conf)
            # Draw the label text
            class_name = self.class_names[cls_value] if cls_value < len(self.class_names) else "unknown"
            text = ""  # use f"{class_name}" or f"{class_name} {float(conf):.2f}" to show the label/score
            self._draw_text(rgb_image, text, pt1, cls_value)
            self._draw_text(ir_image, text, pt1, cls_value)
        return rgb_image, ir_image
def list_all_files(startpath):
    all_files = []
    for root, dirs, files in os.walk(startpath):
        for name in files:
            if name.lower().endswith(('.jpg', '.png', '.bmp')):
                all_files.append(os.path.join(root, name))
    return all_files
def main():
    path = '/root/autodl-tmp/data/DroneVehicle/'
    weights_path = '/root/autodl-tmp/project/TwoStream_Yolov8-main/runs/detect/train/weights/best.pt'
    detector = MultimodalDetector(weights_path, device='cuda' if torch.cuda.is_available() else 'cpu')
    rgb_dir = os.path.join(path, 'imgRGB/test/images/')
    # ir_dir = os.path.join(path, 'imgIR/test/images/')
    save_dir = os.path.join(path, 'results/')
    rgb_save_dir = os.path.join(save_dir, 'rgb')
    ir_save_dir = os.path.join(save_dir, 'ir')
    # Create the output directories recursively (including parents)
    os.makedirs(rgb_save_dir, exist_ok=True)
    os.makedirs(ir_save_dir, exist_ok=True)
    print(f"save detect result RGB: {os.path.abspath(rgb_save_dir)}")
    print(f"save detect result IR: {os.path.abspath(ir_save_dir)}\n")
    all_files = list_all_files(rgb_dir)
    for file in tqdm(all_files):
        rgb_path = file
        ir_path = file.replace('imgRGB', 'imgIR')
        try:
            rgb_result, ir_result = detector.process_image(rgb_path, ir_path)
            cv2.imwrite(os.path.join(rgb_save_dir, os.path.basename(file)), rgb_result)
            cv2.imwrite(os.path.join(ir_save_dir, os.path.basename(file)), ir_result)
        except Exception as e:
            print(f"Error processing {file}: {e}")

if __name__ == '__main__':
    main()
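If you just want to sanity-check a single image pair instead of sweeping the whole test set, a minimal sketch looks like this (assuming hbbDetect.py is importable from your working directory; both image paths below are placeholders):

import cv2
import torch
from hbbDetect import MultimodalDetector  # adjust the import to where you saved the file

detector = MultimodalDetector(
    weights_path='runs/detect/train/weights/best.pt',  # your trained weights
    device='cuda' if torch.cuda.is_available() else 'cpu'
)
# The RGB/IR pair must share the same file name under mirrored directories
rgb_out, ir_out = detector.process_image(
    'DroneVehicle/imgRGB/test/images/00001.jpg',
    'DroneVehicle/imgIR/test/images/00001.jpg'
)
cv2.imwrite('demo_rgb.jpg', rgb_out)
cv2.imwrite('demo_ir.jpg', ir_out)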
3.1.2 hbbHeapmap.py Heatmap Code Modifications
Next, here is the heatmap-drawing code, detect/hbbHeapmap.py. A few parts of this file need to be adapted:
1. In the if __name__ == '__main__': block, again start by defining the dataset root path; no further explanation needed.
2. Likewise define path_rgb and path_ir, the paths of the visible-light and infrared images you want heatmaps for. Apart from imgRGB versus imgIR, the two paths must be identical, otherwise you may get errors.
3. Then define save_dir, the path where the heatmap visualizations are saved. All of this is straightforward.
4. Next, in the get_params() function, you only need to change the weight entry, i.e. the weight file saved after training; I again pick best.pt.
5. Finally, in the process function, note this line: cam_image = show_cam_on_image(img_ir, grayscale_cam, use_rgb=True). It means the heatmap is overlaid on img_ir, the infrared image. To overlay it on the corresponding visible-light image instead, simply replace img_ir with img_rgb, as shown in the one-line snippet below.
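For clarity, this is the only edit needed inside process to switch the overlay target (a one-line sketch; the variable names are those used in the file below):

# Overlay the CAM on the visible-light image instead of the infrared one
cam_image = show_cam_on_image(img_rgb, grayscale_cam, use_rgb=True)

The full modified file follows.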
# Multimodal horizontal-bounding-box heatmap
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
import torch, cv2, os, shutil
import numpy as np
np.random.seed(0)
from tqdm import trange
from PIL import Image
from ultralytics.nn.tasks import attempt_load_weights
from ultralytics.utils.ops import xywh2xyxy, non_max_suppression
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image
from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)
    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)
class ActivationsAndGradients:
    """Class for extracting activations and
    registering gradients from targeted intermediate layers"""

    def __init__(self, model, target_layers, reshape_transform):
        self.model = model
        self.gradients = []
        self.activations = []
        self.reshape_transform = reshape_transform
        self.handles = []
        for target_layer in target_layers:
            self.handles.append(
                target_layer.register_forward_hook(self.save_activation))
            # Because of https://github.com/pytorch/pytorch/issues/61519,
            # we don't use backward hook to record gradients.
            self.handles.append(
                target_layer.register_forward_hook(self.save_gradient))

    def save_activation(self, module, input, output):
        activation = output
        if self.reshape_transform is not None:
            activation = self.reshape_transform(activation)
        self.activations.append(activation.cpu().detach())

    def save_gradient(self, module, input, output):
        if not hasattr(output, "requires_grad") or not output.requires_grad:
            # You can only register hooks on tensors that require grad.
            return

        # Gradients are computed in reverse order
        def _store_grad(grad):
            if self.reshape_transform is not None:
                grad = self.reshape_transform(grad)
            self.gradients = [grad.cpu().detach()] + self.gradients

        output.register_hook(_store_grad)

    def post_process(self, result):
        logits_ = result[:, 4:]
        boxes_ = result[:, :4]
        sorted_vals, indices = torch.sort(logits_.max(1)[0], descending=True)
        return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[
            indices[0]], xywh2xyxy(torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]]).cpu().detach().numpy()

    def __call__(self, x):
        self.gradients = []
        self.activations = []
        model_output = self.model(x)
        post_result, pre_post_boxes, post_boxes = self.post_process(model_output[0])
        return [[post_result, pre_post_boxes]]

    def release(self):
        for handle in self.handles:
            handle.remove()
class yolov8_target(torch.nn.Module):
    def __init__(self, output_type, conf, ratio) -> None:
        super().__init__()
        self.output_type = output_type
        self.conf = conf
        self.ratio = ratio

    def forward(self, data):
        post_result, pre_post_boxes = data
        result = []
        for i in trange(int(post_result.size(0) * self.ratio)):
            if float(post_result[i].max()) < self.conf:
                break
            # Two independent ifs so that 'all' accumulates both the class
            # score and the box coordinates (an elif would skip the box branch)
            if self.output_type == 'class' or self.output_type == 'all':
                result.append(post_result[i].max())
            if self.output_type == 'box' or self.output_type == 'all':
                for j in range(4):
                    result.append(pre_post_boxes[i, j])
        return sum(result)
class yolov8_heatmap:
    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):
        device = torch.device(device)
        ckpt = torch.load(weight)
        model_names = ckpt['model'].names
        model = attempt_load_weights(weight, device)
        model.info()
        for p in model.parameters():
            p.requires_grad_(True)
        model.eval()

        target = yolov8_target(backward_type, conf_threshold, ratio)
        target_layers = [model.model[l] for l in layer]
        print(target_layers)
        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')
        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)

        # colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int64)
        colors = [(0, 0, 255) for i in range(len(model_names))]
        self.__dict__.update(locals())

    def post_process(self, result):
        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]
        return result

    def draw_detections(self, box, color, name, img):
        xmin, ymin, xmax, ymax = list(map(int, list(box)))
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)
        cv2.putText(img, str(name), (xmin - 10, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.4, tuple(int(x) for x in color),
                    1, lineType=cv2.LINE_AA)
        return img

    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):
        """Normalize the CAM to be in the range [0, 1]
        inside every bounding box, and zero outside of the bounding boxes."""
        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)
        for x1, y1, x2, y2 in boxes:
            x1, y1 = max(x1, 0), max(y1, 0)
            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)
            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())
        renormalized_cam = scale_cam_image(renormalized_cam)
        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)
        return eigencam_image_renormalized

    def preprocess_image(self, image_path):
        """
        Preprocess a single image: read, letterbox-resize, color-space conversion and normalization.
        """
        image = cv2.imread(image_path)
        image = letterbox(image)[0]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = np.float32(image) / 255.0
        return image

    def process(self, img_path, imgir_path, save_path):
        # Preprocess the RGB image
        img_rgb = self.preprocess_image(img_path)
        # Preprocess the IR image
        img_ir = self.preprocess_image(imgir_path)
        # Stack the RGB and IR images along the channel axis
        img = np.concatenate((img_rgb, img_ir), axis=2)
        # Convert to a PyTorch tensor
        tensor = torch.from_numpy(img.transpose(2, 0, 1)).unsqueeze(0).to(self.device)
        # Model inference and CAM generation
        try:
            grayscale_cam = self.method(tensor, [self.target])
        except AttributeError as e:
            return
        grayscale_cam = grayscale_cam[0, :]
        # Render the heatmap (overlaid on the IR image; use img_rgb to overlay on the RGB image instead)
        cam_image = show_cam_on_image(img_ir, grayscale_cam, use_rgb=True)
        # Draw the detection boxes
        if self.show_box:
            pred = self.model(tensor)[0]
            pred = self.post_process(pred)
            for data in pred:
                data = data.cpu().detach().numpy()
                cam_image = self.draw_detections(
                    data[:4],
                    self.colors[int(data[4:].argmax())],
                    f'{self.model_names[int(data[4:].argmax())]}',  # append {float(data[4:].max()):.2f} to show scores
                    cam_image
                )
        # Save the result
        cam_image = Image.fromarray(cam_image)
        cam_image.save(save_path)

    def __call__(self, img_path, imgir_path, save_path):
        # Remove an existing save directory first (avoid stale results)
        if os.path.exists(save_path):
            shutil.rmtree(save_path)
        os.makedirs(save_path, exist_ok=True)
        # Process a whole folder
        if os.path.isdir(img_path) and os.path.isdir(imgir_path):
            for filename in os.listdir(img_path):
                # Only process image files
                if filename.lower().endswith(('.jpg', '.png', '.jpeg')):
                    rgb_file = os.path.join(img_path, filename)
                    ir_file = os.path.join(imgir_path, filename)
                    # Make sure the matching IR file exists
                    if not os.path.exists(ir_file):
                        print(f"IR file {filename} does not exist, skipping")
                        continue
                    # Build the save path (keep the original file name)
                    save_file = os.path.join(save_path, filename)
                    self.process(rgb_file, ir_file, save_file)
        # Process a single image pair
        else:
            # Build the save path from the input file name (avoids overwriting)
            base_name = os.path.basename(img_path)
            save_file = os.path.join(save_path, base_name)
            self.process(img_path, imgir_path, save_file)

def get_params():
    params = {
        'weight': '/root/autodl-tmp/project/TwoStream_Yolov8-main/runs/detect/train/weights/best.pt',
        # Only the weight path is needed now; no cfg file is required
        'device': 'cuda:0',
        'method': 'GradCAMPlusPlus',
        # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
        'layer': [7],
        'backward_type': 'all',  # class, box, all
        'conf_threshold': 0.2,  # 0.2
        'ratio': 0.02,  # 0.02-0.1
        'show_box': True,
        'renormalize': True
    }
    return params
if __name__ == '__main__':
    model = yolov8_heatmap(**get_params())
    path = "/root/autodl-tmp/data/DroneVehicle"
    # Define the input and output paths
    path_rgb = os.path.join(path, 'imgRGB/test/images/')
    path_ir = os.path.join(path, 'imgIR/test/images/')
    save_dir = os.path.join(path, 'result_heapMap')  # directory for the heatmap results
    # Run the model over the whole folder
    model(path_rgb, path_ir, save_dir)
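The __call__ method also accepts a single image pair (the else branch above), which is handy for a quick spot check. A minimal sketch, assuming model and path are defined as in the __main__ block above (the file name 00001.jpg is a placeholder):

model(
    os.path.join(path, 'imgRGB/test/images/00001.jpg'),
    os.path.join(path, 'imgIR/test/images/00001.jpg'),
    os.path.join(path, 'result_heapMap_single')  # separate output dir; it is wiped and recreated on each call
)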
3.2 Inference Results Visualization (For Reference)
This section shows test results for multimodal inference, FPS calculation, and heatmap generation, based on a model quickly validated for 50 epochs. The numbers only illustrate the algorithmic pipeline; stable metrics require full training.
3.2.1 Multimodal Inference Visualization and FPS Results
The figures show, in order, the RGB-modality detections, the infrared-modality detections, and the FPS measurements. The first frame fluctuates because of model warm-up, so look at the subsequent stable phase, which averages around 22 FPS; a small sketch for computing such a steady-state average is given below.
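If you want a single steady-state number instead of the per-image printouts from process_image, one simple approach (a sketch, not part of the scripts above) is to collect the per-image latencies and drop a few warm-up frames before averaging:

# Sketch: average FPS over per-image total latencies in milliseconds,
# skipping warm-up frames whose timings are inflated by initialization.
# times_ms is assumed to be collected from result[0].speed in process_image.
def average_fps(times_ms, warmup=5):
    stable = times_ms[warmup:] if len(times_ms) > warmup else times_ms
    return 1000 / (sum(stable) / len(stable))

# Placeholder latencies for illustration; substitute your measured values.
print(f"mean FPS: {average_fps([60.0, 50.0, 47.0, 46.0, 45.5, 45.0, 45.0]):.1f}")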
3.2.2 Multimodal Heatmap Visualization
This figure shows the heatmap overlaid on the infrared image. Since the detection accuracy of the 50-epoch model has not fully converged, the heatmap response distribution is still poor; increase the number of training epochs until the model converges before reproducing this experiment.
4. Summary
This post aims to help you quickly master the multimodal inference code and complete the pipeline from training through inference. With it, you should be able to run deeper code experiments and lay a solid foundation for publishing papers. If any questions come up while studying, feel free to discuss them in the comments. Your likes and bookmarks are a great support, and I will keep updating with related code and paper walk-throughs to provide more useful content.