Background
YOLOv8 hides its implementation behind a high-level API; running inference takes only a few lines:
from ultralytics import YOLO
data_path = r"/path/to/data/*.mp4"
model = YOLO(task="detect", model='/path/to/model/*.pt')
results = model.predict(source=data_path, show=True, save=False, stream=True)
Thanks to this heavy encapsulation of the model class, a single call to model.predict() performs inference. In real projects, however, a model often only needs to detect a subset of classes (common classes such as 'person' and 'car' are already covered by the official pretrained model, which saves annotation and retraining costs).
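For example, restricting the official COCO-pretrained model to 'person' (class 0) and 'car' (class 2) only requires the classes argument of predict():
results = model.predict(source=data_path, show=True, stream=True, classes=[0, 2])  # keep only persons and cars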
In the official YOLOv8 code, the final inference results are obtained by first instantiating the YOLO class, which then delegates to a Predictor class. That is convenient in simple scenarios, but in practice a single image may need to pass through several detection networks, each with a different configuration, and fetching results through the wrapper becomes cumbersome. We therefore decouple the predictor class, use it to load models directly, and fuse the partial results produced by the different models.
1. Approach
Change 1: rewrite the predictor class in predict.py.
Change 2: reconcile the predictions of different models by remapping their class labels.
2. Files to modify
Predictor base class: ultralytics/engine/predictor.py
Predictor subclass: ultralytics/task_bank/predict.py (create a new folder and predict.py for the task)
Prediction config file: ultralytics/cfg/bank_monitor/detect_predict.yaml (a subset of the default config)
3. Create the prediction config file
From ultralytics/cfg/default.yaml, keep only the prediction-related parameters and set them as needed (with the official defaults, most need no change). Save the result to ultralytics/cfg/bank_monitor/detect_predict.yaml.
Leave model and source empty; they are passed in later through the predictor's overrides argument.
# Parameters that must be set
task: "detect"
mode: "predict"
model:
source: # (str, optional) source directory for images or videos
batch: 16 # (int) number of images per batch (-1 for AutoBatch)
conf: 0.25 # (float, optional) object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7 # (float) intersection over union (IoU) threshold for NMS
data: # (str, optional) path to data file, i.e. coco8.yaml
vid_stride: 1 # (int) video frame-rate stride
verbose: True # (bool) whether to print verbose output to the console
# Result save settings -------------------------------------------------------------------------------------------------
save: False # (bool) save predict results
project: # (str, optional) project name; if empty, results are saved under the test/tmp/runs/task folder
name: # (str, optional) experiment name, results saved to 'project/name' directory
exist_ok: False # (bool) whether to overwrite existing experiment
save_dir: # save path; if empty, the default is used and results go to the run/project/name/task/mode_num folder
save_frames: False # (bool) save predicted individual video frames
save_txt: False # (bool) save results as .txt file
save_conf: False # (bool) save results with confidence scores
save_crop: False # (bool) save cropped images with results
# Visualize settings ---------------------------------------------------------------------------------------------------
show: False # (bool) show predicted images and videos if environment allows
show_labels: True # (bool) show prediction labels, i.e. 'person'
show_conf: True # (bool) show prediction confidence, i.e. '0.99'
show_boxes: True # (bool) show prediction boxes
line_width: # (int, optional) line width of the bounding boxes. Scaled to image size if None.
# Parameters you may want to change
classes: # (int | list[int], optional) filter results by class, i.e. classes=0, or classes=[0,2,3]
imgsz: 640 # (int | list) input images size as int for train and val modes, or list[w,h] for predict and export modes
device: 0 # (int | str | list, optional) device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
stream_buffer: False # (bool) buffer all streaming frames (True) or return the most recent frame (False)
dnn: False # (bool) use OpenCV DNN for ONNX inference
half: False # (bool) use half precision (FP16)
# Parameters that rarely need changing
visualize: False # (bool) visualize model features, i.e. the feature maps of each layer during inference
augment: False # (bool) apply image augmentation to prediction sources
embed: # (list[int], optional) return feature vectors/embeddings from given layers
agnostic_nms: False # (bool) class-agnostic NMS; False runs NMS per class (True suppresses overlapping boxes of different classes, keeping only the highest-scoring one)
max_det: 300 # (int) maximum number of detections per image
crop_fraction: 1.0 # (float) image crop fraction for classification (0.1-1), 1.0 means no crop, must be greater than 0.
retina_masks: False # (bool) use high-resolution segmentation masks
4. Rewrite the predictor subclass
Predictor base class: ultralytics/engine/predictor.py; original predictor subclass: ultralytics/models/yolo/detect/predict.py.
From these two files we can see that the full pipeline -- data loading, preprocessing, model loading, inference, result postprocessing -- is already in place. All that is needed is to point it at a new default config file.
In the subclass file ultralytics/task_bank/predict.py, only DEFAULT_CFG has to change:
from ultralytics.engine.predictor import BasePredictor
from ultralytics.engine.results import Results
from ultralytics.utils import ops
from ultralytics.task_bank import DEFAULT_CFG # our own module, created later in this step
class BankDetectionPredictor(BasePredictor):
    """
    A class extending the BasePredictor class for prediction based on a detection model.
    Example:
        ```python
        from ultralytics.utils import ASSETS
        from ultralytics.task_bank.predict import BankDetectionPredictor
        args = dict(model='yolov8n.pt', source=ASSETS)
        predictor = BankDetectionPredictor(overrides=args)
        predictor.predict_cli()
        ```
    """
    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        """
        Initializes the BankDetectionPredictor class with the custom default configuration.
        Args:
            cfg (str | IterableSimpleNamespace, optional): Configuration file or namespace. Defaults to the bank-task DEFAULT_CFG.
            overrides (dict, optional): Configuration overrides. Defaults to None.
        """
        super().__init__(cfg=cfg, overrides=overrides, _callbacks=_callbacks)

    def postprocess(self, preds, img, orig_imgs):  # copied verbatim from the original subclass; no changes needed
        """Post-processes predictions and returns a list of Results objects."""
        preds = ops.non_max_suppression(
            preds,
            self.args.conf,
            self.args.iou,
            agnostic=self.args.agnostic_nms,
            max_det=self.args.max_det,
            classes=self.args.classes,
        )
        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)
        results = []
        for i, pred in enumerate(preds):
            orig_img = orig_imgs[i]
            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
            img_path = self.batch[0][i]
            results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
        return results
DEFAULT_CFG is adapted from the original ultralytics/utils/__init__.py.
The rewritten file is saved as ultralytics/task_bank/__init__.py:
from ultralytics.utils import yaml_load, IterableSimpleNamespace
from pathlib import Path
# Default configuration
FILE = Path(__file__).resolve()
ROOT = FILE.parents[1]  # the ultralytics package root
BANK_DEFAULT_CFG_PATH = ROOT / "cfg/bank_monitor/detect_predict.yaml"
DEFAULT_CFG_DICT = yaml_load(BANK_DEFAULT_CFG_PATH)
for k, v in DEFAULT_CFG_DICT.items():
    if isinstance(v, str) and v.lower() == "none":
        DEFAULT_CFG_DICT[k] = None
DEFAULT_CFG = IterableSimpleNamespace(**DEFAULT_CFG_DICT)
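A quick sanity check (a minimal sketch; run it with this repo on the Python path) confirms the YAML loaded and that empty or 'none' values were normalized to None:
from ultralytics.task_bank import DEFAULT_CFG
print(DEFAULT_CFG.task, DEFAULT_CFG.conf)  # detect 0.25
print(DEFAULT_CFG.model is None)  # True: model is supplied later via overrides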
5. Load models through the overrides argument
overrides takes a dict; the main keys to set are model (the model path) and classes (the classes to keep). The call returns res; the label names of the i-th result are in res[i].names, and the coordinates, labels, and confidences are in the res[i].boxes object.
from ultralytics.task_bank.predict import BankDetectionPredictor
import torch
overrides_1 = {"task": "detect",
               "mode": "predict",
               "model": r'path/to/weight1/yolov8m.pt',  # model 1: official pretrained model
               "verbose": False,  # do not print to the console
               "classes": [0]  # the COCO-pretrained model has 80 classes; keep only class 0 (person)
               }
overrides_2 = {"task": "detect",
               "mode": "predict",
               "model": r'path/to/weight2/best.pt',  # model 2: custom-trained model
               "verbose": False,
               "classes": [0, 1, 2, 3]  # custom-trained model; if omitted, all classes are predicted
               }
predictor_1 = BankDetectionPredictor(overrides=overrides_1)  # load the models
predictor_2 = BankDetectionPredictor(overrides=overrides_2)
predictors = [predictor_1, predictor_2]
img_path = r'path/to/img/img_00000.jpg'
p_1 = predictor_1(source=img_path)[0]  # the call returns a list; iterate over it when there are multiple images
p_2 = predictor_2(source=img_path)[0]
"""
Contents of the returned Results object:
boxes: ultralytics.engine.results.Boxes object # detected boxes
keypoints: None # unused for this task
masks: None # unused for this task
names: {0: 'person',..., 79: 'toothbrush'} # class labels
obb: None # unused for this task
orig_img: array([[[...]]], dtype=uint8) # original image
orig_shape: (586, 1216) # original image size
path: 'path/to/img/img_00000.jpg' # path of the source image
probs: None # unused for this task
save_dir: None # save path; nothing is saved if unset
speed: {'preprocess': , 'inference': , 'postprocess': } # time per stage (ms)
Contents of the Boxes object:
cls: tensor([2.,...], device='cuda:0')
conf: tensor([0.9186,...], device='cuda:0')
data: tensor([[7.3435e+02,...]], device='cuda:0') # xyxy, confs, cls
id: None
is_track: False
orig_shape: (586, 1216)
shape: torch.Size([7, 6])
xywh: tensor([[816.2213, ...]], device='cuda:0')
xywhn: tensor([[0.6712, ...]], device='cuda:0')
xyxy: tensor([[ 734.3535, ...]], device='cuda:0')
xyxyn: tensor([[0.6039,...]], device='cuda:0')
"""
6. Remap labels and fuse multi-model predictions
Why: the class IDs of different models all start from 0. Class 0 of the official COCO-pretrained model is 'person', and a custom-trained model also has its own class 0, so the IDs collide. We therefore remap the labels:
# label IDs and names used by the official pretrained model
names:
0: person
1: bicycle
2: car
...
79: toothbrush
# label IDs and names used by the custom-trained model
names:
0: ycj
1: kx
2: kx_dk
3: money
# desired label IDs and names after merging the models
class_name:
0: ycj
1: kx
2: kx_dk
3: money
4: person
Approach: each model has a class-label dict {int: str}, the merged target label set is a dict {int_new: str}, and a model's predictions are a list of ints. So, for each predicted int, look up its str in the source model's label dict, find the same str in the target label dict, and replace the int with int_new.
class_name_num_str = {  # in practice this dict is read from a config file
    0: 'ycj',
    1: 'kx',
    2: 'kx_dk',
    3: 'money',
    4: 'person'
}
def transform_and_concat_tensors(tensor_list, k1_v1_dict_list, k2_v2_dict):
    def transform_tensor(tensor, k1_v1_dict):
        original_dtype = tensor.dtype  # keep the input tensor's dtype
        original_device = tensor.device  # keep the input tensor's device
        v1_to_k2 = {v: k for k, v in k2_v2_dict.items()}  # invert the target dict: class name -> new ID
        transformed_list = []  # collect the remapped IDs
        for value in tensor:
            v1 = k1_v1_dict[value.item()]  # class name in the source model's label dict
            if v1 not in v1_to_k2:  # the name is missing from the merged label set
                raise ValueError(f"label [{v1}] from model not found in new labels.")
            k2 = v1_to_k2[v1]
            transformed_list.append(k2)
        transformed_tensor = torch.tensor(transformed_list, dtype=original_dtype, device=original_device)
        return transformed_tensor
    transformed_tensors = [transform_tensor(tensor, k1_v1_dict) for tensor, k1_v1_dict in
                           zip(tensor_list, k1_v1_dict_list)]  # remap each model's cls tensor
    result_tensor = torch.cat(transformed_tensors, dim=0)  # concatenate the remapped tensors along dim=0
    return result_tensor
tensor_list = [p_1.boxes.cls, p_2.boxes.cls]  # cls tensors from the different models
k1_v1_dict_list = [p_1.names, p_2.names]  # label dicts of the different models
res = transform_and_concat_tensors(tensor_list, k1_v1_dict_list, class_name_num_str)
print(res)
After the change, class 0 in the official pretrained model's predictions is remapped to the new class 4.
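Going one step further, the remapped IDs can be written back into the box tensors and both models' detections wrapped into one object; a minimal sketch, assuming the Results(...) signature used in postprocess() above and the Boxes.data column layout (xyxy, conf, cls) shown in step 5:
import torch
from ultralytics.engine.results import Results

data_1 = p_1.boxes.data.clone()  # columns: x1, y1, x2, y2, conf, cls
data_2 = p_2.boxes.data.clone()
data_1[:, 5] = transform_and_concat_tensors([p_1.boxes.cls], [p_1.names], class_name_num_str)
data_2[:, 5] = transform_and_concat_tensors([p_2.boxes.cls], [p_2.names], class_name_num_str)
fused = Results(p_1.orig_img, path=p_1.path, names=class_name_num_str,
                boxes=torch.cat([data_1, data_2], dim=0))  # one Results object with the merged label set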