DataWhale AI Summer Camp "Grand Canal Cup": Advanced Ideas for the Urban Governance Task - CSDN Blog

Article link: https://blog.csdn.net/qq_51309289/article/details/141683944

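After the previous Baseline run, the results were unsatisfactory: mAP50 and mAP50-95 were both low, and accuracy stopped improving even after many epochs of training, so the final score was poor (see the figure below).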

After some study and thought, I am now trying to improve the model's accuracy in two ways.

1. Use more data for training and validation

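The Baseline code uses 5 videos as the training set and 3 videos as the validation set, but more data than that is available for training and validation. Running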

len(train_videos)

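shows the total number of videos in the videos list. A training-to-validation split of roughly 8:2 is generally reasonable, so I use 40 videos for training and 12 videos for validation.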
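As a rough illustration of the 8:2 rule, the sketch below splits a list by ratio instead of by fixed counts. The helper `split_by_ratio` and the placeholder file names are hypothetical and not part of the Baseline. One thing worth checking with fixed slices such as `[:40]` and `[-12:]` is that the list holds at least 52 entries; otherwise the two slices overlap and some validation videos are also used for training.

```python
def split_by_ratio(items, train_ratio=0.8):
    """Split a list into (train, val) without overlap, keeping the original order."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    n_train = int(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# Hypothetical stand-in for the train_videos list built in the Baseline.
videos = [f"video_{i:02d}.mp4" for i in range(52)]
train_part, val_part = split_by_ratio(videos)
print(len(train_part), len(val_part))
```

Because the second slice starts exactly where the first one ends, the two parts can never share a video, whatever the length of the list.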

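import cv2
import pandas as pd

# train_annos, train_videos and category_labels are defined earlier in the Baseline.
# Training set: the first 40 videos.
for anno_path, video_path in zip(train_annos[:40], train_videos[:40]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        img_height, img_width = frame.shape[:2]

        # Annotations for the current frame; every frame is saved as an image.
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        # Write a YOLO label file only when the frame has annotations.
        if len(frame_anno) != 0:
            with open('./yolo-dataset/train/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)

                    # Convert the pixel-space corner box to normalized (x_center, y_center, width, height).
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height

                    # Debug check: a normalized center above 1 means the box lies outside the image.
                    if x_center > 1:
                        print(bbox)
                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')

        frame_idx += 1
    # Release the video handle before opening the next file.
    cap.release()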

# Validation set: the last 12 videos.
for anno_path, video_path in zip(train_annos[-12:], train_videos[-12:]):
    print(video_path)
    anno_df = pd.read_json(anno_path)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        img_height, img_width = frame.shape[:2]

        # Annotations for the current frame; every frame is saved as an image.
        frame_anno = anno_df[anno_df['frame_id'] == frame_idx]
        cv2.imwrite('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.jpg', frame)

        # Write a YOLO label file only when the frame has annotations.
        if len(frame_anno) != 0:
            with open('./yolo-dataset/val/' + anno_path.split('/')[-1][:-5] + '_' + str(frame_idx) + '.txt', 'w') as up:
                for category, bbox in zip(frame_anno['category'].values, frame_anno['bbox'].values):
                    category_idx = category_labels.index(category)

                    # Convert the pixel-space corner box to normalized (x_center, y_center, width, height).
                    x_min, y_min, x_max, y_max = bbox
                    x_center = (x_min + x_max) / 2 / img_width
                    y_center = (y_min + y_max) / 2 / img_height
                    width = (x_max - x_min) / img_width
                    height = (y_max - y_min) / img_height

                    up.write(f'{category_idx} {x_center} {y_center} {width} {height}\n')

        frame_idx += 1
    # Release the video handle before opening the next file.
    cap.release()
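The box conversion that both loops perform can be isolated into a small function, which makes it easier to check by hand. This is only a sketch: the name `bbox_to_yolo` is hypothetical, the 1920x1080 frame size is an example value, and the clipping step is an added assumption that the original loops do not perform.

```python
def bbox_to_yolo(bbox, img_width, img_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to
    normalized YOLO format (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = bbox
    # Assumption (not in the original code): clip boxes that extend past the image border.
    x_min, x_max = max(0, min(x_min, img_width)), max(0, min(x_max, img_width))
    y_min, y_max = max(0, min(y_min, img_height)), max(0, min(y_max, img_height))
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return x_center, y_center, width, height


# A box covering the central half of a 1920x1080 frame.
print(bbox_to_yolo([480, 270, 1440, 810], 1920, 1080))  # (0.5, 0.5, 0.5, 0.5)
```

With clipping in place, all four values stay within [0, 1], so the `x_center > 1` debug print in the training loop would no longer be triggered.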

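Note that the larger dataset needs much more disk space: I recommend at least 70 GB for the frames extracted from the videos. When running on a GPU cloud platform, put the data under the /data directory (Houde Cloud) or the hy-tmp directory (Hengyuan Cloud) so that the disk does not fill up.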
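Before extracting frames, it may help to check how much space is left on the target disk. The sketch below uses the standard-library `shutil.disk_usage`; the helper names are hypothetical, and the 70 GB default simply mirrors the recommendation above. Replace `"."` with the directory where the frames will be written.

```python
import shutil


def free_space_gb(path="."):
    """Return the free space, in GiB, of the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_space(path=".", required_gb=70):
    """Return True when at least required_gb GiB are free at path."""
    return free_space_gb(path) >= required_gb


print(f"{free_space_gb('.'):.1f} GiB free")
if not has_enough_space("."):
    print("Less than 70 GiB free: consider a larger data disk before extracting frames.")
```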

2. Switch to different pretrained weights

YOLOv8 provides Detection pretrained weights in five sizes to suit different training scenarios and environments: YOLOv8n (Nano), YOLOv8s (Small), YOLOv8m (Medium), YOLOv8l (Large), and YOLOv8x (Extra Large).

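The columns of the model comparison table have the following meanings.
Size (pixels): the input image resolution that the model processes.
mAP (50-95): the mean average precision, averaged over IoU (intersection over union) thresholds from 0.50 to 0.95. It measures the accuracy of the model on the detection task; higher is better.
Speed CPU ONNX (ms): the average time in milliseconds to process one image on a CPU with the model in ONNX format; lower is faster.
Speed A100 TensorRT (ms): the average time in milliseconds to process one image on an NVIDIA A100 GPU with the TensorRT deep learning optimizer.
Params (M): the number of model parameters, in millions. More parameters usually mean a more complex model, which may also need more data and compute.
FLOPs (B): the number of floating-point operations the model performs, in billions. It measures computational complexity; a higher value means the model needs more compute.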
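To make the mAP (50-95) column more concrete: a predicted box counts as a correct detection at a given threshold only if its IoU with the ground-truth box reaches that threshold, and the metric averages the average precision over ten thresholds. The sketch below computes the IoU of two example boxes and lists those thresholds. The boxes are made-up values and the function is an illustration, not the evaluation code that Ultralytics uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP (50-95): 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred, truth = (0, 0, 100, 100), (0, 0, 100, 82)  # made-up example boxes
overlap = iou(pred, truth)
matched = [t for t in thresholds if overlap >= t]
print(overlap, matched)
```

Here the IoU is 0.82, so the prediction counts as a match at the seven thresholds from 0.50 to 0.80 but not at 0.85 and above. This is why mAP (50-95) is always stricter than mAP50.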

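When choosing a YOLOv8 model for a detection task, weigh the task's accuracy and speed requirements, the available hardware, the model size and computational complexity, the budget, and the application scenario. After experimenting with several models and evaluating them on your dataset, pick the one that meets the performance requirements while staying within the budget and resource limits.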

For training, simply download the weights you need and initialize the model with them, for example:

!wget -q http://mirror.coggle.club/yolo/yolov8n-v8.2.0.pt -O yolov8n.pt
!wget -q http://mirror.coggle.club/yolo/yolov8s-v8.2.0.pt -O yolov8s.pt
!wget -q http://mirror.coggle.club/yolo/yolov8m-v8.2.0.pt -O yolov8m.pt

!mkdir -p ~/.config/Ultralytics/
!wget -q http://mirror.coggle.club/yolo/Arial.ttf -O ~/.config/Ultralytics/Arial.ttf
!wget -q http://mirror.coggle.club/yolo/Arial.Unicode.ttf -O ~/.config/Ultralytics/Arial.Unicode.ttf

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import warnings
warnings.filterwarnings('ignore')

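from ultralytics import YOLO
# Initialize from the Medium weights downloaded above; swap the file name to try another size.
model = YOLO("yolov8m.pt")
# data must point to the yolo.yaml that describes the extracted train/val frames.
results = model.train(data="fan-yolo-dataset/yolo.yaml", epochs=30, batch=16)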