Training YOLOv8 on Your Own Dataset: A Simple, No-Brainer Version

徐凤年20040902

Last modified 2024-09-17 15:03:30

First published 2024-09-17 14:56:32

Article link: https://blog.csdn.net/2301_80191741/article/details/142314745

I. Environment Setup

1. Downloading the YOLOv8 Files

1) Download the YOLOv8 source files

https://github.com/wudashuo/yolov8

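Note: the files from this link include the environment configuration file (requirements.txt). The latest official v8 files no longer ship it, so you would have to download or create it yourself.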

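2) Download the weights (yolov8*.pt)

YOLOv8 provides weights for different tasks:

Object detection: choose yolov8n.pt, yolov8s.pt, etc.

Segmentation: choose yolov8n-seg.pt, yolov8s-seg.pt, etc.

Classification: choose yolov8n-cls.pt, yolov8s-cls.pt, etc.

YOLOv8 comes in five model sizes, from smallest to largest: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x.

To download, open README.zh-CN.md in the yolov8-main folder, go to the models section, and click the blue hyperlinked model name.

You can also download a different model by changing the file name at the end of the link below.

Direct download: yolov8x.pt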
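The naming scheme above (size letter after "8", plus a task suffix) can be captured in a few lines. This is only an illustration: weight_filename is a hypothetical helper, not part of Ultralytics, and it just reproduces the file names listed in this section.

```python
# Hypothetical helper: build a YOLOv8 weight file name from model size and task.
SIZES = ("n", "s", "m", "l", "x")  # smallest to largest
TASK_SUFFIX = {"detect": "", "segment": "-seg", "classify": "-cls"}


def weight_filename(size, task="detect"):
    """Return e.g. 'yolov8n.pt', 'yolov8s-seg.pt' or 'yolov8m-cls.pt'."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    if task not in TASK_SUFFIX:
        raise ValueError(f"task must be one of {tuple(TASK_SUFFIX)}, got {task!r}")
    return f"yolov8{size}{TASK_SUFFIX[task]}.pt"


if __name__ == "__main__":
    for task in TASK_SUFFIX:
        print(task, [weight_filename(s, task) for s in SIZES])
```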

2. Environment Configuration (see README.zh-CN.md in the source files)

1) Create an environment with Anaconda (requires Python>=3.7)

conda create -n yolov8 python==3.10

2) Activate the environment and go to the project root

(base) C:\Users\wwd>conda activate yolov8
(yolov8) C:\Users\wwd>E:
(yolov8) E:\>cd yolo/yolov8-main
(yolov8) E:\yolo\yolov8-main>

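3) Install the dependencies from the Tsinghua PyPI mirror

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple

4) Install the GPU build of PyTorch

Check your CUDA version: preferably not lower than 11.8; newer versions are fine.

For a detailed tutorial, see the WeChat group file 安装Anaconda、PyTorch（GPU版）库.pdf ("Installing Anaconda and PyTorch (GPU)").

The link below lists the pip commands for installing older PyTorch versions. Open it, press Ctrl+F, search for the torch 2.0.0 entry for CUDA 11.8 (cu118), copy the install command and run it in the activated environment (the download can take around two hours).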

https://pytorch.org/get-started/previous-versions/
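After installing, a quick check like the one below shows whether a GPU build of PyTorch is active. This is a minimal sketch: check_torch_gpu is a hypothetical helper, and it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name(). With the setup above you would expect a version like 2.0.0+cu118, a CUDA build of 11.8 and cuda_available True; False means training would run on the CPU.

```python
import importlib.util


def check_torch_gpu():
    """Describe the installed PyTorch build, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    cuda_ok = torch.cuda.is_available()
    return {
        "torch": torch.__version__,        # e.g. '2.0.0+cu118'
        "cuda_build": torch.version.cuda,  # CUDA version of the wheel; None for CPU-only builds
        "cuda_available": cuda_ok,         # False means training falls back to the CPU
        "gpu": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


if __name__ == "__main__":
    print(check_torch_gpu())
```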

5) Open the yolov8-main folder in PyCharm and select the yolov8 environment as the project interpreter.

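II. Computer Settings (if the training dataset is large)

1. Increase the Paging File (Virtual Memory) Size

If training fails because the paging file is too small (see V.11), enlarging the virtual memory (paging file) solves it. Steps:

Right-click "This PC" (or "My Computer") and select "Properties".
Click "Advanced system settings".
In the "Performance" section, click "Settings".
In the window that opens, select the "Advanced" tab.
In the "Virtual memory" section, click "Change".
Uncheck "Automatically manage paging file size for all drives".
Select your system drive (usually C:), then select "Custom size".
Set Initial size and Maximum size to larger values; the recommendation is 1.5 times the physical memory for the initial size and 3 times the physical memory for the maximum size.

For example, with 16 GB of physical memory, set the initial size to 24000 MB (24 GB) and the maximum size to 48000 MB (48 GB).

When done, click "OK" and restart the computer for the change to take effect.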
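The 1.5x / 3x rule above is simple arithmetic; the sketch below computes the two values. recommended_pagefile_mb is a hypothetical helper. It uses 1024 MB per GB by default, so for 16 GB it gives slightly larger numbers than the rounded 24000/48000 MB in the example; passing mb_per_gb=1000 reproduces those.

```python
def recommended_pagefile_mb(ram_gb, mb_per_gb=1024):
    """Return (initial_mb, maximum_mb) = 1.5x and 3x the physical RAM, in MB."""
    initial_mb = round(ram_gb * 1.5 * mb_per_gb)
    maximum_mb = round(ram_gb * 3 * mb_per_gb)
    return initial_mb, maximum_mb


if __name__ == "__main__":
    print(recommended_pagefile_mb(16))                  # (24576, 49152)
    print(recommended_pagefile_mb(16, mb_per_gb=1000))  # (24000, 48000), the figures in the example
```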

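III. Data Structure

1. Data Structure

Put the prepared dataset into the yolov8-main folder with this layout:

dataset/
├── images/           # all training and validation images
│   ├── train/        # training images
│   └── val/          # validation images
└── labels/           # label files matching the images
    ├── train/        # training labels
    └── val/          # validation labels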
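If you are starting from scratch, the folder skeleton above can be created with a few lines of Python. A minimal sketch, assuming the dataset root name used here; make_dataset_dirs is a hypothetical helper.

```python
from pathlib import Path


def make_dataset_dirs(root="dataset"):
    """Create <root>/images/{train,val} and <root>/labels/{train,val}; return the paths."""
    created = []
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            folder = Path(root) / kind / split
            folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
            created.append(folder)
    return created


if __name__ == "__main__":
    for folder in make_dataset_dirs("dataset"):
        print(folder)
```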

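Image files: JPEG/PNG format.

Label files: every image has a .txt file with the same name, containing the object classes and bounding-box coordinates, one object per line, in the format:

class_id x_center y_center width height

where x_center, y_center, width and height are normalized to [0, 1]: x values are divided by the image width and y values by the image height.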
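To see what the normalized values mean, this sketch converts one label line back to pixel corner coordinates, i.e. the inverse of the normalization script in III.2. yolo_to_pixel_box is a hypothetical helper, and it assumes a detection label with exactly five fields.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert 'class_id x_center y_center width height' (normalized) to
    (class_id, xmin, ymin, xmax, ymax) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 values, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    # Undo the normalization: x values scale with the width, y values with the height
    xmin = (x_c - w / 2) * img_w
    ymin = (y_c - h / 2) * img_h
    xmax = (x_c + w / 2) * img_w
    ymax = (y_c + h / 2) * img_h
    return class_id, xmin, ymin, xmax, ymax


if __name__ == "__main__":
    # Class 3, centered in a 640x480 image, a quarter of the width and half the height
    print(yolo_to_pixel_box("3 0.5 0.5 0.25 0.5", 640, 480))  # (3, 240.0, 120.0, 400.0, 360.0)
```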

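2. Data Processing

Normalizing labels: the script below assumes the raw labels are in pixels as class_id xmin ymin xmax ymax and that the images are .jpg. It writes the normalized files to dataset/labels/trains; since YOLOv8 reads labels from dataset/labels/train, copy the results over the originals once you have checked them.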

import os

import cv2

# Label and image directories
label_dir = 'dataset/labels/train'    # un-normalized labels: class_id xmin ymin xmax ymax (pixels)
image_dir = 'dataset/images/train'    # image directory
output_dir = 'dataset/labels/trains'  # output for normalized labels (separate, so the originals are kept)


def get_image_size(image_path):
    """Return the (width, height) of an image."""
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising for unreadable files
        raise ValueError(f"Cannot read image: {image_path}")
    return image.shape[1], image.shape[0]  # shape is (height, width, channels)


def normalize_label(label_path, image_path, output_path):
    """Convert pixel corner boxes to normalized YOLO boxes and write them to output_path."""
    image_width, image_height = get_image_size(image_path)
    with open(label_path, 'r') as f:
        lines = f.readlines()
    normalized_lines = []
    for line in lines:
        data = line.strip().split()
        if not data:  # skip blank lines
            continue
        class_id, xmin, ymin, xmax, ymax = map(float, data)
        # Compute the box center and size, normalized by the image size
        x_center = (xmin + xmax) / 2 / image_width
        y_center = (ymin + ymax) / 2 / image_height
        width = (xmax - xmin) / image_width
        height = (ymax - ymin) / image_height
        # Format as a YOLO label line
        normalized_lines.append(f"{int(class_id)} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
    # Write the new, normalized label file
    with open(output_path, 'w') as f:
        f.writelines(normalized_lines)


# Batch-process all label files
os.makedirs(output_dir, exist_ok=True)
for label_file in os.listdir(label_dir):
    if label_file.endswith('.txt'):  # assumes labels are .txt files
        label_path = os.path.join(label_dir, label_file)
        stem = os.path.splitext(label_file)[0]
        image_path = os.path.join(image_dir, stem + '.jpg')  # assumes images are .jpg
        output_path = os.path.join(output_dir, label_file)
        if os.path.exists(image_path):
            normalize_label(label_path, image_path, output_path)
            print(f"Processed: {label_file}")
        else:
            print(f"Image not found for {label_file}")

IV. Writing the Python Code

1. Configuration File

In the dataset folder, create a file named dataset.yaml with the following content:

train: E:/yolo/yolov8-main/dataset/images/train  # training set path
val: E:/yolo/yolov8-main/dataset/images/val      # validation set path

nc: 9 # number of classes
# class names; the list index is the class_id used in the label files
names: ['bus','traffic light','traffic sign','person','bike','truck','motor','car','rider']
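Because nc must equal the number of entries in names, it can help to generate the file instead of keeping both in sync by hand. A sketch only: dataset_yaml is a hypothetical helper that emits the same keys as the file above, and it assumes the class names contain no single quotes.

```python
def dataset_yaml(train, val, names):
    """Return dataset.yaml text with nc derived from names (names must not contain single quotes)."""
    names_list = ", ".join(f"'{n}'" for n in names)
    return (
        f"train: {train}  # training images\n"
        f"val: {val}  # validation images\n"
        "\n"
        f"nc: {len(names)}  # number of classes\n"
        f"names: [{names_list}]  # index = class_id in the label files\n"
    )


if __name__ == "__main__":
    classes = ['bus', 'traffic light', 'traffic sign', 'person', 'bike', 'truck', 'motor', 'car', 'rider']
    text = dataset_yaml("E:/yolo/yolov8-main/dataset/images/train",
                        "E:/yolo/yolov8-main/dataset/images/val", classes)
    print(text)  # save it as dataset/dataset.yaml once the output looks right
```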

Put the downloaded weights at: ultralytics/yolo/v8/detect/yolov8x.pt

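2. Training Code

Create a new Python file in the yolov8-main folder with the following code:

For the full list of training parameters, see yolov8-main\ultralytics\yolo\cfg\default.yaml (newer ultralytics releases keep it at ultralytics/cfg/default.yaml).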

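if __name__ == '__main__':  # required on Windows: the data loader starts worker processes (see V.9)

    from ultralytics import YOLO

    # Load a YOLOv8 model (pick a pretrained model such as yolov8n.pt, yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt)
    model = YOLO('ultralytics/yolo/v8/detect/yolov8x.pt')  # you can choose another weight file

    # Train the model with custom training parameters
    model.train(
        data='dataset/dataset.yaml',  # path to the dataset config
        epochs=100,                   # train for 100 epochs
        imgsz=720,                    # input image size; should be a multiple of 32 (Ultralytics rounds 720 up to 736)
        batch=16,                     # batch size 16; keep it small if GPU memory is limited
        workers=4,                    # number of data-loading worker processes
        optimizer='AdamW',            # use the AdamW optimizer
        lr0=0.0001,                   # initial learning rate
        weight_decay=0.0001,          # weight decay coefficient
        patience=100,                 # high patience to avoid stopping early (with epochs=100, early stopping is effectively off)
        device='0'                    # train on GPU 0
    )
    print("Training finished!")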
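On imgsz: the largest stride of a YOLOv8 model is 32, so the input size should be a multiple of 32. As far as I know, Ultralytics rounds a non-multiple up and prints a warning, so imgsz=720 actually trains at 736. The sketch below reproduces that rounding; round_up_to_stride is a hypothetical helper, not Ultralytics code.

```python
import math


def round_up_to_stride(imgsz, stride=32):
    """Round an image size up to the nearest multiple of the model stride."""
    return math.ceil(imgsz / stride) * stride


if __name__ == "__main__":
    for size in (512, 640, 720):
        print(size, "->", round_up_to_stride(size))  # 512 -> 512, 640 -> 640, 720 -> 736
```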

3. Understanding the Metrics Shown During Training

1) Parameters printed when training starts:

Ultralytics YOLOv8.0.49  Python-3.10.0 torch-2.0.0+cu118 CUDA:0 (NVIDIA GeForce RTX 4080, 16376MiB)
yolo\engine\trainer: task=detect, mode=train, model=ultralytics/yolo/v8/detect/yolov8x.pt, data=dataset/dataset.yaml, epochs=100, patience=100, batch=16, imgsz=720…………

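Ultralytics YOLOv8.0.49: the YOLOv8 (ultralytics) version you are using.

Python-3.10.0: your Python version.

torch-2.0.0+cu118: the PyTorch version; +cu118 means it is a GPU build for CUDA 11.8.

CUDA:0 (NVIDIA GeForce RTX 4080, 16376MiB): the GPU used for training (device 0), with its model name and total memory.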

2) Training progress output:

Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     11/100         0G       1.43      1.161      1.015        126        640: 100%|██████████| 32/32 [01:04<00:00,  2.02s/it]

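Epoch: the current epoch out of the total.

GPU_mem: GPU memory in use; 0G means nothing is allocated on the GPU, for example when training on the CPU.

box_loss: bounding box loss, the error between the predicted boxes and the ground-truth boxes. Lower means the predicted boxes are closer to the real ones.

cls_loss: classification loss, the gap between the predicted and the true classes. Lower means more accurate classification.

dfl_loss: Distribution Focal Loss, a bounding-box regression loss (from Generalized Focal Loss, adopted by YOLOv8) that models each box edge as a distribution, which improves localization accuracy.

Instances: the number of objects (labeled instances) in the images of the current batch; 126 here.

Size: the input image size.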

3) Validation output:

 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:06<00:00,  1.07it/s]
                   all        200       3916      0.389      0.216      0.206      0.111

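Class: the class; all means the metrics are computed over all classes.

Images: the number of validation images used; 200 images here.

Instances: the number of objects in those images; the validation set contains 3916 objects in total.

Box(P): bounding-box precision (P), the fraction of predicted boxes that are true positives. The closer to 1, the fewer false positives. Here precision is 0.389, i.e. about 38.9% of the predictions are correct.

R: recall (R), the fraction of ground-truth objects that the model detects correctly. The closer to 1, the higher the recall. Here recall is 0.216, i.e. about 21.6% of the real objects are detected.

mAP50: mean average precision (mAP) at an IoU threshold of 0.5, averaged over classes; higher is better. Here mAP50 is 0.206, about 20.6%.

mAP50-95: mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. It measures how well the model does at stricter IoU thresholds; here it is 0.111, about 11.1%.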
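The metrics above rest on IoU (intersection over union) between a predicted box and a ground-truth box, plus true/false positive counts. The sketch below shows these building blocks only; it is a simplification, since Ultralytics computes AP per class from the full precision-recall curve. iou, precision_recall and IOU_THRESHOLDS are hypothetical names.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# A prediction counts as a true positive at threshold t if its IoU with a matching
# ground-truth box of the same class is >= t. mAP50 uses t = 0.5; mAP50-95 averages
# the mAP over these ten thresholds.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 1/7
    print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
    print(IOU_THRESHOLDS)
```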

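4. Testing Code

After training, a runs folder is created; the best.pt file in it holds the trained weights.

Path: yolov8-main\runs\detect\train\weights\best.pt

Note: every training run creates a new folder under yolov8-main\runs\detect\: the first run goes to train, the second to train2, and so on.

from ultralytics import YOLO

# Load the trained model (raw string, so the backslashes in the Windows path are not escape sequences)
model = YOLO(r'E:\yolo\yolov8-main\runs\detect\train\weights\best.pt')

# Run prediction on the test images
results = model.predict('path/to/your/test/images')  # path to the test set

# Print the results: one Results object per image
for r in results:
    print(r.path)                     # image file
    print(r.boxes.xyxy)               # boxes as (xmin, ymin, xmax, ymax) in pixels
    print(r.boxes.cls, r.boxes.conf)  # class ids and confidence scores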

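V. Common Problems and Solutions

Note: the error outputs are long, so only part of each is shown here. You can paste the error into GPT to find out the cause, then check the solutions in these notes or ask GPT for a fix.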

1. Weight (.pt) file cannot be loaded or is corrupted

Error output:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

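1) Download the correct detection weights, e.g. yolov8n.pt (the letter after 8 gives the model size).

2) Check that the path to the weights is correct.

3) Check that the file was downloaded completely.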

2. Labels do not meet the requirements (txt files)

Error output:

train: WARNING ⚠️ E:\yolo\DS\dataset\images\train\train_A_1539.jpg: ignoring corrupt image/label: non-normalized or out of bounds coordinates

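1) Check that the label txt files are normalized. Each line is one object; the first value is the class id and is not normalized (for batch normalization, see III.2 Data Processing).

2) Check the coordinates for errors, e.g. boxes extending beyond the image.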
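A quick way to find the offending files before training is to scan the labels yourself. The sketch below flags lines without five values, class ids outside 0..nc-1, and coordinates outside [0, 1]. check_label_lines and check_label_dir are hypothetical helpers that only cover detection labels; Ultralytics' own checks may differ in detail.

```python
from pathlib import Path


def check_label_lines(lines, num_classes):
    """Return a list of problems found in the lines of one YOLO detection label file."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank lines are ignored
        if len(parts) != 5:
            problems.append(f"line {lineno}: expected 5 values, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            coords = [float(v) for v in parts[1:]]
        except ValueError:
            problems.append(f"line {lineno}: could not parse {line.strip()!r}")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {lineno}: class_id {class_id} not in 0..{num_classes - 1}")
        if any(not 0.0 <= c <= 1.0 for c in coords):
            problems.append(f"line {lineno}: coordinates not normalized to [0, 1]: {coords}")
    return problems


def check_label_dir(label_dir, num_classes):
    """Map each .txt file in label_dir that has problems to its list of problems."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        problems = check_label_lines(path.read_text().splitlines(), num_classes)
        if problems:
            report[path.name] = problems
    return report


if __name__ == "__main__":
    # nc = 9 as in the dataset.yaml example
    for name, problems in check_label_dir("dataset/labels/train", num_classes=9).items():
        print(name, problems)
```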

3. Images not found

Error output:

WARNING ⚠️ No images found in E:\yolo\DS\dataset\labels\train.cache, training may not work correctly. See https://docs.ultralytics.com/datasets for dataset formatting guidance.
Traceback (most recent call last):

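1) Check that the folder layout matches III.1 Data Structure.

2) Check that image and label file names correspond, e.g. image train_01.jpg and label train_01.txt.

3) Check the configuration: make sure the paths in dataset/dataset.yaml are correct.

4) Delete the .cache files: YOLOv8 writes .cache files into the dataset/labels folder to cache information about the dataset. If the dataset has changed (images added, labels edited, etc.), delete the .cache files and run training again.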
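Checks 2) and 4) can be scripted. A minimal sketch, assuming the folder layout from III.1 and .jpg/.jpeg/.png images; unmatched_stems, check_split and remove_label_caches are hypothetical helpers. Note that an image without a label file is not necessarily an error: Ultralytics treats it as a background image.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # image formats from III.1


def unmatched_stems(image_names, label_names):
    """Return (images without a label, labels without an image), matched by name without extension."""
    images = {Path(n).stem for n in image_names if Path(n).suffix.lower() in IMAGE_EXTS}
    labels = {Path(n).stem for n in label_names if Path(n).suffix == ".txt"}
    return sorted(images - labels), sorted(labels - images)


def check_split(root, split):
    """Compare <root>/images/<split> with <root>/labels/<split>."""
    image_names = [p.name for p in (Path(root) / "images" / split).iterdir()]
    label_names = [p.name for p in (Path(root) / "labels" / split).iterdir()]
    return unmatched_stems(image_names, label_names)


def remove_label_caches(labels_root):
    """Delete the *.cache files directly under labels_root and return their names."""
    removed = []
    for cache in Path(labels_root).glob("*.cache"):
        cache.unlink()
        removed.append(cache.name)
    return removed


if __name__ == "__main__" and Path("dataset").is_dir():
    for split in ("train", "val"):
        no_label, no_image = check_split("dataset", split)
        print(split, "images without labels:", no_label[:10])
        print(split, "labels without images:", no_image[:10])
    print("removed:", remove_label_caches("dataset/labels"))
```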

4. Network proxy issues

Error output:

urllib3.exceptions.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:997)

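1) Turn off the VPN.

2) Check your proxy settings.

Note: the VPN must be off when you start the script; it can be turned on once training is running.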

5. torch.load warning

Warning output:

FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly.

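1) Downgrade PyTorch (version 2.0.0 works). The message is only a warning and does not stop training: it announces that a future PyTorch release will change the default of weights_only to True, meaning torch.load will only load model weights and will not execute arbitrary code. The change improves security by avoiding potential code-execution risks.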

6. `weights_only` argument errors

Error output:

TypeError: YOLO.__init__() got an unexpected keyword argument 'weights_only'

or

WeightsUnpickler error: Unsupported global: GLOBAL ultralytics.nn.tasks.DetectionModel was not an allowed global by default. Please use torch.serialization.add_safe_globals([DetectionModel]) to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

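1) Do not pass a weights_only argument yourself: YOLO() does not accept it, and setting it with newer torch versions causes many errors. The WeightsUnpickler error comes from torch.load running with weights_only=True (the default from PyTorch 2.6 on) on a full YOLOv8 checkpoint; the PyTorch 2.0.0 used in this guide does not have this default.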

7. torchvision::nms error

Error output:

NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

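1) This is a torch/torchvision compatibility problem: typically the installed torchvision is a CPU-only build or does not match the installed torch. Pick matching versions as described in 安装Anaconda、PyTorch（GPU版）库.pdf (torch 2.0.0+cu118 pairs with torchvision 0.15.1+cu118); preferably do not go above torch 2.0.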

8. Validation labels not found

Error output:

 raise FileNotFoundError(f'{self.prefix}No labels found in {cache_path}, can not start training. {HELP_URL}')
FileNotFoundError: val: No labels found in E:\yolo\yolov8-main\dataset\labels\val.cache, can not start training. See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

1) Check that dataset/labels/val contains a complete set of label files.

9. Windows multiprocessing error

Error output:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

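1) This error usually appears in code that uses multiprocessing, especially on Windows, which does not start child processes with fork as Unix systems do. On Windows, child processes must be started inside an if __name__ == '__main__': block so that the main process finishes its initialization before spawning them. YOLOv8's data loader starts worker processes, so put the training code inside this block (see IV.2 Training Code).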

10. CUDA out of memory

Error output:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.47 GiB. GPU 0 has a total capacity of 15.99 GiB of which 0 bytes is free. Of the allocated memory 26.81 GiB is allocated by PyTorch, and 1.90 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

1) Reduce the batch size (batch); this effectively lowers the GPU memory needed per training step.

2) Free GPU memory manually.
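For step 2), a common PyTorch pattern is to drop references and then call torch.cuda.empty_cache(). A sketch, assuming it runs in the same Python process that holds the memory (for example after del model in a notebook); free_cuda_memory is a hypothetical helper. It only returns cached blocks that no live tensor uses, so it does not help when a single batch is simply too large; in that case reduce batch.

```python
import gc
import importlib.util


def free_cuda_memory():
    """Collect garbage and release PyTorch's cached, unused GPU memory.

    Returns True if a CUDA device was available and the cache was emptied.
    """
    gc.collect()  # frees unreachable objects, including tensors they still hold
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # returns cached blocks to the driver; live tensors are untouched
    return True


if __name__ == "__main__":
    print("cache emptied:", free_cuda_memory())
```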

11. Virtual memory too small

Error output:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\Anaconda\envs\yolo\lib\site-packages\torch\lib\cublas64_12.dll" or one of its dependencies.

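1) The paging file (virtual memory) is too small for the operation. In deep learning jobs that use a lot of RAM and GPU memory, Windows falls back on virtual memory (the paging file) when physical memory runs short; if the paging file is set too small, the system raises this error. See II. Computer Settings.

2) Use a smaller model such as yolov8n.pt.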
