Converting YOLO Training Datasets from VOC Format to YOLO Format: A Detailed Guide

电阻电容及电线

Last modified: 2023-10-20 23:34:36

Tags: YOLO, algorithms, deep learning

First published: 2023-09-16 16:24:33

Article link: https://blog.csdn.net/qq_29633789/article/details/132919820

1. Introduction to VOC Format Files and YOLO Format Files

To train with the YOLO family of algorithms, the dataset annotations first need to be converted from VOC format to YOLO format.

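A VOC file mainly contains the image name, the image size (height, width, channels), the object name, and the coordinates of the bounding box. The detailed content of a VOC file is as follows: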

VOC format label: the actual width and height of the image, and the top-left and bottom-right coordinates of the bounding box
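To make the field list above concrete, here is a minimal sketch of reading a VOC annotation with Python's standard `xml.etree.ElementTree` module. The XML snippet, the file name `example.jpg`, the class `class_1`, the helper name `read_voc`, and all numbers are hypothetical illustration values, not taken from a real dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC annotation; every value here is made up for illustration.
VOC_XML = """
<annotation>
    <filename>example.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>class_1</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>120</ymin>
            <xmax>300</xmax>
            <ymax>360</ymax>
        </bndbox>
    </object>
</annotation>
"""


def read_voc(xml_text):
    """Return (filename, (width, height, depth), [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    size = root.find('size')
    dims = tuple(int(size.find(tag).text) for tag in ('width', 'height', 'depth'))
    objects = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        coords = tuple(int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((obj.find('name').text,) + coords)
    return root.find('filename').text, dims, objects


filename, dims, objects = read_voc(VOC_XML)
print(filename, dims, objects)
```

The box is stored in absolute pixels as two corners, which is why the image width and height are needed later for normalization.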

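A YOLO file mainly contains the class label of each object (written as a class index), the center coordinates of the bounding box, and the width and height of the bounding box (all coordinate values are normalized). The figure below shows the detailed content of the YOLO file obtained by converting the VOC file above: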

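YOLO format label: the class label, the center coordinates of the bounding box (normalized), and the width and height of the bounding box (normalized)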
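As a small illustration of this layout, the sketch below splits one YOLO label line into its five fields. The helper name `parse_yolo_line` and the sample line are hypothetical; the only assumption taken from the article is the field order: class index, center x, center y, width, height.

```python
def parse_yolo_line(line):
    """Split one YOLO label line into (class_id, x_center, y_center, width, height)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f'expected 5 fields, got {len(parts)}')
    class_id = int(parts[0])
    x_center, y_center, box_w, box_h = (float(p) for p in parts[1:])
    return class_id, x_center, y_center, box_w, box_h


# Hypothetical label line: class 0, box centered in the image,
# covering half of the image width and half of its height.
label = parse_yolo_line('0 0.5 0.5 0.5 0.5')
print(label)

# The four geometric values are normalized, so each should lie in [0, 1].
in_range = all(0.0 <= v <= 1.0 for v in label[1:])
print(in_range)
```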

2. How the VOC-to-YOLO Conversion Works

Suppose the figure shows a photo in which the cyan-green region is the target and the blue region is the background.

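The VOC file describes the target by the image width and height together with the top-left and bottom-right corners of the bounding box.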

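To convert this to the YOLO file format, the values must be normalized by the image size. With image width $W$ and height $H$, top-left corner $(x_{min}, y_{min})$ and bottom-right corner $(x_{max}, y_{max})$, the conversion formulas (consistent with the code in Section 3) are:

Normalized center coordinates:

$$x = \frac{x_{min} + x_{max}}{2W}, \qquad y = \frac{y_{min} + y_{max}}{2H}$$

Normalized bounding box size:

$$w = \frac{x_{max} - x_{min}}{W}, \qquad h = \frac{y_{max} - y_{min}}{H}$$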

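After the conversion, the YOLO file describes the target by its class label, the normalized center coordinates, and the normalized width and height of the bounding box.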
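The normalization can be checked by hand with a small worked example. The sketch below uses the same arithmetic as the conversion script in Section 3; the function name `voc_to_yolo`, the 640x480 image size, and the box corners are hypothetical values chosen so the result is easy to verify.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to normalized (x_center, y_center, w, h)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    box_w = (xmax - xmin) / img_w
    box_h = (ymax - ymin) / img_h
    return x_center, y_center, box_w, box_h


# Hypothetical example: a 640x480 image with a box from (160, 120) to (480, 360).
# Center: (160 + 480) / 2 / 640 = 0.5 and (120 + 360) / 2 / 480 = 0.5
# Size:   (480 - 160) / 640 = 0.5     and (360 - 120) / 480 = 0.5
result = voc_to_yolo(160, 120, 480, 360, 640, 480)
print(result)  # (0.5, 0.5, 0.5, 0.5)
```

Because every value is divided by the image width or height, the result no longer depends on the image resolution.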

3. Code for Converting VOC Format Files to YOLO Format Files

Running the following code converts a training dataset in VOC format into a training dataset in YOLO format:

import os
import xml.etree.ElementTree as ET

# Define your own classes; list every class that appears in your dataset
classes = ['class_1', 'class_2', 'class_3']

# Define your own output directory
output_dir = 'yolo_format_dataset'

# Define your own input directory
input_dir = 'voc_dataset'

# Create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

# Loop through each VOC xml file in the input directory and convert it to YOLO format
for file in os.listdir(input_dir):
    if file.endswith('.xml'):
        file_path = os.path.join(input_dir, file)
        tree = ET.parse(file_path)
        root = tree.getroot()

        # Get the image size, which the conversion needs
        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)

        # Create the YOLO format file: same base name, .txt extension.
        # splitext only changes the extension, even if the name itself contains 'xml'.
        out_path = os.path.join(output_dir, os.path.splitext(file)[0] + '.txt')
        with open(out_path, 'w') as out_file:
            # Iterate over each object and write it to the YOLO format file
            for obj in root.iter('object'):
                cls = obj.find('name').text
                # Skip objects whose class is not in the list above
                if cls not in classes:
                    continue
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                # (xmin, ymin, xmax, ymax); float() also accepts values such as "12.0"
                b = tuple(float(xmlbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))

                # Normalized box width/height and center coordinates
                bbx_w = (b[2] - b[0]) / float(width)
                bbx_h = (b[3] - b[1]) / float(height)
                bbx_x = (b[0] + b[2]) / 2.0 / float(width)
                bbx_y = (b[1] + b[3]) / 2.0 / float(height)

                out_file.write(f'{cls_id} {bbx_x} {bbx_y} {bbx_w} {bbx_h}\n')
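One way to sanity-check the converted labels is to map a label line back to pixel coordinates and compare it with the original VOC box. The sketch below is not part of the script above; the function name `yolo_to_voc`, the sample label line, and the 640x480 image size are hypothetical.

```python
def yolo_to_voc(x_center, y_center, box_w, box_h, img_w, img_h):
    """Map a normalized YOLO box back to pixel-space (xmin, ymin, xmax, ymax)."""
    xmin = (x_center - box_w / 2.0) * img_w
    ymin = (y_center - box_h / 2.0) * img_h
    xmax = (x_center + box_w / 2.0) * img_w
    ymax = (y_center + box_h / 2.0) * img_h
    # Round to the nearest pixel, since VOC coordinates are usually integers.
    return tuple(round(v) for v in (xmin, ymin, xmax, ymax))


# Hypothetical label line as written by a conversion script, for a 640x480 image.
line = '0 0.3125 0.5 0.3125 0.5'
fields = line.split()
class_id = int(fields[0])
box = yolo_to_voc(*(float(v) for v in fields[1:]), 640, 480)
print(class_id, box)  # 0 (100, 120, 300, 360)
```

If the recovered corners match the `xmin`, `ymin`, `xmax`, `ymax` values in the source xml file, the conversion of that object can be considered consistent.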

See also:

https://blog.csdn.net/qq_29633789/article/details/132826212