[Object Detection Label Conversion Tool] Converting Between YOLO and Pascal VOC XML Formats in Detail (with Complete Code)

Latest recommended article published on 2025-05-13 19:13:45

DragonnAi

Category: Artificial Intelligence. Tags: object detection, YOLO, xml

Article link: https://blog.csdn.net/qq_43207259/article/details/147802321

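1. Preface: Why Convert Between Label Formats?

In object detection, different models and annotation tools often use different label formats:

The YOLO family (YOLOv5/v8) uses .txt files, one object per line, storing positions normalized relative to the image size;
Faster R-CNN, SSD, RetinaNet and similar models often use the Pascal VOC format (.xml files);
Annotation tools such as LabelImg and CVAT commonly produce Pascal VOC or COCO output.

This creates a common need: switching formats between annotation, training, and evaluation so that the same data can be reused.

This article therefore provides code for two-way YOLO ↔ Pascal VOC conversion and walks through the formats, the usage, and the implementation.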

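2. Overview of the YOLO and Pascal VOC Formats

YOLO label format (one `.txt` file per image):

Each line describes one object:

<class_id> <x_center> <y_center> <width> <height>

where:

class_id is the class ID, starting from 0;
x_center, y_center are the coordinates of the box center, normalized by the image width and height respectively;
width, height are the box width and height, likewise normalized by the image width and height.

Example: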

1 0.521 0.637 0.221 0.340
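
To make the field layout concrete, here is a minimal sketch that parses one YOLO label line. The `YoloBox` type and the `parse_yolo_line` helper are hypothetical names introduced for illustration; they are not part of the conversion scripts in this article. The range check assumes every value is normalized to [0, 1].

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One YOLO label row (hypothetical helper type)."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse '<class_id> <x_center> <y_center> <width> <height>'."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, bw, bh = (float(p) for p in parts[1:])
    # Assumption: all four box values are normalized to [0, 1]
    for value in (xc, yc, bw, bh):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value {value} is not normalized: {line!r}")
    return YoloBox(class_id, xc, yc, bw, bh)


box = parse_yolo_line("1 0.521 0.637 0.221 0.340")
print(box)
```

Raising on malformed lines is one possible design; a batch converter may prefer to log and skip them instead.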

Pascal VOC label format (one `.xml` file per image):

A structured XML file containing the image size and one <object> element per object; coordinates are absolute pixel values.

<annotation>
  <folder>VOC2007</folder>
  <filename>example.jpg</filename>
  <size>
    <width>1920</width>
    <height>1080</height>
    <depth>3</depth>
  </size>
  <object>
    <name>Spalling</name>
    <bndbox>
      <xmin>480</xmin>
      <ymin>360</ymin>
      <xmax>920</xmax>
      <ymax>880</ymax>
    </bndbox>
  </object>
</annotation>
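
As a rough illustration of how this structure can be read, the sketch below parses the sample annotation with the standard library's `xml.etree.ElementTree`. The function name `read_voc_annotation` and the tuple layout it returns are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<annotation>
  <folder>VOC2007</folder>
  <filename>example.jpg</filename>
  <size>
    <width>1920</width>
    <height>1080</height>
    <depth>3</depth>
  </size>
  <object>
    <name>Spalling</name>
    <bndbox>
      <xmin>480</xmin>
      <ymin>360</ymin>
      <xmax>920</xmax>
      <ymax>880</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_voc_annotation(xml_text):
    """Return (width, height, [(name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # float() first, so values written as "480.0" are accepted too
        coords = tuple(
            int(float(bndbox.find(tag).text))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.find("name").text, coords))
    return width, height, objects


print(read_voc_annotation(SAMPLE_XML))
```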

⚙️ 3. YOLO ➡ Pascal VOC XML Conversion

Here the YOLO center-plus-size annotation is converted into Pascal VOC absolute corner coordinates (xmin, ymin, xmax, ymax).
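
The arithmetic for a single box can be sketched on its own. The helper name `yolo_to_voc_box` is hypothetical, and rounding to the nearest pixel plus clamping to the image bounds are choices made for this sketch. The input is the YOLO example line from section 2, applied to an assumed 1920×1080 image.

```python
def yolo_to_voc_box(xc, yc, bw, bh, img_w, img_h):
    """Convert a normalized center/size box to absolute pixel corners."""
    xmin = round((xc - bw / 2) * img_w)
    ymin = round((yc - bh / 2) * img_h)
    xmax = round((xc + bw / 2) * img_w)
    ymax = round((yc + bh / 2) * img_h)
    # Clamp so boxes that slightly overshoot stay inside the image
    xmin, ymin = max(0, xmin), max(0, ymin)
    xmax, ymax = min(img_w, xmax), min(img_h, ymax)
    return xmin, ymin, xmax, ymax


print(yolo_to_voc_box(0.521, 0.637, 0.221, 0.340, 1920, 1080))
# → (788, 504, 1212, 872)
```

For instance, xmin = (0.521 - 0.221 / 2) × 1920 = 788.16, which rounds to 788.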

The complete code is as follows:

import os
import glob
from PIL import Image

def convert_yolo_to_voc(yolo_dir, image_dir, output_dir, class_names, img_ext=".jpg"):
    os.makedirs(output_dir, exist_ok=True)
    image_list = glob.glob(os.path.join(image_dir, f"*{img_ext}"))
    print(f"Found {len(image_list)} images.")

    for idx, image_path in enumerate(image_list):
        image_name = os.path.splitext(os.path.basename(image_path))[0]
        label_path = os.path.join(yolo_dir, image_name + ".txt")
        xml_path = os.path.join(output_dir, image_name + ".xml")

        # Read the image size and channel count
        try:
            with Image.open(image_path) as image:
                w, h = image.size
                depth = len(image.getbands())
        except Exception as e:
            print(f"Error reading image {image_path}: {e}")
            continue

        # Write the XML header
        with open(xml_path, 'w', encoding='utf-8') as xml_file:
            xml_file.write('<annotation>\n')
            xml_file.write('    <folder>VOC2007</folder>\n')
            xml_file.write(f'    <filename>{image_name + img_ext}</filename>\n')
            xml_file.write('    <size>\n')
            xml_file.write(f'        <width>{w}</width>\n')
            xml_file.write(f'        <height>{h}</height>\n')
            xml_file.write(f'        <depth>{depth}</depth>\n')
            xml_file.write('    </size>\n')

            if os.path.exists(label_path):
                with open(label_path, 'r') as f:
                    lines = f.read().splitlines()

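                for line in lines:
                    spt = line.strip().split()
                    if len(spt) != 5:
                        continue  # skip lines that do not have exactly 5 fields

                    class_id = int(spt[0])
                    # Reject negative IDs too: class_names[-1] would silently pick the last class
                    if not 0 <= class_id < len(class_names):
                        print(f"Warning: class_id {class_id} out of range in {label_path}")
                        continue

                    name = class_names[class_id]
                    xc, yc, bw, bh = map(float, spt[1:])

                    # Round to the nearest pixel (int() truncation can lose a pixel
                    # to floating-point error) and clamp to the image bounds
                    xmin = max(0, round((xc - bw / 2) * w))
                    ymin = max(0, round((yc - bh / 2) * h))
                    xmax = min(w, round((xc + bw / 2) * w))
                    ymax = min(h, round((yc + bh / 2) * h))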

                    xml_file.write('    <object>\n')
                    xml_file.write(f'        <name>{name}</name>\n')
                    xml_file.write('        <pose>Unspecified</pose>\n')
                    xml_file.write('        <truncated>0</truncated>\n')
                    xml_file.write('        <difficult>0</difficult>\n')
                    xml_file.write('        <bndbox>\n')
                    xml_file.write(f'            <xmin>{xmin}</xmin>\n')
                    xml_file.write(f'            <ymin>{ymin}</ymin>\n')
                    xml_file.write(f'            <xmax>{xmax}</xmax>\n')
                    xml_file.write(f'            <ymax>{ymax}</ymax>\n')
                    xml_file.write('        </bndbox>\n')
                    xml_file.write('    </object>\n')

            xml_file.write('</annotation>\n')

        if (idx + 1) % 100 == 0:
            print(f"Processed {idx+1}/{len(image_list)} images.")

    print("All YOLO labels converted to VOC XML.")


# Example usage:
if __name__ == "__main__":
    yolo_label_dir = "Rail/test/labels"
    image_dir = "Rail/test/images"
    output_xml_dir = "Annotations"
    class_names = ['Spalling', 'Wheel Burn', 'Squat', 'Corrugation']

    convert_yolo_to_voc(yolo_label_dir, image_dir, output_xml_dir, class_names)

Notes:

Image dimensions are read automatically;
Batch processing is supported;
The XML output directory is created automatically.
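
The listing above writes XML by concatenating strings, so a class name containing a character such as & or < would yield an invalid file. One alternative, sketched here with the standard library's `xml.etree.ElementTree`, builds the tree as elements and lets the serializer handle escaping. The function name `build_voc_annotation` and the class name "Crack & Spall" are hypothetical and chosen only to demonstrate escaping.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, depth, objects):
    """Build a VOC annotation tree.

    objects: iterable of (name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = "VOC2007"
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)


tree = build_voc_annotation(
    "example.jpg", 1920, 1080, 3,
    [("Crack & Spall", 480, 360, 920, 880)],  # hypothetical class name
)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```

To save the result, `tree.write(path, encoding="utf-8")` can be used in place of the manual `write` calls.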

⚙️ 4. Pascal VOC XML ➡ YOLO Conversion

The code in this section converts the absolute <bndbox> coordinates of Pascal VOC into the normalized relative format that YOLO requires.
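
The per-box arithmetic in this direction can be sketched separately. The helper name `voc_to_yolo_box` is hypothetical; the input is the <bndbox> from the sample annotation in section 2 (a 1920×1080 image).

```python
def voc_to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute pixel corners to a normalized center/size box."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    box_width = (xmax - xmin) / img_w
    box_height = (ymax - ymin) / img_h
    return x_center, y_center, box_width, box_height


values = voc_to_yolo_box(480, 360, 920, 880, 1920, 1080)
print(" ".join(f"{v:.6f}" for v in values))
# → 0.364583 0.574074 0.229167 0.481481
```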

The complete code is as follows:

import os
import glob
import xml.etree.ElementTree as ET
from PIL import Image

def convert_voc_to_yolo(xml_dir, image_dir, output_dir, class_names, img_ext=".jpg"):
    os.makedirs(output_dir, exist_ok=True)
    xml_list = glob.glob(os.path.join(xml_dir, "*.xml"))
    print(f"Found {len(xml_list)} XML files.")

    for idx, xml_path in enumerate(xml_list):
        tree = ET.parse(xml_path)
        root = tree.getroot()

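        filename_node = root.find('filename')
        if filename_node is not None and filename_node.text:
            image_name = filename_node.text
        else:
            # Fall back to the XML file's own name when <filename> is missing or empty
            image_name = os.path.basename(xml_path)
        name_base = os.path.splitext(image_name)[0]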
        img_path = os.path.join(image_dir, name_base + img_ext)
        txt_path = os.path.join(output_dir, name_base + ".txt")

        # Read the image width and height
        try:
            with Image.open(img_path) as image:
                w, h = image.size
        except Exception as e:
            print(f"Error reading image {img_path}: {e}")
            continue

        with open(txt_path, 'w') as f_out:
            for obj in root.findall('object'):
                cls_name = obj.find('name').text
                if cls_name not in class_names:
                    print(f"Warning: class name {cls_name} not in list.")
                    continue

                cls_id = class_names.index(cls_name)
                bbox = obj.find('bndbox')
                xmin = int(float(bbox.find('xmin').text))
                ymin = int(float(bbox.find('ymin').text))
                xmax = int(float(bbox.find('xmax').text))
                ymax = int(float(bbox.find('ymax').text))

                # Convert to YOLO format: center point + width/height, normalized
                x_center = (xmin + xmax) / 2.0 / w
                y_center = (ymin + ymax) / 2.0 / h
                box_width = (xmax - xmin) / w
                box_height = (ymax - ymin) / h

                f_out.write(f"{cls_id} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}\n")

        if (idx + 1) % 100 == 0:
            print(f"Processed {idx+1}/{len(xml_list)} annotations.")

    print("All VOC XML annotations converted to YOLO format.")


# Example usage:
if __name__ == "__main__":
    voc_xml_dir = "/Annotations"
    image_dir = "/test/images"
    output_yolo_dir = "/test/labels_converted"
    class_names = ['Spalling', 'Wheel Burn', 'Squat', 'Corrugation']

    convert_voc_to_yolo(voc_xml_dir, image_dir, output_yolo_dir, class_names)

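Notes:

Image dimensions are read automatically and used for normalization;
A custom list of class names is supported;
Objects with an unknown class name, and annotations whose image cannot be read, are skipped.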
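
Since a VOC file usually carries its own <size> element, the image dimensions could also be taken from the XML, opening the image only when that element is missing or unusable. The sketch below shows such a lookup; the helper name `size_from_voc` is hypothetical, and treating a non-positive width or height as invalid is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def size_from_voc(root):
    """Return (width, height) from <size>, or None if missing or invalid."""
    size = root.find("size")
    if size is None:
        return None
    try:
        # findtext returns None for a missing tag, which int() rejects
        width = int(size.findtext("width"))
        height = int(size.findtext("height"))
    except (TypeError, ValueError):
        return None
    if width <= 0 or height <= 0:
        return None
    return width, height


root = ET.fromstring(
    "<annotation><size><width>1920</width><height>1080</height>"
    "<depth>3</depth></size></annotation>"
)
print(size_from_voc(root))
# → (1920, 1080)
```

A caller could fall back to `Image.open` when this returns None, so the image is opened only when the annotation does not supply usable dimensions.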

5. Example Calls (adapt the paths to your project)

if __name__ == "__main__":
    class_names = ['Spalling', 'Wheel Burn', 'Squat', 'Corrugation']

    # YOLO → XML
    convert_yolo_to_voc(
        yolo_dir="Rail/test/labels",
        image_dir="Rail/test/images",
        output_dir="Annotations",
        class_names=class_names
    )

    # XML → YOLO
    convert_voc_to_yolo(
        xml_dir="Annotations",
        image_dir="Rail/test/images",
        output_dir="Rail/test/labels_converted",
        class_names=class_names
    )
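
After running both directions, it can be useful to check how far the round-tripped labels drift from the originals, since pixel coordinates are integers and some precision is lost. The following is a minimal sketch of such a check. The function names `read_labels` and `max_roundtrip_error` are hypothetical, and the comparison assumes both files list the same objects in the same order.

```python
import glob
import os


def read_labels(path):
    """Read a YOLO label file into a list of (class_id, [xc, yc, w, h])."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 5:
                rows.append((int(parts[0]), [float(p) for p in parts[1:]]))
    return rows


def max_roundtrip_error(original_dir, converted_dir):
    """Return the largest absolute difference between matching box values."""
    worst = 0.0
    for orig_path in glob.glob(os.path.join(original_dir, "*.txt")):
        conv_path = os.path.join(converted_dir, os.path.basename(orig_path))
        if not os.path.exists(conv_path):
            print(f"Missing converted file: {conv_path}")
            continue
        orig_rows = read_labels(orig_path)
        conv_rows = read_labels(conv_path)
        if len(orig_rows) != len(conv_rows):
            print(f"Object count differs for {os.path.basename(orig_path)}")
            continue
        # Assumption: rows appear in the same order in both files
        for (cls_a, box_a), (cls_b, box_b) in zip(orig_rows, conv_rows):
            if cls_a != cls_b:
                print(f"Class mismatch in {os.path.basename(orig_path)}")
                continue
            worst = max(worst, max(abs(a - b) for a, b in zip(box_a, box_b)))
    return worst


if __name__ == "__main__":
    error = max_roundtrip_error("Rail/test/labels", "Rail/test/labels_converted")
    print(f"Largest normalized difference: {error:.6f}")
```

Because each corner is stored as a whole pixel, a difference on the order of one pixel divided by the image side length is to be expected; much larger values suggest a mismatch in class lists or image sizes.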