YOLO 格式 TXT 标注文件（.txt）的批量替换类别索引-CSDN博客

本文链接：https://blog.csdn.net/weixin_69011755/article/details/147993820

一、需求分析

假设你有 14 个数据集（每个对应一个类别），每个数据集的 YOLO 标注文件（.txt）中每行首列均为 0（表示类别索引）。需要将每个数据集的标注索引替换为唯一的目标 ID（如 0→西瓜，1→苹果，…，13→第14类）。

二、代码核心逻辑

定义类别映射表：为每个数据集（类别）分配唯一 ID（如 {'watermelon': 0, 'apple': 1, ...}）。
遍历数据集文件夹：每个数据集对应一个标注文件夹（如 watermelon_labels）。
逐行修改 TXT 文件：读取每个 .txt 文件，将每行首列的 0 替换为目标 ID。
错误处理：跳过无标注文件、格式错误的文件，并输出提示。

三、完整 Python 代码

import os
import shutil

def update_yolo_labels(
        dataset_root,  # 数据集根目录（如C:\datasets）
        class_mapping,  # 类别名→目标ID的映射（需与实际文件夹前缀一致）
        label_folder_suffix='-label',  # 修改为实际后缀：'-label'
        backup=True
):
    """
    批量更新 YOLO 格式 TXT 标注文件的类别索引
    :param dataset_root: 数据集根目录（包含所有类别文件夹）
    :param class_mapping: 类别名到目标ID的映射字典
    :param label_folder_suffix: 标注文件夹的后缀（如'-label'）
    :param backup: 是否备份原标注文件（防止数据丢失）
    """
    for class_name, target_id in class_mapping.items():
        # 构造标注文件夹路径（如：C:\datasets\watermelon-label）
        label_dir = os.path.join(dataset_root, f"{class_name}{label_folder_suffix}")
        print(f"尝试访问路径：{label_dir}")  # 调试打印，确认路径是否正确

        # 检查文件夹是否存在
        if not os.path.exists(label_dir):
            print(f"警告：类别 {class_name} 的标注文件夹 {label_dir} 不存在，跳过！")
            continue

        # 遍历标注文件夹中的TXT文件（支持大小写后缀）
        txt_files = [f for f in os.listdir(label_dir) if f.lower().endswith('.txt')]
        if not txt_files:
            print(f"警告：类别 {class_name} 的标注文件夹无 .txt 文件，跳过！")
            continue

        # 处理每个TXT文件
        for txt_file in txt_files:
            txt_path = os.path.join(label_dir, txt_file)

            # 备份原文件
            if backup:
                backup_path = os.path.join(label_dir, f"backup_{txt_file}")
                if not os.path.exists(backup_path):
                    shutil.copy2(txt_path, backup_path)
                    print(f"已备份 {txt_path} → {backup_path}")

            # 读取并修改标注内容（根据实际编码调整）
            try:
                with open(txt_path, 'r', encoding='utf-8') as f:
                    lines = f.readlines()

                new_lines = []
                for line in lines:
                    line = line.strip()
                    if not line:
                        continue
                    parts = line.split()
                    if len(parts) != 5:
                        print(f"警告：文件 {txt_path} 格式错误（非5列），跳过该行！")
                        continue
                    parts[0] = str(target_id)
                    new_lines.append(' '.join(parts) + '\n')

                with open(txt_path, 'w', encoding='utf-8') as f:
                    f.writelines(new_lines)
                print(f"更新成功：{txt_path} → 类别ID {target_id}")

            except Exception as e:
                print(f"错误：处理 {txt_path} 时发生异常：{str(e)}")


# =============================================
# 使用示例（根据你的实际路径调整）
# =============================================
if __name__ == "__main__":
    # 类别映射表（需与实际文件夹前缀一致，如文件夹是watermelon-label，则类别名是'watermelon'）
    class_mapping = {
        'watermelon': 0,
        'apple': 1,
        'banana': 2,
        'grape': 3,
        'orange': 4,
        'pear': 5,
        'pomegranate': 6,
        'nectarine': 7,
        'mango': 8,
        'lychee': 9,
        'longan': 10,
        'durian': 11,
        'cantaloupe': 12,
        'blueberry': 13
    }

    # 运行脚本（修改为你的实际根目录）
    update_yolo_labels(
        dataset_root=r'C:\Users\29420\Desktop\1',  # 实际数据集根目录（如C:\datasets）
        class_mapping=class_mapping,
        label_folder_suffix='-label',  # 关键修改：后缀设为'-label'
        backup=True
    )