Efficiently organizing and migrating dataset images and annotations

lindsayshuo

Last modified 2024-06-05 11:53:40

Tags: windows microsoft linux

First published 2024-06-05 11:52:32

Article link: https://blog.csdn.net/weixin_43269994/article/details/139468441

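In machine learning projects, managing large numbers of images and their annotations is challenging. When files have to be moved or the directory structure reorganized, handling each file by hand is extremely time-consuming. This article presents a short Python script that automates the process, so you can organize and migrate a dataset efficiently.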

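Script overview

The script does three things:

Lists the files in the 'images' and 'annotations' directories
Identifies the image and annotation files whose names match (ignoring the extension)
Moves the matched files into a new directory structure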

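Starting from scratch: creating the directories

The script first defines the paths of the source dataset and the new paths the data will be migrated to: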

annotations_path = "annotations"
images_path = "images"
new_annotation_path = "./datasets/annotations"
new_images_path = "./datasets/images"

Next, it makes sure the new paths exist and creates them if they do not:

import os

if not os.path.exists(new_annotation_path):
    os.makedirs(new_annotation_path)

if not os.path.exists(new_images_path):
    os.makedirs(new_images_path)
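
As an alternative to the explicit existence check, a minimal sketch is shown here that relies on the `exist_ok=True` parameter of `os.makedirs`. The helper name `ensure_dirs` is hypothetical and not part of the original script, and the demo runs in a temporary directory so that nothing is left on disk.

```python
import os
import tempfile


def ensure_dirs(*paths):
    """Create each directory (and any missing parents); do nothing if it already exists."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


# Demo in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as root:
    new_annotation_path = os.path.join(root, "datasets", "annotations")
    new_images_path = os.path.join(root, "datasets", "images")
    ensure_dirs(new_annotation_path, new_images_path)
    ensure_dirs(new_annotation_path, new_images_path)  # second call is a no-op
    print(sorted(os.listdir(os.path.join(root, "datasets"))))  # ['annotations', 'images']
```

Folding the check into the call also avoids the small window between testing for the directory and creating it.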

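Filtering the matching files

To migrate only the images that have annotations, the script first collects the base names (file names without the extension) of all files in both directories, then finds the names that appear in both: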

import os

# Base names (without extension) of the files in each directory
images_list = [os.path.splitext(filename)[0] for filename in os.listdir(images_path)]
annotations_list = [os.path.splitext(filename)[0] for filename in os.listdir(annotations_path)]

# Names that have both an image and an annotation
common_names = list(set(images_list).intersection(annotations_list))
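
Before moving anything, it can help to see which files would be left behind. The sketch below applies the same set-based matching and also reports the unmatched names on each side. The function name `split_by_match` and the sample file names are illustrative assumptions, not part of the original script.

```python
import os
import tempfile


def split_by_match(images_path, annotations_path):
    """Return (common, images_only, annotations_only) as sorted lists of base names."""
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(images_path)}
    annotation_stems = {os.path.splitext(f)[0] for f in os.listdir(annotations_path)}
    return (
        sorted(image_stems & annotation_stems),
        sorted(image_stems - annotation_stems),
        sorted(annotation_stems - image_stems),
    )


# Demo with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    images_path = os.path.join(root, "images")
    annotations_path = os.path.join(root, "annotations")
    os.makedirs(images_path)
    os.makedirs(annotations_path)
    for name in ("cat.jpg", "dog.png", "bird.jpg"):
        open(os.path.join(images_path, name), "w").close()
    for name in ("cat.xml", "dog.xml", "fish.xml"):
        open(os.path.join(annotations_path, name), "w").close()

    common, images_only, annotations_only = split_by_match(images_path, annotations_path)
    print(common)            # ['cat', 'dog']
    print(images_only)       # ['bird']
    print(annotations_only)  # ['fish']
```

Sorting the results makes the output reproducible, since iteration order over a set is not guaranteed.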

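Running the migration

Finally, for each matched image and annotation, the script computes the source and destination paths and uses shutil.move to perform the migration.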

import shutil

for name in common_names:
    # Compare exact base names; startswith(name) would also match "img10.jpg" for "img1"
    src_image = [os.path.join(images_path, filename) for filename in os.listdir(images_path)
                 if os.path.splitext(filename)[0] == name][0]
    # Assumes every annotation is an .xml file
    src_annotation = os.path.join(annotations_path, name + ".xml")

    dst_image = os.path.join(new_images_path, name + os.path.splitext(src_image)[1])
    dst_annotation = os.path.join(new_annotation_path, name + os.path.splitext(src_annotation)[1])

    shutil.move(src_image, dst_image)
    shutil.move(src_annotation, dst_annotation)
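
Because shutil.move changes the dataset in place, a preview of the planned moves can be useful. The following is a minimal sketch of the same migration wrapped in a function with a dry-run option. The name `move_pairs` and the parameters `annotation_ext` and `dry_run` are hypothetical additions, and the sketch assumes annotations share one extension (".xml" by default). It matches on the exact base name, so an annotation for "img1" does not pick up "img10.jpg".

```python
import os
import shutil
import tempfile


def move_pairs(images_path, annotations_path, new_images_path, new_annotation_path,
               annotation_ext=".xml", dry_run=False):
    """Move each image that has a same-named annotation, plus that annotation.

    Returns the (source, destination) pairs that were moved, or that would be
    moved when dry_run is True.
    """
    # Index images by exact base name; if two images share a base name,
    # the alphabetically first one is kept (an arbitrary choice for this sketch).
    images_by_stem = {}
    for filename in sorted(os.listdir(images_path)):
        images_by_stem.setdefault(os.path.splitext(filename)[0], filename)

    planned = []
    for stem, image_name in images_by_stem.items():
        annotation_name = stem + annotation_ext
        src_annotation = os.path.join(annotations_path, annotation_name)
        if not os.path.isfile(src_annotation):
            continue  # no annotation: leave the image where it is
        planned.append((os.path.join(images_path, image_name),
                        os.path.join(new_images_path, image_name)))
        planned.append((src_annotation,
                        os.path.join(new_annotation_path, annotation_name)))

    if not dry_run:
        os.makedirs(new_images_path, exist_ok=True)
        os.makedirs(new_annotation_path, exist_ok=True)
        for src, dst in planned:
            shutil.move(src, dst)
    return planned


# Demo in a temporary directory: "img1" must not drag "img10.jpg" along.
with tempfile.TemporaryDirectory() as root:
    images_path = os.path.join(root, "images")
    annotations_path = os.path.join(root, "annotations")
    os.makedirs(images_path)
    os.makedirs(annotations_path)
    for name in ("img1.jpg", "img10.jpg"):
        open(os.path.join(images_path, name), "w").close()
    open(os.path.join(annotations_path, "img1.xml"), "w").close()

    new_images_path = os.path.join(root, "datasets", "images")
    new_annotation_path = os.path.join(root, "datasets", "annotations")

    preview = move_pairs(images_path, annotations_path,
                         new_images_path, new_annotation_path, dry_run=True)
    print(len(preview))  # 2 planned moves, nothing touched yet

    move_pairs(images_path, annotations_path, new_images_path, new_annotation_path)
    print(sorted(os.listdir(images_path)))      # ['img10.jpg']
    print(sorted(os.listdir(new_images_path)))  # ['img1.jpg']
```

Building the index once also avoids listing the images directory again for every matched name, which matters on large datasets.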

Full code

import os
import shutil

# Source directories
annotations_path = "annotations"
images_path = "images"

# Destination directories
new_annotation_path = "./datasets/annotations"
new_images_path = "./datasets/images"

# Create the destination directories if they do not exist yet
if not os.path.exists(new_annotation_path):
    os.makedirs(new_annotation_path)

if not os.path.exists(new_images_path):
    os.makedirs(new_images_path)

# Base names (without extension) of the files in each directory
images_list = [os.path.splitext(filename)[0] for filename in os.listdir(images_path)]
annotations_list = [os.path.splitext(filename)[0] for filename in os.listdir(annotations_path)]

# Names that have both an image and an annotation
common_names = list(set(images_list).intersection(annotations_list))

for name in common_names:
    # Compare exact base names; startswith(name) would also match "img10.jpg" for "img1"
    src_image = [os.path.join(images_path, filename) for filename in os.listdir(images_path)
                 if os.path.splitext(filename)[0] == name][0]
    # Assumes every annotation is an .xml file
    src_annotation = os.path.join(annotations_path, name + ".xml")
    dst_image = os.path.join(new_images_path, name + os.path.splitext(src_image)[1])
    dst_annotation = os.path.join(new_annotation_path, name + os.path.splitext(src_annotation)[1])
    shutil.move(src_image, dst_image)
    shutil.move(src_annotation, dst_annotation)