Converting the SODA-A Dataset into a Format Suitable for YOLOv8 Training

嘉陵妹妹

Last modified 2024-08-24 19:36:19

Tags: YOLO, python, artificial intelligence

First published 2024-06-21 22:36:59

Article link: https://blog.csdn.net/x131644/article/details/139871325

The annotation format is as follows:

{
    "type": "instance",
    "images": {
        "file_name": "00001.jpg",
        "height": 2744,
        "width": 4800,
        "id": 1
    },
    "annotations": [
        {
            "poly": [
                235.0841064453125,
                2551.818359375,
                249.3841552734375,
                2556.41845703125,
                256.6455383300781,
                2533.84521484375,
                242.34548950195312,
                2529.2451171875
            ],
            "area": 356.2015366703272,
            "category_id": 0,
            "image_id": 1,
            "id": 1
        },

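Data-processing code for SODA-A, a small-object detection dataset of remote-sensing images
Approach: cut each original image into 640x640 sub-images, then go through every box in that image's annotation file; a box is added to a sub-image only if all of its points lie inside that sub-image.
Visualization UI on Bilibili: https://www.bilibili.com/video/BV1Fn4y1o7aw/?spm_id_from=333.999.0.0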

import os
import json
from PIL import Image


def open_annotations_json(image_name):
    # Build the annotation file name, e.g. 00001.jpg -> 00001.json
    annotation_name = os.path.splitext(image_name)[0] + ".json"

    # Look for the annotation file in the train, val and test folders
    for folder in ['train', 'val', 'test']:
        annotation_file_path = os.path.join("Annotations", folder, annotation_name)
        if os.path.exists(annotation_file_path):
            # Found it: load the annotation data and return it
            with open(annotation_file_path, "r") as file:
                data = json.load(file)
            print(f"Found annotation file {annotation_file_path}")
            return data

    # None of the three folders contained the file
    print(f"No annotation file found for {image_name}")
    return None


def is_points_inside_rect(points, left, upper, right, lower):
    # points is a flat list [x1, y1, x2, y2, ...]; every point must lie inside the rectangle
    for i in range(0, len(points), 2):
        x = points[i]
        y = points[i + 1]
        if x < left or x > right or y < upper or y > lower:
            return False
    return True


def get_polys_in_sub_images(left, upper, right, lower, annotations, crop_size):
    outcome = []
    for annotation in annotations:
        points = annotation['poly']
        category_id = annotation['category_id']

        # Check whether the polygon lies entirely inside the sub-image
        flag = is_points_inside_rect(points, left, upper, right, lower)

        if flag:
            # Shift the coordinates so the sub-image's top-left corner becomes (0, 0),
            # then divide by crop_size to normalize them to [0, 1]
            adjusted_points = [category_id]
            for i in range(0, len(points), 2):
                x = points[i] - left
                y = points[i + 1] - upper
                adjusted_points.extend([x / crop_size, y / crop_size])

            outcome.append(adjusted_points)

    return outcome


def process_one_images(image_name, save_image_path, save_txt_path):
    data = open_annotations_json(image_name)
    if not data:
        print("No matching annotation file found")
        return

    # Basic image information from the annotation file
    file_name = data['images']['file_name']
    height = data['images']['height']
    width = data['images']['width']
    print(f"Image file name: {file_name}")
    print(f"Image height: {height}")
    print(f"Image width: {width}")

    crop_size = 640
    stem = os.path.splitext(file_name)[0]  # e.g. "00001"

    # Number of whole tiles per column and per row;
    # the leftover strips on the right and at the bottom are not cropped
    rows = height // crop_size
    cols = width // crop_size

    # Create the output folders
    os.makedirs(save_image_path, exist_ok=True)
    os.makedirs(save_txt_path, exist_ok=True)

    # Open the image and close it again once all tiles have been processed
    with Image.open(os.path.join('Images', image_name)) as image:
        for row in range(rows):
            for col in range(cols):
                # Pixel coordinates of the crop region
                left = col * crop_size
                upper = row * crop_size
                right = left + crop_size
                lower = upper + crop_size

                tem_polys = get_polys_in_sub_images(left, upper, right, lower, data['annotations'], crop_size)
                # Keep only tiles that contain more than 5 objects
                if len(tem_polys) <= 5:
                    continue

                # Crop and save the tile, e.g. 00001_crop_0_0.jpg
                image_file_path = os.path.join(save_image_path, f"{stem}_crop_{row}_{col}.jpg")
                image.crop((left, upper, right, lower)).save(image_file_path)
                print(f"Saved cropped image: {image_file_path}")

                # Write one label line per polygon, e.g. to 00001_crop_0_0.txt
                label_file_path = os.path.join(save_txt_path, f"{stem}_crop_{row}_{col}.txt")
                with open(label_file_path, "w") as f:
                    for poly in tem_polys:
                        # Join the list elements as strings separated by spaces
                        f.write(" ".join(map(str, poly)) + "\n")
                print(f"Saved label file: {label_file_path}")


if __name__ == '__main__':
    # List all files in the image folder (sorted, because os.listdir order is arbitrary)
    image_files = sorted(os.listdir('Images'))

    # Dataset split ratios; the test set takes whatever is left (about 10%)
    val_ratio = 0.2
    train_ratio = 0.7

    # Size of each split
    num_val = int(len(image_files) * val_ratio)
    num_train = int(len(image_files) * train_ratio)
    num_test = len(image_files) - num_val - num_train

    print(f"{len(image_files)} images in total.")
    print(f"Validation: {num_val}, training: {num_train}, test: {num_test}.")

    # Assign the images to the splits
    splits = {
        "val": image_files[:num_val],
        "train": image_files[num_val:num_val + num_train],
        "test": image_files[num_val + num_train:],
    }

    # Create the folders for images and labels up front
    for split in splits:
        os.makedirs(f"{split}_images", exist_ok=True)
        os.makedirs(f"{split}_labels", exist_ok=True)

    # Crop every image and write its tiles and labels to <split>_images / <split>_labels
    for split, names in splits.items():
        for image_name in names:
            process_one_images(image_name, f"{split}_images", f"{split}_labels")

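The ChatGPT conversation below may help in understanding the implementation.

ChatGPT

This code is an image-processing script: it crops a set of images according to their annotation data and saves the cropped images and the corresponding annotations into separate folders. In detail:

1. Opening the annotation file (open_annotations_json function):

Purpose: given an image file name, look for the matching annotation file (JSON) in the train, val and test folders under the Annotations directory. If it is found, read the annotation data and return it.
Logic: replace the .jpg suffix of the image file name with .json to get the annotation file name. Search Annotations/train, Annotations/val and Annotations/test in that order. If the file is found, read its contents and return the annotation data; otherwise print a message.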
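
The lookup described in item 1 can be sketched on its own. This is a minimal illustration, not the script's actual function: the name `find_annotation` and the `root` and `splits` parameters are hypothetical, and the demonstration builds a throwaway directory instead of touching a real Annotations folder.

```python
import json
import tempfile
from pathlib import Path


def find_annotation(image_name, root="Annotations", splits=("train", "val", "test")):
    """Return the parsed JSON for image_name, or None if no split folder has it."""
    stem = Path(image_name).stem  # "00001.jpg" -> "00001"
    for split in splits:
        candidate = Path(root) / split / f"{stem}.json"
        if candidate.is_file():
            with candidate.open("r", encoding="utf-8") as fh:
                return json.load(fh)
    return None


# Demonstration on a temporary directory tree (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    val_dir = Path(tmp) / "val"
    val_dir.mkdir()
    (val_dir / "00001.json").write_text(
        '{"images": {"file_name": "00001.jpg"}}', encoding="utf-8"
    )

    found = find_annotation("00001.jpg", root=tmp)
    missing = find_annotation("99999.jpg", root=tmp)

print(found, missing)
```

Returning None for a missing file lets the caller decide whether to skip the image, which is what process_one_images does.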

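2. Checking whether points lie inside a rectangle (is_points_inside_rect function):

Purpose: check whether all points of a polygon lie inside a rectangle.
Logic: iterate over the points; return False as soon as one point falls outside the rectangle, otherwise return True.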
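
As a worked example of this check, the sketch below tests the first annotation of 00001.jpg against two tiles. The helper name `all_points_inside` is hypothetical; it follows the same inclusive-boundary rule as is_points_inside_rect. The polygon's y values run from about 2529 to 2556, so it should fall in the tile covering rows 1920 to 2560 and not in the tile above it.

```python
def all_points_inside(poly, left, upper, right, lower):
    """True only if every (x, y) pair in the flat list lies inside the rectangle."""
    xs, ys = poly[0::2], poly[1::2]
    return all(left <= x <= right and upper <= y <= lower for x, y in zip(xs, ys))


# First annotation of 00001.jpg from the sample annotation file.
poly = [235.0841064453125, 2551.818359375, 249.3841552734375, 2556.41845703125,
        256.6455383300781, 2533.84521484375, 242.34548950195312, 2529.2451171875]

in_row3_col0 = all_points_inside(poly, 0, 1920, 640, 2560)  # tile at row 3, column 0
in_row2_col0 = all_points_inside(poly, 0, 1280, 640, 1920)  # tile directly above it
print(in_row3_col0, in_row2_col0)
```

Because all four corners must be inside, a box that straddles a tile boundary is assigned to no tile at all.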

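3. Collecting the polygons of a sub-image (get_polys_in_sub_images function):

Purpose: for a given crop region, check whether each annotated polygon lies entirely inside it; if so, adjust the polygon's coordinates and return the adjusted polygon.
Logic: iterate over the polygons in the annotation data and check whether each one lies entirely inside the crop region. For those that do, shift the coordinates so they are relative to the top-left corner of the sub-image (the new origin) and divide them by the crop size, so the values fall in [0, 1]. Return the adjusted polygons, each prefixed with its category_id.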
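
The coordinate adjustment can be illustrated in isolation. The sketch below assumes the tile at left=0, upper=1920 and a crop size of 640; the function name `to_tile_coords` is hypothetical and only mirrors the inner loop of get_polys_in_sub_images.

```python
def to_tile_coords(poly, left, upper, crop_size=640):
    """Shift a flat [x1, y1, ...] polygon to tile coordinates and scale to [0, 1]."""
    out = []
    for x, y in zip(poly[0::2], poly[1::2]):
        out.extend([(x - left) / crop_size, (y - upper) / crop_size])
    return out


# First annotation of 00001.jpg, which lies inside the tile (0, 1920, 640, 2560).
poly = [235.0841064453125, 2551.818359375, 249.3841552734375, 2556.41845703125,
        256.6455383300781, 2533.84521484375, 242.34548950195312, 2529.2451171875]

normalized = to_tile_coords(poly, left=0, upper=1920)
print([round(v, 4) for v in normalized])
```

For the first point, x stays at 235.08 / 640 and y becomes (2551.82 - 1920) / 640, so both values land between 0 and 1.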

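4. Processing one image (process_one_images function):

Purpose: crop a single image and save the crops and their annotations to the given paths.
Logic: load the annotation file via open_annotations_json and read the image width and height. Compute the number of rows and columns of crops (each sub-image is 640x640 pixels). Loop over every crop region and collect the polygons inside it. If a region contains more than 5 polygons, save that region's image and its annotations.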

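5. Main program (the code that runs under main):

Purpose: split the whole image set into training, validation and test sets, then process and save the images of each set.
Logic: list all image files in the Images folder. Split them by the predefined ratios (70% training, 20% validation, 10% test). Create the folders for the cropped images and labels. Crop and annotate the images of each set and save the results to the corresponding folders.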
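
The script splits the files in directory-listing order. A common variant, shown here only as a sketch and not as what the script does, is to sort the names and shuffle them with a fixed seed so the split is random but reproducible. The function name `split_files`, the `seed` parameter and the file names are assumptions made for illustration.

```python
import random


def split_files(files, train_ratio=0.7, val_ratio=0.2, seed=0):
    """Shuffle deterministically, then cut into train / val / test (test gets the rest)."""
    files = sorted(files)                 # fixed starting order
    random.Random(seed).shuffle(files)    # reproducible shuffle
    n_train = int(len(files) * train_ratio)
    n_val = int(len(files) * val_ratio)
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }


names = [f"{i:05d}.jpg" for i in range(1, 11)]  # hypothetical file names
splits = split_files(names)
print({k: len(v) for k, v in splits.items()})
```

Giving the test set the remainder means every image is assigned to exactly one split even when the ratios do not divide the file count evenly.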

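Summary:

The code reads the annotation information from the annotation files, crops each image into tiles of a fixed size, and generates the annotations for each cropped region. The output is a set of sub-images with matching annotation files, saved in separate folders for the training, validation and test sets.

User question

I will send you some annotation data; please explain the code with reference to it.

ChatGPT

Sure, please send me the annotation data and I will explain the code's behavior and logic in more detail with it.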

User question

{
    "type": "instance",
    "images": {
        "file_name": "00001.jpg",
        "height": 2744,
        "width": 4800,
        "id": 1
    },
    "annotations": [
        {
            "poly": [
                235.0841064453125, 2551.818359375,
                249.3841552734375, 2556.41845703125,
                256.6455383300781, 2533.84521484375,
                242.34548950195312, 2529.2451171875
            ],
            "area": 356.2015366703272,
            "category_id": 0,
            "image_id": 1,
            "id": 1
        },
        {
            "poly": [
                273.10595703125, 2434.80810546875,
                289.30609130859375, 2440.00830078125,
                296.59283447265625, 2417.3076171875,
                280.3927001953125, 2412.107421875
            ],
            "area": 405.6466100513935,
            "category_id": 0,
            "image_id": 1,
            "id": 2
        },
        {
            "poly": [
                282.4454345703125, 2404.78076171875,
                297.345458984375, 2410.880859375,
                306.1392822265625, 2389.40087890625,
                291.2392578125, 2383.30078125
            ],
            "area": 373.6954139471054,
            "category_id": 0,
            "image_id": 1,
            "id": 3
        },
        {
            "poly": [
                303.7540283203125, 2345.409912109375,
                317.65399169921875, 2349.209716796875,
                323.67205810546875, 2327.196533203125,
                309.7720947265625, 2323.396728515625
            ],
            "area": 328.8499227464199,
            "category_id": 0,
            "image_id": 1,
            "id": 4
        },
        {
            "poly": [
                330.7451477050781, 2256.2265625,
                345.8451232910156, 2263.62646484375,
                356.5094299316406, 2241.86572265625,
                341.4094543457031, 2234.4658203125
            ],
            "area": 407.5015034675598,
            "category_id": 0,
            "image_id": 1,
            "id": 5
        },
        {
            "poly": [
                372.61279296875, 2140.016845703125,
                386.51275634765625, 2144.416748046875,
                393.40155029296875, 2122.655029296875,
                379.5015869140625, 2118.255126953125
            ],
            "area": 332.7971143126488,
            "category_id": 0,
            "image_id": 1,
            "id": 6
        },
        {
            "poly": [
                554.7942504882812, 1662.7821044921875,
                575.8244018554688, 1669.8887939453125,
                580.7244262695312, 1655.3887939453125,
                559.6942749023438, 1648.2821044921875
            ],
            "area": 339.7601466476917,
            "category_id": 0,
            "image_id": 1,
            "id": 7
        },
        {
            "poly": [
                582.298583984375, 1672.4954833984375,
                607.0546875, 1680.6297607421875,
                611.65478515625, 1666.6297607421875,
                586.898681640625, 1658.4954833984375
            ],
            "area": 384.00391936302185,
            "category_id": 0,
            "image_id": 1,
            "id": 8
        },
        {
            "poly": [
                108.03138732910156, 2399.113525390625,
                114.69731140136719, 2417.236572265625,
                132.0973663330078, 2410.836669921875,
                125.43144226074219, 2392.713623046875
            ],
            "area": 358.0032742470503,
            "category_id": 0,
            "image_id": 1,
            "id": 9
        },
        {
            "poly": [
                147.56044006347656, 2374.158203125,
                184.94761657714844, 2387.55224609375,
                196.4475555419922, 2355.4521484375,
                159.0603790283203, 2342.05810546875
            ],
            "area": 1354.1626938134432,
            "category_id": 0,
            "image_id": 1,
            "id": 10
        },
        {
            "poly": [
                654.5000610351562, 1670.9998779296875,
                668.5000610351562, 1675.0997314453125,
                670.2269897460938, 1669.2027587890625,
                656.2269897460938, 1665.1029052734375
            ],
            "area": 89.63777193427086,
            "category_id": 2,
            "image_id": 1,
            "id": 11
        },
        {
            "poly": [
                702.4000854492188, 1696.900146484375,
                716.4002075195312, 1701.0,
                718.1271362304688, 1695.102783203125,
                704.1270141601562, 1691.0029296875
            ],
            "area": 89.64190977811813,
            "category_id": 2,
            "image_id": 1,
            "id": 12
        },
        {
            "poly": [
                685.3787841796875, 1685.9095458984375,
                701.0999755859375, 1688.2000732421875,
                702.1212158203125, 1681.1905517578125,
                686.4000244140625, 1678.9000244140625
            ],
            "area": 112.53720760345459,
            "category_id": 2,
            "image_id": 1,
            "id": 13
        },

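ChatGPT

With the annotation data you provided, we can explain the code's logic and what it does with this data in more detail.

Overview of the annotation data structure

The annotation data is a JSON object with the following main fields:

images: basic information about the image, such as file name and size. file_name: the image file name, here "00001.jpg". height: the image height, here 2744 pixels. width: the image width, here 4800 pixels. id: the unique identifier of the image, here 1.
annotations: the annotated polygon regions; each annotation object has the following fields. poly: a polygon given as a flat sequence of (x, y) coordinates that outline a region of the image. area: the area of the polygon. category_id: the category of the polygon. image_id: the ID of the image the annotation belongs to. id: the unique identifier of the annotation.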
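
To make the field descriptions concrete, the sketch below groups the flat poly list into (x, y) pairs and recomputes the polygon area with the shoelace formula, which should agree with the stored area value. The helper names `poly_to_points` and `shoelace_area` are hypothetical, and whether the dataset authors computed area this way is an assumption.

```python
def poly_to_points(poly):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    return list(zip(poly[0::2], poly[1::2]))


def shoelace_area(points):
    """Area of a simple polygon from its vertices, via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to the first vertex
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# First annotation from the sample data.
annotation = {
    "poly": [235.0841064453125, 2551.818359375, 249.3841552734375, 2556.41845703125,
             256.6455383300781, 2533.84521484375, 242.34548950195312, 2529.2451171875],
    "area": 356.2015366703272,
    "category_id": 0,
    "image_id": 1,
    "id": 1,
}

points = poly_to_points(annotation["poly"])
area = shoelace_area(points)
print(len(points), round(area, 3), round(annotation["area"], 3))
```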

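Explaining the code logic with the annotation data

Reading the annotation data (open_annotations_json function):

When an image is processed, the code first calls open_annotations_json to find and read the matching annotation file, which returns a JSON structure like the one you provided.
For example, if 00001.jpg is in the Images directory, the code looks for 00001.json in the train, val and test subfolders of the Annotations directory.

Cropping and processing the image (process_one_images function):

After reading the annotation data, the code takes the basic image information (width, height) and crops the image into 640x640-pixel tiles.
For example, the image is 4800 pixels wide and 2744 pixels high, so integer division gives 4800 // 640 = 7 columns and 2744 // 640 = 4 rows, i.e. 28 tiles. The leftover 320-pixel strip on the right and 184-pixel strip at the bottom are not cropped.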
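
The tile arithmetic for this image can be checked with a few lines. The sketch assumes the same non-overlapping grid as the script; `tile_boxes` is a hypothetical helper that returns (left, upper, right, lower) tuples in row-major order.

```python
def tile_boxes(width, height, crop_size=640):
    """Boxes of all whole crop_size x crop_size tiles, row by row."""
    rows, cols = height // crop_size, width // crop_size
    return [
        (c * crop_size, r * crop_size, (c + 1) * crop_size, (r + 1) * crop_size)
        for r in range(rows)
        for c in range(cols)
    ]


boxes = tile_boxes(4800, 2744)
unused_right = 4800 % 640    # pixels left over on the right edge
unused_bottom = 2744 % 640   # pixels left over at the bottom edge
print(len(boxes), boxes[0], boxes[-1], unused_right, unused_bottom)
```

Objects that lie in the unused strips never appear in any tile, which matters if small objects cluster near the image border.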

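Checking whether polygons lie inside the crop region (get_polys_in_sub_images function):

For each tile, the code calls get_polys_in_sub_images to check whether each polygon in annotations lies entirely inside the current crop region.
In the annotation data you provided, each poly field is the point set of one polygon, for example [235.0841064453125, 2551.818359375, …]. These points outline a specific region of the image.
If a polygon is inside the crop region, the code shifts its coordinates so they are relative to the top-left corner of the current tile, divides them by 640 to normalize them, and stores the adjusted coordinates in the result.

Saving the cropped images and annotations:

The code counts the polygons inside each crop region. If a region contains more than 5 polygons (that is, the length of tem_polys is greater than 5), the code saves the cropped image of that region and the adjusted annotation data.
For example, if a tile contains enough polygons, their adjusted coordinates are written to a new text file, and that part of the image is saved as a new cropped image file.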
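
Each line of the label file is the category id followed by the eight normalized coordinates. The sketch below formats one such line; rounding to six decimals and the name `format_label_line` are assumptions (the script writes the full str() of each float). The resulting layout, class x1 y1 x2 y2 x3 y3 x4 y4, looks like the one Ultralytics describes for oriented-box labels, but that is worth confirming against the Ultralytics documentation for the task you train.

```python
def format_label_line(category_id, normalized_points, ndigits=6):
    """Build 'class x1 y1 x2 y2 ...' with coordinates rounded to ndigits decimals."""
    coords = " ".join(f"{v:.{ndigits}f}" for v in normalized_points)
    return f"{category_id} {coords}"


# Normalized corners of the first sample annotation inside tile (0, 1920, 640, 2560).
line = format_label_line(
    0,
    [0.367319, 0.987216, 0.389663, 0.994404, 0.401009, 0.959133, 0.378665, 0.951946],
)
print(line)
```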

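Summary

The annotation data gives the code detailed information about the regions of interest in each image (polygon coordinates and categories).
The code relies on this data: it crops the image and extracts and saves the annotations that fall inside each region, preparing the data for later training or analysis.
The output is a set of cropped images with matching annotation files, saved in the directories of the different datasets (training, validation, test).

User question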

"poly": [
    235.0841064453125,
    2551.818359375,
    249.3841552734375,
    2556.41845703125,
    256.6455383300781,
    2533.84521484375,
    242.34548950195312,
    2529.2451171875
],
There are 8 numbers in total, representing 4 points; the 4 points together outline one detection box.

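ChatGPT

Understood, thanks for the clarification! The 8 numbers in the poly field are indeed the coordinates of 4 points, which together form a detection box. The box marks an object or region in the image.

Explaining the code logic with this additional information

Detection boxes represented as polygons:

Each poly field holds 4 points, the four corners of a rectangle (which may be rotated relative to the image axes), forming one detection box. For example: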

"poly": [
    235.0841064453125, 2551.818359375,  # first point (x1, y1)
    249.3841552734375, 2556.41845703125, # second point (x2, y2)
    256.6455383300781, 2533.84521484375, # third point (x3, y3)
    242.34548950195312, 2529.2451171875  # fourth point (x4, y4)
]
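
A quick way to see that these four points form a rectangle, here a rotated one, is to check that two adjacent edges are perpendicular and that the product of their lengths matches the stored area (356.20 for this annotation). This is a small sketch for this one sample; the variable names are illustrative only.

```python
import math

poly = [235.0841064453125, 2551.818359375, 249.3841552734375, 2556.41845703125,
        256.6455383300781, 2533.84521484375, 242.34548950195312, 2529.2451171875]
pts = list(zip(poly[0::2], poly[1::2]))

# Two adjacent edges: point 1 -> point 2 and point 2 -> point 3.
e1 = (pts[1][0] - pts[0][0], pts[1][1] - pts[0][1])
e2 = (pts[2][0] - pts[1][0], pts[2][1] - pts[1][1])

dot = e1[0] * e2[0] + e1[1] * e2[1]   # close to 0 when the edges are perpendicular
side_a = math.hypot(*e1)              # length of the first edge in pixels
side_b = math.hypot(*e2)              # length of the second edge in pixels
angle_deg = math.degrees(math.atan2(e1[1], e1[0]))  # tilt of the first edge

print(round(dot, 3), round(side_a, 2), round(side_b, 2), round(angle_deg, 1))
```

The first edge is tilted by roughly 18 degrees, so an axis-aligned box around the same object would be noticeably larger than the annotated one.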