GRDDC2020 Dataset: Download and Introduction

AICVHub

Last modified 2024-10-22 16:52:03


First published 2022-04-08 17:34:56

Article link: https://blog.csdn.net/oYeZhou/article/details/124046566


This post introduces the GRDDC dataset, reports some statistics on it, and provides a Baidu Cloud download link.

1. Basic Introduction

2. Class Distribution

3. Baidu Cloud Download

Update:

The 2022 dataset has also been released; see: Data | 2022 IEEE International Conference on Big Data

1. Basic Introduction

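GRDDC is the dataset released by the Global Road Damage Detection Challenge, which was held once a year from 2018 to 2020.

The 2020 dataset contains 21,041 samples from three countries, "Czech", "India" and "Japan", with 2,829, 7,706 and 10,506 samples respectively.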
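As a quick sanity check, the per-country counts add up to the stated total (2829 + 7706 + 10506 = 21041). The sketch below compares those numbers against the images actually present on disk. It assumes the `<root>/<country>/images` layout used by the statistics script in Section 2 and counts only `.jpg`/`.jpeg`/`.png` files; `count_images` is a hypothetical helper, not part of the dataset tooling.

```python
import os

# Per-country sample counts stated in this post for GRDDC2020
EXPECTED = {"Czech": 2829, "India": 7706, "Japan": 10506}

# Assumption: images use one of these extensions
IMG_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(raw_path, countries=tuple(EXPECTED)):
    """Count image files in <raw_path>/<country>/images for each country.

    Hypothetical helper; assumes the same directory layout as the
    statistics script in Section 2.
    """
    counts = {}
    for country in countries:
        img_dir = os.path.join(raw_path, country, "images")
        counts[country] = sum(
            1 for f in os.listdir(img_dir)
            if os.path.splitext(f)[1].lower() in IMG_EXTS
        )
    return counts


if __name__ == "__main__":
    print("expected total:", sum(EXPECTED.values()))
    # Uncomment after pointing the path at your local copy of the training set:
    # counts = count_images("path_to_grddc2020/train")
    # print(counts, "matches" if counts == EXPECTED else "differs")
```

If the counts differ, the local copy is probably incomplete or a different split (for example a test set) is being counted.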

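The annotations contain 10 label names in total: ['d00', 'd01', 'd0w0', 'd10', 'd11', 'd20', 'd40', 'd43', 'd44', 'd50']. Their meanings are:

d00: longitudinal crack;
d01 (d0w0 is an erroneous label for the same class): longitudinal construction joint;
d10: transverse crack;
d11: transverse construction joint;
d20: alligator crack;
d40: pothole;
d43: crosswalk blur;
d44: white line blur;
d50: manhole cover;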
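Because d0w0 is only a mislabeled d01, it is convenient to normalize raw label names before counting or training. The sketch below encodes the class list above as a dictionary and folds d0w0 into d01. It also lower-cases names, on the assumption that some annotation files may spell codes in upper case (e.g. `D40`); `GRDDC_CLASSES`, `LABEL_FIXES` and `normalize_label` are hypothetical names chosen for illustration.

```python
# Class codes and meanings as listed in this post (9 real classes)
GRDDC_CLASSES = {
    "d00": "longitudinal crack",
    "d01": "longitudinal construction joint",
    "d10": "transverse crack",
    "d11": "transverse construction joint",
    "d20": "alligator crack",
    "d40": "pothole",
    "d43": "crosswalk blur",
    "d44": "white line blur",
    "d50": "manhole cover",
}

# Known mislabel: d0w0 should be read as d01
LABEL_FIXES = {"d0w0": "d01"}


def normalize_label(name):
    """Return the canonical lower-case class code for a raw annotation label.

    Raises ValueError for labels that are not in GRDDC_CLASSES after fixing.
    """
    code = name.strip().lower()          # tolerate stray spaces and upper case
    code = LABEL_FIXES.get(code, code)   # map known mislabels to the real class
    if code not in GRDDC_CLASSES:
        raise ValueError(f"unknown GRDDC label: {name!r}")
    return code
```

For example, `normalize_label("d0w0")` gives `"d01"`, and `GRDDC_CLASSES[normalize_label("D40")]` gives `"pothole"`. Raising on unknown labels makes any further annotation typos visible instead of silently creating extra classes.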

2. Class Distribution

The per-class distribution is computed with the following statistics script:

import os
import xml.etree.ElementTree as ET
from collections import Counter

import tqdm


def xml_parse(target):
    """
    Arguments:
        target (ET.ElementTree or ET.Element): a parsed Pascal VOC style annotation
    Returns:
        a list of boxes, each in the form [class_name, xmin, ymin, xmax, ymax]
    """
    res = []
    for obj in target.iter("object"):
        name = obj.find("name").text.strip()
        bbox = obj.find("bndbox")

        # Coordinates may be stored as floats in the XML, so convert via float first
        bndbox = [name]
        for pt in ["xmin", "ymin", "xmax", "ymax"]:
            bndbox.append(int(float(bbox.find(pt).text)))

        res.append(bndbox)  # [label_name, xmin, ymin, xmax, ymax]

    return res  # [[label_name, xmin, ymin, xmax, ymax], ...]


if __name__ == '__main__':
    raw_path = "path_to_grddc2020/train"
    sub_dirs = ["Czech", "India", "Japan"]

    statistic = Counter()  # class name -> number of bounding boxes
    for sub_dir in sub_dirs:
        img_path_root = os.path.join(raw_path, sub_dir, 'images')
        xml_path_root = os.path.join(raw_path, sub_dir, 'annotations', 'xmls')
        for f in tqdm.tqdm(os.listdir(img_path_root), ncols=100):
            # Each image xxx.jpg has a matching annotation file xxx.xml
            xml_path = os.path.join(xml_path_root, os.path.splitext(f)[0] + '.xml')
            if not os.path.isfile(xml_path):
                continue  # skip images that have no annotation file
            target = ET.parse(xml_path)

            # Count one occurrence per bounding box of each class
            for r in xml_parse(target):
                statistic[r[0]] += 1

    print(dict(statistic))
    for key in sorted(statistic):
        print("{}: {}".format(key, statistic[key]))
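The script prints raw counts, in which d0w0 still appears as its own entry. A small post-processing step can merge it into d01 and add percentages, which makes the class imbalance easier to read. This is a sketch that assumes `statistic` is the dictionary (or `Counter`) of class name to box count built by the script above; `summarize` is a hypothetical helper, and the demo counts are made up for illustration, not real GRDDC2020 figures.

```python
from collections import Counter

# Known mislabel: fold d0w0 into d01 before reporting
LABEL_FIXES = {"d0w0": "d01"}


def summarize(statistic):
    """Merge known mislabels and return [(class, count, percent), ...],
    sorted by count in descending order."""
    merged = Counter()
    for cls, n in statistic.items():
        key = cls.lower()                       # tolerate upper-case codes
        merged[LABEL_FIXES.get(key, key)] += n  # d0w0 counts go to d01
    total = sum(merged.values())
    return [(cls, n, 100.0 * n / total) for cls, n in merged.most_common()]


if __name__ == "__main__":
    # Made-up counts for illustration only
    demo = {"d00": 6, "d01": 1, "d0w0": 1, "d40": 2}
    for cls, n, pct in summarize(demo):
        print(f"{cls}: {n} ({pct:.1f}%)")
```

Passing the real `statistic` from the script instead of `demo` gives the merged distribution over the nine actual classes.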