【OWOD Paper】Open World Object Detection Code, Part 1: Data

Introduction

This post records the experimental and code details of the open world object detection paper "Towards Open World Object Detection".

Dataset Basics

Let's go through the dataset distribution given in the paper task by task. The experiments combine the VOC 2007 and COCO 2017 datasets. In principle COCO 2017 alone would be enough, but since this pioneering work of the field adopted this setup, later work has treated it as the standard protocol.

Class information for the VOC 2007 dataset:

The dataset contains 5,011 training images and 4,952 test images (9,963 in total), covering 20 classes:

[aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor]

The COCO 2017 dataset includes train (118,287 images), val (5,000 images), and test (40,670 images) splits.

It has 80 object detection classes:

['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

There are 12 supercategories:

['appliance', 'food', 'indoor', 'accessory', 'electronic', 'furniture', 'vehicle', 'sports', 'animal', 'kitchen', 'person', 'outdoor']

COCO covers every VOC class, although a few class names differ between the two datasets (e.g. "aeroplane" in VOC vs. "airplane" in COCO).
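
The repo resolves these naming differences in pascal_voc.py via VOC_CLASS_NAMES_COCOFIED and BASE_VOC_CLASS_NAMES (the real lookup appears later inside load_voc_instances). Below is only a minimal sketch of the idea; the six name pairs are my assumption of the usual VOC/COCO differences, so check pascal_voc.py for the lists the repo actually uses.

# Sketch only: map COCO-style names back to VOC-style ("base") names.
# The six pairs are assumed; the repo's own lists live in pascal_voc.py.
VOC_CLASS_NAMES_COCOFIED = ["airplane", "dining table", "motorcycle",
                            "potted plant", "couch", "tv"]
BASE_VOC_CLASS_NAMES = ["aeroplane", "diningtable", "motorbike",
                        "pottedplant", "sofa", "tvmonitor"]

def to_base_voc_name(cls: str) -> str:
    # Same lookup pattern as in load_voc_instances below.
    if cls in VOC_CLASS_NAMES_COCOFIED:
        return BASE_VOC_CLASS_NAMES[VOC_CLASS_NAMES_COCOFIED.index(cls)]
    return cls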

Datasets Used in the Experiments

The experiments split the data into 4 tasks, which can be cross-checked against the contents of the datasets\OWOD_imagesets folder shipped with the code. The data used by each task is described below.

Task 1 contains the 20 VOC classes:

[aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor]

The Task 1 training set 【t1_train.txt】 pools VOC images and COCO images that contain instances of these classes, 16,551 training images in total. The test set 【t1_known_test.txt】 consists of the 4,952 VOC 2007 test images.

t1_train_with_unk.txt: essentially 【t1_train.txt】 plus 1,500 extra images, i.e. 18,051 images in total. The added images contain unknown-class data (i.e. they hold no instances of the 20 Task 1 classes; the exact rule should be confirmed in the code).
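
These counts can be sanity-checked by simply counting lines in the split files; a minimal sketch, assuming the files sit under datasets/OWOD_imagesets as described above:

import os

# Sketch only: count the image ids listed in the Task 1 split files.
# The directory below is an assumption based on the repo layout described here.
split_dir = "datasets/OWOD_imagesets"
for split in ["t1_train.txt", "t1_known_test.txt", "t1_train_with_unk.txt"]:
    with open(os.path.join(split_dir, split)) as f:
        n = sum(1 for line in f if line.strip())
    print(split, n)  # expected per the text: 16551, 4952, 18051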

Task 2 contains 20 COCO classes, drawn mainly from the Outdoor, Accessory, and Appliance supercategories (see the supercategory list above):

T2_CLASS_NAMES = [
    "truck", "traffic light", "fire hydrant", "stop sign", "parking meter",
    "bench", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase",
    "microwave", "oven", "toaster", "sink", "refrigerator"]

Training set 【t2_train.txt】: 45,520 images; test set 【t2_test.txt】: 1,914 images. t2_train_with_unk.txt: 【t2_train.txt】 plus 2,000 extra images, i.e. 47,520 in total; the added images contain unknown-class data (i.e. no instances of the 20 Task 2 classes).

Task 3 contains 20 COCO classes from the Sports and Food supercategories:

T3_CLASS_NAMES = [
    "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
    "banana", "apple", "sandwich", "orange", "broccoli",
    "carrot", "hot dog", "pizza", "donut", "cake"]

Training set 【t3_train.txt】: 39,402 images; test set 【t3_test.txt】: 1,642 images. t3_train_with_unk.txt: 【t3_train.txt】 plus 2,000 extra images, i.e. 41,402 in total; the added images contain unknown-class data (i.e. no instances of the 20 Task 3 classes).

Task 4 contains 20 COCO classes from the Electronic, Indoor, Kitchen, and Furniture supercategories:

T4_CLASS_NAMES = [
    "bed", "toilet", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "book", "clock",
    "vase", "scissors", "teddy bear", "hair drier", "toothbrush",
    "wine glass", "cup", "fork", "knife", "spoon", "bowl"]

Training set 【t4_train.txt】: 40,260 images; test set 【t4_test.txt】: 1,738 images.

Compare the data under datasets\coco17_voc_style\ImageSets\Main ("left side") with the data under datasets\OWOD_imagesets ("right side"). The right-hand data is what the paper actually uses; the left-hand data is the preprocessing stage from which the right-hand data is derived.

The t*_train files in the two folders are identical. On the left, t*_test corresponds to the test sets listed above, and t*_test_unk is the same as t*_test; t2_all_test_unk contains all 4,952 VOC 2007 test images plus 4,952 COCO validation images, 9,904 images in total. On the right, t*_train_with_unk adds 1,500-2,000 images on top of t*_train (roughly playing the role of the validation set mentioned in the paper). t*_ft is the class-balanced fine-tuning set described in the paper, built from all classes known up to task *; according to the paper, at least 50 images are kept per class, which is why the size of t*_ft keeps growing over time (see the sketch below for the balancing idea).
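
The balancing behind t*_ft can be illustrated as a greedy selection: keep adding images until every known class is covered by at least 50 images. This is only an illustration under that assumption; the repo's actual selection logic may differ.

import random
from collections import defaultdict

def build_balanced_ft_set(image_to_classes, min_per_class=50, seed=0):
    """Greedy illustration: pick images until every known class appears in at
    least `min_per_class` selected images. `image_to_classes` maps an image id
    to the set of known-class names annotated in it."""
    rng = random.Random(seed)
    counts = defaultdict(int)   # class name -> number of selected images containing it
    selected = []
    all_classes = {c for classes in image_to_classes.values() for c in classes}
    image_ids = list(image_to_classes)
    rng.shuffle(image_ids)
    for img in image_ids:
        if any(counts[c] < min_per_class for c in image_to_classes[img]):
            selected.append(img)
            for c in image_to_classes[img]:
                counts[c] += 1
            if all(counts[c] >= min_per_class for c in all_classes):
                break
    return selected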

wr1.txt: the open-set detection file. It contains all 4,952 VOC 2007 test images plus the same number of COCO unknown-class samples, and is used to evaluate the model's ability to reject unknown classes.

 

Loading the Datasets

Dataset loading is a prerequisite for running the program. Starting from DefaultTrainer in the training script train_net.py:

find the DefaultTrainer class under detectron2\engine (defaults.py in the Detectron2 code base) and start reading the source from there.

Everything below then follows naturally.

The first thing to look at is the detectron2\data\datasets folder in the code base.

builtin.py: defines the mapping between dataset names and the corresponding data files.

Inside it, the register_all_pascal_voc function contains the following loop:

for name, dirname, split in SPLITS:
        year = 2007 if "2007" in name else 2012
        register_pascal_voc(name, os.path.join(root, dirname), split, year)
        MetadataCatalog.get(name).evaluator_type = "pascal_voc"
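
Here SPLITS is a list of (name, dirname, split) tuples, one per dataset name to register; one real entry, ("t4_voc_coco_2007_ft", "VOC2007", "t4_ft"), appears as a comment above register_pascal_voc below. A hypothetical excerpt to show the shape only (the full table in builtin.py has many more entries, and the exact names may differ):

# Hypothetical excerpt of the registration table; only the tuple shape matters.
SPLITS = [
    ("t1_voc_coco_2007_train", "VOC2007", "t1_train"),
    ("t1_voc_coco_2007_test", "VOC2007", "t1_known_test"),
    ("t4_voc_coco_2007_ft", "VOC2007", "t4_ft"),   # example used below
]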

The key call is register_pascal_voc(name, os.path.join(root, dirname), split, year).

This leads us to the function in pascal_voc.py:

#("t4_voc_coco_2007_ft", "VOC2007", "t4_ft")
#name:t4_voc_coco_2007_ft 
#dirname:os.path.join(root, dirname)
#split : t4_ft
#year 2007
def register_pascal_voc(name, dirname, split, year):
    # if "voc_coco" in name:
    #     class_names = VOC_COCO_CLASS_NAMES
    # else:
    #     class_names = tuple(VOC_CLASS_NAMES)
    class_names = VOC_COCO_CLASS_NAMES
    DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split, class_names))
    MetadataCatalog.get(name).set(
        thing_classes=list(class_names), dirname=dirname, year=year, split=split
    )

This function mainly involves DatasetCatalog.register, load_voc_instances, and MetadataCatalog.get.

DatasetCatalog.register: registers the dataset and its loading function; when materialized, the data has the following form:

{"t4_voc_coco_2007_ft":

[
    {
    "file_name": jpeg_file,# 00001.jog
    "image_id": fileid,# 00001
    "height": int(tree.findall("./size/height")[0].text),
    "width": int(tree.findall("./size/width")[0].text),
    "annotations": [{"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS},
    {"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS},
    ...]},
    {...},{...},...

]

}
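
Registration is lazy: only the lambda is stored, and the list of dicts above is built when the name is looked up. A minimal usage sketch with the standard Detectron2 catalog API:

from detectron2.data import DatasetCatalog

# DatasetCatalog.get(name) calls the registered function and returns list[dict].
dicts = DatasetCatalog.get("t4_voc_coco_2007_ft")
print(len(dicts))                                   # number of images in this split
print(dicts[0]["file_name"], len(dicts[0]["annotations"]))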

class _DatasetCatalog(UserDict):
    """
    A global dictionary that stores information about the datasets and how to obtain them.

    It contains a mapping from strings
    (which are names that identify a dataset, e.g. "coco_2014_train")
    to a function which parses the dataset and returns the samples in the
    format of `list[dict]`.

    The returned dicts should be in Detectron2 Dataset format (See DATASETS.md for details)
    if used with the data loader functionalities in `data/build.py,data/detection_transform.py`.

    The purpose of having this catalog is to make it easy to choose
    different datasets, by just using the strings in the config.
    """

    def register(self, name, func):
        """
        Args:
            name (str): the name that identifies a dataset, e.g. "coco_2014_train".
            func (callable): a callable which takes no arguments and returns a list of dicts.
                It must return the same results if called multiple times.
        """
        assert callable(func), "You must register a function with `DatasetCatalog.register`!"
        assert name not in self, "Dataset '{}' is already registered!".format(name)
        self[name] = func

load_voc_instances in pascal_voc.py is the func referred to above.

# Load instance data for one split
# (some of the file locations below can be adapted if needed)
def load_voc_instances(dirname: str, split: str, class_names: Union[List[str], Tuple[str, ...]]):
    """
    Load Pascal VOC detection annotations to Detectron2 format.

    Args:
        dirname: Contain "Annotations", "ImageSets", "JPEGImages"
        split (str): one of "train", "test", "val", "trainval"
        class_names: list or tuple of class names
    """
    # Load the file ids listed in the split file
    with PathManager.open(os.path.join(dirname, "ImageSets", "Main", split + ".txt")) as f:
        fileids = np.loadtxt(f, dtype=str)  # note: np.str was removed in NumPy >= 1.24

    # Needs to read many small annotation files. Makes sense at local
    annotation_dirname = PathManager.get_local_path(os.path.join(dirname, "Annotations/"))
    dicts = []
    for fileid in fileids:
        anno_file = os.path.join(annotation_dirname, fileid + ".xml")
        jpeg_file = os.path.join(dirname, "JPEGImages", fileid + ".jpg")

        try:
            with PathManager.open(anno_file) as f:
                tree = ET.parse(f)
        except Exception:
            logger = logging.getLogger(__name__)
            logger.info('Not able to load: ' + anno_file + '. Continuing without aborting...')
            continue

        r = {
            "file_name": jpeg_file,
            "image_id": fileid,
            "height": int(tree.findall("./size/height")[0].text),
            "width": int(tree.findall("./size/width")[0].text),
        }
        instances = []

        for obj in tree.findall("object"):
            cls = obj.find("name").text
            # map COCO-style class names back to VOC-style names
            if cls in VOC_CLASS_NAMES_COCOFIED:
                cls = BASE_VOC_CLASS_NAMES[VOC_CLASS_NAMES_COCOFIED.index(cls)]
            # We include "difficult" samples in training.
            # Based on limited experiments, they don't hurt accuracy.
            # difficult = int(obj.find("difficult").text)
            # if difficult == 1:
            # continue
            bbox = obj.find("bndbox")
            bbox = [float(bbox.find(x).text) for x in ["xmin", "ymin", "xmax", "ymax"]]
            # Original annotations are integers in the range [1, W or H]
            # Assuming they mean 1-based pixel indices (inclusive),
            # a box with annotation (xmin=1, xmax=W) covers the whole image.
            # In coordinate space this is represented by (xmin=0, xmax=W)
            bbox[0] -= 1.0
            bbox[1] -= 1.0
            instances.append(
                {"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS}
            )
        r["annotations"] = instances
        dicts.append(r)
    '''
    #dicts:[
    {
    "file_name": jpeg_file,# 00001.jog
    "image_id": fileid,# 00001
    "height": int(tree.findall("./size/height")[0].text),
    "width": int(tree.findall("./size/width")[0].text),
    "annotations": [{"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS},
    {"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS},
    ...]},
    {...},{...},...]
    '''
    return dicts
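
load_voc_instances can also be called directly to inspect a split without going through the catalog; a minimal sketch, where the dirname is an assumption about the local VOC-style data layout:

# Sketch only: inspect one split directly. The dirname is assumed to contain
# "Annotations", "ImageSets/Main" (with the t*_*.txt files) and "JPEGImages".
dirname = "datasets/coco17_voc_style"
dicts = load_voc_instances(dirname, "t1_train", VOC_COCO_CLASS_NAMES)
print(len(dicts), "images loaded")
print(dicts[0]["file_name"], len(dicts[0]["annotations"]))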

MetadataCatalog.get()

According to the docstring, the metadata for a given name is a singleton for the lifetime of the program and is only meant to store constants and shared information, such as class names. It is accessed via get(name).

class _MetadataCatalog(UserDict):
    """
    MetadataCatalog is a global dictionary that provides access to
    :class:`Metadata` of a given dataset.

    The metadata associated with a certain name is a singleton: once created, the
    metadata will stay alive and will be returned by future calls to ``get(name)``.

    It's like global variables, so don't abuse it.
    It's meant for storing knowledge that's constant and shared across the execution
    of the program, e.g.: the class names in COCO.
    """

    def get(self, name):
        """
        Args:
            name (str): name of a dataset (e.g. coco_2014_train).

        Returns:
            Metadata: The :class:`Metadata` instance associated with this name,
            or create an empty one if none is available.
        """
        assert len(name)
        r = super().get(name, None)
        if r is None:
            r = self[name] = Metadata(name=name)
        return r

The metadata is first registered in the register_all_pascal_voc function of builtin.py, so it can later be fetched with get. The information stored is mainly the following:

"thing_classes":VOC_COCO_CLASS_NAMES
dirname=dirname, year=year, split=split
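
Downstream components (e.g. evaluators and visualizers) read these fields back through the same name; a minimal usage sketch:

from detectron2.data import MetadataCatalog

# Returns the singleton Metadata registered above for this dataset name.
meta = MetadataCatalog.get("t4_voc_coco_2007_ft")
print(meta.thing_classes[:5])        # first few entries of VOC_COCO_CLASS_NAMES
print(meta.dirname, meta.split, meta.year)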

Once the process above is done, the code essentially moves on to defining samplers and dataloaders, which is mostly standard library plumbing, so I will not go into detail; a minimal sketch of that step is shown below.
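
For completeness, this is roughly how the registered names flow into training via the config in standard Detectron2 usage (the dataset name below is an assumption; the OWOD configs may set additional options):

from detectron2.config import get_cfg
from detectron2.data import build_detection_train_loader

cfg = get_cfg()
# The names in cfg.DATASETS.TRAIN must match names registered in DatasetCatalog.
cfg.DATASETS.TRAIN = ("t1_voc_coco_2007_train",)   # assumed name; check builtin.py
data_loader = build_detection_train_loader(cfg)
batch = next(iter(data_loader))    # a list of dicts with "image", "instances", ...
print(batch[0]["image"].shape)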
