Detectron2 Source Code Reading 4: Evaluator

Detectron2 includes a few DatasetEvaluator tools that compute metrics using standard dataset-specific APIs (e.g. COCO, LVIS). COCOEvaluator can evaluate AP (Average Precision) for box detection, instance segmentation, and keypoint detection on any custom dataset. SemSegEvaluator can evaluate semantic segmentation metrics on any custom dataset.
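
Before diving into the internals, here is the standard usage pattern from the Detectron2 documentation, assuming a dataset named "my_dataset_val" has already been registered and that cfg and model are already built:

from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# Build the evaluator and the test loader for the registered dataset.
evaluator = COCOEvaluator("my_dataset_val", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "my_dataset_val")
# Run inference and print the resulting metric dict.
print(inference_on_dataset(model, val_loader, evaluator))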

How do we implement a custom evaluator?

The DatasetEvaluator and DatasetEvaluators classes:


class DatasetEvaluator:
    def reset(self):
        pass

    def process(self, inputs, outputs):
        pass

    def evaluate(self):
        pass

I have stripped the docstrings here.
The expected implementation of process:

# inputs (list): the inputs that were used to call the model.
# outputs (list): the return value of `model(inputs)`.
for input_, output in zip(inputs, outputs):
    # do evaluation on a single input/output pair
    ...

The implementation of evaluate:
It returns a dict of metrics.
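
For reference, a minimal sketch of the shape evaluate() typically returns (the numbers here are made up): a dict mapping a task name to a dict of metric name -> value, which is the layout that print_csv_format and the training hooks consume.

def evaluate(self):
    # {task_name: {metric_name: value}}
    return {"bbox": {"AP": 41.2, "AP50": 61.3}}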


class DatasetEvaluators(DatasetEvaluator):
    def __init__(self, evaluators):
        super().__init__()
        self._evaluators = evaluators

    def reset(self):
        for evaluator in self._evaluators:
            evaluator.reset()

    def process(self, inputs, outputs):
        for evaluator in self._evaluators:
            evaluator.process(inputs, outputs)

    def evaluate(self):
        results = OrderedDict()
        for evaluator in self._evaluators:
            result = evaluator.evaluate()
            if is_main_process() and result is not None:
                for k, v in result.items():
                    assert (
                        k not in results
                    ), "Different evaluators produce results with the same key {}".format(k)
                    results[k] = v
        return results

DatasetEvaluator is an abstract base class that defines the interface of a dataset evaluator. It contains three methods: reset(), process(inputs, outputs), and evaluate(). Users subclass it and implement these methods to define their own evaluation logic.
DatasetEvaluators is an implementation of DatasetEvaluator that composes several evaluators. It takes an evaluators argument, a list of evaluators; its evaluate() method iterates over the results of all evaluators and merges them into a single dict that it returns.
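
As a concrete illustration (this class is not part of Detectron2; the name and metric keys are made up), a minimal custom evaluator that reports the average number of predicted instances per image could look like this. It assumes the standard Detectron2 output format, where each per-image output dict carries an "instances" field:

from detectron2.evaluation import DatasetEvaluator

class InstanceCountEvaluator(DatasetEvaluator):
    """Reports the average number of predicted instances per image."""

    def reset(self):
        self._num_images = 0
        self._num_instances = 0

    def process(self, inputs, outputs):
        # `outputs` is the return value of model(inputs); for detection models
        # each element is a dict with an "instances" field.
        for output in outputs:
            self._num_images += 1
            self._num_instances += len(output["instances"])

    def evaluate(self):
        avg = self._num_instances / max(self._num_images, 1)
        # Return the nested {task: {metric: value}} dict described above.
        return {"count": {"instances_per_image": avg}}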

inference_on_dataset

The inference_on_dataset function is the main entry point for running model inference on a dataset and evaluating the results. A detailed walkthrough of the function:


def inference_on_dataset(
    model,
    data_loader,
    evaluator: Union[DatasetEvaluator, List[DatasetEvaluator], None],
    callbacks=None,
):
    """
    Run model on the data_loader and evaluate the metrics with evaluator.
    Also benchmark the inference speed of `model.__call__` accurately.
    The model will be used in eval mode.

    Args:
        model (callable): a callable which takes an object from
            `data_loader` and returns some outputs.

            If it's an nn.Module, it will be temporarily set to `eval` mode.
            If you wish to evaluate a model in `training` mode instead, you can
            wrap the given model and override its behavior of `.eval()` and `.train()`.
        data_loader: an iterable object with a length.
            The elements it generates will be the inputs to the model.
        evaluator: the evaluator(s) to run. Use `None` if you only want to benchmark,
            but don't want to do any evaluation.
        callbacks (dict of callables): a dictionary of callback functions which can be
            called at each stage of inference.

    Returns:
        The return value of `evaluator.evaluate()`
    """
    num_devices = get_world_size()
    logger = logging.getLogger(__name__)
    logger.info("Start inference on {} batches".format(len(data_loader)))

    total = len(data_loader)  # inference data loader must have a fixed length
    if evaluator is None:
        # create a no-op evaluator
        evaluator = DatasetEvaluators([])
    if isinstance(evaluator, abc.MutableSequence):
        evaluator = DatasetEvaluators(evaluator)
    evaluator.reset()

    num_warmup = min(5, total - 1)
    start_time = time.perf_counter()
    total_data_time = 0
    total_compute_time = 0
    total_eval_time = 0
    with ExitStack() as stack:
        if isinstance(model, nn.Module):
            stack.enter_context(inference_context(model))
        stack.enter_context(torch.no_grad())

        start_data_time = time.perf_counter()
        dict.get(callbacks or {}, "on_start", lambda: None)()
        for idx, inputs in enumerate(data_loader):
            total_data_time += time.perf_counter() - start_data_time
            if idx == num_warmup:
                start_time = time.perf_counter()
                total_data_time = 0
                total_compute_time = 0
                total_eval_time = 0

            start_compute_time = time.perf_counter()
            dict.get(callbacks or {}, "before_inference", lambda: None)()
            outputs = model(inputs)
            dict.get(callbacks or {}, "after_inference", lambda: None)()
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            total_compute_time += time.perf_counter() - start_compute_time

            start_eval_time = time.perf_counter()
            evaluator.process(inputs, outputs)
            total_eval_time += time.perf_counter() - start_eval_time

            iters_after_start = idx + 1 - num_warmup * int(idx >= num_warmup)
            data_seconds_per_iter = total_data_time / iters_after_start
            compute_seconds_per_iter = total_compute_time / iters_after_start
            eval_seconds_per_iter = total_eval_time / iters_after_start
            total_seconds_per_iter = (time.perf_counter() - start_time) / iters_after_start
            if idx >= num_warmup * 2 or compute_seconds_per_iter > 5:
                eta = datetime.timedelta(seconds=int(total_seconds_per_iter * (total - idx - 1)))
                log_every_n_seconds(
                    logging.INFO,
                    (
                        f"Inference done {idx + 1}/{total}. "
                        f"Dataloading: {data_seconds_per_iter:.4f} s/iter. "
                        f"Inference: {compute_seconds_per_iter:.4f} s/iter. "
                        f"Eval: {eval_seconds_per_iter:.4f} s/iter. "
                        f"Total: {total_seconds_per_iter:.4f} s/iter. "
                        f"ETA={eta}"
                    ),
                    n=5,
                )
            start_data_time = time.perf_counter()
        dict.get(callbacks or {}, "on_end", lambda: None)()

    # Measure the time only for this worker (before the synchronization barrier)
    total_time = time.perf_counter() - start_time
    total_time_str = str(datetime.timedelta(seconds=total_time))
    # NOTE this format is parsed by grep
    logger.info(
        "Total inference time: {} ({:.6f} s / iter per device, on {} devices)".format(
            total_time_str, total_time / (total - num_warmup), num_devices
        )
    )
    total_compute_time_str = str(datetime.timedelta(seconds=int(total_compute_time)))
    logger.info(
        "Total inference pure compute time: {} ({:.6f} s / iter per device, on {} devices)".format(
            total_compute_time_str, total_compute_time / (total - num_warmup), num_devices
        )
    )

    results = evaluator.evaluate()
    # An evaluator may return None when not in main process.
    # Replace it by an empty dict instead to make it easier for downstream code to handle
    if results is None:
        results = {}
    return results

  1. Arguments

    • model: the model used for inference. It can be any callable that takes inputs from the data loader and returns outputs; typically it is a neural network (torch.nn.Module).
    • data_loader: the data loader that yields samples from the dataset. It must be an iterable with a fixed length.
    • evaluator: the evaluator(s) used to score the inference results. It can be a single evaluator or a list of evaluators. Passing None runs inference only, with no evaluation.
    • callbacks: a dict of callback functions invoked at different stages of inference.
  2. Behavior

    • The function first obtains the number of devices via get_world_size() and initializes the logger and several timers.
    • If evaluator is None, a no-op evaluator is created; if it is a list, the evaluators are combined into a single DatasetEvaluators object.
    • Inference then starts. For each batch from the data loader, the function:
      • measures the data-loading time;
      • once the warmup count (5 by default) is reached, resets the timers and starts timing the whole run;
      • runs model inference and records the compute time;
      • synchronizes if a GPU is available;
      • lets the evaluator process the model outputs;
      • records the evaluator's processing time;
      • computes and logs the per-iteration data-loading, inference, eval, and total times.
    • Finally, the function returns the evaluator's results.
  3. Summary

    • inference_on_dataset provides a convenient interface for running inference on a dataset and evaluating the results.
    • It handles data loading, model inference, evaluator processing, and timing statistics.
    • By passing in different models, data loaders, and evaluators, you can easily evaluate and compare models on different datasets (see the usage sketch right after this list).
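
A hedged usage sketch of the list-of-evaluators and callbacks variants accepted by the source above; cfg and model are assumed to already exist, the dataset name is a placeholder, and the callbacks argument is only present in Detectron2 versions that contain the code quoted above:

from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

data_loader = build_detection_test_loader(cfg, "my_dataset_val")

# A list of evaluators is wrapped into a DatasetEvaluators object internally.
evaluators = [COCOEvaluator("my_dataset_val", output_dir="./output")]

# Keys match the ones looked up in the source:
# on_start, before_inference, after_inference, on_end.
callbacks = {
    "on_start": lambda: print("inference started"),
    "on_end": lambda: print("inference finished"),
}

results = inference_on_dataset(model, data_loader, evaluators, callbacks=callbacks)
print(results)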

Evaluator

How do we customize an Evaluator?

        def test_and_save_results():
            self._last_eval_results = self.test(self.cfg, self.model)
            return self._last_eval_results

        # Do evaluation after checkpointer, because then if it fails,
        # we can use the saved checkpoint to debug.
        ret.append(hooks.EvalHook(cfg.TEST.EVAL_PERIOD, test_and_save_results))

How do we evaluate the model during or at the end of training?
The example given in Detectron2 registers an EvalHook in build_hooks (shown above) that calls the self.test method to obtain the eval results. So how is self.test written?

    @classmethod
    def test(cls, cfg, model, evaluators=None):
        """
        Evaluate the given model. The given model is expected to already contain
        weights to evaluate.

        Args:
            cfg (CfgNode):
            model (nn.Module):
            evaluators (list[DatasetEvaluator] or None): if None, will call
                :meth:`build_evaluator`. Otherwise, must have the same length as
                ``cfg.DATASETS.TEST``.

        Returns:
            dict: a dict of result metrics
        """
        logger = logging.getLogger(__name__)
        if isinstance(evaluators, DatasetEvaluator):
            evaluators = [evaluators]
        if evaluators is not None:
            assert len(cfg.DATASETS.TEST) == len(evaluators), "{} != {}".format(
                len(cfg.DATASETS.TEST), len(evaluators)
            )

        results = OrderedDict()
        for idx, dataset_name in enumerate(cfg.DATASETS.TEST):
            data_loader = cls.build_test_loader(cfg, dataset_name)
            # When evaluators are passed in as arguments,
            # implicitly assume that evaluators can be created before data_loader.
            if evaluators is not None:
                evaluator = evaluators[idx]
            else:
                try:
                    evaluator = cls.build_evaluator(cfg, dataset_name)
                except NotImplementedError:
                    logger.warn(
                        "No evaluator found. Use `DefaultTrainer.test(evaluators=)`, "
                        "or implement its `build_evaluator` method."
                    )
                    results[dataset_name] = {}
                    continue
            results_i = inference_on_dataset(model, data_loader, evaluator)
            results[dataset_name] = results_i
            if comm.is_main_process():
                assert isinstance(
                    results_i, dict
                ), "Evaluator must return a dict on the main process. Got {} instead.".format(
                    results_i
                )
                logger.info("Evaluation results for {} in csv format:".format(dataset_name))
                print_csv_format(results_i)

        if len(results) == 1:
            results = list(results.values())[0]
        return results

self.test takes cfg, model, evaluators=None as arguments; let's set cfg aside for now.

  1. First, evaluators must be list[DatasetEvaluator] or None; if a single DatasetEvaluator is passed, it is wrapped into a list with [];
  2. It asserts that len(cfg.DATASETS.TEST) == len(evaluators); cfg.DATASETS.TEST is the tuple of registered dataset names used for testing;
  3. The returned results is an OrderedDict(). The method iterates over cfg.DATASETS.TEST and builds a data_loader for each dataset with the build_test_loader method. If evaluators is None, it calls cls.build_evaluator to build the evaluator;
  4. The per-dataset results are obtained via results_i = inference_on_dataset(model, data_loader, evaluator);
  5. On the main process, the results are printed (a sketch of the usual build_evaluator override follows this list).
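
For completeness, a hedged sketch of the usual pattern: subclass DefaultTrainer and override build_evaluator so that step 3 above can construct an evaluator for each test dataset. The class name and output folder are illustrative, not from the Detectron2 source:

import os

from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Called by DefaultTrainer.test for every name in cfg.DATASETS.TEST.
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, output_dir=output_folder)

# During training, the EvalHook registered in build_hooks calls Trainer.test
# every cfg.TEST.EVAL_PERIOD iterations; it can also be called directly:
# results = Trainer.test(cfg, model)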

I previously fell into a pitfall here by not looking carefully at the difference between the train data loader and the test data loader.

@configurable(from_config=_train_loader_from_config)
def build_detection_train_loader(
    dataset,
    *,
    mapper,
    sampler=None,
    total_batch_size,
    aspect_ratio_grouping=True,
    num_workers=0,
    collate_fn=None,
    **kwargs
):

@configurable(from_config=_test_loader_from_config)
def build_detection_test_loader(
    dataset: Union[List[Any], torchdata.Dataset],
    *,
    mapper: Callable[[Dict[str, Any]], Any],
    sampler: Optional[torchdata.Sampler] = None,
    batch_size: int = 1,
    num_workers: int = 0,
    collate_fn: Optional[Callable[[List[Any]], Any]] = None,
) -> torchdata.DataLoader:

They use different samplers: TrainingSampler for the train loader and InferenceSampler for the test loader.
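
A small sketch of why this matters (sizes are arbitrary): TrainingSampler produces an infinite stream of shuffled indices for iteration-based training, while InferenceSampler yields every index exactly once, split across processes, which is what gives the test loader the fixed length that inference_on_dataset relies on.

from detectron2.data.samplers import InferenceSampler, TrainingSampler

# An infinite stream of shuffled indices -- the train loader never "ends".
train_sampler = TrainingSampler(size=1000, shuffle=True)

# Each index is produced exactly once and split across workers,
# so len() of the resulting test loader is well defined.
test_sampler = InferenceSampler(size=1000)
print(len(test_sampler))  # 1000 on a single process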
