Before it became a class, it was an object

Introduction

With the widespread availability of training data and computational resources, computer vision and object detection have become increasingly accessible. However, specific real-world use cases such as food detection require new classes all too often. Isn’t it annoying for your food to get cold while you wait for the data gathering process to finish?

In this blog post we present an extension to Facebook’s Detectron2 framework that detects new, unseen objects, thus speeding up the data gathering process. We illustrate this below using the detection output of a model pre-trained on the COCO dataset, which has not seen either this type of bun or Bounty chocolate bars.

Context and Motivation

Finding a robust definition of the ideal dataset for object detection is difficult, as use cases and success metrics vary, e.g. accuracy vs. generalisation. Typically, such datasets have a balanced class distribution, enough images per class, and quality annotations. However, assembling them is an even more daunting task.

Data doesn’t come for free: it costs both time and money.

For example, adding a new class to an already curated dataset means ensuring we have enough annotations and class instances, as well as class balance, so that performance is not jeopardised. Thus, both acquisition and labelling present a bottleneck.

Google’s AI Data Labelling Service charges $25 for 1,000 classification annotations and $49 for 1,000 bounding box annotations. For a robust dataset of 10,000 annotations, having the object localisation already in place translates into a cost saving of $240 ($24 per 1,000 annotations, times ten).

Our solution can speed up this process considerably by reducing the number of actions needed to obtain object bounding boxes, and by improving the onboarding of a new class into the detection pipeline.

Implementation

Detectron2 is Facebook AI Research’s framework for implementing computer vision algorithms. Designed to switch between tasks with ease, going from object detection to semantic segmentation or keypoint detection with a small change in a config file, Detectron2 offers state-of-the-art implementations of algorithms such as FasterRCNN and RetinaNet.
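
To make this config-driven task switching concrete, here is a minimal sketch using Detectron2's model zoo utilities (get_cfg and model_zoo are standard Detectron2 APIs; both config paths are stock model zoo entries):

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Object detection with a FasterRCNN-FPN model:
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml"))
# ...or instance segmentation, by merging a different config instead:
# cfg.merge_from_file(model_zoo.get_config_file(
#     "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"))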

As described in this great explanation of the inner workings of Detectron, the Detectron2 FasterRCNN-FPN model is composed of the following building blocks:

  • Backbone network
  • Region proposal network
  • ROI Heads (Box Head)

FasterRCNN computes a score for each RPN region, which expresses the confidence that an object is present in that region.

The regions with the best objectness scores are classified and turned into class predictions if the class score is above the class confidence threshold. Predictions that fall below the class confidence threshold but above the objectness score threshold are referred to as generic object predictions.
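
To make the two thresholds concrete, here is a minimal, self-contained sketch of this decision rule; the function and the threshold values are illustrative, not part of the Detectron2 API:

import torch

def split_predictions(class_scores, objectness_logits,
                      class_score_thresh=0.5, objectness_thresh=3.0):
    # Regions whose best class score clears the class confidence
    # threshold become class predictions.
    is_class = class_scores > class_score_thresh
    # Regions that fail the class test but still score high on
    # objectness become generic object predictions.
    is_generic = (~is_class) & (objectness_logits > objectness_thresh)
    return is_class, is_generic

# Three regions: a confident class, a generic object, and background.
scores = torch.tensor([0.9, 0.2, 0.1])
logits = torch.tensor([5.0, 4.2, 1.0])
print(split_predictions(scores, logits))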

Our proposed solution extracts the objectness scores of predictions from the head of the model. We use these scores to obtain object bounding box predictions for classes the model has never seen in its training set. A prerequisite is that the new object representations live within the same representation space as the trained classes, i.e. the solution can find an apple’s location in an image using a model trained on a vegetable dataset that has seen tomatoes and potatoes. This gets us one click away from the image detection annotations, skipping the drawing of bounding boxes.
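
For intuition, here is one way to inspect these objectness scores on a stock Detectron2 model; this sketch runs only the backbone and the region proposal network, skipping the ROI heads, and the random tensor stands in for a real image:

import torch
from detectron2 import model_zoo
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml")
cfg.MODEL.DEVICE = "cpu"

model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()

with torch.no_grad():
    # A random tensor standing in for a real 480x640 image
    inputs = [{"image": torch.rand(3, 480, 640) * 255,
               "height": 480, "width": 640}]
    images = model.preprocess_image(inputs)
    features = model.backbone(images.tensor)
    proposals, _ = model.proposal_generator(images, features)
    # Each proposal carries a box and an objectness logit
    print(proposals[0].objectness_logits[:5])
    print(proposals[0].proposal_boxes[:5])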

In order to achieve our goal, we have to modify ROI Heads so that it outputs generic predictions. In Detectron2, ROI Heads is represented by the StandardROIHeads class, which contains the FastRCNNOutputLayers class that predicts bounding boxes and classification scores based on the region proposals.
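
Detectron2 resolves ROI heads through a registry keyed by cfg.MODEL.ROI_HEADS.NAME, so wiring in a custom implementation can be sketched roughly like this (the class bodies are placeholders; the real logic lives in the methods shown below):

from detectron2.modeling import ROI_HEADS_REGISTRY, StandardROIHeads
from detectron2.modeling.roi_heads.fast_rcnn import FastRCNNOutputLayers

class GenericFastRCNNOutputLayers(FastRCNNOutputLayers):
    """Box predictor that also emits generic object predictions.
    Placeholder body; its inference helpers are sketched below."""

@ROI_HEADS_REGISTRY.register()
class GenericROIHeads(StandardROIHeads):
    """ROI heads selected via cfg.MODEL.ROI_HEADS.NAME = "GenericROIHeads".
    Placeholder body; in the fork it uses GenericFastRCNNOutputLayers
    as its box predictor."""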

Region proposals come from the region proposal network, and each has an associated objectness logit which represents how likely an object is to be present there. Because we want to obtain the generic predictions based on these scores, we create our own class GenericFastRCNNOutputLayers that subclasses FastRCNNOutputLayers. The roles of this class are to:

  • Obtain class predictions based on the detection scores, filter them using an established score threshold, and then apply NMS.

# Relies on: from detectron2.structures import Boxes
#            from detectron2.layers import batched_nms
def _get_class_predictions(self, boxes, scores, image_shape):
    num_bbox_reg_classes = boxes.shape[1] // 4
    # Convert to Boxes to use the `clip` function ...
    boxes = Boxes(boxes.reshape(-1, 4))
    boxes.clip(image_shape)
    boxes = boxes.tensor.view(-1, num_bbox_reg_classes, 4)  # R x C x 4

    # Filter results based on detection scores
    filter_mask = scores > self.class_score_thresh_test
    # R' x 2. First column contains indices of the R predictions;
    # second column contains indices of classes.
    class_inds = filter_mask.nonzero()
    if num_bbox_reg_classes == 1:
        boxes = boxes[class_inds[:, 0], 0]
    else:
        boxes = boxes[filter_mask]
    scores = scores[filter_mask]

    # Apply per-class NMS
    keep_class = batched_nms(boxes, scores, class_inds[:, 1],
                             self.class_nms_thresh_test)
    if self.topk_per_image_test >= 0:
        keep_class = keep_class[:self.topk_per_image_test]
    boxes, scores, class_inds = (boxes[keep_class], scores[keep_class],
                                 class_inds[keep_class])

    return boxes, scores, class_inds

  • Obtain generic predictions based on the objectness logits, filter them by an objectness score threshold, filter out the results that overlap with class predictions, then apply NMS to the final generic predictions. In practice, objectness logit values above 3 usually indicate the presence of an object in the image.

# Relies on: from detectron2.structures import Instances
#            from detectron2.layers import batched_nms
#            import numpy as np; import torch
def _get_generic_predictions(
        self, proposals: Instances, class_boxes: torch.FloatTensor,
        class_scores: torch.FloatTensor, class_inds: torch.FloatTensor,
        generic_idx: int
) -> (torch.FloatTensor, torch.FloatTensor, torch.IntTensor):
    # Per-object objectness logits, reshaped to one column
    objectness = proposals.objectness_logits.reshape(
        (proposals.objectness_logits.shape[0], 1))
    obj_boxes = proposals.proposal_boxes.tensor

    # Filter by objectness threshold
    filter_object_mask = objectness > self.objectness_score_thresh_test
    filter_obj_inds = filter_object_mask.nonzero()
    obj_boxes = obj_boxes[filter_obj_inds[:, 0]]

    # Filter generic objects that overlap with class predictions
    generic_mask = self._find_generic_objects_suppression_mask(
        class_boxes, obj_boxes, self.objectness_nms_thresh_test)
    objectness = objectness[filter_object_mask]
    generic_boxes = obj_boxes[generic_mask]
    generic_inds = filter_obj_inds[:][generic_mask]
    generic_scores = objectness[generic_mask]

    # Attribute the generic id to the selected predictions
    generic_inds[:, 1] = generic_idx

    # Apply NMS to the generic predictions
    nms_filtered = batched_nms(generic_boxes, generic_scores,
                               generic_inds[:, 1],
                               self.objectness_nms_thresh_test)
    generic_boxes = generic_boxes[nms_filtered]
    generic_inds = generic_inds[:][nms_filtered]
    generic_scores = generic_scores[nms_filtered]

    # Keep top detections - detected classes have priority
    if self.topk_per_image_test >= 0:
        remaining_objects = self.topk_per_image_test - len(class_boxes)
        sorted_generic = np.argsort(generic_scores)
        sorted_generic = sorted_generic[:remaining_objects]
        generic_boxes = generic_boxes[sorted_generic]
        generic_inds = generic_inds[sorted_generic]
        generic_scores = generic_scores[sorted_generic]

    return generic_boxes, generic_scores, generic_inds
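
The helper _find_generic_objects_suppression_mask is not shown above; a minimal sketch of what it could look like, assuming pairwise IoU suppression against the class prediction boxes (the actual helper in the fork may differ):

import torch
from detectron2.structures import Boxes, pairwise_iou

def _find_generic_objects_suppression_mask(self, class_boxes, obj_boxes,
                                           iou_thresh):
    # With no class predictions, nothing gets suppressed.
    if len(class_boxes) == 0:
        return torch.ones(len(obj_boxes), dtype=torch.bool)
    # IoU between every generic proposal and every class prediction
    ious = pairwise_iou(Boxes(obj_boxes), Boxes(class_boxes))  # N x M
    # Keep a proposal only if it stays below the IoU threshold against
    # all class predictions - class instances take priority.
    return (ious < iou_thresh).all(dim=1)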

The parameters we have added to the config of our generic object detection model are listed below (a short sketch of setting them in code follows the list):

  • cfg.MODEL.ROI_HEADS.OBJECTNESS_NMS_THRESH_TEST: the IoU cutoff for NMS at which we discard generic object predictions that overlap with other generic object detections; among overlapping predictions, we remove the ones with the lowest scores. We also use this threshold for suppressing generic object predictions that overlap with class instance predictions. Class instance predictions take priority over generic object predictions since they are more specific, so we remove any generic object prediction whose IoU with a class instance is larger than the threshold.

  • cfg.MODEL.ROI_HEADS.OBJECTNESS_SCORE_THRESH_TEST: used as a confidence threshold for the objectness values returned by the network. It is similar in functionality to SCORE_THRESH_TEST, but applies to objectness values rather than class prediction probabilities.
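
In Python these could be set on the config before building the model; a sketch, assuming the fork registers these keys in its config defaults (the values here are illustrative, not tuned recommendations):

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    "configs/COCO-Generic-Detection/faster_rcnn_R_50_FPN_1x.yaml")
# Keys introduced by the fork; illustrative values.
cfg.MODEL.ROI_HEADS.OBJECTNESS_NMS_THRESH_TEST = 0.5
cfg.MODEL.ROI_HEADS.OBJECTNESS_SCORE_THRESH_TEST = 3.0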

Demo

We have provided a fork of the Detectron2 repo with an implementation of the generic object detection solution. The Detectron2 demo is described in more depth in the repo’s documentation.

Detectron2 comes with a set of predefined configs that allow model customisation. Here is the config we used to test the generic object detection:

_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/model_final_b275ba.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
  DEVICE: "cpu"
  ROI_HEADS:
    NAME: "GenericROIHeads"

We take the base RCNN with Feature Pyramid Network config and use a model with a ResNet50 backbone pre-trained on the COCO dataset. The change that affects the objectness detections is ROI_HEADS.NAME, which selects our GenericROIHeads implementation.

To run the demo, we can use the following command:

python demo/demo.py --input list_of_image_paths --config-file configs/COCO-Generic-Detection/faster_rcnn_R_50_FPN_1x.yaml

The parameters we have introduced to the demo are the two ROI_HEADS threshold options described in the previous section.

Below, we present another example of the model’s predictions using our demo. None of the objects in the image are represented in the training set. The COCO dataset contains objects of various sizes, such as dining tables, which makes our model predict objects that may not be of interest to us (e.g. the largest bounding box in the image below).
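
A simple, hypothetical post-processing step for such cases is to drop generic boxes that cover most of the image; the helper below is illustrative, and the 0.8 area ratio is an arbitrary cutoff, not a recommendation from the post:

import torch

def drop_oversized_boxes(boxes, scores, image_h, image_w,
                         max_area_ratio=0.8):
    # boxes: (N, 4) tensor in (x1, y1, x2, y2) format
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = areas / float(image_h * image_w) < max_area_ratio
    return boxes[keep], scores[keep]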

Conclusion

Generic object predictions are a powerful tool: they make fuller use of the representational power of deep CNNs and add scalability to a model in terms of the objects it can detect. They reduce the time and resources required to define bounding boxes for new object instances.

We have shown how straightforward it is to obtain a generic object detector using existing tools (FasterRCNN in Detectron2), without adding extra layers of complexity or increasing the inference time. Keeping inference time unchanged is critical when deploying this model.

What other use cases in computer vision do you see where generic predictions can help? Leave a comment with your thoughts.

In our next posts, we will look at applications of generic object detectors and see how their predictions can be turned into valuable class-specific annotations without any human intervention.

-

Written by Daniela Palcu and Flaviu Samarghitan, Computer Vision Engineers at Neurolabs

At Neurolabs, we believe that the lack of widespread adoption of machine learning is due to a lack of data. We use computer graphics, much like the special effects or video game industries, to produce realistic images at scale. In 5 minutes, you can have 10,000 images tailored to your problem. But we don’t stop there: we kick-start the machine learning training and allow any industry to implement ready-made computer vision algorithms without an army of human annotators.

Translated from: https://medium.com/ai-in-plain-english/before-it-became-a-class-it-was-an-object-706c88fee299
