[Personal use] Study notes on an object detection survey


IvoryTower152



Category: study notes; tags: object detection, computer vision, deep learning

Article link: https://blog.csdn.net/IvoryTower152/article/details/121984899


A Survey of Deep Learning-based Object Detection

2021/12/15

the purpose of object detection: locating instances of semantic objects of certain classes in images or videos

*object detection can be split into generic object detection and domain-specific object detection

most state-of-the-art object detectors use deep networks both as the backbone, which extracts features from the input images (or videos), and as the detection network, which performs classification and localization

well-researched domains of object detection include multi-category detection, edge detection, salient object detection, pose detection, scene text detection, face detection and pedestrian detection, etc.

*benchmark: a standard accepted across a field, concretely the datasets and evaluation metrics that papers in that field consistently use

two kinds of object detectors

two-stage (e.g. Faster R-CNN) and one-stage (e.g. YOLO)

two-stage detectors achieve high localization and object recognition accuracy, whereas one-stage detectors achieve high inference speed

most backbone networks for detection are classification networks with the last fully connected (FC) layer removed

2021/12/16

Two-stage Detectors

R-CNN (first deep learning-based detector)
Fast R-CNN (use of RoI Pooling)
Faster R-CNN (use of a region proposal network/RPN and multi-scale anchors)
Mask R-CNN (for instance segmentation task, use of feature pyramid network/FPN, use of RoIAlign)
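The multi-scale anchors of Faster R-CNN can be sketched in plain Python. In the Faster R-CNN paper each feature-map position gets k = 9 anchors: 3 scales (box areas of 128², 256² and 512² pixels) times 3 aspect ratios (1:2, 1:1, 2:1). The function names below are hypothetical. The stride of 16 is an assumption that matches a VGG-16 conv5 feature map. Placing anchors at cell centres is a simplification, and the reference implementation's exact pixel offsets differ slightly.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """One anchor per (scale, ratio) pair, centred at (0, 0).

    ratio = height / width, and every anchor of a given scale has area scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def grid_anchors(feat_h: int, feat_w: int, stride: int, base: List[Box]) -> List[Box]:
    """Shift every base anchor to the centre of every feature-map cell."""
    out = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for x1, y1, x2, y2 in base:
                out.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return out


anchors = grid_anchors(feat_h=2, feat_w=3, stride=16, base=base_anchors())
print(len(anchors))  # 2 * 3 positions * 9 anchors = 54
```

For every anchor, the RPN then predicts an objectness score and box regression offsets.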

*(N+1)-way classification layer: N outputs for the object classes and 1 for background
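Here is a minimal sketch of how an (N+1)-way classification output is read: apply a softmax over N + 1 scores, where one index stands for background. Placing background at index 0 is an assumption. It is a common convention, but the survey does not specify it. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_roi(logits: List[float], class_names: List[str]) -> Tuple[str, float]:
    """logits has N + 1 entries for N object classes; index 0 is assumed to be background."""
    if len(logits) != len(class_names) + 1:
        raise ValueError("expected N + 1 logits for N class names")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = "background" if best == 0 else class_names[best - 1]
    return label, probs[best]


label, prob = classify_roi([0.1, 2.0, 0.3], ["cat", "dog"])
print(label)  # cat
```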

One-stage Detectors

YOLO (real-time detection on full images and webcam streams)
YOLOv2 (adopts a series of design decisions from past works together with novel concepts, and a new backbone)
YOLOv3 (an improved version of YOLOv2)
SSD (a single-shot detector for multiple categories)
DSSD (a modified version of SSD)
RetinaNet (use of focal loss)
M2Det (have no idea about this)
RefineDet (have no idea about this)
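RetinaNet's focal loss down-weights easy examples. It is defined as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true label. The RetinaNet paper reports α = 0.25 and γ = 2 as defaults. The sketch below is a per-sample scalar version for illustration only, and the function names are hypothetical. Real implementations apply the loss element-wise over all anchors and classes, then normalise by the number of anchors assigned to ground-truth boxes.

```python
import math


def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy; p is the predicted probability of the positive class."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Binary focal loss for one anchor/class pair; y is 1 (positive) or 0 (negative)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# An easy negative (p = 0.01 on a background anchor) vs. a hard positive (p = 0.1)
print(focal_loss(0.01, 0) / cross_entropy(0.01, 0))  # about 7.5e-05
print(focal_loss(0.1, 1) / cross_entropy(0.1, 1))    # about 0.2025
```

Relative to cross-entropy, the easy negative's loss shrinks by roughly four orders of magnitude, while the hard positive keeps about a fifth of its loss. This is how focal loss tackles the foreground-background imbalance in one-stage detectors.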

detecting an object means stating which specified class the object belongs to and locating it in the image

the localization of an object is typically represented by a bounding box
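Bounding boxes are stored in several conventions. PASCAL VOC annotations use corner coordinates (xmin, ymin, xmax, ymax). COCO annotations use (x, y, width, height), with (x, y) the top-left corner. Many detectors instead regress a centre-size form (cx, cy, w, h). Below is a small conversion sketch; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_cxcywh(b: Box) -> Box:
    """Corners (x1, y1, x2, y2) -> centre and size (cx, cy, w, h)."""
    x1, y1, x2, y2 = b
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(b: Box) -> Box:
    """Centre and size (cx, cy, w, h) -> corners (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def xywh_to_xyxy(b: Box) -> Box:
    """COCO-style (top-left x, top-left y, width, height) -> corners."""
    x, y, w, h = b
    return (x, y, x + w, y + h)


print(xyxy_to_cxcywh((10, 20, 50, 80)))  # (30.0, 50.0, 40, 60)
```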

benchmarks

PASCAL VOC dataset (basic)
MS COCO benchmark (many images per class)
ImageNet (large number of classes)
VisDrone2018 (have no idea about this)
OpenImages V5 (have no idea about this)
Recall
Precision
Average Precision (AP)
mean Average Precision (mAP)
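The four metrics can be sketched for a single class. Sort the detections by score, accumulate true and false positives, and compute precision = TP / (TP + FP) and recall = TP / (number of ground-truth boxes). AP is the area under the resulting precision-recall curve, and mAP is the mean of AP over classes. This sketch makes several assumptions. Each detection's TP/FP flag is taken as given (in practice a detection is a TP if its IoU with a still-unmatched ground-truth box reaches a threshold, such as 0.5 for PASCAL VOC). AP uses the all-point interpolation of the VOC 2010+ devkit. The function names are hypothetical. COCO-style AP differs: it uses 101 recall points and averages over IoU thresholds 0.50:0.05:0.95.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], is_tp: List[bool],
                     num_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative precision/recall after each detection, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    return precisions, recalls


def average_precision(precisions: List[float], recalls: List[float]) -> float:
    """All-point interpolated AP: area under the precision envelope."""
    r = [0.0] + recalls + [1.0]
    p = [0.0] + precisions + [0.0]
    # Envelope: precision at recall r becomes the max precision at any recall >= r
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def mean_ap(ap_per_class: List[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class)


prec, rec = precision_recall([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
print(average_precision(prec, rec))  # about 0.833 (= 5/6)
```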

deep neural network based object detection pipeline:

image pre-processing: resize the raw data and perform data augmentation
feature extraction: a key step for the subsequent detection
classification and localization: output classification scores and bounding box coordinates
post-processing: remove weak or duplicate detection results (e.g. NMS)
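The NMS post-processing step can be sketched as a greedy loop over the boxes of one class. Keep the highest-scoring box, drop every remaining box whose IoU with it exceeds a threshold, and repeat. The thresholds (IoU 0.5, score 0.05) and the function names are illustrative assumptions. Libraries provide optimised versions (e.g. `torchvision.ops.nms`), and detectors usually run NMS separately for each class.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_thresh: float = 0.5, score_thresh: float = 0.05) -> List[int]:
    """Greedy NMS for one class; returns kept indices, highest score first."""
    # Drop weak detections, then visit the rest in descending score order
    order = [i for i in sorted(range(len(boxes)), key=lambda i: -scores[i])
             if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.01]
print(nms(boxes, scores))  # [0, 2]
```

In the example, box 1 overlaps box 0 with IoU 81/119 ≈ 0.68 and is suppressed. Box 3 falls below the score threshold, and box 2 does not overlap anything.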

to obtain precise detection results, several methods exist that can be used alone or in combination:

Enhanced features: extracting more effective features from input images (e.g. FPN, attention)
Increasing localization accuracy: designing novel loss functions
Solving the positive-negative imbalance issue: mainly for one-stage detectors, e.g. hard example mining or adding a term to the classification loss
Improving post-processing NMS methods
Combining one-stage and two-stage detectors to achieve better results
Complicated scene solutions (have no idea about this)
Anchor-free: still a novel direction for further research
Training from scratch: some datasets simply require training from scratch to ensure stability and accuracy
Designing new architecture
Speeding up detection
Achieving Fast and Accurate Detections
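As one concrete instance of "increasing localization accuracy by designing a novel loss function", here is a sketch of the GIoU loss (Rezatofighi et al., 2019). It is an example chosen for illustration and not necessarily a method the survey covers. GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both boxes, and the loss is 1 - GIoU. Unlike 1 - IoU, it still gives a useful signal when the boxes do not overlap. The function name is hypothetical, and boxes are assumed to have positive area.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), positive area assumed


def giou_loss(pred: Box, target: Box) -> float:
    """1 - GIoU between a predicted and a target box."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union
    # Smallest axis-aligned box C enclosing both boxes
    cx1, cy1 = min(pred[0], target[0]), min(pred[1], target[1])
    cx2, cy2 = max(pred[2], target[2]), max(pred[3], target[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou


print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # about 1.333: IoU is 0, loss still sees the gap
print(giou_loss((0, 0, 1, 1), (5, 0, 6, 1)))  # about 1.667: farther apart, larger loss
```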

typical application areas: