deep learning object detection
Paper list from 2014 to 2020
Milestones
Components of an object detector
- Input: Image, Patches, Image Pyramid
- Backbones: VGG16, ResNet-50, SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53
- Neck:
- Additional blocks: SPP, ASPP, RFB, SAM
- Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
- Heads:
- Dense Prediction (one-stage):
- RPN, SSD, YOLO, RetinaNet (anchor based)
- CornerNet, CenterNet, MatrixNet, FCOS (anchor free)
- Sparse Prediction (two-stage):
- Faster R-CNN, R-FCN, Mask R-CNN (anchor based)
- RepPoints (anchor free)
Detection methods category
Object detection steps
One-Stage
-
Extract features over the whole image, classify the objects,
and localize their bounding boxes
Two-Stage
-
Generate category-independent region proposals and
extract a feature vector from each region proposal
-
Classify the objects and predict precise bounding boxes (NMS)
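The two flows above can be sketched as plain control flow. This is only a skeleton under stated assumptions: `backbone`, `dense_head`, `propose`, `roi_head`, and `nms` are hypothetical callables supplied by the caller, not the API of any detector in this list, and the two-stage variant assumes shared image features (Faster R-CNN style) rather than a separate forward pass per proposal.

```python
def one_stage(image, backbone, dense_head):
    """One-stage flow: a single pass over features of the whole image."""
    features = backbone(image)      # features over all areas of the image
    return dense_head(features)     # classes and boxes predicted together


def two_stage(image, backbone, propose, roi_head, nms):
    """Two-stage flow: propose regions first, then classify and refine each."""
    features = backbone(image)
    proposals = propose(features)   # stage 1: category-independent regions
    # stage 2: per-proposal features, classification, and box prediction
    detections = [roi_head(features, region) for region in proposals]
    return nms(detections)          # drop duplicate detections
```

The only structural difference is the explicit proposal step; everything else is a matter of which head runs on which features.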
Small object detection tricks
-
Framework for small object detection
- Multi-scale Feature Learning
-
Enhance the Receptive Fields (visual attention mechanisms)
-
Data Augmentation
- GAN-based Detection
- Flipping, cropping, rotating, scaling
-
Training Strategy
- Unsupervised object detection
- Weakly Supervised Object Detection
- Multi-Scale Training/Val/Test
- GPU acceleration
-
Context-based Detection
- Local context
- Global context
- Context interaction
-
Neural Architecture Search
- Stacking more pyramid networks
- Adding feature dimension
- Adopting high capacity architecture
-
Efficient post-processing methods
- Non-maximum suppression (NMS)
- Soft-NMS
-
Deformable convolutional networks
-
Multi-task joint learning and optimization
- Object detection
- Semantic segmentation
- Instance segmentation
- Edge detection
- Highlight detection
-
Establish small object datasets
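For the post-processing item in the list above, here is a minimal pure-Python sketch of hard NMS and the Gaussian variant of Soft-NMS. It assumes boxes are `(x1, y1, x2, y2)` tuples; the function names and the default values for `iou_thresh`, `sigma`, and `score_thresh` are illustrative choices, not taken from any implementation linked in this list.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Hard NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes.

    Returns (index, rescored_value) pairs in selection order.
    """
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            scores[i] *= math.exp(-(iou(boxes[best], boxes[i]) ** 2) / sigma)
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return kept
```

With two heavily overlapping boxes and one distant box, `nms` drops the lower-scored overlapping box, while `soft_nms` keeps all three but ranks the overlapping one last with a reduced score. That difference is why Soft-NMS can help when objects genuinely overlap.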
Performance table
FPS (speed) depends on the hardware (e.g., CPU, GPU, RAM), so reported numbers are hard to compare fairly. A fair comparison would measure every model on hardware with equivalent specifications, but doing so is difficult and time-consuming.
Detector | COCO (mAP@IoU=0.5:0.95) | Published In |
---|---|---|
R-CNN | - | CVPR’14 |
Fast R-CNN | 19.7 | ICCV’15 |
Faster R-CNN | 21.9 | NIPS’15 |
YOLO v1 | - | CVPR’16 |
SSD | 31.2 | ECCV’16 |
R-FCN | 29.9 | NIPS’16 |
FPN | 36.2 | CVPR’17 |
YOLO v2 | - | CVPR’17 |
RetinaNet | 39.1 | ICCV’17 |
Mask R-CNN | 39.8 | ICCV’17 |
Soft-NMS | 40.9 | ICCV’17 |
YOLO v3 | 33.0 | arXiv’18 |
RefineDet | 41.8 | CVPR’18 |
Cascade R-CNN | 42.8 | CVPR’18 |
RFBNet | - | ECCV’18 |
Softer-NMS | - | arXiv’18 |
SNIPER | 43.5 | NIPS’18 |
M2Det | 44.2 | AAAI’19 |
Libra R-CNN | 43.0 | CVPR’19 |
FSAF | 44.6 | CVPR’19 |
ExtremeNet | 43.7 | CVPR’19 |
CenterNet | 45.1 | ICCV’19 |
FreeAnchor | 44.8 | NeurIPS’19 |
CBNet | 53.3 | AAAI’20 |
YOLOv4 | - | arXiv’20 |
ATSS | 50.7 | CVPR’20 |
Hit-Detector | 41.4 | CVPR’20 |
DetectoRS | 54.7 | arXiv’20 |
Performance on MS COCO
MS COCO detection evaluation metrics
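The `mAP@IoU=0.5:0.95` column in the table above is the AP averaged over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05 (and over categories). The sketch below only illustrates that threshold sweep; the helper names are hypothetical, and the per-threshold AP values are assumed to come from a real evaluator such as the official COCO tooling.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a detection with this IoU can count as a match."""
    return [t for t in coco_iou_thresholds() if iou_value >= t]


def mean_over_thresholds(ap_by_threshold):
    """Average per-threshold AP values (dict keyed by threshold) into one number."""
    thresholds = coco_iou_thresholds()
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)
```

A detection whose IoU with the ground truth is 0.72 is a potential match at the five thresholds up to 0.70 and a miss at the stricter ones, which is why this metric rewards tight localization more than AP at IoU=0.5 alone.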
2014
- [R-CNN] Rich feature hierarchies for accurate object detection and semantic segmentation | [CVPR’ 14] |
[pdf]
[official code - caffe]
CNN
2015
-
[Fast R-CNN] Fast R-CNN | [ICCV’ 15] |
[pdf]
[official code - caffe]
RoI
-
[Faster R-CNN, RPN] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | [NIPS’ 15] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
Region Proposal Network (RPN)
NMS
2016
-
[YOLO v1] You Only Look Once: Unified, Real-Time Object Detection | [CVPR’ 16] |
[pdf]
[official code - c]
One-stage
-
[SSD] SSD: Single Shot MultiBox Detector | [ECCV’ 16] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
Multi-scale feature map
VGG16
NMS
-
[R-FCN] R-FCN: Object Detection via Region-based Fully Convolutional Networks | [NIPS’ 16] |
[pdf]
[official code - caffe]
[unofficial code - caffe]
2017
-
[FPN] Feature Pyramid Networks for Object Detection | [CVPR’ 17] |
[pdf]
[unofficial code - caffe]
Feature Pyramid Networks
-
[YOLO v2] YOLO9000: Better, Faster, Stronger | [CVPR’ 17] |
[pdf]
[official code - c]
[unofficial code - caffe]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[RetinaNet] Focal Loss for Dense Object Detection | [ICCV’ 17] |
[pdf]
[official code - keras]
[unofficial code - pytorch]
[unofficial code - mxnet]
[unofficial code - tensorflow]
Focal Loss
-
[Mask R-CNN] Mask R-CNN | [ICCV’ 17] |
[pdf]
[official code - caffe2]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[Soft-NMS] Improving Object Detection With One Line of Code | [ICCV’ 17] |
[pdf]
[official code - caffe]
Soft-NMS
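Focal loss, the keyword attached to the RetinaNet entry above, scales cross-entropy by `(1 - p_t) ** gamma` so that easy, well-classified examples contribute less. The sketch below is a scalar, binary version written for illustration only; the function name and the clamping constant `eps` are assumptions, and the defaults `gamma=2.0` and `alpha=0.25` follow the values the paper reports as working best.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class (0..1)
    y: ground-truth label, 0 or 1
    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0); eps is an illustrative choice
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Setting `gamma=0` recovers alpha-weighted cross-entropy, so the modulating factor is the only new ingredient; a confident correct prediction (`p=0.9, y=1`) is down-weighted far more than a hard one (`p=0.1, y=1`).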
2018
-
[YOLO v3] YOLOv3: An Incremental Improvement | [arXiv’ 18] |
[pdf]
[official code - c]
[unofficial code - pytorch]
[unofficial code - pytorch]
[unofficial code - keras]
[unofficial code - tensorflow]
-
[RefineDet] Single-Shot Refinement Neural Network for Object Detection | [CVPR’ 18] |
[pdf]
[official code - caffe]
[unofficial code - chainer]
[unofficial code - pytorch]
Combine one-stage and two-stage
-
[Cascade R-CNN] Cascade R-CNN: Delving into High Quality Object Detection | [CVPR’ 18] |
[pdf]
[official code - caffe]
Training Strategy
-
[RFBNet] Receptive Field Block Net for Accurate and Fast Object Detection | [ECCV’ 18] |
[pdf]
[official code - pytorch]
Enhance the Receptive Fields
-
[Softer-NMS] Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection | [arXiv’ 18] |
[pdf]
Soft-NMS
-
[SNIPER] SNIPER: Efficient Multi-Scale Training | [NIPS’ 18] |
[pdf]
Training Strategy
2019
-
[M2Det] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | [AAAI’ 19] |
[pdf]
[official code - pytorch]
Multi-scale Feature Learning
-
[Libra R-CNN] Libra R-CNN: Balanced Learning for Object Detection | [CVPR’ 19] |
[pdf]
Training Strategy
-
[FSAF] Feature Selective Anchor-Free Module for Single-Shot Object Detection | [CVPR’ 19] |
[pdf]
Anchor-Free
-
[ExtremeNet] Bottom-up Object Detection by Grouping Extreme and Center Points | [CVPR’ 19] |
[pdf]
[official code - pytorch]
Instance Segmentation
-
[CenterNet] CenterNet: Keypoint Triplets for Object Detection | [ICCV’ 19] |
[pdf]
Keypoint-based detector
-
[FreeAnchor] FreeAnchor: Learning to Match Anchors for Visual Object Detection | [NeurIPS’ 19] |
[pdf]
Anchor-Free
2020
- [CBNet] CBNet: A Novel Composite Backbone Network Architecture for Object Detection | [AAAI’ 20] |
[pdf]
Composite Backbone Network
- [YOLOv4] YOLOv4: Optimal Speed and Accuracy of Object Detection | [arXiv’ 20] |
[pdf]
- Input: Mosaic data augmentation, Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT)
- Backbone: CSPDarknet53, Mish-activation, DropBlock regularization
- Neck: SPP block, PAN (path-aggregation block)
- Prediction: CIoU-loss, DIoU-NMS
- [ATSS] Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection | [CVPR’ 20] |
[pdf]
Anchor-Based
Training Strategy
- [Hit-Detector] Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection | [CVPR’ 20] |
[pdf]
Neural Architecture Search
- [DetectoRS] DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | [arXiv’ 20] |
[pdf]
Recursive Feature Pyramid
Switchable Atrous Convolution
Instance Segmentation
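The YOLOv4 entry above lists CIoU-loss and DIoU-NMS under prediction. As a rough sketch of what those terms compute, assuming `(x1, y1, x2, y2)` boxes with positive width and height: DIoU subtracts the squared center distance, normalized by the squared diagonal of the smallest enclosing box, from IoU, and CIoU adds an aspect-ratio penalty. The function names and the `thresh` default are illustrative, not YOLOv4's code.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """Distance-IoU: IoU minus the normalized squared center distance."""
    center_dist2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 \
                 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both a and b.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    diag2 = cw ** 2 + ch ** 2
    return iou(a, b) - (center_dist2 / diag2 if diag2 > 0 else 0.0)


def ciou_loss(pred, gt):
    """Complete-IoU loss: 1 - IoU + center-distance term + aspect-ratio term."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou(pred, gt)) + v) if v > 0 else 0.0
    return 1 - diou(pred, gt) + alpha * v


def diou_nms(boxes, scores, thresh=0.5):
    """NMS that suppresses on DIoU instead of IoU; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < thresh]
    return keep
```

Unlike plain IoU, both quantities stay informative for non-overlapping boxes: the loss keeps growing as the centers move apart, which gives the regressor a useful gradient even when IoU is zero.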
Survey
- Recent advances in small object detection based on deep learning: A review
[pdf]
- A Survey of Deep Learning-based Object Detection
[pdf]
- Object Detection in 20 Years: A Survey
[pdf]
- Recent Advances in Deep Learning for Object Detection
[pdf]