Object Detection in 20 Years: A Survey (Survey Paper Notes)

Latest recommended article published 2024-07-31 19:07:12

二旬丶老汉

Article link: https://blog.csdn.net/qq1539543073/article/details/107785586

版权

论文学习专栏收录该内容

1 篇文章 0 订阅

订阅专栏

Notes taken while reading this survey of object detection.

I. INTRODUCTION

1、From the application point of view, object detection can be grouped into two research topics: “general object detection” and “detection applications”. The former aims to explore methods of detecting different types of objects under a unified framework to simulate human vision and cognition, while the latter refers to detection under specific application scenarios, such as pedestrian detection, face detection, and text detection.

2、After years of development, state-of-the-art object detection systems have integrated a large number of techniques, such as “multi-scale detection”, “hard negative mining”, and “bounding box regression”.

3、 The acceleration of object detection has long been a crucial but challenging task.

4、As different detection tasks have entirely different objectives and constraints, their difficulties may vary. In addition to challenges shared with other computer vision tasks, such as objects under different viewpoints, illuminations, and intra-class variations, the challenges in object detection include, but are not limited to: object rotation and scale changes (e.g., small objects), accurate object localization, dense and occluded object detection, and detection speed-up.

II. OBJECT DETECTION IN 20 YEARS

1、It is widely accepted that, over the past two decades, the progress of object detection has gone through two historical periods: the “traditional object detection period (before 2014)” and the “deep learning based detection period (after 2014)”.

2、Most of the early object detection algorithms were built on handcrafted features. Due to the lack of effective image representations at that time, researchers had no choice but to design sophisticated feature representations and a variety of speed-up tricks to make the most of limited computing resources.

3、Viola Jones Detectors: The VJ detector dramatically improved detection speed by incorporating three important techniques: “integral image”, “feature selection”, and “detection cascades”.
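
The integral image is what makes Haar-like features cheap: after one pass over the image, the sum of any axis-aligned rectangle costs four lookups regardless of its size. Below is a minimal pure-Python sketch of that idea; the function names (`integral_image`, `rect_sum`, `haar_two_rect`) are hypothetical, and the AdaBoost-based feature selection and the detection cascade are not shown.

```python
def integral_image(img):
    """Return ii with one zero row/column of padding:
    ii[y][x] = sum of img[r][c] for all r < y and c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, x, y, w, h):
    """Hypothetical two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))       # 45 (whole image)
print(rect_sum(ii, 1, 1, 2, 2))       # 28 (5 + 6 + 8 + 9)
print(haar_two_rect(ii, 0, 0, 2, 3))  # -3 (12 - 15)
```

Because every rectangle costs the same four lookups, a feature evaluated on a large window is no slower than one on a small window.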

4、HOG Detector: HOG can be regarded as an important improvement over the scale-invariant feature transform (SIFT) and shape contexts of its time.
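
To make the HOG idea concrete, here is a simplified sketch of its core step: an orientation histogram for one cell, where each pixel votes its gradient magnitude into an unsigned-orientation bin. This is a deliberate simplification, not the full Dalal-Triggs pipeline: it uses hard binning instead of interpolated votes, only interior pixels, and no block normalization. `cell_histogram` is a hypothetical name.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Simplified HOG orientation histogram for one grayscale cell.

    Gradients use the centered [-1, 0, 1] filter on interior pixels only.
    Each pixel adds its gradient magnitude to one bin of unsigned
    orientation (0-180 degrees, 180 / n_bins degrees per bin).
    """
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
print(cell_histogram(vertical_edge)[0])    # 40.0 (all votes at 0 degrees)
print(cell_histogram(horizontal_edge)[4])  # 40.0 (all votes in the 80-100 degree bin)
```

The histogram summarizes local edge directions while discarding exact pixel positions inside the cell, which is what gives HOG some tolerance to small deformations.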

5、Deformable Part-based Model (DPM): DPM was originally proposed by P. Felzenszwalb in 2008 as an extension of the HOG detector, and a variety of improvements were later made by R. Girshick. DPM follows the detection philosophy of “divide and conquer”: training can be considered as learning a proper way to decompose an object, and inference can be considered as an ensemble of detections on different object parts.
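
The “ensemble of part detections” can be sketched as a star-model score: the root filter's response plus, for each part, the best part response minus a quadratic deformation cost measured from that part's anchor. The sketch below assumes precomputed response maps at a single resolution, searches part placements by brute force (DPM itself uses distance transforms for this step), and omits the bias term; `dpm_score` and its arguments are hypothetical.

```python
def dpm_score(root_map, part_maps, anchors, deform, root_pos):
    """Simplified DPM star-model score at one root position.

    root_map[y][x] : root-filter response at (x, y)
    part_maps[i]   : response map of part filter i (same resolution here)
    anchors[i]     : (ax, ay) expected offset of part i from the root
    deform[i]      : (a, b, c, d) weights of the deformation cost
                     a*dx + b*dy + c*dx**2 + d*dy**2
    Each part is placed where its response minus deformation cost is largest.
    """
    rx, ry = root_pos
    total = root_map[ry][rx]
    for pmap, (ax, ay), (a, b, c, d) in zip(part_maps, anchors, deform):
        best = float("-inf")
        for y in range(len(pmap)):
            for x in range(len(pmap[0])):
                dx, dy = x - (rx + ax), y - (ry + ay)  # displacement from anchor
                cost = a * dx + b * dy + c * dx * dx + d * dy * dy
                best = max(best, pmap[y][x] - cost)
        total += best
    return total


root_map = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
part = [[0, 0, 5], [0, 0, 0], [0, 0, 0]]
# Part found exactly at its anchor (2, 0): no deformation penalty.
print(dpm_score(root_map, [part], [(1, -1)], [(0, 0, 1, 1)], (1, 1)))  # 7
# Anchor at (1, 1): the part moves one step right and one up, cost 2.
print(dpm_score(root_map, [part], [(0, 0)], [(0, 0, 1, 1)], (1, 1)))   # 5
```

The deformation cost lets parts shift to fit the object while penalizing implausible layouts, which is the “deformable” part of the model.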

6、Milestones: CNN-based Two-stage Detectors: In 2012, the world saw the rebirth of convolutional neural networks. In 2014, R. Girshick et al. took the lead in breaking the deadlock by proposing Regions with CNN features (RCNN) for object detection.

7、RCNN: Although RCNN made great progress, its drawbacks are obvious: redundant feature computations on a large number of overlapping proposals (over 2000 boxes per image) lead to extremely slow detection (14 s per image with a GPU).

8、SPPNet: The main contribution of SPPNet is the introduction of a Spatial Pyramid Pooling (SPP) layer, which enables a CNN to generate a fixed-length representation regardless of the size of the image or region of interest, without rescaling it.
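
The fixed-length property comes from pooling over a fixed number of bins rather than a fixed window size. Below is a minimal single-channel sketch, assuming max pooling and an illustrative pyramid of 1x1, 2x2 and 4x4 bins (21 outputs per channel); `spp` is a hypothetical name, and a real layer applies this to every channel of the last convolutional feature map.

```python
def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling on a single-channel H x W map (list of rows).

    For each level n, the map is split into an n x n grid of bins and each
    bin is max-pooled, so the output length is sum(n * n for n in levels)
    no matter what H and W are.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            # Rows floor(i*h/n) .. ceil((i+1)*h/n): bins may overlap but are never empty.
            y0, y1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                x0, x1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(feature_map[y][x]
                               for y in range(y0, y1)
                               for x in range(x0, x1)))
    return out


small = [[y * 7 + x for x in range(7)] for y in range(5)]  # 5 x 7 map
tall = [[1] * 3 for _ in range(13)]                         # 13 x 3 map
print(len(spp(small)), len(spp(tall)))  # 21 21
print(spp(small)[:2])                   # [34, 17]
```

Both maps yield 21 values, so a fully connected layer after SPP can accept regions of any size.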

9、Fast RCNN: Fast RCNN enables us to train a detector and a bounding box regressor simultaneously under the same network configuration.
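
The bounding box regressor does not predict raw coordinates; it predicts offsets relative to a proposal. Below is a sketch of the center/size parameterization used in the RCNN family, together with the smooth L1 loss that Fast RCNN applies to these offsets. Boxes are assumed to be `(cx, cy, w, h)` tuples, and the helper names are hypothetical.

```python
import math


def encode(proposal, gt):
    """Regression targets (tx, ty, tw, th) of a ground-truth box w.r.t. a proposal.

    Boxes are (cx, cy, w, h): center x, center y, width, height.
    Center shifts are normalized by proposal size; sizes use log ratios.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, deltas):
    """Apply predicted deltas to a proposal (inverse of encode)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


proposal, gt = (50.0, 50.0, 20.0, 40.0), (54.0, 46.0, 40.0, 20.0)
t = encode(proposal, gt)
print([round(v, 3) for v in t])  # [0.2, -0.1, 0.693, -0.693]
print(decode(proposal, t))       # approximately (54.0, 46.0, 40.0, 20.0)
```

Normalizing by proposal size makes the targets scale-invariant, and smooth L1 is less sensitive to outliers than an L2 loss.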

10、Faster RCNN: Faster RCNN is the first end-to-end and the first near-real-time deep learning detector. Its main contribution is the Region Proposal Network (RPN), which makes region proposals nearly cost-free.
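
RPN proposals start from anchors: reference boxes of several scales and aspect ratios placed at every feature-map position, which the RPN then scores and refines. Below is a sketch of anchor generation, assuming the commonly cited setting of 3 scales and 3 aspect ratios with a stride of 16, and anchor centers at cell centers. `make_anchors` is a hypothetical name, and the objectness scoring and box refinement are not shown.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) in image coordinates.

    Each feature-map cell (i, j) maps to an image-space center via the stride
    (assumed here to be the cell center). One anchor is made per
    (scale, ratio) pair, with ratio = height / width and area = scale ** 2.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 3)
print(len(anchors))  # 54 (6 positions x 9 anchors)
print(anchors[1])    # (-56.0, -56.0, 72.0, 72.0): 128 x 128 square at (8, 8)
```

Since the anchors are fixed in advance, the RPN only needs one small convolutional head to score and adjust all of them, which is why proposals become nearly free.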

11、Feature Pyramid Networks: Although features in the deeper layers of a CNN are beneficial for category recognition, they are not conducive to localizing objects. To address this, FPN develops a top-down architecture with lateral connections to build high-level semantics at all scales. FPN has since become a basic building block of many recent detectors.
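
The top-down pathway can be sketched in a few lines: start from the coarsest map, upsample it by 2 with nearest neighbour, and add the lateral map of the next finer level. Here single-channel 2-D lists stand in for feature tensors, and FPN's 1x1 lateral convolutions and 3x3 smoothing convolutions are omitted, so each input is assumed to already be a lateral output. The function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    return [[v for v in row for _ in range(2)]
            for row in fmap for _rep in range(2)]


def fpn_top_down(laterals):
    """Merge lateral maps from coarse to fine.

    laterals: 2-D maps ordered fine -> coarse, each exactly half the height
    and width of the previous one. Returns merged maps in the same order.
    """
    merged = [laterals[-1]]  # the coarsest level passes through unchanged
    for lat in reversed(laterals[:-1]):
        up = upsample2x(merged[0])  # merged[0] is the finest map merged so far
        merged.insert(0, [[a + b for a, b in zip(lat_row, up_row)]
                          for lat_row, up_row in zip(lat, up)])
    return merged


laterals = [[[1] * 4 for _ in range(4)], [[1, 2], [3, 4]], [[10]]]
merged = fpn_top_down(laterals)
print(merged[1])     # [[11, 12], [13, 14]]
print(merged[0][0])  # [12, 12, 13, 13]
```

Every output level now mixes its own fine-resolution features with the semantics flowing down from the coarser levels.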

(To be continued)