Introduction
the accuracy of both detection and pose estimation hinges on three aspects:
- (1) the coverage of the 6D pose space in terms of viewpoint and scale,
- (2) the discriminative power of the feaures to tell objects and views apart
- (3) the robustness of matching towards clutter, illumination and occlusion.
CNN-base category
The input space is dense on the whole image and the output space is discretized into many overlapping bounding boxes of varying shapes and sizes.
allows for smooth scale search over many differently-sized feature maps and simultaneous classification of all boxes in a single pass.
Goal:
develop a deep network for object detection that can accurately deal with 3D models and 6D pose estimation by assuming an RGB image as unique input at test time
The concept of SSD
- (1) a training stage that makes use of synthetic 3D model information only
- (2) a decomposition of the model pose space that allows for easy training and handling of symmetries
- (3) an extension of SSD that produces 2D detections and infers proper 6D poses