Introduction
focus on two problems: localizing objects and training a high-capacity model with scarce labeled data
detection: rather than building a sliding-window detector, localize objects using the “recognition using regions” paradigm
model: supervised pre-training on a large auxiliary dataset, followed by domain-specific fine-tuning on a small dataset, is an effective paradigm for learning high-capacity CNNs when data is scarce
Donahue et al.: a CNN can be used (without fine-tuning) as a black-box feature extractor
Object detection with R-CNN
(1) category-independent region proposals
using selective search
https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=UijlingsIJCV2013&bib=all.bib
(2) CNN
based on AlexNet
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
(3) SVM
Overlap threshold
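The overlap between a region proposal and a ground-truth box is measured as intersection-over-union (IoU). A minimal sketch of that computation follows, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The helper `is_negative` and its `threshold` argument are hypothetical names used only to illustrate how a threshold might be applied when labeling proposals; the threshold value itself is left to the caller.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_negative(proposal, ground_truth_boxes, threshold):
    """Hypothetical helper: a proposal is treated as a negative example
    when its IoU with every ground-truth box is below the threshold."""
    return all(iou(proposal, gt) < threshold for gt in ground_truth_boxes)
```

For example, `iou((0, 0, 10, 10), (5, 0, 15, 10))` gives an intersection of 50 over a union of 150, i.e. about 0.33.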
Preprocessing for CNN
Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size. Prior to warping, we dilate the bounding box so that at the warped size there are exactly p pixels of warped image context around the original box.
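The dilation can be worked out from the warped size: if the warped image is S pixels wide and p of those pixels on each side are context, the original box must map onto the inner S - 2p pixels, so the box is grown by p * w / (S - 2p) on each side horizontally (and likewise with h vertically). A small sketch, assuming (x1, y1, x2, y2) boxes; the function name `dilate_box` is hypothetical, and the defaults S = 227 and p = 16 are example values assumed here, not stated in these notes. Handling of boxes that extend past the image border is not covered.

```python
def dilate_box(box, warped_size=227, p=16):
    """Grow a box so that, after warping to warped_size x warped_size,
    exactly p pixels of context surround the original box on every side.

    warped_size and p are assumed example values.
    """
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    inner = warped_size - 2 * p  # pixels the original box occupies after warping
    if w <= 0 or h <= 0 or inner <= 0:
        raise ValueError("box must have positive size and 2*p must be < warped_size")
    # Context measured in original-image pixels; differs per axis because the
    # warp scales x and y independently.
    pad_x = p * w / inner
    pad_y = p * h / inner
    return (x1 - pad_x, y1 - pad_y, x2 + pad_x, y2 + pad_y)
```

With these defaults, a 195 x 195 box is warped at scale 1, so it is simply padded by 16 pixels on each side; a 390 x 195 box is padded by 32 pixels horizontally and 16 vertically.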
2.2 Test-time detection
(1) all CNN parameters are shared across all categories
(2) the feature vectors computed by the CNN are low-dimensional
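Because the CNN features are computed once per region and shared across categories, the per-class work at test time can be reduced to dot products between each feature vector and class-specific SVM weights. A minimal pure-Python sketch of that scoring step, assuming the features have already been extracted; `score_regions`, `svm_weights`, and `svm_biases` are hypothetical names, and the tiny 2-dimensional vectors in the usage note are illustrative only.

```python
def score_regions(features, svm_weights, svm_biases):
    """Score every region against every class.

    features:    list of feature vectors, one per region proposal
    svm_weights: dict mapping class name -> weight vector (same length as a feature)
    svm_biases:  dict mapping class name -> bias term
    Returns one dict of {class name: score} per region.
    """
    scores = []
    for feat in features:
        row = {}
        for cls, weights in svm_weights.items():
            # Linear SVM score: dot product of shared feature and class weights, plus bias.
            row[cls] = sum(f * w for f, w in zip(feat, weights)) + svm_biases[cls]
        scores.append(row)
    return scores
```

Usage: `score_regions([[1.0, 2.0]], {"cat": [1.0, 0.0]}, {"cat": 0.5})` scores a single region against a single class. Adding a category only adds one weight vector and one bias; the features are reused unchanged.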