PAST: Combine multiple low-level image features with high-level context
Key insights:
- apply high-capacity CNNs to bottom-up region proposals in order to localize and segment objects
- labeled training data is scarce ---- supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
The conventional solution to the scarce-data problem is to use unsupervised pre-training, followed by supervised fine-tuning.
Supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning on a small dataset (PASCAL), is an effective paradigm for learning high-capacity CNNs when data is scarce.
Object detection with R-CNN
Three modules:
- generates category-independent region proposals
- a large convolutional neural network that extracts a fixed-length feature vector from each region
- a set of class-specific linear SVMs.
2.1 Module design
Region proposals:
selective search
Feature extraction
2.2 Test-time detection
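extract around 2000 region proposals with selective search --> warp each proposal and forward it through the CNN to compute features --> for each class, score each extracted feature vector with that class's SVM --> greedy non-maximum suppression (for each class independently)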
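The last step of the test-time pipeline, greedy non-maximum suppression, can be sketched as below. This is a minimal illustration for a single class, not the paper's code: the box format (x1, y1, x2, y2), the function names `iou` and `greedy_nms`, and the default threshold of 0.3 are all assumptions made for the example (the paper uses a learned threshold).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return indices of the kept boxes for ONE class, highest score first.

    iou_threshold is a hypothetical default chosen for this sketch.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Reject a region if it overlaps an already kept,
        # higher-scoring region by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy example: the second box overlaps the first heavily, the third is disjoint.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In the full pipeline this would be run once per class, on that class's SVM scores.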
Run-time analysis
Two properties make detection efficient:
- all CNN parameters are shared across all categories
- the feature vectors computed by the CNN are low-dimensional
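Because the CNN features are shared across categories and low-dimensional, the only class-specific work per image is a single matrix product between the feature matrix and the SVM weight matrix. A rough sketch with NumPy follows. The sizes (2000 regions, 4096-dimensional features, 20 classes) are assumptions for illustration, and the random arrays merely stand in for real features and trained SVM weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for this sketch: ~2000 proposals, 4096-d features, N = 20 classes.
num_regions, feat_dim, num_classes = 2000, 4096, 20

# Random placeholders for the CNN features and the per-class linear SVMs.
features = rng.standard_normal((num_regions, feat_dim)).astype(np.float32)
svm_weights = rng.standard_normal((feat_dim, num_classes)).astype(np.float32)
svm_bias = np.zeros(num_classes, dtype=np.float32)

# One matrix-matrix product scores every region against every class.
scores = features @ svm_weights + svm_bias  # shape: (num_regions, num_classes)

# Highest-scoring class for each region (before non-maximum suppression).
best_class = scores.argmax(axis=1)
```

Adding more classes only widens `svm_weights`; the feature computation is unchanged.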
2.3 Training
Supervised pre-training
Pre-train the CNN on a large auxiliary dataset (ILSVRC2012 classification) using image-level annotations only (bounding-box labels are not available for this data).
Domain-specific fine-tuning
Replace the CNN’s ImageNet-specific 1000-way classification layer with a randomly initialized (N + 1)-way classification layer (N object classes, plus 1 for background).
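The layer swap can be illustrated with a small NumPy sketch. Everything here is hypothetical scaffolding: the parameter dictionary, the key names (`backbone`, `cls_W`, `cls_b`), the 4096-dimensional input to the classifier, the initialization scale, and N = 20 are assumptions for the example, not the paper's implementation.

```python
import numpy as np


def replace_classification_layer(params, num_object_classes, rng, init_std=0.01):
    """Swap the pre-trained classifier for a random (N + 1)-way layer.

    params is a hypothetical dict of weights; init_std is an assumed scale.
    """
    feat_dim = params["cls_W"].shape[0]
    num_outputs = num_object_classes + 1  # +1 for the background class
    new_params = dict(params)  # shallow copy: other layers keep pre-trained weights
    new_params["cls_W"] = rng.normal(0.0, init_std, size=(feat_dim, num_outputs))
    new_params["cls_b"] = np.zeros(num_outputs)
    return new_params


rng = np.random.default_rng(0)

# Stand-in for a pre-trained network with a 1000-way ImageNet classifier.
pretrained = {
    "backbone": rng.standard_normal((8, 8)),  # placeholder for all earlier layers
    "cls_W": rng.standard_normal((4096, 1000)) * 0.01,
    "cls_b": np.zeros(1000),
}

# N = 20 object classes (assumed), so the new layer has 21 outputs.
finetune_init = replace_classification_layer(pretrained, num_object_classes=20, rng=rng)
```

Only the final layer starts from random values; fine-tuning then continues from the pre-trained weights everywhere else.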