TITLE: Object Detection by Labelling Superpixels
AUTHOR: Yan, Junjie and Yu, Yinan and Zhu, Xiangyu and Lei, Zhen and Li, Stan Z.
FROM: CVPR2015
CONTRIBUTIONS
- Converts the object detection problem into a super-pixel labelling problem, which can avoid false negatives caused by missed proposals and can take advantage of global context.
- Constructs an energy function that considers appearance, spatial context and the number of labels.
METHOD
- The image is partitioned into a set of super-pixels, denoted as P={p1,p2,...,pN}.
- An energy function E(L) measures the cost of a label configuration L={l1,l2,...,lN}, where li is the label assigned to super-pixel pi.
- The problem then becomes selecting the L that minimises E(L).
SOME DETAILS
The energy function is constructed as
$$E(L) = \sum_{p_i \in P} D(l_i, p_i) + \sum_{(i,j) \in N} V(l_i, l_j, p_i, p_j) + C(L)$$
where D(li,pi) is the data cost, which captures the appearance of pi and measures the cost of assigning it label li; V(li,lj,pi,pj) is the pairwise smooth cost over the local neighbourhood N; and C(L) is the label cost, which encourages compact detections and penalises the number of labels.
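As a rough illustration of how the three terms combine, the sketch below evaluates E(L) on a toy chain of four super-pixels and finds the minimiser by exhaustive search. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Potts-style smooth cost, the constant per-label penalty, the convention that label 0 is background, the numeric costs, and the names `energy` and `minimise_brute_force`. Exhaustive search only works at toy sizes and stands in for whatever optimiser is actually used.

```python
from itertools import product


def energy(labels, data_cost, edges, smooth_weight=0.5, label_penalty=0.5):
    """Evaluate E(L) = data term + smooth term + label term."""
    # Data term: cost of assigning labels[i] to super-pixel i.
    data = sum(data_cost[i][l] for i, l in enumerate(labels))
    # Smooth term (assumed Potts form): fixed price when neighbours disagree.
    smooth = sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    # Label term (assumed form): fixed price per distinct non-background label.
    label = label_penalty * len({l for l in labels if l != 0})
    return data + smooth + label


def minimise_brute_force(data_cost, edges, num_labels, **kwargs):
    """Try every label configuration; feasible only for tiny problems."""
    n = len(data_cost)
    return min(
        product(range(num_labels), repeat=n),
        key=lambda L: energy(L, data_cost, edges, **kwargs),
    )


# Toy problem: 4 super-pixels in a chain, labels 0 (background), 1 and 2.
data_cost = [
    [0.9, 0.1, 0.8],
    [0.8, 0.2, 0.7],
    [0.6, 0.5, 0.45],  # on data cost alone, this super-pixel prefers label 2
    [0.1, 0.9, 0.9],
]
edges = [(0, 1), (1, 2), (2, 3)]

best = minimise_brute_force(data_cost, edges, num_labels=3)
print(best, energy(best, data_cost, edges))
```

On this toy input the third super-pixel would take label 2 if only the data cost were considered, but the smooth and label terms make it cheaper to merge it into the neighbouring label 1 detection.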
Data Cost
Super-pixels usually do not carry enough semantic information, so the corresponding regions are classified and their costs are propagated to the super-pixels. In this work, RCNN is used to generate and classify semantic regions. The region set of T elements is denoted as
where α is set to 1.5 empirically. For each super-pixel, the data cost is the weighted sum of the T smallest costs,
where R(pi)t is the region containing pi with the t-th smallest cost.
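The aggregation step described above (take the T smallest costs among the regions that contain a super-pixel, then form a weighted sum) can be sketched as follows. The weights, the fallback value for super-pixels that no region covers, and the function name `superpixel_data_cost` are hypothetical; the notes do not give the paper's actual weighting.

```python
def superpixel_data_cost(region_costs, weights, default=1.0):
    """Weighted sum of the T smallest region costs, with T = len(weights).

    region_costs: costs (for one label) of every region containing the super-pixel.
    weights: assumed weights, applied to the costs in ascending order.
    default: assumed fallback cost when no region covers the super-pixel.
    """
    if not region_costs:
        return default
    # Keep only the T smallest costs, cheapest first.
    smallest = sorted(region_costs)[: len(weights)]
    # zip stops early if fewer than T regions cover the super-pixel.
    return sum(w * c for w, c in zip(weights, smallest))


costs = [0.7, 0.2, 0.9, 0.4]   # four regions contain this super-pixel
weights = [0.5, 0.3, 0.2]      # T = 3, assumed weights
print(superpixel_data_cost(costs, weights))
```

Here the three smallest costs are 0.2, 0.4 and 0.7, so the result is 0.5·0.2 + 0.3·0.4 + 0.2·0.7 = 0.36 up to floating-point rounding.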
Smooth Cost
The smooth cost is motivated by two observations: 1) adjacent super-pixels often have the same label, and 2) super-pixels with the same label should have similar appearance. This property is measured by
where Vl is a boolean variable that is set to 1 when
where cqi and tqi are the values in the q-th bin of the color and texture histograms of super-pixel pi.
Label Cost
The label cost encourages fewer labels, and its definition is
where δ(⋅) is defined as
ADVANTAGES
- Super-pixels are compact and perceptually meaningful atomic regions for images.
- Avoids false negatives caused by inappropriate proposals generated by algorithms such as Selective Search and BING.
- The super-pixel based method is a trade-off between pixel-based and proposal-based algorithms, leading to accurate and fast results.
DISADVANTAGES
- The CNN used in RCNN and the parameters in the energy function are learned separately.
- The generated regions might not cover all the super-pixels.
- Time consumption is high. It runs at 1 fps with 128 proposals on an NVIDIA Tesla K40 GPU, and 128 proposals might not be enough.