Weakly Supervised Deep Detection Networks: Reading Notes
Overall architecture
1. Existing network (e.g., AlexNet pre-trained on ImageNet)
2. SPP --> region-level descriptors
3. (1) class scores --> recognition
   (2) probability distribution over regions (which region contains the most salient image structure) --> detection
4. Aggregate the recognition and detection scores to predict the class of the image (image-level supervision)
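The two-branch scoring and aggregation in steps 3 and 4 can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code. It assumes the recognition branch applies a softmax over classes for each region, the detection branch applies a softmax over regions for each class, and the two are combined by an element-wise product followed by a sum over regions. The function names `softmax` and `wsddn_scores`, the shapes, and the random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def wsddn_scores(cls_logits, det_logits):
    """Combine the two branches (illustrative sketch).

    cls_logits, det_logits: arrays of shape (R, C), one row per region
    and one column per class, e.g. outputs of two linear layers applied
    to the region-level descriptors.
    """
    # Recognition: for each region, a distribution over classes.
    recognition = softmax(cls_logits, axis=1)
    # Detection: for each class, a distribution over regions
    # (which region is most salient for that class).
    detection = softmax(det_logits, axis=0)
    # Assumed aggregation: element-wise product, then sum over regions.
    region_scores = recognition * detection   # shape (R, C)
    image_scores = region_scores.sum(axis=0)  # shape (C,)
    return region_scores, image_scores

# Hypothetical example: 5 region proposals, 3 classes.
rng = np.random.default_rng(0)
R, C = 5, 3
region_scores, image_scores = wsddn_scores(
    rng.normal(size=(R, C)), rng.normal(size=(R, C))
)
```

Under these assumptions each image-level score lies strictly between 0 and 1, so it can be trained against image-level labels alone, while `region_scores` ranks the regions for detection at test time.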
Comparison with other methods
1. MIL: uses the appearance model itself to perform region selection
   WSDDN: the detection branch is independent of the recognition branch
2. Bilinear architecture: the two streams are symmetric
   WSDDN: the detection branch is explicitly designed for detection, so the two streams are not symmetric
Method
1. Pre-trained network
2