【Deep Learning】Review: Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation

 

Source: http://arxiv.org/abs/1311.2524

GitHub: https://github.com/rbgirshick/rcnn

1.      Summary of the Paper

Quoted from the paper:

“Object detection system overview. Our system (1) takes an input image, (2) extracts around 2000 bottom-up region proposals, (3) computes features for each proposal using a large convolutional neural network (CNN), and then (4) classifies each region using class-specific linear SVMs. R-CNN achieves a mean average precision (mAP) of 53.7% on PASCAL VOC 2010. For comparison, [39] reports 35.1% mAP using the same region proposals, but with a spatial pyramid and bag-of-visual-words approach. The popular deformable part models perform at 33.4%. On the 200-class ILSVRC2013 detection dataset, R-CNN’s mAP is 31.4%, a large improvement over OverFeat [34], which had the previous best result at 24.3%.”

To sum up, the framework proposed in the paper is simple but effective:

•  Replace sliding-window sampling with selective search, extracting about 2,000 region proposals per image.

•  Train k linear SVM classifiers (one per class) that score each region from the features AlexNet extracts for it.

•  Obtain the final detections by discarding overlapping regions with non-maximum suppression (NMS).
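The last two steps can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' implementation: the function names (`iou_one_to_many`, `nms`, `detect`), the `[x1, y1, x2, y2]` box layout, and both threshold values are assumptions made for the example, and the CNN features are taken as given.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat.

    The threshold value is an assumption for this sketch.
    """
    order = np.argsort(-scores)          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

def detect(features, boxes, svm_weights, svm_bias, score_threshold=0.0):
    """Score every region with k linear SVMs, then run NMS separately per class."""
    scores = features @ svm_weights.T + svm_bias    # shape (n_regions, k)
    detections = {}
    for c in range(scores.shape[1]):
        idx = np.flatnonzero(scores[:, c] > score_threshold)
        kept = nms(boxes[idx], scores[idx, c])
        detections[c] = [int(idx[i]) for i in kept]  # indices into the region list
    return detections
```

Here `features` would hold one row of CNN features per region proposal, and `svm_weights` one row per class. NMS runs per class, so a region suppressed for one class can still be reported for another.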

 

2.      Main Contributions

1)      One can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects.

2)      When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Combining region proposals with a CNN gives outstanding performance.

 

3.      Positive and negative points

Positive Points:

(i) Replace traditional sliding-window methods with a region proposal method.

(ii) Use AlexNet to extract features. Each region is warped to 227*227, with a small border of surrounding image context included, so that background information is available as a prior.

(iii) Use bounding-box regression to further improve localization accuracy.

(iv) Replace the softmax classifier with per-class SVMs: softmax shares a single background class across all categories, whereas each SVM is trained independently for its own class and therefore provides more independent information.
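To make point (iii) concrete, here is a small sketch of the box parameterization used for bounding-box regression as I understand it from the paper's appendix: the offsets of the box center are scaled by the proposal's width and height, and the size changes are expressed as log ratios. The helper names below are hypothetical. In the real system the offsets are predicted by a learned regressor on CNN features; this sketch only shows how targets are built and how a prediction is applied to a proposal.

```python
import numpy as np

def to_center_form(box):
    """Convert [x1, y1, x2, y2] to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h

def regression_targets(proposal, ground_truth):
    """Targets (t_x, t_y, t_w, t_h) that map a proposal onto its ground-truth box."""
    px, py, pw, ph = to_center_form(proposal)
    gx, gy, gw, gh = to_center_form(ground_truth)
    return np.array([
        (gx - px) / pw,      # center shift, scale-invariant in x
        (gy - py) / ph,      # center shift, scale-invariant in y
        np.log(gw / pw),     # log-space width change
        np.log(gh / ph),     # log-space height change
    ])

def apply_regression(proposal, t):
    """Invert the parameterization: refine a proposal with predicted offsets t."""
    px, py, pw, ph = to_center_form(proposal)
    cx, cy = px + pw * t[0], py + ph * t[1]
    w, h = pw * np.exp(t[2]), ph * np.exp(t[3])
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])
```

Applying `apply_regression` to a proposal with the targets computed by `regression_targets` recovers the ground-truth box, which is a quick way to check that the two transforms are consistent.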

Negative Points:

(i) .

 

4.      How strong is the evaluation

•  Performance layer by layer without fine-tuning. That is, training the SVMs on features from pool5, fc6, or fc7 gives very similar results. The conclusion is that most of the CNN's representational power comes from its convolutional layers.

•  Comparison to recent feature learning methods indicates that the CNN features generally outperform the other methods.

•  Compared to the Google paper by Dean et al. (CVPR best paper): 16% mAP in 5 minutes. Here, 48% in about 1 minute!

 

5.      Possible direction for future work

I wonder whether some method could improve both the speed and the accuracy of object detection without using region proposals. Although region proposals greatly improve efficiency, they inevitably discard some important information. I hope this could be a possible direction.

 

 

 
