PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection, arXiv 1608.08021
Paper: https://arxiv.org/pdf/1608.08021v1.pdf
Code (GitHub): https://github.com/sanghoon/pva-faster-rcnn
(Didn't expect the authors to open-source it so quickly. Kudos!)
=====
Using the model, .pt files, and code released by the authors,
I ran the `example_train_384` experiment (see `models/pvanet/example_train_384` in the repo above). The results:
trainset: PASCAL VOC 07 trainval set
testset: PASCAL VOC 07 test set
mAP: 71.81%
stepsize: 50k
iterations: 100k
lr policy: step
With stepsize changed to 80k and iterations to 110k, the mAP is 72.6%, which beats VGG16's 69.6%.
I also trained on the PASCAL VOC 0712 trainval set (stepsize 50k, iterations 100k); the mAP is 73.6%, which is lower than VGG16's 75.8%.
=====
Update
Training on the PASCAL VOC 0712 trainval set (320k iterations, iter_size 3, plateau lr policy: 20k, 30k, 40k, 50k) gives an mAP of 77.15%, higher than VGG16's 75.8%.
Training on the PASCAL VOC 0712 trainval set with the same setting (320k iterations, iter_size 3, plateau lr policy: 20k, 30k, 40k, 50k) plus the global context branch gives an mAP of 78.38%, also higher than VGG16's 75.8%.
=====
The latest figures:
=====
For details, see issue #10 on the repo: "The Mean AP is 0.7190 when I test the model trained by example_train_384, is normal?"
=====
First, look at the leaderboard.
Then at the speed.
Seriously impressive.
=====
Back to the main topic: an introduction to the paper.
As usual, figures first.
In one sentence:
Drawing on current network-design techniques such as batch normalization, Inception, C.ReLU, residual connections, and multi-scale representation, the authors design a deep but thin feature-extraction network, attach Faster R-CNN's RPN and R-CNN on top of it to obtain a complete detection network, apply truncated SVD to reduce the dimensionality of the fc layers, and use fewer proposals (200), achieving impressive performance (mAP and speed) on PASCAL VOC. Some training tricks are also used, such as learning-rate scheduling [1].
One of the paper's most important contributions is being the first to use Inception for detection; in terms of both accuracy and speed, it demonstrates that Inception is well suited to detection, not just classification.
=====
First, look at the framework.
From Table 1 we can roughly see that the framework in the paper (called PVANET) is actually quite similar to VGG16.
The difference is that PVANET replaces each of VGG16's layers (conv, ReLU, etc.) with C.ReLU and Inception blocks, and introduces residual connections and multi-scale representation.
Each of the key components is described below.
=====
C.ReLU
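In the paper, C.ReLU exploits the observation that filters in the early conv layers tend to come in negatively correlated pairs: compute only half the output channels, concatenate the response with its negation, then apply a per-channel scale/shift (PVANET's addition to the original C.ReLU) followed by ReLU. The output width stays the same while the conv cost is halved. A minimal PyTorch sketch for illustration (the module name and shapes are mine, not from the released Caffe code):

```python
import torch
import torch.nn as nn

class CReLU(nn.Module):
    """C.ReLU sketch: conv produces C channels, C.ReLU returns 2C."""
    def __init__(self, channels):
        super().__init__()
        # Learned per-channel scale and shift on the 2*channels output
        # (PVANET appends these between concatenation and ReLU).
        self.scale = nn.Parameter(torch.ones(2 * channels))
        self.shift = nn.Parameter(torch.zeros(2 * channels))

    def forward(self, x):
        # Concatenate the response with its negation along channels.
        x = torch.cat([x, -x], dim=1)                     # (N, 2C, H, W)
        x = x * self.scale.view(1, -1, 1, 1) + self.shift.view(1, -1, 1, 1)
        return torch.relu(x)
```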
=====
inception (relatively simple)
It is exactly this Inception design that lets the model see receptive fields of several sizes, and hence detect objects of different sizes.
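For illustration, a rough PyTorch sketch of such a block. The branch structure (1x1, 3x3, and two stacked 3x3 standing in for a 5x5 receptive field) follows the paper, but the channel counts here are made up, not Table 1's, and the pooling branch of the stride-2 blocks is omitted:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Sketch of a PVANET-style Inception block (placeholder channels)."""
    def __init__(self, c_in):
        super().__init__()
        # 1x1 branch: smallest receptive field.
        self.b1 = nn.Conv2d(c_in, 64, kernel_size=1)
        # 3x3 branch, with a 1x1 reduction first.
        self.b2 = nn.Sequential(
            nn.Conv2d(c_in, 24, kernel_size=1), nn.ReLU(),
            nn.Conv2d(24, 48, kernel_size=3, padding=1))
        # Two stacked 3x3 convs: a 5x5-equivalent receptive field.
        self.b3 = nn.Sequential(
            nn.Conv2d(c_in, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 24, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(24, 24, kernel_size=3, padding=1))

    def forward(self, x):
        # Concatenating branches with different kernel sizes gives the
        # next layer access to several receptive-field sizes at once.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
```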
=====
residual connection & multi-scale representation & BN
As Table 1 shows, the residual connections are mainly of the projected kind:
a 1x1 conv is applied to project pool1_1 into conv2_1, conv2_3 into conv3_1, conv3_4 into conv4_1, and conv4_4 into conv5_1.
The multi-scale representation is obtained in four steps (a sketch follows after the BN note below):
1 conv3_4 is down-scaled into "downscale" by a 3x3 max-pool with stride 2;
2 conv5_4 is up-scaled into "upscale" by a 4x4 channel-wise deconvolution whose weights are fixed as bilinear interpolation;
3 "downscale", conv4_4, and "upscale" are combined into "concat" by channel-wise concatenation;
4 after a 1x1 conv, the final output (convf) is obtained.
As for BN, there is not much to say: every conv in the feature-extraction network is followed by BN (+ Scale).
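A PyTorch sketch of the four multi-scale steps, for illustration only. The helper name is mine, the pooling padding is a guess, and `F.interpolate` stands in for the paper's fixed-bilinear 4x4 channel-wise deconvolution; spatial sizes are assumed to line up (conv3_4 at 2x and conv5_4 at 0.5x of conv4_4's resolution):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_scale_features(conv3_4, conv4_4, conv5_4, conv_1x1):
    # 1) "downscale": 3x3 max-pool with stride 2 on conv3_4.
    down = F.max_pool2d(conv3_4, kernel_size=3, stride=2, padding=1)
    # 2) "upscale": 2x bilinear upsampling of conv5_4 (the paper uses a
    #    4x4 channel-wise deconv with weights fixed to bilinear).
    up = F.interpolate(conv5_4, scale_factor=2, mode='bilinear',
                       align_corners=False)
    # 3) "concat": channel-wise concatenation of the three scales.
    concat = torch.cat([down, conv4_4, up], dim=1)
    # 4) A 1x1 conv on the concatenation gives the final output, convf.
    return conv_1x1(concat)

# Example wiring (channel counts c3, c4, c5 and the 512 are placeholders):
# conv_1x1 = nn.Conv2d(c3 + c4 + c5, 512, kernel_size=1)
```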
=====
rpn & rcnn
These parts differ little from Faster R-CNN's. The RPN uses 25 anchors of 5 scales (3, 6, 9, 16, 25) and 5 aspect ratios (0.5, 0.667, 1.0, 1.5, 2.0).
The R-CNN uses 12k proposals at training time (before NMS, whose threshold is 0.4), but only 200 proposals at test time (which cuts test time by about 1/3).
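For illustration, a small NumPy sketch that enumerates the 5 x 5 = 25 RPN anchors. That the scales multiply a 16-pixel base (the feature stride), as in py-faster-rcnn, is my assumption, not stated in the blog:

```python
import numpy as np

def generate_anchors(base_size=16,
                     scales=(3, 6, 9, 16, 25),
                     ratios=(0.5, 0.667, 1.0, 1.5, 2.0)):
    """Return the 25 anchors as (x1, y1, x2, y2) boxes centered at 0."""
    anchors = []
    for scale in scales:
        area = float(base_size * scale) ** 2
        for ratio in ratios:            # ratio = height / width
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    return np.array(anchors)            # shape (25, 4)

print(generate_anchors().shape)         # (25, 4)
```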
=====
training & testing
The figures speak for themselves.
=====
And finally, the performance.
=====
Lastly, how the speed-up is achieved:
1 A thin feature-extraction network is designed (the focus of this paper).
2 The number of proposals is reduced (from Faster R-CNN's 300 down to 200).
3 Truncated SVD is applied to the R-CNN's fc layers to reduce their dimensionality (followed by fine-tuning); surprisingly, the accuracy loss is negligible. A sketch follows this list.
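A minimal sketch of the fc compression via truncated SVD (the function name is mine; as noted above, PVANET fine-tunes the factorized layers afterwards to recover accuracy):

```python
import numpy as np

def compress_fc(W, k):
    """Truncated-SVD compression of a fully-connected layer's weights.

    W: (n_out, n_in) weight matrix; k: number of singular values kept.
    fc(x) = W @ x + b is replaced by two thinner layers W2 @ (W1 @ x) + b,
    cutting parameters from n_out*n_in down to k*(n_out + n_in);
    the bias b stays with the second layer.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(S[:k]) @ Vt[:k]   # (k, n_in): first thin layer
    W2 = U[:, :k]                  # (n_out, k): second thin layer
    return W1, W2

# Quick check: W2 @ W1 is the best rank-k approximation of W.
W = np.random.randn(4096, 4096).astype(np.float32)
W1, W2 = compress_fc(W, 512)
assert (W2 @ W1).shape == W.shape
```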
=====
In short, the paper successfully demonstrates that such a thin network is suitable not only for classification but also for detection, with accuracy and speed to be proud of.
Just thinking about its online applications is astonishing.
Their team will presumably deploy it in real products very soon.
At this moment, how much I wish they would open-source it! (Don't judge me for being so eager.)
=====
If this post helped you, would you consider buying the author a cup of milk tea?