Object Detection: Steps for Implementing Mobilenet-SSD

anqian123321

Reposted

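2018-02-28 11:40:22 | Views: 9248 | Tags: Mobilenet

Category: Object detection algorithm installation

This is well written, so I am saving a copy: http://blog.csdn.net/Jesse_Mx/article/details/78680055

MobileNet has been around for a while, and there are plenty of implementations online. Google has open-sourced its full TensorFlow code, but I am barely familiar with TensorFlow and still prefer the Caffe platform, so I have been following developments on that side.

Plain MobileNet classification is not the focus here. The key question is how to apply it to an object detection network. The most promising approach at the moment is MobileNet + SSD, and at least the following GitHub projects touch on it: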

https://github.com/chuanqi305/MobileNet-SSD

https://github.com/zeusees/SSD_License_Plate_Detection

https://github.com/canteen-man/MobileNet-SSD-Focal-loss

https://github.com/cooliscool/LISA-on-SSD-mobilenet

https://github.com/FreeApe/VGG-or-MobileNet-SSD

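Going forward, I will analyze and verify these as far as I can, aiming to find and test a good solution, and I hope to train successfully on other datasets as well.

MobileNet is fast. Paired with a depthwise layer, it should reach about 150 fps on a Titan X, and if detection accuracy can be raised above 70%, it will make a very good detection network.

Implementation 1

Project: MobileNet-SSD (https://github.com/chuanqi305/MobileNet-SSD)

I came across this project a few months ago, when chuanqi had just produced an initial Caffe implementation of Mobilenet-SSD. I was delighted, and from then on discussed it with him from time to time. With his guidance I was able to train to roughly 72% accuracy on the VOC dataset. The project has now stabilized, and according to its GitHub description the final accuracy is 72.7%, which is quite good. Below is a brief record of how to run and train it.

Model analysis

Comparing the MobileNet model structure with that of MobileNet-SSD shows that conv13 is the last layer of the backbone. Following the VGG-SSD design, the author appended 8 convolution layers after MobileNet's conv13 and takes 6 layers in total for detection. The 38*38-resolution layer does not appear to be used, perhaps because it sits too early in the network.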
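To make the resolutions concrete, here is a small sketch of how the feature map shrinks through the network. It assumes a 300*300 input and that every downsampling stage is a 3x3 convolution with stride 2 and padding 1. Both are my assumptions for illustration, not values read from the project's prototxt.

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def downsample_chain(input_size, steps):
    """Apply `steps` stride-2 convolutions and record every resolution."""
    sizes = [input_size]
    for _ in range(steps):
        sizes.append(conv_out(sizes[-1]))
    return sizes


# Assumed 300x300 input and nine stride-2 stages down to 1x1.
sizes = downsample_chain(300, 9)
print(sizes)

# The six coarsest maps; the 38x38 map is the one left out.
detection_maps = sizes[-6:]
print(detection_maps)
```

Under these assumptions the chain passes through 38 on its way down, and the six smallest maps run from 19*19 to 1*1, which matches the observation that the 38*38 layer is skipped.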

Running the model

Since the project is called MobileNet-SSD, the first requirement is a working baseline SSD. There are plenty of blog tutorials on that, including a few posts of my own, for reference.

Clone the project:

$ git clone https://github.com/chuanqi305/MobileNet-SSD.git

This creates a MobileNet-SSD folder in your chosen directory (I used /home). Its important files are:

- template: holds the shared templates for the 4 network definitions; they can be modified and generated by the gen.py script
- MobileNetSSD_deploy.prototxt: network definition used for inference
- solver_train.prototxt: training hyperparameters
- solver_test.prototxt: testing hyperparameters
- train.sh: training script
- test.sh: testing script
- gen_model.sh: generates a custom network (uses the contents of the template folder)
- gen.py: generates the shared templates (not needed for now)
- demo.py: detection demo script (images are stored in the images folder)
- merge_bn.py: merges the bn layers to produce the final caffemodel

Next, download the trained caffemodel and put it in the project folder: Google Drive (https://drive.google.com/file/d/0B3gersZ2cHIxRm5PMWRoTkdHdHc/view) | Baidu Cloud (https://pan.baidu.com/s/1sln2cUx)
Finally, open demo.py and adjust the following paths to your setup:

caffe_root = '/home/yaochuanqi/ssd/caffe/'
net_file = 'MobileNetSSD_deploy.prototxt'
caffe_model = 'MobileNetSSD_deploy.caffemodel'
test_dir = "images"

Then run demo.py to see the detection results. They are decent. Here are two sample images:

[Image: https://img-blog.csdn.net/20171219210223759?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvSmVzc2VfTXg=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast]

[Image: https://img-blog.csdn.net/20171219210238939?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvSmVzc2VfTXg=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast]

Model training

We can also train this MobileNet-SSD model on our own dataset. The steps are briefly as follows:

1. Create dataset symlinks
A dataset suitable for SSD training (VOC format) must be prepared in advance. I use the KITTI dataset; how to build it is covered in my earlier posts. In the end you need lmdb files for the trainval set and the test set. Then create symlinks to them, which work like shortcuts, shorten the commands and save disk space.

$ cd ~/MobileNet-SSD
$ ln -s /home/its/data/KITTIdevkit/KITTI/lmdb/KITTI_trainval_lmdb trainval_lmdb
$ ln -s /home/its/data/KITTIdevkit/KITTI/lmdb/KITTI_test_lmdb test_lmdb

After running these commands, the trainval_lmdb and test_lmdb symlinks appear in the project folder.
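If you would rather create the links from a script, the same step can be sketched in Python. The helper name link_lmdb is made up for this example, and temporary directories stand in for the real dataset and project paths.

```python
import os
import tempfile


def link_lmdb(src_dir, project_dir, link_name):
    """Create project_dir/link_name as a symlink pointing at src_dir."""
    link_path = os.path.join(project_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link instead of failing
    os.symlink(src_dir, link_path)
    return link_path


# Temporary directories stand in for the real lmdb and project folders.
with tempfile.TemporaryDirectory() as root:
    data = os.path.join(root, "KITTI_trainval_lmdb")
    project = os.path.join(root, "MobileNet-SSD")
    os.makedirs(data)
    os.makedirs(project)
    link = link_lmdb(data, project, "trainval_lmdb")
    link_ok = os.path.islink(link) and os.path.realpath(link) == os.path.realpath(data)
    print(link_ok)
```

Removing an existing link first makes the script safe to rerun when the dataset moves.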

2. Create the labelmap.prototxt file
This file defines the classes of the training samples and is placed in the project folder. Its contents are:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "Car"
  label: 1
  display_name: "Car"
}
item {
  name: "Pedestrian"
  label: 2
  display_name: "Pedestrian"
}
item {
  name: "Cyclist"
  label: 3
  display_name: "Cyclist"
}
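For longer class lists, typing this file by hand gets tedious. The sketch below generates the same text from a list of class names. The function name make_labelmap is my own, and it assumes label 0 is always reserved for the background entry, as in the file above.

```python
def make_labelmap(class_names):
    """Return labelmap.prototxt text; label 0 is reserved for background."""
    entries = [("none_of_the_above", "background")]
    entries += [(name, name) for name in class_names]
    blocks = []
    for label, (name, display_name) in enumerate(entries):
        blocks.append(
            'item {\n'
            '  name: "%s"\n'
            '  label: %d\n'
            '  display_name: "%s"\n'
            '}\n' % (name, label, display_name)
        )
    return "".join(blocks)


labelmap_text = make_labelmap(["Car", "Pedestrian", "Cyclist"])
print(labelmap_text)
```

Writing labelmap_text to labelmap.prototxt in the project folder would reproduce the four-class file shown above.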

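3. Run the gen_model.sh script
Because the VOC dataset has 21 classes (including background) while we have only 4, the training, testing and deploy network files must be regenerated. This is what gen_model.sh is for: it takes the templates in the template folder and generates the required files according to the parameter we give it. Its usage is:

usage: ./gen_model.sh CLASSNUM
        for voc the classnum is 21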

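It takes a single argument, the number of classes, so we run:

./gen_model.sh 4

This produces an examples folder whose 3 prototxt files are the actual network definitions generated from the templates. As set up by the author, the deploy file already has the bn layers merged, so it must be paired with the merged model later on.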

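4. Adjust the training and testing hyperparameters
Edit solver_train.prototxt and solver_test.prototxt to fit your setup.
Here test_iter = number of test images / batch size. The initial learning rate should not be too high, or the base weights get damaged badly. The optimizer is RMSProp, which may help convergence. Do not switch to SGD, again to protect the weights.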
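The test_iter rule is simple arithmetic, but it is easy to get wrong when the test set size is not a multiple of the batch size. Below is a minimal sketch. Rounding up so every image is covered is my assumption, and the numbers are hypothetical rather than the KITTI values.

```python
import math


def compute_test_iter(num_test_images, batch_size):
    """Number of test iterations needed to cover the whole test set once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_test_images / batch_size)


# Hypothetical numbers, not taken from the post.
test_iter = compute_test_iter(1000, 8)
test_iter_uneven = compute_test_iter(1001, 8)
print(test_iter, test_iter_uneven)
```

The resulting value goes into the test_iter field of solver_test.prototxt.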

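5. Download the pretrained model
Download: Google Drive | Baidu Cloud. Place it in the project folder. This pretrained model was converted by the author from TensorFlow and then given some initial tuning on the VOC dataset.

6. Start training
Edit and run train.sh. Parameters can be adjusted along the way. After training, run test.sh to measure the network's accuracy.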

7. Merge the bn layers
To speed up the model, the author merges the bn layers into the convolution layers, which removes the bn computation time and may give a small boost to detection speed. Open merge_bn.py and adjust the file paths in it:

caffe_root = '/home/yaochuanqi/ssd/caffe/'
train_proto = 'MobileNetSSD_train.prototxt'
train_model = 'MobileNetSSD_train.caffemodel'  # should be your snapshot caffemodel, e.g. mobilnetnet_iter_72000.caffemodel
deploy_proto = 'MobileNetSSD_deploy.prototxt'
save_model = 'MobileNetSSD_deploy.caffemodel'

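Then run the script to obtain the final detection model. Because the bn layers have been merged, the parameter layout has changed and this model can no longer be used for training. To continue training, use the pre-merge model. The final model can be checked with demo.py or deployed elsewhere.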
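To see why merging works without changing the output, here is the usual algebra for folding batch normalization into a convolution, shown on a single output value. This is a sketch of the general technique in plain Python, not a transcription of merge_bn.py. The variable names and the eps value are my own choices.

```python
import math


def conv(x, w, b):
    """A 1x1 'convolution' on one pixel: dot product plus bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch norm followed by scale and shift."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn(w, b, mean, var, gamma, beta, eps=1e-5):
    """Fold the bn statistics into the conv weights and bias."""
    k = gamma / math.sqrt(var + eps)
    return [wi * k for wi in w], (b - mean) * k + beta


# Made-up numbers for one pixel with three input channels.
x = [0.5, -1.0, 2.0]
w, b = [0.2, 0.4, -0.1], 0.0
mean, var, gamma, beta = 0.3, 1.5, 1.2, -0.7

reference = batchnorm(conv(x, w, b), mean, var, gamma, beta)
w_folded, b_folded = fold_bn(w, b, mean, var, gamma, beta)
merged = conv(x, w_folded, b_folded)
print(abs(reference - merged) < 1e-9)
```

Since the folded weights absorb the running mean and variance, the bn statistics are gone after merging, which is consistent with the merged model no longer being suitable for training.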

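Open issues

I trained Mobilenet-SSD on an expanded KITTI dataset and, after a week of tinkering, accuracy was only about 52%, with training somewhat slower than the VGG version. I feel it should not be this low and should reach at least 65%. I have not yet found the root cause. If anyone has trained on this and gotten good results, please let me know. I would be very grateful!

Update: Given MobileNet's limited feature-extraction capacity, I recently tried raising the input resolution to 416*416 (with very little speed loss) and starting from a COCO pretrained model restricted to the 4 target classes (extracted with a script). With an initial learning rate of 0.001 and later learning rates adjusted according to loss and accuracy, accuracy has reached 62.8% after 50,000 iterations.

Mobilenet with a depthwise layer

In theory MobileNet should run several times faster than VGGNet, but in practice it does not. In the previous section, even MobileNet-SSD with merged bn layers was only slightly faster than VGG-SSD. The main reason is that Caffe does not yet implement depthwise convolution, so group convolution is used instead. Here a group behaves like a for loop whose iterations are computed one after another. A true depthwise convolution can compute everything in one pass and save a lot of time.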
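The two formulations compute the same numbers. Only the execution strategy differs. The toy sketch below uses 1-D signals and plain Python lists to show that a grouped convolution with as many groups as channels gives exactly the depthwise result. All function names are invented for the example, and it says nothing about actual Caffe performance.

```python
def conv1d_valid(signal, kernel):
    """Valid (no padding) 1-D correlation of one channel with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def grouped_conv(channels, kernels, groups):
    """Grouped convolution; kernels[o] holds one kernel per input channel
    of the group that output channel o belongs to."""
    per_group = len(channels) // groups
    outputs = []
    for g in range(groups):  # the sequential loop the post refers to
        group_inputs = channels[g * per_group:(g + 1) * per_group]
        for o in range(g * per_group, (g + 1) * per_group):
            partial = [conv1d_valid(x, k) for x, k in zip(group_inputs, kernels[o])]
            outputs.append([sum(vals) for vals in zip(*partial)])
    return outputs


def depthwise_conv(channels, kernels):
    """Depthwise convolution: channel c is filtered only by kernel c."""
    return [conv1d_valid(x, k) for x, k in zip(channels, kernels)]


channels = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2]]
dw_kernels = [[1, 0, -1], [1, 1, 1], [0.5, 0.5, 0.5]]

# With groups == number of channels, each group holds exactly one channel.
as_group = grouped_conv(channels, [[k] for k in dw_kernels], groups=3)
as_depthwise = depthwise_conv(channels, dw_kernels)
print(as_group == as_depthwise)
```

Because the math is identical, the same trained weights can serve either layer type. The difference is whether the per-channel work runs as a loop or as one batched operation.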

After many experiments, I finally found a way to speed up MobileNet. Project: DepthwiseConvolution (https://github.com/yonghenglh6/DepthwiseConvolution). Many thanks to its author.

With the depthwise convolution layer, the speedup for MobileNet is obvious and immediate. A brief usage guide follows.

Adding the new depthwise convolution layer

First clone the project:

$ git clone https://github.com/yonghenglh6/DepthwiseConvolution.git

Note the caffe folder in the project. Copy its three files depthwise_conv_layer.hpp, depthwise_conv_layer.cpp and depthwise_conv_layer.cu into the corresponding locations in SSD (i.e. caffe). What happens here is that a depthwise convolution class is derived from the base convolution class, so caffe.proto does not need to be modified. Afterwards, Caffe must be recompiled so that it recognizes the new depthwise convolution layer.

Modifying the deploy file

Next, modify MobileNetSSD_deploy.prototxt: for every layer named convXX/dw (XX stands for a number), change the type from "Convolution" to "DepthwiseConvolution". There are 13 replacements in total, from conv1/dw to conv13/dw. Then comment out every "engine: CAFFE" line. The new network file can be saved as MobileNetSSD_deploy_depth.prototxt. At run time the caffemodel stays unchanged. You only need to point to the new prototxt file and use the Caffe build that includes the depthwise convolution layer.

Verifying the results

To verify the effect, we use demo.py to measure the network's average run time in GPU mode, adding timing code to demo.py:

import time

def detect(imgfile):
    # ... (existing code in demo.py)
    net.blobs['data'].data[...] = img
    start = time.time()  # start timing just before the forward pass
    out = net.forward()
    use_time = time.time() - start  # elapsed forward-pass time in seconds
    print("time=" + str(use_time) + "s")
    # ... (existing code in demo.py)

My laptop GPU is a GTX 850M. Over the 7 default test images, the average detection times for VGG-SSD, Mobilenet-SSD (group) and Mobilenet-SSD (depth) are:

Model                    Inference time
VGG-SSD                  107 ms
MobileNet-SSD (group)    62 ms
MobileNet-SSD (depth)    17 ms

Caffe's built-in time tool gives similar results:

$ cd ~/caffe

$ ./build/tools/caffe time -gpu 0 -model examples/mobilenet/XXXX.prototxt
I1219 20:09:24.062338 10324 caffe.cpp:412] Average Forward pass: 109.97 ms. # VGG-SSD
I1219 20:09:47.771399 10343 caffe.cpp:412] Average Forward pass: 57.4238 ms. # Mobilenet-SSD (group)
I1219 20:10:25.145504 10385 caffe.cpp:412] Average Forward pass: 16.39 ms. # Mobilenet-SSD (depth)
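As shown, the depthwise convolution layer works: going by the timings above, the forward pass is roughly 3.5 times faster than the group version and more than 6 times faster than VGG-SSD. I then did the same on a Jetson TX1 and measured the following detection time: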
```
I1219 22:08:15.236963  2210 caffe.cpp:412] Average Forward pass: 57.3939 ms.
```
This means Mobilenet-SSD reaches about 17 fps on the TX1, one step closer to true real time.
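As a footnote to the deploy-file step earlier, editing 13 layers by hand is error-prone, so the change could be scripted. The sketch below works on the prototxt as plain text. It assumes each layer's name line comes before its type line and that "engine: CAFFE" sits on a line of its own. I have not checked either assumption against every version of the file, and the function name to_depthwise is made up.

```python
import re

DW_NAME = re.compile(r'^\s*name:\s*"conv\d+/dw"\s*$')
ANY_NAME = re.compile(r'^\s*name:\s*"')
CONV_TYPE = re.compile(r'^(\s*)type:\s*"Convolution"\s*$')
ENGINE = re.compile(r'^(\s*)(engine:\s*CAFFE)\s*$')


def to_depthwise(prototxt_text):
    """Switch convN/dw layers to DepthwiseConvolution and comment out engine lines."""
    out, in_dw, replaced = [], False, 0
    for line in prototxt_text.splitlines():
        if ANY_NAME.match(line):
            in_dw = bool(DW_NAME.match(line))  # track which layer we are in
        type_match = CONV_TYPE.match(line)
        engine_match = ENGINE.match(line)
        if in_dw and type_match:
            line = type_match.group(1) + 'type: "DepthwiseConvolution"'
            replaced += 1
        elif engine_match:
            line = engine_match.group(1) + "# " + engine_match.group(2)
        out.append(line)
    return "\n".join(out) + "\n", replaced


# A tiny hand-written fragment, not the real deploy file.
sample = '''layer {
  name: "conv1/dw"
  type: "Convolution"
  convolution_param {
    group: 32
    engine: CAFFE
  }
}
layer {
  name: "conv1"
  type: "Convolution"
}
'''
converted, count = to_depthwise(sample)
print(converted)
print(count)
```

On the real MobileNetSSD_deploy.prototxt the returned count should be 13 if the file matches the description above. Anything else is a sign that the layout assumptions do not hold.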