[SSD] Training on your own dataset with the VGG network that ships with the caffe-ssd framework



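I. Selecting the dataset

I first downloaded all the cup images from the ImageNet website.

Then, from the ILSVRC2011, ILSVRC2012, ILSVRC2013 and ILSVRC2015 datasets, I picked out the images that contain cups by searching the xml files for the cup codes.

Script tool reference: http://blog.csdn.net/renhanchi/article/details/71480835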
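
The search described above can be sketched in Python as follows. This is a minimal sketch and not the referenced tool: the function name `find_annotations_with` and the temporary sample files are hypothetical, and it assumes PASCAL VOC style xml annotations with `object/name` elements. The six codes are the ones that delete_by_name.py keeps in section II.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Codes treated as "cup" in this post (taken from delete_by_name.py).
CUP_CODES = {"n03147509", "n03216710", "n03438257",
             "n03797390", "n04559910", "n07930864"}


def find_annotations_with(ann_dir, codes):
    """Return sorted xml paths under ann_dir that contain at least one wanted object."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(ann_dir):
        for filename in filenames:
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirpath, filename)
            root = ET.parse(path).getroot()
            names = {obj.findtext("name") for obj in root.findall("object")}
            if names & set(codes):
                hits.append(path)
    return sorted(hits)


def _write_sample(path, names):
    """Write a tiny VOC-style annotation (hypothetical sample data for the demo)."""
    objects = "".join("<object><name>%s</name></object>" % n for n in names)
    with open(path, "w") as f:
        f.write("<annotation>%s</annotation>" % objects)


# Demo: a.xml contains a cup code, b.xml contains only another object.
demo_dir = tempfile.mkdtemp()
_write_sample(os.path.join(demo_dir, "a.xml"), ["n03797390", "n02084071"])
_write_sample(os.path.join(demo_dir, "b.xml"), ["n02084071"])
found = [os.path.basename(p) for p in find_annotations_with(demo_dir, CUP_CODES)]
print(found)
```

On a real dataset, point `find_annotations_with` at each ILSVRC annotation folder and copy the matching xml files together with their images.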



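II. Processing the xml files

I only need the cup annotations, so every other object has to be deleted from the xml files. Otherwise, generating the lmdb files fails with the error "Unknown name: xxxxxxxx", where xxxxxxxx is the code of some object other than a cup.

I tried many approaches. Without going into them, here are the concrete steps:

1. Rename the Annotations folder to: Annos

2. Create a new empty folder named: Annotations

3. Edit the Python tool below, named "delete_by_name.py". Only the condition after `if not` needs changing; the quoted strings are the codes of the data you want to keep.

4. Run the Python tool.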

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 31 10:03:03 2017

@author: hans

http://blog.csdn.net/renhanchi
"""

import os
import xml.etree.ElementTree as ET

origin_ann_dir = 'Annos/'       # original annotations (the renamed folder)
new_ann_dir = 'Annotations/'    # filtered annotations are written here

# Only files directly inside origin_ann_dir are processed.
for filename in os.listdir(origin_ann_dir):
  origin_ann_path = os.path.join(origin_ann_dir, filename)
  if os.path.isfile(origin_ann_path):
    new_ann_path = os.path.join(new_ann_dir, filename)
    tree = ET.parse(origin_ann_path)

    root = tree.getroot()
    # findall() returns a list, so removing elements inside the loop is safe.
    for obj in root.findall('object'):
      name = str(obj.find('name').text)
      # Keep only the codes listed here; edit this condition for your own classes.
      if not (name == "n03147509" or \
              name == "n03216710" or \
              name == "n03438257" or \
              name == "n03797390" or \
              name == "n04559910" or \
              name == "n07930864"):
        root.remove(obj)

    tree.write(new_ann_path)
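
Before moving on, it can help to confirm that no other object codes are left, since any leftover name triggers the "Unknown name" error when the lmdb is built. The following is a minimal sketch under the same assumption of VOC style xml files. The helper name `count_object_names` is hypothetical, and the demo builds its own temporary sample files instead of reading the real Annotations folder.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_object_names(ann_dir):
    """Count how often each object name occurs across the xml files in ann_dir."""
    counts = Counter()
    for filename in sorted(os.listdir(ann_dir)):
        if filename.endswith(".xml"):
            root = ET.parse(os.path.join(ann_dir, filename)).getroot()
            counts.update(obj.findtext("name") for obj in root.findall("object"))
    return counts


# Demo on a temporary folder with two hand-written sample annotations.
sample_dir = tempfile.mkdtemp()
samples = {
    "x.xml": "<annotation><object><name>n03797390</name></object></annotation>",
    "y.xml": "<annotation><object><name>n03797390</name></object>"
             "<object><name>n04559910</name></object></annotation>",
}
for fname, text in samples.items():
    with open(os.path.join(sample_dir, fname), "w") as f:
        f.write(text)

name_counts = count_object_names(sample_dir)
allowed = {"n03147509", "n03216710", "n03438257",
           "n03797390", "n04559910", "n07930864"}
# Any name outside the allowed set would break lmdb generation later.
unexpected = sorted(set(name_counts) - allowed)
print(dict(name_counts), unexpected)
```

Run against the real Annotations folder, `unexpected` should come back empty.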


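III. Generating the train and val txt files

First create a new folder named doc.

The code below, named "cup_list.sh", is not the version I ended up using, so adapt it to your own situation.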

#!/bin/bash
# Uses bash arrays and BASH_SOURCE, so it has to run under bash rather than plain sh.

classes=(JPEGImages Annotations)
root_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )

for dataset in train val
do
        if [ $dataset == "train" ]
        then
                data_dir=(ILSVRC2015_train ILSVRC2015_val ILSVRC_train ImageNet)
        fi
        if [ $dataset == "val" ]
        then
                data_dir=(ILSVRC_val)
        fi
        for cla in ${data_dir[@]}
        do
                for class in ${classes[@]}
                do
                        # Images are .jpg and annotations are .xml (the original searched
                        # "*.jpg" in both folders). Sorting keeps the two lists aligned for paste.
                        ext="jpg"
                        if [ $class == "Annotations" ]
                        then
                                ext="xml"
                        fi
                        find ./$cla/$class/ -name "*.$ext" | sort >> ${class}_${dataset}.txt
                done
        done
        # Join each image path with its annotation path, then shuffle the lines.
        paste -d' ' JPEGImages_${dataset}.txt Annotations_${dataset}.txt >> temp_${dataset}.txt
        cat temp_${dataset}.txt | awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -k1,1 -n | cut -f2- > $dataset.txt
        if [ $dataset == "val" ]
        then
                /home/hans/caffe-ssd/build/tools/get_image_size $root_dir $dataset.txt $dataset"_name_size.txt"
        fi
        rm temp_${dataset}.txt
        rm JPEGImages_${dataset}.txt
        rm Annotations_${dataset}.txt
done
mv train.txt doc/
mv val.txt doc/
mv val_name_size.txt doc/
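
As a hedged alternative to the shell pipeline, the same list can be built in Python by pairing each image with the annotation that shares its base name, which avoids relying on two separate `find` outputs being in the same order. This is a sketch under assumptions: the function name `build_pairs`, the fixed seed and the temporary sample tree are hypothetical, while the `JPEGImages` and `Annotations` sub-folder names come from the script above.

```python
import os
import random
import tempfile


def build_pairs(data_dirs, seed=0):
    """Return shuffled 'image annotation' lines for images that have a matching xml."""
    lines = []
    for data_dir in data_dirs:
        img_dir = os.path.join(data_dir, "JPEGImages")
        ann_dir = os.path.join(data_dir, "Annotations")
        for filename in sorted(os.listdir(img_dir)):
            stem, ext = os.path.splitext(filename)
            ann_path = os.path.join(ann_dir, stem + ".xml")
            # Skip images without an annotation so every line is a valid pair.
            if ext.lower() == ".jpg" and os.path.isfile(ann_path):
                lines.append("%s %s" % (os.path.join(img_dir, filename), ann_path))
    random.Random(seed).shuffle(lines)  # fixed seed only to make the demo repeatable
    return lines


# Demo: a throwaway tree with two images, one of which has no annotation.
base = tempfile.mkdtemp()
for sub in ("JPEGImages", "Annotations"):
    os.makedirs(os.path.join(base, sub))
for sub, name in (("JPEGImages", "1.jpg"), ("JPEGImages", "2.jpg"), ("Annotations", "1.xml")):
    open(os.path.join(base, sub, name), "w").close()

pairs = build_pairs([base])
print(pairs)
```

Writing `"\n".join(pairs)` to train.txt or val.txt gives the same "image annotation" line format that the shell script produces.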



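IV. Writing labelmap_cup.prototxt

Put this file in the doc directory.

There are a few things to watch out for.

1. Label 0 must be background.

2. Although I only detect cups, the xml files use several different name codes for cups.

    At first I set every label to 1, but generating the lmdb files then failed with an error.

    So I had to number the labels in order, which is not a big problem. Just remember that labels 1 through 6 are all cups.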
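
A label map that follows these two rules could be generated like this. It is a sketch, not the author's actual file: the `item { name, label, display_name }` layout follows the labelmap_voc.prototxt shipped with caffe-ssd, the helper name `make_labelmap` and the `display_name` value "cup" are assumptions, and the six codes are the ones kept by delete_by_name.py.

```python
CUP_CODES = ["n03147509", "n03216710", "n03438257",
             "n03797390", "n04559910", "n07930864"]

# One prototxt item; doubled braces are literal braces for str.format.
ITEM_TEMPLATE = 'item {{\n  name: "{name}"\n  label: {label}\n  display_name: "{display}"\n}}\n'


def make_labelmap(codes):
    """Build the prototxt text: label 0 is background, codes get labels 1..N in order."""
    items = [ITEM_TEMPLATE.format(name="none_of_the_above", label=0, display="background")]
    for label, code in enumerate(codes, start=1):
        # Each code needs its own label; "cup" as display name is an assumption.
        items.append(ITEM_TEMPLATE.format(name=code, label=label, display="cup"))
    return "".join(items)


labelmap_text = make_labelmap(CUP_CODES)
print(labelmap_text)
```

That gives seven items in total, background plus the six cup codes, which corresponds to the `num_classes = 7` setting in ssd_pascal.py.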



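V. Generating the lmdb files

The first problem here was the Unknown name error mentioned above, which was solved by editing the xml files.

After that, a Symbol error appeared when calling the caffe module. Just follow the steps below and it will work.

First modify one file: caffe-ssd/scripts/create_annoset.py



Then run cup_data.sh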

cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=/home/hans/caffe-ssd

redo=1
data_root_dir="${cur_dir}"
dataset_name="doc"
mapfile="${cur_dir}/doc/labelmap_cup.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=JPEG --encoded"
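# Pass --redo only when redo=1 (a bare [ $redo ] is true for any non-empty value, including 0).
if [ "$redo" -eq 1 ]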
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in train val
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim \
--max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir \
$cur_dir/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$subset"_"$db ln/
done
rm -rf  ln/


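VI. Training

First download the pretrained model and put it in the doc directory.

Download URL: cs.unc.edu/~wliu/projects/ParseNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel


Modifying the training code is really draining: there are a lot of paths to change and quite a few problems along the way. Luckily, the GitHub issues were a great help.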
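
Because most of the trouble comes from paths, a small pre-flight check can save a failed run. The sketch below is a stand-alone helper and not part of caffe-ssd: `required_paths` and `missing_paths` are hypothetical names, and the relative paths mirror the doc/ layout used in this post (train_lmdb, val_lmdb, val_name_size.txt, labelmap_cup.prototxt and the pretrained caffemodel).

```python
import os
import tempfile


def required_paths(root):
    """Paths that the training script in this post expects under the dataset root."""
    doc = os.path.join(root, "doc")
    return [
        os.path.join(doc, "train_lmdb"),
        os.path.join(doc, "val_lmdb"),
        os.path.join(doc, "val_name_size.txt"),
        os.path.join(doc, "labelmap_cup.prototxt"),
        os.path.join(doc, "VGG_ILSVRC_16_layers_fc_reduced.caffemodel"),
    ]


def missing_paths(root):
    """Return the expected paths that do not exist yet."""
    return [p for p in required_paths(root) if not os.path.exists(p)]


# Demo on a temporary root where only the label map has been created.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "doc"))
open(os.path.join(demo_root, "doc", "labelmap_cup.prototxt"), "w").close()
missing = [os.path.basename(p) for p in missing_paths(demo_root)]
print(missing)
```

Calling `missing_paths` with the real dataset root before launching training should return an empty list.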

Here is my ssd_pascal.py first:

from __future__ import print_function
import sys
sys.path.append("/home/hans/caffe-ssd/python")  ##### change this
import caffe
from caffe.model_libs import *
from google.protobuf import text_format

import math
import os
import shutil
import stat
import subprocess

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):
    use_relu = True

    # Add additional convolutional layers.
    # 19 x 19
    from_layer = net.keys()[-1]

    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    # 10 x 10
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2,
        lr_mult=lr_mult)

    # 5 x 5
    from_layer = out_layer
    out_layer = "conv7_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
      lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv7_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2,
      lr_mult=lr_mult)

    # 3 x 3
    from_layer = out_layer
    out_layer = "conv8_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
      lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv8_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
      lr_mult=lr_mult)

    # 1 x 1
    from_layer = out_layer
    out_layer = "conv9_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
      lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv9_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
      lr_mult=lr_mult)

    return net


### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = "/home/hans/caffe-ssd"    # CHANGE ME

# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False

# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "/home/hans/data/ImageNet/Detection/cup/doc/train_lmdb"   # CHANGE ME
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "/home/hans/data/ImageNet/Detection/cup/doc/val_lmdb"    # CHANGE ME
# Specify the batch sampler.
resize_width = 300
resize_height = 300
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
        {
                'sampler': {
                        },
                'max_trials': 1,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.1,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.3,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.5,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.7,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.9,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'max_jaccard_overlap': 1.0,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        ]
train_transform_param = {
        'mirror': True,
        'mean_value': [104, 117, 123],
        'force_color': True,  # CHANGE ME
        'resize_param': {
                'prob': 1,
                'resize_mode': P.Resize.WARP,
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [
                        P.Resize.LINEAR,
                        P.Resize.AREA,
                        P.Resize.NEAREST,
                        P.Resize.CUBIC,
                        P.Resize.LANCZOS4,
                        ],
                },
        'distort_param': {
                'brightness_prob': 0.5,
                'brightness_delta': 32,
                'contrast_prob': 0.5,
                'contrast_lower': 0.5,
                'contrast_upper': 1.5,
                'hue_prob': 0.5,
                'hue_delta': 18,
                'saturation_prob': 0.5,
                'saturation_lower': 0.5,
                'saturation_upper': 1.5,
                'random_order_prob': 0.0,
                },
        'expand_param': {
                'prob': 0.5,
                'max_expand_ratio': 4.0,
                },
        'emit_constraint': {
            'emit_type': caffe_pb2.EmitConstraint.CENTER,
            }
        }
test_transform_param = {
        'mean_value': [104, 117, 123],
        'force_color': True,    # CHANGE ME
        'resize_param': {
                'prob': 1,
                'resize_mode': P.Resize.WARP,
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [P.Resize.LINEAR],
                },
        }

# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
lr_mult = 1
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    base_lr = 0.00004

root = "/home/hans/data/ImageNet/Detection/cup"    # CHANGE ME
# Modify the job name if you want.
job_name = "SSD_{}".format(resize)   # CHANGE ME
# The name of the model. Modify it if you want.
model_name = "VGG_CUP_{}".format(job_name)    # CHANGE ME

# Directory which stores the model .prototxt file.
save_dir = "{}/doc/{}".format(root, job_name)    # CHANGE ME
# Directory which stores the snapshot of models.
snapshot_dir = "{}/models/{}".format(root, job_name)    # CHANGE ME
# Directory which stores the job script and log file.
job_dir = "{}/jobs/{}".format(root, job_name)    # CHANGE ME
# Directory which stores the detection results.
output_result_dir = "{}/results/{}".format(root, job_name)    # CHANGE ME

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "{}/doc/val_name_size.txt".format(root)    # CHANGE ME
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "{}/doc/VGG_ILSVRC_16_layers_fc_reduced.caffemodel".format(root)    # CHANGE ME
# Stores LabelMapItem.
label_map_file = "{}/doc/labelmap_cup.prototxt".format(root)    # CHANGE ME

# MultiBoxLoss parameters.
num_classes = 7    # CHANGE ME: number of classes + 1 (background)
share_location = True
background_label_id=0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
ignore_cross_boundary_bbox = False
mining_type = P.MultiBoxLoss.MAX_NEGATIVE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'mining_type': mining_type,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
    'ignore_cross_boundary_bbox': ignore_cross_boundary_bbox,
    }
loss_param = {
    'normalization': normalization_mode,
    }

# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# conv9_2 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
# in percent %
min_ratio = 20
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
  min_sizes.append(min_dim * ratio / 100.)
  max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [8, 16, 32, 64, 100, 300]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
  prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
  prior_variance = [0.1]
flip = True
clip = False

# Solver parameters.
# Defining which GPUs to use.
gpus = "7"    # CHANGE ME
gpulist = gpus.split(",")
num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.
batch_size = 32
accum_batch_size = 32
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
  batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
  iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
  solver_mode = P.Solver.GPU
  device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:
  base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
  base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
  # Roughly there are 2000 prior bboxes per image.
  # TODO(weiliu89): Estimate the exact # of priors.
  base_lr *= 2000.

# Evaluate on whole test set.
num_test_image = 2000    # CHANGE ME: number of images in your test set
test_batch_size = 8
# Ideally num_test_image should be divisible by test_batch_size,
# otherwise mAP will be slightly off the true value.
test_iter = int(math.ceil(float(num_test_image) / test_batch_size))

solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [80000, 100000, 120000],
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 120000,
    'snapshot': 80000,
    'display': 10,
    'average_loss': 10,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 100,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': True,
    }

# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,
        'output_name_prefix': "comp4_det_test_",
        'output_format': "VOC",
        'label_map_file': label_map_file,
        'name_size_file': name_size_file,
        'num_test_image': num_test_image,
        },
    'keep_top_k': 200,
    'confidence_threshold': 0.01,
    'code_type': code_type,
    }

# parameters for evaluating detection results.
det_eval_param = {
    'num_classes': num_classes,
    'background_label_id': background_label_id,
    'overlap_threshold': 0.5,
    'evaluate_difficult_gt': False,
    'name_size_file': name_size_file,
    }

### Hopefully you don't need to change the following ###
# Check file.
check_if_exist(train_data)
check_if_exist(test_data)
check_if_exist(label_map_file)
check_if_exist(pretrain_model)
make_if_not_exist(save_dir)
make_if_not_exist(job_dir)
make_if_not_exist(snapshot_dir)

# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
        train=True, output_label=True, label_map_file=label_map_file,
        transform_param=train_transform_param, batch_sampler=batch_sampler)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
        propagate_down=[True, True, False, False])

with open(train_net_file, 'w') as f:
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)

# Create test net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,
        train=False, output_label=True, label_map_file=label_map_file,
        transform_param=test_transform_param)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

conf_name = "mbox_conf"
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:
  reshape_name = "{}_reshape".format(conf_name)
  net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))
  softmax_name = "{}_softmax".format(conf_name)
  net[softmax_name] = L.Softmax(net[reshape_name], axis=2)
  flatten_name = "{}_flatten".format(conf_name)
  net[flatten_name] = L.Flatten(net[softmax_name], axis=1)
  mbox_layers[1] = net[flatten_name]
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:
  sigmoid_name = "{}_sigmoid".format(conf_name)
  net[sigmoid_name] = L.Sigmoid(net[conf_name])
  mbox_layers[1] = net[sigmoid_name]

net.detection_out = L.DetectionOutput(*mbox_layers,
    detection_output_param=det_out_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,
    detection_evaluate_param=det_eval_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))

with open(test_net_file, 'w') as f:
    print('name: "{}_test"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(test_net_file, job_dir)

# Create deploy net.
# Remove the first and last layer from test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.
    del net_param.layer[0]
    del net_param.layer[-1]
    net_param.name = '{}_deploy'.format(model_name)
    net_param.input.extend(['data'])
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])
    print(net_param, file=f)
shutil.copy(deploy_net_file, job_dir)

# Create solver.
solver = caffe_pb2.SolverParameter(
        train_net=train_net_file,
        test_net=[test_net_file],
        snapshot_prefix=snapshot_prefix,
        **solver_param)

with open(solver_file, 'w') as f:
    print(solver, file=f)
shutil.copy(solver_file, job_dir)

max_iter = 0
# Find most recent snapshot.
for file in os.listdir(snapshot_dir):
  if file.endswith(".solverstate"):
    basename = os.path.splitext(file)[0]
    iter = int(basename.split("{}_iter_".format(model_name))[1])
    if iter > max_iter:
      max_iter = iter

train_src_param = '--weights="{}" \\\n'.format(pretrain_model)
if resume_training:
  if max_iter > 0:
    train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)

if remove_old_models:
  # Remove any snapshots smaller than max_iter.
  for file in os.listdir(snapshot_dir):
    if file.endswith(".solverstate"):
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])
      if max_iter > iter:
        os.remove("{}/{}".format(snapshot_dir, file))
    if file.endswith(".caffemodel"):
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])
      if max_iter > iter:
        os.remove("{}/{}".format(snapshot_dir, file))

# Create job file.
with open(job_file, 'w') as f:
  f.write('cd {}\n'.format(caffe_root))
  f.write('./build/tools/caffe train \\\n')
  f.write('--solver="{}" \\\n'.format(solver_file))
  f.write(train_src_param)
  if solver_param['solver_mode'] == P.Solver.GPU:
    f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))
  else:
    f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))

# Copy the python script to job_dir.
py_file = os.path.abspath(__file__)
shutil.copy(py_file, job_dir)

# Run the job.
os.chmod(job_file, stat.S_IRWXU)
if run_soon:
  subprocess.call(job_file, shell=True)

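Lines 237 to 267 of the script above (from `root` down to `label_map_file`) are the ones you will spend the most time on. If you follow my folder layout, though, you only need to change line 237 (`root`).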
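To make it concrete why only `root` needs to change when the folder layout matches, here is a minimal sketch that rebuilds the same paths from `root` and `job_name`. The helper name `derive_paths` is hypothetical and not part of caffe-ssd; the path templates are copied from the script above.

```python
def derive_paths(root, job_name="SSD_300x300"):
    """Rebuild the paths that the training script derives from `root`.

    `derive_paths` is a hypothetical helper used only for illustration.
    """
    model_name = "VGG_CUP_{}".format(job_name)
    save_dir = "{}/doc/{}".format(root, job_name)
    snapshot_dir = "{}/models/{}".format(root, job_name)
    job_dir = "{}/jobs/{}".format(root, job_name)
    return {
        "save_dir": save_dir,
        "snapshot_dir": snapshot_dir,
        "job_dir": job_dir,
        "output_result_dir": "{}/results/{}".format(root, job_name),
        "solver_file": "{}/solver.prototxt".format(save_dir),
        "snapshot_prefix": "{}/{}".format(snapshot_dir, model_name),
        "job_file": "{}/{}.sh".format(job_dir, model_name),
        "name_size_file": "{}/doc/val_name_size.txt".format(root),
        "pretrain_model": "{}/doc/VGG_ILSVRC_16_layers_fc_reduced.caffemodel".format(root),
        "label_map_file": "{}/doc/labelmap_cup.prototxt".format(root),
    }


if __name__ == "__main__":
    paths = derive_paths("/home/hans/data/ImageNet/Detection/cup")
    for key in sorted(paths):
        print("{:18s} {}".format(key, paths[key]))
```

Note that `train_data` and `test_data` are written out as full paths in the script rather than derived from `root`, so they still have to be edited separately.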

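Line 179 (`'force_color': True` in `train_transform_param`) was changed because training failed with the error "OpenCV Error: Assertion failed ((scn == 3 || scn == 4) && (depth == CV_8U ||............".

Line 216 (`'force_color': True` in `test_transform_param`) was changed because validation failed with the error "Check failed: std::equal(top_shape.begin()+1,top_shape.begin()+4,shape.begin()+1)".

Set line 270 (`num_classes`) to your number of classes plus 1. More precisely, this value is the largest label index in labelmap_cup.prototxt plus 1.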
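The "largest index plus 1" rule can be checked with a few lines of Python. This is only a sketch: it assumes the label map uses the usual `item { name: ... label: ... display_name: ... }` text layout, and the sample below is generated from the six synset IDs kept in section 2 rather than read from my actual labelmap_cup.prototxt. The function name is hypothetical.

```python
import re

# The six cup synset IDs kept by delete_by_name.py earlier in this post.
WNIDS = ["n03147509", "n03216710", "n03438257",
         "n03797390", "n04559910", "n07930864"]


def make_sample_labelmap():
    """Build an illustrative label map: background (0) plus one item per synset."""
    items = ['item {\n  name: "none_of_the_above"\n  label: 0\n'
             '  display_name: "background"\n}']
    for index, wnid in enumerate(WNIDS, start=1):
        items.append('item {{\n  name: "{0}"\n  label: {1}\n'
                     '  display_name: "{0}"\n}}'.format(wnid, index))
    return "\n".join(items)


def num_classes_from_labelmap(text):
    """Return the largest `label:` index found in the text, plus 1."""
    labels = [int(value) for value in
              re.findall(r"^\s*label:\s*(\d+)", text, flags=re.MULTILINE)]
    if not labels:
        raise ValueError("no 'label:' entries found")
    return max(labels) + 1


if __name__ == "__main__":
    print(num_classes_from_labelmap(make_sample_labelmap()))
```

To run it on a real file, read the file contents and pass the string to `num_classes_from_labelmap`.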

Set line 363 (`num_test_image`) to the number of images in your test set.
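A small sketch of how `num_test_image` feeds into `test_iter`. The `compute_test_iter` formula is the one used in the script; `count_test_images` assumes that val_name_size.txt holds one image per line, which is my assumption about that file, and both function names are hypothetical.

```python
import math


def count_test_images(name_size_lines):
    """Count non-empty lines, assuming one test image per line."""
    return sum(1 for line in name_size_lines if line.strip())


def compute_test_iter(num_test_image, test_batch_size=8):
    """Same formula as the training script: ceil(num_test_image / test_batch_size)."""
    return int(math.ceil(float(num_test_image) / test_batch_size))


if __name__ == "__main__":
    # With the values from the script: 2000 images, batch size 8.
    print(compute_test_iter(2000, 8))
    # One extra image forces one extra (partially filled) test batch.
    print(compute_test_iter(2001, 8))
```

When the image count is not a multiple of `test_batch_size`, the last batch is padded by wrapping around the data, which is why the script's comment warns that mAP can be slightly off.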

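Everything else that needs changing is marked in the code above.

The remaining parameters can be tuned by reading the code; it is not difficult.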
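Before tuning anything, it helps to see what the script actually computes from its defaults. The sketch below repeats the prior-box size arithmetic and the learning-rate scaling from the script (the `P.Loss.VALID` branch, which is the one in effect here). The function names are hypothetical; the formulas and default values are taken from the script.

```python
import math


def prior_box_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Reproduce min_sizes / max_sizes for the six mbox source layers."""
    step = int(math.floor((max_ratio - min_ratio) / float(num_layers - 2)))
    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.)
        max_sizes.append(min_dim * (ratio + step) / 100.)
    # conv4_3 gets its own, smaller scale (10% to 20% of min_dim).
    min_sizes = [min_dim * 10 / 100.] + min_sizes
    max_sizes = [min_dim * 20 / 100.] + max_sizes
    return min_sizes, max_sizes


def effective_base_lr(base_lr=0.00004, neg_pos_ratio=3.):
    """Learning rate after the VALID-normalization scaling: base_lr * 25 / loc_weight."""
    loc_weight = (neg_pos_ratio + 1.) / 4.
    return base_lr * 25. / loc_weight


if __name__ == "__main__":
    mins, maxs = prior_box_sizes()
    print(mins)  # sizes in pixels for conv4_3, fc7, conv6_2, conv7_2, conv8_2, conv9_2
    print(maxs)
    print(effective_base_lr())
```

With the defaults, the minimum sizes come out as 30, 60, 111, 162, 213 and 264 pixels, and the learning rate written to solver.prototxt is about 0.001.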

Finally, run the script to start training.


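7. Visualizing the Training Output (2017.11.02)

I adapted the plotting script I previously used with plain Caffe.

One change is a new multiplier variable, `time`. The output sometimes fluctuates a lot, and averaging every `time` consecutive values makes the curve smoother.

The first argument is the path to the log file.

Set `display` and `test_interval` in the code to the same values as in solver.prototxt.

`time` is the multiplier; set it to 1 if you want to see the raw curve.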
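As a worked example of what the multiplier does, the sketch below averages every `time` values and pairs each average with the x-axis position used for plotting, i.e. multiples of `time * display` for the training loss or `time * test_interval` for the test output. The function name `smooth_with_iterations` is hypothetical.

```python
def smooth_with_iterations(values, time, interval):
    """Average every `time` values and pair each mean with its x position.

    `interval` is `display` for the training loss and `test_interval`
    for the test output. Leftover values that do not fill a group are dropped.
    """
    numbers = [float(v) for v in values]
    groups = len(numbers) // time
    points = []
    for i in range(groups):
        chunk = numbers[i * time:(i + 1) * time]
        points.append((i * time * interval, sum(chunk) / time))
    return points


if __name__ == "__main__":
    raw = ["4", "2", "3", "1", "5"]  # values arrive as strings from the log
    print(smooth_with_iterations(raw, time=1, interval=10))  # raw curve
    print(smooth_with_iterations(raw, time=2, interval=10))  # smoothed curve
```

With `time=2` the five raw values collapse to two points, and the fifth value is dropped because it does not fill a complete group.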

Code:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Thu Nov  2 14:35:42 2017

@author: hans

http://blog.csdn.net/renhanchi
"""

import matplotlib.pyplot as plt
import numpy as np
import commands
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '-p','--log_path',
    type = str,
    default = '',
    help = """\
    path to log file\
    """
)

FLAGS = parser.parse_args()

train_log_file = FLAGS.log_path


display = 10 #solver
test_interval = 100 #solver

time = 5

train_output = commands.getoutput("cat " + train_log_file + " | grep 'Train net output #0' | awk '{print $11}'")  #train mbox_loss
accu_output = commands.getoutput("cat " + train_log_file + " | grep 'Test net output #0' | awk '{print $11}'") #test detection_eval

train_loss = train_output.split("\n")
test_accu = accu_output.split("\n")
  
def reduce_data(data):
  # Convert to floats, skipping blank entries (e.g. when grep finds nothing).
  values = [float(v) for v in data if v.strip()]
  if time <= 1:
    return values
  # Average each full group of `time` consecutive values; drop the remainder.
  groups = len(values) // time
  return [sum(values[i*time:(i+1)*time]) / float(time) for i in range(groups)]

train_loss_ = reduce_data(train_loss)
test_accu_ = reduce_data(test_accu)

_,ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot(time*display*np.arange(len(train_loss_)), train_loss_)
ax2.plot(time*test_interval*np.arange(len(test_accu_)), test_accu_, 'r')

ax1.set_xlabel('Iteration')
ax1.set_ylabel('Train Loss')
ax2.set_ylabel('Test Accuracy')
plt.show()
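The script above relies on `cat`, `grep` and `awk` plus the Python 2 only `commands` module. If that is inconvenient, the same values can be pulled out with a regular expression. This is a sketch under one assumption: that the log lines look like `... Train net output #0: mbox_loss = 5.4321 (* 1 = 5.4321 loss)`, which is what the `awk '{print $11}'` field position implies. The sample lines below are made up for illustration.

```python
import re

# Capture the number after "<name> = " on the "net output #0" lines.
TRAIN_RE = re.compile(r"Train net output #0: \S+ = ([-+0-9.eE]+)")
TEST_RE = re.compile(r"Test net output #0: \S+ = ([-+0-9.eE]+)")

# Illustrative log lines (not taken from a real run).
SAMPLE_LOG = [
    "I1102 14:35:42.000000 12345 solver.cpp:243] Iteration 10, loss = 5.5",
    "I1102 14:35:42.000000 12345 solver.cpp:259]     "
    "Train net output #0: mbox_loss = 5.4321 (* 1 = 5.4321 loss)",
    "I1102 14:36:10.000000 12345 solver.cpp:546]     "
    "Test net output #0: detection_eval = 0.7201",
]


def extract_values(lines, pattern):
    """Return the captured number from every line that matches `pattern`."""
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


if __name__ == "__main__":
    print(extract_values(SAMPLE_LOG, TRAIN_RE))
    print(extract_values(SAMPLE_LOG, TEST_RE))
```

For a real log, pass an open file object instead of `SAMPLE_LOG`; the resulting lists can go straight into `reduce_data`.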

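8. Testing the Model (2017.11.03)

Once the model is trained, the next step is to see how well it actually works.

The original author provides a Python tool named "ssd_pascal_webcam.py". I did not find it convenient, but you can take a look at it yourself.

Below are the steps I use to run detection manually.

First prepare three files: deploy.prototxt, labelmap_cup.prototxt, and xxxxx.caffemodel.

Modify the first and last layers of deploy.prototxt: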

name: "VGG_VOC0712_SSD_300x300_test"
layer {
  name: "data"
  type: "VideoData"
  top: "data"
  transform_param {
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 123.0
    resize_param {
      prob: 1.0
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
    }
  }
  data_param {
    batch_size: 1
  }
  video_data_param {
    video_type: WEBCAM
    device_id: 0  # webcam device index
    skip_frames: 0  # whether to skip frames (0 = no skipping)
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
...
...
...
...
...
...
layer {
  name: "mbox_conf_flatten"
  type: "Flatten"
  bottom: "mbox_conf_softmax"
  top: "mbox_conf_flatten"
  flatten_param {
    axis: 1
  }
}
layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  bottom: "data"
  top: "detection_out"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 123.0
    resize_param {
      prob: 1.0
      resize_mode: WARP
      height: 480  # webcam frame height and width; larger values enlarge the display
      width: 640
      interp_mode: LINEAR
    }
  }
  detection_output_param {
    num_classes: 7  # number of classes + 1
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.449999988079
      top_k: 400
    }
    save_output_param {
      label_map_file: "labelmap_cup.prototxt"  # CHANGE ME
    }
    code_type: CENTER_SIZE
    keep_top_k: 200
    confidence_threshold: 0.899999976158
    visualize: true
    visualize_threshold: 0.600000023842  # only show results with confidence above this value
  }
}
layer {
  name: "slience"
  type: "Silence"
  bottom: "detection_out"
  include {
    phase: TEST
  }
}
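After hand-editing deploy.prototxt it is easy to leave a stale value behind, so a quick check of the detection settings can help. The sketch below reads three fields with regular expressions from a snippet copied from the listing above. It assumes that a box is drawn only if it passes both `confidence_threshold` and `visualize_threshold`; under that assumption the larger of the two is the one that matters. Both function names are hypothetical.

```python
import re

# Fragment copied from the DetectionOutput layer above.
SNIPPET = """
  detection_output_param {
    num_classes: 7
    share_location: true
    background_label_id: 0
    keep_top_k: 200
    confidence_threshold: 0.899999976158
    visualize: true
    visualize_threshold: 0.600000023842
  }
"""


def read_detection_settings(prototxt_text):
    """Extract a few numeric fields from prototxt text (first occurrence each)."""
    settings = {}
    for key, cast in (("num_classes", int),
                      ("confidence_threshold", float),
                      ("visualize_threshold", float)):
        match = re.search(r"^\s*{}:\s*([-+0-9.eE]+)".format(key),
                          prototxt_text, flags=re.MULTILINE)
        if match:
            settings[key] = cast(match.group(1))
    return settings


def effective_display_threshold(settings):
    """Assumption: a detection must pass both thresholds to be displayed."""
    return max(settings["confidence_threshold"], settings["visualize_threshold"])


if __name__ == "__main__":
    found = read_detection_settings(SNIPPET)
    print(found)
    print(effective_display_threshold(found))
```

With the values above, lowering `visualize_threshold` alone would probably not show more boxes; `confidence_threshold` would have to come down as well.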

Here is the script used for testing:

/home/hans/caffe-ssd/build/tools/caffe test \
--model="deploy.prototxt" \
--weights="xxxxx.caffemodel" \
--iterations="536870911" \
--gpu 0

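`--iterations` is set to a very large value, 536870911 (2^29 - 1; the 32-bit int maximum would be 2147483647), so that in practice the test keeps running on the webcam stream until you stop it.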
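If you prefer launching the test from Python, the command can be assembled as below. This is a sketch: `build_test_command` is a hypothetical helper, and it only uses the four flags that appear in the shell script above.

```python
def build_test_command(caffe_bin, model, weights, gpu=0, iterations=2 ** 29 - 1):
    """Assemble the `caffe test` command line used above as an argument list."""
    return [
        caffe_bin, "test",
        "--model={}".format(model),
        "--weights={}".format(weights),
        "--iterations={}".format(iterations),
        "--gpu", str(gpu),
    ]


if __name__ == "__main__":
    cmd = build_test_command("/home/hans/caffe-ssd/build/tools/caffe",
                             "deploy.prototxt", "xxxxx.caffemodel")
    print(" ".join(cmd))
    # The default iteration count is 2^29 - 1, below the 32-bit maximum 2^31 - 1.
    print(2 ** 29 - 1, 2 ** 31 - 1)
```

The list can be handed to `subprocess.call(cmd)` to start detection; the sketch only prints it so that nothing runs by accident.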





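Detection of ordinary cups is quite stable, although cylindrical objects are sometimes detected as cups.

This is not the final model yet. On my own validation set, detection_eval is around 0.72.

Afterword

I will keep updating this post, including analysis of the output, visualization, swapping in other network models, and so on.

This time I used VGGNet; later I will also try MobileNet.

One open issue is mean computation. I have not yet tested whether the creat_mean.sh script that ships with Caffe works here.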


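----[2017.11.20: mean computation solved]--------------------------------------

The bundled make_mean.sh cannot compute the mean here. It turns out there are two LMDB conversion tools, one that handles annotations and one that does not, and SSD uses the one with annotations.

For details, see the end of this post: http://blog.csdn.net/renhanchi/article/details/78423343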


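----[Update 2017.11.2]-----Multi-GPU----------------------------------------

This framework apparently supports multi-GPU training out of the box, but I had not verified it.

NCCL was already installed on my server, but when I ran make it reported that everything was already built.

I ignored that and tried 3 GPUs directly. It ran! However, the kernel reported: centos kernel: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:14900]

That gave me a scare, because another GPU on the machine was busy with a different job.

I then tried two GPUs. Iteration 0 was normal, but the loss became NaN at the first iteration, and changing the parameters several times did not help.

So for now I am sticking with a single GPU.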
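For anyone who wants to retry multi-GPU training, it is worth knowing how the training script splits the batch when `gpus` lists several devices. The sketch below mirrors that arithmetic; `split_batch` is a hypothetical name, and this does not by itself explain the NaN loss I saw.

```python
import math


def split_batch(gpus, batch_size=32, accum_batch_size=32):
    """Mirror the script's per-device batch size and iter_size computation.

    `gpus` is a comma-separated string such as "7" or "0,1".
    """
    gpulist = [g for g in gpus.split(",") if g]
    num_gpus = len(gpulist)
    if num_gpus == 0:
        # CPU case: the whole batch stays on one device.
        return batch_size, accum_batch_size // batch_size
    per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (per_device * num_gpus)))
    return per_device, iter_size


if __name__ == "__main__":
    for gpus in ("7", "0,1", "0,1,2"):
        print(gpus, split_batch(gpus))
```

With three GPUs the per-device batch is rounded up to 11, so the total batch becomes 33 rather than 32.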

--------------------------------------------------------------------------------------------
