caffe-MobileNet-ssd: training and testing, and training on my own NWPU VHR-10 dataset

zxmyoung

Published 2020-08-02 23:18:12

Categories: caffe, ubuntu

Article link: https://blog.csdn.net/zxmyoung/article/details/107752387


My caffe-ssd is already configured, so I won't go over that here.

My caffe-ssd path is /home/zhai/experiment/caffe-ssd-master/caffe

First, the environment: Ubuntu 16.04, CUDA 9, cuDNN 7.5, Python 2.7.12, OpenCV 3.4

References:

https://blog.csdn.net/qq_30011277/article/details/87557742 (main reference)

https://blog.csdn.net/Chris_zhangrx/article/details/80458515

https://blog.csdn.net/qq_33431368/article/details/84977194

I. Download MobileNetSSD and test the demo

MobileNet-SSD depends on the SSD (caffe-ssd) we configured earlier.

1. First, download the source:

git clone https://github.com/chuanqi305/MobileNet-SSD

Put the downloaded MobileNet-SSD folder in caffe/examples/.

Enter MobileNet-SSD. It contains the following files:

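images: where the test images are stored
template: shared templates for the train/test/deploy.prototxt network definitions, modified and generated by the gen.py script. The number of labels differs between datasets, so a few layers near the beginning and end of the network differ slightly. When we train on our own dataset later, we generate these files with the gen_model.sh script.
voc: three network files generated for the VOC dataset, plus one training solver (hyperparameter) file
demo.py: the actual detection script (images are in the images folder). It handles single images only. For video, you would loop over the frames one by one.
deploy.prototxt: the inference network definition, loaded by demo.py (similar to template/MobileNetSSD_deploy_template.prototxt)
gen.py: script that generates the shared templates (not used here)
gen_model.sh: script that generates a custom network, i.e. files like those in template (needed when training on your own dataset)
merge_bn.py: script that merges the BN layers to produce the final caffemodel (MobileNet's BatchNorm and Scale layers must be merged into the convolutions to obtain deploy.caffemodel)
mobilenet_iter_73000.caffemodel: pretrained model
solver_test.prototxt: solver (hyperparameter) file for testing
solver_train.prototxt: solver (hyperparameter) file for training
test.sh: network test script
train.sh: network training script
train.prototxt: training network definition, similar to the train network in template
train_voc.sh: training script that uses the solver and network files in voc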

2. Test the demo

Download the trained model and run a quick test.

The deploy model is available at: https://drive.google.com/file/d/0B3gersZ2cHIxRm5PMWRoTkdHdHc/view (from mainland China you need a proxy to reach it).

Put the downloaded MobileNetSSD_deploy.caffemodel in the MobileNet-SSD folder and open demo.py:

The deploy.prototxt referenced in the file probably does not match this caffemodel, so you also need to download
MobileNetSSD_deploy.prototxt. Download: Google Drive | Baidu Cloud

import numpy as np
import sys, os
import cv2
caffe_root = '/home/zhai/experiment/caffe-ssd-master/caffe/'  # change to your own caffe path
sys.path.insert(0, caffe_root + 'python')
import caffe


net_file = 'MobileNetSSD_deploy.prototxt'  # change to the file you just downloaded
caffe_model = 'MobileNetSSD_deploy.caffemodel'
test_dir = "images"

Then run demo.py:

cd caffe_ssd/caffe/examples/MobileNet-SSD
python demo.py

This produced an error:

F0802 23:13:03.078469 1525 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type，python: DepthwiseConvolution (known types: AbsVal, Accuracy, AnnotatedData, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

Problem description

Cause:

Python layer support was not enabled in Makefile.config.

Fix:
In the caffe folder, open Makefile.config and find the line

# WITH_PYTHON_LAYER := 1

Uncomment it. Then, in a terminal, go to the caffe directory and run the following in order:

make clean
make all
make pycaffe

Problem solved!

Run python demo.py again.

Result:

Detailed explanation of the solver and network files

① solver_train.prototxt (solver_test.prototxt is similar)

train_net: "example/MobileNetSSD_train.prototxt"   # training network, generated by gen_model.sh
test_net: "example/MobileNetSSD_test.prototxt"     # test network, generated by gen_model.sh
test_iter: 673            # = number of test images / batch size
test_interval: 10000
base_lr: 0.0005           # base learning rate
display: 10               # print training info every 10 iterations
max_iter: 120000          # total number of iterations
lr_policy: "multistep"    # how the learning rate decays
gamma: 0.5                # the learning rate is multiplied by gamma at each stepvalue
weight_decay: 0.00005
snapshot: 1000            # every 1000 iterations, save the current caffemodel and solver state to the snapshot folder
snapshot_prefix: "snapshot/mobilenet"
solver_mode: GPU          # train on the GPU
debug_info: false
snapshot_after_train: true   # also save a final snapshot when training finishes
test_initialization: false
average_loss: 10
stepvalue: 20000          # first learning-rate drop (used by the multistep policy)
stepvalue: 40000          # second learning-rate drop
iter_size: 1
type: "RMSProp"           # optimization algorithm
eval_type: "detection"    # evaluate as object detection
ap_version: "11point"
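With lr_policy: "multistep", the learning rate is multiplied by gamma each time training reaches a stepvalue. The small Python sketch below computes the rate this solver would use at a given iteration. It assumes the stepvalues are listed in ascending order, as they are in the file above. The function name multistep_lr is hypothetical and is not a Caffe API.

```python
def multistep_lr(it, base_lr=0.0005, gamma=0.5, stepvalues=(20000, 40000)):
    """Learning rate at iteration `it` under a multistep policy (illustrative sketch)."""
    # Count how many step boundaries have been reached (iteration >= stepvalue).
    steps_passed = sum(1 for s in stepvalues if it >= s)
    return base_lr * gamma ** steps_passed


if __name__ == "__main__":
    for it in (0, 19999, 20000, 40000, 120000):
        print(it, multistep_lr(it))
```

With these settings the rate falls from 0.0005 to 0.00025 at iteration 20000 and to 0.000125 at 40000. It then stays there until max_iter.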

② MobileNetSSD_train_template.prototxt network definition (test and deploy are similar)
An annotated excerpt follows. The rest of the file follows the same pattern.

name: "MobileNet-SSD"
# Input layer of the training network
layer {
  name: "data"
  type: "AnnotatedData"  # input data type
  top: "data"
  top: "label"
  include {
    phase: TRAIN  # only used during training
  }
  # Data preprocessing
  transform_param {
    # scale 0.007843 and mean_value 127.5 normalize the image. This is critical: porting and display code later must use the same values.
    scale: 0.007843
    mirror: true
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    # Resize to 300*300. This directly affects speed and accuracy: a lower resolution is generally faster but less accurate.
    resize_param {
      prob: 1.0
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
    }
    emit_constraint {
      emit_type: CENTER
    }
    distort_param {
      brightness_prob: 0.5
      brightness_delta: 32.0
      contrast_prob: 0.5
      contrast_lower: 0.5
      contrast_upper: 1.5
      hue_prob: 0.5
      hue_delta: 18.0
      saturation_prob: 0.5
      saturation_lower: 0.5
      saturation_upper: 1.5
      random_order_prob: 0.0
    }
    expand_param {
      prob: 0.5
      max_expand_ratio: 4.0
    }
  }
  # Input data source and format (LMDB)
  data_param {
    source: "trainval_lmdb/"
    batch_size: 24
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.1
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.3
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.5
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.9
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.3
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        max_jaccard_overlap: 1.0
      }
      max_sample: 1
      max_trials: 50
    }
    label_map_file: "labelmap.prototxt"
  }
}
# The MobileNet backbone starts here: its first layer
layer {
  name: "conv0"
  type: "Convolution"  # convolution layer
  bottom: "data"
  top: "conv0"
  param {
    lr_mult: 0.1  # learning-rate multiplier
    decay_mult: 0.1
  }
  convolution_param {
    num_output: 32  # number of convolution kernels
    bias_term: false
    pad: 1  # padding added to each side of the input
    kernel_size: 3  # kernel size
    stride: 2  # stride
    weight_filler {
      type: "msra"  # weight initialization method
    }
  }
}
# BN layer
layer {
  name: "conv0/bn"
  type: "BatchNorm"
  bottom: "conv0"
  top: "conv0"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
# Scale layer
layer {
  name: "conv0/scale"
  type: "Scale"
  bottom: "conv0"
  top: "conv0"
  param {
    lr_mult: 0.1
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.2
    decay_mult: 0.0
  }
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}
# Activation layer: a ReLU usually follows each convolution
layer {
  name: "conv0/relu"
  type: "ReLU"
  bottom: "conv0"
  top: "conv0"
}
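The scale and mean_value entries in transform_param transform each pixel as (pixel - 127.5) * 0.007843. Since 0.007843 is about 1/127.5, the input range [0, 255] maps to roughly [-1, 1]. The minimal NumPy sketch below shows that arithmetic. Any preprocessing code, such as preprocess() in demo.py, has to reproduce it. The helper name normalize is illustrative.

```python
import numpy as np

MEAN = 127.5      # mean_value in transform_param
SCALE = 0.007843  # scale in transform_param, approximately 1 / 127.5


def normalize(img):
    """Map uint8 pixels in [0, 255] to roughly [-1, 1], as the data layer does."""
    return (img.astype(np.float32) - MEAN) * SCALE


if __name__ == "__main__":
    print(normalize(np.array([0, 127, 128, 255], dtype=np.uint8)))
```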

③ train.sh / test.sh
train.sh:

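#!/bin/sh
# Check that the network file exists. Change this to the network file (generated by gen_model.sh) for your current dataset.
if ! test -f example/MobileNetSSD_train.prototxt ;then
	echo "error: example/MobileNetSSD_train.prototxt does not exist."
	echo "please use the gen_model.sh to generate your own model."
        exit 1
fi
mkdir -p snapshot
# -solver: the training solver file (may need changing)
# -weights: the pretrained model (may need changing)
# Comments must not follow a trailing "\", or the line continuation breaks.
../../build/tools/caffe train -solver="solver_train.prototxt" \
-weights="mobilenet_iter_73000.caffemodel" \
-gpu 0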

test.sh:

#!/bin/sh
#latest=snapshot/mobilenet_iter_73000.caffemodel
# latest = the most recently written caffemodel in snapshot (the folder where training saves models)
latest=$(ls -t snapshot/*.caffemodel | head -n 1)
if test -z "$latest"; then
        exit 1
fi
# You can also set --weights directly to the caffemodel you want to test
../../build/tools/caffe train -solver="solver_test.prototxt" \
--weights=$latest \
-gpu 0
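test.sh picks the newest file by modification time (ls -t). If snapshots have been copied or touched, modification time may not reflect training progress. The hedged alternative below, written in Python, picks the snapshot with the largest iteration number instead. It assumes the <prefix>_iter_<N>.caffemodel naming seen in snapshot/mobilenet_iter_120000.caffemodel. The helper latest_snapshot is hypothetical.

```python
import glob
import os
import re


def latest_snapshot(snapshot_dir="snapshot"):
    """Return the .caffemodel with the highest iteration number, or None if there is none."""
    best_iter, best_path = -1, None
    for path in glob.glob(os.path.join(snapshot_dir, "*.caffemodel")):
        # Parse N from <prefix>_iter_<N>.caffemodel and compare numerically
        m = re.search(r"_iter_(\d+)\.caffemodel$", path)
        if m and int(m.group(1)) > best_iter:
            best_iter, best_path = int(m.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_snapshot())
```

Comparing the parsed integers matters here. A plain string sort would rank mobilenet_iter_9000 above mobilenet_iter_120000.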

④ demo.py: you will later need to adapt this file to your own needs (for example, to run on video)
An annotated overview of the original file:

# Imports
import numpy as np
import sys, os
import cv2
# Change this to your caffe root directory
caffe_root = '/home/che/caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe
# Network file, model file and test image folder: change these as needed
net_file = 'MobileNetSSD_deploy.prototxt'
caffe_model = 'MobileNetSSD_deploy.caffemodel'
test_dir = "images"
# Check that the model and network files exist
if not os.path.exists(caffe_model):
    print(caffe_model + " does not exist")
    sys.exit()
if not os.path.exists(net_file):
    print(net_file + " does not exist")
    sys.exit()
# Build the network
net = caffe.Net(net_file, caffe_model, caffe.TEST)
# Class names
CLASSES = ('background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
# Preprocessing (normalization): the 300x300 resize, the 127.5 mean and the 0.007843 scale must match the network file above
def preprocess(src):
    img = cv2.resize(src, (300, 300))
    img = img - 127.5
    img = img * 0.007843
    return img
# Convert the raw network output into boxes, confidences and classes
def postprocess(img, out):
    h = img.shape[0]
    w = img.shape[1]
    box = out['detection_out'][0, 0, :, 3:7] * np.array([w, h, w, h])
    cls = out['detection_out'][0, 0, :, 1]
    conf = out['detection_out'][0, 0, :, 2]
    return (box.astype(np.int32), conf, cls)
# Main routine: object detection
def detect(imgfile):
    origimg = cv2.imread(imgfile)
    img = preprocess(origimg)
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    net.blobs['data'].data[...] = img
    out = net.forward()  # forward pass
    # box: bounding boxes, conf: confidences, cls: class indices
    box, conf, cls = postprocess(origimg, out)
    # Draw each detection to produce the final visualization
    for i in range(len(box)):
        p1 = (int(box[i][0]), int(box[i][1]))
        p2 = (int(box[i][2]), int(box[i][3]))
        cv2.rectangle(origimg, p1, p2, (0, 255, 0))  # draw the box
        p3 = (max(p1[0], 15), max(p1[1], 15))
        title = "%s:%.2f" % (CLASSES[int(cls[i])], conf[i])
        cv2.putText(origimg, title, p3, cv2.FONT_ITALIC, 0.6, (0, 255, 0), 1)  # draw the label
    cv2.imshow("SSD", origimg)
    k = cv2.waitKey(0) & 0xff
    # Exit if ESC pressed
    if k == 27:
        return False
    return True

for f in os.listdir(test_dir):
    if detect(test_dir + "/" + f) == False:
        break
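demo.py draws every row of detection_out, including low-confidence ones. A common refinement keeps only detections above a confidence threshold. The minimal NumPy sketch below does that and clips the boxes to the image. It relies on the same column layout that postprocess() already uses: [image_id, label, confidence, xmin, ymin, xmax, ymax], with normalized coordinates. The function filter_detections and the default conf_thresh=0.5 are hypothetical choices, not part of the original demo.

```python
import numpy as np


def filter_detections(detection_out, width, height, conf_thresh=0.5):
    """Keep rows of detection_out (shape (1, 1, N, 7)) whose confidence >= conf_thresh.

    Returns pixel boxes (int32), confidences and integer class labels.
    """
    dets = detection_out[0, 0]
    dets = dets[dets[:, 2] >= conf_thresh]  # column 2 is the confidence
    boxes = dets[:, 3:7] * np.array([width, height, width, height])
    # Clip to the image so drawing never goes out of bounds
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width - 1)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height - 1)
    return boxes.astype(np.int32), dets[:, 2], dets[:, 1].astype(np.int32)
```

In detect(), you could call it as filter_detections(out['detection_out'], origimg.shape[1], origimg.shape[0]) in place of postprocess().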

3 Train your own MobileNetSSD model on your own dataset

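First, create a MyDataSet folder in caffe/data and put the whole dataset there. This keeps everything in one place.
My NWPU VHR-10 dataset was already prepared for a project, so I won't publish it here. This section mainly walks through the workflow: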

Build the dataset

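For the details, see the post 自己制作图像VOC数据集–用于Objection Detection（目标检测） (building your own VOC-format image dataset for object detection).
MyDataSet should now contain the following two folders. The labels folder that was also generated is not needed for detection, so I didn't copy it in: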

Annotations: XML files generated with an annotation tool
JPEGImages: the original images
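Before converting to LMDB, it helps to check which class names actually appear in Annotations. Every <name> must match an entry in the label map used later (labelmap_voc.prototxt). The small sketch below uses Python's standard xml.etree.ElementTree. It assumes the standard VOC layout, where each <object> element has a <name> child. The helper count_classes is hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_classes(annotation_dir="Annotations"):
    """Count <object><name> occurrences across all VOC-style XML files in a folder."""
    counts = Counter()
    for fname in sorted(os.listdir(annotation_dir)):
        if not fname.endswith(".xml"):
            continue  # skip anything that is not an annotation file
        root = ET.parse(os.path.join(annotation_dir, fname)).getroot()
        for obj in root.findall("object"):
            counts[(obj.findtext("name") or "").strip()] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_classes().items()):
        print(name, n)
```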

Generate the index txt files

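Use the code below to generate an ImageSets folder containing a Main subfolder. ImageSets/Main holds four txt files: test.txt, train.txt, trainval.txt and val.txt. They index the test, training, train+validation and validation sets respectively, listing each image by name without its file extension.
Create CreateImageSets.py with the following code (briefly commented):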

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import os
import random

trainval_percent = 0.9  # share of all images used for trainval (adjust as needed)
train_percent = 0.8  # share of trainval used for train (adjust as needed)

#xmlfilepath = f"/Users/Administrator/Desktop/ship_detection_online/Annotations_new"  # fill in your own path
#txtsavepath = f"/Users/Administrator/Desktop/ship_detection_online/ImageSets/Main"
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
if not os.path.exists(txtsavepath):
    os.makedirs(txtsavepath)  # open() below fails if ImageSets/Main does not exist yet
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)  # renamed from "list" to avoid shadowing the built-in
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)
trainval_set = set(trainval)  # sets make the membership tests below fast
train_set = set(train)

ftrainval = open(txtsavepath + '/trainval.txt', 'w')
ftest = open(txtsavepath + '/test.txt', 'w')
ftrain = open(txtsavepath + '/train.txt', 'w')
fval = open(txtsavepath + '/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the ".xml" extension
    if i in trainval_set:
        ftrainval.write(name)
        if i in train_set:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
print('Well finished')
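CreateImageSets.py draws a new random split on every run, so rerunning it silently changes which images end up in test.txt. The hedged variant below fixes the random seed. It produces the same proportions, but the split is reproducible. The function split_dataset and seed=0 are illustrative assumptions.

```python
import random


def split_dataset(names, trainval_percent=0.9, train_percent=0.8, seed=0):
    """Split image names into trainval/train/val/test like CreateImageSets.py, but reproducibly."""
    rng = random.Random(seed)  # a fixed seed gives the same split on every run
    names = sorted(names)      # make the result independent of os.listdir() order
    tv = int(len(names) * trainval_percent)
    tr = int(tv * train_percent)
    trainval = rng.sample(names, tv)
    train = set(rng.sample(trainval, tr))
    trainval = set(trainval)
    return {
        "trainval": [n for n in names if n in trainval],
        "train": [n for n in names if n in train],
        "val": [n for n in names if n in trainval and n not in train],
        "test": [n for n in names if n not in trainval],
    }
```

For 100 images this gives 90 trainval (72 train, 18 val) and 10 test, the same counts the original script produces.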

Run python CreateImageSets.py to get the following:

Generate LMDB files (Caffe's input format)

First, copy the following files from caffe/data/VOC0712/ into data/MyDataSet:

cd  caffe/data
cp VOC0712/create_list.sh MyDataSet/
cp VOC0712/create_data.sh MyDataSet/
cp VOC0712/labelmap_voc.prototxt MyDataSet/

The dataset folder now looks like this:

Edit the three copied files. Changes to create_list.sh:

#!/bin/bash

root_dir=$HOME/caffe/data  # change to your path
sub_dir=ImageSets/Main
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
for dataset in trainval test
do
	dst_file=$bash_dir/$dataset.txt
	if [ -f $dst_file ]
	then
		rm -f $dst_file
	fi
	for name in MyDataSet  # change to your dataset name
	do
		# if [[ $dataset == "test" && $name == "VOC2012" ]]
		#then
			# continue
		#fi
		echo "Create list for $name $dataset..."
..............(no changes needed here, omitted)
done

After the changes:

Changes to create_data.sh:

cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$cur_dir/../..
cd $root_dir
redo=1
data_root_dir="$HOME/caffe/data"  # change to your path
dataset_name="MyDataSet"  # change to your dataset name
mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"
...............(no changes needed here, omitted)
done

After the changes:

If your training images are not .jpeg or .jpg, you also need to change the image extension specified in these two files. Only a few places need changing.

labelmap_voc.prototxt must be edited to match your own labels. For example:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "airplane"
  label: 1
  display_name: "airplane"
}
item {
  name: "ship"
  label: 2
  display_name: "ship"
}
item {
  name: "storagetank"
  label: 3
  display_name: "storagetank"
}
item {
  name: "baseballdiamond"
  label: 4
  display_name: "baseballdiamond"
}
item {
  name: "tenniscourt"
  label: 5
  display_name: "tenniscourt"
}
item {
  name: "basketballcourt"
  label: 6
  display_name: "basketballcourt"
}
item {
  name: "groundtrackfield"
  label: 7
  display_name: "groundtrackfield"
}
item {
  name: "habor"
  label: 8
  display_name: "habor"
}
item {
  name: "bridge"
  label: 9
  display_name: "bridge"
}
item {
  name: "vehicle"
  label: 10
  display_name: "vehicle"
}
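Writing the 11 entries by hand is error-prone. The short sketch below generates the same structure from a class list: label 0 is background, and the other classes are numbered in order. The class names are copied from the example above, including its spelling "habor", which must match the names in your XML files. make_labelmap is a hypothetical helper. The number of items it emits (classes + background) is also the CLASSNUM used with gen_model.sh later.

```python
def make_labelmap(class_names):
    """Return labelmap prototxt text: label 0 is background, then the classes in order."""
    items = ['item {\n  name: "none_of_the_above"\n  label: 0\n  display_name: "background"\n}']
    for i, name in enumerate(class_names, start=1):
        items.append('item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}' % (name, i, name))
    return "\n".join(items) + "\n"


NWPU_CLASSES = ["airplane", "ship", "storagetank", "baseballdiamond", "tenniscourt",
                "basketballcourt", "groundtrackfield", "habor", "bridge", "vehicle"]

if __name__ == "__main__":
    print(make_labelmap(NWPU_CLASSES))
```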

Run the following in order (it's best to delete the ## comments first):

cd /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet
sh create_list.sh
sh create_data.sh

Running the commands above produced errors:

zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ sh ./data/MyDataSet/create_list.sh
./data/MyDataSet/create_list.sh: 5: ./data/MyDataSet/create_list.sh: Bad substitution
Create list for MyDataSet trainval...
./data/MyDataSet/create_list.sh: 39: [: trainval: unexpected operator
./data/MyDataSet/create_list.sh: 45: [: trainval: unexpected operator
Create list for MyDataSet test...
./data/MyDataSet/create_list.sh: 39: [: test: unexpected operator
./data/MyDataSet/create_list.sh: 45: [: test: unexpected operator

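For the data conversion, run the script directly (sudo ./data/VOC0712/create_list.sh) or through bash (sudo bash ./data/VOC0712/create_list.sh).

Do not use sudo sh ./data/VOC0712/create_list.sh, or you get the errors above. On Ubuntu, /bin/sh is dash, which does not support the bash features these scripts use: ${BASH_SOURCE[0]} causes "Bad substitution", and == inside [ ] causes "unexpected operator".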

Reference: https://blog.csdn.net/u011489887/article/details/91354461

cd /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet
sudo bash create_list.sh
sudo bash create_data.sh

Then:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

A problem appeared:

zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ sudo ./data/MyDataSet/create_data.sh 
Traceback (most recent call last):
  File "/home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
  File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
  File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 11, in <module>
    import numpy as np
ImportError: No module named 'numpy'
Traceback (most recent call last):
  File "/home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
  File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
  File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 11, in <module>
    import numpy as np
ImportError: No module named 'numpy'

So the problem is numpy. Install or update it:

pip install -U numpy

Next, a new problem:

zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/./libharfbuzz.so.0: undefined symbol: FT_Done_MM_Var
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/./libharfbuzz.so.0: undefined symbol: FT_Done_MM_Var

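I'm not sure why Anaconda3's library files get loaded here. Anaconda's lib directory ships its own copies of the libraries Python needs, and their versions can differ from the system's. The fix is to replace the Anaconda3 copies with the system libraries. See: https://zhuanlan.zhihu.com/p/163639019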

cp /usr/lib/x86_64-linux-gnu/libharfbuzz.so /home/zhai/anaconda3/lib/libharfbuzz.so

In other words, copy the system library /usr/lib/x86_64-linux-gnu/libharfbuzz.so over /home/zhai/anaconda3/lib/libharfbuzz.so.

Next, another new problem, this time with libfontconfig.so.1.

So, a second round of replacement:

cp /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 /home/zhai/anaconda3/lib/libfontconfig.so.1

Another new problem:

zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/libpangoft2-1.0.so.0: undefined symbol: hb_font_set_variations
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/libpangoft2-1.0.so.0: undefined symbol: hb_font_set_variations

Third round of replacement. First, run locate libpangoft2-1.0.so.0 to find where it is,

then replace the Anaconda3 copy with the system one.

This time the previous approach didn't work, so I used a different one. See: https://blog.csdn.net/qq_45569859/article/details/103341971

$ cd /home/zhai/anaconda3/lib/
$ rm libpangoft2-1.0.so.0
$ cp /usr/lib/x86_64-linux-gnu/libpangoft2-1.0.so.0 libpangoft2-1.0.so.0

Then go back and run ./data/VOC0712/create_data.sh again:

zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: dynamic module does not define module export function (PyInit__caffe)
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
    from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: dynamic module does not define module export function (PyInit__caffe)

～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～～

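All the problems between the dashed line above and this point most likely have the same cause. After Anaconda was installed and allowed to change the environment variables automatically, the default python and pip became Anaconda's. In the screenshot, which python points to Anaconda, but this Caffe build uses the system's own Python, hence the errors above.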

To switch the default python back to the standalone system python:
1. sudo gedit ~/.bashrc
2. Add export PATH="/usr/bin/:$PATH"
3. source ~/.bashrc
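The shell runs the first matching executable it finds while scanning PATH from left to right. That is why prepending /usr/bin lets the system python win over Anaconda's. The small sketch below reproduces that lookup, roughly what which python reports. first_on_path is a hypothetical helper.

```python
import os


def first_on_path(program, path_value):
    """Return the first executable named `program` found in a PATH-style string, else None."""
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(first_on_path("python", os.environ.get("PATH", "")))
```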

Now the system python is the default:

OK. Next, from the caffe root directory, run ./data/MyDataSet/create_list.sh (note: without sudo).

Then run ./data/MyDataSet/create_data.sh

The LMDB files are now generated.

Both folders contain LMDB files.

Also, examples now contains a MyDataSet directory at the same level as MobileNet-SSD. It holds symlinks to the LMDB folders, which are used later for training.

Train with MobileNetSSD

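The VOC dataset has 21 classes (including background), while ours has only 11 (including background). So we need to regenerate the training, test and deploy network files. The gen_model.sh script does this: it takes the templates in the template folder and generates the required network definitions with the parameters we specify. Usage, where CLASSNUM = number of labels + background (11 here):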

./gen_model.sh CLASSNUM

① First, create your own labelmap.prototxt in the MobileNetSSD folder (same content as labelmap_voc.prototxt above)
② Generate train/test/deploy network files for your number of labels

./gen_model.sh 11  # 11 = number of labels (10) + background
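The network files depend on the class count because of the SSD confidence layers. Each mbox_conf layer predicts one score per class for every prior box at each feature-map location, so its num_output is priors × CLASSNUM. The mbox_loc layers predict 4 box offsets per prior and do not depend on the classes. The sketch below shows that arithmetic. The per-head prior counts (3 for the first head, 6 for the others) are an assumption based on the usual MobileNet-SSD VOC layout, not values read from gen_model.sh, so check your generated prototxt.

```python
# Assumption: priors per location for the six SSD heads in the VOC MobileNet-SSD templates.
PRIORS_PER_HEAD = [3, 6, 6, 6, 6, 6]


def conf_num_outputs(class_num, priors_per_head=PRIORS_PER_HEAD):
    """num_output of each mbox_conf layer: one score per class per prior box."""
    return [p * class_num for p in priors_per_head]


def loc_num_outputs(priors_per_head=PRIORS_PER_HEAD):
    """num_output of each mbox_loc layer: 4 offsets per prior box, independent of the classes."""
    return [p * 4 for p in priors_per_head]


if __name__ == "__main__":
    print("VOC (21):", conf_num_outputs(21))
    print("NWPU VHR-10 (11):", conf_num_outputs(11))
```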

A small issue. Running the command above gives:

bash: ./gen_model.sh: Permission denied

Run it through bash instead:

bash ./gen_model.sh 11

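This creates an example folder. Its three prototxt files are the actual network definitions generated from the templates. As the MobileNet-SSD author set things up, the deploy file already has the BN layers merged, so later it must be paired with a merged model.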

③ Create symlinks to the dataset

ln -s PATH_TO_YOUR_TRAIN_LMDB trainval_lmdb
ln -s PATH_TO_YOUR_TEST_LMDB test_lmdb

With my paths, the two commands (run inside MobileNetSSD) are:

ln -s /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/lmdb/MyDataSet_trainval_lmdb trainval_lmdb 
ln -s /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/lmdb/MyDataSet_test_lmdb test_lmdb

Two symlinks then appear under MobileNetSSD.

Alternatively, copy the two symlinks from the MyDataSet folder generated under examples into MobileNetSSD, and rename them to trainval_lmdb and test_lmdb.
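A mistyped target in ln -s does not fail when the link is created. The link simply dangles, and the data layer fails later. The quick Python check below catches that early. It assumes the usual LMDB layout, where each database directory contains a data.mdb file. check_lmdb_link is a hypothetical helper.

```python
import os


def check_lmdb_link(link_path):
    """Return (ok, message) for a symlink such as trainval_lmdb or test_lmdb."""
    if not os.path.lexists(link_path):
        return False, "%s does not exist" % link_path
    if not os.path.exists(link_path):
        # The link itself exists but its target does not (moved or mistyped path)
        return False, "%s is a dangling link" % link_path
    if not os.path.isfile(os.path.join(link_path, "data.mdb")):
        return False, "%s has no data.mdb" % link_path
    return True, "%s -> %s" % (link_path, os.path.realpath(link_path))


if __name__ == "__main__":
    for link in ("trainval_lmdb", "test_lmdb"):
        print(check_lmdb_link(link))
```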

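④ Adjust the hyperparameters, set the pretrained model, and start training
Adjust the hyperparameters in solver_test.prototxt and solver_train.prototxt as needed (beginners can leave them unchanged).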

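Here test_iter = number of test images / batch size. The initial learning rate should not be too high, or the pretrained base weights get badly disrupted. The optimizer is RMSProp, which may help convergence. Don't switch it to SGD, again to protect the pretrained weights.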

Then set the pretrained model:

Start training:

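In /home/zhai/experiment/caffe-ssd-master/caffe/examples/MobileNet-SSD, edit and run the train.sh script. You can keep adjusting parameters along the way. When training finishes, run test.sh to measure the network's accuracy.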

sh train.sh

⑤ Next, you may hit the following error:

F0805 00:28:38.227371 27458 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory

Cause: not enough GPU memory.
Fix: reduce batch_size in MobileNetSSD_train.prototxt under caffe/examples/MobileNet-SSD/example. My computer is fairly weak, so I could only use 8.
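Lowering batch_size from 24 to 8 also shrinks the effective batch per weight update. Caffe's iter_size solver parameter offers an optional alternative. It accumulates gradients over iter_size forward/backward passes before each update, so batch_size × iter_size approximates the original batch. BatchNorm statistics are still computed per pass, so results may not be identical. The tiny sketch below shows the arithmetic. iter_size_for is a hypothetical helper.

```python
def iter_size_for(target_batch, batch_size):
    """iter_size such that batch_size * iter_size reaches at least target_batch."""
    return -(-target_batch // batch_size)  # ceiling division on integers


if __name__ == "__main__":
    print(iter_size_for(24, 8))  # batch_size 8 with iter_size 3 approximates a batch of 24
```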

Rerun the script:

sh train.sh

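You can also keep adjusting parameters during training. The loss decreases as the iteration count grows. After a while, over the last few tens of thousands of iterations, the loss fluctuates around 1.0.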

Success.

References: https://blog.csdn.net/qq_30011277/article/details/87557742

https://blog.csdn.net/xiao__run/article/details/80643346

https://blog.csdn.net/c20081052/article/details/81747719

https://blog.csdn.net/qq_33431368/article/details/84977194

Merge into the final model, and how to test

After training starts, a snapshot folder appears in the directory.

As you can see, a caffemodel file and a training-state (.solverstate) file are written every 1000 iterations. This interval is set by snapshot in the solver prototxt.

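① Merge into the final caffemodel
Because MobileNet has BatchNorm and Scale layers, producing the deploy model takes one more step. To speed up the model, the MobileNet-SSD author folds the BN layers into the convolution layers. This removes the BN computation at inference time and may slightly speed up detection. Open merge_bn.py and adjust the file paths in it.
The contents of merge_bn.py: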

import os
import sys
import argparse
import logging

import numpy as np
try:
    caffe_root = '/home/zhai/experiment/caffe-ssd-master/caffe/'  # change to your own path (keep the trailing slash)
    sys.path.insert(0, caffe_root + 'python')
    import caffe
except ImportError:
    logging.fatal("Cannot find caffe!")
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, required=True, help='.prototxt file for inference')
    parser.add_argument('--weights', type=str, required=True, help='.caffemodel file for inference')
    return parser

bn_maps = {}
def find_top_after_bn(layers, name, top):
    bn_maps[name] = {} 
    for l in layers:
        if len(l.bottom) == 0:
            continue
        if l.bottom[0] == top and l.type == "BatchNorm":
            bn_maps[name]["bn"] = l.name
            top = l.top[0]
        if l.bottom[0] == top and l.type == "Scale":
            bn_maps[name]["scale"] = l.name
            top = l.top[0]
    return top

def pre_process(expected_proto, new_proto):
    net_specs = caffe_pb2.NetParameter()
    net_specs2 = caffe_pb2.NetParameter()
    with open(expected_proto, "r") as fp:
        text_format.Merge(str(fp.read()), net_specs)

    net_specs2.MergeFrom(net_specs)
    layers = net_specs.layer
    num_layers = len(layers)

    for i in range(num_layers - 1, -1, -1):
         del net_specs2.layer[i]

    for idx in range(num_layers):
        l = layers[idx]
        if l.type == "BatchNorm" or l.type == "Scale":
            continue
        elif l.type == "Convolution" or l.type == "Deconvolution":
            top = find_top_after_bn(layers, l.name, l.top[0])
            bn_maps[l.name]["type"] = l.type
            layer = net_specs2.layer.add()
            layer.MergeFrom(l)
            layer.top[0] = top
            layer.convolution_param.bias_term = True
        else:
            layer = net_specs2.layer.add()
            layer.MergeFrom(l)

    with open(new_proto, "w") as fp:
        fp.write("{}".format(net_specs2))

def load_weights(net, nobn):
    if sys.version_info > (3,0):
        listKeys = nobn.params.keys()
    else:
        listKeys = nobn.params.iterkeys()
    for key in listKeys:
        if type(nobn.params[key]) is caffe._caffe.BlobVec:
            conv = net.params[key]
            if key not in bn_maps or "bn" not in bn_maps[key]:
                for i, w in enumerate(conv):
                    nobn.params[key][i].data[...] = w.data
            else:
                print(key)
                bn = net.params[bn_maps[key]["bn"]]
                scale = net.params[bn_maps[key]["scale"]]
                wt = conv[0].data
                channels = 0
                if bn_maps[key]["type"] == "Convolution": 
                    channels = wt.shape[0]
                elif bn_maps[key]["type"] == "Deconvolution": 
                    channels = wt.shape[1]
                else:
                    print("error type " + bn_maps[key]["type"])
                    exit(-1)
                bias = np.zeros(channels)
                if len(conv) > 1:
                    bias = conv[1].data
                mean = bn[0].data
                var = bn[1].data
                scalef = bn[2].data

                scales = scale[0].data
                shift = scale[1].data

                if scalef != 0:
                    scalef = 1. / scalef
                mean = mean * scalef
                var = var * scalef
                rstd = 1. / np.sqrt(var + 1e-5)
                if bn_maps[key]["type"] == "Convolution": 
                    rstd1 = rstd.reshape((channels,1,1,1))
                    scales1 = scales.reshape((channels,1,1,1))
                    wt = wt * rstd1 * scales1
                else:
                    rstd1 = rstd.reshape((1, channels,1,1))
                    scales1 = scales.reshape((1, channels,1,1))
                    wt = wt * rstd1 * scales1
                bias = (bias - mean) * rstd * scales + shift
                
                nobn.params[key][0].data[...] = wt
                nobn.params[key][1].data[...] = bias

if __name__ == '__main__':
    parser1 = make_parser()
    args = parser1.parse_args()
    pre_process(args.model, "no_bn.prototxt")

    net = caffe.Net(args.model, args.weights, caffe.TEST)  
    net2 = caffe.Net("no_bn.prototxt", caffe.TEST)

    load_weights(net, net2)
    net2.save("no_bn.caffemodel")
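The folding in load_weights() relies on a simple identity. Caffe's BatchNorm stores mean and variance multiplied by a moving-average factor (bn[2]), so both are divided by it first. Then conv followed by BatchNorm and Scale equals a single convolution with W' = W · γ/√(var+ε) and b' = (b − mean) · γ/√(var+ε) + β. The NumPy sketch below applies the same formulas as merge_bn.py and checks them numerically on a 1x1 convolution, which acts as a matrix on the channels. fold_bn is a hypothetical helper, not part of merge_bn.py.

```python
import numpy as np


def fold_bn(W, b, mean, var, scalef, gamma, beta, eps=1e-5):
    """Fold Caffe BatchNorm + Scale into conv weights, using merge_bn.py's formulas."""
    if scalef != 0:
        mean, var = mean / scalef, var / scalef  # undo Caffe's moving-average scaling
    rstd = 1.0 / np.sqrt(var + eps)
    # Scale every output channel (axis 0 of a Convolution weight blob)
    W_f = W * (rstd * gamma).reshape(-1, *([1] * (W.ndim - 1)))
    b_f = (b - mean) * rstd * gamma + beta
    return W_f, b_f


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    c_out, c_in = 4, 3
    W = rng.randn(c_out, c_in, 1, 1)
    b = np.zeros(c_out)
    mean, var = rng.randn(c_out) * 5, rng.rand(c_out) * 5 + 0.1
    scalef = 2.0
    gamma, beta = rng.rand(c_out) + 0.5, rng.randn(c_out)
    x = rng.randn(c_in)
    # Reference: conv, then BatchNorm (stored stats / scalef), then Scale
    y = W[:, :, 0, 0] @ x + b
    y_ref = gamma * (y - mean / scalef) / np.sqrt(var / scalef + 1e-5) + beta
    W_f, b_f = fold_bn(W, b, mean, var, scalef, gamma, beta)
    print(np.allclose(W_f[:, :, 0, 0] @ x + b_f, y_ref))
```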

Then run merge_bn.py:

## Here the model from 120000 training iterations is used for the BN merge, giving the final model.
python merge_bn.py --model ./example/MobileNetSSD_deploy.prototxt --weights ./snapshot/mobilenet_iter_120000.caffemodel

MobileNet-SSD now contains a new no_bn.prototxt and a new no_bn.caffemodel. These are the network file and weights we wanted.

2. Test the trained caffemodel on images

Create a testimages folder under MobileNet-SSD/ and put the test images in it.

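Change the paths and file names in demo.py and run it: point net_file, caffe_model and test_dir at no_bn.prototxt, no_bn.caffemodel and testimages. Also replace CLASSES with your own class names in label-map order, starting with background.
I first copied demo.py to My_demo.py and then changed the paths in the copy.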

Run python My_demo.py

Results:

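3. Test accuracy
You can also run test.sh to measure the overall accuracy.
You may need to fix the corresponding paths in MobileNet-SSD/solver_test.prototxt.
For example, the paths in solver_train.prototxt default to example/..., but those in solver_test.prototxt were not updated. Change them to the following:

train_net: "example/MobileNetSSD_train.prototxt"
test_net: "example/MobileNetSSD_test.prototxt"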

In /home/zhai/experiment/caffe-ssd-master/caffe/examples/MobileNet-SSD, run the script:

sh test.sh
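The solver's ap_version: "11point" selects the PASCAL VOC2007-style 11-point interpolated AP when test.sh reports detection accuracy; the mAP is the mean AP over classes. The minimal pure-Python sketch below implements that definition, for intuition only. Caffe's own C++ implementation may differ in edge cases. ap_11point is a hypothetical helper.

```python
def ap_11point(recalls, precisions):
    """VOC2007-style 11-point interpolated average precision.

    For each recall threshold t in {0, 0.1, ..., 1.0}, take the maximum precision
    among points with recall >= t (0 if there are none), then average the 11 values.
    """
    total = 0.0
    for i in range(11):
        t = i / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


if __name__ == "__main__":
    print(ap_11point([0.5, 1.0], [1.0, 0.5]))
```

In this example, the thresholds 0 to 0.5 see precision 1.0 and 0.6 to 1.0 see 0.5, so AP = (6 × 1.0 + 5 × 0.5) / 11.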

Results:
