My caffe-ssd is already set up, so I will not cover its installation here.
My caffe-ssd path is /home/zhai/experiment/caffe-ssd-master/caffe
Environment: Ubuntu 16.04, CUDA 9, cuDNN 7.5, Python 2.7.12, OpenCV 3.4
References:
https://blog.csdn.net/qq_30011277/article/details/87557742 (main reference)
https://blog.csdn.net/Chris_zhangrx/article/details/80458515
https://blog.csdn.net/qq_33431368/article/details/84977194
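I. Download MobileNet-SSD and test the demo
MobileNet-SSD depends on the SSD version of Caffe configured earlier.
1. First download the source:
git clone https://github.com/chuanqi305/MobileNet-SSD
Put the downloaded MobileNet-SSD folder in caffe/examples/.
Enter MobileNet-SSD; it contains the following files:
- images: where the test images are stored
- template: the shared network templates (train/test/deploy.prototxt), which gen.py modifies and generates from. Because the number of labels differs between datasets, the first and last few layers of the network differ slightly, so when training on our own dataset later we generate these files with gen_model.sh.
- voc: three network files generated for the VOC dataset plus one training solver (hyperparameter) file
- demo.py: the detection script (images are read from the images folder). It handles single images only; video is just a loop over frames.
- deploy.prototxt: the inference network definition used by demo.py (similar to template/MobileNetSSD_deploy_template.prototxt)
- gen.py: script that generates the shared templates (not used here)
- gen_model.sh: script that generates a custom network, producing files like those in template (needed when training on your own dataset)
- merge_bn.py: script that merges the BN layers to produce the final caffemodel (MobileNet's BatchNorm and Scale layers have to be folded into the convolutions to get the deploy caffemodel)
- mobilenet_iter_73000.caffemodel: pretrained model
- solver_test.prototxt: solver hyperparameters for testing
- solver_train.prototxt: solver hyperparameters for training
- test.sh: test script
- train.sh: training script
- train.prototxt: training network definition, similar to the train template in template
- train_voc.sh: training script for the solver and network files in the voc folder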
2. Test the demo
Download a trained model and test it.
The deploy model is at: https://drive.google.com/file/d/0B3gersZ2cHIxRm5PMWRoTkdHdHc/view (a tool for reaching sites outside mainland China is needed).
Put the downloaded MobileNetSSD_deploy.caffemodel in the MobileNet-SSD folder.
The deploy.prototxt shipped in the repository does not appear to match this caffemodel, so also download MobileNetSSD_deploy.prototxt (download links: Google Drive | Baidu Cloud).
Then open demo.py and edit the following lines:
import numpy as np
import sys,os
import cv2
caffe_root = '/home/zhai/experiment/caffe-ssd-master/caffe/' # change to your own caffe path
sys.path.insert(0, caffe_root + 'python')
import caffe
net_file= 'MobileNetSSD_deploy.prototxt' # use the file just downloaded
caffe_model='MobileNetSSD_deploy.caffemodel'
test_dir = "images"
Then run demo.py:
cd caffe_ssd/caffe/examples/MobileNet-SSD
python demo.py
An error occurred:
F0802 23:13:03.078469 1525 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type,python: DepthwiseConvolution (known types: AbsVal, Accuracy, AnnotatedData, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData)
*** Check failure stack trace: ***
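Aborted (core dumped)
Problem description
Cause:
Python layer support was not enabled in Makefile.config.
Fix:
In the caffe folder, open Makefile.config and find
# WITH_PYTHON_LAYER := 1
Uncomment it, then from the caffe directory run in order:
make clean
make all
make pycaffe
Problem solved!
Run python demo.py again.
Result display
Detailed notes on the parameter and network files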
① solver_train.prototxt (solver_test.prototxt is similar)
train_net: "example/MobileNetSSD_train.prototxt" # training network, generated by gen_model.sh
test_net: "example/MobileNetSSD_test.prototxt" # test network, generated by gen_model.sh
test_iter: 673 # number of test images / batch size
test_interval: 10000
base_lr: 0.0005 # base learning rate
display: 10 # print progress every 10 iterations
max_iter: 120000 # total number of iterations
lr_policy: "multistep" # learning-rate decay policy
gamma: 0.5
weight_decay: 0.00005
snapshot: 1000 # every 1000 iterations, save the current caffemodel and solver state to the snapshot folder
snapshot_prefix: "snapshot/mobilenet"
solver_mode: GPU # train on the GPU
debug_info: false
snapshot_after_train: true # save a final snapshot when training finishes
test_initialization: false
average_loss: 10
stepvalue: 20000 # goes with the multistep policy: the learning rate drops at this iteration
stepvalue: 40000 # and drops again at this iteration
iter_size: 1
type: "RMSProp" # optimizer
eval_type: "detection" # evaluate as object detection
ap_version: "11point"
② MobileNetSSD_train_template.prototxt, the network definition file (test and deploy are similar)
An excerpt is annotated below; the rest follows the same pattern.
name: "MobileNet-SSD"
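# input layer of the training network
layer {
name: "data"
type: "AnnotatedData" # input data type
top: "data"
top: "label"
include {
phase: TRAIN # used in the training phase only
}
# data preprocessing
transform_param {
# scale 0.007843 (= 1/127.5) and mean 127.5 normalize the image; this is critical (porting, display and other later steps must use the same values)
scale: 0.007843
mirror: true
mean_value: 127.5
mean_value: 127.5
mean_value: 127.5
# resize the image to 300*300 (this directly affects speed and accuracy: a lower resolution is generally faster but less accurate)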
resize_param {
prob: 1.0
resize_mode: WARP
height: 300
width: 300
interp_mode: LINEAR
interp_mode: AREA
interp_mode: NEAREST
interp_mode: CUBIC
interp_mode: LANCZOS4
}
emit_constraint {
emit_type: CENTER
}
distort_param {
brightness_prob: 0.5
brightness_delta: 32.0
contrast_prob: 0.5
contrast_lower: 0.5
contrast_upper: 1.5
hue_prob: 0.5
hue_delta: 18.0
saturation_prob: 0.5
saturation_lower: 0.5
saturation_upper: 1.5
random_order_prob: 0.0
}
expand_param {
prob: 0.5
max_expand_ratio: 4.0
}
}
# input data source, in LMDB format
data_param {
source: "trainval_lmdb/"
batch_size: 24
backend: LMDB
}
annotated_data_param {
batch_sampler {
max_sample: 1
max_trials: 1
}
batch_sampler {
sampler {
min_scale: 0.3
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.1
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.3
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.3
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.3
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.5
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.3
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.9
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.3
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
max_jaccard_overlap: 1.0
}
max_sample: 1
max_trials: 50
}
label_map_file: "labelmap.prototxt"
}
}
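## the first layer of the MobileNet network proper starts here
layer {
name: "conv0"
type: "Convolution" # convolution layer
bottom: "data"
top: "conv0"
param {
lr_mult: 0.1 # learning-rate multiplier for this layer's weights
decay_mult: 0.1
}
convolution_param {
num_output: 32 # number of filters
bias_term: false
pad: 1 # zero padding added around the input
kernel_size: 3 # filter size
stride: 2 # stride
weight_filler {
type: "msra" # weight initialization method
}
}
}
# BN layer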
layer {
name: "conv0/bn"
type: "BatchNorm"
bottom: "conv0"
top: "conv0"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
}
## Scale layer
layer {
name: "conv0/scale"
type: "Scale"
bottom: "conv0"
top: "conv0"
param {
lr_mult: 0.1
decay_mult: 0.0
}
param {
lr_mult: 0.2
decay_mult: 0.0
}
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
# activation layer; a ReLU usually follows a convolution layer
layer {
name: "conv0/relu"
type: "ReLU"
bottom: "conv0"
top: "conv0"
}
③ train.sh / test.sh
train.sh:
#!/bin/sh
# Check that the network definition exists; point this at the network file generated by gen_model.sh for your dataset.
if ! test -f example/MobileNetSSD_train.prototxt ;then
    echo "error: example/MobileNetSSD_train.prototxt does not exist."
    echo "please use the gen_model.sh to generate your own model."
    exit 1
fi
mkdir -p snapshot
# -solver is the training hyperparameter file and -weights the pretrained model; change either as needed.
../../build/tools/caffe train -solver="solver_train.prototxt" \
-weights="mobilenet_iter_73000.caffemodel" \
-gpu 0
test.sh:
#!/bin/sh
#latest=snapshot/mobilenet_iter_73000.caffemodel
# latest is the most recently modified caffemodel in snapshot/ (the folder where training saves models)
latest=$(ls -t snapshot/*.caffemodel | head -n 1)
if test -z $latest; then
    exit 1
fi
# --weights can also be set directly to the caffemodel you want to test
../../build/tools/caffe train -solver="solver_test.prototxt" \
--weights=$latest \
-gpu 0
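test.sh picks the snapshot by modification time. If files in snapshot/ have been copied or touched, the newest file is not necessarily the one with the most training, so a possible alternative is to pick by iteration number. The sketch below is my own helper, not part of the repository, and it assumes the snapshot names follow the pattern produced by snapshot_prefix, such as mobilenet_iter_1000.caffemodel.

```python
import re

# Matches names like "mobilenet_iter_12000.caffemodel" (assumed naming pattern).
SNAPSHOT_RE = re.compile(r"_iter_(\d+)\.caffemodel$")


def latest_snapshot(filenames):
    """Return the caffemodel with the highest iteration number, or None if there is none."""
    best_name, best_iter = None, -1
    for name in filenames:
        match = SNAPSHOT_RE.search(name)
        if match and int(match.group(1)) > best_iter:
            best_name, best_iter = name, int(match.group(1))
    return best_name


if __name__ == "__main__":
    names = [
        "mobilenet_iter_9000.caffemodel",
        "mobilenet_iter_12000.caffemodel",
        "mobilenet_iter_12000.solverstate",
    ]
    print(latest_snapshot(names))
```

In practice the list would come from os.listdir("snapshot"). Comparing the numbers as integers avoids the trap that "9000" sorts after "12000" as a string.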
④ demo.py: this file will later need to be adapted to your own needs (for example, to work on video)
A rough walk-through of the original file:
## imports
import numpy as np
import sys,os
import cv2
## change this to your caffe root directory
caffe_root = '/home/che/caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe
## network file, model file and test image folder: change as needed
net_file= 'MobileNetSSD_deploy.prototxt'
caffe_model='MobileNetSSD_deploy.caffemodel'
test_dir = "images"
## check that the model and network files exist
if not os.path.exists(caffe_model):
    print(caffe_model + " does not exist")
    exit()
if not os.path.exists(net_file):
    print(net_file + " does not exist")
    exit()
## build the network
net = caffe.Net(net_file,caffe_model,caffe.TEST)
## class names
CLASSES = ('background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
## image preprocessing (normalization): the 300 resize, the subtracted 127.5 and the 0.007843 scale all match the network file above
def preprocess(src):
    img = cv2.resize(src, (300,300))
    img = img - 127.5
    img = img * 0.007843
    return img
## unpack the network output
def postprocess(img, out):
    h = img.shape[0]
    w = img.shape[1]
    box = out['detection_out'][0,0,:,3:7] * np.array([w, h, w, h])
    cls = out['detection_out'][0,0,:,1]
    conf = out['detection_out'][0,0,:,2]
    return (box.astype(np.int32), conf, cls)
## main function: object detection
def detect(imgfile):
    origimg = cv2.imread(imgfile)
    img = preprocess(origimg)
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))
    net.blobs['data'].data[...] = img
    out = net.forward() ## forward pass
    box, conf, cls = postprocess(origimg, out) ## box: bounding boxes, conf: confidences, cls: class ids
    ## draw each detection to produce the displayed result
    for i in range(len(box)):
        p1 = (box[i][0], box[i][1])
        p2 = (box[i][2], box[i][3])
        cv2.rectangle(origimg, p1, p2, (0,255,0)) ## draw the box
        p3 = (max(p1[0], 15), max(p1[1], 15))
        title = "%s:%.2f" % (CLASSES[int(cls[i])], conf[i])
        cv2.putText(origimg, title, p3, cv2.FONT_ITALIC, 0.6, (0, 255, 0), 1) ## draw the label
    cv2.imshow("SSD", origimg)
    k = cv2.waitKey(0) & 0xff
    # Exit if ESC pressed
    if k == 27 :
        return False
    return True
for f in os.listdir(test_dir):
    if detect(test_dir + "/" + f) == False:
        break
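The preprocessing and postprocessing steps of demo.py can be tried without Caffe or OpenCV. The sketch below is a simplified pure-Python version under two assumptions of mine: each row of detection_out has the layout [image_id, label, confidence, xmin, ymin, xmax, ymax] with normalized coordinates (which is what the slicing in postprocess implies), and low-confidence rows are dropped with a threshold, which the original demo.py does not do.

```python
def normalize_pixel(value, mean=127.5, scale=0.007843):
    """Map a pixel value in [0, 255] to roughly [-1, 1], as preprocess() does."""
    return (value - mean) * scale


def decode_detections(rows, width, height, conf_threshold=0.5):
    """Convert normalized detection rows to (label, confidence, pixel box) tuples.

    rows: iterable of [image_id, label, confidence, xmin, ymin, xmax, ymax].
    conf_threshold is an addition for illustration, not part of demo.py.
    """
    results = []
    for _, label, conf, xmin, ymin, xmax, ymax in rows:
        if conf < conf_threshold:
            continue
        box = (int(xmin * width), int(ymin * height),
               int(xmax * width), int(ymax * height))
        results.append((int(label), float(conf), box))
    return results


if __name__ == "__main__":
    fake_rows = [
        [0, 15, 0.9, 0.25, 0.5, 0.75, 1.0],  # confident "person" (class 15 in CLASSES)
        [0, 7, 0.3, 0.0, 0.0, 1.0, 1.0],     # weak "car" detection, filtered out
    ]
    print(decode_detections(fake_rows, width=200, height=100))
    print(normalize_pixel(0), normalize_pixel(255))
```

When adapting demo.py to video, the same two functions apply per frame; only the source of the image changes.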
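3. Train your own MobileNet-SSD model on your own dataset
First create a MyDataSet folder in caffe/data and put the whole dataset there, which keeps everything managed in one place.
My NWPU VHR-10 dataset was already prepared for a project, so I am not publishing it here; this section only walks through the procedure.
Build the dataset
For the details, see the blog post "Building your own VOC-format image dataset for object detection".
At this point MyDataSet should contain the following two folders (the extra labels output is not needed for object detection, so it was not copied over):
- Annotations: the xml files produced by the annotation tool
- JPEGImages: the original images
Generate the index txt files
The code below creates the ImageSets folder with a Main subfolder. ImageSets/Main holds four txt files: test.txt, train.txt, trainval.txt and val.txt. They are the indexes (the image names as relative paths) of the test set, the training set, the training plus validation set, and the validation set.
Create CreateImageSets.py with the following code; the comments explain it briefly.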
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os
import random

trainval_percent = 0.9  # fraction of all samples that go to trainval; adjust as needed
train_percent = 0.8  # fraction of trainval that goes to train; adjust as needed
#xmlfilepath = f"/Users/Administrator/Desktop/ship_detection_online/Annotations_new" # use your own path
#txtsavepath = f"/Users/Administrator/Desktop/ship_detection_online/ImageSets/Main"
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'  # this folder must already exist
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)
ftrainval = open(txtsavepath + '/trainval.txt', 'w')
ftest = open(txtsavepath + '/test.txt', 'w')
ftrain = open(txtsavepath + '/train.txt', 'w')
fval = open(txtsavepath + '/val.txt', 'w')
for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the ".xml" extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
print('Well finished')
Run python CreateImageSets.py to generate the four txt files.
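To see what the two percentages mean before touching any files, the same split logic can be written as a pure function. This is a sketch of my own (the function name and the seed parameter are not in the original script); the seed only makes the split reproducible.

```python
import random


def split_names(names, trainval_percent=0.9, train_percent=0.8, seed=None):
    """Split names into trainval/test, then trainval into train/val.

    Mirrors CreateImageSets.py: train_percent is a fraction of trainval,
    not of the whole dataset.
    """
    rng = random.Random(seed)
    indices = list(range(len(names)))
    n_trainval = int(len(names) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    trainval = rng.sample(indices, n_trainval)
    train = set(rng.sample(trainval, n_train))
    trainval = set(trainval)
    return {
        "trainval": [names[i] for i in indices if i in trainval],
        "train": [names[i] for i in indices if i in train],
        "val": [names[i] for i in indices if i in trainval and i not in train],
        "test": [names[i] for i in indices if i not in trainval],
    }


if __name__ == "__main__":
    demo = ["img_%03d" % i for i in range(100)]
    parts = split_names(demo, seed=0)
    for key in ("trainval", "train", "val", "test"):
        print(key, len(parts[key]))
```

Note that val and train are carved out of trainval, so train, val and test together cover every image exactly once.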
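Generate the LMDB files (Caffe's input format)
First copy the following files from caffe/data/VOC0712/ into data/MyDataSet:
cd caffe/data
cp VOC0712/create_list.sh MyDataSet/
cp VOC0712/create_data.sh MyDataSet/
cp VOC0712/labelmap_voc.prototxt MyDataSet/
MyDataSet now contains Annotations, JPEGImages, ImageSets and these three copied files.
Edit the three copied files. Changes to create_list.sh: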
#!/bin/bash
root_dir=$HOME/caffe/data ## change to your own path
sub_dir=ImageSets/Main
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
for dataset in trainval test
do
  dst_file=$bash_dir/$dataset.txt
  if [ -f $dst_file ]
  then
    rm -f $dst_file
  fi
  for name in MyDataSet ## change to your dataset name
  do
    # if [[ $dataset == "test" && $name == "VOC2012" ]]
    # then
    #   continue
    # fi
    echo "Create list for $name $dataset..."
    ..............(no changes needed here; omitted)
done
Changes to create_data.sh:
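cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$cur_dir/../..
cd $root_dir
redo=1
data_root_dir="$HOME/caffe/data" ## change to your own path
dataset_name="MyDataSet" ## change to your dataset name
mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"
...............(no changes needed here; omitted)
done
If the training images are not in .jpeg or .jpg format, the image extension that appears in these two scripts also has to be changed; there are only a few places to edit.
labelmap_voc.prototxt has to be edited to match your own labels, for example: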
item {
name: "none_of_the_above"
label: 0
display_name: "background"
}
item {
name: "airplane"
label: 1
display_name: "airplane"
}
item {
name: "ship"
label: 2
display_name: "ship"
}
item {
name: "storagetank"
label: 3
display_name: "storagetank"
}
item {
name: "baseballdiamond"
label: 4
display_name: "baseballdiamond"
}
item {
name: "tenniscourt"
label: 5
display_name: "tenniscourt"
}
item {
name: "basketballcourt"
label: 6
display_name: "basketballcourt"
}
item {
name: "groundtrackfield"
label: 7
display_name: "groundtrackfield"
}
item {
name: "habor"
label: 8
display_name: "habor"
}
item {
name: "bridge"
label: 9
display_name: "bridge"
}
item {
name: "vehicle"
label: 10
display_name: "vehicle"
}
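Writing the labelmap by hand is error-prone when there are many classes, so it can also be generated. The helper below is a sketch of my own, not a script from Caffe or MobileNet-SSD; it reproduces the layout shown above, with label 0 reserved for the background entry.

```python
def make_labelmap(class_names):
    """Return labelmap prototxt text: background as label 0, then one item per class."""
    items = [("none_of_the_above", 0, "background")]
    items += [(name, index, name) for index, name in enumerate(class_names, 1)]
    blocks = []
    for name, label, display_name in items:
        blocks.append(
            'item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}'
            % (name, label, display_name)
        )
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    classes = ["airplane", "ship", "storagetank"]
    print(make_labelmap(classes))
    # The class count passed to gen_model.sh later is len(classes) + 1 (background).
    print(len(classes) + 1)
```

The class names must match the names used in the xml annotations exactly, including any unusual spellings.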
Run the following in order (it is best to delete the ## comments before running):
cd /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet
sh create_list.sh
sh create_data.sh
Running the commands above fails:
zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ sh ./data/MyDataSet/create_list.sh
./data/MyDataSet/create_list.sh: 5: ./data/MyDataSet/create_list.sh: Bad substitution
Create list for MyDataSet trainval...
./data/MyDataSet/create_list.sh: 39: [: trainval: unexpected operator
./data/MyDataSet/create_list.sh: 45: [: trainval: unexpected operator
Create list for MyDataSet test...
./data/MyDataSet/create_list.sh: 39: [: test: unexpected operator
./data/MyDataSet/create_list.sh: 45: [: test: unexpected operator
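For this data conversion step, run the script directly (sudo ./data/VOC0712/create_list.sh) or through bash (sudo bash ./data/VOC0712/create_list.sh),
not through sh (sudo sh ./data/VOC0712/create_list.sh), otherwise the errors above are reported. On Ubuntu, sh is dash, which does not support bash-only syntax such as ${BASH_SOURCE[0]} or == inside [ ].
Reference: https://blog.csdn.net/u011489887/article/details/91354461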
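Both error messages above come from bash-only syntax being run by a plain POSIX shell. As a rough illustration, the sketch below scans a script for the constructs involved in those errors. It is my own heuristic with only three patterns, not a real shell linter, and it ignores everything after a # on a line.

```python
import re

# A few bash-only constructs (an illustrative, incomplete list).
BASHISMS = [
    (re.compile(r"\$\{\w+\[[^\]]*\]\}"), "array indexing such as ${BASH_SOURCE[0]}"),
    (re.compile(r"\[\["), "[[ ... ]] conditional"),
    (re.compile(r"\[ [^\]]*=="), "== inside [ ... ]"),
]


def find_bashisms(script_text):
    """Return (line_number, description) pairs for lines that need bash."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), 1):
        code = line.split("#", 1)[0]  # crude comment stripping
        for pattern, description in BASHISMS:
            if pattern.search(code):
                hits.append((number, description))
    return hits


if __name__ == "__main__":
    sample = (
        'bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"\n'
        'if [ $dataset == "test" ]\n'
        "then\n"
        "  echo ok\n"
        "fi\n"
    )
    for number, description in find_bashisms(sample):
        print(number, description)
```

Any hit means the script should be started with bash (or executed directly so that its #!/bin/bash line is honored) rather than with sh.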
cd /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet
sudo bash create_list.sh
sudo bash create_data.sh
Then:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
A problem appeared:
zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ sudo ./data/MyDataSet/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 11, in <module>
import numpy as np
ImportError: No module named 'numpy'
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 11, in <module>
import numpy as np
ImportError: No module named 'numpy'
The traceback shows that numpy is the problem, so install or upgrade numpy:
pip install -U numpy
Next, a new problem:
zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/./libharfbuzz.so.0: undefined symbol: FT_Done_MM_Var
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/./libharfbuzz.so.0: undefined symbol: FT_Done_MM_Var
For some reason the Anaconda3 libraries are being loaded here. Anaconda keeps its own copies of the libraries Python needs in its lib directory, and their versions differ from the system ones. Replace the Anaconda copies with the system ones; see: https://zhuanlan.zhihu.com/p/163639019
cp /usr/lib/x86_64-linux-gnu/libharfbuzz.so /home/zhai/anaconda3/lib/libharfbuzz.so
In other words, overwrite /home/zhai/anaconda3/lib/libharfbuzz.so with the system library /usr/lib/x86_64-linux-gnu/libharfbuzz.so.
Next, a new problem, this time with libfontconfig.so.1.
So, the second replacement:
cp /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 /home/zhai/anaconda3/lib/libfontconfig.so.1
A new problem:
zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/libpangoft2-1.0.so.0: undefined symbol: hb_font_set_variations
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: /home/zhai/anaconda3/bin/../lib/libpangoft2-1.0.so.0: undefined symbol: hb_font_set_variations
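Third replacement: first run locate libpangoft2-1.0.so.0 to find where the library is,
then replace the Anaconda3 copy with the system one.
The earlier approach did not work this time, so I used another method; see: https://blog.csdn.net/qq_45569859/article/details/103341971
$ cd /home/zhai/anaconda3/lib/
$ rm libpangoft2-1.0.so.0
$ cp /usr/lib/x86_64-linux-gnu/libpangoft2-1.0.so.0 libpangoft2-1.0.so.0
Then go back and run ./data/VOC0712/create_data.sh again: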
zhai@zhai-Lenovo-Legion-Y7000:~/experiment/caffe-ssd-master/caffe$ ./data/VOC0712/create_data.sh
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: dynamic module does not define module export function (PyInit__caffe)
Traceback (most recent call last):
File "/home/zhai/experiment/caffe-ssd-master/caffe/data/VOC0712/../../scripts/create_annoset.py", line 12, in <module>
from caffe.proto import caffe_pb2
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver
File "/home/zhai/experiment/caffe-ssd-master/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: dynamic module does not define module export function (PyInit__caffe)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
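All the problems from the dashed line above down to here probably share one cause: once Anaconda is installed and allowed to modify the environment variables, the default python and pip become Anaconda's (which python points into anaconda3), while Caffe here was built against the system Python. The last error fits this: PyInit__caffe is the entry point Python 3 looks for, and a _caffe module built for Python 2 does not define it.
To switch the default python back to the system one:
1. sudo gedit ~/.bashrc
2. Add export PATH="/usr/bin/:$PATH"
3. source ~/.bashrc
After this, which python reports the system interpreter.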
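A quick way to confirm which interpreter is actually running, and whether it can import the modules the Caffe scripts need, is a small check like the one below. It is a sketch of my own; the module list is an assumption based on the tracebacks above, and the caffe import only succeeds if caffe/python is on PYTHONPATH or sys.path.

```python
import sys


def can_import(module_name):
    """Return True if the module imports cleanly in this interpreter."""
    try:
        __import__(module_name)
        return True
    except Exception:  # ImportError and loader errors such as undefined symbols
        return False


def describe_interpreter(modules=("numpy", "caffe")):
    """Collect the interpreter path, its version and the import status of each module."""
    info = {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
    }
    for name in modules:
        info[name] = can_import(name)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print(key, value)
```

If the executable path still points into anaconda3 after editing ~/.bashrc, the PATH change has not taken effect in the current shell.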
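Now, from the caffe root directory, run ./data/MyDataSet/create_list.sh (note: without sudo),
then run ./data/MyDataSet/create_data.sh.
This generates the trainval and test LMDB folders.
Both folders contain LMDB files.
In addition, examples now contains a MyDataSet directory at the same level as MobileNet-SSD.
It holds symbolic links to the LMDB folders, which are used for training later.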
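Training with MobileNet-SSD
VOC has 21 classes (including background) while this dataset has only 11 (including background), so the training, test and deploy network files must be regenerated. This is what gen_model.sh is for: it takes the templates in the template folder and, using the parameter we give it, generates the networks needed for training. Usage (CLASSNUM = number of labels + background = 11):
./gen_model.sh CLASSNUM
① First create your own labelmap.prototxt in the MobileNet-SSD folder (same content as the labelmap_voc.prototxt above).
② Generate the train/test/deploy network files for your number of labels:
./gen_model.sh 11 # 11 = number of labels + background
A small problem when running the command above:
bash: ./gen_model.sh: Permission denied
Run it through bash instead:
bash ./gen_model.sh 11
This creates the example folder; the three prototxt files in it are the actual network definitions generated from the templates. As set up by the author, the deploy file already has the BN layers merged, so it has to be paired with a merged caffemodel later.
③ Create symbolic links to the dataset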
ln -s PATH_TO_YOUR_TRAIN_LMDB trainval_lmdb
ln -s PATH_TO_YOUR_TEST_LMDB test_lmdb
With my paths, the two commands to run inside MobileNet-SSD are:
ln -s /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/lmdb/MyDataSet_trainval_lmdb trainval_lmdb
ln -s /home/zhai/experiment/caffe-ssd-master/caffe/data/MyDataSet/lmdb/MyDataSet_test_lmdb test_lmdb
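Two symbolic links now appear in MobileNet-SSD.
Alternatively, copy the two symbolic links from the MyDataSet folder generated under examples into MobileNet-SSD and rename them trainval_lmdb and test_lmdb.
④ Adjust the hyperparameters, specify the pretrained model, and start training
Adjust the hyperparameters in solver_test.prototxt and solver_train.prototxt to your needs (beginners can leave them alone).
Here test_iter = number of test images / batch size. The initial learning rate should not be too high, or the pretrained weights are damaged badly. The optimizer is RMSProp, which may help convergence; do not switch to SGD, again to protect the weights.
Then set the pretrained model (the -weights argument in train.sh).
Start training:
In /home/zhai/experiment/caffe-ssd-master/caffe/examples/MobileNet-SSD, edit and run train.sh; parameters can be tuned along the way. After training finishes, run test.sh to measure the network's accuracy.
sh train.sh
⑤ The following error may then appear: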
F0805 00:28:38.227371 27458 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Cause: not enough GPU memory.
Fix: reduce batch_size in MobileNetSSD_train.prototxt under caffe/examples/MobileNet-SSD/example. My machine is weak, so I could only use 8.
Run the script again:
sh train.sh
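One option I did not try in this post: the solver's iter_size setting (1 in solver_train.prototxt above) makes Caffe accumulate gradients over several forward/backward passes before each update, so a small batch_size can be combined with a larger iter_size to approximate the original batch of 24. The helper below is only the arithmetic, written by me for illustration; whether this helps accuracy on a given dataset is untested here.

```python
def iter_size_for(target_batch, gpu_batch):
    """Smallest iter_size such that gpu_batch * iter_size >= target_batch."""
    if gpu_batch <= 0:
        raise ValueError("gpu_batch must be positive")
    return -(-target_batch // gpu_batch)  # ceiling division on integers


if __name__ == "__main__":
    # Template batch size 24, reduced to 8 because of GPU memory.
    print(iter_size_for(24, 8))
```

BatchNorm statistics are still computed per pass on the small batch, so this is not fully equivalent to training with the larger batch_size.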
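Parameters can also be tuned during training. The loss decreases as the iterations increase; after tens of thousands of iterations it hovers around 1.0.
Success.
References: https://blog.csdn.net/qq_30011277/article/details/87557742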
https://blog.csdn.net/xiao__run/article/details/80643346
https://blog.csdn.net/c20081052/article/details/81747719
https://blog.csdn.net/qq_33431368/article/details/84977194
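Merging into the final model, and how to test
After training starts, a snapshot folder appears in the directory.
A caffemodel file and a solver state file are saved every 1000 iterations; this interval is set in the solver prototxt (snapshot: 1000).
① Merge into the final caffemodel
MobileNet contains BN and Scale layers, so one more step is needed to produce the deploy model. To make inference faster, the author folds the BN layers into the convolution layers, which saves the BN computation and may give a small speedup at detection time. Open merge_bn.py and change the paths in it.
The content of merge_bn.py is as follows.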
import os
import sys
import argparse
import logging
import numpy as np
try:
    caffe_root = '/home/zhai/experiment/caffe-ssd-master/caffe/'  ## change to your own path; keep the trailing slash
    sys.path.insert(0, caffe_root + 'python')
    import caffe
except ImportError:
    logging.fatal("Cannot find caffe!")
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, required=True, help='.prototxt file for inference')
    parser.add_argument('--weights', type=str, required=True, help='.caffemodel file for inference')
    return parser

bn_maps = {}

def find_top_after_bn(layers, name, top):
    # Record the BatchNorm and Scale layers that follow the layer called name.
    bn_maps[name] = {}
    for l in layers:
        if len(l.bottom) == 0:
            continue
        if l.bottom[0] == top and l.type == "BatchNorm":
            bn_maps[name]["bn"] = l.name
            top = l.top[0]
        if l.bottom[0] == top and l.type == "Scale":
            bn_maps[name]["scale"] = l.name
            top = l.top[0]
    return top

def pre_process(expected_proto, new_proto):
    # Write a copy of the network without BatchNorm/Scale layers.
    net_specs = caffe_pb2.NetParameter()
    net_specs2 = caffe_pb2.NetParameter()
    with open(expected_proto, "r") as fp:
        text_format.Merge(str(fp.read()), net_specs)
    net_specs2.MergeFrom(net_specs)
    layers = net_specs.layer
    num_layers = len(layers)
    for i in range(num_layers - 1, -1, -1):
        del net_specs2.layer[i]
    for idx in range(num_layers):
        l = layers[idx]
        if l.type == "BatchNorm" or l.type == "Scale":
            continue
        elif l.type == "Convolution" or l.type == "Deconvolution":
            top = find_top_after_bn(layers, l.name, l.top[0])
            bn_maps[l.name]["type"] = l.type
            layer = net_specs2.layer.add()
            layer.MergeFrom(l)
            layer.top[0] = top
            layer.convolution_param.bias_term = True
        else:
            layer = net_specs2.layer.add()
            layer.MergeFrom(l)
    with open(new_proto, "w") as fp:
        fp.write("{}".format(net_specs2))

def load_weights(net, nobn):
    # Copy weights into the BN-free network, folding BN/Scale into the convolutions.
    if sys.version_info > (3,0):
        listKeys = nobn.params.keys()
    else:
        listKeys = nobn.params.iterkeys()
    for key in listKeys:
        if type(nobn.params[key]) is caffe._caffe.BlobVec:
            conv = net.params[key]
            if key not in bn_maps or "bn" not in bn_maps[key]:
                for i, w in enumerate(conv):
                    nobn.params[key][i].data[...] = w.data
            else:
                print(key)
                bn = net.params[bn_maps[key]["bn"]]
                scale = net.params[bn_maps[key]["scale"]]
                wt = conv[0].data
                channels = 0
                if bn_maps[key]["type"] == "Convolution":
                    channels = wt.shape[0]
                elif bn_maps[key]["type"] == "Deconvolution":
                    channels = wt.shape[1]
                else:
                    print("error type " + bn_maps[key]["type"])
                    exit(-1)
                bias = np.zeros(channels)
                if len(conv) > 1:
                    bias = conv[1].data
                mean = bn[0].data
                var = bn[1].data
                scalef = bn[2].data
                scales = scale[0].data
                shift = scale[1].data
                if scalef != 0:
                    scalef = 1. / scalef
                mean = mean * scalef
                var = var * scalef
                rstd = 1. / np.sqrt(var + 1e-5)
                if bn_maps[key]["type"] == "Convolution":
                    rstd1 = rstd.reshape((channels,1,1,1))
                    scales1 = scales.reshape((channels,1,1,1))
                    wt = wt * rstd1 * scales1
                else:
                    rstd1 = rstd.reshape((1, channels,1,1))
                    scales1 = scales.reshape((1, channels,1,1))
                    wt = wt * rstd1 * scales1
                bias = (bias - mean) * rstd * scales + shift
                nobn.params[key][0].data[...] = wt
                nobn.params[key][1].data[...] = bias

if __name__ == '__main__':
    parser1 = make_parser()
    args = parser1.parse_args()
    pre_process(args.model, "no_bn.prototxt")
    net = caffe.Net(args.model, args.weights, caffe.TEST)
    net2 = caffe.Net("no_bn.prototxt", caffe.TEST)
    load_weights(net, net2)
    net2.save("no_bn.caffemodel")
Then run merge_bn.py:
## Here the model from iteration 120000 is used for the BN merge to obtain the final model.
python merge_bn.py --model ./example/MobileNetSSD_deploy.prototxt --weights ./snapshot/mobilenet_iter_120000.caffemodel
MobileNet-SSD now contains two new files, no_bn.prototxt and no_bn.caffemodel: the network definition and the weights we wanted.
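The arithmetic that load_weights applies can be checked on a single output channel without Caffe. The sketch below is a simplified pure-Python version of my own: it treats one output channel of a convolution as a dot product, and it assumes the BN mean and variance have already been divided by Caffe's stored scale factor (the scalef step in merge_bn.py).

```python
import math


def fold_bn_channel(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm (mean, var) and Scale (gamma, beta) into one channel's weights and bias."""
    rstd = 1.0 / math.sqrt(var + eps)
    folded_weights = [w * rstd * gamma for w in weights]
    folded_bias = (bias - mean) * rstd * gamma + beta
    return folded_weights, folded_bias


def conv_only(x, weights, bias):
    """One output value of a convolution, written as a dot product."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def conv_bn_scale(x, weights, bias, mean, var, gamma, beta, eps=1e-5):
    """The same convolution followed by BatchNorm and Scale."""
    y = conv_only(x, weights, bias)
    return (y - mean) / math.sqrt(var + eps) * gamma + beta


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    w, b = [0.3, 0.1, -0.7], 0.0
    mean, var, gamma, beta = 0.2, 1.5, 0.9, -0.1
    reference = conv_bn_scale(x, w, b, mean, var, gamma, beta)
    folded_w, folded_b = fold_bn_channel(w, b, mean, var, gamma, beta)
    print(reference, conv_only(x, folded_w, folded_b))
```

The two printed values agree up to floating-point rounding, which is why the BN and Scale layers can be dropped from the deploy network once their parameters are folded in.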
2. Test the training result on a caffemodel
Create a testimages folder under MobileNet-SSD/ and put the test images in it.
Change the paths and file names in demo.py, then run it.
I first copied demo.py to My_demo.py and then changed the paths in it.
Run python My_demo.py
Result display
3. Measure accuracy
test.sh can also be used to measure the overall detection accuracy (mAP).
The paths in MobileNet-SSD/solver_test.prototxt may need to be changed first.
For example, solver_train.prototxt points to example/... by default while solver_test.prototxt does not, so change it by hand to:
train_net: "example/MobileNetSSD_train.prototxt"
test_net: "example/MobileNetSSD_test.prototxt"
In /home/zhai/experiment/caffe-ssd-master/caffe/examples/MobileNet-SSD, run:
sh test.sh
Results