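Based on the SSD algorithm proposed by Liu W., this article covers two things about its implementation:
1) installing and building the code
2) training an SSD model on your own data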
Building and installing SSD
SSD is implemented on top of Caffe, so the process is much the same as for faster-rcnn:
1) Clone the project
git clone https://github.com/weiliu89/caffe.git $HOME/caffe-ssd
cd $HOME/caffe-ssd
git checkout ssd
2) Edit Makefile.config
cp Makefile.config.example Makefile.config
1) Uncomment the following lines
USE_CUDNN := 1
# atlas is the default; set BLAS := open to use OpenBLAS instead
BLAS := atlas
WITH_PYTHON_LAYER := 1
2) Add the hdf5 include and library paths
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
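If you rebuild on several machines, the two Makefile.config edits above can be scripted. The sketch below is a hypothetical helper (patch_makefile_config is not part of Caffe); it assumes the options appear as "# USE_CUDNN := 1" and "# WITH_PYTHON_LAYER := 1" and that INCLUDE_DIRS and LIBRARY_DIRS are single "VAR := ..." lines, as in Makefile.config.example. It leaves the BLAS line alone.

```python
import re

# Paths from the "add the hdf5 paths" step above (Ubuntu x86_64 layout).
HDF5_INCLUDE = "/usr/include/hdf5/serial"
HDF5_LIB_DIRS = "/usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial"
# Options that are commented out in Makefile.config.example and must be enabled.
UNCOMMENT = re.compile(r"#\s*((?:USE_CUDNN|WITH_PYTHON_LAYER)\s*:=\s*1)\s*$")


def patch_makefile_config(text):
    """Return a copy of the Makefile.config text with the edits above applied."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        match = UNCOMMENT.match(stripped)
        if match:
            # "# USE_CUDNN := 1" -> "USE_CUDNN := 1"
            line = match.group(1)
        elif stripped.startswith("INCLUDE_DIRS :=") and HDF5_INCLUDE not in line:
            line = line.rstrip() + " " + HDF5_INCLUDE
        elif stripped.startswith("LIBRARY_DIRS :=") and "hdf5/serial" not in line:
            line = line.rstrip() + " " + HDF5_LIB_DIRS
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read Makefile.config, pass its contents through patch_makefile_config and write the result back. The membership checks make it idempotent, so running it a second time leaves the file unchanged.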
3) Build
make -j8
make py
make test -j8
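4) Test
This article explains the steps in more detail:
a. First download the pretrained model from here (may require a proxy from mainland China)
b. Add the path of the compiled caffe
# in examples/ssd/ssd_pascal_webcam.py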
path_ = os.environ['HOME'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')
c. Plug in a webcam and start detecting
python examples/ssd/ssd_pascal_webcam.py
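Training on your own data
The general approach (see this article) is: first organize the data in VOC format (see the faster-rcnn post on building and adapting the data format), then convert it to lmdb, and finally use the author's script to generate the corresponding train.prototxt, test.prototxt and solver.prototxt.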
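For reference, VOC format means one XML annotation per image, holding the image size and one object element per bounding box. The sketch below writes a minimal annotation with Python 3's xml.etree.ElementTree; make_voc_annotation is a hypothetical helper, and the exact set of fields the SSD conversion tool requires is an assumption, so compare the output with a real VOC2007 annotation before relying on it.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, objects, depth=3):
    """objects: list of (class_name, xmin, ymin, xmax, ymax) in pixel coordinates."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        # the class name must match a name entry in labelmap_voc.prototxt
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    # encoding="unicode" returns a str (Python 3)
    return ET.tostring(root, encoding="unicode")
```

Each result would be saved as Annotations/<id>.xml next to JPEGImages/<id>.jpg, which is the layout the list scripts below rely on.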
1) Generate trainval.lmdb and test.lmdb
a. Create the folder data/my_voc
Copy create_list.sh, create_data.sh and labelmap_voc.prototxt from data/VOC0712/ into it
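b. Modify create_list.sh
This script generates the list of paired jpg and xml paths (Figure 1).
Figure 1. Each jpg is paired one-to-one with an xml file
The modified create_list.sh (mainly stripping out VOC's fiddly directory-layout details):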
#!/bin/bash
root_dir=$HOME/py-faster-rcnn/trainval
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
dst_file=$bash_dir/"my_voc.txt"
if [ -f $dst_file ]
then
rm -f $dst_file
fi
echo "Create list ..."
dataset_file=$root_dir/"ImageList_test.txt"
img_file=$bash_dir/"voc_img.txt"
cp $dataset_file $img_file
sed -i "s/^/JPEGImages\//g" $img_file
sed -i "s/$/.jpg/g" $img_file
label_file=$bash_dir/"voc_label.txt"
cp $dataset_file $label_file
sed -i "s/^/Annotations\//g" $label_file
sed -i "s/$/.xml/g" $label_file
paste -d' ' $img_file $label_file >> $dst_file
rm -f $label_file
rm -f $img_file
# Shuffle trainval file.
rand_file=$dst_file.random
cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
mv $rand_file $dst_file
$bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/"my_voc_name_size.txt"
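Note: run the script once for ImageList_trainval.txt and once for ImageList_test.txt (changing the file name in dataset_file each time), renaming the resulting my_voc.txt accordingly; create_data.sh below expects my_voc_trainval.txt. Keep the my_voc_name_size.txt generated from ImageList_test.txt, because name_size_file in ssd_pascal.py must describe the test images.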
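The same list can be produced in Python, which may be easier to adapt than the sed pipeline. This is a sketch under the same assumptions as the script above (one image id per line in ImageList_*.txt, images under JPEGImages/ and annotations under Annotations/); build_pairs and write_list are hypothetical names, and the name/size file still has to come from build/tools/get_image_size.

```python
import os
import random


def build_pairs(image_ids, seed=None):
    """'000001' -> 'JPEGImages/000001.jpg Annotations/000001.xml', in shuffled order."""
    lines = ["JPEGImages/{0}.jpg Annotations/{0}.xml".format(i) for i in image_ids]
    # plays the same role as the perl shuffle in create_list.sh
    random.Random(seed).shuffle(lines)
    return lines


def write_list(root_dir, image_list, dst_file, seed=None):
    """Read root_dir/image_list (e.g. ImageList_test.txt) and write the pair list to dst_file."""
    with open(os.path.join(root_dir, image_list)) as f:
        ids = [line.strip() for line in f if line.strip()]
    with open(dst_file, "w") as f:
        for pair in build_pairs(ids, seed):
            f.write(pair + "\n")
```

For example, write_list(os.path.expanduser("~/py-faster-rcnn/trainval"), "ImageList_trainval.txt", "data/my_voc/my_voc_trainval.txt") writes the trainval list directly under the name create_data.sh expects, so no renaming is needed.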
c. Modify create_data.sh
This script generates the lmdb:
cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$HOME/caffe-ssd
#cd $root_dir
redo=1
data_root_dir="$HOME/py-faster-rcnn/trainval"
mapfile="$root_dir/data/my_voc/labelmap_voc.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0
extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
extra_cmd="$extra_cmd --redo"
fi
python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/my_voc/my_voc_trainval.txt $data_root_dir/my_voc_trainval_lmdb examples/my_voc
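Notes:
- Add the following to create_annoset.py, before its caffe imports:
path_ = os.environ['HOME'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')
- Run the script twice, once with the input list my_voc_trainval.txt and output my_voc_trainval_lmdb, and once with the test list and a test output directory, to generate the trainval lmdb and the test lmdb respectively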
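Before running create_annoset.py, it can be worth confirming that every path in the list file actually exists under data_root_dir. A minimal sketch; find_missing is a hypothetical helper that takes the lines of a list file such as my_voc_trainval.txt.

```python
import os


def find_missing(data_root_dir, list_lines):
    """Return (line_number, relative_path) for every listed file that does not exist."""
    missing = []
    for line_no, line in enumerate(list_lines, 1):
        # each line looks like "JPEGImages/x.jpg Annotations/x.xml"
        for rel_path in line.split():
            if not os.path.isfile(os.path.join(data_root_dir, rel_path)):
                missing.append((line_no, rel_path))
    return missing
```

Usage: open data/my_voc/my_voc_trainval.txt and pass the file object together with the same data_root_dir as in create_data.sh; an empty result means every image and annotation was found.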
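d. Modify labelmap_voc.prototxt
This file lists the classes of the training samples; add or remove items to match your own data, keeping label 0 as the background: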
item {
name: "none_of_the_above"
label: 0
display_name: "background"
}
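With more than a handful of classes, writing the items by hand is error-prone. The sketch below generates the file contents from a list of class names (make_labelmap is a hypothetical helper). It keeps label 0 as the background and numbers the remaining classes from 1. Each name must match the name text used in the XML annotations, which is my understanding of how the SSD conversion maps objects to labels.

```python
def make_labelmap(class_names):
    """Render labelmap_voc.prototxt text for the given class names."""
    # Label 0 is reserved for the background, as in the item above.
    items = [("none_of_the_above", 0, "background")]
    # Remaining classes are numbered 1..N in the order given.
    items += [(name, label, name) for label, name in enumerate(class_names, 1)]
    blocks = ['item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}' % item
              for item in items]
    return "\n".join(blocks) + "\n"
```

Write the returned string to data/my_voc/labelmap_voc.prototxt; the number of items it contains is the num_classes value needed in ssd_pascal.py.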
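2) Modify the training script ssd_pascal.py
This article is a useful reference.
examples/ssd/ssd_pascal.py adds the extra layers, sets all the training parameters, and then generates the files Caffe needs (train.prototxt, test.prototxt, solver.prototxt, etc.) in the specified directories.
a. As before, first add the caffe path
path_ = os.environ['HOME'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')
b. Point train_data and test_data at the lmdb directories generated in the previous step
c. Modify model_name, save_dir, snapshot_dir, job_dir and output_result_dir
Also modify name_size_file, pretrain_model (download it from here) and label_map_file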
# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_MY_VOC_{}".format(job_name)
# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/caffe-ssd/data/my_voc/results/{}/".format(os.environ['HOME'], job_name)
...
# Stores the test image names and sizes. Created by data/my_voc/create_list.sh
name_size_file = "data/my_voc/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/my_voc/labelmap_voc.prototxt"
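d. Set num_classes to the number of classes in your data (plus one for the background)
Set num_test_image to the number of test images
Adjust batch_size as needed (reduce it for large images, otherwise training fails with a GPU out-of-memory error)
e. Set gpus = "0,1,2,3" to match the GPUs you have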
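The values in d. and e. are linked: num_classes counts the background item in the label map, and the batch is split across the GPUs listed in gpus. The sketch below reproduces that arithmetic as I understand it from ssd_pascal.py (its batch_size_per_device, iter_size and test_iter, with test_batch_size assumed to be 8); ssd_training_params is a hypothetical helper, so check the variable names against your copy of the script.

```python
import math
import re


def count_labelmap_items(labelmap_text):
    # Every "item {" block is one class, background (label 0) included.
    return len(re.findall(r"^\s*item\s*\{", labelmap_text, re.MULTILINE))


def ssd_training_params(labelmap_text, num_test_image, gpus="0",
                        batch_size=32, accum_batch_size=32, test_batch_size=8):
    num_gpus = len([g for g in gpus.split(",") if g.strip()])
    devices = max(num_gpus, 1)  # CPU-only: treat as a single device
    # Each GPU processes an equal share of batch_size ...
    batch_size_per_device = int(math.ceil(float(batch_size) / devices))
    # ... and gradients are accumulated until accum_batch_size images have been seen.
    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * devices)))
    # Enough test iterations to cover every test image once.
    test_iter = int(math.ceil(float(num_test_image) / test_batch_size))
    return {
        "num_classes": count_labelmap_items(labelmap_text),
        "batch_size_per_device": batch_size_per_device,
        "iter_size": iter_size,
        "test_iter": test_iter,
    }
```

num_test_image is simply the number of non-empty lines in the test list. With gpus="0,1,2,3" and batch_size=accum_batch_size=32, each GPU gets 8 images and iter_size is 1; on a single GPU with batch_size lowered to 8, iter_size becomes 4, so the effective batch stays at 32 while using less memory per step.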
3) Train
python examples/ssd/ssd_pascal.py
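Note: I have to grumble here: SSD training is really slow. On the same data, YOLO2 took about a day, while SSD took two weeks to run 80,000 iterations...
4) Test
Following the article "SSD: Single Shot MultiBox Detector: detecting a single image", modify examples/ssd_detect.ipynb.
As the article "An introduction to IPython Notebook" explains, an .ipynb file is an IPython Notebook, a web-based wrapper around IPython.
Install Jupyter Notebook with pip, then launch it from the examples/ directory: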
sudo pip install jupyter -i https://pypi.tuna.tsinghua.edu.cn/simple
# opens in the browser
jupyter notebook
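For convenience, I saved ssd_detect as a .py file and changed it to:
1) use OpenCV to handle video and images
2) wrap the code in a class, making it easier to call and to follow
3) add support for detecting objects in video files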
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import os
import sys
path_ = os.environ['HOME'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')
import numpy as np
import caffe
import cv2
from google.protobuf import text_format
from caffe.proto import caffe_pb2


class ssd_detect(object):
    def __init__(self, args):
        # args is either [labelmap_file, model_def, model_weights, use_gpu]
        # or an object with those attributes (e.g. an argparse.Namespace)
        if isinstance(args, list):
            self._labelmap_file = args[0]
            self._model_def = args[1]
            self._model_weights = args[2]
            self._use_gpu = args[3]
        else:
            self._labelmap_file = args.labelmap_file
            self._model_def = args.model_def
            self._model_weights = args.model_weights
            self._use_gpu = args.use_gpu
        self._img = []
        self._image_resize = 300
        self._image_shape = 0
        self._rect_color = (0, 0, 255)
        self._text_color = (0, 255, 0)
        self._threshold = 0.6
        # BGR mean subtracted at training time (mean_value in ssd_pascal.py)
        self._mean = np.array([104, 117, 123], dtype=np.float32)
        self.init_net()

    def init_net(self):
        # set gpu mode
        if self._use_gpu:
            caffe.set_device(0)
            caffe.set_mode_gpu()
        else:
            caffe.set_mode_cpu()
        # load the label map (class names)
        self._labelmap = caffe_pb2.LabelMap()
        with open(self._labelmap_file, 'r') as file_:
            text_format.Merge(str(file_.read()), self._labelmap)
        # init caffe net
        self._net = caffe.Net(self._model_def,      # defines the structure of the model
                              self._model_weights,  # contains the trained weights
                              caffe.TEST)           # use test mode (e.g., don't perform dropout)

    def get_labelname(self):
        # map each detected label index to its display_name in the label map
        num_labels = len(self._labelmap.item)
        labelnames = []
        if type(self._top_label_indices) is not list:
            self._top_label_indices = [self._top_label_indices]
        for label in self._top_label_indices:
            found = False
            for i in range(0, num_labels):
                if label == self._labelmap.item[i].label:
                    found = True
                    labelnames.append(self._labelmap.item[i].display_name)
                    break
            assert found, 'label {} not in label map'.format(label)
        return labelnames

    def detect_it(self, img):
        # pre-process: resize to the network input, subtract the mean, HWC -> CHW
        self._img = cv2.resize(img, (self._image_resize, self._image_resize)).astype(np.float32)
        self._img -= self._mean
        self._img = self._img.transpose((2, 0, 1))
        # forward pass
        self._net.blobs['data'].data[...] = self._img
        detections = self._net.forward()['detection_out']
        # Parse the outputs: each row is [image_id, label, conf, xmin, ymin, xmax, ymax],
        # with coordinates normalized to [0, 1].
        det_label = detections[0, 0, :, 1]
        det_conf = detections[0, 0, :, 2]
        det_xmin = detections[0, 0, :, 3]
        det_ymin = detections[0, 0, :, 4]
        det_xmax = detections[0, 0, :, 5]
        det_ymax = detections[0, 0, :, 6]
        # Get detections with confidence higher than self._threshold.
        top_indices = [i for i, conf in enumerate(det_conf) if conf >= self._threshold]
        self._top_conf = det_conf[top_indices]
        self._top_label_indices = det_label[top_indices].tolist()
        self._top_labels = self.get_labelname()
        self._top_xmin = det_xmin[top_indices]
        self._top_ymin = det_ymin[top_indices]
        self._top_xmax = det_xmax[top_indices]
        self._top_ymax = det_ymax[top_indices]

    def draw_it(self, img, show_rate):
        # draw the boxes on a copy of img scaled by show_rate
        show_img = cv2.resize(img, (int(img.shape[1] * show_rate), int(img.shape[0] * show_rate)))
        for i in range(self._top_conf.shape[0]):
            xmin = int(round(self._top_xmin[i] * img.shape[1]))
            ymin = int(round(self._top_ymin[i] * img.shape[0]))
            xmax = int(round(self._top_xmax[i] * img.shape[1]))
            ymax = int(round(self._top_ymax[i] * img.shape[0]))
            score = self._top_conf[i]
            label_name = self._top_labels[i]
            display_txt = '%s: %.2f' % (label_name, score)
            coords = ((int(xmin * show_rate), int(ymin * show_rate)),
                      (int(xmax * show_rate), int(ymax * show_rate)))
            cv2.rectangle(show_img, coords[0], coords[1], self._rect_color, thickness=2)
            cv2.putText(show_img, display_txt, (coords[0][0], coords[0][1] - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, self._text_color, thickness=2)
        return show_img

    def detect_video(self, video_file):
        cap = cv2.VideoCapture(video_file)
        if not cap.isOpened():
            print('can\'t open video file: {}'.format(video_file))
            return
        print('detecting within file {}'.format(video_file))
        cv2.namedWindow('detect')
        while True:
            ret, img = cap.read()
            if not ret:
                print('finished!')
                break
            self.detect_it(img)
            img = self.draw_it(img, 0.4)
            cv2.imshow('detect', img)
            # press q to stop early
            if cv2.waitKey(20) & 0xFF == ord('q'):
                break
        cap.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':
    args = list()
    args.append('{}/data/my_voc/labelmap_voc.prototxt'.format(path_))
    args.append('{}/models/VGGNet/MY_VOC/SSD_300x300/deploy.prototxt'.format(path_))
    args.append('{}/models/VGGNet/MY_VOC/SSD_300x300/VGG_MY_VOC_SSD_300x300_iter_80000.caffemodel'.format(path_))
    args.append(True)
    file_ = '/home/xxx/py-faster-rcnn/trainval/test.avi'
    sd = ssd_detect(args)
    sd.detect_video(file_)
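The two steps in detect_it that are easiest to get wrong, the pre-processing and the parsing of detection_out, can be pulled out into numpy-only functions and checked without Caffe or a GPU. A sketch with hypothetical names (preprocess, parse_detections); it assumes the BGR mean [104, 117, 123] used by ssd_pascal.py for training and the detection_out row layout [image_id, label, conf, xmin, ymin, xmax, ymax] with normalized coordinates, which is how the class above indexes it.

```python
import numpy as np

# Assumed BGR training mean (mean_value in ssd_pascal.py).
MEAN_BGR = np.array([104, 117, 123], dtype=np.float32)


def preprocess(img_bgr, size=300, mean=MEAN_BGR):
    """HxWx3 uint8 BGR image, already resized to size x size -> 3 x size x size float32."""
    assert img_bgr.shape[:2] == (size, size)
    x = img_bgr.astype(np.float32) - mean  # broadcasts over height and width
    return x.transpose((2, 0, 1))          # HWC -> CHW, the layout of the 'data' blob


def parse_detections(det, img_w, img_h, threshold=0.6):
    """det: detection_out blob of shape (1, 1, N, 7).

    Returns (label, conf, xmin, ymin, xmax, ymax) tuples in pixel coordinates
    for every row whose confidence is at least threshold.
    """
    rows = det[0, 0]
    keep = rows[:, 2] >= threshold
    out = []
    for label, conf, xmin, ymin, xmax, ymax in rows[keep, 1:7]:
        out.append((int(label), float(conf),
                    int(round(xmin * img_w)), int(round(ymin * img_h)),
                    int(round(xmax * img_w)), int(round(ymax * img_h))))
    return out
```

Keeping these pieces free of Caffe makes it easy to test a different mean or threshold on synthetic arrays before touching the trained network.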
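Note: examples/ssd/ssd_pascal_video.py is another way to run detection, but in the end it calls the caffe binary, so you can only watch the result and cannot get at the detections themselves. It does, however, show the end-to-end idea: the network takes a video file as input and outputs video frames with the detections already drawn on them.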