R-CNN series: code compilation and walkthrough (4)

zizi7

Category: Machine Learning. Tags: ssd, deep learning, caffe

Article link: https://blog.csdn.net/zizi7/article/details/77468424

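This article builds on the SSD algorithm proposed by Liu W. and covers two parts of its implementation:
1) installing and compiling the code
2) training an SSD model on your own data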

Compiling and installing SSD

SSD is implemented on top of Caffe, so the process is much the same as for faster-rcnn:
1) Clone the project

git clone https://github.com/weiliu89/caffe.git
cd caffe
git checkout ssd

2) Edit Makefile.config

cp Makefile.config.example Makefile.config
a) Uncomment the following lines
USE_CUDNN := 1
BLAS := atlas  # atlas is Caffe's default; set BLAS := open for OpenBLAS or mkl for MKL
WITH_PYTHON_LAYER := 1

b) Add the HDF5 include and library directories
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial

LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
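The two Makefile.config edits above can also be scripted when rebuilding on several machines. A minimal Python sketch, assuming the commented-out options in Makefile.config.example are written as `# NAME := value`; `uncomment_options` and `append_dirs` are hypothetical helpers, not part of Caffe:

```python
import re


def uncomment_options(text, options):
    """Drop the leading '#' from commented lines whose content is in `options`."""
    out = []
    for line in text.splitlines():
        content = line.lstrip().lstrip("#").strip()
        if line.lstrip().startswith("#") and content in options:
            line = content  # e.g. "# USE_CUDNN := 1" -> "USE_CUDNN := 1"
        out.append(line)
    return "\n".join(out)


def append_dirs(text, var, extra_dirs):
    """Append directories to the first active 'VAR := ...' line (e.g. INCLUDE_DIRS)."""
    pattern = re.compile(r"^(%s\s*:=.*)$" % re.escape(var), re.MULTILINE)
    # A function replacement keeps '$(...)' in the matched line literal.
    return pattern.sub(lambda m: m.group(1) + " " + " ".join(extra_dirs), text, count=1)


# Example with the paths from the text:
#   text = open("Makefile.config").read()
#   text = uncomment_options(text, {"USE_CUDNN := 1", "WITH_PYTHON_LAYER := 1"})
#   text = append_dirs(text, "INCLUDE_DIRS", ["/usr/include/hdf5/serial"])
#   text = append_dirs(text, "LIBRARY_DIRS", ["/usr/lib/x86_64-linux-gnu",
#                                            "/usr/lib/x86_64-linux-gnu/hdf5/serial"])
```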

3) Build

make -j8
make py
make test -j8

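4) Test
A separate article describes this step in detail.
a. First, download the pretrained models (the download requires a VPN from mainland China).
b. Add the path of the compiled Caffe: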

# In examples/ssd/ssd_pascal_webcam.py; HOME_PATH is an environment variable set to the directory that contains caffe-ssd

path_ = os.environ['HOME_PATH'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')

c. Plug in a webcam and start detection:

python examples/ssd/ssd_pascal_webcam.py
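The sys.path lines above depend on HOME_PATH, an environment variable the author sets to the directory containing caffe-ssd; it is not a Caffe convention, and when it is missing the script only fails with a bare KeyError. A slightly more defensive sketch (the helper name `add_caffe_ssd_to_path` is hypothetical) that could stand in for those two lines here and in the scripts below:

```python
import os
import sys


def add_caffe_ssd_to_path(env_var="HOME_PATH", subdir="caffe-ssd"):
    """Prepend $HOME_PATH/caffe-ssd/python to sys.path, failing with a clear message."""
    base = os.environ.get(env_var)
    if not base:
        raise RuntimeError("%s is not set; export it as the directory that contains %s"
                           % (env_var, subdir))
    python_dir = os.path.join(base, subdir, "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)  # make `import caffe` pick up the SSD build
    return python_dir
```

Call it before `import caffe`, for example after `export HOME_PATH=$HOME` in the shell.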

Training on your own data

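The general approach (following a separate article): first organize the data in VOC format (see the earlier faster-rcnn post on compiling the code and adapting the data format), then convert it to lmdb, and finally use the author's tools to generate the corresponding train, test and solver .prototxt files.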
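For reference, a VOC-style annotation is one XML file per image holding the image size and one `<object>` per box. A minimal sketch with the standard library's ElementTree, assuming pixel coordinates and that each object's `name` matches a `name` entry in labelmap_voc.prototxt (as far as I know, that is how create_annoset.py maps names to labels); `voc_annotation` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, boxes, depth=3):
    """Build a minimal VOC-style annotation as an XML string.

    boxes: list of (class_name, xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, val in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(val)
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"  # not marked as a hard example
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(int(val))
    return ET.tostring(root, encoding="unicode")
```

Writing the string to `Annotations/<id>.xml` next to `JPEGImages/<id>.jpg` gives the one-to-one layout the list scripts expect.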

1) Generate trainval.lmdb and test.lmdb
a. Create the folder data/my_voc and copy create_list.sh, create_data.sh and labelmap_voc.prototxt from data/VOC0712/ into it.

b. Modify create_list.sh
This script generates the paths of the jpg images and their matching xml annotations (Figure 1).

[Figure 1. Each jpg is paired one-to-one with its xml]

The modified create_list.sh (mostly strips out the VOC-specific details):

#!/bin/bash

root_dir=$HOME/py-faster-rcnn/trainval
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

dst_file=$bash_dir/"my_voc.txt"

if [ -f $dst_file ]
  then
    rm -f $dst_file
fi

echo "Create list ..."
dataset_file=$root_dir/"ImageList_test.txt"

img_file=$bash_dir/"voc_img.txt"
cp $dataset_file $img_file
sed -i "s/^/JPEGImages\//g" $img_file
sed -i "s/$/.jpg/g" $img_file

label_file=$bash_dir/"voc_label.txt"
cp $dataset_file $label_file
sed -i "s/^/Annotations\//g" $label_file
sed -i "s/$/.xml/g" $label_file

paste -d' ' $img_file $label_file >> $dst_file

rm -f $label_file
rm -f $img_file

# Shuffle trainval file.
rand_file=$dst_file.random
cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
mv $rand_file $dst_file

$bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/"my_voc_name_size.txt"

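Note: run the script twice, once for ImageList_trainval.txt and once for ImageList_test.txt (change the file name in dataset_file), renaming dst_file each time so that the list matches what create_data.sh reads (e.g. my_voc_trainval.txt). Keep only the my_voc_name_size.txt produced from ImageList_test.txt; name_size_file in ssd_pascal.py must point to it.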
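The list-building part of create_list.sh (prefix each ID with JPEGImages/ and Annotations/, add the extensions, paste the two columns, shuffle) can be sketched in Python, assuming, as the sed commands imply, that ImageList_*.txt holds one image ID per line without extension. `make_list_lines` and `write_list` are hypothetical names, and the sketch does not replace the get_image_size call, which still has to run to produce the name/size file:

```python
import random


def make_list_lines(image_ids, seed=None):
    """Pair 'JPEGImages/<id>.jpg' with 'Annotations/<id>.xml' and shuffle the pairs."""
    lines = ["JPEGImages/%s.jpg Annotations/%s.xml" % (i, i) for i in image_ids]
    random.Random(seed).shuffle(lines)  # same role as the perl shuffle in the script
    return lines


def write_list(list_file, out_file, seed=None):
    """Read IDs from e.g. ImageList_trainval.txt and write the paired list file."""
    with open(list_file) as f:
        ids = [line.strip() for line in f if line.strip()]  # skip blank lines
    with open(out_file, "w") as f:
        f.write("\n".join(make_list_lines(ids, seed)) + "\n")
```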

c. Modify create_data.sh
This script generates the lmdb.

cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$HOME/caffe-ssd

#cd $root_dir

redo=1
data_root_dir="$HOME/py-faster-rcnn/trainval"
mapfile="$root_dir/data/my_voc/labelmap_voc.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi

  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/my_voc/my_voc_trainval.txt $data_root_dir/my_voc_trainval_lmdb examples/my_voc

Notes:

Add the following to create_annoset.py:

path_ = os.environ['HOME_PATH'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')

Run create_data.sh twice, changing the input list (my_voc_trainval.txt) and the output directory (my_voc_trainval_lmdb), to produce the trainval lmdb and the test lmdb.
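Because create_data.sh has to be edited and rerun for each split, the call can also be built per split. A sketch that reproduces the flags used in the script above; the list names my_voc_trainval.txt / my_voc_test.txt and the helper `annoset_argv` are assumptions for illustration:

```python
import os


def annoset_argv(split, caffe_root, data_root, list_dir, example_dir="examples/my_voc"):
    """Argument list for scripts/create_annoset.py, mirroring create_data.sh.

    split: 'trainval' or 'test'.
    """
    list_file = os.path.join(list_dir, "my_voc_%s.txt" % split)
    out_db = os.path.join(data_root, "my_voc_%s_lmdb" % split)
    return [
        "python", os.path.join(caffe_root, "scripts", "create_annoset.py"),
        "--anno-type=detection",
        "--label-map-file=" + os.path.join(caffe_root, "data", "my_voc", "labelmap_voc.prototxt"),
        "--min-dim=0", "--max-dim=0", "--resize-width=0", "--resize-height=0",
        "--check-label", "--encode-type=jpg", "--encoded", "--redo",
        # positional args: root, listfile, outdir, exampledir
        data_root, list_file, out_db, example_dir,
    ]
```

Running it as `subprocess.check_call(annoset_argv("trainval", ...), cwd=caffe_root)` keeps the relative examples/my_voc link inside the Caffe tree, since the `cd $root_dir` line in the script is commented out.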

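d. Modify labelmap_voc.prototxt
This file lists the classes of the training samples; add or remove items to match your own data. Label 0 is reserved for the background: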

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
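When there are many classes, the label map can be generated instead of edited by hand. A sketch, assuming each class name equals the `<name>` used in the XML annotations and doubles as the display name; `labelmap_prototxt` is a hypothetical helper. Label 0 stays the background, which is why num_classes later equals the number of classes plus one:

```python
def labelmap_prototxt(class_names):
    """Return labelmap text: background as label 0, then one item per class."""
    items = [("none_of_the_above", 0, "background")]
    items += [(name, i, name) for i, name in enumerate(class_names, start=1)]
    blocks = []
    for name, label, display in items:
        blocks.append('item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}'
                      % (name, label, display))
    return "\n".join(blocks) + "\n"


# Example: open("data/my_voc/labelmap_voc.prototxt", "w").write(labelmap_prototxt(["car", "person"]))
```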

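2) Modify the training script ssd_pascal.py
A separate article is a useful reference here.
examples/ssd/ssd_pascal.py adds the extra SSD layers, sets all the training parameters, and then writes the train.prototxt, test.prototxt, solver.prototxt and other files Caffe needs into the specified directory.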

a. Again, first add the Caffe path:

path_ = os.environ['HOME_PATH'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')

b. Set train_data and test_data to the lmdb paths generated in the previous step.

c. Update model_name, save_dir, snapshot_dir, job_dir and output_result_dir, as well as name_size_file, pretrain_model (the pretrained VGG weights, downloaded separately) and label_map_file:

# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_MY_VOC_{}".format(job_name)

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/MY_VOC/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/caffe-ssd/data/my_voc/results/{}/".format(os.environ['HOME'], job_name)
...
# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/my_voc/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/my_voc/labelmap_voc.prototxt"

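d. Set num_classes to the number of classes in your data plus one for the background.
Set num_test_image to the number of test images.
Adjust batch_size to your hardware (for large images reduce it, otherwise training fails with an out-of-GPU-memory error).

e. Set gpus="0,1,2,3" to match the GPUs you have.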
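As far as I recall, ssd_pascal.py then derives the per-GPU batch and solver settings from these values roughly as follows (check the copy in your checkout); `derived_train_params` is a hypothetical wrapper written for illustration. It shows why lowering batch_size does not shrink the effective batch: iter_size accumulates gradients until about accum_batch_size images have been seen:

```python
import math


def derived_train_params(class_names, gpus, num_test_image,
                         batch_size=32, accum_batch_size=32, test_batch_size=8):
    """Sketch of how the SSD training script turns user settings into solver values."""
    num_classes = len(class_names) + 1  # +1 for the background class (label 0)
    gpulist = gpus.split(",")
    num_gpus = len(gpulist)
    # Split the mini-batch across GPUs; iter_size accumulates gradients so that
    # batch_size_per_device * num_gpus * iter_size covers accum_batch_size.
    batch_size_per_device = batch_size
    iter_size = int(math.ceil(float(accum_batch_size) / batch_size))
    if num_gpus > 0:
        batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
        iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    # Number of test batches needed to cover every test image once.
    test_iter = int(math.ceil(float(num_test_image) / test_batch_size))
    return {"num_classes": num_classes, "num_gpus": num_gpus,
            "batch_size_per_device": batch_size_per_device,
            "iter_size": iter_size, "test_iter": test_iter}
```

For example, dropping batch_size from 32 to 16 on a single GPU doubles iter_size to 2, so each solver step still sees 32 images.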

3) Train

python examples/ssd/ssd_pascal.py

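Note: I have to complain here: SSD training is really slow. On the same data, YOLO2 took about one day, while SSD needed two weeks for 80,000 iterations.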

4) Test
Following the article "SSD: Single Shot MultiBox Detector: detecting a single image", modify examples/ssd_detect.ipynb.

As explained in the article "Introduction to IPython Notebook", an .ipynb file is an IPython Notebook, a web-based wrapper around IPython.

Install Jupyter Notebook with pip and open the notebook from the examples/ directory:

sudo pip install jupyter -i https://pypi.tuna.tsinghua.edu.cn/simple
# opens in the browser
jupyter notebook

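For convenience, I saved ssd_detect as a .py file with these changes:
1) use OpenCV to handle video and images
2) wrap the logic in a class, which makes it easier to call and to organize
3) add support for detecting objects in video files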

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys
path_ = os.environ['HOME_PATH'] + '/caffe-ssd'
sys.path.insert(0, path_ + '/python')

import numpy as np
import caffe
import cv2

from google.protobuf import text_format
from caffe.proto import caffe_pb2


class ssd_detect(object):
    def __init__(self, args):
        if isinstance(args, list):
            self._labelmap_file = args[0]
            self._model_def = args[1]
            self._model_weights = args[2]

            self._use_gpu = args[3]
        else:
            self._labelmap_file = args.labelmap_file
            self._model_def = args.model_def
            self._model_weights = args.model_weights

            self._use_gpu = args.use_gpu

        self._img = []
        self._image_resize = 300
        self._image_shape = 0
        self._rect_color = (0, 0, 255)
        self._text_color = (0, 255, 0)
        self._threshold = 0.6
        # BGR per-channel mean; ssd_pascal.py trains with mean_value [104, 117, 123],
        # so the same mean is subtracted at test time (float32 avoids uint8 wrap-around)
        self._mean = np.array([104, 117, 123], dtype=np.float32)

        self.init_net()


    def init_net(self):
        # set gpu mode
        if self._use_gpu:
            caffe.set_device(0)
            caffe.set_mode_gpu()
        else:
            caffe.set_mode_cpu()

        # load the label map (label index -> display name)
        self._labelmap = caffe_pb2.LabelMap()
        with open(self._labelmap_file, 'r') as file_:
            text_format.Merge(str(file_.read()), self._labelmap)

        # init caffe net
        self._net = caffe.Net(self._model_def,   # defines the structure of the model
                self._model_weights,             # contains the trained weights
                caffe.TEST)                      # use test mode (e.g., don't perform dropout)


    def get_labelname(self):
        num_labels = len(self._labelmap.item)
        labelnames = []
        if type(self._top_label_indices) is not list:
            self._top_label_indices = [self._top_label_indices]
        for label in self._top_label_indices:
            found = False
            for i in range(0, num_labels):
                if label == self._labelmap.item[i].label:
                    found = True
                    labelnames.append(self._labelmap.item[i].display_name)
                    break
            assert found, 'label {} not found in the label map'.format(label)
        return labelnames


    def detect_it(self, img):
        # pre-process: resize to the network input size, convert to float32,
        # subtract the BGR mean (OpenCV already loads BGR in [0, 255]), then HWC -> CHW
        self._img = cv2.resize(img, (self._image_resize, self._image_resize)).astype(np.float32)
        self._img -= self._mean
        self._img = self._img.transpose((2, 0, 1))

        # input
        self._net.blobs['data'].data[...] = self._img

        detections = self._net.forward()['detection_out']
        # Parse the outputs.
        det_label = detections[0,0,:,1]
        det_conf = detections[0,0,:,2]
        det_xmin = detections[0,0,:,3]
        det_ymin = detections[0,0,:,4]
        det_xmax = detections[0,0,:,5]
        det_ymax = detections[0,0,:,6]

        # Get detections with confidence higher than self._threshold.
        top_indices = [i for i, conf in enumerate(det_conf) if conf >= self._threshold]

        self._top_conf = det_conf[top_indices]
        self._top_label_indices = det_label[top_indices].tolist()
        self._top_labels = self.get_labelname()
        self._top_xmin = det_xmin[top_indices]
        self._top_ymin = det_ymin[top_indices]
        self._top_xmax = det_xmax[top_indices]
        self._top_ymax = det_ymax[top_indices]


    def draw_it(self, img, show_rate):
        show_img = cv2.resize(img, (int(img.shape[1]*show_rate), int(img.shape[0]*show_rate)))
        for i in range(self._top_conf.shape[0]):
            xmin = int(round(self._top_xmin[i] * img.shape[1]))
            ymin = int(round(self._top_ymin[i] * img.shape[0]))
            xmax = int(round(self._top_xmax[i] * img.shape[1]))
            ymax = int(round(self._top_ymax[i] * img.shape[0]))

            score = self._top_conf[i]
            label = int(self._top_label_indices[i])
            label_name = self._top_labels[i]
            display_txt = '%s: %.2f'%(label_name, score)
            coords = ((int(xmin*show_rate), int(ymin*show_rate)), \
                   (int(xmax*show_rate), int(ymax*show_rate)))

            cv2.rectangle(show_img, coords[0], coords[1],  self._rect_color, thickness=2)
            cv2.putText(show_img, display_txt, (coords[0][0], coords[0][1] - 5), \
                   cv2.FONT_HERSHEY_SIMPLEX, 0.8, self._text_color, thickness=2)

        return show_img


    def detect_video(self, video_file):
        cap = cv2.VideoCapture(video_file)
        if not cap.isOpened():
            print('can\'t open video file: {}'.format(video_file))
            return

        print('detecting within file {}'.format(video_file))
        cv2.namedWindow('detect')
        while True:
            ret, img = cap.read()
            if not ret:
                print('finished!')
                break

            self.detect_it(img)
            img = self.draw_it(img, 0.4)
            cv2.imshow('detect', img)
            # press 'q' to stop early
            if cv2.waitKey(20) & 0xFF == ord('q'):
                break

        # release the video file and close the display window
        cap.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':
    args = list()
    args.append('{}/data/my_voc/labelmap_voc.prototxt'.format(path_))
    args.append('{}/models/VGGNet/MY_VOC/SSD_300x300/deploy.prototxt'.format(path_))
    args.append('{}/models/VGGNet/MY_VOC/SSD_300x300/VGG_MY_VOC_SSD_300x300_iter_80000.caffemodel'.format(path_))
    args.append(True)

    file_ = '/home/xxx/py-faster-rcnn/trainval/test.avi'
    sd = ssd_detect(args)
    sd.detect_video(file_)
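The parsing in detect_it() relies on the layout of the detection_out blob: shape 1x1xNx7, one row per box as [image_id, label, confidence, xmin, ymin, xmax, ymax], with coordinates normalized to [0, 1]. A Caffe-free sketch of the same thresholding and scaling, handy for checking post-processing on synthetic data; `parse_detections` is a hypothetical helper, not part of pycaffe:

```python
import numpy as np


def parse_detections(detections, img_w, img_h, threshold=0.6):
    """Convert a 1x1xNx7 detection_out array into (label, conf, (x1, y1, x2, y2)) tuples.

    Rows are [image_id, label, conf, xmin, ymin, xmax, ymax]; coordinates are
    normalized, so they are scaled by the original image width and height.
    """
    rows = detections[0, 0]
    keep = rows[:, 2] >= threshold  # same confidence filter as detect_it()
    results = []
    for _, label, conf, xmin, ymin, xmax, ymax in rows[keep]:
        box = (int(round(xmin * img_w)), int(round(ymin * img_h)),
               int(round(xmax * img_w)), int(round(ymax * img_h)))
        results.append((int(label), float(conf), box))
    return results
```

Passing the blob returned by `self._net.forward()['detection_out']` together with the original frame's width and height gives the same pixel boxes that draw_it() computes before applying show_rate.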