References:
https://www.cnblogs.com/dudumiaomiao/p/6556111.html
http://m.blog.csdn.net/hongbin_xu/article/details/77278329
http://bealin.github.io/2016/10/23/Caffe%E5%AD%A6%E4%B9%A0%E7%B3%BB%E5%88%97%E2%80%94%E2%80%946%E4%BD%BF%E7%94%A8Faster-RCNN%E8%BF%9B%E8%A1%8C%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B/
https://www.zhihu.com/question/57091642/answer/165134753
http://blog.csdn.net/qq_14975217/article/details/51523907
http://www.cnblogs.com/louyihang-loves-baiyan/p/4903231.html
http://blog.csdn.net/qq_36219202/article/details/72896203
1. First, download the PASCAL VOC 2007 dataset
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
2. Extract the archives
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar
tar -xvf VOCdevkit_08-Jun-2007.tar
3. Move the VOCdevkit folder into py-faster-rcnn/data
4. Create a symbolic link for VOCdevkit
ln -s VOCdevkit VOCdevkit2007
5. Prepare the training data
JPEGImages: holds the image data in jpg/jpeg format. Image names are six digits, e.g. 000001. Aspect ratios should fall between 0.462 and 6.828; remove overly elongated images.
Annotations: holds the XML files describing object positions and bounding-box sizes.
A tool for creating the XML files: https://github.com/tzutalin/labelImg
ImageSets/Main: holds the files train.txt, trainval.txt, val.txt, and test.txt:
train.txt: training set, 70% of trainval
trainval.txt: training + validation set, 70% of the whole dataset
val.txt: validation set, 30% of trainval
test.txt: test set, 30% of the whole dataset
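The four splits above can be generated programmatically. The following is a hypothetical sketch (not part of py-faster-rcnn) that applies the 70/30 ratios described above to a list of image IDs:

```python
import random

def split_dataset(image_ids, trainval_ratio=0.7, train_ratio=0.7, seed=0):
    """Split image IDs into trainval/test, then trainval into train/val,
    following the 70/30 ratios described above."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_trainval = int(len(ids) * trainval_ratio)
    trainval, test = ids[:n_trainval], ids[n_trainval:]
    n_train = int(len(trainval) * train_ratio)
    train, val = trainval[:n_train], trainval[n_train:]
    return {'train.txt': train, 'trainval.txt': trainval,
            'val.txt': val, 'test.txt': test}

# Write one ID per line into ImageSets/Main:
# for name, ids in split_dataset(all_ids).items():
#     with open('VOC2007/ImageSets/Main/' + name, 'w') as f:
#         f.write('\n'.join(sorted(ids)) + '\n')
```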
Shell snippet to generate the txt files:
#!/usr/bin/env sh
DATATRAIN=VOC2007
MY=VOC2007
echo "create trainval.txt"
rm -f $MY/trainval.txt
for i in 0   # image file names all begin with 0
do
find $DATATRAIN/JPEGImages -name "$i*.jpg" | cut -d '/' -f4 | cut -d '.' -f1 >> $MY/trainval.txt
done
echo "all done"
The generated txt file is shown in the figure.
No class label follows the image IDs here. (In the official VOC files, 1 marks a positive sample and -1 a negative one; 0 probably marks samples that are hard to distinguish.)
(As https://www.cnblogs.com/dudumiaomiao/p/6556111.html points out:
the VOC dataset supports many CV tasks, such as object detection, semantic segmentation, and edge detection, which is why ImageSets has several subfolders (Layout, Main, Segmentation). Edit the files under Main (train.txt, trainval.txt, val.txt, test.txt) and list the IDs of the images you want to use for your task.
Put your dataset under py-faster-rcnn/data/VOCdevkit2007/VOC2007, replacing the original VOC2007
JPEGImages, ImageSets, and Annotations.)
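For reference, a minimal VOC-style annotation file can be built with Python's standard library. This is a sketch showing only the fields the code later in this post actually reads (name, difficult, bndbox); `make_voc_annotation` is a hypothetical helper and 'bocode' an example class name:

```python
import xml.etree.ElementTree as ET

def make_voc_annotation(filename, width, height, objects):
    """Build a minimal Pascal-VOC-style annotation string.
    objects is a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    for tag, val in (('width', width), ('height', height), ('depth', 3)):
        ET.SubElement(size, tag).text = str(val)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'difficult').text = '0'
        bb = ET.SubElement(obj, 'bndbox')
        for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bb, tag).text = str(val)
    return ET.tostring(root).decode()
```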
6. Modify the relevant code (this uses the faster_rcnn_alt_opt.sh training method with the small ZF network; if you use faster_rcnn_end2end.sh instead, modify the three files in the faster_rcnn_end2end directory)
a: py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt
name: "ZF"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"  # set to your number of classes + 1 (the 1 is the background)
  }
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 2  # set to your number of classes + 1 (here 2 = one class + background)
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 84  # (number of classes + 1) * 4; for a single class this would be 8
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
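The two num_output values above always follow the same rule. A hypothetical helper (not part of py-faster-rcnn) makes it explicit:

```python
def fast_rcnn_num_outputs(num_foreground_classes):
    """num_output for the cls_score and bbox_pred layers: one extra class
    for the background, and four box-regression targets per class."""
    num_classes = num_foreground_classes + 1  # +1 for __background__
    return {'cls_score': num_classes, 'bbox_pred': 4 * num_classes}
```

For the 20 VOC classes this gives 21 and 84, matching the original prototxt; for a single custom class it gives 2 and 8.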
b: Modify py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"  # set to your number of classes + 1
  }
}
c: Modify py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_fast_rcnn_train.pt
Change the same layers in the same way as in stage1_fast_rcnn_train.pt.
d: Modify py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_rpn_train.pt
Change the same layers in the same way as in stage1_rpn_train.pt.
e: Modify py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  inner_product_param {
    num_output: 21  # set to number of classes + 1
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  inner_product_param {
    num_output: 84  # set to (number of classes + 1) * 4
  }
}
f: Modify py-faster-rcnn/lib/datasets/pascal_voc.py
class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
        self._classes = ('__background__',  # always index 0
                         'your_class_1', 'your_class_2')  # replace with your own class names

    def _load_pascal_annotation(self, index):
        # If your XML differs from the original VOC2007 format, adapt this method accordingly
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0]
            # e.g. for my data this line becomes:
            # non_diff_objs = [obj for obj in objs if obj.find('name').text == 'bocode']
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        num_objs = len(objs)
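The class tuple above determines the integer labels used during training. Roughly how pascal_voc.py maps names to indices, as a standalone sketch ('bocode' again is just an example class name):

```python
def build_class_index(foreground_classes):
    """Map class names to integer labels: '__background__' is always
    index 0, followed by your own classes in order."""
    classes = ('__background__',) + tuple(foreground_classes)
    return dict(zip(classes, range(len(classes))))
```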
(This step can be skipped) Modify py-faster-rcnn/lib/datasets/voc_eval.py
Adapt the corresponding statements to your XML.
My XML is shown in the figure.
The corresponding modifications are:
def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        # obj_struct['pose'] = obj.find('pose').text  # commented out
        # obj_struct['truncated'] = int(obj.find('truncated').text)  # commented out
        # obj_struct['difficult'] = int(obj.find('difficult').text)  # commented out
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        # difficult = np.array([x['difficult'] for x in R]).astype(np.bool)  # commented out
        difficult = 0  # added
        det = [False] * len(R)
        # npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    if ovmax > ovthresh:
        # if not R['difficult'][jmax]:  # commented out
        #     if not R['det'][jmax]:  # commented out
        #         tp[d] = 1.  # commented out
        #         R['det'][jmax] = 1  # commented out
        #     else:  # commented out
        #         fp[d] = 1.  # commented out
        tp[d] = 1.  # added: count the match as a true positive (with fp here both branches would be identical)
    else:
        fp[d] = 1.
I am not entirely clear on what this code does as a whole.
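For context on what the tp/fp arrays are for: voc_eval marks each detection (sorted by descending confidence) as a true or false positive, then computes cumulative precision and recall. A simplified sketch of that accounting (hypothetical, not the actual voc_eval code):

```python
def precision_recall(tp, fp, npos):
    """Cumulative precision/recall from per-detection 0/1 flags.
    tp/fp are sorted by descending detection confidence;
    npos is the number of ground-truth boxes."""
    prec, rec = [], []
    tp_sum = fp_sum = 0
    for t, f in zip(tp, fp):
        tp_sum += t
        fp_sum += f
        prec.append(tp_sum / float(tp_sum + fp_sum))
        rec.append(tp_sum / float(npos))
    return prec, rec
```

This also shows why the fix above matters: if every matched detection went into fp, precision would stay at zero and AP would always come out 0.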
g: Modify py-faster-rcnn/lib/datasets/imdb.py as follows:
def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 - 1
        boxes[:, 2] = widths[i] - oldx1 - 1
        for b in range(len(boxes)):  # added
            if boxes[b][2] < boxes[b][0]:  # added
                boxes[b][0] = 0  # added
        assert (boxes[:, 2] >= boxes[:, 0]).all()
        entry = {'boxes': boxes,
                 'gt_overlaps': self.roidb[i]['gt_overlaps'],
                 'gt_classes': self.roidb[i]['gt_classes'],
                 'flipped': True}
        self.roidb.append(entry)
    self._image_index = self._image_index * 2
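The flip arithmetic can be checked in isolation. A pure-Python sketch of the same transform with the added clamp (`flip_boxes` is a hypothetical helper, not part of imdb.py):

```python
def flip_boxes(boxes, width):
    """Horizontally flip [xmin, ymin, xmax, ymax] boxes the way
    append_flipped_images does (0-based pixel coordinates)."""
    flipped = []
    for xmin, ymin, xmax, ymax in boxes:
        new_xmin = width - xmax - 1
        new_xmax = width - xmin - 1
        # In imdb.py the boxes are stored as unsigned integers, so an
        # annotation with xmin == 0 can underflow during the earlier
        # 1-based-to-0-based conversion and leave xmax < xmin after
        # flipping; clamping xmin to 0 keeps the assert from firing.
        if new_xmax < new_xmin:
            new_xmin = 0
        flipped.append([new_xmin, ymin, new_xmax, ymax])
    return flipped
```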
7. Configure parameters and iteration counts in the solver files and in train_faster_rcnn_alt_opt.py
Modify py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py:
# Solver for each training stage
solvers = [[net_name, n, 'stage1_rpn_solver60k80k.pt'],
[net_name, n, 'stage1_fast_rcnn_solver30k40k.pt'],
[net_name, n, 'stage2_rpn_solver60k80k.pt'],
[net_name, n, 'stage2_fast_rcnn_solver30k40k.pt']]
solvers = [os.path.join(cfg.MODELS_DIR, *s) for s in solvers]
# Iterations for each training stage
max_iters = [80000, 40000, 80000, 40000]  # adjust the counts as needed
These correspond to rpn1, fast1, rpn2, and fast2 respectively.
If you change max_iters, adjust stepsize accordingly in stage1_rpn_solver60k80k.pt, stage1_fast_rcnn_solver30k40k.pt, stage2_rpn_solver60k80k.pt, and stage2_fast_rcnn_solver30k40k.pt. stepsize must be smaller than the corresponding max_iters value; it is the number of iterations after which the learning rate is lowered once (changing it is optional). Also adjust faster_rcnn_alt_opt.sh as needed. If you use faster_rcnn_end2end instead, change the iteration count in faster_rcnn_end2end.sh and in its solver.
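To see why stepsize must stay below max_iters: Caffe's "step" learning-rate policy multiplies the rate by gamma every stepsize iterations, so a stepsize at or above max_iters means the rate never drops. A sketch of that schedule (the base_lr 0.001 and gamma 0.1 in the usage note are only illustrative; your actual values come from the solver .pt files):

```python
def step_lr(base_lr, gamma, stepsize, it):
    """Learning rate under Caffe's 'step' policy: multiplied by gamma
    once every stepsize iterations."""
    return base_lr * (gamma ** (it // stepsize))
```

For example, with base_lr 0.001, gamma 0.1, and stepsize 60000, the rate stays at 0.001 through iteration 59999 and drops to 0.0001 at iteration 60000.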
8. Start training
Before training, if a py-faster-rcnn/output folder already exists, back up the models inside it (the folder does not exist if you have never trained before).
cd py-faster-rcnn
sudo ./experiments/scripts/faster_rcnn_alt_opt.sh 0 ZF pascal_voc
9. Training results
The trained model is saved in py-faster-rcnn/output, usually named ZF_faster_rcnn_final.caffemodel.
Training logs are saved in experiments/logs.
10. Problems
a: An error appears after training finishes (see the figure).
Fix: delete the cache left over from the previous training run; in particular, make sure the statements in py-faster-rcnn/lib/datasets/voc_eval.py have been modified to match your XML.
11. Test the model
http://blog.csdn.net/zhy8623080/article/details/73188580