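I. Setting up py-faster-rcnn
There are plenty of blog tutorials online on setting up py-faster-rcnn, so I won't repeat them here. One of the better ones is "How to train py-faster-rcnn on your own dataset".
Some other reference posts:
Post 1
Post 2
This post mainly records the many problems I ran into while using py-faster-rcnn, together with their fixes, so that I can look them up next time. I also hope it helps anyone who hits the same problems.
II. Building your own dataset
To change as little code as possible, the easiest approach is to follow the layout the source code uses for the PASCAL VOC dataset: under …/py-faster-rcnn/data/, create a folder named VOCdevkit2007 with the following directory structure: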
VOCdevkit2007
├── results
│   └── VOC2007
│       └── Main          # empty directory; test-set results are written here
├── VOC2007
│   ├── Annotations       # XML annotation files
│   ├── ImageSets
│   │   └── Main          # image-set .txt files go here
│   └── JPEGImages        # .jpg images go here
└── VOCcode
Be sure to create the directory VOCdevkit2007/results/VOC2007/Main/, otherwise an error is raised later.
Reference: https://blog.csdn.net/yeler082/article/details/81036918
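The layout above can also be created programmatically. The following is a minimal sketch, not part of py-faster-rcnn: the helper name `make_voc_skeleton` and the `VOC_SUBDIRS` list are made up for illustration, and VOCcode is left out because it holds code rather than data.

```python
import os

# Sub-directories from the tree above, relative to the devkit root.
# (Hypothetical helper data; VOCcode is intentionally omitted.)
VOC_SUBDIRS = [
    os.path.join('results', 'VOC2007', 'Main'),
    os.path.join('VOC2007', 'Annotations'),
    os.path.join('VOC2007', 'ImageSets', 'Main'),
    os.path.join('VOC2007', 'JPEGImages'),
]


def make_voc_skeleton(devkit_root):
    """Create any missing directory of the layout; return the ones created."""
    created = []
    for sub in VOC_SUBDIRS:
        path = os.path.join(devkit_root, sub)
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates intermediate directories
            created.append(path)
    return created
```

Calling `make_voc_skeleton('data/VOCdevkit2007')` from the repository root should be safe to repeat, since existing directories are skipped.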
III. Training on your own dataset
Note that Faster R-CNN has two training modes (alt-opt and end2end), each with its own script under /experiments/scripts/:
The two Faster R-CNN training modes (assuming ZF as the feature extractor, a pascal_voc-format dataset, and GPU 0):
├── Alternating training (alt-opt): ./experiments/scripts/faster_rcnn_alt_opt.sh 0 ZF pascal_voc
└── Approximate joint training (end-to-end): ./experiments/scripts/faster_rcnn_end2end.sh 0 ZF pascal_voc
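If you use VGG16 instead, modify the corresponding model files (train.prototxt, test.prototxt) and pass VGG16 as the model when training.
IV. Summary of training errors
The following errors came up during training:
1. Error: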
cls = self._class_to_ind[obj.find('name').text.lower().strip()]
KeyError: 'water stain'
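Fix (see this post): the KeyError means an annotation uses a class name that is not in the class list.
- First check the contents of self._classes in tf-faster-rcnn/lib/datasets/pascal_voc.py, then locate the line
objs = diff_objs (or non_diff_objs)
- and add the following code directly below it: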
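# Keep only objects whose class name, normalized the same way as in the failing lookup, is listed in self._classes
cls_objs = [obj for obj in objs if obj.find('name').text.lower().strip() in self._classes]
objs = cls_objs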
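Because the filter above silently drops every object whose class is not listed, it may be worth checking first which unexpected names the annotations contain. The sketch below is an illustration, not code from the repository: `find_unknown_classes` is a hypothetical helper, and it assumes standard VOC-style XML in which each object element has a name child.

```python
import os
import xml.etree.ElementTree as ET


def find_unknown_classes(annotation_dir, classes):
    """Map each class name missing from `classes` to the XML files using it."""
    known = set(classes)
    unknown = {}
    for filename in sorted(os.listdir(annotation_dir)):
        if not filename.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(annotation_dir, filename)).getroot()
        for obj in root.findall('object'):
            # Same normalization as the lookup that raised the KeyError.
            name = obj.find('name').text.lower().strip()
            if name not in known:
                unknown.setdefault(name, []).append(filename)
    return unknown

# Example call (the path and class tuple are placeholders):
# find_unknown_classes('data/VOCdevkit2007/VOC2007/Annotations',
#                      ('__background__', 'crack'))
```

An empty result suggests the class list already covers the annotations; otherwise either extend self._classes or fix the offending labels.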
2. Error:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "./tools/train_faster_rcnn_alt_opt.py", line 130, in train_rpn
max_iters=max_iters)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/fast_rcnn/train.py", line 157, in train_net
pretrained_model=pretrained_model)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/fast_rcnn/train.py", line 51, in __init__
pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'
Fix (see this post):
Add one line to ./lib/fast_rcnn/train.py:
import google.protobuf.text_format
That resolves the problem.
3. Error:
assert(boxes[:,2]>=boxes[:,0]).all()
Fix: see here.
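The linked post is not reproduced here. One commonly reported cause, offered as an assumption about the data rather than a diagnosis: pascal_voc.py subtracts 1 from every coordinate and stores boxes as unsigned 16-bit integers, so an annotation with xmin or ymin equal to 0, or with xmax beyond the image width, wraps around to a huge value and the check on horizontally flipped boxes fails. A minimal sketch of a checker follows. `find_suspect_boxes` is a hypothetical name, and the code assumes every XML file has a size element with width and height.

```python
import os
import xml.etree.ElementTree as ET


def find_suspect_boxes(annotation_dir):
    """Return (file, box, reason) tuples for boxes that may break flipping."""
    suspects = []
    for filename in sorted(os.listdir(annotation_dir)):
        if not filename.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(annotation_dir, filename)).getroot()
        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)
        for obj in root.findall('object'):
            bb = obj.find('bndbox')
            xmin, ymin, xmax, ymax = [
                int(float(bb.find(tag).text))
                for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
            box = (xmin, ymin, xmax, ymax)
            if xmin < 1 or ymin < 1:
                suspects.append((filename, box, 'coordinate below 1'))
            elif xmax > width or ymax > height:
                suspects.append((filename, box, 'outside the image'))
            elif xmax < xmin or ymax < ymin:
                suspects.append((filename, box, 'max smaller than min'))
    return suspects
```

If the list is non-empty, correcting those annotations and clearing the cached roidb before retraining would be the next thing to try.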
Error:
File "./tools/train_net.py", line 85, in <module>
roidb = get_training_roidb(imdb)
File "/usr/local/fast-rcnn/tools/../lib/fast_rcnn/train.py", line 111, in get_training_roidb
rdl_roidb.prepare_roidb(imdb)
File "/usr/local/fast-rcnn/tools/../lib/roi_data_layer/roidb.py", line 23, in prepare_roidb
roidb[i]['image'] = imdb.image_path_at(i)
IndexError: list index out of range
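Fix:
Delete the .pkl files under fast-rcnn-master/data/cache/ (or rename them as a backup), then retrain. The cached roidb was typically built from an earlier dataset and no longer matches the current image list.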
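Clearing the cache can be scripted so that it is not forgotten after every dataset change. This is a small sketch with a hypothetical helper name, `clear_roidb_cache`. It only touches .pkl files directly inside the directory it is given.

```python
import glob
import os


def clear_roidb_cache(cache_dir):
    """Delete every .pkl file directly inside cache_dir; return their paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(cache_dir, '*.pkl'))):
        os.remove(path)
        removed.append(path)
    return removed

# Example call (path relative to the repository root):
# clear_roidb_cache('data/cache')
```

The cache files are regenerated on the next training run, so deleting them costs only the time needed to re-read the annotations.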
4. Error:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "./tools/train_faster_rcnn_alt_opt.py", line 196, in train_fast_rcnn
max_iters=max_iters)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/fast_rcnn/train.py", line 162, in train_net
model_paths = sw.train_model(max_iters)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/fast_rcnn/train.py", line 103, in train_model
self.solver.step(1)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/roi_data_layer/layer.py", line 144, in forward
blobs = self._get_next_minibatch()
File "/home/ard/py-faster-rcnn-ori/tools/../lib/roi_data_layer/layer.py", line 63, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
num_classes)
File "/home/ard/py-faster-rcnn-ori/tools/../lib/roi_data_layer/minibatch.py", line 100, in _sample_rois
fg_inds, size=fg_rois_per_this_image, replace=False)
File "mtrand.pyx", line 1192, in mtrand.RandomState.choice
TypeError: 'numpy.float64' object cannot be interpreted as an index
Fix (line numbers may differ slightly between versions; np.int from the original fix is written as int here, because np.int was only an alias of the built-in int and was removed in NumPy 1.24):
- /home/xxx/py-faster-rcnn/lib/roi_data_layer/minibatch.py
Change line 26: fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
to: fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(int)
- /home/xxx/py-faster-rcnn/lib/datasets/ds_utils.py
Change line 12: hashes = np.round(boxes * scale).dot(v)
to: hashes = np.round(boxes * scale).dot(v).astype(int)
- /home/xxx/py-faster-rcnn/lib/fast_rcnn/test.py
Change line 129: hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
to: hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v).astype(int)
- /home/xxx/py-faster-rcnn/lib/rpn/proposal_target_layer.py
Change line 60: fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
to: fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(int)
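The reason these casts are needed can be reproduced outside the training code. The sketch below uses stand-in values (0.25 and 128 are illustrative, not read from cfg): np.round returns a floating-point value even when the result is a whole number, and recent NumPy versions refuse a float as a sample size.

```python
import numpy as np

FG_FRACTION = 0.25     # stand-in for cfg.TRAIN.FG_FRACTION (illustrative)
rois_per_image = 128   # stand-in for the per-image ROI count (illustrative)

fg_float = np.round(FG_FRACTION * rois_per_image)  # still a float
fg_int = int(fg_float)                             # explicit integer cast

candidates = np.arange(100)
try:
    # On recent NumPy versions a float size raises TypeError.
    np.random.choice(candidates, size=fg_float, replace=False)
except TypeError as err:
    print('float size rejected:', err)

# With the integer size the sampling works as intended.
sample = np.random.choice(candidates, size=fg_int, replace=False)
print(type(fg_float).__name__, type(fg_int).__name__, sample.shape)
```

For a scalar, `int(...)` and `.astype(int)` have the same effect. The array cases (the hashes lines) need `.astype(int)`.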
5. After the changes in step 4, running again raised the following error (excerpt):
File "/home/ard/py-faster-rcnn-ori/tools/../lib/roi_data_layer/minibatch.py", line 177, in _get_bbox_regression_labels
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
TypeError: slice indices must be integers or None or have an __index__ method
Fix:
Modify /home/XXX/py-faster-rcnn/lib/rpn/proposal_target_layer.py, around line 123 (the traceback above points at the same loop in lib/roi_data_layer/minibatch.py, which needs the same change):
    for ind in inds:
        cls = clss[ind]
        start = 4 * cls
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
Here cls is read from a floating-point array, so start and end are NumPy floats rather than integers, and floats cannot be used as slice indices. They must be cast explicitly (casting ind as well is harmless):
    for ind in inds:
        ind = int(ind)
        cls = clss[ind]
        start = int(4 * cls)
        end = int(start + 4)
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
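To check the fixed loop in isolation, here is a self-contained sketch of the same logic. It is a simplified stand-in, not the repository function: the weights are passed in as the `inside_weights` argument instead of being read from cfg.TRAIN.BBOX_INSIDE_WEIGHTS, and the sample data is invented.

```python
import numpy as np


def get_bbox_regression_labels(bbox_target_data, num_classes, inside_weights):
    """Expand (class, dx, dy, dw, dh) rows into per-class target columns."""
    clss = bbox_target_data[:, 0]          # class labels, stored as floats
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    for ind in np.where(clss > 0)[0]:      # skip background rows (class 0)
        start = int(4 * clss[ind])         # cast: 4 * float is still a float
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = inside_weights
    return bbox_targets, bbox_inside_weights


# Invented example: one background row and one row of class 1.
data = np.array([[0, 0.0, 0.0, 0.0, 0.0],
                 [1, 0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
targets, weights = get_bbox_regression_labels(data, 2, (1.0, 1.0, 1.0, 1.0))
print(targets.shape)
```

Removing the `int(...)` cast in this sketch reproduces the slice-index TypeError on current NumPy versions.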
6. Error:
IndentationError: unindent does not match any outer indentation level
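This is caused by incorrect indentation. Go to the reported line of the reported file and check:
(1) whether some lines are indented with tabs and others with spaces; they must be consistent;
(2) if spaces are used, whether the number of spaces matches the surrounding block.
Reference: https://www.cnblogs.com/heimanba/p/3783022.html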
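The first check can be automated. Below is a minimal sketch with a hypothetical helper name, `find_mixed_indentation`. It flags lines whose indentation mixes tabs and spaces, or that start with a different whitespace character than the first indented line. It does not detect a wrong number of spaces.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines with suspicious leading whitespace."""
    flagged = []
    first_char = None  # whitespace character used by the first indented line
    for number, line in enumerate(source.splitlines(), 1):
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if not indent or not line.strip():
            continue  # unindented or blank line
        if first_char is None:
            first_char = indent[0]
        if ' ' in indent and '\t' in indent:
            flagged.append(number)   # tabs and spaces on the same line
        elif indent[0] != first_char:
            flagged.append(number)   # differs from the file's first style
    return flagged

# Example call (the path is a placeholder):
# with open('lib/datasets/pascal_voc.py') as f:
#     print(find_mixed_indentation(f.read()))
```

Python's standard library also ships the tabnanny module, run as `python -m tabnanny <file>`, which reports ambiguous indentation.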
7. Error:
IOError: [Errno 2] No such file or directory: '/home/wangzhan/py-faster-rcnn-master/data/VOCdevkit2007/results/VOC2007/Main/comp4_e3ae962b-98ad-418e-a396-bc6fa4d1d62f_det_test_kiss.txt'
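Cause: there is no results/VOC2007/Main/ folder under data/VOCdevkit2007/, so creating those nested directories under data/VOCdevkit2007 fixes it. (This seems to happen only with stage-wise alt-opt training; end2end training does not seem to have the problem.)
Reference: https://www.cnblogs.com/zhengmeisong/p/9102059.html
8. Error: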
smooth_L1_loss_layer.cpp:28] Check failed: bottom[0]->channels() == bottom[1]->channels() (4 vs. 8)
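This error is raised while Creating Layer loss_bbox. The failed check is bottom[0]->channels() == bottom[1]->channels(), so the layer's two inputs most likely have mismatched channel counts. Posts online attribute it to num_classes and the num_output values of cls_score and bbox_pred not being set correctly in train.prototxt (reference: here). A careful look at train.prototxt did turn up one mistake: a value that should have been 8 = 4 × 2 (four box coordinates for each of the 2 classes, background included) had been written as 4. Correcting it fixes the error.
After that change, training ran without problems.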
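The relationship between the class count and the two num_output values is simple enough to compute before editing the prototxt. This is a small sketch with a hypothetical helper name, assuming the usual convention that num_classes counts the background as one extra class.

```python
def expected_num_outputs(num_foreground_classes):
    """Return (cls_score num_output, bbox_pred num_output)."""
    num_classes = num_foreground_classes + 1  # +1 for the background class
    return num_classes, 4 * num_classes       # 4 box coordinates per class


print(expected_num_outputs(1))   # one object class, as in the error above
print(expected_num_outputs(20))  # the 20 PASCAL VOC classes
```

For a single object class this gives 2 and 8, matching the 8 channels named in the error message.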
V. Testing with the trained model
1. Run the model on the test-set images, mainly to see the resulting mAP.
Command:
./experiments/scripts/faster_rcnn_end2end_test.sh 0 VGG16 pascal_voc
2. Run the model on a few demo images; see the demo section of this post.
Command:
python tools/demo.py