Training faster-rcnn (Python version) on the KITTI dataset with CUDA 8.0

Article link: https://blog.csdn.net/flztiii/article/details/73881954

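Preface

I'm a complete beginner and this is my first blog post, so I'm feeling the pressure. I've recently been working on vehicle detection, so I wanted to see how faster-rcnn performs on the KITTI data. It turned out there were countless pitfalls along the way (mostly caused by differences in environment). To save others from running into the same problems, I decided to write down my whole process for reference. This post draws on many blog posts by more experienced people; I will give the links at the end for anyone interested.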

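Building faster-rcnn

I won't go into the faster-rcnn build process itself, since plenty of material is available online. Note that I use the Python version of faster-rcnn (py-faster-rcnn); I have not tried the MATLAB version. What I will cover is how to deal with faster-rcnn's incompatibility with CUDA 8.0. I tried many of the fixes suggested online, and several of them did not work, which cost me a lot of time. Here is one that did. The error is: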

too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’ pad_h, pad_w, stride_h, stride_w));

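The message shows a cudnnSetPooling2dDescriptor that expects an extra cudnnNanPropagation_t argument, which the Caffe bundled with faster-rcnn does not pass. The fix I used is:

1. Replace ./include/caffe/util/cudnn.hpp with the cudnn.hpp from the latest Caffe.
2. Replace every file in ./include/caffe/layers whose name starts with cudnn (for example cudnn_conv_layer.hpp) with the file of the same name from the latest Caffe.
3. Replace every file in ./src/caffe/layers whose name starts with cudnn (for example cudnn_lrn_layer.cu, cudnn_pooling_layer.cpp, cudnn_sigmoid_layer.cu) with the file of the same name from the latest Caffe.

After that, faster-rcnn builds successfully following the usual instructions found online.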

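Preparing the dataset

From the KITTI download page you only need the first image set (12 GB) and the label files. faster-rcnn expects data in VOC format, so the KITTI data has to be converted. I'll first describe the VOC layout briefly so that the conversion scripts are easier to follow.

Taking VOC2007 as an example, it contains three folders:

1. JPEGImages holds all the training images.

2. ImageSets has several subfolders (Main, Layout, Segmentation). I only care about detection (the VOC data can also be used for other tasks), so only Main matters. It contains txt files that list which images belong to the train set and which to the val set.

3. Annotations holds the xml files. Each xml file gives the positions of the bounding boxes in the corresponding image, along with their classes. An xml file looks like this: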

<?xml version="1.0" ?>
<annotation>
	<folder>VOC2007</folder>					//folder
	<filename>000012.jpg</filename>					//name of the image this xml file describes
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
	</source>
	<size>								//image size, 1242x375
		<width>1242</width>
		<height>375</height>
		<depth>3</depth>
	</size>
	<object>							//an annotated object in the image
		<name>car</name>					//class of the object
		<difficult>0</difficult>
		<bndbox>						//bounding box of the object
		<xmin>662</xmin>
		<ymin>185</ymin>
		<xmax>690</xmax>
		<ymax>205</ymax>
		</bndbox>
	</object>
	<object>
		<name>car</name>
		<difficult>0</difficult>
		<bndbox>
		<xmin>448</xmin>
		<ymin>177</ymin>
		<xmax>481</xmax>
		<ymax>206</ymax>
		</bndbox>
	</object>
</annotation>
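
To make the structure above concrete, here is a minimal sketch that reads the objects back out of such an annotation using only the Python standard library. The function name read_voc_objects and the shortened sample string are my own illustration, not part of faster-rcnn.

```python
import xml.etree.ElementTree as ET

def read_voc_objects(xml_text):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples from a VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        coords = tuple(int(box.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((obj.find('name').text,) + coords)
    return objects

# Shortened annotation with the same structure as the example above
# (hypothetical sample, without the explanatory // notes)
sample = """<annotation>
  <filename>000012.jpg</filename>
  <object>
    <name>car</name>
    <difficult>0</difficult>
    <bndbox><xmin>662</xmin><ymin>185</ymin><xmax>690</xmax><ymax>205</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_objects(sample))
```

Note that the // notes in the listing above are explanations only; a real annotation file must not contain them, or the XML parser will treat them as text.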

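Now let's look at the KITTI format. The KITTI download consists of two archives: one contains the images, all traffic scenes, and the other contains the labels as txt files. A txt file looks like this: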

car 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56
car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
pedestrian 0.00 3 -1.65 676.60 163.95 688.98 193.93 1.86 0.60 2.02 4.59 1.32 45.84 -1.55

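Each line describes one object. The first field is the class, and fields 5 to 8 are the bounding box in pixels (left, top, right, bottom). The remaining fields (truncation, occlusion, observation angle, 3D dimensions, 3D location and rotation) are not needed here.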
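
As a small illustration of that layout, the following sketch splits one label line into the fields the conversion needs. The function name parse_kitti_line and the dictionary keys are my own choices for this example.

```python
def parse_kitti_line(line):
    """Split one KITTI label line into the fields used for 2D detection."""
    fields = line.split()
    return {
        'type': fields[0],                # object class
        'truncated': float(fields[1]),
        'occluded': int(fields[2]),
        'alpha': float(fields[3]),        # observation angle
        # 2D box in pixels: left, top, right, bottom
        'bbox': tuple(float(v) for v in fields[4:8]),
    }

obj = parse_kitti_line(
    'car 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56')
print(obj['type'], obj['bbox'])
```

Only the class and the four bbox values end up in the VOC xml file; everything after field 8 describes the 3D box and is dropped.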

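With both formats understood, here is how to convert the KITTI data to VOC format.

First, KITTI images are png while VOC uses jpg, so convert the images with any image conversion tool and put the jpg files in a JPEGImages folder (create it yourself).

Next, since I only need the car and pedestrian classes, I merged the other KITTI classes, such as trucks, into these two. The code is as follows: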

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# modify_annotations_txt.py
import glob

txt_list = glob.glob('./Labels/*.txt') # paths of all txt files in the Labels folder
def show_category(txt_list):
    category_list= []
    for item in txt_list:
        try:
            with open(item) as tdf:
                for each_line in tdf:
                    labeldata = each_line.strip().split(' ') # strip surrounding whitespace and split into fields
                    category_list.append(labeldata[0]) # keep only the first field, the class
        except IOError as ioerr:
            print('File error:'+str(ioerr))
    print(set(category_list)) # print the set of classes found

def merge(line):
    # join the fields with single spaces; no trailing space before the newline
    return ' '.join(line) + '\n'

print('before modify categories are:\n')
show_category(txt_list)

for item in txt_list:
    new_txt=[]
    try:
        with open(item, 'r') as r_tdf:
            for each_line in r_tdf:
                labeldata = each_line.strip().split(' ')
                if labeldata[0] in ['Truck','Van','Tram','Car']: # merge into the car class
                    labeldata[0] = 'car'
                if labeldata[0] in ['Person_sitting','Cyclist','Pedestrian']: # merge into the pedestrian class
                    labeldata[0] = 'pedestrian'
                if labeldata[0] == 'DontCare': # drop the DontCare class
                    continue
                if labeldata[0] == 'Misc': # drop the Misc class
                    continue
                new_txt.append(merge(labeldata)) # collect the line for the rewritten txt file
        with open(item,'w+') as w_tdf: # w+ truncates the original file before the new content is written
            for temp in new_txt:
                w_tdf.write(temp)
    except IOError as ioerr:
        print('File error:'+str(ioerr))

print('\nafter modify categories are:\n')
show_category(txt_list)

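Run this program from the directory that contains the KITTI Labels folder. It merges the classes in Labels so that only car and pedestrian remain (I use lowercase names to avoid errors during faster-rcnn training). The txt files then need to be converted to xml. Create an Annotations folder in the same directory and run the following: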

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# txt_to_xml.py
# Build a VOC-style XML annotation from scratch, as a DOM tree, for each KITTI label file
from xml.dom.minidom import Document
import cv2
import os

def add_text_element(doc, parent, tag, text):
    # create <tag>text</tag> and attach it to parent
    node = doc.createElement(tag)
    node.appendChild(doc.createTextNode(str(text)))
    parent.appendChild(node)
    return node

def generate_xml(name, split_lines, img_size, class_ind):
    doc = Document()  # create the DOM document object

    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)

    add_text_element(doc, annotation, 'folder', 'VOC2007')  # folder name changed to VOC2007
    add_text_element(doc, annotation, 'filename', name + '.jpg')  # must be the jpg name

    source = doc.createElement('source')
    annotation.appendChild(source)
    add_text_element(doc, source, 'database', 'The VOC2007 Database')  # changed to VOC
    add_text_element(doc, source, 'annotation', 'PASCAL VOC2007')  # changed to VOC

    # img_size is the cv2 shape tuple: (height, width, depth)
    size = doc.createElement('size')
    annotation.appendChild(size)
    add_text_element(doc, size, 'width', img_size[1])
    add_text_element(doc, size, 'height', img_size[0])
    add_text_element(doc, size, 'depth', img_size[2])

    for split_line in split_lines:
        line = split_line.strip().split()
        if line and line[0] in class_ind:
            obj = doc.createElement('object')
            annotation.appendChild(obj)
            add_text_element(doc, obj, 'name', line[0])
            add_text_element(doc, obj, 'difficult', '0')

            bndbox = doc.createElement('bndbox')
            obj.appendChild(bndbox)
            # KITTI fields 4-7 are left, top, right, bottom; truncate them to integers
            for tag, value in zip(('xmin', 'ymin', 'xmax', 'ymax'), line[4:8]):
                add_text_element(doc, bndbox, tag, int(float(value)))

    # write the DOM object doc to a file
    with open('Annotations/' + name + '.xml', 'w') as f:
        f.write(doc.toprettyxml(indent=''))

if __name__ == '__main__':
    class_ind = ('pedestrian', 'car')  # reduced to two classes
    cur_dir = os.getcwd()
    labels_dir = os.path.join(cur_dir, 'Labels')
    for parent, dirnames, filenames in os.walk(labels_dir):  # root directory, subdirectories, files in the root
        for file_name in filenames:
            full_path = os.path.join(parent, file_name)  # full path of the label file
            with open(full_path) as f:
                split_lines = f.readlines()
            name = file_name[:-4]  # drop the 4-character extension .txt
            img_name = name + '.jpg'
            img_path = os.path.join('/home/iair339-04/data/KITTIdevkit/KITTI/JPEGImages', img_name)  # change this path to your own
            img = cv2.imread(img_path)
            if img is None:  # cv2.imread returns None when the image cannot be read
                print('cannot read image: ' + img_path)
                continue
            generate_xml(name, split_lines, img.shape, class_ind)
    print('all txts have been converted into xmls')

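Run the program from the directory that contains Labels, and the xml files will be generated in the Annotations folder. Next, create an ImageSets folder in the same directory, with Main, Layout and Segmentation subfolders, and run the following (with python3; when the pdb prompt appears during execution, press c and then Enter):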

# create_train_test_txt.py
# encoding:utf-8
import pdb
import glob
import os
import random
import math

def get_sample_value(txt_name, category_name):
    label_path = './Labels/'
    txt_path = label_path + txt_name+'.txt'
    try:
        with open(txt_path) as r_tdf:
            if category_name in r_tdf.read():
                return ' 1'
            else:
                return '-1'
    except IOError as ioerr:
        print('File error:'+str(ioerr))

txt_list_path = glob.glob('./Labels/*.txt')
txt_list = []

for item in txt_list_path:
    temp1,temp2 = os.path.splitext(os.path.basename(item))
    txt_list.append(temp1)
txt_list.sort()
print(txt_list, end = '\n\n')

# Some blogs suggest train:val:test = 8:1:1, so try that first
num_trainval = random.sample(txt_list, math.floor(len(txt_list)*9/10.0)) # adjust the ratio here
num_trainval.sort()
print(num_trainval, end = '\n\n')

num_train = random.sample(num_trainval,math.floor(len(num_trainval)*8/9.0)) # adjust the ratio here
num_train.sort()
print(num_train, end = '\n\n')

num_val = list(set(num_trainval).difference(set(num_train)))
num_val.sort()
print(num_val, end = '\n\n')

num_test = list(set(txt_list).difference(set(num_trainval)))
num_test.sort()
print(num_test, end = '\n\n')

pdb.set_trace()

Main_path = './ImageSets/Main/'
train_test_name = ['trainval','train','val','test']
# map each split name to its list of sample names (replaces the eval() lookup)
split_lists = {'trainval': num_trainval, 'train': num_train, 'val': num_val, 'test': num_test}
category_name = ['car','pedestrian'] # classes; lowercase to match the merged labels

# loop over trainval, train, val, test
for item_train_test_name in train_test_name:
    samples = split_lists[item_train_test_name]
    train_test_txt_name = Main_path + item_train_test_name + '.txt'
    try:
        # write the list file for this split
        with open(train_test_txt_name, 'w') as w_tdf:
            # one sample name per line
            for item in samples:
                w_tdf.write(item+'\n')
        # write the per-class files for car and pedestrian
        for item_category_name in category_name:
            category_txt_name = Main_path + item_category_name + '_' + item_train_test_name + '.txt'
            with open(category_txt_name, 'w') as w_tdf:
                # one line per sample: name, then 1 if the class is present, else -1
                for item in samples:
                    w_tdf.write(item+' '+ get_sample_value(item, item_category_name)+'\n')
    except IOError as ioerr:
        print('File error:'+str(ioerr))

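Run the script from the directory that contains Labels to generate the txt files in Main. That completes the dataset preparation. Put the prepared Annotations, JPEGImages and ImageSets folders in the following directory: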

py-faster-rcnn/data/VOCdevkit2007/VOC2007
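
Before training, it can save time to confirm that every sample listed in ImageSets/Main actually has an image and an annotation. The following is a minimal sketch of such a check; the function name find_missing_files is my own, and it assumes only the folder layout described above.

```python
import os

def find_missing_files(voc_root, split='trainval'):
    """List the image and annotation paths that a split file refers to but that do not exist."""
    list_path = os.path.join(voc_root, 'ImageSets', 'Main', split + '.txt')
    with open(list_path) as f:
        ids = [line.strip() for line in f if line.strip()]
    missing = []
    for image_id in ids:
        for folder, ext in (('JPEGImages', '.jpg'), ('Annotations', '.xml')):
            path = os.path.join(voc_root, folder, image_id + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing

# Example call (adjust the path to your own checkout):
# print(find_missing_files('data/VOCdevkit2007/VOC2007'))
```

An empty list means the three folders are consistent for that split.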

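Training Faster R-CNN

First, a brief overview of the faster-rcnn project directory:

caffe-fast-rcnn -> the Caffe framework
data -> data, plus the cache of files that have been read
experiments -> configuration files and the log files from runs
lib -> Python interface
models -> three models, ZF(S)/VGG1024(M)/VGG16(L)
output -> where trained models are written; this folder does not exist until you train
tools -> Python scripts for training and testing

faster-rcnn offers two training methods: alternating training (alt-opt) and approximate joint training (end-to-end). I use alt-opt, which is driven by tools/train_faster_rcnn_alt_opt.py; have a look at its code if you are interested, I won't go into it here. The network I use is VGG16, which reaches higher accuracy than ZF. Training cannot start right away, though: the network definitions have to be adjusted for our two classes, car and pedestrian. The parts to change are:

1. The file /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt

Line 14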

name: "VGG_ILSVRC_16_layers"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # number of classes, including background
  }
}

Lines 428 and 451

layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 3 # number of classes, including background
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 12 # 4 x the number of classes
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

2. The file /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage1_rpn_train.pt
Line 11

name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # number of classes, including background
  }
}

3. The file /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage2_fast_rcnn_train.pt
Line 14

name: "VGG_ILSVRC_16_layers"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # number of classes, including background
  }
}

Lines 380 and 399

layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 3 # number of classes, including background
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 12 # 4 x the number of classes
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

4. The file /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage2_rpn_train.pt
Line 11

name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3"  
  }
}
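
The numbers edited in the four files all follow from the class list: num_classes and the cls_score output count the object classes plus one background class, and bbox_pred has four box coordinates per class. A minimal sketch of that arithmetic (the function name head_sizes is my own):

```python
def head_sizes(object_classes):
    """Compute the values to put in the prototxt files for a list of object classes."""
    num_classes = len(object_classes) + 1  # +1 for __background__
    return {
        'num_classes': num_classes,     # param_str of the data layers
        'cls_score': num_classes,       # num_output of cls_score
        'bbox_pred': 4 * num_classes,   # num_output of bbox_pred
    }

print(head_sizes(['car', 'pedestrian']))
```

For car and pedestrian this gives 3, 3 and 12, the values used above. With a different number of classes, all of the edited lines change together.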

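That completes the network changes. To keep training from failing to converge (in practice, training got stuck at stage 1 when I ran it later), the learning rate has to be lowered to 0.0001 (a value found by experiment).

The setting to change, in every solver file under /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt, is: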

base_lr: 0.0001
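
Editing several solver files by hand is easy to get wrong, so here is a minimal sketch that rewrites the base_lr line in each of them. The function name set_base_lr and the glob pattern '*solver*.pt' are my own assumptions; check that the pattern matches the solver files in your checkout before relying on it.

```python
import glob
import os
import re

def set_base_lr(solver_dir, new_lr, pattern='*solver*.pt'):
    """Rewrite the base_lr line of every matching solver file; return the files changed."""
    changed = []
    for path in sorted(glob.glob(os.path.join(solver_dir, pattern))):
        with open(path) as f:
            text = f.read()
        # keep the "base_lr:" prefix and replace only the value after it
        new_text = re.sub(r'(?m)^([ \t]*base_lr[ \t]*:[ \t]*)\S+',
                          lambda m: m.group(1) + new_lr, text)
        if new_text != text:
            with open(path, 'w') as f:
                f.write(new_text)
            changed.append(path)
    return changed

# Example call (new_lr is passed as a string so it is written exactly as given):
# set_base_lr('models/pascal_voc/VGG16/faster_rcnn_alt_opt', '0.0001')
```

The function returns the list of files it modified, so an empty list means either nothing matched or the value was already set.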

Next, run /py-faster-rcnn/data/scripts/fetch_imagenet_models.sh to download the ImageNet caffemodel files, because RPN training starts from ImageNet-pretrained weights. Then edit line 31 of py-faster-rcnn/lib/datasets/pascal_voc.py to list your own classes:

        self._classes = ('__background__', # always index 0
                         'car', 'pedestrian')

In py-faster-rcnn/lib/datasets/imdb.py, change the append_flipped_images function at line 102 to:

    def append_flipped_images(self):
        num_images = self.num_images
        widths = [PIL.Image.open(self.image_path_at(i)).size[0]  
                  for i in xrange(num_images)] 
        for i in xrange(num_images):
            boxes = self.roidb[i]['boxes'].copy()
            oldx1 = boxes[:, 0].copy()
            oldx2 = boxes[:, 2].copy()
            boxes[:, 0] = widths[i] - oldx2 - 1
            boxes[:, 2] = widths[i] - oldx1 - 1
            assert (boxes[:, 2] >= boxes[:, 0]).all()
            entry = {'boxes' : boxes,
                     'gt_overlaps' : self.roidb[i]['gt_overlaps'],
                     'gt_classes' : self.roidb[i]['gt_classes'],
                     'flipped' : True}
            self.roidb.append(entry)
        self._image_index = self._image_index * 2

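That completes the code changes. If you would rather not debug as you go, though, read the troubleshooting section of this post first and make those changes too before training.
Next, how to change the training hyperparameters (the learning rate was covered above). Most hyperparameters are set in the solver files under /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt. The exception is the number of iterations, which is set in /py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py:

Line 80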

max_iters = [120000, 80000, 120000, 80000]

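These are the iteration counts for RPN stage 1, Fast R-CNN stage 1, RPN stage 2 and Fast R-CNN stage 2 respectively. Note that each value must not be smaller than the stepsize in the corresponding solver file. I suggest lowering the iteration counts by two orders of magnitude at first, so that you don't wait long while tracking down errors.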
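
The constraint between the iteration counts and the solver stepsize is easy to overlook when shrinking the run, so here is a minimal sketch of the check. The stepsize numbers below are placeholders I made up for illustration; read the real ones from your own solver files. The names STAGES and stages_below_stepsize are also my own.

```python
STAGES = ['stage1_rpn', 'stage1_fast_rcnn', 'stage2_rpn', 'stage2_fast_rcnn']

def stages_below_stepsize(max_iters, stepsizes):
    """Return the stages whose iteration count is smaller than the solver stepsize."""
    return [stage for stage, iters, step in zip(STAGES, max_iters, stepsizes)
            if iters < step]

max_iters = [120000, 80000, 120000, 80000]
stepsizes = [60000, 30000, 60000, 30000]  # placeholder values, not taken from the post

# Reduce every count by two orders of magnitude for a quick debugging run
quick_iters = [n // 100 for n in max_iters]
print(quick_iters)
print(stages_below_stepsize(max_iters, stepsizes))
print(stages_below_stepsize(quick_iters, stepsizes))
```

With these placeholder stepsizes, the full-length run passes the check while the shortened run fails it for all four stages, so for a quick run the stepsize in each solver file would have to be lowered as well.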
Then open a terminal in the py-faster-rcnn directory and run:

./experiments/scripts/faster_rcnn_alt_opt.sh 0 VGG16 pascal_voc

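Training starts. If an output folder appears under py-faster-rcnn and it contains the final caffemodel, training succeeded.

Errors and problems encountered during training

Below are some of the problems I ran into during training and how I solved them. I hope they help.

Problem 1:

After RPN stage 1 finishes, Fast R-CNN stage 1 training does not start.

This is caused by a learning rate that is too large; lowering it fixes the problem.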

Problem 2:

pb2.text_format.Merge(f.read(), self.solver_param) AttributeError: 'module' object has no attribute 'text_format'

This is caused by the wrong protobuf version. The fix is pip install protobuf==2.6.0

Problem 3:

Training faster rcnn fails with the following error:
File "/py-faster-rcnn/tools/../lib/datasets/imdb.py", line 108, in append_flipped_images
assert (boxes[:, 2] >= boxes[:, 0]).all()
AssertionError

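This happens because faster rcnn subtracts 1 from Xmin, Ymin, Xmax and Ymax. The boxes are stored as unsigned 16-bit integers, so if Xmin is 0 it becomes 65535 after the subtraction.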
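
The wraparound itself can be reproduced without numpy. The sketch below uses ctypes.c_uint16 from the standard library as a stand-in for an unsigned 16-bit array element; the helper name as_uint16 is my own, and this only illustrates the arithmetic, not the faster-rcnn code path.

```python
import ctypes

def as_uint16(value):
    """Store value in an unsigned 16-bit integer and read it back."""
    return ctypes.c_uint16(value).value

print(as_uint16(0 - 1))  # an xmin of 0 after the "- 1" step
print(as_uint16(1 - 1))  # an xmin of 1 after the "- 1" step
```

The first line prints 65535 and the second prints 0, which is why only boxes that touch the left or top edge of the image are affected.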

The fix is as follows.

Edit line 61 of /py-faster-rcnn/lib/fast_rcnn/config.py so that images are not flipped:
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = False

Problem 4:

TypeError: 'numpy.float64' object cannot be interpreted as an index

This error comes from the npr.choice calls in minibatch.py under /py-faster-rcnn/lib/roi_data_layer (lines 98 to 116), which need to be changed as follows:

    if fg_inds.size > 0:
        for i in range(0,len(fg_inds)):
            fg_inds[i] = int(fg_inds[i])
        fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_this_image), replace=False)


    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    bg_rois_per_this_image = np.minimum(bg_rois_per_this_image,
                                        bg_inds.size)
    # Sample background regions without replacement
    if bg_inds.size > 0:
        for i in range(0,len(bg_inds)):
            bg_inds[i] = int(bg_inds[i])
        bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_this_image), replace=False)

Note that there are two npr.choice calls, and both must be changed as shown above.

Problem 5:

labels[fg_rois_per_this_image:] = 0
TypeError: slice indices must be integers or None or have an __index__ method

This error is caused by the numpy version. Casting fg_rois_per_this_image to int fixes it:
labels[int(fg_rois_per_this_image):] = 0

Problem 6:

bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
TypeError: slice indices must be integers or None or have an __index__ method

Fix: edit /py-faster-rcnn/lib/rpn/proposal_target_layer.py and go to line 123:

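    for ind in inds:
        cls = clss[ind]
        start = 4 * cls
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights

Here ind, start and end are numpy scalars rather than Python ints, and start and end inherit a floating-point type from cls, which newer numpy versions reject as slice indices. They therefore have to be cast explicitly: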

    for ind in inds:
        ind = int(ind)
        cls = clss[ind]
        start = int(4 * cls)
        end = int(start + 4)
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
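
The same TypeError can be reproduced with a plain Python list, which makes the fix easy to try in isolation. This is a minimal sketch: the list row stands in for one row of bbox_targets with three classes, and the helper name class_columns is my own.

```python
row = list(range(12))  # stand-in for one row of bbox_targets (3 classes x 4 values)
cls = 2.0              # class labels arrive as floating-point values

def class_columns(cls):
    """Return integer start and end columns for the 4 box values of a class."""
    start = int(4 * cls)
    return start, start + 4

try:
    row[4 * cls:4 * cls + 4]  # float slice bounds, as in the unpatched code
    float_slice_ok = True
except TypeError as err:
    float_slice_ok = False
    print('float slice rejected:', err)

start, end = class_columns(cls)
print(row[start:end])
```

The float bounds are rejected, while the integer bounds select columns 8 to 11, the four values belonging to class 2.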

Problem 7:

/home/iair339-04/py-faster-rcnn/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]

Fix

Edit lines 204-207 of /py-faster-rcnn/lib/datasets/pascal_voc.py so that the coordinates are read without the - 1 offset. The result is:

            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text) 
            y2 = float(bbox.find('ymax').text)

Testing Faster R-CNN

Next come the code changes for testing. I adapted demo.py in tools to test the model. First, edit the test model definition:

The file /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt

Lines 392 and 401

layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  inner_product_param {
    num_output: 3 # number of classes, including background
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  inner_product_param {
    num_output: 12 # 4 x the number of classes
  }
}

Next, put the trained final caffemodel in /py-faster-rcnn/data/faster_rcnn_models and the test images in /py-faster-rcnn/data/demo. Then edit /py-faster-rcnn/tools/demo.py as follows:

Line 27: change the classes

CLASSES = ('__background__',
           'car', 'pedestrian') # change the classes here

Line 31: change the model name to the name of your final caffemodel

NETS = {'vgg16': ('VGG16',
                  'kitti4.caffemodel'), # change the model name
        'zf': ('ZF',
                  'ZF_faster_rcnn_final.caffemodel')}

Line 141: change the test image names

    im_names = ['1348.png','1562.png','4714.png','5509.png','5512.png','5861.png','12576.png','12924.png',
		'22622.png','23873.png','2726.png','3173.png','8125.png','8853.png','9283.png','11714.png','24424.png',
		'25201.png','25853.png','27651.png']

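Then run demo.py to test. I did not combine the pedestrian and vehicle detections onto the same image; look for related material online if you are interested. Below are detection results from the model I trained (the average test time per image is 0.1 s).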