Reproducing Faster R-CNN with TensorFlow, Part 1: Processing the PASCAL VOC Dataset

Latest recommended article published on 2021-10-30 10:05:30

QUIPY

Article link: https://blog.csdn.net/weixin_40446651/article/details/85245672

Update: the project is now open source on GitHub: https://github.com/LongJun123456/Faster-rcnn-tensorflow

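Faster R-CNN is a classic two-stage detection network, and its anchor concept has been reused in many later detectors; the bounding-box priors in YOLO, for example, borrow from the anchor idea. Faster R-CNN's performance still holds up reasonably well today, and in many application areas people adapt its architecture to fit their own scenario. So whether you are learning or doing engineering work, it is worth understanding thoroughly both the ideas in the Faster R-CNN paper and how they are actually implemented, rather than stopping at simple hyperparameter tuning.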

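Over the coming weeks I will walk through, with concrete code, how to reproduce Faster R-CNN step by step from scratch. Common problems met along the way will also be mentioned, together with a discussion of why they occur. I split the reproduction into the following parts: processing the pascal_voc dataset, building and processing anchors, building and training the RPN, building and training the detection network, and testing and evaluating the whole network. The complete code will be shared at the end. This first article covers the data preparation needed before training, that is, processing the pascal_voc dataset.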

The Pascal_voc dataset

pascal_voc is a standard dataset in object detection, and detection networks are commonly benchmarked on it. Here is a brief introduction, starting with where to download it:

pascal_voc dataset download page

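You can download either the 2007 or the 2012 release. The 2012 release carries richer information and more images but also takes more disk space: its training and test sets total about 3.7 GB, whereas the 2007 release is smaller, with training and test sets adding up to roughly 1 GB.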

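After downloading and extracting the archive, we mainly use three of its folders: Annotations, ImageSets, and JPEGImages. The two Segmentation folders are meant for segmentation tasks and are not needed here.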

The contents of these three folders are described next.

JPEGImages

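The JPEGImages folder holds the dataset's images, all in JPEG format. pascal_voc distinguishes 20 object classes, for example person, cat, dog, car, and bicycle, and every image contains at least one object from one of these classes.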

Each image's ID, which is also its file name, is a six-digit number such as 000030 or 000032.

Annotations

Annotations contains a set of XML files, one for each image in JPEGImages.

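Each XML file records the details of one image: the folder it is stored in, the file name, the path, the image width and height, and the class and coordinates of every object in the image.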

Parsing these XML files is the most important step in processing pascal_voc; how to do it is explained in detail, with code, later in this article.

ImageSets

The ImageSets folder holds a number of plain-text .txt files.

Each txt file lists the names of the images that belong to one split; train.txt, for example, lists the images of the training set.

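The complete pascal_voc processing code is listed at the end of this article. The whole processing pipeline is wrapped in a single class for the main program to call.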

Code walkthrough

class pascal_voc(object):
    def __init__(self, phase, rebuild=False):
        self.devkil_path = os.path.join(cfg.PASCAL_PATH, 'VOCdevkit')   # path to VOCdevkit
        self.data_path = os.path.join(self.devkil_path, 'VOC2007')  # path to pascal_voc 2007
        self.cache_path = cfg.CACHE_PATH    # cache directory
        self.batch_size = cfg.BATCH_SIZE    # batch size
        self.target_size = cfg.target_size  # target length of the shorter image side
        self.max_size = cfg.max_size    # maximum length of the longer image side
        self.classes = cfg.CLASSES  # class names ['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus'....]
        self.pixel_means = cfg.PIXEL_MEANS  # pixel means subtracted from every image
        self.class_to_ind = dict(zip(self.classes, range(len(self.classes))))   # class name -> integer label
        self.flipped = cfg.FLIPPED  # whether to add horizontally flipped images
        self.phase = phase  # name of the image set
        self.rebuild = rebuild   # whether to rebuild the cache
        self.cursor = 0    # current position in gt_labels
        self.epoch = 1     # current epoch
        #self.gt_labels = None
        self.prepare()
        self.num_gtlabels = len(self.gt_labels)

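First comes the class constructor, which mostly initializes the instance from the settings in config.py, including the pascal paths and the minimum and maximum image sizes. It then calls self.prepare() to process the dataset.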
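config.py itself is not listed in this article, so here is a minimal sketch of what such a module might look like. Only the attribute names are taken from the constructor; every value is an assumption for illustration (600 and 1000 follow the comment on prep_im_for_blob, while the paths and pixel means are placeholders), so check the project's real config.py before relying on any of them.

```python
# Hypothetical config values; only the attribute names come from the article.
import os

PASCAL_PATH = os.path.join('data', 'pascal_voc')   # assumed location of VOCdevkit
CACHE_PATH = os.path.join('data', 'cache')         # assumed cache directory
BATCH_SIZE = 1          # get() asserts that the batch size is 1
target_size = 600       # target length of the shorter image side
max_size = 1000         # upper bound for the longer image side
FLIPPED = True          # append horizontally flipped copies for training
PIXEL_MEANS = [[[122.7717, 115.9465, 102.9801]]]   # placeholder per-channel means

# 'background' plus the 20 PASCAL VOC object classes.
CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
           'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
           'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
           'train', 'tvmonitor']

# Same construction as in __init__: class name -> integer label.
class_to_ind = dict(zip(CLASSES, range(len(CLASSES))))
```

With this ordering, 'background' maps to 0 and the 20 object classes map to 1 through 20.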

The self.prepare() and self.load_labels() functions

    def prepare(self):
        gt_labels = self.load_labels()  # list of dicts with keys such as 'boxes', 'gt_classs', 'imname'
        if self.flipped:
            print('Appending horizontally-flipped training examples ...')
            gt_labels_cp = copy.deepcopy(gt_labels) # deep copy, so the original entries are left untouched
            for idx in range(len(gt_labels_cp)):
                gt_labels_cp[idx]['flipped'] = True
                width_pre = copy.deepcopy(gt_labels_cp[idx]['boxes'][:,0])
                # boxes are 0-based, so column x moves to width - 1 - x after flipping
                gt_labels_cp[idx]['boxes'][:,0] = gt_labels_cp[idx]['image_size'][0] - gt_labels_cp[idx]['boxes'][:,2] - 1
                gt_labels_cp[idx]['boxes'][:,2] = gt_labels_cp[idx]['image_size'][0] - width_pre - 1
            gt_labels += gt_labels_cp
        if self.phase == 'train':
            np.random.shuffle(gt_labels)
        self.gt_labels = gt_labels


    def load_labels(self):
        cache_file = os.path.join(
            self.cache_path, 'pascal_' + self.phase + '_gt_labels.pkl')

        if os.path.isfile(cache_file) and not self.rebuild:
            print('Loading gt_labels from: ' + cache_file)
            with open(cache_file, 'rb') as f:
                gt_labels = pickle.load(f)  # deserialize the object stored in the .pkl file
            return gt_labels

        print('Processing gt_labels from: ' + self.data_path)

        if not os.path.exists(self.cache_path):
            os.makedirs(self.cache_path)

        if self.phase == 'train':
            txtname = os.path.join(
                self.data_path, 'ImageSets', 'Main', 'trainval.txt')
        else:
            txtname = os.path.join(
                self.data_path, 'ImageSets', 'Main', 'val.txt')
            self.flipped = False
        with open(txtname, 'r') as f:
            self.image_index = [x.strip() for x in f.readlines()]

        gt_labels = []
        for index in self.image_index:
            gt_label = self.load_pascal_annotation(index) # dict with the box coordinates and class labels of one image
            gt_labels.append(gt_label)
        print('Saving gt_labels to: ' + cache_file)
        with open(cache_file, 'wb') as f:
            pickle.dump(gt_labels, f)
        return gt_labels

self.prepare() first calls self.load_labels(), which returns a list of dicts. Each dict describes one image: its file name, the class and coordinates of each object in it, whether it is flipped, the image size, and the image's index.

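If flipping is enabled for data augmentation, that is, if self.flipped is set to True, a horizontally flipped copy of every entry is appended, with the x-coordinates of each box mirrored across the image width.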
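To see where a box ends up after a horizontal flip, the small check below flips a toy array and compares the result with mirrored box coordinates. It is only a sketch: flip_boxes is a hypothetical helper, and it assumes boxes are 0-based and inclusive, in which case reversing the columns of an image of width W sends column x to W - 1 - x.

```python
import numpy as np

def flip_boxes(boxes, width):
    """Mirror 0-based inclusive [x1, y1, x2, y2] boxes horizontally."""
    flipped = boxes.copy()
    flipped[:, 0] = width - 1 - boxes[:, 2]   # new x1 comes from old x2
    flipped[:, 2] = width - 1 - boxes[:, 0]   # new x2 comes from old x1
    return flipped

# Toy single-channel "image": 4 rows, 10 columns, object in columns 1..3.
image = np.zeros((4, 10), dtype=np.float32)
image[1:3, 1:4] = 1.0
boxes = np.array([[1, 1, 3, 2]], dtype=np.float32)

flipped_image = image[:, ::-1]                 # same slicing idea as image_read()
cols = np.where(flipped_image.any(axis=0))[0]  # columns that contain the object
flipped = flip_boxes(boxes, width=image.shape[1])
```

The columns occupied by the object in flipped_image and the x-range of the mirrored box agree, which is the property the augmented labels need.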

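Finally, for the training phase, np.random.shuffle(gt_labels) shuffles the list so that samples are not always seen in the same order, which helps guard against overfitting. The list is then stored in self.gt_labels for the main program to use.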

self.load_labels() loads the ground-truth information. It first checks whether a .pkl cache file exists. On the first run there is usually no cache file, so it has to be built.
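The cache-or-build pattern can be tried in isolation. The sketch below is not the project's code: load_or_build and fake_build are hypothetical names, and a temporary directory stands in for the cache path.

```python
import os
import pickle
import tempfile

def load_or_build(cache_file, build_fn, rebuild=False):
    """Return cached data if present, otherwise build it and write the cache."""
    if os.path.isfile(cache_file) and not rebuild:
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    data = build_fn()
    cache_dir = os.path.dirname(cache_file)
    if cache_dir and not os.path.exists(cache_dir):
        os.makedirs(cache_dir)
    with open(cache_file, 'wb') as f:
        pickle.dump(data, f)
    return data

calls = []

def fake_build():
    """Stand-in for parsing every annotation; records how often it runs."""
    calls.append(1)
    return [{'imname': '000030.jpg', 'flipped': False}]

cache_file = os.path.join(tempfile.mkdtemp(), 'cache', 'pascal_train_gt_labels.pkl')
first = load_or_build(cache_file, fake_build)    # builds and writes the cache
second = load_or_build(cache_file, fake_build)   # served from the cache
```

The second call never runs fake_build, which is why a stale cache must be rebuilt explicitly (rebuild=True) after the labels or settings change.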

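To build it, the function first works out which split to use: for the 'train' phase it reads trainval.txt, and for any other phase it reads val.txt.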

with open(txtname, 'r') as f:
    self.image_index = [x.strip() for x in f.readlines()]

This reads the split's txt file under ImageSets line by line into the list self.image_index, whose elements are the image names.
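Reading a split file can be tried without the dataset. In the sketch below, read_image_index is a hypothetical helper and the file is a tiny stand-in written to a temporary directory; unlike the original one-liner, it also skips blank lines.

```python
import os
import tempfile

def read_image_index(txtname):
    """Return the image IDs listed one per line in an ImageSets/Main file."""
    with open(txtname, 'r') as f:
        return [line.strip() for line in f if line.strip()]

# Write a tiny stand-in for trainval.txt so the sketch runs without the dataset.
txtname = os.path.join(tempfile.mkdtemp(), 'trainval.txt')
with open(txtname, 'w') as f:
    f.write('000030\n000032\n\n')

image_index = read_image_index(txtname)
```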

It then iterates over the image names:

gt_labels = []
for index in self.image_index:
    gt_label = self.load_pascal_annotation(index)
    gt_labels.append(gt_label)

self.load_pascal_annotation() returns a dict holding the box coordinates and class labels of the objects recorded in the image's XML file. Each dict is appended to the list gt_labels, which is then returned.

The self.load_pascal_annotation() function

    def load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self.data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        image_size = tree.find('size')
        size_info = np.zeros((2,), dtype=np.float32)
        size_info[0] = float(image_size.find('width').text)
        size_info[1] = float(image_size.find('height').text)
        num_objs = len(objs) # number of objects
        boxes = np.zeros((num_objs, 4), dtype=np.float32) # box coordinates, shape (num_objs, 4)
        gt_classes = np.zeros((num_objs), dtype=np.int32) # ground-truth class label of each object, shape (num_objs,)
        
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self.class_to_ind[obj.find('name').text.lower().strip()] # integer label for the class name
            boxes[ix, :] = [x1, y1, x2, y2] # boxes is a numpy array of shape [num_objs, 4]
            gt_classes[ix] = cls # gt_classes is a numpy int array of shape [num_objs]
        # Built outside the loop so it is defined even when the image has no objects
        imname = os.path.join(self.data_path, 'JPEGImages', index + '.jpg')
        return {'boxes':boxes, 'gt_classs':gt_classes, 'imname':imname, 'flipped':False, 'image_size':size_info, 'image_index': index}

Inside self.load_pascal_annotation(index), the first step is to locate the XML file for the given image name:

filename = os.path.join(self.data_path, 'Annotations', index + '.xml')

The file is then parsed with Python's xml.etree.ElementTree library:

tree = ET.parse(filename)

Next, find the 'object' and 'size' child nodes. Each 'object' node holds the class and coordinates of one object in the image; the 'size' node holds the image's width and height:

objs = tree.findall('object')
image_size = tree.find('size')
size_info = np.zeros((2,), dtype=np.float32)
size_info[0] = float(image_size.find('width').text)
size_info[1] = float(image_size.find('height').text)

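Loop over all objects in the image, store each object's box coordinates in the numpy array boxes, and convert each class name from a string to its integer label before storing it in the numpy array gt_classes: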

for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # Make pixel indexes 0-based
    x1 = float(bbox.find('xmin').text) - 1
    y1 = float(bbox.find('ymin').text) - 1
    x2 = float(bbox.find('xmax').text) - 1
    y2 = float(bbox.find('ymax').text) - 1
    cls = self.class_to_ind[obj.find('name').text.lower().strip()] # integer label for the class name
    boxes[ix, :] = [x1, y1, x2, y2] # boxes is a numpy array of shape [num_objs, 4]
    gt_classes[ix] = cls # gt_classes is a numpy int array of shape [num_objs]
# The image path is needed once per image, not once per object
imname = os.path.join(self.data_path, 'JPEGImages', index + '.jpg')

Finally, the box coordinates (boxes), class labels (gt_classes), image size (size_info), image path (imname), and image index (index) are packed into a dict and returned.
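The parsing steps can be exercised without the dataset. The sketch below feeds a hand-written annotation in the VOC layout to xml.etree.ElementTree; the XML values are made up for illustration, parse_annotation is a hypothetical stand-alone version of the method, and CLASS_TO_IND is a small subset that assumes the class order shown in the constructor comment.

```python
import xml.etree.ElementTree as ET

# Hand-written annotation in the VOC layout; the values are made up.
SAMPLE_XML = """
<annotation>
  <filename>000030.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>bicycle</name>
    <bndbox><xmin>36</xmin><ymin>205</ymin><xmax>180</xmax><ymax>289</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>51</xmin><ymin>160</ymin><xmax>150</xmax><ymax>292</ymax></bndbox>
  </object>
</annotation>
"""

CLASS_TO_IND = {'background': 0, 'bicycle': 2, 'person': 15}  # assumed subset

def parse_annotation(xml_text):
    """Extract image size, 0-based boxes, and integer class labels."""
    root = ET.fromstring(xml_text.strip())
    size = root.find('size')
    image_size = (float(size.find('width').text), float(size.find('height').text))
    boxes, classes = [], []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        # VOC pixel coordinates are 1-based; shift them to 0-based.
        box = [float(bbox.find(tag).text) - 1
               for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(box)
        classes.append(CLASS_TO_IND[obj.find('name').text.lower().strip()])
    return {'boxes': boxes, 'gt_classs': classes, 'image_size': image_size}

label = parse_annotation(SAMPLE_XML)
```

The returned dict uses plain lists instead of numpy arrays to keep the sketch dependency-free; the keys mirror the ones used in the article.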

The self.get() function

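Once the main program has created and initialized a pascal_voc instance, every training step needs the image for that step along with the ground truth of the objects in it. The get() method of the pascal_voc class provides this: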

    def get(self): # also increments self.epoch once all entries have been used
        count = 0
        tf_blob = {}
        assert self.batch_size == 1, "only support single batch" 
        while count < self.batch_size:
            imname = self.gt_labels[self.cursor]['imname']
            flipped = self.gt_labels[self.cursor]['flipped']
            image = self.image_read(imname, flipped=flipped)
            image, image_scale = self.prep_im_for_blob(image, self.pixel_means, self.target_size, self.max_size) # resized image and its scale factor
            image = np.reshape(image, (self.batch_size, image.shape[0], image.shape[1], 3)) # add the batch dimension expected by TensorFlow
            gt_box = self.gt_labels[self.cursor]['boxes'] * image_scale # scale the boxes like the image; shape [num_objs, 4]
            gt_cls = self.gt_labels[self.cursor]['gt_classs']
            count += 1
            self.cursor += 1
            if self.cursor >= len(self.gt_labels):
                np.random.shuffle(self.gt_labels)
                self.cursor = 0
                self.epoch += 1
        tf_blob = {'image':image, 'scale':image_scale, 'cls':gt_cls, 'box': gt_box, 'imname': imname}
        return tf_blob # image.shape=[batch, height, width, 3], gt_box.shape=[num_objs, 4]

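Each call to get() returns one dict containing the image for the training step ('image'), the scale factor applied to the image ('scale'), the classes and coordinates of the objects in the image, and the image name.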

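The batch size of each training step is fixed at 1 here. Why the RPN can only take a batch size of 1 will be explained in later chapters. imname, flipped, gt_box, and gt_cls come directly from self.gt_labels, which prepare() built earlier.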

gt_box also has to be multiplied by the same scale factor as the image.

self.image_read() returns the image read by OpenCV (a Mat, which in Python is a numpy array).

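self.prep_im_for_blob() resizes the image so that its shorter side becomes 600 pixels, unless that would push the longer side past 1000 pixels, in which case the longer side is scaled to 1000 instead.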

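Finally, once every entry has been used, that is, after one full epoch rather than after each batch, self.gt_labels is shuffled again to help avoid overfitting: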

if self.cursor >= len(self.gt_labels):
    np.random.shuffle(self.gt_labels)
    self.cursor = 0
    self.epoch += 1
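The cursor and epoch bookkeeping is easy to check on its own. The class below is a hypothetical, stripped-down stand-in for that part of get(), using Python's random module with a fixed seed only so that the demo is repeatable.

```python
import random

class EpochCursor:
    """Hands out samples one at a time and reshuffles after each full pass."""

    def __init__(self, samples, seed=0):
        self.samples = list(samples)
        self.cursor = 0
        self.epoch = 1
        self.rng = random.Random(seed)   # seeded only to make the demo repeatable

    def next(self):
        sample = self.samples[self.cursor]
        self.cursor += 1
        if self.cursor >= len(self.samples):
            self.rng.shuffle(self.samples)   # new order for the next epoch
            self.cursor = 0
            self.epoch += 1
        return sample

it = EpochCursor(['000030', '000032', '000033'])
first_epoch = [it.next() for _ in range(3)]
```

After the third call the cursor is back at 0 and the epoch counter has moved to 2, so a training loop can watch the epoch attribute to decide when to stop.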

The self.image_read() function

    def image_read(self, imname, flipped=False):
        image = cv2.imread(imname)  # OpenCV loads images in BGR channel order
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) # convert the image to RGB
        if flipped:
            image = image[:, ::-1, :]
        return image

self.image_read() uses cv2.imread() to read the .jpg file named by imname from the JPEGImages directory; the result is a numpy array in BGR channel order:

image = cv2.imread(imname) # OpenCV loads images in BGR channel order

Because the pretrained VGG16 model used for feature extraction expects RGB input, the channels are converted from BGR to RGB here:

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) # convert the image to RGB

Finally, if the flipped flag is set to True, the image is also flipped horizontally.
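Both operations are plain axis reversals, which can be seen on a two-pixel array without OpenCV. This is a sketch using numpy slicing only: for a 3-channel image, reversing the last axis yields the same channel order as cv2.cvtColor(image, cv2.COLOR_BGR2RGB).

```python
import numpy as np

# One row with two pixels in BGR order: pure blue, then pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

rgb = bgr[:, :, ::-1].astype(np.float32)   # reverse the channel axis: BGR -> RGB
flipped = rgb[:, ::-1, :]                  # reverse the width axis: horizontal flip
```

After the conversion the blue pixel reads (0, 0, 255) in RGB order, and after the flip the red pixel sits in the first column.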

The self.prep_im_for_blob() function

    def prep_im_for_blob(self, im, pixel_means, target_size, max_size): # arguments: image, pixel means, 600, 1000
        im = im.astype(np.float32, copy=False)
        im -= pixel_means # subtract the pixel means
        im_shape = im.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        im_scale = float(target_size) / float(im_size_min) # 600 / shorter side
        # Prevent the biggest axis from being more than MAX_SIZE
        if np.round(im_scale * im_size_max) > max_size:
            im_scale = float(max_size) / float(im_size_max)
        im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,interpolation=cv2.INTER_LINEAR)
        return im, im_scale # resized image and scale factor

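self.prep_im_for_blob() post-processes the image data that was read: it subtracts the pixel means and resizes the image so that the shorter side is 600 pixels while the longer side stays within 1000 pixels. It returns the resized image and the scale factor.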
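The scale rule is simple arithmetic and can be worked through without OpenCV. compute_im_scale below is a hypothetical helper that repeats only the scale computation, with 600 and 1000 assumed as defaults; the box values are illustrative.

```python
def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale that brings the shorter side to target_size, capped by max_size."""
    size_min = min(height, width)
    size_max = max(height, width)
    scale = float(target_size) / float(size_min)
    # Fall back to the longer side if it would exceed max_size.
    if round(scale * size_max) > max_size:
        scale = float(max_size) / float(size_max)
    return scale

scale_a = compute_im_scale(375, 500)    # 600 / 375 = 1.6, giving 600 x 800
scale_b = compute_im_scale(200, 1000)   # capped by the longer side: 1000 / 1000

# Ground-truth boxes are multiplied by the same factor as the image.
box = [35.0, 204.0, 179.0, 288.0]
scaled_box = [v * scale_a for v in box]
```

A 375 x 500 image therefore becomes 600 x 800, while a very elongated 200 x 1000 image keeps its size because scaling the shorter side to 600 would make the longer side 3000.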

That is the whole pascal voc processing pipeline. Here is a simple test as well:

if __name__ == '__main__':
    pascal = pascal_voc('train')
    tf_blob = pascal.get()
    #print (len(pascal.gt_labels))

This creates a pascal_voc instance and calls its get() method.

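In the dict returned for this example, box is a numpy array of shape (2, 4) and cls is a numpy array of shape (2,), holding the coordinates and classes of the two objects in the image; image is the 800*600 image data; imname is the image's name, that is, its path; and scale is the scale factor applied during processing.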

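This completes the preparation needed before training the network, namely the training dataset. The next chapter explains how to generate the anchors in Faster R-CNN and how they are processed.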

Full Pascal voc processing code

import os
import xml.etree.ElementTree as ET
import numpy as np
import cv2
import pickle
import copy
import config as cfg


class pascal_voc(object):
    def __init__(self, phase, rebuild=False):
        self.devkil_path = os.path.join(cfg.PASCAL_PATH, 'VOCdevkit')   # path to VOCdevkit
        self.data_path = os.path.join(self.devkil_path, 'VOC2007')  # path to pascal_voc 2007
        self.cache_path = cfg.CACHE_PATH    # cache directory
        self.batch_size = cfg.BATCH_SIZE    # batch size
        self.target_size = cfg.target_size  # target length of the shorter image side
        self.max_size = cfg.max_size    # maximum length of the longer image side
        self.classes = cfg.CLASSES  # class names ['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus'....]
        self.pixel_means = cfg.PIXEL_MEANS  # pixel means subtracted from every image
        self.class_to_ind = dict(zip(self.classes, range(len(self.classes))))   # class name -> integer label
        self.flipped = cfg.FLIPPED  # whether to add horizontally flipped images
        self.phase = phase  # name of the image set
        self.rebuild = rebuild   # whether to rebuild the cache
        self.cursor = 0    # current position in gt_labels
        self.epoch = 1     # current epoch
        #self.gt_labels = None
        self.prepare()
        self.num_gtlabels = len(self.gt_labels)
    
    def image_read(self, imname, flipped=False):
        image = cv2.imread(imname)  # OpenCV loads images in BGR channel order
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) # convert the image to RGB
        if flipped:
            image = image[:, ::-1, :]
        return image
    
    def get(self): # also increments self.epoch once all entries have been used
        #images = np.zeros((self.batch_size, self.image_size, self.image_size, 3))
        #gt_box = np.zeros((self.batch_size, 4), dtype=np.uint16)
        #gt_cls = np.zeros((num_objs), dtype=np.int32)
        count = 0
        tf_blob = {}
        assert self.batch_size == 1, "only support single batch" 
        while count < self.batch_size:
            imname = self.gt_labels[self.cursor]['imname']
            flipped = self.gt_labels[self.cursor]['flipped']
            image = self.image_read(imname, flipped=flipped)
            image, image_scale = self.prep_im_for_blob(image, self.pixel_means, self.target_size, self.max_size) # resized image and its scale factor
            image = np.reshape(image, (self.batch_size, image.shape[0], image.shape[1], 3)) # add the batch dimension expected by TensorFlow
            gt_box = self.gt_labels[self.cursor]['boxes'] * image_scale # scale the boxes like the image; shape [num_objs, 4]
            gt_cls = self.gt_labels[self.cursor]['gt_classs']
            count += 1
            self.cursor += 1
            if self.cursor >= len(self.gt_labels):
                np.random.shuffle(self.gt_labels)
                self.cursor = 0
                self.epoch += 1
        tf_blob = {'image':image, 'scale':image_scale, 'cls':gt_cls, 'box': gt_box, 'imname': imname}
        return tf_blob # image.shape=[batch, height, width, 3], gt_box.shape=[num_objs, 4]

    

    def prepare(self):
        gt_labels = self.load_labels()
        if self.flipped:
            print('Appending horizontally-flipped training examples ...')
            gt_labels_cp = copy.deepcopy(gt_labels) # deep copy, so the original entries are left untouched
            for idx in range(len(gt_labels_cp)):
                gt_labels_cp[idx]['flipped'] = True
                width_pre = copy.deepcopy(gt_labels_cp[idx]['boxes'][:,0])
                # boxes are 0-based, so column x moves to width - 1 - x after flipping
                gt_labels_cp[idx]['boxes'][:,0] = gt_labels_cp[idx]['image_size'][0] - gt_labels_cp[idx]['boxes'][:,2] - 1
                gt_labels_cp[idx]['boxes'][:,2] = gt_labels_cp[idx]['image_size'][0] - width_pre - 1
            gt_labels += gt_labels_cp
        if self.phase == 'train':
            np.random.shuffle(gt_labels)
        self.gt_labels = gt_labels
        #return gt_labels

    def load_labels(self):
        cache_file = os.path.join(
            self.cache_path, 'pascal_' + self.phase + '_gt_labels.pkl')

        if os.path.isfile(cache_file) and not self.rebuild:
            print('Loading gt_labels from: ' + cache_file)
            with open(cache_file, 'rb') as f:
                gt_labels = pickle.load(f)  # deserialize the object stored in the .pkl file
            return gt_labels

        print('Processing gt_labels from: ' + self.data_path)

        if not os.path.exists(self.cache_path):
            os.makedirs(self.cache_path)

        if self.phase == 'train':
            txtname = os.path.join(
                self.data_path, 'ImageSets', 'Main', 'trainval.txt')
        else:
            txtname = os.path.join(
                self.data_path, 'ImageSets', 'Main', 'val.txt')
            self.flipped = False
        with open(txtname, 'r') as f:
            self.image_index = [x.strip() for x in f.readlines()]

        gt_labels = []
        for index in self.image_index:
            gt_label = self.load_pascal_annotation(index) # dict with the box coordinates and class labels of one image
            gt_labels.append(gt_label)
        print('Saving gt_labels to: ' + cache_file)
        with open(cache_file, 'wb') as f:
            pickle.dump(gt_labels, f)
        return gt_labels

    def load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self.data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        image_size = tree.find('size')
        size_info = np.zeros((2,), dtype=np.float32)
        size_info[0] = float(image_size.find('width').text)
        size_info[1] = float(image_size.find('height').text)
        num_objs = len(objs) # number of objects
        boxes = np.zeros((num_objs, 4), dtype=np.float32) # box coordinates, shape (num_objs, 4)
        gt_classes = np.zeros((num_objs), dtype=np.int32) # ground-truth class label of each object, shape (num_objs,)
        
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self.class_to_ind[obj.find('name').text.lower().strip()] # integer label for the class name
            boxes[ix, :] = [x1, y1, x2, y2] # boxes is a numpy array of shape [num_objs, 4]
            gt_classes[ix] = cls # gt_classes is a numpy int array of shape [num_objs]
        # Built outside the loop so it is defined even when the image has no objects
        imname = os.path.join(self.data_path, 'JPEGImages', index + '.jpg')
        return {'boxes':boxes, 'gt_classs':gt_classes, 'imname':imname, 'flipped':False, 'image_size':size_info, 'image_index': index}
    
    def prep_im_for_blob(self, im, pixel_means, target_size, max_size): # arguments: image, pixel means, 600, 1000
        im = im.astype(np.float32, copy=False)
        im -= pixel_means # subtract the pixel means
        im_shape = im.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        im_scale = float(target_size) / float(im_size_min) # 600 / shorter side
        # Prevent the biggest axis from being more than MAX_SIZE
        if np.round(im_scale * im_size_max) > max_size:
            im_scale = float(max_size) / float(im_size_max)
        im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,interpolation=cv2.INTER_LINEAR)
        return im, im_scale # resized image and scale factor

    def voc_ap(self, rec, prec): # AP as computed by pascal_voc from 2010 onwards (area under the precision-recall curve)
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))
        for i in range(mpre.size - 1, 0, -1):
            mpre[i-1] = np.maximum(mpre[i-1], mpre[i]) # make precision non-increasing as recall grows
        
        i = np.where(mrec[1:] != mrec[:-1])[0] # indices where recall changes
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) # sum of the rectangle areas
        
        return ap


if __name__ == '__main__':
    pascal = pascal_voc('train')
    tf_blob = pascal.get()
    #print (len(pascal.gt_labels))
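The voc_ap method in the listing is not discussed in the walkthrough, so here is a small worked example of the same computation as a stand-alone function. The recall and precision values are toy numbers chosen for easy arithmetic, not results from a real detector.

```python
import numpy as np

def voc_ap(rec, prec):
    """Area under the monotonically non-increasing precision envelope."""
    mrec = np.concatenate(([0.], rec, [1.]))
    mpre = np.concatenate(([0.], prec, [0.]))
    # Walk right to left so each precision is at least the one after it.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

# Toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
rec = np.array([0.5, 1.0])
prec = np.array([1.0, 0.5])
ap = voc_ap(rec, prec)
```

For this curve the area is 0.5 * 1.0 + 0.5 * 0.5 = 0.75.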
