Faster R-CNN Source Code Study (Part 1)


I need to use Faster R-CNN for work, and since it has a lot of details, I decided to read through the source code. While reading I mainly referred to material online; to keep myself from forgetting, I'm taking notes. This will be a series of posts, published as I organize them.

Reference code: https://github.com/rbgirshick/py-faster-rcnn
Explaining the Faster R-CNN algorithm through its code implementation
Setting up the Faster R-CNN environment and training on your own data

Let's walk through the Faster R-CNN training flow step by step, starting in tools/train_faster_rcnn_alt_opt.py:

Starting from the main entry point:

if __name__ == '__main__':
    args = parse_args() # parse the training command-line arguments

    print('Called with args:')
    print(args)

    if args.cfg_file is not None:
        cfg_from_file(args.cfg_file)
    if args.set_cfgs is not None:
        cfg_from_list(args.set_cfgs)
    cfg.GPU_ID = args.gpu_id

    # --------------------------------------------------------------------------
    # Pycaffe doesn't reliably free GPU memory when instantiated nets are
    # discarded (e.g. "del net" in Python code). To work around this issue, each
    # training stage is executed in a separate process using
    # multiprocessing.Process.
    # --------------------------------------------------------------------------

    # queue for communicated results between processes
    # i.e. a multiprocessing queue for passing results back from child processes
    mp_queue = mp.Queue()
    # solves, iters, etc. for each training stage
    solvers, max_iters, rpn_test_prototxt = get_solvers(args.net_name)

The code above first parses the command line into the relevant arguments; mp.Queue() then creates a multiprocessing queue object, and get_solvers() retrieves the solvers, iteration counts, and related settings. After that, the "four-step training" described in the paper begins:
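The per-stage subprocess pattern can be sketched in isolation (a minimal illustration, not the repo's code; `train_stage` stands in for `train_rpn` / `train_fast_rcnn`):

```python
import multiprocessing as mp

def train_stage(queue=None, stage_name=None):
    """Stand-in for train_rpn / train_fast_rcnn: do the work in a child
    process, then send the result back through the queue."""
    queue.put({'model_path': '/tmp/{}_final.caffemodel'.format(stage_name)})

def run_stage(stage_name):
    q = mp.Queue()
    p = mp.Process(target=train_stage, kwargs=dict(queue=q, stage_name=stage_name))
    p.start()
    out = q.get()   # get() before join(): avoids blocking if the queue buffer fills
    p.join()        # then wait for the child to exit
    return out

print(run_stage('stage1_rpn')['model_path'])
```

Because the child is a separate process, all GPU memory it held is released when it exits, which is exactly the workaround the comment in the script describes.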

The code is as follows:

## Step 1: train the RPN network
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 1 RPN, init from ImageNet model'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    ### train the RPN network with train_rpn()
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=args.pretrained_model,
            solver=solvers[0],
            max_iters=max_iters[0],
            cfg=cfg)
    p = mp.Process(target=train_rpn, kwargs=mp_kwargs) ## set up a process object that trains the RPN network with train_rpn()
    p.start() ## start training the RPN
    rpn_stage1_out = mp_queue.get() # mp_queue is the structure used for inter-process communication; get() retrieves the child process's result
    p.join() ## wait for the child process to finish before continuing

    ## Step 2: use the RPN trained in step 1 to generate proposals
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 1 RPN, generate proposals'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            rpn_model_path=str(rpn_stage1_out['model_path']),
            cfg=cfg,
            rpn_test_prototxt=rpn_test_prototxt)
    p = mp.Process(target=rpn_generate, kwargs=mp_kwargs) # rpn_generate() produces the proposals
    p.start() # start generating proposals
    rpn_stage1_out['proposal_path'] = mp_queue.get()['proposal_path']
    p.join()

    # Step 3: train the Fast R-CNN network
    # use the proposals produced by the RPN to train the other half, Fast R-CNN; rpn_file is the proposal file saved in the previous step
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 1 Fast R-CNN using RPN proposals, init from ImageNet model'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=args.pretrained_model,
            solver=solvers[1],
            max_iters=max_iters[1],
            cfg=cfg,
            rpn_file=rpn_stage1_out['proposal_path'])
    p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
    p.start()
    fast_rcnn_stage1_out = mp_queue.get()
    p.join()

    # Step 4: initialize the RPN with the weights of the Fast R-CNN model from step 3; this time the conv layer parameters are frozen, so this amounts to fine-tuning the RPN network
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 2 RPN, init from stage 1 Fast R-CNN model'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=str(fast_rcnn_stage1_out['model_path']),
            solver=solvers[2],
            max_iters=max_iters[2],
            cfg=cfg)
    p = mp.Process(target=train_rpn, kwargs=mp_kwargs)
    p.start()
    rpn_stage2_out = mp_queue.get()
    p.join()

    # Step 5: generate proposals with the RPN trained in step 4, in the same way as before
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 2 RPN, generate proposals'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            rpn_model_path=str(rpn_stage2_out['model_path']),
            cfg=cfg,
            rpn_test_prototxt=rpn_test_prototxt)
    p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
    p.start()
    rpn_stage2_out['proposal_path'] = mp_queue.get()['proposal_path']
    p.join()

    # Step 6: train the final model; here the conv and RPN layer parameters are both fixed, and only the R-CNN (fully connected) layers are trained
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 2 Fast R-CNN, init from stage 2 RPN R-CNN model'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=str(rpn_stage2_out['model_path']),
            solver=solvers[3],
            max_iters=max_iters[3],
            cfg=cfg,
            rpn_file=rpn_stage2_out['proposal_path'])
    p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
    p.start()
    fast_rcnn_stage2_out = mp_queue.get()
    p.join()

    # Create final model (just a copy of the last stage)
    final_path = os.path.join(
            os.path.dirname(fast_rcnn_stage2_out['model_path']),
            args.net_name + '_faster_rcnn_final.caffemodel')
    print 'cp {} -> {}'.format(
            fast_rcnn_stage2_out['model_path'], final_path)
    shutil.copy(fast_rcnn_stage2_out['model_path'], final_path)
    print 'Final model: {}'.format(final_path)

Let's start the analysis with step 1, training the RPN network.

The step-1 code again:

## Step 1: train the RPN network
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Stage 1 RPN, init from ImageNet model'
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

    ### train the RPN network with train_rpn()
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=args.pretrained_model,
            solver=solvers[0],
            max_iters=max_iters[0],
            cfg=cfg)
    p = mp.Process(target=train_rpn, kwargs=mp_kwargs) ## set up a process object that trains the RPN network with train_rpn()
    p.start() ## start training the RPN
    rpn_stage1_out = mp_queue.get() # mp_queue is the structure used for inter-process communication; get() retrieves the child process's result
    p.join() ## wait for the child process to finish before continuing

This code launches a child process to train the RPN; the training function is train_rpn(), so let's look inside it. train_rpn() is defined in the same file, train_faster_rcnn_alt_opt.py:

def train_rpn(queue=None, imdb_name=None, init_model=None, solver=None,
              max_iters=None, cfg=None):
    """Train a Region Proposal Network in a separate training process.
    """

    # Not using any proposals, just ground-truth boxes
    cfg.TRAIN.HAS_RPN = True
    cfg.TRAIN.BBOX_REG = False  # applies only to Fast R-CNN bbox regression
    cfg.TRAIN.PROPOSAL_METHOD = 'gt'
    cfg.TRAIN.IMS_PER_BATCH = 1
    print 'Init model: {}'.format(init_model)
    print('Using config:')
    pprint.pprint(cfg)

    import caffe
    _init_caffe(cfg)

    roidb, imdb = get_roidb(imdb_name) # get the training data in roidb and imdb formats
    print 'roidb len: {}'.format(len(roidb))
    output_dir = get_output_dir(imdb)
    print 'Output will be saved to `{:s}`'.format(output_dir)

    model_paths = train_net(solver, roidb, output_dir,
                            pretrained_model=init_model,
                            max_iters=max_iters)
    # Cleanup all but the final model
    for i in model_paths[:-1]:
        os.remove(i)
    rpn_model_path = model_paths[-1]
    # Send final model path through the multiprocessing queue
    queue.put({'model_path': rpn_model_path})

Here cfg is first used to set some training parameters (cfg is a dictionary-like object defined in config.py; it holds the network's training configuration). Note the line cfg.TRAIN.PROPOSAL_METHOD = 'gt', which matters later. Then caffe is initialized, which mainly sets the random seed and the mode caffe trains in (GPU/CPU). After that comes the first highlight: obtaining the training data in imdb and roidb formats.
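A minimal sketch of the merge semantics behind cfg_from_file / cfg_from_list (the real code uses easydict and YAML; `merge_cfg` here is a made-up helper, not the repo's):

```python
def merge_cfg(base, override):
    """Recursively merge override into base, the way cfg_from_file folds a
    YAML file's values into the default config tree."""
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(base.get(k), dict):
            merge_cfg(base[k], v)      # descend into nested sections like TRAIN
        else:
            base[k] = v                # leaf values simply overwrite the default
    return base

cfg = {'TRAIN': {'HAS_RPN': False, 'IMS_PER_BATCH': 2}, 'GPU_ID': 0}
merge_cfg(cfg, {'TRAIN': {'HAS_RPN': True, 'PROPOSAL_METHOD': 'gt'}})
print(cfg['TRAIN'])  # HAS_RPN overridden, IMS_PER_BATCH kept, PROPOSAL_METHOD added
```

Untouched defaults (IMS_PER_BATCH, GPU_ID) survive the merge, which is why a stage only has to override the few keys it cares about.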

The get_roidb() function:

## get_roidb obtains the training data in imdb and roidb formats
def get_roidb(imdb_name, rpn_file=None):
    imdb = get_imdb(imdb_name)  ## imdb here is actually pascal_voc, a subclass of the imdb class
    print 'Loaded dataset `{:s}` for training'.format(imdb.name)
    imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD) ## set the method used to generate proposals
    print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD)
    if rpn_file is not None:
        imdb.config['rpn_file'] = rpn_file
    roidb = get_training_roidb(imdb)
    return roidb, imdb

get_imdb() is called first to obtain the imdb data, so let's look into it; the function is defined in lib/datasets/factory.py:

__sets = {}

from datasets.pascal_voc import pascal_voc
from datasets.coco import coco
import numpy as np

# Set up voc_<year>_<split> using selective search "fast" mode
for year in ['2007', '2012']:
    for split in ['train', 'val', 'trainval', 'test']:
        name = 'voc_{}_{}'.format(year, split)
        __sets[name] = (lambda split=split, year=year: pascal_voc(split, year))

# Set up coco_2014_<split>
for year in ['2014']:
    for split in ['train', 'val', 'minival', 'valminusminival']:
        name = 'coco_{}_{}'.format(year, split)
        __sets[name] = (lambda split=split, year=year: coco(split, year))

# Set up coco_2015_<split>
for year in ['2015']:
    for split in ['test', 'test-dev']:
        name = 'coco_{}_{}'.format(year, split)
        __sets[name] = (lambda split=split, year=year: coco(split, year))

def get_imdb(name):
    """Get an imdb (image database) by name."""
    if not __sets.has_key(name): # __sets is a dictionary: its keys are dataset names, its values are lambda expressions (essentially function factories)
        raise KeyError('Unknown dataset: {}'.format(name))
    return __sets[name]() # return the imdb dataset; strictly speaking a pascal_voc dataset, which is a subclass of the imdb class

def list_imdbs():
    """List all registered imdbs."""
    return __sets.keys()
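One detail worth noting in factory.py is the `split=split, year=year` default arguments: Python closures capture variables by reference, so without that binding every entry in __sets would see the loop's final values. A small self-contained demonstration:

```python
# without default-arg binding: all closures share the loop variable
late = {}
for year in ['2007', '2012']:
    late[year] = lambda: year
print(late['2007']())  # '2012' - every closure sees the final loop value

# with default-arg binding, as factory.py does: each closure keeps its own value
bound = {}
for year in ['2007', '2012']:
    bound[year] = lambda year=year: year
print(bound['2007']())  # '2007'
```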

This in turn calls pascal_voc() to create the imdb data; the pascal_voc class lives in pascal_voc.py:

class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None): ## this class organizes the input image data, but does not store the actual images
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
        self._classes = ('__background__', # always index 0
                         'aeroplane', 'bicycle', 'bird', 'boat',
                         'bottle', 'bus', 'car', 'cat', 'chair',
                         'cow', 'diningtable', 'dog', 'horse',
                         'motorbike', 'person', 'pottedplant',
                         'sheep', 'sofa', 'train', 'tvmonitor')
        self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes))) ## assign each class a corresponding integer
        self._image_ext = '.jpg' ## image file extension
        self._image_index = self._load_image_set_index() ## load all the image names into a list, for indexed access to the images
        # Default to roidb handler
        self._roidb_handler = self.selective_search_roidb
        self._salt = str(uuid.uuid4())
        self._comp_id = 'comp4'

        # PASCAL specific config options
        self.config = {'cleanup'     : True,
                       'use_salt'    : True,
                       'use_diff'    : False,
                       'matlab_eval' : False,
                       'rpn_file'    : None,
                       'min_size'    : 2}

        assert os.path.exists(self._devkit_path), \
                'VOCdevkit path does not exist: {}'.format(self._devkit_path)
        assert os.path.exists(self._data_path), \
                'Path does not exist: {}'.format(self._data_path)

Only part of the class is shown here. As you can see, pascal_voc mainly organizes the input image data and stores metadata about the images, but not the images themselves; pascal_voc is in fact a subclass of the imdb class. The imdb provides information about the training data such as image_index and classes.
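The dict(zip(...)) idiom in __init__ simply numbers the classes in order, with '__background__' always index 0; a quick python3 check on a shortened class tuple (only the first few VOC classes, for illustration):

```python
# a shortened stand-in for pascal_voc's 21-class tuple
classes = ('__background__', 'aeroplane', 'bicycle', 'bird')
class_to_ind = dict(zip(classes, range(len(classes))))
print(class_to_ind['__background__'])  # 0
print(class_to_ind['bird'])            # 3
```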

Now that the imdb data has been obtained, back in get_roidb() the next call, set_proposal_method(), sets the method used to generate proposals; in effect it is the hook through which roidb data gets attached to the imdb. set_proposal_method() is defined in datasets/imdb.py:

def set_proposal_method(self, method):
        method = eval('self.' + method + '_roidb')
        self.roidb_handler = method
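As an aside, the same string-based dispatch can be written without eval() using getattr; a sketch on a hypothetical Dataset class (not the repo's imdb):

```python
class Dataset(object):
    def gt_roidb(self):
        return ['gt boxes']
    def selective_search_roidb(self):
        return ['ss boxes']
    def set_proposal_method(self, method):
        # equivalent to eval('self.' + method + '_roidb') in imdb.py,
        # but without evaluating an arbitrary string
        self.roidb_handler = getattr(self, method + '_roidb')

d = Dataset()
d.set_proposal_method('gt')
print(d.roidb_handler())  # ['gt boxes']
```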

eval() first resolves the method name string into the bound method, which is then assigned to roidb_handler. Recall that train_rpn() set cfg.TRAIN.PROPOSAL_METHOD = 'gt' (the default is selective search, as previously used for Fast R-CNN), so let's step into gt_roidb() (datasets/pascal_voc.py):

def gt_roidb(self):
        """
        Return the database of ground-truth regions of interest.

        This function loads/saves from/to a cache file to speed up future calls.
        Obtains the roidb-format data for the ground truth.
        """
        cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl') # path of the cache file
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} gt roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        gt_roidb = [self._load_pascal_annotation(index) # parse the gt roidb data from the XML annotations with _load_pascal_annotation()
                    for index in self.image_index]
        with open(cache_file, 'wb') as fid:
            cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL) # serialize the roidb data into cache_file
        print 'wrote gt roidb to {}'.format(cache_file)

        return gt_roidb
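The load-or-build caching pattern used by gt_roidb() can be sketched on its own (python3 pickle in place of python2's cPickle; the file path and payload here are made up):

```python
import os
import pickle
import tempfile

def cached(cache_file, compute):
    """Load a pickled result if the cache file exists, else compute and
    save it - the same load-or-build pattern gt_roidb() uses."""
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as fid:
            return pickle.load(fid)
    result = compute()
    with open(cache_file, 'wb') as fid:
        pickle.dump(result, fid, pickle.HIGHEST_PROTOCOL)
    return result

path = os.path.join(tempfile.mkdtemp(), 'demo_gt_roidb.pkl')
first = cached(path, lambda: [{'boxes': [[0, 0, 9, 9]]}])  # computed and written
second = cached(path, lambda: None)                        # served from cache
print(second == first)  # True
```

This is also why you must delete the .pkl cache after changing your annotations, or the stale roidb will be loaded instead of being rebuilt.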

gt_roidb() actually obtains the ground-truth ROIs by parsing the XML annotation files with _load_pascal_annotation(); stepping into that function:

def _load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        Obtains the ground-truth ROIs by parsing the XML file.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename) # load the xml file from disk
        objs = tree.findall('object') # find all elements with the 'object' tag
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0] # keep objects whose 'difficult' tag is 0
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        num_objs = len(objs)

        boxes = np.zeros((num_objs, 4), dtype=np.uint16) # box coordinates, num_objs x 4
        gt_classes = np.zeros((num_objs), dtype=np.int32) # class label of each box (length num_objs)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32) # overlap matrix, num_objs x num_classes
        # "Seg" area for pascal is just the box area
        seg_areas = np.zeros((num_objs), dtype=np.float32) # box areas, one per box

        # Load object bounding boxes into a data frame.
        for ix, obj in enumerate(objs): # ix is the index
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1 # read the gt box coordinates
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self._class_to_ind[obj.find('name').text.lower().strip()] # read the gt class label
            boxes[ix, :] = [x1, y1, x2, y2] # store the coordinates into boxes
            gt_classes[ix] = cls
            overlaps[ix, cls] = 1.0 # each box here is a gt box, so its overlap with its own class is set to 1; each row of overlaps is one-hot
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1) # area of the gt box

        overlaps = scipy.sparse.csr_matrix(overlaps)

        return {'boxes' : boxes, # the returned dict has 5 keys; boxes holds the coordinates
                'gt_classes': gt_classes, # the class of each box
                'gt_overlaps' : overlaps,  # num_objs rows; each row has a 1 at the box's class index and 0 elsewhere, converted to a sparse matrix above
                'flipped' : False, # the image has not been flipped
                'seg_areas' : seg_areas}

As you can see, a roidb entry is a dictionary with 5 keys.
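The annotation parsing can be exercised on a tiny in-memory XML fragment in PASCAL VOC layout (the fragment and the miniature class map are made up for illustration):

```python
import numpy as np
import xml.etree.ElementTree as ET

xml_text = """
<annotation>
  <object><name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

class_to_ind = {'__background__': 0, 'dog': 1}  # hypothetical mini label map
objs = ET.fromstring(xml_text).findall('object')
num_objs = len(objs)
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # make pixel indexes 0-based, as _load_pascal_annotation does
    x1 = float(bbox.find('xmin').text) - 1
    y1 = float(bbox.find('ymin').text) - 1
    x2 = float(bbox.find('xmax').text) - 1
    y2 = float(bbox.find('ymax').text) - 1
    boxes[ix, :] = [x1, y1, x2, y2]
    gt_classes[ix] = class_to_ind[obj.find('name').text.lower().strip()]
print(boxes.tolist())       # [[47, 239, 194, 370]]
print(gt_classes.tolist())  # [1]
```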

At this point we have the initial roidb data from the imdb, but this is not yet the roidb used for training. Back in get_roidb(), get_training_roidb() (lib/fast_rcnn/train.py) produces the final training roidb; stepping into that function:

def get_training_roidb(imdb): # produce the roidb data used for training; mainly horizontally flips the images and appends the flipped copies
    """Returns a roidb (Region of Interest database) for use in training."""
    if cfg.TRAIN.USE_FLIPPED:
        print 'Appending horizontally-flipped training examples...'
        imdb.append_flipped_images()
        print 'done'

    print 'Preparing training data...'
    rdl_roidb.prepare_roidb(imdb)
    print 'done'

    return imdb.roidb

First, cfg.TRAIN.USE_FLIPPED decides whether the ROIs should be mirrored horizontally (note the mirror axis is the vertical center line of the image); append_flipped_images() (datasets/imdb.py) then appends the mirrored ROIs. The authors found this improves the final training results (it is a simple form of data augmentation). Stepping into the function:

def append_flipped_images(self): # append horizontally flipped images, doubling the total count
        num_images = self.num_images
        widths = self._get_widths()
        for i in xrange(num_images):
            boxes = self.roidb[i]['boxes'].copy()
            oldx1 = boxes[:, 0].copy() # oldx1 is xmin, oldx2 is xmax
            oldx2 = boxes[:, 2].copy()
            boxes[:, 0] = widths[i] - oldx2 - 1 # mirror about the vertical center line of the image
            boxes[:, 2] = widths[i] - oldx1 - 1
            assert (boxes[:, 2] >= boxes[:, 0]).all()
            entry = {'boxes' : boxes,
                     'gt_overlaps' : self.roidb[i]['gt_overlaps'],
                     'gt_classes' : self.roidb[i]['gt_classes'],
                     'flipped' : True}
            self.roidb.append(entry)
        self._image_index = self._image_index * 2 # double the image index list
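The flip arithmetic is easy to verify on a concrete box (a numpy sketch; the width and coordinates are made up):

```python
import numpy as np

width = 100                                    # hypothetical image width
boxes = np.array([[10, 5, 30, 25]])            # one box: x1, y1, x2, y2
oldx1 = boxes[:, 0].copy()
oldx2 = boxes[:, 2].copy()
flipped = boxes.copy()
flipped[:, 0] = width - oldx2 - 1              # new x1 mirrors the old x2
flipped[:, 2] = width - oldx1 - 1              # new x2 mirrors the old x1
assert (flipped[:, 2] >= flipped[:, 0]).all()  # x1 <= x2 still holds
print(flipped.tolist())  # [[69, 5, 89, 25]]
```

Note that only the x coordinates change; y coordinates are untouched because the mirror axis is vertical.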

That is not the end: back in get_training_roidb(), one more step, prepare_roidb() (lib/roi_data_layer/roidb.py), is applied. Stepping in:

def prepare_roidb(imdb):
    """Enrich the imdb's roidb by adding some derived quantities that
    are useful for training. This function precomputes the maximum
    overlap, taken over ground-truth boxes, between each ROI and
    each ground-truth box. The class with maximum overlap is also
    recorded.
    Adds some extra derived information that is convenient for training.
    """
    sizes = [PIL.Image.open(imdb.image_path_at(i)).size
             for i in xrange(imdb.num_images)]
    roidb = imdb.roidb
    for i in xrange(len(imdb.image_index)):
        roidb[i]['image'] = imdb.image_path_at(i) # add the image path; width and height are added next
        roidb[i]['width'] = sizes[i][0]
        roidb[i]['height'] = sizes[i][1]
        # need gt_overlaps as a dense array for argmax 
        gt_overlaps = roidb[i]['gt_overlaps'].toarray() #[[0 0 0 0 0 1]]
        # max overlap with gt over classes (columns)
        max_overlaps = gt_overlaps.max(axis=1)
        # gt class that had the max overlap
        max_classes = gt_overlaps.argmax(axis=1) # the corresponding class
        roidb[i]['max_classes'] = max_classes
        roidb[i]['max_overlaps'] = max_overlaps
        # sanity checks
        # max overlap of 0 => class should be zero (background)
        zero_inds = np.where(max_overlaps == 0)[0]
        assert all(max_classes[zero_inds] == 0)
        # max overlap > 0 => class should not be zero (must be a fg class)
        nonzero_inds = np.where(max_overlaps > 0)[0]
        assert all(max_classes[nonzero_inds] != 0)

For each image, gt_overlaps has one row per ground-truth box and one column per class, i.e. shape (num_boxes, num_classes); each row has a 1 at the box's class index and 0 elsewhere (so a (5, 6) array would mean 5 gt boxes and 6 classes).
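The max/argmax step in prepare_roidb() can be illustrated on a toy dense gt_overlaps array (the shape here is made up: 2 gt boxes, 6 classes):

```python
import numpy as np

# 2 gt boxes, 6 classes; each row is one-hot at the box's class index
gt_overlaps = np.array([[0, 0, 0, 0, 0, 1],
                        [0, 0, 1, 0, 0, 0]], dtype=np.float32)
max_overlaps = gt_overlaps.max(axis=1)     # best overlap per box
max_classes = gt_overlaps.argmax(axis=1)   # the class achieving that overlap
print(max_overlaps.tolist())  # [1.0, 1.0]
print(max_classes.tolist())   # [5, 2]
```

For ground-truth boxes every max overlap is 1.0 by construction; the sanity checks in prepare_roidb() exist because other proposal methods can produce rows whose max overlap is 0 (pure background).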

With these extra pieces of information added to the roidb, the data is ready for training. That wraps up the code for obtaining the roidb and imdb.

Back in train_rpn(), training then proceeds via train_net(); that function will be covered in a later post.

Next: Faster R-CNN Source Code Study (Part 2)
