R-FCN: Toward Fully Convolutional Networks

心之所向521

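This article explains in detail the innovations of R-FCN (Region-based Fully Convolutional Networks) in cutting the parameters of fully connected layers and improving object detection performance. Using ResNet-101 as the base network, R-FCN introduces position-sensitive score maps that make the network more sensitive to location information, which improves on the detection results of Faster R-CNN.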

1. Background:

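Because fully connected layers carry so many parameters, more and more networks are dropping them, and R-FCN is a good example. Personally, I think fully convolutional designs are the future!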

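After RoI Pooling, Faster R-CNN uses fully connected layers to produce the classification and regression predictions, and these layers account for most of the parameters in the whole network. Meanwhile, a growing number of fully convolutional networks have shown that dropping the fully connected layers can work better and lets the network accept input images of any scale. A natural idea is therefore to remove the fully connected layers after RoI Pooling and feed the pooled features directly into the classification and regression outputs, but experiments show that this detects poorly. One reason is that the base convolutional network was designed for classification: it is translation invariant and insensitive to position, whereas object detection is position sensitive. To address this pain point, Jifeng Dai's team at Microsoft Research Asia proposed R-FCN (Region-based Fully Convolutional Networks), which uses carefully designed position-sensitive score maps to make the network sensitive to position and adopts a fully convolutional design that greatly reduces the number of parameters.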
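
To get a rough feel for the parameter gap described above, here is a small back-of-the-envelope calculation. The numbers are assumptions for illustration only and do not come from this article: the fully connected head is taken to be the two 4096-unit layers of a VGG-16 style detector on 7×7×512 pooled features, and the R-FCN head is taken to be two 1×1 convolutions on a 1024-channel feature map with k = 7 and c = 20. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1x1_params(c_in, c_out):
    """Weights plus biases of a 1x1 convolution."""
    return c_in * c_out + c_out


# Assumed fully connected head: 7x7x512 pooled features -> 4096 -> 4096.
fc_head_params = dense_params(7 * 7 * 512, 4096) + dense_params(4096, 4096)

# Assumed R-FCN head: two 1x1 convolutions on a 1024-channel feature map,
# one with k^2 (c + 1) outputs (scores) and one with 4 k^2 outputs (boxes).
k, c = 7, 20
rfcn_head_params = (conv1x1_params(1024, k * k * (c + 1))
                    + conv1x1_params(1024, 4 * k * k))

print("fully connected head:", fc_head_params)
print("R-FCN 1x1 conv head: ", rfcn_head_params)
print("ratio: {:.1f}x".format(fc_head_params / rfcn_head_params))
```

Under these assumptions the fully connected head is roughly two orders of magnitude larger than the convolutional one, which is the effect the paragraph above describes.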

Paper:

R-FCN: Object Detection via Region-based Fully Convolutional Networks (neurips.cc)

2. Architecture:

A more detailed view of the structure:

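As the figure shows, R-FCN uses ResNet-101 as its backbone and appends a 1×1 convolution after the original 100 convolutional layers to reduce the number of channels to 1024. In addition, to enlarge the subsequent feature maps, R-FCN lowers the downsampling rate of ResNet-101 from 32 to 16. Concretely, the stride in the fifth convolutional group is changed from 2 to 1, and the convolutions in that stage use dilated (atrous) convolution with a dilation rate of 2 to enlarge the receptive field. Reducing the stride while adding dilation is a common technique: it keeps the feature-map size while increasing the receptive field.

A further 1×1 convolution on this feature map produces the position-sensitive score maps, which have k²(c+1) channels. Here c is the number of object categories, and one more is added for the background class. k means that each RoI is divided into k² regions (a k×k grid); the figure below shows the cases k = 1, 3, and 5.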
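
The two quantities in this paragraph, the feature-map size under a changed stride and dilation and the channel count k²(c+1), can be checked with a few lines of arithmetic. This is a simplified sketch: it looks at a single hypothetical 3×3 convolution rather than the full fifth stage of ResNet-101, the 38-pixel input size and c = 20 are made-up example values, and the function names are not from any library.

```python
def conv_out_size(size, kernel, stride, dilation, padding):
    """Spatial output size of a convolution along one dimension."""
    effective_kernel = dilation * (kernel - 1) + 1  # span covered by the kernel taps
    return (size + 2 * padding - effective_kernel) // stride + 1


def score_map_channels(k, num_classes):
    """Channels of the position-sensitive score maps: k^2 (c + 1)."""
    return k * k * (num_classes + 1)


feat = 38  # assumed feature-map side length entering the stage

# Stride 2, no dilation: the map is halved.
print(conv_out_size(feat, kernel=3, stride=2, dilation=1, padding=1))

# Stride 1, dilation 2: the map keeps its size, and each 3x3 kernel
# now spans 5 input positions instead of 3.
print(conv_out_size(feat, kernel=3, stride=1, dilation=2, padding=2))

for k in (1, 3, 5):
    print(k, score_map_channels(k, num_classes=20))
```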

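For example, when k = 3 the RoI is divided into 9 regions (top-left, top-center, top-right, and so on), and each group of channels is sensitive to the features of its own region. The position-sensitive score maps therefore contain the information for every category in all 9 regions. Consider a point on the score maps at coordinates (m, n) whose channel belongs to the top-right region and the category "person": its value expresses how strongly that location looks like the top-right part of a person, which is how position information is encoded. Once the RPN proposes a region of interest, the RoI is mapped onto the position-sensitive score maps and first divided into a k×k grid, as shown in the figure: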

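The left side shows the RoI features with the 9 regions unrolled; each of the 9 regions corresponds to a different position. During pooling, each grid cell takes features only from the score-map group for its own position: the top-left cell reads only the top-left channels, the bottom-right cell reads only the bottom-right channels, and so on. The selected values are averaged within each cell, yielding the k×k feature map with c+1 channels shown on the right. Summing this k×k map channel by channel gives a (c+1)-dimensional vector, and a softmax over it completes the classification of the RoI.

Bounding-box regression for the RoI works in much the same way. The only difference is that the position-sensitive score maps have k²(c+1) channels, whereas the position-sensitive regression maps have 4k² channels. Pooling in the same way gives a k×k feature with 4 channels, and summing over the grid gives a 1×4 vector, which is the regression prediction.

Because R-FCN removes the fully connected layers and the computation of the whole network is shared, it is fast. And because the position-sensitive score maps bring in position information, R-FCN also detects better.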
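
The pooling and voting steps described above can be sketched in NumPy for a single RoI. This is a minimal illustration under stated assumptions, not the Caffe layer used by the training code below: the function name `ps_roi_pool` is hypothetical, the RoI is assumed to be given already in feature-map coordinates and to lie inside the map, and the channel layout (all outputs for grid cell (i, j) stored contiguously) is an assumption of this sketch.

```python
import numpy as np


def ps_roi_pool(maps, roi, k, num_outputs):
    """Position-sensitive average pooling for one RoI.

    maps: array of shape (k*k*num_outputs, H, W). Assumed layout: channel
          (i*k + j)*num_outputs + c holds output c for grid cell (row i, col j).
    roi:  (x0, y0, x1, y1) in feature-map cells, x1/y1 exclusive.
    Returns an array of shape (num_outputs, k, k).
    """
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    pooled = np.zeros((num_outputs, k, k))
    for i in range(k):
        for j in range(k):
            # Spatial extent of cell (i, j); every cell covers at least one pixel.
            ys = int(np.floor(y0 + i * bin_h))
            ye = max(int(np.ceil(y0 + (i + 1) * bin_h)), ys + 1)
            xs = int(np.floor(x0 + j * bin_w))
            xe = max(int(np.ceil(x0 + (j + 1) * bin_w)), xs + 1)
            # Read only the channel group that belongs to this cell.
            first = (i * k + j) * num_outputs
            region = maps[first:first + num_outputs, ys:ye, xs:xe]
            pooled[:, i, j] = region.mean(axis=(1, 2))
    return pooled


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


k, num_classes = 3, 2                      # small made-up sizes
rng = np.random.default_rng(0)
cls_maps = rng.normal(size=(k * k * (num_classes + 1), 12, 12))
reg_maps = rng.normal(size=(k * k * 4, 12, 12))
roi = (2, 3, 11, 9)

# Classification: pool, sum over the k x k grid, then softmax.
probs = softmax(ps_roi_pool(cls_maps, roi, k, num_classes + 1).sum(axis=(1, 2)))
# Regression: same pooling on the 4 k^2 channel maps, then sum over the grid.
box_deltas = ps_roi_pool(reg_maps, roi, k, 4).sum(axis=(1, 2))
print(probs, box_deltas)
```

Note that there are no learned weights after the pooling step: the per-RoI work is only averaging and summing, which is why the cost per RoI stays small.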

3. Related code (Caffe framework):

Training code:

import caffe
from fast_rcnn.config import cfg
import roi_data_layer.roidb as rdl_roidb
from utils.timer import Timer
import numpy as np
import os

from caffe.proto import caffe_pb2
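import google.protobuf as pb2
# text_format is a submodule; import it explicitly so pb2.text_format resolves
import google.protobuf.text_format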

class SolverWrapper(object):
    """A simple wrapper around Caffe's solver.
    This wrapper gives us control over the snapshotting process, which we
    use to unnormalize the learned bounding-box regression weights.
    """

    def __init__(self, solver_prototxt, roidb, output_dir,
                 pretrained_model=None):
        """Initialize the SolverWrapper."""
        self.output_dir = output_dir

        if (cfg.TRAIN.HAS_RPN and cfg.TRAIN.BBOX_REG and
            cfg.TRAIN.BBOX_NORMALIZE_TARGETS):
            # RPN can only use precomputed normalization because there are no
            # fixed statistics to compute a priori
            assert cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED

        if cfg.TRAIN.BBOX_REG:
            print 'Computing bounding-box regression targets...'
            self.bbox_means, self.bbox_stds = \
                    rdl_roidb.add_bbox_regression_targets(roidb)
            print 'done'

        self.solver = caffe.SGDSolver(solver_prototxt)
        if pretrained_model is not None:
            print ('Loading pretrained model '
                   'weights from {:s}').format(pretrained_model)
            self.solver.net.copy_from(pretrained_model)

        self.solver_param = caffe_pb2.SolverParameter()
        with open(solver_prototxt, 'rt') as f:
            pb2.text_format.Merge(f.read(), self.solver_param)

        self.solver.net.layers[0].set_roidb(roidb)

    def snapshot(self):
        """Take a snapshot of the network after unnormalizing the learned
        bounding-box regression weights. This enables easy use at test-time.
        """
        net = self.solver.net

        scale_bbox_params_faster_rcnn = (cfg.TRAIN.BBOX_REG and
                             cfg.TRAIN.BBOX_NORMALIZE_TARGETS and
                             net.params.has_key('bbox_pred'))

        scale_bbox_params_rfcn = (cfg.TRAIN.BBOX_REG and
                             cfg.TRAIN.BBOX_NORMALIZE_TARGETS and
                             net.params.has_key('rfcn_bbox'))

        scale_bbox_params_rpn = (cfg.TRAIN.RPN_NORMALIZE_TARGETS and
                                 net.params.has_key('rpn_bbox_pred'))

        if scale_bbox_params_faster_rcnn:
            # save original values
            orig_0 = net.params['bbox_pred'][0].data.copy()
            orig_1 = net.params['bbox_pred'][1].data.copy()

            # scale and shift with bbox reg unnormalization; then save snapshot
            net.params['bbox_pred'][0].data[...] = \
                    (net.params['bbox_pred'][0].data *
                     self.bbox_stds[:, np.newaxis])
            net.params['bbox_pred'][1].data[...] = \
                    (net.params['bbox_pred'][1].data *
                     self.bbox_stds + self.bbox_means)

        if scale_bbox_params_rpn:
            rpn_orig_0 = net.params['rpn_bbox_pred'][0].data.copy()
            rpn_orig_1 = net.params['rpn_bbox_pred'][1].data.copy()
            # integer division: np.tile needs an integer repeat count
            num_anchor = rpn_orig_0.shape[0] // 4
            # scale and shift with bbox reg unnormalization; then save snapshot
            self.rpn_means = np.tile(np.asarray(cfg.TRAIN.RPN_NORMALIZE_MEANS),
                                      num_anchor)
            self.rpn_stds = np.tile(np.asarray(cfg.TRAIN.RPN_NORMALIZE_STDS),
                                     num_anchor)
            net.params['rpn_bbox_pred'][0].data[...] = \
                (net.params['rpn_bbox_pred'][0].data *
                 self.rpn_stds[:, np.newaxis, np.newaxis, np.newaxis])
            net.params['rpn_bbox_pred'][1].data[...] = \
                (net.params['rpn_bbox_pred'][1].data *
                 self.rpn_stds + self.rpn_means)

        if scale_bbox_params_rfcn:
            # save original values
            orig_0 = net.params['rfcn_bbox'][0].data.copy()
            orig_1 = net.params['rfcn_bbox'][1].data.copy()
            # integer division: np.repeat needs an integer repeat count
            repeat = orig_1.shape[0] // self.bbox_means.shape[0]

            # scale and shift with bbox reg unnormalization; then save snapshot
            net.params['rfcn_bbox'][0].data[...] = \
                    (net.params['rfcn_bbox'][0].data *
                     np.repeat(self.bbox_stds, repeat).reshape((orig_1.shape[0], 1, 1, 1)))
            net.params['rfcn_bbox'][1].data[...] = \
                    (net.params['rfcn_bbox'][1].data *
                     np.repeat(self.bbox_stds, repeat) + np.repeat(self.bbox_means, repeat))

        infix = ('_' + cfg.TRAIN.SNAPSHOT_INFIX
                 if cfg.TRAIN.SNAPSHOT_INFIX != '' else '')
        filename = (self.solver_param.snapshot_prefix + infix +
                    '_iter_{:d}'.format(self.solver.iter) + '.caffemodel')
        filename = os.path.join(self.output_dir, filename)
        net.save(str(filename))
        print 'Wrote snapshot to: {:s}'.format(filename)

        if scale_bbox_params_faster_rcnn:
            # restore net to original state
            net.params['bbox_pred'][0].data[...] = orig_0
            net.params['bbox_pred'][1].data[...] = orig_1
        if scale_bbox_params_rfcn:
            # restore net to original state
            net.params['rfcn_bbox'][0].data[...] = orig_0
            net.params['rfcn_bbox'][1].data[...] = orig_1
        if scale_bbox_params_rpn:
            # restore net to original state
            net.params['rpn_bbox_pred'][0].data[...] = rpn_orig_0
            net.params['rpn_bbox_pred'][1].data[...] = rpn_orig_1

        return filename

    def train_model(self, max_iters):
        """Network training loop."""
        last_snapshot_iter = -1
        timer = Timer()
        model_paths = []
        while self.solver.iter < max_iters:
            # Make one SGD update
            timer.tic()
            self.solver.step(1)
            timer.toc()
            if self.solver.iter % (10 * self.solver_param.display) == 0:
                print 'speed: {:.3f}s / iter'.format(timer.average_time)

            if self.solver.iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
                last_snapshot_iter = self.solver.iter
                model_paths.append(self.snapshot())

        if last_snapshot_iter != self.solver.iter:
            model_paths.append(self.snapshot())
        return model_paths

def get_training_roidb(imdb):
    """Returns a roidb (Region of Interest database) for use in training."""
    if cfg.TRAIN.USE_FLIPPED:
        print 'Appending horizontally-flipped training examples...'
        imdb.append_flipped_images()
        print 'done'

    print 'Preparing training data...'
    rdl_roidb.prepare_roidb(imdb)
    print 'done'

    return imdb.roidb

def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

    def is_valid(entry):
        # Valid images have:
        #   (1) At least one foreground RoI OR
        #   (2) At least one background RoI
        overlaps = entry['max_overlaps']
        # find boxes with sufficient overlap
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
        # image is only valid if such boxes exist
        valid = len(fg_inds) > 0 or len(bg_inds) > 0
        return valid

    num = len(roidb)
    filtered_roidb = [entry for entry in roidb if is_valid(entry)]
    num_after = len(filtered_roidb)
    print 'Filtered {} roidb entries: {} -> {}'.format(num - num_after,
                                                       num, num_after)
    return filtered_roidb

def train_net(solver_prototxt, roidb, output_dir,
              pretrained_model=None, max_iters=40000):
    """Train a Fast R-CNN network."""

    roidb = filter_roidb(roidb)
    sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model)

    print 'Solving...'
    model_paths = sw.train_model(max_iters)
    print 'done solving'
    return model_paths
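
Why does snapshot() rescale the weights and biases before saving? The network is trained to predict normalized box targets, so multiplying the weights by the standard deviations and mapping the biases to b * std + mean folds the unnormalization into the layer itself. The following NumPy check, written for Python 3, illustrates that identity on a plain linear layer. All sizes and values are made up, and the linear layer is a stand-in for the bbox_pred / rfcn_bbox layers rather than a reproduction of them.

```python
import numpy as np

rng = np.random.default_rng(0)
num_targets, num_features = 8, 16          # hypothetical sizes

W = rng.normal(size=(num_targets, num_features))
b = rng.normal(size=num_targets)
means = rng.normal(size=num_targets)
stds = rng.uniform(0.1, 0.5, size=num_targets)
x = rng.normal(size=num_features)

# What test-time code would otherwise have to do by hand:
expected = (W @ x + b) * stds + means

# What snapshot() does instead: fold the statistics into the parameters.
W_folded = W * stds[:, np.newaxis]
b_folded = b * stds + means
folded = W_folded @ x + b_folded

print(np.allclose(folded, expected))

# For rfcn_bbox, one statistic is shared by several consecutive channels,
# so the code expands the statistics with np.repeat before scaling.
per_channel_stds = np.repeat(np.array([0.1, 0.2]), 3)
print(per_channel_stds)
```

The same reasoning explains why the original parameters are restored after net.save(): training must continue on the normalized targets.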