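1. Background:
Because fully connected layers carry so many parameters, more and more networks are dropping them, and R-FCN is a good example. Personally, I think fully convolutional networks are the future!
In Faster R-CNN, the features produced by RoI Pooling pass through fully connected layers to produce the classification and regression predictions, and these layers hold most of the network's parameters. Meanwhile, a growing number of fully convolutional networks have shown that a network can often do better without fully connected layers, and can then accept input images of any size.
A natural idea is to drop the fully connected layers after RoI Pooling and feed the pooled features directly into the classification and regression outputs, but experiments show that this detects poorly. One reason is that the backbone convolutional network was designed for classification: it is translation invariant and insensitive to position, whereas object detection is position sensitive.
To address this pain point, Jifeng Dai's team at Microsoft Research Asia proposed R-FCN (Region-based Fully Convolutional Networks). It uses carefully designed position-sensitive score maps to make the network sensitive to position, and its fully convolutional design greatly reduces the number of parameters.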
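To make the parameter argument concrete, here is a rough count under assumed sizes. The fully connected head is taken to be VGG-16-style (7×7×512 RoI features feeding two 4096-unit layers), which is an assumption for illustration and not something this post specifies; the R-FCN side uses the 1×1 convolutions described in section 2 on a 1024-channel feature map, with k = 3 and c = 20 chosen as example values. The helper names are hypothetical.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1x1_params(c_in, c_out):
    """Weights plus biases of a 1x1 convolution."""
    return c_in * c_out + c_out


# Assumed VGG-16-style head after RoI Pooling: 7x7x512 -> 4096 -> 4096.
fc_head = fc_params(7 * 7 * 512, 4096) + fc_params(4096, 4096)

# R-FCN-style head: two 1x1 convs on a 1024-channel map (example k and c).
k, c = 3, 20
rfcn_head = (conv1x1_params(1024, k * k * (c + 1)) +
             conv1x1_params(1024, k * k * 4))

print("FC head params:    {:,}".format(fc_head))
print("R-FCN head params: {:,}".format(rfcn_head))
```

Under these assumptions the fully connected head needs roughly 120 million parameters, while the two 1×1 convolutions need about 0.23 million.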
Paper link:
R-FCN: Object Detection via Region-based Fully Convolutional Networks (neurips.cc)
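2. Architecture diagram:
A more detailed view of the structure:
The figure shows the R-FCN network structure. R-FCN uses ResNet-101 as its backbone and appends a 1×1 convolution after the original 100 convolutional layers to reduce the channel count to 1024.
To enlarge the subsequent feature map, R-FCN also lowers ResNet-101's downsampling factor from 32 to 16. Concretely, the stride in the fifth convolution group is changed from 2 to 1, and the convolutions in that stage use dilated convolution with a dilation of 2 to enlarge the receptive field. Reducing the stride and adding dilation is a common technique: it preserves the feature map size while still enlarging the receptive field.
A 1×1 convolution on this feature map yields the position-sensitive score maps, which have k²(c+1) channels. Here c is the number of object classes, and the extra 1 accounts for the background class. k means that each RoI is divided into k×k = k² regions; the figure below shows the cases k = 1, 3, and 5.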
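The two numeric claims here, that stride 1 with dilation 2 keeps the feature map size and that the score maps have k²(c+1) channels, can be checked with a few lines of arithmetic. This is only a sketch: the 3×3 kernel, the paddings, the 38-pixel input size, and c = 20 are assumed example values, and the function names are hypothetical.

```python
def conv_out_size(size, kernel, stride, pad, dilation=1):
    """Spatial output size of a convolution (standard floor formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective_kernel) // stride + 1


def score_map_channels(k, num_classes):
    """Channels of the position-sensitive score maps: k^2 * (c + 1)."""
    return k * k * (num_classes + 1)


size = 38  # assumed feature map side entering the fifth convolution group

# Stride 2, no dilation: the map is halved (overall stride 32).
halved = conv_out_size(size, kernel=3, stride=2, pad=1)
# Stride 1, dilation 2: the map keeps its size (overall stride 16),
# while each 3x3 kernel now spans 5 pixels.
kept = conv_out_size(size, kernel=3, stride=1, pad=2, dilation=2)
print(halved, kept)

for k in (1, 3, 5):
    print(k, score_map_channels(k, num_classes=20))
```

With 20 classes, the score maps have 21, 189, and 525 channels for k = 1, 3, and 5.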
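For example, with k = 3 the RoI is divided into 9 regions (top-left, top-center, top-right, and so on), and each group of score maps responds to one of these relative positions. The channels of the position-sensitive score maps therefore cover every class in each of the 9 regions.
Take a point on the position-sensitive score maps at coordinates (m, n), in a channel that belongs to the top-right region and the class "person". Its value indicates how strongly that location looks like the top-right part of a person, which is how position information is encoded.
After the RPN proposes a region of interest, the RoI is mapped onto the position-sensitive score maps and first divided into a k×k grid, as shown in the figure:
On the left are the RoI features unrolled into the 9 region groups, each corresponding to a different position. During pooling, each bin takes features only from its own group at its own position: the top-left bin reads only the top-left part of the top-left group, the bottom-right bin reads only the bottom-right part of the bottom-right group, and so on. Averaging within each bin yields the k×k feature map with c+1 channels shown on the right.
Summing this k×k feature over its spatial positions, channel by channel, gives a (c+1)-dimensional vector, and a Softmax over it completes the classification of the RoI.
Box regression for the RoI works in much the same way. The only difference is that the position-sensitive score maps have k²(c+1) channels, whereas the position-sensitive regression maps have 4k² channels. Pooling in the same way gives a k×k feature with 4 channels, and summing it gives a 1×4 vector, which is the regression prediction.
Because R-FCN removes the fully connected layers and the whole network shares its computation, it is fast. And because the position-sensitive score maps bring in position information, R-FCN also detects better.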
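The pooling and voting steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the Caffe layer: the function names are hypothetical, the channel ordering (bin row, bin column, output) is an assumption, the RoI is assumed to be given in feature-map coordinates and to lie inside the map, and the random arrays merely stand in for real score maps.

```python
import numpy as np


def ps_roi_pool(score_maps, roi, k, num_out):
    """Position-sensitive RoI average pooling followed by voting.

    score_maps: array of shape (k*k*num_out, H, W)
    roi: (x1, y1, x2, y2) in feature-map coordinates, x2/y2 exclusive
    Returns a vector of length num_out.
    """
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / float(k)
    bin_h = (y2 - y1) / float(k)
    height, width = score_maps.shape[1:]
    # Assumed layout: channel axis ordered as (bin_row, bin_col, output).
    maps = score_maps.reshape(k, k, num_out, height, width)
    pooled = np.zeros((num_out, k, k))
    for i in range(k):      # bin row
        for j in range(k):  # bin column
            ys = int(np.floor(y1 + i * bin_h))
            ye = max(int(np.ceil(y1 + (i + 1) * bin_h)), ys + 1)
            xs = int(np.floor(x1 + j * bin_w))
            xe = max(int(np.ceil(x1 + (j + 1) * bin_w)), xs + 1)
            # Bin (i, j) reads only from its own group of score maps.
            pooled[:, i, j] = maps[i, j, :, ys:ye, xs:xe].mean(axis=(1, 2))
    return pooled.sum(axis=(1, 2))  # vote: sum over the k*k bins


def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


k, c = 3, 20
rng = np.random.RandomState(0)
cls_maps = rng.randn(k * k * (c + 1), 40, 60)   # stand-in score maps
bbox_maps = rng.randn(k * k * 4, 40, 60)        # stand-in regression maps
roi = (10, 5, 31, 26)

probs = softmax(ps_roi_pool(cls_maps, roi, k, c + 1))  # class probabilities
deltas = ps_roi_pool(bbox_maps, roi, k, 4)             # box regression output
print(probs.shape, deltas.shape)
```

The same function serves both branches; only the number of outputs per bin changes (c+1 for classification, 4 for regression), which mirrors the description above.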
3. Related code (Caffe framework):
Training code:
import caffe
from fast_rcnn.config import cfg
import roi_data_layer.roidb as rdl_roidb
from utils.timer import Timer
import numpy as np
import os
from caffe.proto import caffe_pb2
import google.protobuf as pb2
import google.protobuf.text_format  # ensure pb2.text_format is loaded
class SolverWrapper(object):
    """A simple wrapper around Caffe's solver.

    This wrapper gives us control over the snapshotting process, which we
    use to unnormalize the learned bounding-box regression weights.
    """

    def __init__(self, solver_prototxt, roidb, output_dir,
                 pretrained_model=None):
        """Initialize the SolverWrapper."""
        self.output_dir = output_dir

        if (cfg.TRAIN.HAS_RPN and cfg.TRAIN.BBOX_REG and
                cfg.TRAIN.BBOX_NORMALIZE_TARGETS):
            # RPN can only use precomputed normalization because there are no
            # fixed statistics to compute a priori
            assert cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED

        if cfg.TRAIN.BBOX_REG:
            print('Computing bounding-box regression targets...')
            self.bbox_means, self.bbox_stds = \
                rdl_roidb.add_bbox_regression_targets(roidb)
            print('done')

        self.solver = caffe.SGDSolver(solver_prototxt)
        if pretrained_model is not None:
            print('Loading pretrained model '
                  'weights from {:s}'.format(pretrained_model))
            self.solver.net.copy_from(pretrained_model)

        # Parse the solver prototxt so snapshot settings can be read later
        self.solver_param = caffe_pb2.SolverParameter()
        with open(solver_prototxt, 'rt') as f:
            pb2.text_format.Merge(f.read(), self.solver_param)

        # The first layer is the Python data layer; hand it the roidb
        self.solver.net.layers[0].set_roidb(roidb)
    def snapshot(self):
        """Take a snapshot of the network after unnormalizing the learned
        bounding-box regression weights. This enables easy use at test-time.
        """
        net = self.solver.net

        scale_bbox_params_faster_rcnn = (cfg.TRAIN.BBOX_REG and
                                         cfg.TRAIN.BBOX_NORMALIZE_TARGETS and
                                         'bbox_pred' in net.params)

        scale_bbox_params_rfcn = (cfg.TRAIN.BBOX_REG and
                                  cfg.TRAIN.BBOX_NORMALIZE_TARGETS and
                                  'rfcn_bbox' in net.params)

        scale_bbox_params_rpn = (cfg.TRAIN.RPN_NORMALIZE_TARGETS and
                                 'rpn_bbox_pred' in net.params)

        if scale_bbox_params_faster_rcnn:
            # save original values
            orig_0 = net.params['bbox_pred'][0].data.copy()
            orig_1 = net.params['bbox_pred'][1].data.copy()

            # scale and shift with bbox reg unnormalization; then save snapshot
            net.params['bbox_pred'][0].data[...] = \
                (net.params['bbox_pred'][0].data *
                 self.bbox_stds[:, np.newaxis])
            net.params['bbox_pred'][1].data[...] = \
                (net.params['bbox_pred'][1].data *
                 self.bbox_stds + self.bbox_means)

        if scale_bbox_params_rpn:
            rpn_orig_0 = net.params['rpn_bbox_pred'][0].data.copy()
            rpn_orig_1 = net.params['rpn_bbox_pred'][1].data.copy()
            # integer division: 4 regression outputs per anchor
            num_anchor = rpn_orig_0.shape[0] // 4
            # scale and shift with bbox reg unnormalization; then save snapshot
            self.rpn_means = np.tile(np.asarray(cfg.TRAIN.RPN_NORMALIZE_MEANS),
                                     num_anchor)
            self.rpn_stds = np.tile(np.asarray(cfg.TRAIN.RPN_NORMALIZE_STDS),
                                    num_anchor)
            net.params['rpn_bbox_pred'][0].data[...] = \
                (net.params['rpn_bbox_pred'][0].data *
                 self.rpn_stds[:, np.newaxis, np.newaxis, np.newaxis])
            net.params['rpn_bbox_pred'][1].data[...] = \
                (net.params['rpn_bbox_pred'][1].data *
                 self.rpn_stds + self.rpn_means)

        if scale_bbox_params_rfcn:
            # save original values
            orig_0 = net.params['rfcn_bbox'][0].data.copy()
            orig_1 = net.params['rfcn_bbox'][1].data.copy()
            # each mean/std is repeated once per output channel it covers
            repeat = orig_1.shape[0] // self.bbox_means.shape[0]

            # scale and shift with bbox reg unnormalization; then save snapshot
            net.params['rfcn_bbox'][0].data[...] = \
                (net.params['rfcn_bbox'][0].data *
                 np.repeat(self.bbox_stds, repeat).reshape(
                     (orig_1.shape[0], 1, 1, 1)))
            net.params['rfcn_bbox'][1].data[...] = \
                (net.params['rfcn_bbox'][1].data *
                 np.repeat(self.bbox_stds, repeat) +
                 np.repeat(self.bbox_means, repeat))

        infix = ('_' + cfg.TRAIN.SNAPSHOT_INFIX
                 if cfg.TRAIN.SNAPSHOT_INFIX != '' else '')
        filename = (self.solver_param.snapshot_prefix + infix +
                    '_iter_{:d}'.format(self.solver.iter) + '.caffemodel')
        filename = os.path.join(self.output_dir, filename)
        net.save(str(filename))
        print('Wrote snapshot to: {:s}'.format(filename))

        if scale_bbox_params_faster_rcnn:
            # restore net to original state
            net.params['bbox_pred'][0].data[...] = orig_0
            net.params['bbox_pred'][1].data[...] = orig_1
        if scale_bbox_params_rfcn:
            # restore net to original state
            net.params['rfcn_bbox'][0].data[...] = orig_0
            net.params['rfcn_bbox'][1].data[...] = orig_1
        if scale_bbox_params_rpn:
            # restore net to original state
            net.params['rpn_bbox_pred'][0].data[...] = rpn_orig_0
            net.params['rpn_bbox_pred'][1].data[...] = rpn_orig_1

        return filename
    def train_model(self, max_iters):
        """Network training loop."""
        last_snapshot_iter = -1
        timer = Timer()
        model_paths = []
        while self.solver.iter < max_iters:
            # Make one SGD update
            timer.tic()
            self.solver.step(1)
            timer.toc()
            if self.solver.iter % (10 * self.solver_param.display) == 0:
                print('speed: {:.3f}s / iter'.format(timer.average_time))

            if self.solver.iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
                last_snapshot_iter = self.solver.iter
                model_paths.append(self.snapshot())

        # Save a final snapshot if the last iteration was not already saved
        if last_snapshot_iter != self.solver.iter:
            model_paths.append(self.snapshot())
        return model_paths
def get_training_roidb(imdb):
    """Returns a roidb (Region of Interest database) for use in training."""
    if cfg.TRAIN.USE_FLIPPED:
        print('Appending horizontally-flipped training examples...')
        imdb.append_flipped_images()
        print('done')

    print('Preparing training data...')
    rdl_roidb.prepare_roidb(imdb)
    print('done')

    return imdb.roidb
def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

    def is_valid(entry):
        # Valid images have:
        #   (1) At least one foreground RoI OR
        #   (2) At least one background RoI
        overlaps = entry['max_overlaps']
        # find boxes with sufficient overlap
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
        # image is only valid if such boxes exist
        valid = len(fg_inds) > 0 or len(bg_inds) > 0
        return valid

    num = len(roidb)
    filtered_roidb = [entry for entry in roidb if is_valid(entry)]
    num_after = len(filtered_roidb)
    print('Filtered {} roidb entries: {} -> {}'.format(num - num_after,
                                                       num, num_after))
    return filtered_roidb
def train_net(solver_prototxt, roidb, output_dir,
              pretrained_model=None, max_iters=40000):
    """Train a Fast R-CNN network."""
    # Drop images that contain neither foreground nor background RoIs
    roidb = filter_roidb(roidb)
    sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model)

    print('Solving...')
    model_paths = sw.train_model(max_iters)
    print('done solving')
    return model_paths
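The `snapshot` method scales the regression layer's weights by the target standard deviations and rewrites its biases as `bias * std + mean` before saving. The idea behind this can be checked in isolation: a linear layer followed by unnormalization gives the same result as a single layer with the folded parameters. The sketch below uses a plain matrix product in place of the Caffe layer, and every array in it (`W`, `b`, `means`, `stds`, `x`) is made-up example data.

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(8, 16)        # weights of a hypothetical bbox layer (8 outputs)
b = rng.randn(8)            # biases
means = rng.randn(8)        # made-up target means
stds = rng.rand(8) + 0.5    # made-up target standard deviations
x = rng.randn(16)           # one input feature vector

# Without folding: predict normalized targets, then unnormalize them.
unnormalized = (W.dot(x) + b) * stds + means

# Folding, following the same pattern as snapshot(): scale each output row
# of the weights by its std, then scale and shift the biases.
W_folded = W * stds[:, np.newaxis]
b_folded = b * stds + means
folded = W_folded.dot(x) + b_folded

print(np.allclose(unnormalized, folded))
```

Because the folded layer already outputs unnormalized box deltas, the saved model can be used at test time without knowing the training statistics, which is why `snapshot` restores the original weights afterwards so that training continues on normalized targets.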