Faster R-CNN study notes: generating proposals with the RPN


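Continuing from the previous post, "faster rcnn学习之rpn训练全过程" (the full RPN training process), suppose the RPN has already been trained. Here we look at how the trained RPN is used to generate proposals.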

The network is defined in rpn_test.pt:

name: "VGG_CNN_M_1024"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 224
  dim: 224
}
input: "im_info"
input_shape {
  dim: 1
  dim: 3
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0005
    beta: 0.75
    k: 2
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 5
    stride: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0005
    beta: 0.75
    k: 2
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5"
  top: "rpn/output"
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}
layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}
layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}



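Again borrowing the figure from reference [1], the network can be drawn as follows. It is essentially the same as the RPN used for training.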




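As shown above, a 224*224 image passes through the first five convolutional layers, which output a 13*13 feature map with 512 channels (conv5 has num_output: 512). The 3*3 convolution rpn_conv/3x3 then reduces this to 256 feature maps of size 13*13 (you can equally think of it as a single 13*13*256 feature map, where 256 is the number of channels). Two 1*1 convolutions then output the 13*13*18 rpn_cls_score and the 13*13*36 rpn_bbox_pred. rpn_cls_score is reshaped in preparation for the softmax.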
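The 13*13 figure can be checked by hand from the kernel, stride and pad values in the prototxt. The sketch below does that arithmetic, assuming Caffe's usual rounding (convolution rounds the output size down, pooling rounds it up); the helper names are made up for illustration.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Assumption: Caffe convolution rounds the output size down.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    # Assumption: Caffe pooling rounds the output size up.
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


def rpn_feature_size(size):
    """Spatial size of conv5 for a square input, per rpn_test.pt."""
    size = conv_out(size, kernel=7, stride=2)         # conv1
    size = pool_out(size, kernel=3, stride=2)         # pool1
    size = conv_out(size, kernel=5, stride=2, pad=1)  # conv2
    size = pool_out(size, kernel=3, stride=2)         # pool2
    for _ in range(3):                                # conv3, conv4, conv5
        size = conv_out(size, kernel=3, pad=1)
    return size


print(rpn_feature_size(224))  # prints 13
```

The RPN layers (3*3 with pad 1, then 1*1) keep this size, so rpn_cls_score and rpn_bbox_pred are also 13*13 spatially.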


Next, softmax is applied to rpn_cls_score_reshape to produce rpn_cls_prob, which is then reshaped back to give rpn_cls_prob_reshape.
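Why the reshape is needed: the softmax has to normalize over the two classes (bg/fg) of each anchor, not over all 18 channels. A small pure-Python sketch, assuming row-major blob storage and a softmax over axis 1, shows that after the reshape from (1, 18, H, W) to (1, 2, 9*H, W), channel a and channel a + 9 land in the same position of the two class planes. The function names are hypothetical.

```python
import math

NUM_ANCHORS = 9  # 18 = 2 (bg/fg) * 9 (anchors), as in the prototxt comment


def reshaped_index(c, h, w, height, width, num_anchors=NUM_ANCHORS):
    """Map index (c, h, w) of a (1, 2A, H, W) blob to (k, r, w) of (1, 2, A*H, W)."""
    flat = (c * height + h) * width + w  # row-major offset inside the blob
    k, rest = divmod(flat, num_anchors * height * width)
    r, w_new = divmod(rest, width)
    return k, r, w_new


def fg_probability(scores_at_cell, a, num_anchors=NUM_ANCHORS):
    """Two-way softmax for anchor a at one cell; returns the foreground probability."""
    bg = scores_at_cell[a]
    fg = scores_at_cell[a + num_anchors]
    m = max(bg, fg)  # subtract the max for numerical stability
    e_bg, e_fg = math.exp(bg - m), math.exp(fg - m)
    return e_fg / (e_bg + e_fg)


# Channel 3 (bg of anchor 3) and channel 12 (fg of anchor 3) share (r, w):
print(reshaped_index(3, 5, 7, 13, 13))   # prints (0, 44, 7)
print(reshaped_index(12, 5, 7, 13, 13))  # prints (1, 44, 7)
```

This matches the comment in the proposal layer that the first 9 channels hold background probabilities and the last 9 hold foreground probabilities.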


Finally, rpn_cls_prob_reshape (1*18*13*13), rpn_bbox_pred (1*36*13*13) and im_info (1*3) are fed into the proposal layer, which outputs rois and scores.

layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
Let's look at proposal_layer:

    def setup(self, bottom, top):
        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.load(self.param_str_)

        self._feat_stride = layer_params['feat_stride']
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        self._num_anchors = self._anchors.shape[0]

        if DEBUG:
            print 'feat_stride: {}'.format(self._feat_stride)
            print 'anchors:'
            print self._anchors

        # rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        top[0].reshape(1, 5)

        # scores blob: holds scores for R regions of interest
        if len(top) > 1:
            top[1].reshape(1, 1, 1, 1)
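As with setup in anchor_target_layer.py, this sets the shapes of the top blobs and generates the base anchors, i.e. the anchors located at the top-left cell of the feature map.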
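The post does not show generate_anchors itself. The following is a simplified pure-Python sketch of how 9 base anchors (3 aspect ratios * 3 scales) can be built around a 16*16 reference box; the default values for base_size and ratios are assumptions, and only scales=(8, 16, 32) appears in the setup code above.

```python
import math


def _make_anchor(w, h, x_ctr, y_ctr):
    # Box of width w and height h centred on (x_ctr, y_ctr), inclusive pixel coords.
    return [x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
            x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1)]


def generate_anchors_sketch(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return 9 base anchors as [x1, y1, x2, y2], ordered by ratio then scale."""
    ctr = 0.5 * (base_size - 1)   # centre of the reference box [0, 0, 15, 15]
    area = base_size * base_size
    anchors = []
    for ratio in ratios:
        # Keep the area roughly constant while changing h / w to the ratio.
        w = round(math.sqrt(area / float(ratio)))
        h = round(w * ratio)
        for scale in scales:
            anchors.append(_make_anchor(w * scale, h * scale, ctr, ctr))
    return anchors


for anchor in generate_anchors_sketch():
    print(anchor)
```

All 9 anchors share the same centre (7.5, 7.5); the forward pass then shifts them to every other cell of the feature map.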

    def forward(self, bottom, top):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate A anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the A anchors
        # clip predicted boxes to image
        # remove predicted boxes with either height or width < threshold
        # sort all (proposal, score) pairs by score from highest to lowest
        # take top pre_nms_topN proposals before NMS
        # apply NMS with threshold 0.7 to remaining proposals
        # take after_nms_topN proposals after NMS
        # return the top proposals (-> RoIs top, scores top)

        assert bottom[0].data.shape[0] == 1, \
            'Only single item batches are supported'

        cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
        pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
        post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
        nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
        min_size      = cfg[cfg_key].RPN_MIN_SIZE

        # the first set of _num_anchors channels are bg probs (the first 9 channels are background, the remaining 9 are foreground)
        # the second set are the fg probs, which we want
        scores = bottom[0].data[:, self._num_anchors:, :, :]
        bbox_deltas = bottom[1].data
        im_info = bottom[2].data[0, :]

        if DEBUG:
            print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
            print 'scale: {}'.format(im_info[2])

        # 1. Generate proposals from bbox deltas and shifted anchors
        height, width = scores.shape[-2:]

        if DEBUG:
            print 'score map size: {}'.format(scores.shape)

        # Enumerate all shifts
        shift_x = np.arange(0, width) * self._feat_stride
        shift_y = np.arange(0, height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                            shift_x.ravel(), shift_y.ravel())).transpose()

        # Enumerate all shifted anchors:
        #
        # add A anchors (1, A, 4) to
        # cell K shifts (K, 1, 4) to get
        # shift anchors (K, A, 4)
        # reshape to (K*A, 4) shifted anchors
        A = self._num_anchors
        K = shifts.shape[0]
        anchors = self._anchors.reshape((1, A, 4)) + \
                  shifts.reshape((1, K, 4)).transpose((1, 0, 2))
        anchors = anchors.reshape((K * A, 4))

        # Transpose and reshape predicted bbox transformations to get them
        # into the same order as the anchors:
        #
        # bbox deltas will be (1, 4 * A, H, W) format
        # transpose to (1, H, W, 4 * A)
        # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
        # in slowest to fastest order
        # (done so that the rows line up with the rows of anchors)
        bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

        # Same story for the scores:
        #
        # scores are (1, A, H, W) format
        # transpose to (1, H, W, A)
        # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
        # (done so that the rows line up with the rows of anchors)
        scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))

        # Convert anchors into proposals via bbox transformations, yielding predicted (x1, y1, x2, y2)
        proposals = bbox_transform_inv(anchors, bbox_deltas)

        # 2. clip predicted boxes to image
        proposals = clip_boxes(proposals, im_info[:2])

        # 3. remove predicted boxes with either height or width < threshold
        # (NOTE: convert min_size to input image scale stored in im_info[2])
        keep = _filter_boxes(proposals, min_size * im_info[2])
        proposals = proposals[keep, :]
        scores = scores[keep]

        # 4. sort all (proposal, score) pairs by score from highest to lowest
        # 5. take top pre_nms_topN (e.g. 6000)
        order = scores.ravel().argsort()[::-1]
        if pre_nms_topN > 0:
            order = order[:pre_nms_topN]
        proposals = proposals[order, :]
        scores = scores[order]

        # 6. apply nms (e.g. threshold = 0.7)
        # 7. take after_nms_topN (e.g. 300)
        # 8. return the top proposals (-> RoIs top)
        keep = nms(np.hstack((proposals, scores)), nms_thresh)
        if post_nms_topN > 0:
            keep = keep[:post_nms_topN]
        proposals = proposals[keep, :]
        scores = scores[keep]

        # Output rois blob
        # Our RPN implementation only supports a single input image, so all
        # batch inds are 0
        # rois has shape R*5, one row (n, x1, y1, x2, y2) per proposal; the boxes here are in the scaled-image coordinates.
        batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
        blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
        top[0].reshape(*(blob.shape))
        top[0].data[...] = blob

        # [Optional] output scores blob
        if len(top) > 1:
            top[1].reshape(*(scores.shape))
            top[1].data[...] = scores
In forward, all anchors are generated first; the predicted offsets are then applied to these anchors to produce the proposals.
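To make the two steps concrete, here is a loop-based sketch of what the vectorized NumPy code does. shifted_anchors reproduces the (h, w, a) ordering of the enumeration; apply_deltas is a single-box stand-in for bbox_transform_inv, whose source is not shown in this post, so its parameterization (centre shift in units of box size, log-space width and height) is an assumption.

```python
import math


def shifted_anchors(base_anchors, height, width, feat_stride=16):
    """Copy the base anchors to every feature-map cell, ordered by (h, w, a)."""
    out = []
    for y in range(height):
        for x in range(width):
            sx, sy = x * feat_stride, y * feat_stride
            for x1, y1, x2, y2 in base_anchors:
                out.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return out


def apply_deltas(box, deltas):
    """Turn one anchor plus predicted (dx, dy, dw, dh) into a proposal (assumed form)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w = x2 - x1 + 1.0
    h = y2 - y1 + 1.0
    ctr_x = x1 + 0.5 * w + dx * w   # shift the centre by a fraction of the size
    ctr_y = y1 + 0.5 * h + dy * h
    pred_w = math.exp(dw) * w       # scale the size in log space
    pred_h = math.exp(dh) * h
    return [ctr_x - 0.5 * pred_w, ctr_y - 0.5 * pred_h,
            ctr_x + 0.5 * pred_w, ctr_y + 0.5 * pred_h]


base = [[0, 0, 15, 15]] * 9                 # placeholder for the 9 base anchors
print(len(shifted_anchors(base, 13, 13)))   # prints 1521
```

For the 13*13 feature map above this gives K*A = 169*9 = 1521 anchors, one per row of the reshaped bbox_deltas and scores.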

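Next, some proposals are discarded (clipping to the image, filtering out boxes that are too small, keeping only the top-scoring ones) and NMS removes duplicates. The proposals with the highest foreground scores are returned together with their scores. Note that the generated proposals are relative to the network input scale, i.e. the scale of the resized image.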
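The clipping and de-duplication steps can be illustrated with a minimal pure-Python sketch. It is not the clip_boxes or nms used by the layer (those are not shown here); it is a plain greedy NMS under the assumption of inclusive pixel coordinates, which is why widths are computed with +1.

```python
def clip_box(box, im_height, im_width):
    """Clamp one [x1, y1, x2, y2] box to the image boundaries."""
    x1, y1, x2, y2 = box
    return [min(max(x1, 0), im_width - 1), min(max(y1, 0), im_height - 1),
            min(max(x2, 0), im_width - 1), min(max(y2, 0), im_height - 1)]


def iou(a, b):
    """Intersection over union of two boxes with inclusive coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)


def greedy_nms(boxes, scores, thresh):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already kept box exceeds thresh. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 99, 99], [5, 5, 104, 104], [200, 200, 299, 299]]
print(greedy_nms(boxes, [0.9, 0.8, 0.7], 0.7))  # prints [0, 2]
```

In the example the second box overlaps the first with an IoU of about 0.82, so it is suppressed at a threshold of 0.7, while the distant third box survives.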



Now let's go back to train_faster_rcnn_alt_opt and look at the 'Stage 1 RPN, generate proposals' step:

    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            rpn_model_path=str(rpn_stage1_out['model_path']),
            cfg=cfg,
            rpn_test_prototxt=rpn_test_prototxt)
    p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
    p.start()
    rpn_stage1_out['proposal_path'] = mp_queue.get()['proposal_path']
    p.join()

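rpn_generate loads the test network with the RPN weights produced by the previous training step; imdb_proposals then uses this network together with the imdb to generate rpn_proposals.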

imdb_proposals is defined in generate.py.

def im_proposals(net, im):
    """Generate RPN proposals on a single image."""
    blobs = {}
    blobs['data'], blobs['im_info'] = _get_image_blob(im)
    net.blobs['data'].reshape(*(blobs['data'].shape))
    net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    blobs_out = net.forward(
            data=blobs['data'].astype(np.float32, copy=False),
            im_info=blobs['im_info'].astype(np.float32, copy=False))

    scale = blobs['im_info'][0, 2]
    boxes = blobs_out['rois'][:, 1:].copy() / scale
    scores = blobs_out['scores'].copy()
    return boxes, scores

def imdb_proposals(net, imdb):
    """Generate RPN proposals on all images in an imdb."""

    _t = Timer()
    imdb_boxes = [[] for _ in xrange(imdb.num_images)]
    for i in xrange(imdb.num_images):
        im = cv2.imread(imdb.image_path_at(i))
        _t.tic()
        imdb_boxes[i], scores = im_proposals(net, im)
        _t.toc()
        print 'im_proposals: {:d}/{:d} {:.3f}s' \
              .format(i + 1, imdb.num_images, _t.average_time)
        if 0:
            dets = np.hstack((imdb_boxes[i], scores))
            # from IPython import embed; embed()
            _vis_proposals(im, dets[:3, :], thresh=0.9)
            plt.show()

    return imdb_boxes
Note that im_proposals contains the line

  boxes = blobs_out['rois'][:, 1:].copy() / scale
so the proposals generated by the RPN are rescaled and end up back in the coordinate system of the original image.
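A small worked example of this rescaling. The resize rule in image_scale is an assumption about what _get_image_blob does (short side scaled to 600 pixels, long side capped at 1000); neither the function body nor these values appear in this post, so treat them as illustrative.

```python
def image_scale(height, width, target_size=600, max_size=1000):
    """Assumed resize rule: scale the short side to target_size unless the
    long side would then exceed max_size."""
    short_side, long_side = min(height, width), max(height, width)
    scale = float(target_size) / short_side
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def to_original_scale(boxes, scale):
    """Divide every coordinate by the scale, as im_proposals does for rois[:, 1:]."""
    return [[v / scale for v in box] for box in boxes]


scale = image_scale(375, 500)   # a 375*500 image is resized to 600*800
print(scale)                    # prints 1.6
print(to_original_scale([[160.0, 80.0, 320.0, 240.0]], scale))
```

A proposal at roughly (160, 80, 320, 240) in the 600*800 network input therefore corresponds to roughly (100, 50, 200, 150) in the original 375*500 image.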

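imdb_boxes is a list with one entry per image. Each entry is an R*4 array of (x1, y1, x2, y2) boxes, where R is the number of proposals for that image; the batch-index column of rois is dropped by the [:, 1:] slice.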


References:

1. http://blog.csdn.net/zy1034092330/article/details/62044941

2. https://www.zhihu.com/question/35887527/answer/140239982




