[Deep Learning] The Faster R-CNN Network

Structure

If the diagram is hard to read, you can right-click it and open the image in a new tab:
(figure: overall Faster R-CNN flow chart)

In the diagram, rectangular boxes denote operations and diamond boxes denote their outputs.

Below is the Faster R-CNN structure diagram I drew:
(figure: hand-drawn Faster R-CNN structure diagram)

Front

The front (head) of Faster R-CNN is responsible for extracting features from the input image:
(figure: feature-extraction stage)

There are two options for this backbone: either ZFNet (with its trailing fully connected layers removed) or VGG (likewise with its trailing fully connected layers removed). The paper illustrates the first option (the green box marks the part that is reused):
(figure: ZFNet backbone with the reused portion highlighted)
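Whichever backbone is used, the reused convolutional stack downsamples the input by a factor of 16; this is what the 'feat_stride': 16 and spatial_scale: 0.0625 values in the prototxt below rely on. A minimal sketch, with the strides transcribed from the ZF layers in that prototxt:

# Minimal sketch: only tracks the cumulative stride of the ZF backbone.
zf_strides = {
    "conv1": 2,  # 7x7 conv, stride 2
    "pool1": 2,  # 3x3 max pool, stride 2
    "conv2": 2,  # 5x5 conv, stride 2
    "pool2": 2,  # 3x3 max pool, stride 2
    "conv3": 1,
    "conv4": 1,
    "conv5": 1,
}

feat_stride = 1
for layer, stride in zf_strides.items():
    feat_stride *= stride

print(feat_stride)  # 16: one cell of conv5 corresponds to a 16x16 pixel region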

Middle

The middle of Faster R-CNN takes the feature map (the features extracted by the front), generates anchors on it, and filters those anchors down to proposals:
(figure: RPN and RoI pooling stage)

Two branches:

  1. Green box: the RPN, which generates anchors and then filters ("washes") them into proposals (a minimal anchor-generation sketch follows this list).
  2. Red box: the feature map itself, passed through unchanged.

Both are finally handed to the yellow box, RoIPooling, which outputs RoIs of identical size.
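A minimal sketch of the anchor-generation step, in the spirit of the repository's rpn code but not its exact implementation; the 3 ratios x 3 scales = 9 anchors per cell are assumed defaults, chosen to match num_output: 18 = 2 x 9 and num_output: 36 = 4 x 9 in the prototxt below:

import numpy as np

def make_anchors(feat_h, feat_w, feat_stride=16,
                 ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) boxes."""
    # 9 base anchors centred at (0, 0): 3 aspect ratios x 3 scales.
    base = []
    for r in ratios:
        for s in scales:
            w = feat_stride * s / np.sqrt(r)   # ratio r = h / w, area kept roughly constant
            h = feat_stride * s * np.sqrt(r)
            base.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    base = np.array(base)                      # (9, 4)

    # Slide the 9 base anchors over every cell of the conv5 feature map.
    shift_x = np.arange(feat_w) * feat_stride
    shift_y = np.arange(feat_h) * feat_stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)

    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)

anchors = make_anchors(feat_h=38, feat_w=63)   # roughly a 600x1000 input / 16
print(anchors.shape)                           # (21546, 4) anchors before filtering

The RPN then scores each anchor as foreground/background (rpn_cls_score) and regresses 4 offsets per anchor (rpn_bbox_pred); the ProposalLayer applies the offsets, clips the boxes to the image, and keeps the top-scoring boxes after NMS as proposals.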

Back

The RoIs are handed to a classification task and a bounding-box regression task, which further refine localization accuracy and class accuracy and output the detection results:
(figure: classification and bounding-box regression heads)

Loss Computation

Multi-task loss:
(figure: multi-task loss heads, from "Faster R-CNN论文笔记——FR")

The Fast R-CNN part of the network has two sibling output layers (the cls_score and bbox_pred layers), both fully connected; this is what makes it a multi-task network.
① The cls_score layer handles classification. It outputs a (K+1)-dimensional array p, the probabilities of the K object classes plus background. For each RoI (Region of Interest) it outputs a discrete probability distribution

$$p = (p_0, p_1, \dots, p_K)$$

Typically p is computed by applying a softmax to the K+1 outputs of a fully connected layer.

② The bbox_pred layer refines the positions of the candidate regions. It outputs the bounding-box regression offsets, a 4K-dimensional array t, i.e. the translation and scaling parameters for each of the K classes:

$$t^{k} = (t^{k}_{x}, t^{k}_{y}, t^{k}_{w}, t^{k}_{h})$$

Here k is the class index; $(t^{k}_{x}, t^{k}_{y})$ is a scale-invariant translation relative to the object proposal, and $(t^{k}_{w}, t^{k}_{h})$ is the width and height in log space relative to the object proposal.
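A minimal sketch of this parametrization for one proposal/ground-truth pair, similar in spirit to the repository's bbox_transform helpers but simplified here (boxes given as x1, y1, x2, y2):

import numpy as np

def bbox_targets(proposal, gt):
    """Compute (t_x, t_y, t_w, t_h) for one proposal P and ground-truth box G."""
    px, py = (proposal[0] + proposal[2]) / 2.0, (proposal[1] + proposal[3]) / 2.0
    pw, ph = proposal[2] - proposal[0] + 1.0, proposal[3] - proposal[1] + 1.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    gw, gh = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0

    tx = (gx - px) / pw          # scale-invariant translation of the centre
    ty = (gy - py) / ph
    tw = np.log(gw / pw)         # log-space width / height scaling
    th = np.log(gh / ph)
    return np.array([tx, ty, tw, th])

print(bbox_targets(proposal=[100, 100, 199, 199], gt=[110, 90, 219, 189]))
# -> [ 0.15  -0.1    0.0953  0.    ]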
The loss_cls layer evaluates the classification loss, which is determined by the probability predicted for the true class u:

$$L_{cls}(p, u) = -\log p_u$$
The loss_bbox layer evaluates the localization loss. It compares the predicted translation/scaling parameters $t^{u}$ for the true class u with the ground-truth parameters $v = (v_x, v_y, v_w, v_h)$:

$$L_{loc}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^{u}_{i} - v_{i})$$
where the smooth L1 loss is

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

The smooth L1 curve is shown in the figure below. The authors chose it to make the loss more robust to outliers: compared with an L2 loss it is insensitive to outliers and anomalous values, and it bounds the gradient magnitude so training is less likely to blow up.
(figure: smooth L1 loss curve)
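A minimal sketch of this function. The SmoothL1Loss layers in the prototxt below also take a sigma parameter (sigma: 3.0 for the RPN branch); in the py-faster-rcnn Caffe fork this rescales the quadratic region to |x| < 1/sigma^2, and sigma = 1 recovers the plain form above:

import numpy as np

def smooth_l1(x, sigma=1.0):
    """Element-wise smooth L1; sigma=1 gives the Fast R-CNN definition above."""
    x = np.asarray(x, dtype=np.float64)
    s2 = sigma ** 2
    return np.where(np.abs(x) < 1.0 / s2,
                    0.5 * s2 * x ** 2,        # quadratic near zero
                    np.abs(x) - 0.5 / s2)     # linear for outliers: bounded gradient

print(smooth_l1([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [1.5   0.125 0.    0.125 1.5  ]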

The total loss is a weighted sum of the two terms (the localization loss is dropped when the RoI is classified as background):

$$L(p, u, t^{u}, v) = L_{cls}(p, u) + \lambda \, [u \ge 1] \, L_{loc}(t^{u}, v)$$

Class u = 0 is defined as the background class (i.e. the negative label), so the Iverson bracket indicator [u ≥ 1] means that background candidate regions (negative samples) do not contribute to the regression loss; no regression is performed for them. λ balances the classification and regression losses; all experiments in the Fast R-CNN paper use λ = 1.
The Iverson bracket indicator function is

$$[u \ge 1] = \begin{cases} 1 & \text{if } u \ge 1 \\ 0 & \text{otherwise} \end{cases}$$
In the source code, bbox_loss_weights (the bbox_inside_weights blobs in the prototxt below play this role) are used to mark, for each bbox, whether it belongs to a given class.
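A hedged sketch of the whole per-RoI multi-task loss, with cls_score and bbox_pred standing in for the two sibling FC outputs; the shapes match the VOC setting in the prototxt (K = 20 object classes, so 21 scores and 84 regression outputs), but the function itself is illustrative, not the repository's implementation:

import numpy as np

def roi_loss(cls_score, bbox_pred, u, v, lam=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc for a single RoI."""
    # L_cls: softmax cross-entropy over the (K+1)-way scores.
    z = cls_score - cls_score.max()            # stabilised softmax
    p = np.exp(z) / np.exp(z).sum()
    l_cls = -np.log(p[u])

    if u == 0:                                 # background: Iverson bracket is 0
        return l_cls

    # L_loc: smooth L1 on the 4 offsets predicted for the true class u.
    t_u = bbox_pred.reshape(-1, 4)[u]
    diff = t_u - v
    l_loc = np.where(np.abs(diff) < 1.0, 0.5 * diff ** 2, np.abs(diff) - 0.5).sum()
    return l_cls + lam * l_loc

K = 20                                         # PASCAL VOC object classes
loss = roi_loss(cls_score=np.random.randn(K + 1),
                bbox_pred=np.random.randn(4 * (K + 1)),
                u=3, v=np.array([0.1, -0.2, 0.05, 0.0]))
print(loss)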

Code

Below is the authors' prototxt, rbgirshick/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt:

name: "ZF"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

#========= conv1-conv5 ============

layer {
	name: "conv1"
	type: "Convolution"
	bottom: "data"
	top: "conv1"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 96
		kernel_size: 7
		pad: 3
		stride: 2
	}
}
layer {
	name: "relu1"
	type: "ReLU"
	bottom: "conv1"
	top: "conv1"
}
layer {
	name: "norm1"
	type: "LRN"
	bottom: "conv1"
	top: "norm1"
	lrn_param {
		local_size: 3
		alpha: 0.00005
		beta: 0.75
		norm_region: WITHIN_CHANNEL
    engine: CAFFE
	}
}
layer {
	name: "pool1"
	type: "Pooling"
	bottom: "norm1"
	top: "pool1"
	pooling_param {
		kernel_size: 3
		stride: 2
		pad: 1
		pool: MAX
	}
}
layer {
	name: "conv2"
	type: "Convolution"
	bottom: "pool1"
	top: "conv2"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 256
		kernel_size: 5
		pad: 2
		stride: 2
	}
}
layer {
	name: "relu2"
	type: "ReLU"
	bottom: "conv2"
	top: "conv2"
}
layer {
	name: "norm2"
	type: "LRN"
	bottom: "conv2"
	top: "norm2"
	lrn_param {
		local_size: 3
		alpha: 0.00005
		beta: 0.75
		norm_region: WITHIN_CHANNEL
    engine: CAFFE
	}
}
layer {
	name: "pool2"
	type: "Pooling"
	bottom: "norm2"
	top: "pool2"
	pooling_param {
		kernel_size: 3
		stride: 2
		pad: 1
		pool: MAX
	}
}
layer {
	name: "conv3"
	type: "Convolution"
	bottom: "pool2"
	top: "conv3"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 384
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu3"
	type: "ReLU"
	bottom: "conv3"
	top: "conv3"
}
layer {
	name: "conv4"
	type: "Convolution"
	bottom: "conv3"
	top: "conv4"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 384
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu4"
	type: "ReLU"
	bottom: "conv4"
	top: "conv4"
}
layer {
	name: "conv5"
	type: "Convolution"
	bottom: "conv4"
	top: "conv5"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 256
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu5"
	type: "ReLU"
	bottom: "conv5"
	top: "conv5"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

#layer {
#  name: "rpn_conv/3x3"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/3x3"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 192
#    kernel_size: 3 pad: 1 stride: 1
#    weight_filler { type: "gaussian" std: 0.01 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn_conv/5x5"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/5x5"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 64
#    kernel_size: 5 pad: 2 stride: 1
#    weight_filler { type: "gaussian" std: 0.0036 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn/output"
#  type: "Concat"
#  bottom: "rpn_conv/3x3"
#  bottom: "rpn_conv/5x5"
#  top: "rpn/output"
#}
#layer {
#  name: "rpn_relu/output"
#  type: "ReLU"
#  bottom: "rpn/output"
#  top: "rpn/output"
#}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
#layer {
#  name: 'debug-data'
#  type: 'Python'
#  bottom: 'data'
#  bottom: 'rpn_rois'
#  bottom: 'rpn_scores'
#  python_param {
#    module: 'rpn.debug_layer'
#    layer: 'RPNDebugLayer'
#  }
#}
layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool_conv5"
  type: "ROIPooling"
  bottom: "conv5"
  bottom: "rois"
  top: "roi_pool_conv5"
  roi_pooling_param {
    pooled_w: 6
    pooled_h: 6
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_pool_conv5"
  top: "fc6"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 21
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 84
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: 'bbox_inside_weights'
  bottom: 'bbox_outside_weights'
  top: "bbox_loss"
  loss_weight: 1
}
According to reference [1], the Faster R-CNN architecture consists of four key modules: the feature-extraction network, RoI generation, RoI classification, and RoI regression. The feature-extraction network extracts features from the image, the RoI-generation module produces candidate regions, the RoI-classification module classifies those candidates, and the RoI-regression module refines their positions. The four modules are combined in one neural network, giving Faster R-CNN its end-to-end form.

According to reference [2], the training procedure has the following steps: first, initialize an RPN from an ImageNet-pretrained model and train it independently; second, use the proposals produced by that RPN as input, initialize a Fast R-CNN network from an ImageNet-pretrained model, and train it; third, initialize a new RPN from the parameters of the Fast R-CNN network in step two and retrain only the RPN-specific layers; finally, freeze the shared layers, add the Fast R-CNN-specific layers to form a single unified network, and continue training by fine-tuning only those Fast R-CNN-specific layers. Through these steps Faster R-CNN learns to predict proposals and perform detection inside a single network.

In summary, Faster R-CNN consists of the feature-extraction, RoI-generation, RoI-classification, and RoI-regression modules, optimized and tuned through the training steps above.

References
1., 2. Faster-rcnn详解: https://blog.csdn.net/WZZ18191171661/article/details/79439212
3. Faster RCNN 网络分析及维度分析: https://blog.csdn.net/qq_23981335/article/details/121017168