[Deep Learning] The Faster R-CNN Network

Structure

If the diagram is hard to read, you can right-click it and open the image in a new tab:
(figure: overall Faster R-CNN flow chart)

In the diagram, rectangular boxes denote operations and diamond boxes denote their outputs.

Below is the Faster R-CNN structure diagram I drew:
(figure: hand-drawn Faster R-CNN structure diagram)

Front

The front (head) of Faster R-CNN is responsible for extracting features from the input image:
(figure: feature-extraction stage)

There are two options for this backbone: either ZFNet (with its trailing fully connected layers removed) or VGG (likewise with its trailing fully connected layers removed). The paper illustrates the first option (the green box marks the part that is reused):
(figure: ZFNet backbone with the reused portion highlighted)
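Whichever backbone is used, the reused convolutional stack downsamples the input by a factor of 16; this is what the 'feat_stride': 16 and spatial_scale: 0.0625 values in the prototxt below rely on. A minimal sketch, with the strides transcribed from the ZF layers in that prototxt:

# Minimal sketch: only tracks the cumulative stride of the ZF backbone.
zf_strides = {
    "conv1": 2,  # 7x7 conv, stride 2
    "pool1": 2,  # 3x3 max pool, stride 2
    "conv2": 2,  # 5x5 conv, stride 2
    "pool2": 2,  # 3x3 max pool, stride 2
    "conv3": 1,
    "conv4": 1,
    "conv5": 1,
}

feat_stride = 1
for layer, stride in zf_strides.items():
    feat_stride *= stride

print(feat_stride)  # 16: one cell of conv5 corresponds to a 16x16 pixel region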

Middle

The middle of Faster R-CNN takes the feature map (the features extracted by the front), generates anchors on it, and filters those anchors down to proposals:
(figure: RPN and RoI pooling stage)

Two branches:

  1. Green box: the RPN, which generates anchors and then filters ("washes") them into proposals (a minimal anchor-generation sketch follows this list).
  2. Red box: the feature map itself, passed through unchanged.

Both are finally handed to the yellow box, RoIPooling, which outputs RoIs of identical size.
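A minimal sketch of the anchor-generation step, in the spirit of the repository's rpn code but not its exact implementation; the 3 ratios x 3 scales = 9 anchors per cell are assumed defaults, chosen to match num_output: 18 = 2 x 9 and num_output: 36 = 4 x 9 in the prototxt below:

import numpy as np

def make_anchors(feat_h, feat_w, feat_stride=16,
                 ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) boxes."""
    # 9 base anchors centred at (0, 0): 3 aspect ratios x 3 scales.
    base = []
    for r in ratios:
        for s in scales:
            w = feat_stride * s / np.sqrt(r)   # ratio r = h / w, area kept roughly constant
            h = feat_stride * s * np.sqrt(r)
            base.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    base = np.array(base)                      # (9, 4)

    # Slide the 9 base anchors over every cell of the conv5 feature map.
    shift_x = np.arange(feat_w) * feat_stride
    shift_y = np.arange(feat_h) * feat_stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)

    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)

anchors = make_anchors(feat_h=38, feat_w=63)   # roughly a 600x1000 input / 16
print(anchors.shape)                           # (21546, 4) anchors before filtering

The RPN then scores each anchor as foreground/background (rpn_cls_score) and regresses 4 offsets per anchor (rpn_bbox_pred); the ProposalLayer applies the offsets, clips the boxes to the image, and keeps the top-scoring boxes after NMS as proposals.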

Back

The RoIs are handed to a classification task and a bounding-box regression task, which further refine localization accuracy and class accuracy and output the detection results:
(figure: classification and bounding-box regression heads)

Loss Computation

Multi-task loss:
(figure: multi-task loss heads, from "Faster R-CNN论文笔记——FR")

The Fast R-CNN part of the network has two sibling output layers (the cls_score and bbox_pred layers), both fully connected; this is what makes it a multi-task network.
① The cls_score layer handles classification. It outputs a (K+1)-dimensional array p, the probabilities of the K object classes plus background. For each RoI (Region of Interest) it outputs a discrete probability distribution

$$p = (p_0, p_1, \dots, p_K)$$

Typically p is computed by applying a softmax to the K+1 outputs of a fully connected layer.

② The bbox_pred layer refines the positions of the candidate regions. It outputs the bounding-box regression offsets, a 4K-dimensional array t, i.e. the translation and scaling parameters for each of the K classes:

$$t^{k} = (t^{k}_{x}, t^{k}_{y}, t^{k}_{w}, t^{k}_{h})$$

Here k is the class index; $(t^{k}_{x}, t^{k}_{y})$ is a scale-invariant translation relative to the object proposal, and $(t^{k}_{w}, t^{k}_{h})$ is the width and height in log space relative to the object proposal.
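A minimal sketch of this parametrization for one proposal/ground-truth pair, similar in spirit to the repository's bbox_transform helpers but simplified here (boxes given as x1, y1, x2, y2):

import numpy as np

def bbox_targets(proposal, gt):
    """Compute (t_x, t_y, t_w, t_h) for one proposal P and ground-truth box G."""
    px, py = (proposal[0] + proposal[2]) / 2.0, (proposal[1] + proposal[3]) / 2.0
    pw, ph = proposal[2] - proposal[0] + 1.0, proposal[3] - proposal[1] + 1.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    gw, gh = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0

    tx = (gx - px) / pw          # scale-invariant translation of the centre
    ty = (gy - py) / ph
    tw = np.log(gw / pw)         # log-space width / height scaling
    th = np.log(gh / ph)
    return np.array([tx, ty, tw, th])

print(bbox_targets(proposal=[100, 100, 199, 199], gt=[110, 90, 219, 189]))
# -> [ 0.15  -0.1    0.0953  0.    ]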
The loss_cls layer evaluates the classification loss, which is determined by the probability predicted for the true class u:

$$L_{cls}(p, u) = -\log p_u$$
The loss_bbox layer evaluates the localization loss. It compares the predicted translation/scaling parameters $t^{u}$ for the true class u with the ground-truth parameters $v = (v_x, v_y, v_w, v_h)$:

$$L_{loc}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^{u}_{i} - v_{i})$$
where the smooth L1 loss is

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

The smooth L1 curve is shown in the figure below. The authors chose it to make the loss more robust to outliers: compared with an L2 loss it is insensitive to outliers and anomalous values, and it bounds the gradient magnitude so training is less likely to blow up.
(figure: smooth L1 loss curve)
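A minimal sketch of this function. The SmoothL1Loss layers in the prototxt below also take a sigma parameter (sigma: 3.0 for the RPN branch); in the py-faster-rcnn Caffe fork this rescales the quadratic region to |x| < 1/sigma^2, and sigma = 1 recovers the plain form above:

import numpy as np

def smooth_l1(x, sigma=1.0):
    """Element-wise smooth L1; sigma=1 gives the Fast R-CNN definition above."""
    x = np.asarray(x, dtype=np.float64)
    s2 = sigma ** 2
    return np.where(np.abs(x) < 1.0 / s2,
                    0.5 * s2 * x ** 2,        # quadratic near zero
                    np.abs(x) - 0.5 / s2)     # linear for outliers: bounded gradient

print(smooth_l1([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [1.5   0.125 0.    0.125 1.5  ]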

The total loss is a weighted sum of the two terms (the localization loss is dropped when the RoI is classified as background):

$$L(p, u, t^{u}, v) = L_{cls}(p, u) + \lambda \, [u \ge 1] \, L_{loc}(t^{u}, v)$$

Class u = 0 is defined as the background class (i.e. the negative label), so the Iverson bracket indicator [u ≥ 1] means that background candidate regions (negative samples) do not contribute to the regression loss; no regression is performed for them. λ balances the classification and regression losses; all experiments in the Fast R-CNN paper use λ = 1.
The Iverson bracket indicator function is

$$[u \ge 1] = \begin{cases} 1 & \text{if } u \ge 1 \\ 0 & \text{otherwise} \end{cases}$$
In the source code, bbox_loss_weights (the bbox_inside_weights blobs in the prototxt below play this role) are used to mark, for each bbox, whether it belongs to a given class.
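A hedged sketch of the whole per-RoI multi-task loss, with cls_score and bbox_pred standing in for the two sibling FC outputs; the shapes match the VOC setting in the prototxt (K = 20 object classes, so 21 scores and 84 regression outputs), but the function itself is illustrative, not the repository's implementation:

import numpy as np

def roi_loss(cls_score, bbox_pred, u, v, lam=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc for a single RoI."""
    # L_cls: softmax cross-entropy over the (K+1)-way scores.
    z = cls_score - cls_score.max()            # stabilised softmax
    p = np.exp(z) / np.exp(z).sum()
    l_cls = -np.log(p[u])

    if u == 0:                                 # background: Iverson bracket is 0
        return l_cls

    # L_loc: smooth L1 on the 4 offsets predicted for the true class u.
    t_u = bbox_pred.reshape(-1, 4)[u]
    diff = t_u - v
    l_loc = np.where(np.abs(diff) < 1.0, 0.5 * diff ** 2, np.abs(diff) - 0.5).sum()
    return l_cls + lam * l_loc

K = 20                                         # PASCAL VOC object classes
loss = roi_loss(cls_score=np.random.randn(K + 1),
                bbox_pred=np.random.randn(4 * (K + 1)),
                u=3, v=np.array([0.1, -0.2, 0.05, 0.0]))
print(loss)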

Code

Below is the authors' prototxt, rbgirshick/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt:

name: "ZF"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

#========= conv1-conv5 ============

layer {
	name: "conv1"
	type: "Convolution"
	bottom: "data"
	top: "conv1"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 96
		kernel_size: 7
		pad: 3
		stride: 2
	}
}
layer {
	name: "relu1"
	type: "ReLU"
	bottom: "conv1"
	top: "conv1"
}
layer {
	name: "norm1"
	type: "LRN"
	bottom: "conv1"
	top: "norm1"
	lrn_param {
		local_size: 3
		alpha: 0.00005
		beta: 0.75
		norm_region: WITHIN_CHANNEL
    engine: CAFFE
	}
}
layer {
	name: "pool1"
	type: "Pooling"
	bottom: "norm1"
	top: "pool1"
	pooling_param {
		kernel_size: 3
		stride: 2
		pad: 1
		pool: MAX
	}
}
layer {
	name: "conv2"
	type: "Convolution"
	bottom: "pool1"
	top: "conv2"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 256
		kernel_size: 5
		pad: 2
		stride: 2
	}
}
layer {
	name: "relu2"
	type: "ReLU"
	bottom: "conv2"
	top: "conv2"
}
layer {
	name: "norm2"
	type: "LRN"
	bottom: "conv2"
	top: "norm2"
	lrn_param {
		local_size: 3
		alpha: 0.00005
		beta: 0.75
		norm_region: WITHIN_CHANNEL
    engine: CAFFE
	}
}
layer {
	name: "pool2"
	type: "Pooling"
	bottom: "norm2"
	top: "pool2"
	pooling_param {
		kernel_size: 3
		stride: 2
		pad: 1
		pool: MAX
	}
}
layer {
	name: "conv3"
	type: "Convolution"
	bottom: "pool2"
	top: "conv3"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 384
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu3"
	type: "ReLU"
	bottom: "conv3"
	top: "conv3"
}
layer {
	name: "conv4"
	type: "Convolution"
	bottom: "conv3"
	top: "conv4"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 384
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu4"
	type: "ReLU"
	bottom: "conv4"
	top: "conv4"
}
layer {
	name: "conv5"
	type: "Convolution"
	bottom: "conv4"
	top: "conv5"
	param { lr_mult: 1.0 }
	param { lr_mult: 2.0 }
	convolution_param {
		num_output: 256
		kernel_size: 3
		pad: 1
		stride: 1
	}
}
layer {
	name: "relu5"
	type: "ReLU"
	bottom: "conv5"
	top: "conv5"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

#layer {
#  name: "rpn_conv/3x3"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/3x3"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 192
#    kernel_size: 3 pad: 1 stride: 1
#    weight_filler { type: "gaussian" std: 0.01 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn_conv/5x5"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/5x5"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 64
#    kernel_size: 5 pad: 2 stride: 1
#    weight_filler { type: "gaussian" std: 0.0036 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn/output"
#  type: "Concat"
#  bottom: "rpn_conv/3x3"
#  bottom: "rpn_conv/5x5"
#  top: "rpn/output"
#}
#layer {
#  name: "rpn_relu/output"
#  type: "ReLU"
#  bottom: "rpn/output"
#  top: "rpn/output"
#}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
#layer {
#  name: 'debug-data'
#  type: 'Python'
#  bottom: 'data'
#  bottom: 'rpn_rois'
#  bottom: 'rpn_scores'
#  python_param {
#    module: 'rpn.debug_layer'
#    layer: 'RPNDebugLayer'
#  }
#}
layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool_conv5"
  type: "ROIPooling"
  bottom: "conv5"
  bottom: "rois"
  top: "roi_pool_conv5"
  roi_pooling_param {
    pooled_w: 6
    pooled_h: 6
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_pool_conv5"
  top: "fc6"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 21
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 84
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: 'bbox_inside_weights'
  bottom: 'bbox_outside_weights'
  top: "bbox_loss"
  loss_weight: 1
}
According to reference [1], the Faster R-CNN architecture consists of four key modules: the feature-extraction network, RoI generation, RoI classification, and RoI regression. The feature-extraction network extracts features from the image, the RoI-generation module produces candidate regions, the RoI-classification module classifies those candidates, and the RoI-regression module refines their positions. The four modules are combined in one neural network, giving Faster R-CNN its end-to-end form.

According to reference [2], the training procedure has the following steps: first, initialize an RPN from an ImageNet-pretrained model and train it independently; second, use the proposals produced by that RPN as input, initialize a Fast R-CNN network from an ImageNet-pretrained model, and train it; third, initialize a new RPN from the parameters of the Fast R-CNN network in step two and retrain only the RPN-specific layers; finally, freeze the shared layers, add the Fast R-CNN-specific layers to form a single unified network, and continue training by fine-tuning only those Fast R-CNN-specific layers. Through these steps Faster R-CNN learns to predict proposals and perform detection inside a single network.

In summary, Faster R-CNN consists of the feature-extraction, RoI-generation, RoI-classification, and RoI-regression modules, optimized and tuned through the training steps above.

References
1., 2. Faster-rcnn详解: https://blog.csdn.net/WZZ18191171661/article/details/79439212
3. Faster RCNN 网络分析及维度分析: https://blog.csdn.net/qq_23981335/article/details/121017168