
[Faster RCNN] Understanding the loss function:

1. Why Smooth L1 Loss is used

2. The Faster RCNN loss function

2.1 Classification loss

2.2 Regression loss



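TensorFlow + Faster RCNN code walkthrough (part 2): anchor_target_layer, proposal_target_layer, proposal_layer

I recently went through Faster RCNN again and learned quite a lot, so I am writing it down here once more.

1. Why Smooth L1 Loss is used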



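A common fix is a piecewise function that uses a quadratic near 0 so that the loss is smoother there. This is called the smooth L1 loss. A parameter σ controls the size of the smooth region. Usually σ = 1; in the Faster RCNN code, σ = 3 is used (for RPN training).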
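The piecewise definition can be sketched in a few lines of plain Python. This is only an illustration: the function name smooth_l1 is made up here, and the σ parameterization (quadratic branch when |x| < 1/σ²) is taken from the _smooth_l1_loss code shown in section 2.2.

```python
def smooth_l1(x, sigma=1.0):
    """Smooth L1 loss of a single residual x; sigma sets the quadratic zone."""
    sigma_2 = sigma ** 2
    abs_x = abs(x)
    if abs_x < 1.0 / sigma_2:
        # Quadratic branch near zero: 0.5 * sigma^2 * x^2
        return 0.5 * sigma_2 * x * x
    # Linear branch elsewhere: |x| - 0.5 / sigma^2
    return abs_x - 0.5 / sigma_2


# A larger sigma shrinks the quadratic zone from |x| < 1 to |x| < 1/9.
for sigma in (1.0, 3.0):
    print(sigma, [round(smooth_l1(x, sigma), 5) for x in (0.0, 0.05, 0.5, 2.0)])
```

Both branches give 0.5/σ² at |x| = 1/σ², so the function is continuous at the switch point.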

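2. The Faster RCNN loss function

The Faster RCNN loss consists of two parts, the RPN loss and the Fast RCNN loss. Both parts are computed with the same formula, and each includes a classification loss (cls loss) and a bounding-box regression loss (bbox regression loss).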
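For reference, the multi-task loss as written in the Faster R-CNN paper is reproduced below. Here i indexes the anchors in a mini-batch, p_i is the predicted probability that anchor i is an object, p_i^* is its ground-truth label (written P* elsewhere in this post), t_i and t_i^* are the predicted and ground-truth box offsets, and λ balances the two terms. Reading the Fast RCNN part as the same two-term shape with RoIs in place of anchors is an assumption based on the text above.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```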

The losses of the RPN part and the Fast RCNN part are discussed in turn below.

2.1 Classification loss

(1) RPN classification loss





    rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])  # rpn_cls_score: (17100, 2)
    rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])  # rpn_label: (17100,)
    rpn_select = tf.where(tf.not_equal(rpn_label, -1))  # indices of labels != -1, i.e. the sampled positives and negatives
    rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])  # pick the matching scores
    rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])  # pick the matching labels
    rpn_cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))


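  1. The first statement reshapes the scores to (17100, 2). The number of rows is the number of anchors, and the two columns hold the foreground and background scores of each anchor.
  2. The second and third statements reshape the RPN labels to (17100,), so that they line up one to one with the anchors, and then select the entries that are not -1, i.e. the sampled foreground and background anchors. There are N_cls of them, and their indices are returned as rpn_select.
  3. The fourth statement uses those indices to pick the corresponding scores, and the fifth does the same for the labels.
  4. The final statement computes the cross-entropy loss from rpn_label and rpn_cls_score. The reduce_mean call divides the sum by the number of samples (N_cls) to get the mean.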
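The steps above can be mimicked without TensorFlow. The sketch below is a toy version on plain lists, assuming label -1 means "ignored" and column order [background, foreground]; the helper names softmax_cross_entropy and rpn_cls_loss are illustrative and do not appear in the original code.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross entropy between softmax(logits) and an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def rpn_cls_loss(scores, labels):
    """Mean cross entropy over the anchors whose label is not -1."""
    kept = [(s, l) for s, l in zip(scores, labels) if l != -1]
    losses = [softmax_cross_entropy(s, l) for s, l in kept]
    return sum(losses) / len(losses)  # this division plays the role of 1 / N_cls


# Toy data: 4 anchors with two scores each; two of them are ignored.
scores = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0], [5.0, -5.0]]
labels = [0, -1, 1, -1]
print(rpn_cls_loss(scores, labels))
```

Changing the scores of the ignored anchors leaves the result untouched, which is the point of the tf.where / tf.gather selection.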

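(2) Fast RCNN classification loss

The RPN classification loss is a two-class cross-entropy loss, whereas the Fast RCNN classification loss is a multi-class cross-entropy loss (when you train on more than 2 classes; 5 classes are assumed here). During Fast RCNN training 128 RoIs are selected, so N_cls = 128, and the labels take values from 0 to 4. The code is: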

    cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))

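2.2 Regression loss

The regression loss is discussed for RPN and Fast RCNN together. In the regression term of the loss:


  • t_i is a vector holding the predicted offsets of an anchor (RPN training stage) or of an RoI (Fast RCNN stage).
  • t_i* is a vector of the same dimension as t_i, holding the actual offsets of that anchor (RPN training stage) or RoI (Fast RCNN stage) relative to the ground truth (gt).

R is the smooth L1 function described above. The only difference is the value of σ: σ = 3 for RPN training and σ = 1 for Fast RCNN training.

For each anchor, after the smooth L1 part R(t_i - t_i*) is computed, it is also multiplied by P*. As mentioned earlier, P* is 1 when there is an object (positive) and 0 when there is none (negative), which means that only foreground samples contribute to the regression loss and background samples do not. This is what inside_weights does.

λ and N_reg are explained here for RPN training (RPN training is taken as the setting because its batch size is 256; for Fast RCNN the batch size is 128). In the paper, the cls term is normalized by the mini-batch size (N_cls = 256), the reg term is normalized by the number of anchor locations (N_reg ≈ 2400), and λ = 10 by default, so that the two terms are roughly equally weighted.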



    def _smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0, dim=[1]):
        sigma_2 = sigma ** 2
        box_diff = bbox_pred - bbox_targets  # t_i - t_i*
        in_box_diff = bbox_inside_weights * box_diff  # only foreground boxes are allowed to contribute
        abs_in_box_diff = tf.abs(in_box_diff)  # x = |t_i - t_i*|
        # 1 where x = |t_i - t_i*| is smaller than 1 / sigma^2, otherwise 0
        smoothL1_sign = tf.stop_gradient(tf.to_float(tf.less(abs_in_box_diff, 1. / sigma_2)))
        # Smooth L1 loss: quadratic branch near zero, linear branch elsewhere
        in_loss_box = tf.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign \
            + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
        out_loss_box = bbox_outside_weights * in_loss_box
        loss_box = tf.reduce_mean(tf.reduce_sum(
            out_loss_box,
            axis=dim
        ))
        return loss_box
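To see how the two weight tensors act, here is a minimal pure-Python sketch of the same computation on nested lists. It assumes 2-D inputs (one row per box, four offsets per row) and dim=[1]; the names box_loss and smooth_l1 are illustrative and are not part of the original code.

```python
def smooth_l1(x, sigma=1.0):
    """Smooth L1 of one residual, same branches as _smooth_l1_loss."""
    sigma_2 = sigma ** 2
    if abs(x) < 1.0 / sigma_2:
        return 0.5 * sigma_2 * x * x
    return abs(x) - 0.5 / sigma_2


def box_loss(pred, targets, inside_w, outside_w, sigma=1.0):
    """Sum the weighted smooth L1 terms per row, then average over rows."""
    row_sums = []
    for p_row, t_row, in_row, out_row in zip(pred, targets, inside_w, outside_w):
        row = 0.0
        for p, t, w_in, w_out in zip(p_row, t_row, in_row, out_row):
            # inside weight masks the difference, outside weight scales the loss
            row += w_out * smooth_l1(w_in * (p - t), sigma)
        row_sums.append(row)
    return sum(row_sums) / len(row_sums)


# Two boxes: the first is foreground, the second is background (inside weight 0).
pred = [[0.5, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
targets = [[0.0] * 4, [0.0] * 4]
inside = [[1.0] * 4, [0.0] * 4]
outside = [[1.0] * 4, [1.0] * 4]
loss = box_loss(pred, targets, inside, outside)
print(loss)
```

Because the smooth L1 of 0 is 0, masking the difference with a 0/1 inside weight before applying R gives the same result as multiplying the finished loss by P*.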


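The paper treats N_cls, N_reg and λ as normalization weights that balance the classification loss against the regression loss. When I read the TensorFlow implementation of the Faster RCNN loss, however (taking the classification loss and the box regression loss of the Fast RCNN part as the example, as follows), I noticed that N_cls is never passed in when the classification loss is computed; only the box regression loss receives an extra argument, outside_weights. That is when I realized that the classification loss is a cross entropy that is summed and then divided by the number of samples, so the division by N_cls is already built into the cross-entropy computation itself (the reduce_mean call).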


    # RCNN, class loss
    cls_score = self._predictions["cls_score"]
    label = tf.reshape(self._proposal_targets["labels"], [-1])
    cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))
    # RCNN, bbox loss
    bbox_pred = self._predictions['bbox_pred']  # (128, 12)
    bbox_targets = self._proposal_targets['bbox_targets']  # (128, 12)
    bbox_inside_weights = self._proposal_targets['bbox_inside_weights']  # (128, 12)
    bbox_outside_weights = self._proposal_targets['bbox_outside_weights']  # (128, 12)
    loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)





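    The first convolutional layer is 11*11*96: the kernel size is 11*11, there are 96 kernels, and the stride is 4. It is followed by ReLU. The output size is therefore roughly 224/4 = 56, and dropping the border leaves 55 (exactly: (227 - 11)/4 + 1 = 55 for a 227*227 input), so the output feature maps are 55*55*96. An LRN layer follows, which leaves the size unchanged.

    Max pooling layer with a 3*3 kernel and stride 2, so the feature map becomes 27*27*96.



    The kernel size is 5*5*256 with stride 1, so the spatial size does not change. It is likewise followed by ReLU and an LRN layer.

    Max pooling layer with a 3*3 kernel and stride 2, so the feature map becomes 13*13*256.



    The third convolutional layer is 3*3*384 with stride 1, plus ReLU.

    The fourth convolutional layer is 3*3*384 with stride 1, plus ReLU.

    The fifth convolutional layer is 3*3*256 with stride 1, plus ReLU.

    The fifth layer is followed by a max pooling layer with a 3*3 kernel and stride 2, so the feature map becomes 6*6*256.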
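The size arithmetic above can be checked with the usual output-size formula floor((n + 2*pad - kernel) / stride) + 1. The sketch below is a minimal pure-Python check. The helper names, the 227*227 input, and the paddings of 2 and 1 on the stride-1 convolutions are assumptions chosen so that those layers keep the size unchanged as described; they are not stated in the post.

```python
# (name, kernel, stride, pad) for each layer of the convolutional part.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),  # assumed padding 2 keeps the size
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),  # assumed padding 1 keeps the size
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def out_size(n, kernel, stride, pad=0):
    """Spatial output size of a conv or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1


def alexnet_sizes(n):
    """Feature-map side length after each layer for an n*n input."""
    sizes = []
    for name, kernel, stride, pad in LAYERS:
        n = out_size(n, kernel, stride, pad)
        sizes.append((name, n))
    return sizes


for name, size in alexnet_sizes(227):
    print(name, size)
```

With a 227*227 input this gives 55, 27, 27, 13, 13, 13, 13, 6. With a 224*224 input the same arithmetic gives 54, 26, 26, 12, 12, 12, 12, 5, which are the shapes printed by the benchmark in section 3.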


1. FC: 4096 + ReLU
2. FC: 4096 + ReLU
3. FC: 1000. The last layer is a softmax that outputs the probabilities of the 1000 classes.

2. Tricks in AlexNet














3. Implementing AlexNet in TensorFlow

    import argparse
    import math
    import sys
    import time
    from datetime import datetime

    import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.app.run)

    FLAGS = None


    def print_activations(t):
        # Print the op name and the static shape of tensor t.
        print(t.op.name, ' ', t.get_shape().as_list())


    def inference(images):
        """Build the AlexNet model.

        Args:
            images: Images Tensor
        Returns:
            pool5: the last Tensor in the convolutional component of AlexNet.
            parameters: a list of Tensors corresponding to the weights and biases of the
                AlexNet model.
        """
        parameters = []
        # conv1
        with tf.name_scope('conv1') as scope:
            kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='VALID')
            biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                                 trainable=True, name='biases')
            bias = tf.nn.bias_add(conv, biases)
            conv1 = tf.nn.relu(bias, name=scope)
            print_activations(conv1)
            parameters += [kernel, biases]

        # lrn1
        # TODO(shlens, jiayq): Add a GPU version of local response normalization.

        # pool1
        pool1 = tf.nn.max_pool(conv1,
                               ksize=[1, 3, 3, 1],
                               strides=[1, 2, 2, 1],
                               padding='VALID',
                               name='pool1')
        print_activations(pool1)

        # conv2
        with tf.name_scope('conv2') as scope:
            kernel = tf.Variable(tf.truncated_normal([5, 5, 96, 256], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                                 trainable=True, name='biases')
            bias = tf.nn.bias_add(conv, biases)
            conv2 = tf.nn.relu(bias, name=scope)
            parameters += [kernel, biases]
        print_activations(conv2)

        # pool2
        pool2 = tf.nn.max_pool(conv2,
                               ksize=[1, 3, 3, 1],
                               strides=[1, 2, 2, 1],
                               padding='VALID',
                               name='pool2')
        print_activations(pool2)

        # conv3
        with tf.name_scope('conv3') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 384],
                                                     dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                                 trainable=True, name='biases')
            bias = tf.nn.bias_add(conv, biases)
            conv3 = tf.nn.relu(bias, name=scope)
            parameters += [kernel, biases]
            print_activations(conv3)

        # conv4
        with tf.name_scope('conv4') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 384],
                                                     dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                                 trainable=True, name='biases')
            bias = tf.nn.bias_add(conv, biases)
            conv4 = tf.nn.relu(bias, name=scope)
            parameters += [kernel, biases]
            print_activations(conv4)

        # conv5
        with tf.name_scope('conv5') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                     dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                                 trainable=True, name='biases')
            bias = tf.nn.bias_add(conv, biases)
            conv5 = tf.nn.relu(bias, name=scope)
            parameters += [kernel, biases]
            print_activations(conv5)

        # pool5
        pool5 = tf.nn.max_pool(conv5,
                               ksize=[1, 3, 3, 1],
                               strides=[1, 2, 2, 1],
                               padding='VALID',
                               name='pool5')
        print_activations(pool5)
        return pool5, parameters


    def time_tensorflow_run(session, target, info_string):
        """Run the computation to obtain the target tensor and print timing stats.

        Args:
            session: the TensorFlow session to run the computation under.
            target: the target Tensor that is passed to the session's run() function.
            info_string: a string summarizing this run, to be printed with the stats.
        Returns:
            None
        """
        # Needs: import math, time; from datetime import datetime
        num_steps_burn_in = 10
        total_duration = 0.0
        total_duration_squared = 0.0
        for i in range(FLAGS.num_batches + num_steps_burn_in):
            start_time = time.time()
            _ = session.run(target)
            duration = time.time() - start_time
            # Skip the warm-up steps when collecting statistics.
            if i >= num_steps_burn_in:
                if not i % 10:
                    print('%s: step %d, duration = %.3f' %
                          (datetime.now(), i - num_steps_burn_in, duration))
                total_duration += duration
                total_duration_squared += duration * duration
        mn = total_duration / FLAGS.num_batches
        vr = total_duration_squared / FLAGS.num_batches - mn * mn
        sd = math.sqrt(vr)
        print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
              (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
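The timing statistics rely on the identity Var(x) = E[x^2] - (E[x])^2, which lets the loop keep only two running sums. The sketch below isolates that logic in plain Python so it can be tried without TensorFlow. The names summarize and benchmark are illustrative, and clamping the variance at 0 is an addition that the original loop does not have (it guards against a tiny negative value from rounding).

```python
import math
import time


def summarize(durations):
    """Mean and standard deviation computed from two running sums."""
    total = 0.0
    total_squared = 0.0
    for d in durations:
        total += d
        total_squared += d * d
    n = len(durations)
    mean = total / n
    variance = total_squared / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, math.sqrt(max(variance, 0.0))


def benchmark(fn, num_batches=100, burn_in=10):
    """Call fn() burn_in + num_batches times and time only the last num_batches."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= burn_in:
            durations.append(elapsed)
    return summarize(durations)


mean, sd = benchmark(lambda: sum(range(10000)), num_batches=20, burn_in=5)
print('%.6f +/- %.6f sec / call' % (mean, sd))
```

Discarding the first few calls matters for GPU code in particular, since the first runs typically include one-off setup cost.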


    def run_benchmark():
        """Run the benchmark on AlexNet."""
        with tf.Graph().as_default():
            # Generate some dummy images.
            image_size = 224
            # Note that our padding definition is slightly different from cuda-convnet.
            # In order to force the model to start with the same activations sizes,
            # we add 3 to the image_size and employ VALID padding above.
            images = tf.Variable(tf.random_normal([FLAGS.batch_size,
                                                   image_size,
                                                   image_size, 3],
                                                  dtype=tf.float32,
                                                  stddev=1e-1))
            # Build a Graph that computes the logits predictions from the
            # inference model.
            pool5, parameters = inference(images)
            # Build an initialization operation.
            init = tf.global_variables_initializer()
            # Start running operations on the Graph.
            config = tf.ConfigProto()
            config.gpu_options.allocator_type = 'BFC'
            sess = tf.Session(config=config)
            sess.run(init)
            # Run the forward benchmark.
            time_tensorflow_run(sess, pool5, "Forward")
            # Add a simple objective so we can calculate the backward pass.
            objective = tf.nn.l2_loss(pool5)
            # Compute the gradient with respect to all the parameters.
            grad = tf.gradients(objective, parameters)
            # Run the backward benchmark.
            time_tensorflow_run(sess, grad, "Forward-backward")


    def main(_):
        run_benchmark()


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument(
            '--batch_size',
            type=int,
            default=128,
            help='Batch size.'
        )
        parser.add_argument(
            '--num_batches',
            type=int,
            default=100,
            help='Number of batches to run.'
        )
        FLAGS, unparsed = parser.parse_known_args()
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)


    conv1 [128, 54, 54, 96]
    pool1 [128, 26, 26, 96]
    conv2 [128, 26, 26, 256]
    pool2 [128, 12, 12, 256]
    conv3 [128, 12, 12, 384]
    conv4 [128, 12, 12, 384]
    conv5 [128, 12, 12, 256]
    pool5 [128, 5, 5, 256]


    2018-11-27 17:49:36.936271: step 0, duration = 0.085
    2018-11-27 17:49:37.860652: step 10, duration = 0.085
    2018-11-27 17:49:38.794103: step 20, duration = 0.100
    2018-11-27 17:49:39.726452: step 30, duration = 0.099
    2018-11-27 17:49:40.637597: step 40, duration = 0.088
    2018-11-27 17:49:41.546659: step 50, duration = 0.078
    2018-11-27 17:49:42.471295: step 60, duration = 0.085
    2018-11-27 17:49:43.389295: step 70, duration = 0.095
    2018-11-27 17:49:44.306961: step 80, duration = 0.085
    2018-11-27 17:49:45.225164: step 90, duration = 0.085
    2018-11-27 17:49:46.058470: Forward across 100 steps, 0.092 +/- 0.008 sec / batch
    2018-11-27 17:49:50.335397: step 0, duration = 0.281
    2018-11-27 17:49:53.041129: step 10, duration = 0.279
    2018-11-27 17:49:55.747921: step 20, duration = 0.269
    2018-11-27 17:49:58.454006: step 30, duration = 0.269
    2018-11-27 17:50:01.176237: step 40, duration = 0.285
    2018-11-27 17:50:03.882712: step 50, duration = 0.269
    2018-11-27 17:50:06.573259: step 60, duration = 0.269
    2018-11-27 17:50:09.286011: step 70, duration = 0.270
    2018-11-27 17:50:12.007992: step 80, duration = 0.275
    2018-11-27 17:50:14.706777: step 90, duration = 0.262
    2018-11-27 17:50:17.138761: Forward-backward across 100 steps, 0.271 +/- 0.006 sec / batch
    An exception has occurred, use %tb to see the full traceback.


