Faster RCNN Source Code Walkthrough 2: _anchor_component() Builds the Anchors for an Image (Core Part 1)

Latest recommended article published on 2023-03-21 19:32:01

业余狙击手19

Category: Object Detection Algorithms

Article link: https://blog.csdn.net/sxlsxl119/article/details/101363596

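Reproducing Faster RCNN

Faster RCNN Source Code Walkthrough 1: The Overall Pipeline and Its Sub-Pipelines

Faster RCNN Source Code Walkthrough 2: _anchor_component() Builds the Anchors for an Image (Core Part 1)

Faster RCNN Source Code Walkthrough 3.1: _region_proposal() Filters Anchors, _proposal_layer() (Core Part 2)

Faster RCNN Source Code Walkthrough 3.2: _region_proposal() Filters Anchors, _anchor_target_layer() (Core Part 2)

Faster RCNN Source Code Walkthrough 3.3: _region_proposal() Filters Anchors, _proposal_target_layer() (Core Part 2)

Faster RCNN Source Code Walkthrough 4: Remaining Steps: ROI_pooling, Classification, Regression, etc.

Faster RCNN Source Code Walkthrough 5: Loss Functions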

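Theory: there are plenty of articles on the theory behind Faster RCNN that you can search for, so the theory is not covered here.

Reproduction: I did not record how I configured the code, so you will need to look up how to get the source running yourself.

The Faster RCNN source is genuinely complicated. Even after stepping through it piece by piece, I do not feel I have grasped its essence, only the surface. I am sharing the walkthrough here in the hope that it helps. I first went through the overall structure of the code and then analyzed each sub-structure. Some of the comments in the code were already in the source, some are adapted from other people's write-ups online, and some are my own understanding. They will contain some mistakes; if you find any, corrections are welcome!

Source code analyzed in this article: https://github.com/lijianaiml/tf-faster-rcnn-windows

This article pulls generate_anchors_pre_tf() out of lib/layer_utils/snippets.py and runs it on a small test case to show how it generates w*h*9 anchors from the width and height of the feature map.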
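Before reading the TensorFlow listing, it may help to see the same computation in plain NumPy. The following is a minimal sketch, not code from the repository: the names `generate_anchors_numpy` and `BASE_ANCHORS` are hypothetical, and the 9 base anchors are simply copied from the hardcoded array used in the listing below. It assumes the same test values as this article (a 10x5 feature map and a stride of 16).

```python
import numpy as np

# The 9 base anchors (x1, y1, x2, y2), copied from the hardcoded array
# in Mygenerate_anchors() in this article's listing.
BASE_ANCHORS = np.array([[ -83.,  -39.,  100.,   56.],
                         [-175.,  -87.,  192.,  104.],
                         [-359., -183.,  376.,  200.],
                         [ -55.,  -55.,   72.,   72.],
                         [-119., -119.,  136.,  136.],
                         [-247., -247.,  264.,  264.],
                         [ -35.,  -79.,   52.,   96.],
                         [ -79., -167.,   96.,  184.],
                         [-167., -343.,  184.,  360.]])


def generate_anchors_numpy(height, width, feat_stride=16,
                           base_anchors=BASE_ANCHORS):
    """Hypothetical NumPy counterpart of generate_anchors_pre_tf()."""
    # Offsets of each feature-map column / row in original-image pixels.
    shift_x = np.arange(width) * feat_stride
    shift_y = np.arange(height) * feat_stride
    # Both grids have shape (height, width).
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    sx, sy = shift_x.ravel(), shift_y.ravel()
    # One (dx, dy, dx, dy) row per feature-map position: shape (K, 4).
    shifts = np.stack([sx, sy, sx, sy], axis=1)

    K = shifts.shape[0]        # number of feature-map positions
    A = base_anchors.shape[0]  # number of base anchors (9)
    # Broadcast (1, A, 4) + (K, 1, 4) -> (K, A, 4), then flatten.
    anchors = base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)
    return anchors.reshape(K * A, 4).astype(np.float32), K * A


anchors, length = generate_anchors_numpy(height=5, width=10)
print(anchors.shape, length)
print(anchors[:9])    # the 9 anchors at feature-map position (0, 0)
print(anchors[9:18])  # the same 9 anchors shifted 16 pixels to the right
```

With these inputs the result has 5 * 10 * 9 = 450 rows. The first 9 rows are the base anchors themselves, and row 9 is the first base anchor moved one stride to the right, (-67, -39, 116, 56).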

# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np

def Mygenerate_anchors():
  # Hardcoded stand-in for generate_anchors(): the 9 base anchors
  # (3 ratios x 3 scales), one (x1, y1, x2, y2) row per anchor.
  return np.array([[ -83.,  -39.,  100.,   56.],
                   [-175.,  -87.,  192.,  104.],
                   [-359., -183.,  376.,  200.],
                   [ -55.,  -55.,   72.,   72.],
                   [-119., -119.,  136.,  136.],
                   [-247., -247.,  264.,  264.],
                   [ -35.,  -79.,   52.,   96.],
                   [ -79., -167.,   96.,  184.],
                   [-167., -343.,  184.,  360.]])


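'''
From the feature map's width and height, _feat_stride (the factor by which the feature map is
downscaled relative to the original image), etc., compute every candidate anchor on the image
(coordinates may fall outside the original image boundary) and the number of anchors.
generate_anchors_pre_tf works as follows:
  1. _ratio_enum starts from the base window (0, 0, 15, 15) and applies ratios=[0.5, 1, 2].
  A ratio here is the height-to-width aspect ratio (h/w) of the anchor; the pixel count
  (width*height) is kept roughly constant while the shape changes. This yields three anchors
  (each stored as its top-left and bottom-right corner coordinates).
  2. scales=(8, 16, 32) then enlarges each of those three anchors by the given factors, which
  gives 9 anchors in total (again as top-left and bottom-right corners). Since this is just
  the three anchors above enlarged, no figure is given for it in this article.
  3. tf.add(anchor_constant, shifts) then places the 9 anchors at every feature-map position,
  expressed as boxes in the original image. anchor_constant: 1*9*4. shifts: N*1*4, where N is
  the number of feature-map positions. The N*9*4 result is reshaped to (N*9)*4, which gives
  the anchors in the original image for every position of the feature map.
'''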
def generate_anchors_pre_tf(sess,height, width, feat_stride=16, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):

  # Test override: force a small 10x5 feature map regardless of the arguments.
  width = 10
  height = 5


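  # tf.range example: tf.range(8.0, 13.0, 2.0) gives [8. 10. 12.]
  shift_x = tf.range(width) * feat_stride # x offset in the original image for each feature-map column
  shift_y = tf.range(height) * feat_stride # y offset in the original image for each feature-map row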
  print("shift_x:",'\n', sess.run(shift_x))
  print("shift_y:",'\n', sess.run(shift_y))

  shift_x, shift_y = tf.meshgrid(shift_x, shift_y) # meshgrid builds coordinate grids from the two 1-D tensors; both outputs have the same shape, (height, width)
  print("meshgrid_shift_x:",'\n', sess.run(shift_x))
  print("meshgrid_shift_y:",'\n', sess.run(shift_y))

  sx = tf.reshape(shift_x, shape=(-1,))
  sy = tf.reshape(shift_y, shape=(-1,))
  print("sx:",'\n', sess.run(sx))
  print("sy:",'\n', sess.run(sy))

  shifts = tf.transpose(tf.stack([sx, sy, sx, sy])) # stack gives shape (4, K); tf.transpose() turns it into (K, 4)
  print("shifts:", '\n', sess.run(shifts))
  # tf.stack() joins a list of rank-R tensors into a single rank-(R+1) tensor.
  K = tf.multiply(width, height)  # tf.multiply() multiplies element-wise; here K = width * height
  print("K:", '\n', sess.run(K))

  shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2))
  print("transpose_shifts:", '\n', sess.run(shifts))

  # In lib/layer_utils/generate_anchors.py, the 9 base anchors are generated from the base anchor [0,0,15,15]
  # anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
  anchors = Mygenerate_anchors()
  A = anchors.shape[0]  # A is the number of base anchors, 9 here
  anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32)
  print("anchor_constant:", '\n', sess.run(anchor_constant))

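  length = K * A  # K is feature-map width * height, so K * A is the total number of anchors

  # tf.add(anchor_constant, shifts) broadcasts the two tensors and places the 9 anchors at
  # every feature-map position, as boxes in the original image.
  # anchor_constant: 1 * 9 * 4. shifts: N * 1 * 4, where N is the number of feature-map positions.
  # The N * 9 * 4 result is reshaped to (N * 9) * 4: the anchors in the original image for every position.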
  anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))
  print("anchors_tf:", '\n', sess.run(anchors_tf))
  print("anchors_tf_length:", '\n', anchors_tf.shape)

  # tf.cast() converts a tensor to another dtype; for example, image data read in as an
  # integer type is usually converted to float32 before training. Here int32 anchors become float32.
  return tf.cast(anchors_tf, dtype=tf.float32), length


if __name__ == '__main__':
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    generate_anchors_pre_tf(sess,5, 10, feat_stride=16, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2))

    print('Called with args:')
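Steps 1 and 2 of the docstring above (ratio enumeration followed by scale enumeration) can also be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the repository's generate_anchors(): the names `base_anchors`, `_whctrs` and `_mkanchors` and the `base_window` parameter are hypothetical. One detail worth noting is that the hardcoded array in Mygenerate_anchors() is reproduced when the base window is (1, 1, 16, 16); starting from (0, 0, 15, 15), the same procedure gives every coordinate 1 smaller.

```python
import numpy as np


def _whctrs(box):
    """Width, height and center of an (x1, y1, x2, y2) box, inclusive pixels."""
    w = box[2] - box[0] + 1
    h = box[3] - box[1] + 1
    return w, h, box[0] + 0.5 * (w - 1), box[1] + 0.5 * (h - 1)


def _mkanchors(ws, hs, x_ctr, y_ctr):
    """Boxes with the given widths and heights around a single center."""
    ws = np.asarray(ws, dtype=np.float64)
    hs = np.asarray(hs, dtype=np.float64)
    return np.stack([x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                     x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)], axis=1)


def base_anchors(base_window=(1, 1, 16, 16), ratios=(0.5, 1, 2),
                 scales=(8, 16, 32)):
    """Hypothetical sketch: enumerate ratios first, then scales."""
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    w, h, x_ctr, y_ctr = _whctrs(np.asarray(base_window, dtype=np.float64))

    # Step 1: change the aspect ratio (h / w), keeping the area roughly constant.
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    ratio_boxes = _mkanchors(ws, hs, x_ctr, y_ctr)

    # Step 2: enlarge each ratio box by every scale, around the same center.
    scaled = []
    for box in ratio_boxes:
        bw, bh, bx, by = _whctrs(box)
        scaled.append(_mkanchors(bw * scales, bh * scales, bx, by))
    return np.vstack(scaled)  # shape (len(ratios) * len(scales), 4)


print(base_anchors())                              # matches the hardcoded array
print(base_anchors(base_window=(0, 0, 15, 15)))    # every value is 1 smaller
```

The three intermediate ratio boxes have sizes 23x12, 16x16 and 11x22 (width x height), so the ratio controls the shape while the pixel count stays close to 256.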

Output of running generate_anchors_pre_tf() with the test values (width=10, height=5):