AnchorBox的一些理解

最新推荐文章于 2025-02-20 20:37:15 发布

转载最新推荐文章于 2025-02-20 20:37:15 发布 · 7.2k 阅读

文章标签：

#CNN #机器学习 #神经网络 #anchor #深度学习

机器学习基本框架理解专栏收录该内容

3 篇文章

订阅专栏

Anchor box相当于对对一个中心点，取不同的窗口，从而用来检测重叠在一起的多个目标量。

首先我们需要知道anchor的本质是什么，本质是SPP(spatial pyramid pooling)思想的逆向。而SPP本身是做什么的呢，就是将不同尺寸的输入resize成为相同尺寸的输出。所以SPP的逆向就是，将相同尺寸的输出，倒推得到不同尺寸的输入。

接下来是anchor的窗口尺寸，这个不难理解，三个面积尺寸（128^2，256^2，512^2），然后在每个面积尺寸下，取三种不同的长宽比例（1:1,1:2,2:1）.这样一来，我们得到了一共9种面积尺寸各异的anchor。示意图如下：

对于每个3x3的窗口，作者就计算这个滑动窗口的中心点所对应的原始图片的中心点。然后作者假定，这个3x3窗口，是从原始图片上通过SPP池化得到的，而这个池化的区域的面积以及比例，就是一个个的anchor。换句话说，对于每个3x3窗口，作者假定它来自9种不同原始区域的池化，但是这些池化在原始图片中的中心点，都完全一样。这个中心点，就是刚才提到的，3x3窗口中心点所对应的原始图片中的中心点。如此一来，在每个窗口位置，我们都可以根据9个不同长宽比例、不同面积的anchor，逆向推导出它所对应的原始图片中的一个区域，这个区域的尺寸以及坐标，

相关源码

下面的是生成anchorbox的源码，可以参考看，一些地方我加了些注释方便理解

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Generates grid anchors on the fly as used in Faster RCNN.

Generates grid anchors on the fly as described in:
"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.
"""

import tensorflow as tf

from object_detection.core import anchor_generator
from object_detection.core import box_list
from object_detection.utils import ops


class GridAnchorGenerator(anchor_generator.AnchorGenerator):
  """Generates a grid of anchors at given scales and aspect ratios."""

  def __init__(self,
               scales=(0.5, 1.0, 2.0),
               aspect_ratios=(0.5, 1.0, 2.0),
               base_anchor_size=None,
               anchor_stride=None,
               anchor_offset=None):
    """Constructs a GridAnchorGenerator.

    Args:
      scales: a list of (float) scales, default=(0.5, 1.0, 2.0)
      aspect_ratios: a list of (float) aspect ratios, default=(0.5, 1.0, 2.0)
      base_anchor_size: base anchor size as height, width (
                        (length-2 float32 list, default=[256, 256])
      anchor_stride: difference in centers between base anchors for adjacent
                     grid positions (length-2 float32 list, default=[16, 16])
      anchor_offset: center of the anchor with scale and aspect ratio 1 for the
                     upper left element of the grid, this should be zero for
                     feature networks with only VALID padding and even receptive
                     field size, but may need additional calculation if other
                     padding is used (length-2 float32 tensor, default=[0, 0])
    """
    # Handle argument defaults
    if base_anchor_size is None:
      base_anchor_size = [256, 256]
    base_anchor_size = tf.constant(base_anchor_size, tf.float32)
    if anchor_stride is None:
      anchor_stride = [16, 16]
    anchor_stride = tf.constant(anchor_stride, dtype=tf.float32)
    if anchor_offset is None:
      anchor_offset = [0, 0]
    anchor_offset = tf.constant(anchor_offset, dtype=tf.float32)

    self._scales = scales
    self._aspect_ratios = aspect_ratios
    self._base_anchor_size = base_anchor_size
    self._anchor_stride = anchor_stride
    self._anchor_offset = anchor_offset

  def name_scope(self):
    return 'GridAnchorGenerator'

  def num_anchors_per_location(self):
    """Returns the number of anchors per spatial location.

    Returns:
      a list of integers, one for each expected feature map to be passed to
      the `generate` function.
    """
    return [len(self._scales) * len(self._aspect_ratios)]

  def _generate(self, feature_map_shape_list):
    """Generates a collection of bounding boxes to be used as anchors.

    Args:
      feature_map_shape_list: list of pairs of convnet layer resolutions in the
        format [(height_0, width_0)].  For example, setting
        feature_map_shape_list=[(8, 8)] asks for anchors that correspond
        to an 8x8 layer.  For this anchor generator, only lists of length 1 are
        allowed.

    Returns:
      boxes: a BoxList holding a collection of N anchor boxes
    Raises:
      ValueError: if feature_map_shape_list, box_specs_list do not have the same
        length.
      ValueError: if feature_map_shape_list does not consist of pairs of
        integers
    """
    if not (isinstance(feature_map_shape_list, list)
            and len(feature_map_shape_list) == 1):
      raise ValueError('feature_map_shape_list must be a list of length 1.')
    if not all([isinstance(list_item, tuple) and len(list_item) == 2
                for list_item in feature_map_shape_list]):
      raise ValueError('feature_map_shape_list must be a list of pairs.')
    # grid_height, grid_width就是featuremap的size，在前面提到的例子中也就是75，100
    grid_height, grid_width = feature_map_shape_list[0]
    #scales=(0.5, 1.0, 2.0),aspect_ratios=(0.5, 1.0, 2.0) 
    # 这个操作会生成枚举值，也就是（scales_grid[i],aspect_ratios_grid[i]）对应scale和aspect_ratio的9种组合
    scales_grid, aspect_ratios_grid = ops.meshgrid(self._scales,
                                                   self._aspect_ratios)
    scales_grid = tf.reshape(scales_grid, [-1])
    aspect_ratios_grid = tf.reshape(aspect_ratios_grid, [-1])
    return  tile_anchors(grid_height,
                        grid_width,
                        scales_grid,
                        aspect_ratios_grid,
                        self._base_anchor_size,
                        self._anchor_stride,
                        self._anchor_offset)


def tile_anchors(grid_height,
                 grid_width,
                 scales,
                 aspect_ratios,
                 base_anchor_size,
                 anchor_stride,
                 anchor_offset):
  """Create a tiled set of anchors strided along a grid in image space.

  This op creates a set of anchor boxes by placing a "basis" collection of
  boxes with user-specified scales and aspect ratios centered at evenly
  distributed points along a grid.  The basis collection is specified via the
  scale and aspect_ratios arguments.  For example, setting scales=[.1, .2, .2]
  and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale
  .1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2
  and aspect ratio 1/2.  Each box is multiplied by "base_anchor_size" before
  placing it over its respective center.

  Grid points are specified via grid_height, grid_width parameters as well as
  the anchor_stride and anchor_offset parameters.

  Args:
    grid_height: size of the grid in the y direction (int or int scalar tensor)
    grid_width: size of the grid in the x direction (int or int scalar tensor)
    上面两个是feature map的size，即卷积核输出的特征层的长宽
    scales: a 1-d  (float) tensor representing the scale of each box in the
      basis set.面积缩放刻度
    aspect_ratios: a 1-d (float) tensor representing the aspect ratio of each
      box in the basis set.  The length of the scales and aspect_ratios tensors
      must be equal.长宽比
    base_anchor_size: base anchor size as [height, width]
      (float tensor of shape [2])
    anchor_stride: difference in centers between base anchors for adjacent grid
                   positions (float tensor of shape [2])anchor移动步长
    anchor_offset: center of the anchor with scale and aspect ratio 1 for the
                   upper left element of the grid, this should be zero for
                   feature networks with only VALID padding and even receptive
                   field size, but may need some additional calculation if other
                   padding is used (float tensor of shape [2])
  Returns:
    a BoxList holding a collection of N anchor boxes
  """

  '''
      下面这三行操作解释一下，这是要算出变换后的矩形的宽高，可以自己算一下。
      设：
          W: base anchor size的宽度
          H: base anchor size的高度
          w: 变换之后的宽度
          h: 变换之后的高度
          s: 面积缩放（scale）的值
          r: 宽和高的比值
      然后列出等式：
          w/h = r
          w*h = W*H*(s^2)
      算一下w和h的值就好了，而且他这里还有个小bug，就是base anchor size的长宽不一样的时候算的值是不对的，
      但是这个一般都是一样的，所以无所谓了。  
  '''
  ratio_sqrts = tf.sqrt(aspect_ratios) 
  heights = scales / ratio_sqrts * base_anchor_size[0]
  widths = scales * ratio_sqrts * base_anchor_size[1]

  # Get a grid of box centers
  y_centers = tf.to_float(tf.range(grid_height))
  y_centers = y_centers * anchor_stride[0] + anchor_offset[0]
  # output： [array([  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,
  #       88.,  96., 104., 112., 120., 128., 136., 144., 152., 160., 168.,
  #      176., 184., 192., 200., 208., 216., 224., 232., 240., 248., 256.,
  #      264., 272., 280., 288., 296., 304., 312., 320., 328., 336., 344.,
  #      352., 360., 368., 376., 384., 392., 400., 408., 416., 424., 432.,
  #      440., 448., 456., 464., 472., 480., 488., 496., 504., 512., 520.,
  #      528., 536., 544., 552., 560., 568., 576., 584., 592.],
  #     dtype=float32),
  x_centers = tf.to_float(tf.range(grid_width))
  x_centers = x_centers * anchor_stride[1] + anchor_offset[1]
  # output： array([  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,
  #       88.,  96., 104., 112., 120., 128., 136., 144., 152., 160., 168.,
  #      176., 184., 192., 200., 208., 216., 224., 232., 240., 248., 256.,
  #      264., 272., 280., 288., 296., 304., 312., 320., 328., 336., 344.,
  #      352., 360., 368., 376., 384., 392., 400., 408., 416., 424., 432.,
  #      440., 448., 456., 464., 472., 480., 488., 496., 504., 512., 520.,
  #      528., 536., 544., 552., 560., 568., 576., 584., 592., 600., 608.,
  #      616., 624., 632., 640., 648., 656., 664., 672., 680., 688., 696.,
  #      704., 712., 720., 728., 736., 744., 752., 760., 768., 776., 784.,
  #      792.], dtype=float32)]

  # 下面就是算一下坐标和anchorbox的值了~
  x_centers, y_centers = ops.meshgrid(x_centers, y_centers)

  widths_grid, x_centers_grid = ops.meshgrid(widths, x_centers)
  heights_grid, y_centers_grid = ops.meshgrid(heights, y_centers)
  bbox_centers = tf.stack([y_centers_grid, x_centers_grid], axis=3)
  bbox_sizes = tf.stack([heights_grid, widths_grid], axis=3)
  bbox_centers = tf.reshape(bbox_centers, [-1, 2])
  bbox_sizes = tf.reshape(bbox_sizes, [-1, 2])
  bbox_corners = _center_size_bbox_to_corners_bbox(bbox_centers, bbox_sizes)
  return box_list.BoxList(bbox_corners)


def _center_size_bbox_to_corners_bbox(centers, sizes):
  """Converts bbox center-size representation to corners representation.

  Args:
    centers: a tensor with shape [N, 2] representing bounding box centers
    sizes: a tensor with shape [N, 2] representing bounding boxes

  Returns:
    corners: tensor with shape [N, 4] representing bounding boxes in corners
      representation
  """
  return tf.concat([centers - .5 * sizes, centers + .5 * sizes], 1)

https://blog.csdn.net/qian99/article/details/79942591

https://blog.csdn.net/zkq_1986/article/details/78975379

https://blog.csdn.net/sinat_24143931/article/details/78773936