TensorFlow 2.0: Faster R-CNN Code Walkthrough (Part 2)

This article takes a close look at the Faster R-CNN implementation in TensorFlow 2.0, focusing on the rpn_heads file and the closely related loss and anchor_target code. It walks through rpn_heads, loss, and anchor_target in detail to explain how the RPN in Faster R-CNN works.

This installment focuses on the code of the rpn_heads file whose functions were referenced in TensorFlow 2.0: Faster R-CNN Code Walkthrough (Part 1); this file implements the RPN (Region Proposal Network) of Faster R-CNN.

import tensorflow as tf
from tensorflow.keras import layers
from detection.core.anchor import anchor_generator, anchor_target
from detection.core.loss import losses
from detection.core.bbox import transforms
from detection.utils.misc import *

The imports show that the rpn_heads file pulls in losses, anchor_generator, anchor_target, transforms, and misc. Of these, losses, anchor_generator, and anchor_target are the key pieces; they are analyzed in the remaining sections.
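
Before walking through the class line by line, here is a minimal usage sketch. It assumes batch size 1 and the five 256-channel pyramid feature maps whose shapes are quoted in the comments inside call(); RPNHead is the class defined in the listing below, used with its default constructor arguments.

import tensorflow as tf

# build the head with its default anchor configuration
rpn_head = RPNHead()

# five FPN-style feature maps, e.g. for a 1216x1216 input image with strides (4, 8, 16, 32, 64)
feats = [tf.random.normal((1, size, size, 256)) for size in (304, 152, 76, 38, 19)]

rpn_class_logits, rpn_probs, rpn_deltas = rpn_head(feats)
print(rpn_class_logits.shape)  # (1, 369303, 2)
print(rpn_probs.shape)         # (1, 369303, 2)
print(rpn_deltas.shape)        # (1, 369303, 4)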

Part 1: Walkthrough of the rpn_heads code



class RPNHead(tf.keras.Model):
    def __init__(self, 
                 anchor_scales=(32, 64, 128, 256, 512), 
                 anchor_ratios=(0.5, 1, 2), 
                 anchor_feature_strides=(4, 8, 16, 32, 64),
                 proposal_count=2000, 
                 nms_threshold=0.7, 
                 target_means=(0., 0., 0., 0.), 
                 target_stds=(0.1, 0.1, 0.2, 0.2), 
                 num_rpn_deltas=256,
                 positive_fraction=0.5,
                 pos_iou_thr=0.7,
                 neg_iou_thr=0.3,
                 **kwags):
        '''
        Network head of Region Proposal Network.

                                      / - rpn_cls (1x1 conv)
        input - rpn_conv (3x3 conv) -
                                      \ - rpn_reg (1x1 conv)

        Attributes
        ---
            anchor_scales: 1D array of anchor sizes in pixels.
            anchor_ratios: 1D array of anchor ratios of width/height.
            anchor_feature_strides: Stride of the feature map relative 
                to the image in pixels.
            proposal_count: int. RPN proposals kept after non-maximum 
                suppression.
            nms_threshold: float. Non-maximum suppression threshold to 
                filter RPN proposals.
            target_means: [4] Bounding box refinement mean.
            target_stds: [4] Bounding box refinement standard deviation.
            num_rpn_deltas: int.
            positive_fraction: float.
            pos_iou_thr: float.
            neg_iou_thr: float.
        '''
        super(RPNHead, self).__init__(**kwags)
        
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold
        self.target_means = target_means
        self.target_stds = target_stds

        self.generator = anchor_generator.AnchorGenerator(
            scales=anchor_scales, 
            ratios=anchor_ratios, 
            feature_strides=anchor_feature_strides)
        
        self.anchor_target = anchor_target.AnchorTarget(
            target_means=target_means, 
            target_stds=target_stds,
            num_rpn_deltas=num_rpn_deltas,
            positive_fraction=positive_fraction,
            pos_iou_thr=pos_iou_thr,
            neg_iou_thr=neg_iou_thr)
        
        self.rpn_class_loss = losses.rpn_class_loss
        self.rpn_bbox_loss = losses.rpn_bbox_loss
        
        
        # Shared convolutional base of the RPN
        self.rpn_conv_shared = layers.Conv2D(512, (3, 3), padding='same',
                                             kernel_initializer='he_normal', 
                                             name='rpn_conv_shared')
        
        self.rpn_class_raw = layers.Conv2D(len(anchor_ratios) * 2, (1, 1),
                                           kernel_initializer='he_normal', 
                                           name='rpn_class_raw')

        self.rpn_delta_pred = layers.Conv2D(len(anchor_ratios) * 4, (1, 1),
                                           kernel_initializer='he_normal', 
                                           name='rpn_bbox_pred')
        
    def call(self, inputs, training=True):
        '''
        Args
        ---
            inputs: [batch_size, feat_map_height, feat_map_width, channels] 
                one level of pyramid feat-maps.
        
        Returns
        ---
            rpn_class_logits: [batch_size, num_anchors, 2]
            rpn_probs: [batch_size, num_anchors, 2]
            rpn_deltas: [batch_size, num_anchors, 4]
        '''
        
        layer_outputs = []
        
        for feat in inputs:   # loop over every pyramid level's feature map
            """
            # 五种feature map
            (1, 304, 304, 256)
            (1, 152, 152, 256)
            (1, 76, 76, 256)
            (1, 38, 38, 256)
            (1, 19, 19, 256)
            对于一种feature maps来说(以feature map(1, 304, 304, 256)为例子)
            进行class转换
            rpn_class_raw: (1, 304, 304, 6)  # 一个cell有3种anchor,每个有两个值,一个是前景值,一个是背景景值
            rpn_class_logits: (1, 277248, 2) # 输出每个anchor的两个置信值
            进行位置转换
            rpn_delta_pred: (1, 304, 304, 12) # 每种anchor有四个数代表anchor的坐标
            rpn_deltas: (1, 277248, 4)  # 输出每个anchor
            
            """
            # spatial size is unchanged; the shared 3x3 convolution maps the channels to 512
            shared = self.rpn_conv_shared(feat)
            shared = tf.nn.relu(shared)
            # classification branch: raw foreground/background scores (logits) for every anchor
            x = self.rpn_class_raw(shared)  # (1, 304, 304, 6)
            rpn_class_logits = tf.reshape(x, [tf.shape(x)[0], -1, 2])  # (1, 277248, 2)
            # convert the logits of every anchor into foreground/background probabilities
            # method: tf.nn.softmax
            rpn_probs = tf.nn.softmax(rpn_class_logits)  # softmax is applied along the last axis, i.e. per anchor
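            # Example: for one anchor with logits [2.0, 0.0],
            # softmax gives [e^2, e^0] / (e^2 + e^0) ≈ [0.88, 0.12],
            # i.e. foreground/background probabilities that sum to 1.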
            # regression branch: predict the box deltas for every anchor
            x = self.rpn_delta_pred(shared)
            rpn_deltas = tf.reshape(x, [tf.shape(x)[0], -1, 4])
            layer_outputs.append([rpn_class_logits, rpn_probs, rpn_deltas])

            """
            Return:
            (1, 277248, 2) (1, 277248, 2) (1, 277248, 4)
            (1, 69312, 2) (1, 69312, 2) (1, 69312, 4)
            (1, 17328, 2) (1, 17328, 2) (1, 17328, 4)
            (1, 4332, 2) (1, 4332, 2) (1, 4332, 4)
            (1, 1083, 2) (1, 1083, 2) (1, 1083, 4)

            """

        # Each feature map has now been turned into per-anchor class logits, probabilities,
        # and deltas (dy, dx, log(dh), log(dw)). zip is used to regroup them:
        # layer_outputs is five lists (one per level) of three tensors each; after zip(*...)
        # we get three groups (one per output type) of five tensors each, as illustrated below.
        outputs = list(zip(*layer_outputs))  # zip with * unpacks ("unzips") the list
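        # e.g. zip(*[[a1, b1, c1], [a2, b2, c2]]) -> [(a1, a2), (b1, b2), (c1, c2)],
        # so the concat below can merge all pyramid levels along the anchor axis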
        outputs = [tf.concat(list(o), axis=1) for o in outputs]
        rpn_class_logits, rpn_probs, rpn_deltas = outputs
        # output format: [batch_size, num_anchors, class], [batch_size, num_anchors, probability],
        #                [batch_size, num_anchors, (dy, dx, log(dh), log(dw))]
        # output shapes: (1, 369303, 2) (1, 369303, 2) (1, 369303, 4)
        #                (277248 + 69312 + 17328 + 4332 + 1083 = 369,303 anchors in total)
        
        return rpn_class_logits, rpn_probs, rpn_deltas
    # Compute the RPN classification and regression losses
    def loss(self, rpn_class_logits, rpn_deltas, gt_boxes, gt_class_ids, img_metas):
        """

        :param rpn_class_logits: [N, 2]
        :param rpn_deltas: [N, 4]
        :param gt_boxes:  [GT_N, 4]
        :param gt_class_ids:  [GT_N]
        :param img_metas: [11]
        :return:
        """
        # 1. Generate the coordinates of all anchors over the feature maps.
        #    All anchors are laid out on the pyramid feature maps, together with valid_flags
        #    marking whether an anchor crosses the image boundary (0 = invalid/out of bounds, 1 = valid).
        #    Output shapes: anchors: [batch_size, num_anchors, 4=(y1, x1, y2, x2)]
        #                   valid_flags: [batch_size, num_anchors]
        anchors, valid_flags = self.generator.generate_pyramid_anchors(
            img_metas)  # call reconstructed here; the method name is an assumption based on the AnchorGenerator import
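        # --- The steps below are a sketch of how the loss computation typically continues in
        # --- this implementation; build_targets and the loss call signatures are assumptions
        # --- based on the AnchorTarget and losses objects created in __init__.
        # 2. Match anchors against the ground-truth boxes to obtain per-anchor class targets
        #    and regression deltas for the sampled anchors.
        rpn_target_matchs, rpn_target_deltas = self.anchor_target.build_targets(
            anchors, valid_flags, gt_boxes, gt_class_ids)
        # 3. Cross-entropy loss over the sampled anchors and smooth-L1 loss over the positives.
        rpn_class_loss = self.rpn_class_loss(rpn_target_matchs, rpn_class_logits)
        rpn_bbox_loss = self.rpn_bbox_loss(
            rpn_target_deltas, rpn_target_matchs, rpn_deltas)

        return rpn_class_loss, rpn_bbox_loss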