Faster R-CNN: RPN Network Code Walkthrough, Part B

jhsignal

Tags: deep learning, artificial intelligence, python

Article link: https://blog.csdn.net/jhsignal/article/details/117297763

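1. First, the images in a batch are passed through the resnet50-fpn backbone to obtain features. The FPN outputs 5 feature maps, and at every location of each feature map the RPN predicts 3 boxes.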

        # RPN uses all feature maps that are available
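        # features is an OrderedDict holding every feature level used for prediction
        features = list(features.values())

        # Compute the objectness logits and bbox regression parameters for each feature level
        # objectness and pred_bbox_deltas are both lists with one tensor per level
        # With 3 anchors per location, their shapes are [8, 3, h, w] and [8, 12, h, w]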
        objectness, pred_bbox_deltas = rpn_head(features)

After this call, the output corresponds to the yellow part of the figure:
[Figure: image missing from source]

rpn_head = RPNHead(out_channels, rpn_anchor_generator.num_anchors_per_location()[0])

def num_anchors_per_location(self):
    # Number of anchors predicted at each sliding-window position, one entry per feature level
    return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)]

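Here num_anchors is the first entry returned by rpn_anchor_generator.num_anchors_per_location(). Each entry is computed per feature level as the number of anchor sizes assigned to that level multiplied by the number of aspect ratios, so it is the number of boxes predicted at every position of that level.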

For example, anchor_sizes and aspect_ratios below specify, for each feature map, how large the generated anchors are and which aspect ratios they use.

           anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
           aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
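As a quick check of the counting rule, the sketch below rewrites num_anchors_per_location as a standalone function and applies it to the configuration above. The free-function form and the helper base_anchor_shapes are illustrative and not part of the original code. The width/height formula assumes that an aspect ratio means height / width and that the anchor area stays at size squared, which is the convention torchvision's AnchorGenerator follows for its base anchors.

```python
import math

anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)


def num_anchors_per_location(sizes, ratios):
    """Anchors per sliding-window position, one entry per feature level."""
    return [len(s) * len(a) for s, a in zip(sizes, ratios)]


def base_anchor_shapes(size, ratios):
    """Hypothetical helper: (width, height) for each ratio.

    Assumes ratio = height / width and area = size ** 2.
    """
    shapes = []
    for r in ratios:
        h = size * math.sqrt(r)
        w = size / math.sqrt(r)
        shapes.append((w, h))
    return shapes


counts = num_anchors_per_location(anchor_sizes, aspect_ratios)
print(counts)     # [3, 3, 3, 3, 3]
print(counts[0])  # 3, the value passed to RPNHead as num_anchors

# Widths and heights of the three anchors on the first level (size 32).
for w, h in base_anchor_shapes(32, aspect_ratios[0]):
    print(f"w={w:.1f}, h={h:.1f}, area={w * h:.0f}")
```

Each level carries a single size and three ratios, so every level contributes 1 * 3 = 3 anchors per position, and indexing with [0] is safe because all entries are equal.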

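import torch
import torch.nn.functional as F
from torch import nn


class RPNHead(nn.Module):
    """
    Add an RPN head with classification and regression.
    A sliding window computes the objectness score and bbox regression parameters.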

    Arguments:
        in_channels: number of channels of the input feature
        num_anchors: number of anchors to be predicted
    """

    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        # 3x3 sliding window
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
        # Objectness score per anchor (the "object" here only means foreground vs. background)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        # Bbox regression parameters per anchor (4 values each)
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)

        for layer in self.children():
            if isinstance(layer, nn.Conv2d):
                torch.nn.init.normal_(layer.weight, std=0.01)
                torch.nn.init.constant_(layer.bias, 0)

    def forward(self, x):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg

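Finally, RPN_head returns the objectness logits and the bbox regression parameters, each as a list with one tensor per feature level. With 3 anchors per location, the tensors have shape [8, 3, h, w] and [8, 12, h, w] respectively: 8 is the number of images in the batch, and h and w differ from level to level because an FPN backbone is used.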
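The output shapes can be checked with plain arithmetic, without running the network. In RPNHead the 3x3 convolution (stride 1, padding 1) and the two 1x1 convolutions all preserve h and w, so only the channel count changes: num_anchors for the objectness branch and num_anchors * 4 for the regression branch. The sketch below is framework-free; the function name is hypothetical, num_anchors = 3 follows from the anchor configuration shown earlier, and the feature-map sizes are assumed for illustration (an 800x800 input with strides 4, 8, 16, 32, 64), not taken from the article.

```python
def rpn_head_output_shapes(batch_size, feature_sizes, num_anchors):
    """Return a list of (objectness_shape, bbox_deltas_shape), one pair per level.

    Assumes stride-1 convolutions that keep the spatial size unchanged
    (3x3 with padding 1, followed by 1x1), as in RPNHead.
    """
    shapes = []
    for h, w in feature_sizes:
        objectness = (batch_size, num_anchors, h, w)
        bbox_deltas = (batch_size, num_anchors * 4, h, w)
        shapes.append((objectness, bbox_deltas))
    return shapes


# Assumed sizes: 800x800 input, strides 4, 8, 16, 32, 64 (illustrative only).
feature_sizes = [(200, 200), (100, 100), (50, 50), (25, 25), (13, 13)]
shapes = rpn_head_output_shapes(batch_size=8, feature_sizes=feature_sizes, num_anchors=3)

for level, (obj, deltas) in enumerate(shapes):
    print(level, obj, deltas)

# Total number of anchors scored per image across all levels.
total_anchors = sum(obj[1] * obj[2] * obj[3] for obj, _ in shapes)
print(total_anchors)  # 159882
```

Under these assumed sizes the first level alone accounts for 3 * 200 * 200 = 120000 of the anchors, which is why later stages of the RPN filter proposals before passing them on.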
