maskrcnn_benchmark study notes: modeling\roi_heads\box_head\roi_box_feature_extractors.py

Latest recommended article published on 2024-08-08 08:06:23

业精于勤荒于嬉-行成于思而毁于随


Category: maskrcnn study notes

Article link: https://blog.csdn.net/m0_37644085/article/details/88639138


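This article analyzes in detail the ROI Heads part of the Mask R-CNN model, in particular the roi_box_feature_extractors.py module of box_head. It covers how POOLER_SCALES is determined from the backbone's strides, and how feature-map sizes change across stages in ResNet-based Faster R-CNN and Mask R-CNN. It also discusses the role of the RoIAlign algorithm and its use in the different heads (RCNN, Mask, Keypoints), as well as the final average pooling operation.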


The relevant config excerpt is recorded below:

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2

MODEL.ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"

1. About POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) and POOLER_SAMPLING_RATIO: 2

_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)

_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 2

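POOLER_SCALES are the different downscaling ratios that result from the strides of the backbone (ResNet or ResNeXt architecture). Because the last four stages feed the RPN, there are four levels here: conv2_x → conv5_x serve as the feature-extraction layers, so the corresponding POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) are really the pooling scales of stages 2 through 5, namely 1/4, 1/8, 1/16, and 1/32. BTW, you should understand the ResNet and ResNeXt architectures well to better follow this explanation.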
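The relationship between strides and POOLER_SCALES can be sketched in a few lines of Python. This is only an illustration, not code from maskrcnn_benchmark: the helper name `pooler_scales_from_strides` is hypothetical, and the stride list (4, 8, 16, 32) is an assumption chosen to match the four levels conv2_x to conv5_x described above.

```python
def pooler_scales_from_strides(strides):
    """Return the feature-map scale (1 / cumulative stride) for each level."""
    return tuple(1.0 / stride for stride in strides)


# Cumulative strides of conv2_x .. conv5_x relative to the input image
# (assumed values for a standard ResNet / ResNeXt backbone).
STRIDES = (4, 8, 16, 32)

scales = pooler_scales_from_strides(STRIDES)
print(scales)  # (0.25, 0.125, 0.0625, 0.03125)
```

Each entry is simply the reciprocal of the total downsampling applied up to that stage, which is why the values in the config halve from one level to the next.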

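For example, suppose you found an RoI with coordinates [0,0,64,64] in the input image. Suppose further that you want to pool its features from every level of the backbone. (This is actually rather interesting: "pool" is usually translated into Chinese as 池化, but here it really means "pool its features", that is, gather its features. So pooling is really a step-by-step gathering and extraction of features.)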

So, since there is a stride of 2 in the conv1 layer and another stride of 2 at the end of the first block, the result is a feature map 4x smaller than the original image, thus a scale of 0.25 (this is the output of conv2_x). Since there is a stride of 2 between all the subsequent convolution blocks of the backbone, the scale gets divided by 2 at each level, giving 0.125, 0.0625, and 0.03125.
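To make the [0,0,64,64] example concrete, the sketch below projects that RoI onto each of the four levels by multiplying its coordinates by the level's scale. It covers only the coordinate arithmetic: the function name `scale_roi` is hypothetical, and the bilinear sampling of RoIAlign and the way an FPN pooler assigns an RoI to a level are deliberately left out.

```python
def scale_roi(roi, scale):
    """Project an (x1, y1, x2, y2) box from image coordinates onto a feature map."""
    return [coord * scale for coord in roi]


roi = [0, 0, 64, 64]

# Two stride-2 steps give a scale of 1/4 at conv2_x;
# each later block halves the scale again.
scales = [0.25]
for _ in range(3):
    scales.append(scales[-1] / 2)

# Map each scale to the RoI's coordinates on that level's feature map.
projected = {scale: scale_roi(roi, scale) for scale in scales}
for scale, box in projected.items():
    print(scale, box)
```

Under these assumptions the 64-pixel box becomes 16, 8, 4, and 2 feature-map cells wide on conv2_x through conv5_x respectively.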