The relevant config excerpt is recorded below:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2
MODEL.ROI_BOX_HEAD:
  POOLER_RESOLUTION: 7
  POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
  POOLER_SAMPLING_RATIO: 2
  FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
1. About POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) and POOLER_SAMPLING_RATIO: 2
_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)
_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 2
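POOLER_SCALES lists the downscaling ratios that result from the strides of the backbone (a ResNet or ResNeXt architecture). The last four stages feed the RPN, so there are four levels here: conv2_x through conv5_x act as the feature-extraction stages, and the corresponding POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) are simply the spatial reductions at stages 2 to 5, namely 1/4, 1/8, 1/16, and 1/32. BTW, you should understand the ResNet and ResNeXt architectures well to follow this explanation.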
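The relationship between strides and scales described above can be sketched in a few lines. This is only an illustration: the `STAGE_STRIDES` tuple and the `pooler_scales` helper are hypothetical names introduced here, not part of maskrcnn-benchmark, and the stride values assume the standard ResNet layout in which conv2_x through conv5_x sit at strides 4, 8, 16, and 32.

```python
# Assumed cumulative strides of conv2_x .. conv5_x relative to the input image.
STAGE_STRIDES = (4, 8, 16, 32)


def pooler_scales(strides):
    """Return the spatial scale (1 / stride) for each feature level."""
    return tuple(1.0 / s for s in strides)


print(pooler_scales(STAGE_STRIDES))  # (0.25, 0.125, 0.0625, 0.03125)
```

The printed tuple matches the POOLER_SCALES value in the config, which is why the setting has to change if a backbone with different strides is used.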
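For example, suppose you found an RoI with coordinates [0, 0, 64, 64] in the input image, and suppose further that you want to pool its features from every level of the backbone. (A fun aside: in Chinese, "pool" is usually rendered as a fixed term for the pooling operation, but the phrase here is "pool its features", that is, gather its features. Pooling really is a step-by-step gathering and extraction of features.)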
So, since there is a stride of 2 in the conv1 layer and another stride of 2 at the end of the first block, the result is a feature map 4x smaller than the original image, and thus a scale of 0.25 (this is the output of conv2_x). Since there is a stride of 2 between all the following convolution blocks of the backbone, the scale is divided by 2 at each stage, giving 0.125, 0.0625, and 0.03125.
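To make the [0, 0, 64, 64] example concrete, here is a minimal sketch of how that RoI lands on each feature level when its coordinates are multiplied by the corresponding scale. The `project_roi` helper and the level names are assumptions made for illustration; the sketch only performs the coordinate scaling and does not reproduce RoIAlign's bilinear sampling or the way the library assigns RoIs to FPN levels.

```python
POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)
LEVEL_NAMES = ("conv2_x", "conv3_x", "conv4_x", "conv5_x")  # illustrative labels


def project_roi(roi, scale):
    """Scale an (x1, y1, x2, y2) box from image to feature-map coordinates."""
    return tuple(coord * scale for coord in roi)


roi = (0, 0, 64, 64)
for name, scale in zip(LEVEL_NAMES, POOLER_SCALES):
    print(name, project_roi(roi, scale))
# conv2_x (0.0, 0.0, 16.0, 16.0)
# conv3_x (0.0, 0.0, 8.0, 8.0)
# conv4_x (0.0, 0.0, 4.0, 4.0)
# conv5_x (0.0, 0.0, 2.0, 2.0)
```

A 64x64 RoI therefore covers a 16x16 region at the finest level and only a 2x2 region at the coarsest one, before the pooler resamples it to the fixed POOLER_RESOLUTION of 7.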

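This article walks through the ROI Heads part of the Mask R-CNN model in detail, in particular the roi_box_feature_extractors.py module of box_head. It covers how POOLER_SCALES is determined by the backbone strides, and how feature-map sizes change across stages in ResNet-based Faster R-CNN and Mask R-CNN. It also discusses the role of the RoIAlign algorithm and its use in the different heads (RCNN, Mask, Keypoints), as well as the final average pooling operation.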