Excerpts from the config are recorded below:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2
MODEL.ROI_BOX_HEAD:
  POOLER_RESOLUTION: 7
  POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
  POOLER_SAMPLING_RATIO: 2
  FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
1. About POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) and POOLER_SAMPLING_RATIO: 2
_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)
_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 2
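POOLER_SCALES are the downscaling ratios that result from the strides of the backbone (ResNet or ResNeXt architecture). Because the last four stages feed the RPN and the RoI heads, there are four levels here: conv2_x → conv5_x serve as the feature-extraction layers. The corresponding POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) are therefore the downsampling factors of stages 2 through 5, i.e. feature maps at 1/4, 1/8, 1/16 and 1/32 of the input resolution. BTW, you should understand the ResNet and ResNeXt architectures well to better follow this explanation.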
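The relationship between strides and scales can be checked with a few lines of Python. This is only a sketch: the function name `pooler_scales` and its parameters are made up for illustration and are not part of maskrcnn-benchmark, and it assumes the first pooled level has a cumulative stride of 4 with each later level doubling it.

```python
# Hypothetical helper (not a maskrcnn-benchmark function): derive the
# pooler scales from the cumulative stride of each backbone stage.
STAGE_NAMES = ("conv2_x", "conv3_x", "conv4_x", "conv5_x")


def pooler_scales(first_stride=4, num_levels=4):
    """Return 1 / stride per level, assuming the stride doubles at each level."""
    strides = [first_stride * 2 ** i for i in range(num_levels)]
    return tuple(1.0 / s for s in strides)


scales = pooler_scales()
for name, scale in zip(STAGE_NAMES, scales):
    print(name, scale)
```

Under these assumptions the four values match the POOLER_SCALES tuple in the config.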
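For example, suppose you found an RoI with coordinates [0, 0, 64, 64] in the input image. Suppose further that you want to pool its features from all levels of the backbone. (A fun aside: "pool" is usually rendered in Chinese as 池化, but here the phrase is literally "pool its features", meaning to gather its features together. So pooling is really a matter of aggregating and extracting features step by step.)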
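To make the RoI example concrete, the box [0, 0, 64, 64] can be projected onto each feature level by multiplying its coordinates by that level's scale. The snippet below is a simplified sketch: `project_roi` is a hypothetical helper, and it ignores the rounding and sub-pixel alignment details that a real RoIAlign implementation handles.

```python
POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)


def project_roi(roi, scale):
    """Scale an [x1, y1, x2, y2] box from image coordinates to feature-map coordinates."""
    return [coord * scale for coord in roi]


roi = [0, 0, 64, 64]  # RoI in input-image coordinates
projected = [project_roi(roi, s) for s in POOLER_SCALES]
for scale, box in zip(POOLER_SCALES, projected):
    print(scale, box)
```

The same 64x64 region covers 16x16, 8x8, 4x4 and 2x2 cells on the four levels, which is why each level needs its own scale.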
So, since there is a stride of 2 in the conv1 layer and another stride of 2 at the end of the first block (the max-pooling layer), the result is a feature map 4x smaller than the original image, and thus a scale of 0.25 (this is the input to conv2_x). Since there is a stride of 2 between all the convolution blocks of the backbone, the scale gets divided by 2