maskrcnn-benchmark notes — the modules and per-layer dimensions of R_50_FPN

These notes record my understanding of maskrcnn-benchmark, focusing on the ResNet backbone and the FPN inside GeneralizedRCNN. In ResNet, the downsample branch adjusts channel dimensions; the FPN then builds the feature pyramid on top. The RPN's anchor_generator and the box and keypoint modules of roi_heads are also covered.

Table of Contents

Preface: some notes

I. Start with GeneralizedRCNN

 A. backbone

1) ResNet

2) FPN  #todo

 B. rpn

1) anchor_generator

2) head #todo

3) box_selector_train (inference.py) #todo

4) box_selector_test

C. roi_heads

1) box

2) keypoint

 II. Module overview

III. Config overview


Preface: some notes

1. Notes on the COCO dataset: images are taken from complex everyday scenes, and objects are localized with precise segmentations. It covers 80 object categories, 328,000 images, and 2,500,000 labels. Importantly, about 100,000 person instances carry keypoint annotations; at roughly 4-6 people per image, that comes to on the order of 20,000 images containing people.

2. nn.Conv2d applies a 2D convolution over an input signal composed of several input planes (with an optional bias).

Conv2d(in_channels (int), out_channels (int), kernel_size (int or tuple), stride (int or tuple, optional), padding (int or tuple, optional), dilation (int or tuple, optional), groups (int, optional: splits the input into groups; usually left at its default of 1), bias (bool, optional))

A single int is shorthand for the tuple (int, int); in a tuple, the first entry applies to the height dimension and the second to the width dimension.
bias (bool, optional): If True, adds a learnable bias to the output. Default: True.
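To make the int-vs-tuple behavior concrete, here is a minimal sketch (the variable names are my own, not from the repo):

```python
import torch
import torch.nn as nn

# kernel_size=3 is shorthand for kernel_size=(3, 3); in a tuple the
# first entry is the height dimension, the second the width dimension.
square = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
rect = nn.Conv2d(3, 64, kernel_size=(7, 3), padding=(3, 1))  # tall kernel

x = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
print(tuple(square(x).shape))  # padding=1 preserves 224x224 -> (1, 64, 224, 224)
print(tuple(rect(x).shape))    # padding=(3, 1) also preserves 224x224
```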

2.1 The groups parameter

The groups parameter partitions the connections between inputs and outputs, so that convolution is performed group by group:

[1] - groups=1: all input channels are convolved to produce every output channel (the default).

[2] - groups=2: equivalent to two convolutions side by side, each taking half the input channels and producing half the output channels, whose results are then concatenated.

[3] - groups=in_channels: each input channel is convolved with its own set of out_channels/in_channels filters.

When groups=in_channels and out_channels = K * in_channels for some positive integer K, the operation is called a depthwise convolution.

In short, groups determines how many groups in_channels is split into, and each group is reused out_channels/groups times; this is also why both in_channels and out_channels must be divisible by groups. (Summarized from a Chinese write-up on the groups parameter of PyTorch's nn.Conv2d.)
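A quick way to see the weight-sharing pattern is to count parameters: the weight tensor of Conv2d has shape (out_channels, in_channels/groups, kH, kW). A sketch (the channel counts here are my own illustrative choices):

```python
import torch.nn as nn

def n_weight_params(conv):
    # Number of weight parameters (bias excluded).
    return conv.weight.numel()

# groups=1: every output channel sees all 8 input channels.
full = nn.Conv2d(8, 16, kernel_size=3, groups=1, bias=False)
# groups=2: two parallel convs, each mapping 4 in-channels to 8 out-channels.
half = nn.Conv2d(8, 16, kernel_size=3, groups=2, bias=False)
# groups=in_channels with out_channels = K * in_channels (K=2):
# depthwise convolution, each input channel has its own 2 filters.
depthwise = nn.Conv2d(8, 16, kernel_size=3, groups=8, bias=False)

print(n_weight_params(full))       # 16 * 8 * 3 * 3 = 1152
print(n_weight_params(half))       # 16 * 4 * 3 * 3 = 576
print(n_weight_params(depthwise))  # 16 * 1 * 3 * 3 = 144
```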

3. The full overviews appear at the end.

I. Start with GeneralizedRCNN

It consists of a backbone, an rpn, and roi_heads.

GeneralizedRCNN(
  (backbone): Sequential(
    (body): ResNet()
    (fpn): FPN()
  )
  (rpn): RPNModule(
    (anchor_generator):AnchorGenerator()
    (head): RPNHead()
    (box_selector_train): RPNPostProcessor()
    (box_selector_test): RPNPostProcessor()
  )
  (roi_heads): CombinedROIHeads(
    (box): ROIBoxHead()
    (keypoint): ROIKeypointHead()
  )
)

 A. backbone

1)ResNet

    (body): ResNet(
      (stem): StemWithFixedBatchNorm()
      (layer1): Sequential()
      (layer2): Sequential()
      (layer3): Sequential()
      (layer4): Sequential()
    )

Each layer begins with a downsample branch that adjusts the dimensions, e.g. layer2 (conv3_x in the standard ResNet table):

First comes
    (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
which lifts the previous layer's 256 channels to 512 so that they match the 512-channel output of this block's [1x1, 128; 3x3, 128; 1x1, 512] bottleneck, after which the residual add can be performed.
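The shape bookkeeping can be checked directly with the Conv2d shapes from the dump; a minimal sketch (FrozenBatchNorm and ReLU omitted for brevity):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 56, 56)  # output of layer1 (C2)

# Main branch of layer2's first bottleneck: 1x1/s2 -> 3x3 -> 1x1
# (the BN and ReLU layers between them are omitted here).
conv1 = nn.Conv2d(256, 128, 1, stride=2, bias=False)
conv2 = nn.Conv2d(128, 128, 3, stride=1, padding=1, bias=False)
conv3 = nn.Conv2d(128, 512, 1, stride=1, bias=False)
main = conv3(conv2(conv1(x)))

# Shortcut: the 1x1/s2 downsample lifts 256 channels to 512 and halves
# the spatial size, so the residual add is shape-compatible.
downsample = nn.Conv2d(256, 512, 1, stride=2, bias=False)
out = main + downsample(x)
print(tuple(out.shape))  # (1, 512, 28, 28) on both branches
```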

2)FPN  #todo

    (fpn): FPN(
      (fpn_inner1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner4): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (top_blocks): LastLevelMaxPool()
    )
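The forward pass implied by this dump can be sketched as follows (a simplified version of maskrcnn-benchmark's FPN.forward; top_blocks and weight initialization are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1x1 "inner" convs project each ResNet stage C2..C5 down to 256 channels;
# 3x3 "layer" convs smooth each merged map into P2..P5.
inners = nn.ModuleList(nn.Conv2d(c, 256, 1) for c in (256, 512, 1024, 2048))
layers = nn.ModuleList(nn.Conv2d(256, 256, 3, padding=1) for _ in range(4))

def fpn_forward(feats):
    last = inners[-1](feats[-1])        # fpn_inner4 on C5
    results = [layers[-1](last)]        # fpn_layer4 -> P5
    for feat, inner, layer in zip(feats[-2::-1], inners[-2::-1], layers[-2::-1]):
        top_down = F.interpolate(last, scale_factor=2, mode="nearest")
        last = inner(feat) + top_down   # lateral connection + top-down path
        results.insert(0, layer(last))  # prepend P4, then P3, then P2
    return results                      # [P2, P3, P4, P5], all 256 channels

c2, c3, c4, c5 = (torch.randn(1, c, s, s)
                  for c, s in [(256, 56), (512, 28), (1024, 14), (2048, 7)])
pyramid = fpn_forward([c2, c3, c4, c5])
print([tuple(p.shape) for p in pyramid])
```

Each output level keeps its stage's spatial resolution but is projected to a uniform 256 channels, which is what lets the RPN head below share weights across levels.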

 B. rpn

1) anchor_generator: 

    (anchor_generator): AnchorGenerator(
      (cell_anchors): BufferList()
    )

2)head #todo

    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))    # objectness: 3 anchors per location
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))    # box deltas: 3 anchors * 4 coordinates
    )
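The output channel counts follow from A = 3 anchors per spatial location (one per aspect ratio; scale is handled by the FPN level): 3 objectness logits and 3 * 4 = 12 box regression values. A sketch mirroring RPNHead (variable names are mine):

```python
import torch
import torch.nn as nn

A = 3  # anchors per location: one per aspect ratio, one scale per FPN level

conv = nn.Conv2d(256, 256, 3, padding=1)
cls_logits = nn.Conv2d(256, A, 1)     # one objectness score per anchor
bbox_pred = nn.Conv2d(256, A * 4, 1)  # (dx, dy, dw, dh) per anchor

feat = torch.relu(conv(torch.randn(1, 256, 50, 50)))  # one FPN level
print(tuple(cls_logits(feat).shape))  # (1, 3, 50, 50)
print(tuple(bbox_pred(feat).shape))   # (1, 12, 50, 50)
```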

3)box_selector_train(inference.py)#todo

(box_selector_train): RPNPostProcessor()

4)box_selector_test

(box_selector_test): RPNPostProcessor()

C. roi_heads

1)box

# This is the RCNN head.

(box): ROIBoxHead(
    (feature_extractor): FPN2MLPFeatureExtractor()
    # 7*7*256: ROIAlign over the pyramid levels (each with its own
    # spatial_scale), then fc 1024 -> fc 1024
    (predictor): FPNPredictor()         # class scores (2) and box deltas (4 per class)
    (post_processor): PostProcessor()   # from a set of classification scores, box
                                        # regressions and proposals, computes the
                                        # post-processed boxes and applies NMS to
                                        # obtain the final results
)

# TODO: modeling/roi_heads/box_head/inference.py → PostProcessor()

    (box): ROIBoxHead(
      (feature_extractor): FPN2MLPFeatureExtractor(
        (pooler): Pooler(
          (poolers): ModuleList(
            (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2)
            (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=2)
            (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=2)
            (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=2)
          )
        )
        (fc6): Linear(in_features=12544, out_features=1024, bias=True)
        (fc7): Linear(in_features=1024, out_features=1024, bias=True)
      )
      (predictor): FPNPredictor(
        (cls_score): Linear(in_features=1024, out_features=2, bias=True)
        (bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
      )
      (post_processor): PostProcessor()
    )
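fc6's in_features is exactly the flattened ROIAlign output: 256 channels * 7 * 7 = 12544. A sketch of the head's data flow (ReLUs between the fc layers are my assumption here; names mirror the dump):

```python
import torch
import torch.nn as nn

pooled = torch.randn(8, 256, 7, 7)  # 8 RoIs after ROIAlign
flat = pooled.flatten(start_dim=1)  # 256 * 7 * 7 = 12544 features per RoI

fc6 = nn.Linear(256 * 7 * 7, 1024)
fc7 = nn.Linear(1024, 1024)
cls_score = nn.Linear(1024, 2)      # person vs. background
bbox_pred = nn.Linear(1024, 2 * 4)  # 4 box deltas per class

h = torch.relu(fc7(torch.relu(fc6(flat))))
print(tuple(cls_score(h).shape))  # (8, 2)
print(tuple(bbox_pred(h).shape))  # (8, 8)
```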


2)keypoint

    (keypoint): ROIKeypointHead(
      (feature_extractor): KeypointRCNNFeatureExtractor()
      (predictor): KeypointRCNNPredictor()
      (post_processor): KeypointPostProcessor()
    )

i) feature_extractor: 14*14*256 after ROIAlign over the pyramid levels (each with its own spatial_scale), then

14*14*256 → 14*14*512 → ... → 14*14*512 (8 conv layers in total) →

ii) predictor: 17 heatmaps, one per COCO keypoint class, with each heatmap scoring exactly one keypoint type; the ConvTranspose2d (kernel 4, stride 2, padding 1) doubles the 14*14 input to 28*28.

iii) modeling/roi_heads/keypoint_head/inference.py → KeypointPostProcessor  TODO

 

    (keypoint): ROIKeypointHead(
      (feature_extractor): KeypointRCNNFeatureExtractor(
        (pooler): Pooler(
          (poolers): ModuleList(
            (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=2)
            (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=2)
            (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=2)
            (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=2)
          )
        )
        (conv_fcn1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  #same
        (conv_fcn2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn7): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_fcn8): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (predictor): KeypointRCNNPredictor(
        (kps_score_lowres): ConvTranspose2d(512, 17, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      )                          # kernel 4, stride 2, padding 1: upsamples 2x (14x14 -> 28x28)
      (post_processor): KeypointPostProcessor()
    )
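The upsampling factor of the predictor can be verified from the ConvTranspose2d output-size formula, H_out = (H_in - 1) * stride - 2 * padding + kernel_size:

```python
import torch
import torch.nn as nn

# (14 - 1) * 2 - 2 * 1 + 4 = 28: kernel 4, stride 2, padding 1 doubles
# the spatial size, producing one heatmap per COCO keypoint (17 total).
kps_score_lowres = nn.ConvTranspose2d(512, 17, kernel_size=4, stride=2, padding=1)

x = torch.randn(2, 512, 14, 14)  # two RoIs out of conv_fcn8
heatmaps = kps_score_lowres(x)
print(tuple(heatmaps.shape))  # (2, 17, 28, 28)
```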

 II. Module overview



GeneralizedRCNN(
  (backbone): Sequential(
    (body): ResNet(
      (stem): StemWithFixedBatchNorm(
        (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (bn1): FrozenBatchNorm2d()
      )
      (layer1): Sequential(
        (0): BottleneckWithFixedBatchNorm(
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d()
          )
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (1): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (2): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
      )
      (layer2): Sequential(
        (0): BottleneckWithFixedBatchNorm(
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d()
          )
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (1): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (2): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (3): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
      )
      (layer3): Sequential(
        (0): BottleneckWithFixedBatchNorm(
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d()
          )
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (1): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (2): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (3): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (4): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (5): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
      )
      (layer4): Sequential(
        (0): BottleneckWithFixedBatchNorm(
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d()
          )
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (1): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
        (2): BottleneckWithFixedBatchNorm(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
        )
      )
    )
    (fpn): FPN(
      (fpn_inner1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fpn_inner4): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      (fpn_layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (top_blocks): LastLevelMaxPool()
    )
  )
  # (rpn) and (roi_heads) are identical to the dumps shown in sections B and C above.
)