Preface: Assorted Notes
1. A note on the COCO dataset: its images are cropped from complex everyday scenes, and objects are localized with precise segmentation masks. The dataset covers 80 object categories, 328,000 images, and 2,500,000 labels. Most relevant here, roughly 100,000 person instances carry keypoint annotations; at about 4-6 people per image, that works out to roughly 20,000 images containing people.
2. nn.Conv2d
Applies a 2D convolution over an input signal composed of several input planes (with an optional bias).
Conv2d(in_channels (int), out_channels (int), kernel_size (int or tuple), stride (int or tuple, optional), padding (int or tuple, optional), dilation (int or tuple, optional), groups (int, optional), bias (bool, optional)). groups splits the input channels into groups; it can usually be left at its default of 1.
Size arguments accept a tuple (int1, int2) (a single int is shorthand for (int, int)); the first entry applies to the height dimension, the second to the width dimension.
bias (bool, optional): if True, adds a learnable bias to the output. Default: True.
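The spatial output size of Conv2d follows a closed-form formula from the PyTorch docs. A stdlib-only check (the helper name is mine):

```python
def conv2d_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dim, per the PyTorch Conv2d docs:
    floor((size + 2*padding - dilation*(kernel-1) - 1) / stride + 1)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# The ResNet stem conv below, Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
# halves a 224x224 input:
print(conv2d_out_size(224, kernel=7, stride=2, padding=3))  # -> 112
# A 3x3 'same' conv (padding=1, stride=1) preserves the size:
print(conv2d_out_size(14, kernel=3, stride=1, padding=1))   # -> 14
```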
2.1 The groups parameter
groups factors the connection between inputs and outputs, performing the convolution group by group:
[1] groups=1: every output is convolved over all input channels.
[2] groups=2: equivalent to two convolutions side by side, each seeing half of the input channels and producing half of the output channels, with the two results then concatenated.
[3] groups=in_channels: each input channel is convolved with its own set of out_channels/in_channels filters.
When groups = in_channels and out_channels = K * in_channels for a positive integer K, the operation is called a depthwise convolution.
groups decides how many groups in_channels is split into, and each group's channels are reused out_channels/groups times; this is also why both in_channels and out_channels must be divisible by groups. (Reference: a blog post on understanding the groups parameter of PyTorch's nn.Conv2d.)
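Because each group convolves only in_channels/groups inputs, the weight count shrinks by a factor of groups. A stdlib-only sketch (the helper name is mine):

```python
def conv2d_num_weights(in_c, out_c, k, groups=1):
    """Weight-tensor element count of a k x k Conv2d:
    out_c * (in_c // groups) * k * k; both channel counts
    must be divisible by groups."""
    assert in_c % groups == 0 and out_c % groups == 0
    return out_c * (in_c // groups) * k * k

dense = conv2d_num_weights(64, 64, 3, groups=1)       # ordinary conv
depthwise = conv2d_num_weights(64, 64, 3, groups=64)  # one filter per channel
print(dense, depthwise)  # -> 36864 576
assert dense == depthwise * 64  # depthwise is 64x cheaper here
```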
3. The full module printout is given at the end (Part II).
I. GeneralizedRCNN at a glance
It is made up of a backbone, an rpn, and roi_heads:
GeneralizedRCNN(
(backbone): Sequential(
(body): ResNet()
(fpn): FPN()
)
(rpn): RPNModule(
(anchor_generator): AnchorGenerator()
(head): RPNHead()
(box_selector_train): RPNPostProcessor()
(box_selector_test): RPNPostProcessor()
)
(roi_heads): CombinedROIHeads(
(box): ROIBoxHead()
(keypoint): ROIKeypointHead()
)
)
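The three submodules above compose a simple pipeline: backbone → rpn → roi_heads. A torch-free sketch of the data flow (losses omitted; all the stub stages here are made up for illustration):

```python
def generalized_rcnn_forward(images, backbone, rpn, roi_heads):
    """Data flow of GeneralizedRCNN.forward: the backbone features feed
    both the RPN (which proposes class-agnostic boxes) and the ROI heads
    (which classify/refine those boxes and predict keypoints)."""
    features = backbone(images)           # tuple of FPN feature maps
    proposals = rpn(images, features)     # candidate boxes per image
    detections = roi_heads(features, proposals)
    return detections

# Stub stages, just to show the plumbing:
result = generalized_rcnn_forward(
    images=["img0"],
    backbone=lambda imgs: ("P2", "P3", "P4", "P5", "P6"),
    rpn=lambda imgs, feats: ["proposal"] * 1000,
    roi_heads=lambda feats, props: [{"boxes": [], "keypoints": []}],
)
assert "keypoints" in result[0]
```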
A. backbone
1)ResNet
(body): ResNet(
(stem): StemWithFixedBatchNorm()
(layer1): Sequential()
(layer2): Sequential()
(layer3): Sequential()
(layer4): Sequential()
)
Each stage starts with a downsample branch that adjusts the channel dimension. E.g. in layer2 (conv3_x in the standard ResNet table):
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
)
This 1×1 conv turns the previous stage's 256 channels into 512, matching the 512-channel output of the block [1×1, 128; 3×3, 128; 1×1, 512] so the shortcut can be added to it.
2)FPN #todo
(fpn): FPN(
(fpn_inner1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_inner2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_inner3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_inner4): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_blocks): LastLevelMaxPool()
)
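Each fpn_innerN above is a 1×1 lateral conv projecting a ResNet stage (256/512/1024/2048 channels) down to a shared 256, and each fpn_layerN is a 3×3 'same' conv smoothing the merged map. The per-level strides determine the spatial_scale values that reappear in the ROIAlign poolers later; a quick stdlib-only check (names mine):

```python
# Channel widths of ResNet stages C2..C5 and the stride of each FPN level.
stage_channels = [256, 512, 1024, 2048]
strides = [2 ** (i + 2) for i in range(4)]   # [4, 8, 16, 32]
spatial_scales = [1.0 / s for s in strides]  # [0.25, 0.125, 0.0625, 0.03125]

# Every lateral conv maps stage_channels[i] -> 256, so all levels share a
# 256-channel representation the top-down pathway can add into.
fpn_out_channels = 256
print(spatial_scales)  # -> [0.25, 0.125, 0.0625, 0.03125]
```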
B. rpn
1) anchor_generator:
(anchor_generator): AnchorGenerator(
(cell_anchors): BufferList()
)
2)head #todo
(head): RPNHead(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) # objectness, one score per anchor
(bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1)) # box deltas, 3 anchors × 4 coords
)
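The two 1×1 convs above reveal the anchor layout: cls_logits has A output channels (one objectness score per anchor per location) and bbox_pred has 4A. With FPN, the default config uses one anchor size per level and 3 aspect ratios, hence A = 3; that reading of the printed shapes is my inference, checked below:

```python
cls_out, bbox_out = 3, 12           # output channels of cls_logits / bbox_pred
anchors_per_location = cls_out      # one objectness logit per anchor
# Each anchor regresses one 4-vector of box deltas (dx, dy, dw, dh):
assert bbox_out == 4 * anchors_per_location
print(anchors_per_location)  # -> 3
```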
3) box_selector_train (inference.py) #todo
(box_selector_train): RPNPostProcessor()
4)box_selector_test
(box_selector_test): RPNPostProcessor()
C. roi_heads
1)box
# This is the R-CNN head
(box): ROIBoxHead(
(feature_extractor): FPN2MLPFeatureExtractor()
# ROIAlign pools each proposal to 7×7×256 (choosing the FPN level via spatial_scale), then → fc1024 → fc1024
(predictor): FPNPredictor() # cls_score (2 classes), bbox_pred (4 coords per class)
(post_processor): PostProcessor() # from the classification scores, box regressions and proposals,
# computes the post-processed boxes and applies NMS to obtain the final result
)
#TODO: modeling/roi_heads/box_head/inference.py → PostProcessor()
(box): ROIBoxHead(
(feature_extractor): FPN2MLPFeatureExtractor(
(pooler): Pooler(
(poolers): ModuleList(
(0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=2)
(1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=2)
(2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=2)
(3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=2)
)
)
(fc6): Linear(in_features=12544, out_features=1024, bias=True)
(fc7): Linear(in_features=1024, out_features=1024, bias=True)
)
(predictor): FPNPredictor(
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
)
(post_processor): PostProcessor()
)
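Two numbers in this printout are worth decoding. fc6's in_features is the flattened ROIAlign output, 7·7·256 = 12544. And the four poolers correspond to FPN levels P2-P5: following eqn (1) of the FPN paper, each proposal is routed to level k = floor(k0 + log2(sqrt(wh)/224)) with k0 = 4, clamped to the available levels. A stdlib-only sketch (the helper name is mine):

```python
import math

# Flattened pooled feature fed into fc6:
assert 7 * 7 * 256 == 12544

def fpn_level(w, h, k_min=2, k_max=5, k0=4, canonical=224):
    """Route an ROI of size w x h to pyramid level k (FPN paper, eqn 1),
    clamped to the levels the pooler actually has (P2..P5)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))

print(fpn_level(224, 224))  # -> 4: canonical-size box, stride-16 level
print(fpn_level(64, 64))    # -> 2: small box, stride-4 level (scale 0.25)
```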
2)keypoint
(keypoint): ROIKeypointHead(
(feature_extractor): KeypointRCNNFeatureExtractor()
(predictor): KeypointRCNNPredictor()
(post_processor): KeypointPostProcessor()
)
i) feature_extractor: ROIAlign pools each proposal to 14×14×256 (choosing the FPN level via spatial_scale), then 8 stacked 3×3 'same' convs: 14×14×256 → 14×14×512 → … → 14×14×512 (8 layers).
ii) predictor: a single ConvTranspose2d(512, 17, kernel 4, stride 2, padding 1) produces 17 heatmaps, one per COCO keypoint type, upsampling 14×14 to 28×28; each heatmap scores exactly one keypoint class.
iii) modeling/roi_heads/keypoint_head/inference.py → KeypointPostProcessor #TODO
(keypoint): ROIKeypointHead(
(feature_extractor): KeypointRCNNFeatureExtractor(
(pooler): Pooler(
(poolers): ModuleList(
(0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=2)
(1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=2)
(2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=2)
(3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=2)
)
)
(conv_fcn1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # 'same' padding
(conv_fcn2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn7): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_fcn8): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(predictor): KeypointRCNNPredictor(
(kps_score_lowres): ConvTranspose2d(512, 17, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
) # transpose conv, stride 2: upsamples 14×14 → 28×28
(post_processor): KeypointPostProcessor()
)
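kps_score_lowres runs opposite to a conv: ConvTranspose2d with stride 2 doubles the spatial size. The output length follows (in - 1)·stride - 2·padding + kernel (ignoring output_padding and dilation); checking the module above (helper name mine):

```python
def conv_transpose2d_out_size(size, kernel, stride=1, padding=0):
    """Output length along one dim of ConvTranspose2d
    (dilation=1, output_padding=0): (size-1)*stride - 2*padding + kernel."""
    return (size - 1) * stride - 2 * padding + kernel

# The 14x14 pooled keypoint features become 28x28 heatmaps
# (17 channels: one per COCO keypoint type):
print(conv_transpose2d_out_size(14, kernel=4, stride=2, padding=1))  # -> 28
```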
II. Full module printout
GeneralizedRCNN(
(backbone): Sequential(
(body): ResNet(
(stem): StemWithFixedBatchNorm(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d()
)
(layer1): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer2): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(3): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer3): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(3): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(4): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(5): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer4): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
)
(fpn):