2021SC@SDUSC
Continuing from the previous post, this part carries on the analysis of torchvision's Faster-RCNN ResNet-50 FPN.
FPN
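FPN, short for Feature Pyramid Networks, is a multi-scale deep network with a pyramid structure. Faster-RCNN with FPN reported results that surpass most single-model entries, including the winning models of the COCO 2016 challenge. Its main strength is detecting small objects.
Reading the FPN code
torchvision ships the complete source code for ResNet50 FPN (the code referenced here is from torchvision 0.7.0). This section walks through that implementation; to keep the explanation smooth, it uses the layer names from ResNet-50 and the corresponding parameters wherever possible:
FPN structure: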
(fpn): FeaturePyramidNetwork(
  (inner_blocks): ModuleList(
    (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
  )
  (layer_blocks): ModuleList(
    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (extra_blocks): LastLevelMaxPool()
)
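To make the printed structure more concrete, here is a minimal pure-Python sketch of the tensor shapes that would flow through it. The channel counts are taken from the printout; the 800x800 input, the strides of 4, 8, 16 and 32 for conv2_x to conv5_x, and the rule that every stride-2 stage maps a side of length n to ceil(n / 2) are assumptions made for illustration. The helper names (halve, fpn_shapes) are hypothetical and not part of torchvision.

```python
# Channel counts come from the inner_blocks printout; the strides and the
# 800x800 input used below are assumptions made for illustration.
BACKBONE_CHANNELS = {"conv2_x": 256, "conv3_x": 512, "conv4_x": 1024, "conv5_x": 2048}
FPN_CHANNELS = 256


def halve(n):
    """Side length after one stride-2 stage (assumed to round up)."""
    return (n + 1) // 2


def fpn_shapes(height, width):
    """Return {level: (input_shape, output_shape)} with (C, H, W) tuples."""
    # conv1 and the following max-pool each halve the input, so conv2_x
    # already works at stride 4; conv3_x to conv5_x halve it again.
    h, w = halve(halve(height)), halve(halve(width))
    shapes = {}
    for i, (name, channels) in enumerate(BACKBONE_CHANNELS.items()):
        if i > 0:
            h, w = halve(h), halve(w)
        # inner_blocks[i] (1x1) only changes the channel count; layer_blocks[i]
        # (3x3, padding 1) keeps both the channels and the spatial size.
        shapes[name] = ((channels, h, w), (FPN_CHANNELS, h, w))
    # LastLevelMaxPool adds one coarser level derived from the last output,
    # so it has no backbone input of its own.
    shapes["pool"] = (None, (FPN_CHANNELS, halve(h), halve(w)))
    return shapes


shapes = fpn_shapes(800, 800)
for name, (src, dst) in shapes.items():
    print(name, src, "->", dst)
```

Under these assumptions every pyramid level ends up with 256 channels, and only the spatial size differs from level to level.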
The way FPN processes its input can be seen from the following code, which shows the whole flow:
class FeaturePyramidNetwork(nn.Module):
    ......
    def forward(self, x):
        # type: (Dict[str, Tensor]) -> Dict[str, Tensor]
        """
        Computes the FPN for a set of feature maps.
        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.
        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
        x = list(x.values())

        last_inner = self.get_result_from_inner_blocks(x[-1], -1)
        results = []
        results.append(self.get_result_from_layer_blocks(last_inner, -1))

        for idx in range(len(x) - 2, -1, -1):
            inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
            feat_shape = inner_lateral.shape[-2:]
            inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
            last_inner = inner_lateral + inner_top_down
            results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))

        if self.extra_blocks is not None:
            results, names = self.extra_blocks(results, x, names)

        # make it back an OrderedDict
        out = OrderedDict([(k, v) for k, v in zip(names, results)])

        return out
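The data flow of this loop can be traced without PyTorch. The sketch below replaces every tensor with the set of backbone levels that contributed to it, so the 1x1 and 3x3 convolutions and the interpolation become no-ops. That substitution, the function name trace_fpn, the labels C2 to C5, and the level names "0" to "3" plus "pool" are assumptions made for illustration rather than a reproduction of torchvision's behaviour.

```python
from collections import OrderedDict


def trace_fpn(features):
    """Mimic the control flow of forward() on symbolic features.

    Each feature is a frozenset of backbone level labels, so the result shows
    which inputs feed each output. Convolutions and upsampling do not change
    that set and are therefore left out.
    """
    names = list(features.keys())
    x = list(features.values())

    # Start from the coarsest (last) level, which has no top-down input.
    last_inner = x[-1]
    results = [last_inner]

    # Walk towards the finer levels, merging lateral and top-down information.
    for idx in range(len(x) - 2, -1, -1):
        inner_lateral = x[idx]
        inner_top_down = last_inner  # would be upsampled in the real code
        last_inner = inner_lateral | inner_top_down  # stands in for "+"
        results.insert(0, last_inner)

    # Stand-in for the extra block: one more level derived from the coarsest output.
    results.append(results[-1])
    names.append("pool")

    return OrderedDict(zip(names, results))


features = OrderedDict((str(i), frozenset([f"C{i + 2}"])) for i in range(4))
out = trace_fpn(features)
for name, sources in out.items():
    print(name, sorted(sources))
```

The trace shows that the finest output combines information from all four backbone stages, while the coarsest one depends on conv5_x alone, and that results.insert(0, ...) is what keeps the outputs ordered from highest resolution first.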
One thing worth pointing out here is how the 2x upsampling is implemented in PyTorch:
F.interpolate(last_inner, size=feat_shape, mode="nearest")
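Here feat_shape is the spatial size (height, width) of the lateral feature map at the current level, i.e. the shape after the 2x upsampling. Because the target size is passed explicitly rather than a scale factor, the two maps still line up when the finer level's side is odd and the ratio is not exactly 2.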
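What nearest-neighbour resizing to an explicit target size does can be sketched in pure Python on a nested list. The index rule used here, source index = floor(dst * in_size / out_size), is my understanding of how mode="nearest" behaves and should be treated as an assumption; nearest_resize is a hypothetical helper, not a PyTorch function.

```python
def nearest_resize(grid, out_h, out_w):
    """Resize a 2D list to (out_h, out_w) by nearest-neighbour lookup."""
    in_h, in_w = len(grid), len(grid[0])
    # Each output index maps back to floor(dst * in / out) in the input.
    rows = [min(r * in_h // out_h, in_h - 1) for r in range(out_h)]
    cols = [min(c * in_w // out_w, in_w - 1) for c in range(out_w)]
    return [[grid[r][c] for c in cols] for r in rows]


# Exact 2x case: every input value is repeated in a 2x2 patch.
coarse = [[1, 2],
          [3, 4]]
fine = nearest_resize(coarse, 4, 4)
for row in fine:
    print(row)

# Non-exact case: a side of 13 stretched to 25 (not 26), as can happen when
# the finer level has an odd side length.
index_map = [i * 13 // 25 for i in range(25)]
print(index_map)
```

In the non-exact case most source positions are used twice and one only once, which is why passing size=feat_shape is more robust than assuming a fixed factor of 2.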
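The other thing worth pointing out is results: it holds the output of each layer_blocks convolution, and these outputs are then fed into the RPN for foreground/background binary classification and bounding-box regression. The top (coarsest) levels are where large objects are detected, and the further down the pyramid you go, the smaller the objects that are detected.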
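The link between pyramid level and object size can be illustrated with a small anchor calculation. The anchor sizes (32 to 512, one per level), the three aspect ratios, and the feature-map sides for an 800x800 input are assumptions based on what I recall of torchvision's default Faster-RCNN FPN configuration; they do not appear in the code quoted above and should be checked against the library before being relied on.

```python
# Assumed defaults: one anchor size per level and three aspect ratios.
ANCHOR_SIZES = {"0": 32, "1": 64, "2": 128, "3": 256, "pool": 512}
ASPECT_RATIOS = (0.5, 1.0, 2.0)
# Assumed feature-map sides for an 800x800 input (strides 4 to 64).
LEVEL_SIDES = {"0": 200, "1": 100, "2": 50, "3": 25, "pool": 13}


def anchors_per_level(sides, ratios):
    """Number of anchors the RPN would score on each pyramid level."""
    return {name: side * side * len(ratios) for name, side in sides.items()}


def anchor_shape(size, ratio):
    """(height, width) of an anchor with area size**2 and height / width == ratio."""
    h = size * ratio ** 0.5
    w = size / ratio ** 0.5
    return round(h, 1), round(w, 1)


counts = anchors_per_level(LEVEL_SIDES, ASPECT_RATIOS)
for name, size in ANCHOR_SIZES.items():
    level_shapes = [anchor_shape(size, r) for r in ASPECT_RATIOS]
    print(name, counts[name], level_shapes)

total = sum(counts.values())
print("total anchors:", total)
```

Under these assumptions the finest level carries many small anchors and the coarsest level a few large ones, which matches the observation that small objects are found lower in the pyramid.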
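Below is an overall diagram I put together, which makes the overall structure easy to understand.
In the diagram, the left side shows the layer names, for example conv5_x, which correspond to the layer names in the ResNet table. The left part is called the bottom-up pathway and the right part the top-down pathway. For each ResNet stage from conv2_x to conv5_x, a copy of the output is also passed to the pathway on the right; these are called lateral connections. Overall, FPN can be summarized by the following formula:
FPN = Top-down pathway + lateral connections
The next part covers another piece of the model.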