The paper Feature Pyramid Networks for Object Detection states:
Fig. 3 shows the building block that constructs our top-down feature maps. With a coarser-resolution feature map, we upsample the spatial resolution by a factor of 2 (using nearest neighbor upsampling for simplicity). The upsampled map is then merged with the corresponding bottom-up map (which undergoes a 1×1 convolutional layer to reduce channel dimensions) by element-wise addition.
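The merge step described in the quoted passage can be sketched in plain NumPy. This is only an illustrative sketch, not the paper's or the repository's code: the helper names (`upsample_nearest_2x`, `conv1x1`, `merge_top_down`), the NHWC layout, and the random weights are all assumptions made for the example, and the 1×1 convolution is written as a per-pixel matrix multiply over channels with no bias.

```python
import numpy as np


def upsample_nearest_2x(x):
    """Nearest-neighbor 2x upsampling of an [N, H, W, C] array (hypothetical helper)."""
    # Repeat every row and every column twice.
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution as a per-pixel linear map over channels, no bias.

    x: [N, H, W, C_in], weight: [C_in, C_out] -> [N, H, W, C_out]
    """
    return x @ weight


def merge_top_down(c_i, p_j, lateral_weight):
    """Fuse a bottom-up map c_i with the coarser top-down map p_j."""
    lateral = conv1x1(c_i, lateral_weight)   # reduce channels to match p_j
    upsampled = upsample_nearest_2x(p_j)     # match the spatial size of c_i
    return lateral + upsampled               # element-wise addition


# Assumed example shapes: c_i has 512 channels, the pyramid uses 256.
rng = np.random.default_rng(0)
c_i = rng.standard_normal((1, 8, 8, 512))
p_j = rng.standard_normal((1, 4, 4, 256))
w = rng.standard_normal((512, 256)) * 0.01

p_i = merge_top_down(c_i, p_j, w)
print(p_i.shape)  # (1, 8, 8, 256)
```

Note that a fixed 2x upsample only lines up when the finer map is exactly twice the size of the coarser one. Resizing to the finer map's actual height and width avoids that constraint for odd input sizes.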
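In the implementation below, feature maps from different levels are fused by upsampling with bilinear interpolation (rather than the nearest-neighbor upsampling described in the paper) and then adding them element-wise. The code is taken from https://github.com/DetectionTeamUCAS/FPN_Tensorflow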
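import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.image.resize_bilinear)


def fusion_two_layer(C_i, P_j, scope):
    '''
    i = j+1
    :param C_i: shape is [1, h, w, c]
    :param P_j: shape is [1, h/2, w/2, 256]
    :return:
    P_i
    '''
    with tf.variable_scope(scope):
        level_name = scope.split('_')[1]
        # Dynamic spatial size of the finer bottom-up map
        h, w = tf.shape(C_i)[1], tf.shape(C_i)[2]
        # Resize the coarser top-down map to the size of C_i
        upsample_p = tf.image.resize_bilinear(P_j,
                                              size=[h, w],
                                              name='up_s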