[Paper Notes] Path Aggregation Network for Instance Segmentation

These notes mainly cover Fully-connected Fusion.

Motivation. Fully-connected layers, or MLP, were widely used in mask prediction in instance segmentation [10, 41, 34] and mask proposal generation [48, 49]. Results of [8, 33] show that FCN is also competent in predicting pixel-wise masks for instances. Recently, Mask R-CNN [21] applied a tiny FCN on the pooled feature grid to predict corresponding masks, avoiding competition between classes.
We note fc layers yield different properties compared with FCN where the latter gives prediction at each pixel based on a local receptive field and parameters are shared at different spatial locations. Contrarily, fc layers are location sensitive since predictions at different spatial locations are achieved by varying sets of parameters. So they have the ability to adapt to different spatial locations. Also prediction at each spatial location is made with global information of the entire proposal. It is helpful to differentiate instances [48] and recognize separate parts belonging to the same object. Given properties of fc and convolutional layers different from each other, we fuse predictions from these two types of layers for better mask prediction.
Mask Prediction Structure. Our component of mask prediction is lightweight and easy to implement. The mask branch operates on the pooled feature grid for each proposal.
As shown in Figure 4, the main path is a small FCN, which consists of 4 consecutive convolutional layers and 1 deconvolutional layer. Each convolutional layer consists of 256 3 × 3 filters and the deconvolutional layer up-samples feature with factor 2. It predicts a binary pixel-wise mask for each class independently to decouple segmentation and classification, similar to that of Mask R-CNN.
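In short: fully-connected (fc) layers, or MLPs, have long been used for mask prediction in instance segmentation [10, 41, 34] and for mask proposal generation [48, 49], while [8, 33] show that an FCN can also predict pixel-wise instance masks. Mask R-CNN [21] applies a tiny FCN to the pooled feature grid and predicts one mask per class, which avoids competition between classes.
The two layer types behave differently. An FCN predicts each pixel from a local receptive field, with parameters shared across spatial locations. An fc layer is location-sensitive: each spatial location is predicted by its own set of parameters, so it can adapt to different locations, and every prediction uses global information from the whole proposal. That helps to tell instances apart [48] and to recognize separate parts of the same object. Because the two differ in these ways, their predictions are fused for better mask prediction.
The mask prediction component is lightweight and easy to implement, and the mask branch runs on the pooled feature grid of each proposal.
As shown in Figure 4, the main path is a small FCN of 4 consecutive convolutional layers (256 3 × 3 filters each) followed by 1 deconvolutional layer that up-samples by a factor of 2. Like Mask R-CNN, it predicts a binary pixel-wise mask for each class independently, which decouples segmentation from classification.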
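To make the main path concrete, here is a minimal shape-bookkeeping sketch in plain Python. It is not the authors' code. The 14 × 14 × 256 pooled input, the "same" padding of the 3 × 3 convolutions, the 80-class output, and all function names are assumptions for illustration. Only the layer counts, the 256 filters, and the ×2 up-sampling come from the text above.

```python
# Shape bookkeeping for the main FCN path, as (height, width, channels).
# Assumptions (not stated in the excerpt): 14x14x256 pooled RoI features,
# "same" padding for the 3x3 convs, and 80 classes (COCO-style).

def conv3x3_same(shape, out_channels):
    """3x3 convolution, stride 1, padding 1: spatial size is unchanged."""
    h, w, _ = shape
    return (h, w, out_channels)


def deconv_x2(shape, out_channels):
    """Deconvolution (transposed conv) that up-samples by a factor of 2."""
    h, w, _ = shape
    return (2 * h, 2 * w, out_channels)


def main_path_shapes(roi_shape=(14, 14, 256), num_classes=80):
    """Return a list of (layer_name, output_shape) for the main path."""
    shapes = [("roi", roi_shape)]
    x = roi_shape
    for i in range(1, 5):          # 4 consecutive conv layers, 256 filters each
        x = conv3x3_same(x, 256)
        shapes.append((f"conv{i}", x))
    x = deconv_x2(x, 256)          # 1 deconv layer, up-sampling factor 2
    shapes.append(("deconv", x))
    h, w, _ = x
    x = (h, w, num_classes)        # one binary mask per class (assumed output layer)
    shapes.append(("mask_logits", x))
    return shapes


if __name__ == "__main__":
    for name, shape in main_path_shapes():
        print(f"{name:12s} {shape}")
```

Under these assumptions the spatial size stays at 14 × 14 through the four convolutions and becomes 28 × 28 after the deconvolution, which matches the 28 × 28 mask size used later in the notes.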

We further create a short path from layer conv3 to a fc layer. There are two 3 × 3 convolutional layers where the second shrinks channels to half to reduce computational overhead. A fc layer is used to predict a class-agnostic foreground/background mask. It not only is efficient, but also allows parameters in the fc layer trained with more samples, leading to better generality. The mask size we use is 28 × 28 so that the fc layer produces a 784 × 1 × 1 vector. This vector is reshaped to the same spatial size as the mask predicted by FCN. To obtain the final mask prediction, mask of each class from FCN and foreground/background prediction from fc are added. Using only one fc layer, instead of multiple of them, for final prediction prevents the issue of collapsing the hidden spatial feature map into a short feature vector, which loses spatial information.

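To recap: a short path runs from conv3 to an fc layer through two 3 × 3 convolutional layers, the second of which halves the channels to cut computation. The fc layer predicts a class-agnostic foreground/background mask. This is efficient, and it also lets the fc parameters be trained with more samples, so the layer generalizes better.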

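Suppose conv4_fc has the same shape as conv3, 14 × 14 × 256. Then conv5_fc is 14 × 14 × 128, and the fc layer can be viewed as a convolution with a 14 × 14 × 128 × C_out kernel, giving an output of size 1 × 1 × C_out. Since the mask size is 28 × 28, the fc layer produces a 784 × 1 × 1 vector, so C_out = 784. This vector is reshaped to 28 × 28, the same spatial size as the mask predicted by the FCN (28 × 28 × C for C classes). To obtain the final mask prediction, the foreground/background prediction from fc is added to the mask of each class from the FCN. Using a single fc layer for the final prediction, rather than several, avoids collapsing the hidden spatial feature map into a short feature vector, which would lose spatial information.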
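The arithmetic above can be checked with a small plain-Python sketch. It only illustrates the shapes and the element-wise addition and is not the paper's implementation. The 14 × 14 × 256 shape of conv3 is the assumption made in these notes, and `fc_branch_info` and `fuse` are hypothetical helper names.

```python
# Shape and weight-count check for the fc branch, plus the fusion step.
# Assumption from the notes: conv3 (and conv4_fc) are 14x14x256.

def fc_branch_info(conv3_shape=(14, 14, 256), mask_size=28):
    """Follow conv3 -> conv4_fc -> conv5_fc -> fc and count fc weights."""
    h, w, c = conv3_shape
    conv4_fc = (h, w, c)            # 3x3 conv, channels kept
    conv5_fc = (h, w, c // 2)       # 3x3 conv, channels halved
    fc_in = h * w * (c // 2)        # flattened conv5_fc
    fc_out = mask_size * mask_size  # C_out, later reshaped to mask_size x mask_size
    return {
        "conv4_fc": conv4_fc,
        "conv5_fc": conv5_fc,
        "fc_in": fc_in,
        "fc_out": fc_out,
        # every output pixel has its own fc_in weights (location-sensitive)
        "fc_weights": fc_in * fc_out,
        # for comparison: one 3x3 conv, 256 -> 256, shared across locations
        "conv3x3_weights": 3 * 3 * 256 * 256,
    }


def fuse(class_masks, fg_mask):
    """Add the class-agnostic mask to every per-class mask, element-wise.

    class_masks: list of C masks, each a list of rows (H x W)
    fg_mask:     a single H x W mask from the fc branch
    """
    return [
        [[v + f for v, f in zip(row, fg_row)]
         for row, fg_row in zip(mask, fg_mask)]
        for mask in class_masks
    ]


if __name__ == "__main__":
    print(fc_branch_info())
    # Tiny 2x2 example with 2 classes instead of 28x28 with C classes.
    per_class = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
    foreground = [[10, 20], [30, 40]]
    print(fuse(per_class, foreground))
```

With these numbers the single fc layer holds 25,088 × 784 = 19,668,992 weights, against 589,824 for one shared 3 × 3 convolution, which shows why the branch halves the channels first and why the fc output is kept class-agnostic rather than repeated per class.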
