Paper Reading Notes -- ParseNet

ParseNet: Looking Wider to See Better

 

The FCN approach can be thought of as sliding a classification network around an input image and processing each sliding-window area independently. In particular, FCN disregards global information about an image, thus ignoring potentially useful scene-level semantic context. Our approach allows integrating global context in an end-to-end fully convolutional network (as opposed to a patch-based approach) for semantic segmentation.

  •  FCN needs global context to help resolve local confusions.
  •  Consistent labelling across long pixel distances is limited without global features, especially when segmenting large objects.

   

Notice that the scale of features from different layers may be quite different, making it difficult to directly combine them for prediction. We find that L2 normalizing features for each layer and combining them using a scaling factor learned through backpropagation works well to address this potential difficulty.

  • When merging the features, one must be careful to normalize each individual feature to make the combined feature work well; in classical computer vision this is referred to as the cue combination problem.
  • This is a form of attention mechanism. The fundamental role of attention here is to bridge the gap in scale and semantics between features from different layers, making them easier to combine in the decoder.
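
The normalize-then-scale step described above can be sketched in NumPy. This is only an illustration under stated assumptions: the function names `l2_normalize` and `combine` are hypothetical, the feature maps are random arrays, and the per-channel scales are fixed values standing in for parameters that would be learned through backpropagation in a real network.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """L2-normalize the channel vector at every spatial position.

    features: array of shape (C, H, W).
    """
    norm = np.sqrt(np.sum(features ** 2, axis=0, keepdims=True))
    return features / (norm + eps)

def combine(feature_maps, scales):
    """Normalize each feature map, apply its scale, and concatenate along channels."""
    scaled = []
    for fmap, gamma in zip(feature_maps, scales):
        # gamma has shape (C,): one scale per channel (learned in practice, fixed here).
        scaled.append(l2_normalize(fmap) * gamma[:, None, None])
    return np.concatenate(scaled, axis=0)

rng = np.random.default_rng(0)
low = rng.normal(scale=100.0, size=(4, 8, 8))   # large-magnitude features
high = rng.normal(scale=0.01, size=(6, 8, 8))   # small-magnitude features
gammas = [np.full(4, 10.0), np.full(6, 10.0)]   # hypothetical initial scales
combined = combine([low, high], gammas)

# After normalization and scaling, both groups have the same per-position norm,
# even though the raw magnitudes differ by four orders of magnitude.
low_norm = np.linalg.norm(combined[:4], axis=0)
high_norm = np.linalg.norm(combined[4:], axis=0)
```

Without the normalization, the large-magnitude map would dominate the concatenated feature, which is the difficulty the quoted passage points out.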

 

Although theoretically, features from the top layers of a network have very large receptive fields (e.g. fc7 in FCN with VGG has a 404 × 404 pixels receptive field), we argue that in practice, the empirical size of the receptive fields is much smaller, and is not enough to capture the global context.
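
The 404 × 404 figure can be checked with the usual receptive-field recurrence. The sketch below assumes the standard VGG-16 layout (3 × 3 convolutions with stride 1, 2 × 2 pooling with stride 2) and, as in FCN, treats fc6 as a 7 × 7 convolution and fc7 as a 1 × 1 convolution; the helper name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent output units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

conv, pool = (3, 1), (2, 2)
vgg16_fc7 = (
    [conv] * 2 + [pool]        # conv1_1, conv1_2, pool1
    + [conv] * 2 + [pool]      # conv2_1, conv2_2, pool2
    + [conv] * 3 + [pool]      # conv3_1..conv3_3, pool3
    + [conv] * 3 + [pool]      # conv4_1..conv4_3, pool4
    + [conv] * 3 + [pool]      # conv5_1..conv5_3, pool5
    + [(7, 1), (1, 1)]         # fc6 as 7x7 conv, fc7 as 1x1 conv
)

print(receptive_field(vgg16_fc7))  # 404
```

This is only the theoretical bound; the quoted passage argues that the empirical receptive field is considerably smaller.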

To identify the effective receptive field, we slide a small patch of random noise across the input image, and measure the change in the activation of the desired layer. If the activation does not vary significantly, that suggests the given random patch is outside of the empirical receptive field, as shown in Figure 2. The effective receptive field at the last layer of this network barely covers 1/4 of the entire image.

  •  A new method to probe the effective receptive field.
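
The probing procedure can be sketched as follows. This is a toy illustration, not the paper's experiment: `probe_receptive_field` is a hypothetical helper, and `toy_unit` is a made-up stand-in for "the activation of the desired layer" that by construction only sees the central 16 × 16 window of a 32 × 32 image.

```python
import numpy as np

def probe_receptive_field(activation_fn, image, patch=4, stride=4, seed=0):
    """Slide a noise patch over the image and record the activation change."""
    rng = np.random.default_rng(seed)
    base = activation_fn(image)
    h, w = image.shape
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    change = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            perturbed = image.copy()
            # Overwrite one patch with random noise, leave the rest untouched.
            perturbed[r:r + patch, c:c + patch] = rng.normal(size=(patch, patch))
            change[i, j] = np.abs(activation_fn(perturbed) - base)
    return change

def toy_unit(img):
    # Hypothetical unit: responds only to the central 16x16 window.
    return img[8:24, 8:24].sum()

image = np.zeros((32, 32))
change = probe_receptive_field(toy_unit, image)

# Patch positions where the activation moved are inside the empirical receptive field.
inside = change > 1e-9
```

For a real network, `activation_fn` would run a forward pass and return the activation of one unit in the layer of interest; the thresholded `change` map then plays the role of the visualization the notes refer to as Figure 2.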

 

Context is known to be very useful for improving performance on detection and segmentation tasks using deep learning. Mostajabi et al. (2014); Szegedy et al. (2014b) and references therein illustrate how context can be used to help in different tasks.

  • How can contextual information be extracted more efficiently from the whole image? Global pooling, I think, will lose positional information, which is undesirable for segmentation. A Global Convolutional Network may be a potential solution.
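
The concern about positional information can be made concrete with a small NumPy sketch. Everything here is an assumption for illustration: the helper names are hypothetical, the features are random, and broadcasting the pooled vector back over the map and concatenating it with the local features is shown as one common way to use a global descriptor, not as something taken from the notes above.

```python
import numpy as np

def global_average_pool(features):
    """Average a (C, H, W) feature map over all spatial positions -> (C,)."""
    return features.mean(axis=(1, 2))

def append_global_context(features):
    """Broadcast the pooled vector to (C, H, W) and concatenate it with the local features."""
    c, h, w = features.shape
    pooled = global_average_pool(features)[:, None, None]
    context = np.broadcast_to(pooled, (c, h, w))
    return np.concatenate([features, context], axis=0)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4, 4))

# Shuffle the spatial positions, identically for every channel.
perm = rng.permutation(16)
shuffled = features.reshape(3, 16)[:, perm].reshape(3, 4, 4)

# The pooled vector is the same for both layouts: position is discarded.
same_pooled = np.allclose(global_average_pool(features), global_average_pool(shuffled))

# Concatenating with the local map keeps the per-position features alongside the context.
augmented = append_global_context(features)
```

The pooled vector alone cannot distinguish the original map from the shuffled one, which is the loss of positional information noted above; any spatial detail in `augmented` comes only from the local half of the channels.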
