Paper Reading Notes -- ParseNet

masonwang_513

Article link: https://blog.csdn.net/reform513/article/details/104471996

ParseNet: Looking Wider to See Better

The FCN approach can be thought of as sliding a classification network around an input image and processing each sliding-window area independently. In particular, FCN disregards global information about an image, thus ignoring potentially useful scene-level semantic context. Our approach allows integrating global context in an end-to-end fully convolutional network (as opposed to a patch-based approach) for semantic segmentation.

FCN needs global context to help resolve local ambiguities.
FCN struggles to keep labels consistent between pixels that are far apart; this requires global features, especially when segmenting large objects.
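A minimal sketch of one way to add global context to a fully convolutional pipeline, assuming (as in ParseNet) global average pooling over the whole feature map, "unpooling" by replicating the pooled vector back to the spatial size, and concatenation with the local features along the channel axis. Feature maps are plain nested lists `fmap[c][i][j]` to keep it dependency-free; all function names are hypothetical, and the normalization step discussed further down is omitted here.

```python
from typing import List

# Assumed layout: fmap[c][i][j] = channel c, row i, column j.
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each channel to a single number: its mean over all positions."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def unpool(vector: List[float], height: int, width: int) -> FeatureMap:
    """Replicate each pooled value over an H x W grid."""
    return [[[v] * width for _ in range(height)] for v in vector]


def add_global_context(local: FeatureMap) -> FeatureMap:
    """Concatenate image-level context channels after the local channels."""
    height, width = len(local[0]), len(local[0][0])
    context = unpool(global_average_pool(local), height, width)
    return local + context  # list concatenation == channel concatenation


if __name__ == "__main__":
    local = [
        [[1.0, 2.0], [3.0, 4.0]],  # channel 0, mean 2.5
        [[0.0, 0.0], [0.0, 8.0]],  # channel 1, mean 2.0
    ]
    fused = add_global_context(local)
    print(len(fused))  # 4
    print(fused[2])    # [[2.5, 2.5], [2.5, 2.5]]
```

After fusion every pixel carries the same image-level summary next to its own local features, which is what lets a per-pixel classifier use scene-level context.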

Notice that the scale of features from different layers may be quite different, making it difficult to directly combine them for prediction. We find that L2 normalizing features for each layer and combining them using a scaling factor learned through backpropagation works well to address this potential difficulty.

When merging the features, one must be careful to normalize each individual feature to make the combined feature work well; in classical computer vision this is referred to as the cue combination problem.
This is a form of attention mechanism: its fundamental role is to bridge the gap in scale and semantics between features from different layers, making them easier to combine in the decoder.
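The L2 normalization with a learned scale quoted above can be sketched as follows, assuming the norm is taken across channels at each spatial position and each channel then gets its own scale `gamma[c]`. In the paper the scale is learned by backpropagation; here `gamma` is simply passed in, and the function name and `eps` guard are illustrative assumptions.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # fmap[c][i][j]


def l2_normalize_and_scale(
    fmap: FeatureMap, gamma: List[float], eps: float = 1e-12
) -> FeatureMap:
    """At each position (i, j), divide the C-dim feature vector by its L2 norm,
    then multiply channel c by gamma[c]."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            norm = math.sqrt(sum(fmap[c][i][j] ** 2 for c in range(channels)))
            norm = max(norm, eps)  # avoid dividing by zero for all-zero vectors
            for c in range(channels):
                out[c][i][j] = gamma[c] * fmap[c][i][j] / norm
    return out


if __name__ == "__main__":
    shallow = [[[3.0]], [[4.0]]]    # small-magnitude features
    deep = [[[300.0]], [[400.0]]]   # same direction, 100x larger
    print(l2_normalize_and_scale(shallow, [1.0, 1.0]))  # [[[0.6]], [[0.8]]]
    print(l2_normalize_and_scale(deep, [1.0, 1.0]))     # [[[0.6]], [[0.8]]]
```

Both layers end up on the same scale after normalization, so neither dominates when concatenated; `gamma` then lets the network re-weight each channel instead of being stuck with unit-norm features.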

Although theoretically, features from the top layers of a network have very large receptive fields (e.g. fc7 in FCN with VGG has a 404 × 404 pixels receptive field), we argue that in practice, the empirical size of the receptive fields is much smaller, and is not enough to capture the global context.

To identify the effective receptive field, we slide a small patch of random noise across the input image, and measure the change in the activation of the desired layer. If the activation does not vary significantly, that suggests the given random patch is outside of the empirical receptive field, as shown in Figure 2. The effective receptive field at the last layer of this network barely covers 1/4 of the entire image.

A new method for probing the effective receptive field.
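A toy sketch of the probing procedure described above: slide a patch of random noise over the input in non-overlapping steps and record where the target unit's activation changes by more than a threshold. The "network" here is a hypothetical single linear unit whose weights decay exponentially with distance from the image center, so every pixel has a non-zero weight (its theoretical receptive field is the whole image) while distant pixels barely matter. The noise range, threshold, and patch size are assumptions for illustration, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Tuple

Image = List[List[float]]


def probe_receptive_field(
    unit: Callable[[Image], float],
    image: Image,
    patch: int,
    threshold: float,
    low: float = -1.0,
    high: float = 1.0,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Return top-left corners of noise patches that change unit(image)
    by more than `threshold`."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    baseline = unit(image)
    hits = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            noisy = [row[:] for row in image]
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    noisy[i][j] = rng.uniform(low, high)
            if abs(unit(noisy) - baseline) > threshold:
                hits.append((top, left))
    return hits


def make_decaying_unit(height: int, width: int, decay: float) -> Callable[[Image], float]:
    """Hypothetical linear unit: weight exp(-decay * distance to center)."""
    ci, cj = (height - 1) / 2, (width - 1) / 2
    weights = [[math.exp(-decay * math.hypot(i - ci, j - cj)) for j in range(width)]
               for i in range(height)]

    def unit(image: Image) -> float:
        return sum(weights[i][j] * image[i][j] for i in range(height) for j in range(width))

    return unit


if __name__ == "__main__":
    unit = make_decaying_unit(16, 16, decay=1.0)
    image = [[0.0] * 16 for _ in range(16)]
    # Positive noise avoids cancellation inside this purely linear toy unit.
    hits = probe_receptive_field(unit, image, patch=4, threshold=0.1, low=0.5, high=1.0)
    print(hits)  # [(4, 4), (4, 8), (8, 4), (8, 8)]
```

Only the four central patches register, even though the unit nominally "sees" all 16 × 16 pixels, which mirrors the gap between theoretical and empirical receptive fields that the paper points out.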

Context is known to be very useful for improving performance on detection and segmentation tasks using deep learning. Mostajabi et al. (2014); Szegedy et al. (2014b) and references therein illustrate how context can be used to help in different tasks.

How can contextual information from the whole image be extracted more efficiently? Global pooling, I think, loses positional information, which is undesirable for segmentation. Maybe the Global Convolutional Network is a potential solution.
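A tiny illustration of the concern about global pooling: average pooling over all positions is invariant to where an activation occurs, so two maps with the same activation at different locations pool to the same value. This only shows what the pooled vector itself loses; in a ParseNet-style design the pooled context is combined with local features that still carry position. `place_object` is a hypothetical helper.

```python
from typing import List

Channel = List[List[float]]  # channel[i][j] = activation at row i, column j


def global_average_pool_2d(channel: Channel) -> float:
    """Mean over all spatial positions of a single channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def place_object(size: int, row: int, col: int) -> Channel:
    """Toy activation map: 1.0 at one location, 0.0 everywhere else."""
    channel = [[0.0] * size for _ in range(size)]
    channel[row][col] = 1.0
    return channel


if __name__ == "__main__":
    top_left = place_object(4, 0, 0)
    bottom_right = place_object(4, 3, 3)
    # Different positions, identical pooled value (1/16): location is lost.
    print(global_average_pool_2d(top_left) == global_average_pool_2d(bottom_right))  # True
```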
