[Paper note] RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

chn13

Category: paper-note. Tags: deep learning, CNN, semantic segmentation, parsing

Article link: https://blog.csdn.net/chn13/article/details/53838213

Convolution and pooling cause the loss of finer image structure, producing low-resolution segmentation.
- DeepLab solves this with atrous (or dilated) convolutions, which enlarge the receptive field without downscaling the image.
- FCN and Hypercolumns exploit features from intermediate layers to generate high-resolution predictions.
Features from all levels help to generate the semantic segmentation.
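
To make the trade-off above concrete, here is a minimal sketch of the standard receptive-field arithmetic for a stack of layers. The two layer stacks are hypothetical toy examples, not the actual DeepLab or ResNet configurations.

```python
def receptive_field(layers):
    """Return (receptive field, output stride) for a stack of layers.

    Each layer is a tuple (kernel_size, stride, dilation).
    """
    rf, jump = 1, 1  # jump = spacing, in input pixels, between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stack A: three 3x3 convs, each followed by 2x2 stride-2 pooling.
pooled = [(3, 1, 1), (2, 2, 1)] * 3
# Hypothetical stack B: three 3x3 convs with dilation 1, 2, 4 and no pooling.
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(pooled))   # (22, 8): large context, but 8x downscaled
print(receptive_field(dilated))  # (15, 1): context grows, resolution is kept
```

The output stride of 1 in the dilated stack is why it keeps the full resolution, at the cost of computing every layer on full-size feature maps.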

Comparison with standard CNN and dilated convolutions
Single RefineNet structure
Purpose of different components in RefineNet
- RCU (residual conv unit): fine-tunes the feature maps for the fusion task
- Multi-resolution fusion: rescales all inputs to the same resolution and sums them
- Chained residual pooling: captures background context from a large image region
- Output convolutions: another RCU. There are 3 RCUs between two RefineNet blocks (2 on the input path plus 1 at the output). The whole net also adds 2 RCUs before the dense softmax classifier.
Residual connections: within each RefineNet unit and between RefineNet blocks
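
The fusion and chained residual pooling steps can be sketched on single-channel 2-D maps as below. This is a toy illustration under loud assumptions: nearest-neighbour upsampling, map sizes that are integer multiples of each other, and a scalar `conv_weight` standing in for the learned convolutions (all hypothetical names, not the authors' implementation).

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map given as a list of rows."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]


def fuse(maps):
    """Multi-resolution fusion: bring every map to the largest size, then sum."""
    target = max(len(m) for m in maps)
    resized = [upsample_nearest(m, target // len(m)) for m in maps]
    h, w = len(resized[0]), len(resized[0][0])
    return [[sum(m[i][j] for m in resized) for j in range(w)] for i in range(h)]


def max_pool_same(fmap, size=5):
    """Stride-1 max pooling that keeps the map size (window clipped at borders)."""
    h, w, r = len(fmap), len(fmap[0]), size // 2
    return [[max(fmap[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def chained_residual_pooling(fmap, n_blocks=2, conv_weight=0.5):
    """Chain pooling blocks and add every block's output to a running sum."""
    x = [[max(0.0, v) for v in row] for row in fmap]  # initial ReLU
    out = [row[:] for row in x]                       # residual path
    for _ in range(n_blocks):
        x = max_pool_same(x)                               # pool previous block
        x = [[conv_weight * v for v in row] for row in x]  # stand-in for the conv
        out = [[a + b for a, b in zip(ro, rx)] for ro, rx in zip(out, x)]
    return out


fine = [[1.0] * 4 for _ in range(4)]
coarse = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse([fine, coarse])

peak = [[0.0] * 7 for _ in range(7)]
peak[3][3] = 1.0
context = chained_residual_pooling(peak)
print(fused[0], context[0][0], context[3][3])
```

Because each pooling block works on the output of the previous one, the single peak in `peak` reaches the corners of the 7x7 map after two blocks, which is the sense in which the chain gathers context from a large region using only small pooling windows.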

Datasets
- Person-Part: 1717 training and 1818 testing images, human parsing
- NYUDv2: 795 training and 654 testing images, RGB-D images of indoor scenes
- PASCAL VOC 2012: training/validation/test 1464/1449/1456 images (models are usually also trained on MS COCO)
- Cityscapes: training/validation 2975/500 images, 19 classes
- PASCAL-Context: segmentation labels of the whole scene for PASCAL VOC images
- SUN-RGBD: around 10,000 RGB-D indoor images, 37 classes
- ADE20K MIT: 150 classes, more than 20K scene images
Measurement: IoU, pixel accuracy, mean accuracy
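
For reference, the three measures are usually derived from a confusion matrix, as in the sketch below. The function names are hypothetical, and skipping classes that never occur is an assumption; evaluation scripts for specific benchmarks may treat absent classes and ignore labels differently.

```python
def confusion_matrix(pred, label, num_classes):
    """cm[t][p] counts pixels with ground truth t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, label):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (pixel accuracy, mean accuracy, mean IoU) from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[c][c] for c in range(n)]
    gt = [sum(cm[c]) for c in range(n)]                           # labelled c
    pr = [sum(cm[r][c] for r in range(n)) for c in range(n)]      # predicted c

    pixel_acc = sum(tp) / total
    accs = [tp[c] / gt[c] for c in range(n) if gt[c] > 0]
    ious = [tp[c] / (gt[c] + pr[c] - tp[c])
            for c in range(n) if gt[c] + pr[c] - tp[c] > 0]
    return pixel_acc, sum(accs) / len(accs), sum(ious) / len(ious)


# Flattened toy prediction and ground truth with 3 classes.
label = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(segmentation_metrics(confusion_matrix(pred, label, 3)))
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, so the mean IoU (0.5) is noticeably lower than the pixel accuracy (4/6), since IoU also penalizes false positives.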
Augmentation: random scaling, random cropping and horizontal flipping
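
A minimal sketch of how these three augmentations can be applied jointly, so that the image and its label map stay aligned. The scale set, the crop size, and nearest-neighbour resizing of the image are assumptions made for illustration (the note does not give the values used in the paper, and real pipelines typically interpolate the image bilinearly while keeping nearest-neighbour for labels).

```python
import random


def resize_nearest(grid, new_h, new_w):
    """Nearest-neighbour resize of a 2-D grid given as a list of rows."""
    h, w = len(grid), len(grid[0])
    return [[grid[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def augment(image, label, crop, scales=(0.75, 1.0, 1.25), rng=random):
    """Apply the same random scale, crop and flip to an image and its labels."""
    h, w = len(image), len(image[0])

    # Random scaling (never below the crop size, so the crop always fits).
    s = rng.choice(scales)
    new_h, new_w = max(crop, round(h * s)), max(crop, round(w * s))
    image = resize_nearest(image, new_h, new_w)
    label = resize_nearest(label, new_h, new_w)

    # Random cropping with one shared window.
    top = rng.randint(0, new_h - crop)
    left = rng.randint(0, new_w - crop)
    image = [row[left:left + crop] for row in image[top:top + crop]]
    label = [row[left:left + crop] for row in label[top:top + crop]]

    # Horizontal flipping with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    return image, label


image = [[r * 8 + c for c in range(8)] for r in range(8)]
label = [row[:] for row in image]  # identical grids make alignment easy to check
aug_image, aug_label = augment(image, label, crop=4, rng=random.Random(0))
print(aug_image == aug_label, len(aug_image), len(aug_image[0]))
```

Drawing the scale, crop window, and flip decision once and reusing them for both inputs is the important part; sampling them separately would silently misalign pixels and labels.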
The result on PASCAL VOC 2012 is not better than PSPNet (see the following paper note)
Model variants
- The 4-cascaded variant performs best but needs more time
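
The wiring of the 4-cascaded variant can be sketched as resolution bookkeeping. This assumes a ResNet-style backbone whose four stages produce feature maps at 1/4, 1/8, 1/16 and 1/32 of the input size; `cascade_plan` is a hypothetical helper, and only map sizes are tracked, not the actual layers.

```python
def cascade_plan(input_size, strides=(4, 8, 16, 32)):
    """For each RefineNet block, coarse to fine: (name, input sizes, output size)."""
    sizes = [input_size // s for s in strides]  # backbone feature-map sizes
    plan, previous = [], None
    for level in range(len(sizes), 0, -1):      # RefineNet-4 down to RefineNet-1
        inputs = [sizes[level - 1]]
        if previous is not None:
            inputs.append(previous)             # output of the coarser block
        previous = max(inputs)                  # fusion upsamples to the largest
        plan.append((f"RefineNet-{level}", inputs, previous))
    return plan


for name, inputs, output in cascade_plan(512):
    print(name, inputs, "->", output)
```

Every block after the first fuses one backbone map with the output of the previous, coarser block, so four blocks mean four rounds of RCUs, fusion and chained residual pooling. That is consistent with the extra run time noted above.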