Referring Image Segmentation via Recurrent Refinement Networks
略读 motivation
This work segments image object from natural language descriptions, which requires combining natural language processing with visual understanding technology. The researchers aim to generate high-quality segmentation masks by incorporating multi-scale information to refine segmentation results.
Temporal Deformable Residual Networks for Action Segmentation in Videos
略读,motivation
This work tackles action segmentation in videos. It proposes a temporal deformable residual network (TDRN) to analyse video intervals at multiple temporal scales. TDRN consists of two temporal streams: i) Residual stream that analyses video information at its full temporal resolution. ii) Pooling/Unpooling stream that captures long-range video information at different scale. The former facilitates local, fine-scale action segmentation, while the latter use multiscale context for improving frame classification accuracy.
启发
- temporal convolution是什么?与3d-convolution相同吗?用起来效果怎么样?
Context Encoding for Semantic Segmentation
Research Background
Given an image, semantic segmentation can assigns pre-pixel predictions of object categories, which provides comprehensive sense description including the information of object category, location and shape. The FCN based frameworks adopt several strategies, i.e., dilated convolution and multi-resolution pyramid-based feature representation, to enlarge the reception field so as to alleviate spatial resolution loss. However, contextual information is not well-explored.
Motivation and Proposed approach
This work believes that being aware of sense context and narrow the list of probable categories can make semantic segmentation much easier. The classic computer vision approaches encode global context information by capturing feature statistics. This work extend the Encoding Layer (former works) to capture global feature statistics for understanding semantic context.
优点与启发
- 借鉴texture classification的工作,对于semantic segmentation学习context embedding。相比于之前的context模型,本研究似乎初次做到了context的感知。想清楚什么办法可以解决问题,先不用在意实现的难度,然后从本领域或其它领域找可以借鉴的工作,实现想法
- 本文提出的semantic encoding loss对于大物体和小物体贡献相同,可以参考类似的想法提升video object segmentation任务中小物体的性能
潜在的不足
- 如果某一类出现,将其对应的feature map highlight,则这一层channel数目等于 类别数目?
- 感觉context embedding 可以做得更深入