Semantic-aware Scene Recognition: Reading Notes
This paper was published in Pattern Recognition in 2020; its authors are researchers from Spain.
Abstract:
Main problem in scene classification:
Semantic ambiguity (语义歧义性): images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different.
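- Different scene classes can share similar objects, which makes them semantically ambiguous;
- The larger the variation within a single scene class, the more pronounced this problem becomes.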
Main contributions of the paper:
An end-to-end multi-modal CNN is proposed, which combines image and context information by an attention module.
Context information, in the shape of a semantic segmentation, is used to gate RGB-features by leveraging on information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations.
This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards them.
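- Image and context information are combined through an attention module.
- Context information is used to gate the RGB features by leveraging the information encoded in the semantic representation (the set of scene objects and stuff, and their relative locations).
- This gating process refocuses the receptive fields of the CNN on the more discriminative regions, which reinforces the learning of indicative scene content and improves scene disambiguation.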
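The gating idea in the bullets above can be illustrated with a small NumPy sketch. This is only an assumption-laden toy, not the paper's actual attention module: it assumes the RGB and semantic branches already produce feature maps of the same shape (C, H, W), and it uses a plain sigmoid over the semantic features as the attention map. The names `gate_rgb_features`, `rgb_feat`, and `sem_feat` are hypothetical and chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gate_rgb_features(rgb_feat, sem_feat):
    """Gate RGB features with an attention map derived from semantic features.

    Both inputs are assumed to have shape (C, H, W). The semantic features
    are squashed into (0, 1) and multiplied element-wise with the RGB
    features, so locations the semantic branch scores low are attenuated.
    """
    if rgb_feat.shape != sem_feat.shape:
        raise ValueError("RGB and semantic feature maps must share a shape")
    attention = sigmoid(sem_feat)   # attention weights in (0, 1)
    return rgb_feat * attention     # element-wise (Hadamard) gating


# Toy feature maps standing in for the outputs of the two branches.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((4, 3, 3))
sem_feat = rng.standard_normal((4, 3, 3))

gated = gate_rgb_features(rgb_feat, sem_feat)
print(gated.shape)
```

Because the attention weights lie strictly between 0 and 1, the gate can only attenuate RGB activations, never amplify them; in this sketch a semantic feature of zero passes exactly half of the RGB activation through.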
Problem, Challenge, Motivation, and Contribution
Problem: The complexity of the scene recognition task lies partially on the ambiguity between different scene categories showing similar appearances and objects’ distributions: inter-scene boundaries can be blurry, as the set