Semantic-aware Scene Recognition: Reading Notes

maths_girl

Category: Scene classification

Article link: https://blog.csdn.net/maths_girl/article/details/113240925

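This post introduces a paper published in Pattern Recognition in 2020 that addresses semantic ambiguity in scene classification. The paper proposes an end-to-end multi-modal CNN architecture that combines image and semantic segmentation information and uses an attention module to reinforce the learned scene content, improving scene recognition accuracy. It achieves state-of-the-art results on the MIT Indoor 67, SUN 397, and Places365 datasets.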

Summary generated automatically by CSDN.

The paper was published in Pattern Recognition in 2020; its authors are researchers based in Spain.

Abstract:

Main problem in scene classification:
Semantic ambiguity: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different.

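Different scene categories share similar objects, which makes them semantically ambiguous.
The larger the variation within a single category, the more pronounced this problem becomes.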

Main contributions of the paper:
An end-to-end multi-modal CNN is proposed, which combines image and context information by an attention module.
Context information, in the shape of a semantic segmentation, is used to gate RGB-features by leveraging on information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations.
This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards them.

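Image and context information are combined through an attention module.
Context information gates the RGB features by exploiting the information encoded in the semantic representation (the set of scene objects and their relative locations).
By refocusing the CNN's receptive fields on more discriminative regions, this gating process reinforces the learning of indicative scene content and improves scene disambiguation.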
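
The gating idea in the notes above can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the paper's architecture: the function name gate_rgb_features, the use of a sigmoid to turn semantic scores into attention weights, and the tiny hand-written feature maps are all hypothetical choices made for this example. It assumes the semantic branch yields one score per spatial location and that each RGB feature vector is scaled by the resulting weight.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_rgb_features(rgb_features, semantic_scores):
    """Scale RGB features by an attention map derived from semantic scores.

    rgb_features:    H x W x C nested list (hypothetical RGB-branch output).
    semantic_scores: H x W nested list (hypothetical semantic-branch output).
    Every channel at location (i, j) is multiplied by
    sigmoid(semantic_scores[i][j]), an assumed choice of attention weight.
    """
    if len(rgb_features) != len(semantic_scores):
        raise ValueError("height mismatch between the two branches")
    gated = []
    for rgb_row, sem_row in zip(rgb_features, semantic_scores):
        if len(rgb_row) != len(sem_row):
            raise ValueError("width mismatch between the two branches")
        gated.append(
            [[sigmoid(score) * value for value in pixel]
             for pixel, score in zip(rgb_row, sem_row)]
        )
    return gated


# Toy 1 x 2 feature map with 2 channels and identical RGB features per location.
rgb = [[[1.0, 2.0], [1.0, 2.0]]]
# The semantic branch marks the first location as indicative, the second as not.
sem = [[4.0, -4.0]]
gated = gate_rgb_features(rgb, sem)
```

In this toy input both locations carry the same RGB features, so any difference in the gated output comes only from the semantic scores: the first location passes through almost unchanged, while the second is strongly suppressed.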

Problem, Challenge, Motivation, and Contribution

Problem: The complexity of the scene recognition task lies partially on the ambiguity between different scene categories showing similar appearances and objects’ distributions: inter-scene boundaries can be blurry, as the set