Spatial Attention Mechanism: GMA of DPANet

计算机视觉-Archer

Category column: Salient/Camouflaged Object Detection (SOD/COD), Summary of Top-Conference Algorithms

Article link: https://blog.csdn.net/zjc910997316/article/details/118768758

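Summary (auto-generated by CSDN): To address the complementarity and inconsistency of cross-modal RGB-D data, the paper designs a GMA module that uses an attention mechanism to select important features and to prevent contamination from unreliable depth maps. Spatial attention reduces the redundancy of single-modal features and strengthens the feature response in salient regions. Two symmetric attention sub-modules capture long-range dependencies across modalities. The method aims to improve the accuracy of RGB-D saliency detection.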

Because cross-modal RGB-D data are both complementary and inconsistent, directly integrating the cross-modal information may produce negative results, such as contamination from unreliable depth maps.
Besides, the features of a single modality are usually rich in spatial or channel information, but they also contain redundancy.
To cope with these issues, we design a GMA module that exploits the attention mechanism to automatically select and strengthen important features for saliency detection, and we incorporate a gate controller into the GMA module to prevent contamination from unreliable depth maps.

To reduce the redundancy of single-modal features and highlight the feature response in the salient regions, we apply spatial attention (see ‘S’ in Fig. 3) to the input features $rb_i$ and $db_i$, respectively.
The process can be described as:

where $f_{in}$ represents the input feature of the RGB branch or depth branch (i.e., $rb_i$ or $db_i$),
$conv_i$ ($i = 1, 2$) refers to the convolution layers,
⊙ denotes element-wise multiplication,
δ is the ReLU activation function,
and $f_{out}$ represents the modified RGB/depth feature (i.e., $\widetilde{rb}_i$ or $\widetilde{db}_i$).
The channels of the modified features $\widetilde{rb}_i$ and $\widetilde{db}_i$ are unified to 256 at each stage.
Note that the weights are not shared between the RGB and depth branches in our model.
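
The equation that the symbol definitions above refer to is not reproduced in this copy. The NumPy sketch below shows one form that is consistent with those definitions, under explicit assumptions: conv1 maps the input to 256 channels, conv2 produces a weight map W and a bias map B (twice as many channels, split in half), and the output is δ(W ⊙ f + B). The 1 × 1 kernel size, the W/B split, and the names `conv1x1`, `init_params`, and `spatial_attention` are illustrative choices, not taken from the paper.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def relu(x):
    """ReLU activation (the delta in the text)."""
    return np.maximum(x, 0.0)


def init_params(rng, c_in, c_out=256):
    """Random parameters for one branch (hypothetical helper)."""
    return {
        "w1": rng.standard_normal((c_out, c_in)) * 0.01,
        "b1": np.zeros(c_out),
        # Assumption: conv2 emits 2 * c_out channels, later split into W and B.
        "w2": rng.standard_normal((2 * c_out, c_out)) * 0.01,
        "b2": np.zeros(2 * c_out),
    }


def spatial_attention(f_in, params):
    """Assumed form: f = conv1(f_in); (W, B) = conv2(f); f_out = relu(W * f + B)."""
    f = conv1x1(f_in, params["w1"], params["b1"])
    wb = conv1x1(f, params["w2"], params["b2"])
    c = f.shape[0]
    weight_map, bias_map = wb[:c], wb[c:]
    return relu(weight_map * f + bias_map)  # element-wise product, then ReLU


rng = np.random.default_rng(0)
rb_i = rng.standard_normal((512, 8, 8))  # RGB feature at stage i (toy size)
db_i = rng.standard_normal((512, 8, 8))  # depth feature at stage i (toy size)

# Separate parameter sets: weights are not shared between the two branches.
rgb_params = init_params(rng, c_in=512)
depth_params = init_params(rng, c_in=512)

rb_tilde = spatial_attention(rb_i, rgb_params)
db_tilde = spatial_attention(db_i, depth_params)
print(rb_tilde.shape, db_tilde.shape)
```

Both outputs have 256 channels regardless of the input channel count, which matches the statement that the modified features are unified to 256 channels at each stage.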

Further, inspired by the success of self-attention [54], [55], we design two symmetrical attention sub-modules to capture long-range dependencies from a cross-modal perspective. Taking $A_{dr}$ in Fig. 3 as an example, $A_{dr}$ exploits the depth information to generate a spatial weight for the RGB feature $\widetilde{rb}_i$, as depth cues can usually provide helpful information (e.g., the coarse location of salient objects) for the RGB branch. Technically, we first apply a 1 × 1 convolution to project $\widetilde{db}_i$ into $W_q \in \mathbb{R}^{C_1 \times (HW)}$ and $W_k \in \mathbb{R}^{C_1 \times (HW)}$, and to project $\widetilde{rb}_i$ into $W_v \in \mathbb{R}^{C \times (HW)}$, where $C$, $H$, and $W$ refer to the channel, height, and width of the feature $W_v$, respectively, and $C_1$ is set to 1/8 of $C$ for computational efficiency. We compute the enhanced feature as follows:
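
The enhanced-feature equation is likewise missing from this copy. As a rough illustration only, the sketch below assumes the standard self-attention form: an affinity matrix $W_q^{\top} W_k$ computed from the depth feature, a softmax over positions, and a weighted sum of the RGB values $W_v$. The softmax axis, the omission of convolution biases, and the function name `cross_modal_attention` are assumptions. How the gate controller scales this result is not covered here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rb_tilde, db_tilde, wq, wk, wv):
    """Depth-guided attention over RGB features (assumed standard form).

    rb_tilde, db_tilde: (C, H, W) features after spatial attention.
    wq, wk: (C1, C) 1x1-conv weights applied to the depth feature.
    wv:     (C, C)  1x1-conv weights applied to the RGB feature.
    """
    channels, height, width = rb_tilde.shape
    d = db_tilde.reshape(channels, height * width)
    r = rb_tilde.reshape(channels, height * width)

    Wq = wq @ d  # (C1, HW), from depth
    Wk = wk @ d  # (C1, HW), from depth
    Wv = wv @ r  # (C,  HW), from RGB

    # Affinity between every pair of positions, normalized per query position.
    attn = softmax(Wq.T @ Wk, axis=-1)  # (HW, HW), each row sums to 1

    # out[:, n] = sum_m attn[n, m] * Wv[:, m]
    out = Wv @ attn.T
    return out.reshape(channels, height, width), attn


rng = np.random.default_rng(0)
C, H, W = 64, 6, 6   # toy sizes; the text uses C = 256
C1 = C // 8          # C1 is 1/8 of C, as stated in the text

rb_tilde = rng.standard_normal((C, H, W))
db_tilde = rng.standard_normal((C, H, W))
wq = rng.standard_normal((C1, C)) * 0.1
wk = rng.standard_normal((C1, C)) * 0.1
wv = rng.standard_normal((C, C)) * 0.1

f_dr, attn = cross_modal_attention(rb_tilde, db_tilde, wq, wk, wv)
print(f_dr.shape, attn.shape)
```

Projecting the query and key to $C_1 = C/8$ channels shrinks the two matrix products that build the (HW) × (HW) affinity matrix, which is where the stated computational saving comes from.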