Spatial Attention Mechanism: GMA of DPANet

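This paper addresses the complementarity and inconsistency of cross-modal RGB-D data by designing a GMA module that uses an attention mechanism to select important features and prevent contamination from depth maps. Spatial attention reduces the redundancy of single-modal features and strengthens the feature response on salient regions. Two symmetrical attention sub-modules capture long-range cross-modal dependencies. The method aims to improve the accuracy of RGB-D saliency detection.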

Because cross-modal RGB-D data are both complementary and inconsistent, directly integrating the cross-modal information may produce negative results, such as contamination from unreliable depth maps.
Besides, single-modality features are usually rich in the spatial or channel dimension, but they also contain redundant information.
To cope with these issues, we design a GMA module that exploits the attention mechanism to automatically select and strengthen important features for saliency detection, and we incorporate a gate controller into the GMA module to prevent contamination from unreliable depth maps.

To reduce the redundancy of single-modal features and highlight the feature response on the salient regions, we apply spatial attention (see ‘S’ in Fig. 3) to the input features $rb_i$ and $db_i$, respectively.
The process can be described as:


where $f_{in}$ represents the input feature of the RGB branch or depth branch (i.e., $rb_i$ or $db_i$),
$conv_i$ ($i = 1, 2$) refers to the convolution layer,
⊙ denotes element-wise multiplication,
δ is the ReLU activation function,
and $f_{out}$ represents the modified RGB/depth feature (i.e., $\widetilde{rb}_i$ or $\widetilde{db}_i$).
The channels of the modified features $\widetilde{rb}_i$ and $\widetilde{db}_i$ are unified into 256 dimensions at each stage.
Note that the weights are not shared between the RGB and depth branches in our model.
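
The equation for this step is not reproduced in the text above, so the following is only a minimal NumPy sketch of one plausible reading of the symbols just defined. It assumes that conv_1 maps the input to the unified channel width, that conv_2 produces a per-pixel weight and a per-pixel bias, and that the output is ReLU(weight ⊙ feature + bias). The 1×1 kernels, the toy sizes, and the helper names `conv1x1`, `init_params`, and `spatial_attention` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """Pointwise (1x1) convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, c_in, c_out):
    """Hypothetical parameter set for one branch: conv_1 and conv_2."""
    w1 = rng.standard_normal((c_out, c_in)) * 0.1
    b1 = np.zeros(c_out)
    # conv_2 emits 2 * c_out channels: one half is the weight map, the other the bias map.
    w2 = rng.standard_normal((2 * c_out, c_out)) * 0.1
    b2 = np.zeros(2 * c_out)
    return w1, b1, w2, b2

def spatial_attention(f_in, params):
    """Assumed form: f = conv_1(f_in); (scale, shift) = split(conv_2(f)); out = ReLU(scale * f + shift)."""
    w1, b1, w2, b2 = params
    f = conv1x1(f_in, w1, b1)                               # (C_out, H, W)
    scale, shift = np.split(conv1x1(f, w2, b2), 2, axis=0)  # two (C_out, H, W) maps
    return relu(scale * f + shift)                          # element-wise modulation

rng = np.random.default_rng(0)
c_in, c_out, h, w = 8, 4, 5, 6    # toy sizes; the text unifies channels to 256
rb_i = rng.standard_normal((c_in, h, w))
db_i = rng.standard_normal((c_in, h, w))

# Separate parameter sets: the weights are not shared between the two branches.
params_rgb = init_params(rng, c_in, c_out)
params_depth = init_params(rng, c_in, c_out)

rb_tilde = spatial_attention(rb_i, params_rgb)
db_tilde = spatial_attention(db_i, params_depth)
print(rb_tilde.shape, db_tilde.shape)
```

Both branches call the same function with different parameters, which mirrors the statement that the RGB and depth weights are independent.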

Further, inspired by the success of self-attention [54], [55], we design two symmetrical attention sub-modules to capture long-range dependencies from a cross-modal perspective. Taking $A_{dr}$ in Fig. 3 as an example, $A_{dr}$ exploits the depth information to generate a spatial weight for the RGB feature $rb_i$, as depth cues usually can provide helpful information (e.g., the coarse location of salient objects) for the RGB branch. Technically, we first apply a 1 × 1 convolution operation to project $\widetilde{db}_i$ into $W_q \in \mathbb{R}^{C_1 \times (HW)}$ and $W_k \in \mathbb{R}^{C_1 \times (HW)}$, and to project $\widetilde{rb}_i$ into $W_v \in \mathbb{R}^{C \times (HW)}$, where $C$, $H$, $W$ refer to the channel, height, and width of the feature $W_v$, respectively, and $C_1$ is set to 1/8 of $C$ for computation efficiency. We compute the enhanced feature as follows:
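
The equation for the enhanced feature is not reproduced in this text. As a rough illustration of how the three projections could be combined, the sketch below follows the common self-attention pattern: an HW × HW attention map is computed from the depth-derived query and key, normalized with softmax, and used to re-weight the RGB-derived value. The softmax axis, the bias-free 1 × 1 projections, and the names `cross_modal_attention`, `wq`, `wk`, and `wv` are assumptions made for this example, not details confirmed by the source.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, depth, wq, wk, wv):
    """Depth-guided attention over an RGB feature (assumed form).

    rgb, depth: (C, H, W) feature maps.
    wq, wk: (C1, C) weights of 1x1 projections applied to the depth feature.
    wv: (C, C) weights of a 1x1 projection applied to the RGB feature.
    """
    c, h, w = rgb.shape
    d = depth.reshape(c, h * w)
    r = rgb.reshape(c, h * w)
    q = wq @ d                          # W_q: (C1, HW)
    k = wk @ d                          # W_k: (C1, HW)
    v = wv @ r                          # W_v: (C, HW)
    attn = softmax(q.T @ k, axis=-1)    # (HW, HW); each row sums to 1
    out = v @ attn.T                    # column i = sum_j attn[i, j] * v[:, j]
    return out.reshape(c, h, w), attn

rng = np.random.default_rng(0)
c, h, w = 16, 4, 3
c1 = c // 8                             # C1 is set to 1/8 of C
rgb = rng.standard_normal((c, h, w))
depth = rng.standard_normal((c, h, w))
wq = rng.standard_normal((c1, c)) * 0.1
wk = rng.standard_normal((c1, c)) * 0.1
wv = rng.standard_normal((c, c)) * 0.1

enhanced, attn = cross_modal_attention(rgb, depth, wq, wk, wv)
print(enhanced.shape, attn.shape)
```

Reducing the query and key to C1 channels keeps the cost of forming the HW × HW map low, while the value keeps all C channels so the output has the same shape as the RGB input.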

 
