(2021 ICCV) Specificity-preserving RGB-D Saliency Detection (CCF-A)

# 1. Authors

Tao Zhou, Huazhu Fu, Geng Chen, Yi Zhou, Deng-Ping Fan, Ling Shao

# 2. Links

2.1 Paper

ICCV link

2.2 Code

Code link

# 3. Abstract

​ RGB-D saliency detection has attracted increasing attention, due to its effectiveness and the fact that depth cues can now be conveniently captured. Existing works often focus on learning a shared representation through various fusion strategies, with few methods explicitly considering how to preserve modality-specific characteristics. In this paper, taking a new perspective, we propose a specificity preserving network (SP-Net) for RGB-D saliency detection, which benefits saliency detection performance by exploring both the shared information and modality-specific properties (e.g., specificity). Specifically, two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps. A cross enhanced integration module (CIM) is proposed to fuse cross-modal features in the shared learning network, which are then propagated to the next layer for integrating cross level information. Besides, we propose a multi-modal feature aggregation (MFA) module to integrate the modality specific features from each individual decoder into the shared decoder, which can provide rich complementary multi-modal information to boost the saliency detection performance. Further, a skip connection is used to combine hierarchical features between the encoder and decoder layers. Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.

# 4. Main Content

4.1 Contributions

• We propose a novel specificity-preserving network for RGB-D saliency detection (SP-Net), which can explore the shared information as well as preserve modality-specific characteristics.

• We propose a cross-enhanced integration module (CIM) to fuse the cross-modal features and learn shared representations for the two modalities. The output of each CIM is then propagated to the next layer to capture cross-level information.

• We propose a simple but effective multi-modal feature aggregation (MFA) module to integrate these learned modality-specific features. It makes full use of the features learned in the modality-specific decoder to boost the saliency detection performance.

• Extensive experiments on six public datasets demonstrate the superiority of our model over thirty benchmarking methods. Moreover, we carry out an attribute-based evaluation to study the performance of many state-of-the-art RGB-D saliency detection methods under different challenging factors (e.g., number of salient objects, indoor or outdoor environments, and light conditions), which has not been done previously by existing studies.

4.2 Network Architecture

4.2.1 Overall Architecture (Res2Net-50 backbone)

[Figure: SP-Net overall architecture]

​ The overall architecture of the proposed SP-Net. Our model consists of two modality-specific learning networks and a shared learning network. The modality-specific learning networks are used to preserve the individual properties for each modality (i.e., RGB or depth), while the shared network is used to fuse cross-modal features and explore their complementary information. Skip connections are adopted to combine hierarchical features between the encoder and decoder layers. The learned features from the modality-specific decoder are integrated into the shared decoder to provide rich multi-modal complementary information for boosting saliency detection performance. Here, “C” denotes feature concatenation.
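The dual-stream design described above can be mirrored in a minimal, runnable sketch. Everything below is a stand-in: `encode`, `cim`, and `decode` are toy functions that only reproduce the data flow (two modality-specific streams, CIM-style fusion at each level, skip connections, and MFA-style aggregation into the shared decoder), not the learned Res2Net-50 modules of the paper.

```python
import numpy as np

def encode(x):
    """Toy encoder stand-in (the paper uses a Res2Net-50 backbone):
    returns 4 hierarchical feature levels at decreasing resolution."""
    feats, f = [], x
    for _ in range(4):
        f = f[::2, ::2] * 0.5   # crude downsample stand-in
        feats.append(f)
    return feats

def cim(f_rgb, f_d):
    """Stand-in for the cross-enhanced integration module: cross-modal
    enhancement plus residuals (the real module is learned; this only
    mirrors the fusion step in the data flow)."""
    return f_rgb * f_d + f_rgb + f_d

def decode(feats, extra=None):
    """Toy decoder with skip connections; `extra` plays the role of the
    MFA features injected from the modality-specific decoders."""
    out = feats[-1]
    for i in range(len(feats) - 2, -1, -1):
        out = np.kron(out, np.ones((2, 2)))  # upsample by 2
        out = out + feats[i]                 # skip connection from encoder
        if extra is not None:
            out = out + extra[i]             # MFA-style aggregation
    return out

rgb = np.random.rand(16, 16)
depth = np.random.rand(16, 16)

f_rgb, f_d = encode(rgb), encode(depth)
f_sh = [cim(a, b) for a, b in zip(f_rgb, f_d)]   # shared-learning stream

s_rgb = decode(f_rgb)   # modality-specific saliency map (RGB)
s_d = decode(f_d)       # modality-specific saliency map (depth)
s_sh = decode(f_sh, extra=[a + b for a, b in zip(f_rgb, f_d)])  # shared map
```

The point of the sketch is the topology: the two modality-specific decoders produce their own maps, while the shared decoder consumes CIM-fused features plus aggregated modality features, matching the "C" (concatenation) step in the figure in spirit.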

4.2.2 CIM (to learn shared feature representation)

[Figure: CIM structure]

4.2.3 MFA (to integrate features into the shared decoder)

[Figure: MFA structure]

4.3 Loss Function

$$L_{total}=L_{sh}(S_{sh},G)+L_{sp}(S_R,G)+L_{sp}(S_D,G)$$

where $L_{sp}$ and $L_{sh}$ are the losses of the modality-specific decoders and the shared decoder, $S_R$ and $S_D$ are the prediction maps obtained from the RGB and depth images, $S_{sh}$ is the prediction map obtained from their shared representation, and $G$ is the ground-truth map.
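A minimal sketch of the total loss, assuming binary cross-entropy for both $L_{sh}$ and $L_{sp}$ (a common choice for saliency maps; the paper's exact per-term loss may differ). Maps are flattened to lists of pixel probabilities for simplicity.

```python
import math

def bce(pred, gt, eps=1e-7):
    """Pixel-averaged binary cross-entropy between a predicted saliency
    map and its ground truth (both flattened to lists in [0, 1])."""
    return -sum(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps)
                for p, g in zip(pred, gt)) / len(pred)

def total_loss(s_sh, s_r, s_d, g):
    """L_total = L_sh(S_sh, G) + L_sp(S_R, G) + L_sp(S_D, G)."""
    return bce(s_sh, g) + bce(s_r, g) + bce(s_d, g)

g = [1.0, 0.0, 1.0, 0.0]      # toy ground-truth map (flattened)
good = [0.9, 0.1, 0.9, 0.1]   # confident, correct predictions
bad = [0.5, 0.5, 0.5, 0.5]    # uninformative predictions
```

Because all three terms supervise against the same ground truth $G$, both modality-specific branches and the shared branch are pushed toward the saliency target independently, which is what lets the shared decoder later borrow their features via MFA.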

# 5. Evaluation Metrics

S-measure ($S_\alpha$), E-measure ($E_\phi$), F-measure ($F_\beta$), and MAE.

$$S_\alpha = \alpha \cdot S_o + (1-\alpha) \cdot S_r$$

where $S_o$ is the object-aware similarity and $S_r$ is the region-aware similarity.

$$E_\phi=\frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\phi_{FM}(i,j)$$

where $\phi_{FM}$ is the enhanced alignment matrix, and $W$ and $H$ are the width and height of the map.

$$F_\beta = (1+\beta^2)\frac{P \cdot R}{\beta^2 P + R}$$

where $P$ is precision and $R$ is recall, with $\beta^2$ commonly set to 0.3.
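The metrics above translate directly into code. The helpers below are illustrative only: they operate on precomputed scalars or small lists ($S_o$, $S_r$, the alignment matrix $\phi_{FM}$, precision/recall), not on raw saliency maps, so they are not the full benchmark implementations.

```python
def s_measure(s_o, s_r, alpha=0.5):
    """S_alpha: weighted sum of object-aware (s_o) and region-aware (s_r)
    structural similarity."""
    return alpha * s_o + (1 - alpha) * s_r

def e_measure(phi):
    """E_phi: mean of the W x H enhanced-alignment matrix phi."""
    return sum(sum(row) for row in phi) / (len(phi) * len(phi[0]))

def f_measure(p, r, beta2=0.3):
    """F_beta from precision p and recall r; beta^2 = 0.3 is the
    conventional choice in the saliency literature."""
    return (1 + beta2) * p * r / (beta2 * p + r)

def mae(pred, gt):
    """Mean absolute error between a (flattened) saliency map and its
    ground truth."""
    return sum(abs(a - b) for a, b in zip(pred, gt)) / len(pred)
```

Note the conventions: higher is better for $S_\alpha$, $E_\phi$, and $F_\beta$, while lower is better for MAE.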

# 6. Conclusion

In this paper, we have proposed a novel SP-Net for RGB-D saliency detection. Different from most existing works, which mainly focus on learning shared representations, our model not only explores the shared cross-modal information but also compensates for modality-specific characteristics to boost the saliency detection performance. Besides, the proposed CIM can propagate information across modalities and layers, while our MFA module can provide specific properties to the shared decoder to enhance the complementary multi-modal information. Quantitative and qualitative evaluations conducted on six challenging benchmark datasets demonstrate the superiority of our SP-Net over other existing RGB-D saliency detection approaches. In the future, we can apply our model to the light field saliency detection task.
