(2021 ICCV) Disentangled High Quality Salient Object Detection (CCF-A)

1. Authors

Lv Tang, Bo Li, Yijie Zhong, Shouhong Ding, Mofei Song (Tencent Youtu Lab)

2. Links

2.1 Paper

ICCV link

2.2 Code

Code link

3. Abstract

Aiming at discovering and locating the most distinctive objects in visual scenes, salient object detection (SOD) plays an essential role in various computer vision systems. Coming to the era of high resolution, SOD methods face new challenges. The major limitation of previous methods is that they try to identify the salient regions and estimate accurate object boundaries simultaneously with a single regression task at low resolution. This practice ignores the inherent difference between the two difficult problems, resulting in poor detection quality. In this paper, we propose a novel deep learning framework for the high-resolution SOD task, which disentangles the task into a low-resolution saliency classification network (LRSCN) and a high-resolution refinement network (HRRN). As a pixel-wise classification task, LRSCN is designed to capture sufficient semantics at low resolution to identify the definite salient regions (most pixels inside the salient object have the highest saliency value), the background (most pixels in the background regions have the lowest saliency value), and the uncertain image regions (saliency values of pixels at blurry object boundaries fluctuate between 0 and 1). HRRN is a regression task, which aims at accurately refining the saliency value of pixels in the uncertain region to preserve a clear object boundary at high resolution with limited GPU memory. It is worth noting that by introducing uncertainty into the training process, our HRRN can well address the high-resolution refinement task without using any high-resolution training data. Extensive experiments on high-resolution saliency datasets as well as some widely used saliency benchmarks show that the method achieves superior performance compared to state-of-the-art methods.

4. Main Content

4.1 Contributions

  • We provide a new perspective that high-resolution salient object detection should be disentangled into two tasks, and demonstrate that the disentanglement of the two tasks is essential for improving the performance of DNN based SOD models.
  • Motivated by the principle of disentanglement, we propose a novel framework for high-resolution salient object detection, which uses LRSCN to capture sufficient semantics at low-resolution and HRRN for accurate boundary refinement at high-resolution.
  • We make the earliest efforts to introduce the uncertainty into SOD network training, which empowers HRRN to well address the high-resolution refinement task without any high-resolution training datasets.
  • We perform extensive experiments to demonstrate the proposed method refreshes the SOTA performance on high-resolution saliency datasets as well as some widely used saliency benchmarks by a large margin.

4.2 Network Architecture (backbone: VGG-16)

DHQNet

4.3 MECF and AGA

4.3.1 ME (based on the Global Convolutional Network (GCN))

DHQNet_GCN

4.3.2 CF (utilizes a cross-level feature fusion module)

4.3.3 SGA (guarantees the alignment of the trimap and the saliency map)

DHQNet_mecf and aga

4.4 Loss

4.4.1 LRSCN Loss

$T^{gt}$: trimap ground truth

$$T^{gt}(x,y)=\begin{cases}2, & (x,y)\in \text{definite salient}\\ 0, & (x,y)\in \text{definite background}\\ 1, & (x,y)\in \text{uncertain region}\end{cases}$$
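The trimap ground truth is typically derived from the binary GT saliency mask by carving an "uncertain" band around the object boundary via erosion and dilation. A minimal numpy sketch of that construction; the band radius and the square structuring element are illustrative choices, not values from the paper:

```python
import numpy as np

def make_trimap(gt, band=1):
    """Derive a trimap from a binary GT saliency mask.

    Pixels that survive erosion        -> 2 (definite salient),
    pixels outside the dilated mask    -> 0 (definite background),
    the band in between                -> 1 (uncertain region).
    `band` (erosion/dilation radius) is a hypothetical choice.
    """
    gt = gt.astype(bool)
    k = 2 * band + 1
    padded = np.pad(gt, band, mode="edge")
    # sliding-window max (dilation) and min (erosion) with a k x k square
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    dilated = windows.max(axis=(-2, -1))
    eroded = windows.min(axis=(-2, -1))
    trimap = np.full(gt.shape, 1, dtype=np.uint8)   # default: uncertain
    trimap[eroded] = 2                              # definite salient
    trimap[~dilated.astype(bool)] = 0               # definite background
    return trimap
```

With `band=1`, only pixels whose full 3×3 neighborhood is salient get label 2, and only pixels with no salient neighbor get label 0; everything else forms the uncertain boundary band.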

$$L_{trimap}=\frac{1}{N}\sum_i -\log\left(\frac{e^{T_i}}{\sum_j e^{T_j}}\right)$$

$$L_{LRSCN}=L_{saliency}+L_{trimap}$$

where $L_{saliency}$ combines BCE, SSIM, and F-measure losses.
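The trimap term is a standard softmax cross-entropy over the three classes, with $T_i$ the logit of the ground-truth class at pixel $i$. A minimal numpy sketch; the `(H, W, 3)` tensor layout and the class ordering are illustrative assumptions, not the paper's exact shapes:

```python
import numpy as np

def trimap_ce_loss(logits, trimap_gt):
    """Softmax cross-entropy over the 3 trimap classes.

    logits:    (H, W, 3) raw network outputs for {background, uncertain, salient}
    trimap_gt: (H, W) integer labels in {0, 1, 2}
    Mirrors L_trimap = -(1/N) * sum_i log(softmax(T)_{gt_i}).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # pick the log-probability of the ground-truth class at every pixel
    picked = np.take_along_axis(log_softmax, trimap_gt[..., None], axis=-1)
    return -picked.sum() / trimap_gt.size
```

Uniform logits give exactly $\log 3$ per pixel, and confidently correct logits drive the loss toward zero, which is a quick sanity check on the implementation.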

4.4.2 HRRN Loss

The uncertainty loss down-weights the contribution of pixels in the uncertain region, letting the network ignore the effects of noisy data as much as possible.

$$L_1 = \frac{1}{E} \sum_{i\in E}|S_i^H - G_i^H|$$

$E$: number of pixels.

$$L_{uncertainty} = \frac{1}{U} \sum_{i\in U}\frac{\|S_i^H - G_i^H\|^2}{2\sigma_i^2}+ \frac{1}{2}\log\sigma_i^2$$

$U$: total number of pixels in the uncertain region.

$$L_{HRRN} = L_{uncertainty} + L_1$$
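Putting the two terms together, here is a minimal numpy sketch of the HRRN objective. Predicting $\log\sigma^2$ rather than $\sigma$ is a common numerical-stability trick assumed here, not a detail taken from the paper:

```python
import numpy as np

def hrrn_loss(pred, gt, log_var, uncertain_mask):
    """L_HRRN = L_uncertainty + L_1, a sketch of the HRRN objective.

    pred, gt:       (H, W) predicted / ground-truth saliency in [0, 1]
    log_var:        (H, W) predicted log sigma^2 per pixel
    uncertain_mask: (H, W) bool, True inside the uncertain region U
    """
    # L1 over all E pixels
    l1 = np.abs(pred - gt).mean()
    # uncertainty-weighted term over the U pixels only:
    # ||S - G||^2 / (2 sigma^2) + 0.5 * log sigma^2
    sq = (pred - gt) ** 2
    lu = (sq / (2.0 * np.exp(log_var)) + 0.5 * log_var)[uncertain_mask].mean()
    return lu + l1
```

A pixel with a large predicted variance contributes less through the squared-error term, but the $\frac{1}{2}\log\sigma_i^2$ penalty stops the network from inflating $\sigma$ everywhere, which is exactly the down-weighting behavior described above.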

5. Evaluation Metrics

  • MAE
  • F-measure ($F_\beta$ and $F_\beta^{max}$)
  • Structure Measure
  • PR curve
  • BDE (Boundary Displacement Error)
  • $B_\mu$
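For reference, the two simplest metrics above can be sketched in a few lines of numpy. The binarization threshold of 0.5 and $\beta^2 = 0.3$ follow the usual SOD convention; $F_\beta^{max}$ would be this quantity maximized over all thresholds, which this sketch does not implement:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and GT, both in [0, 1]."""
    return np.abs(pred.astype(float) - gt.astype(float)).mean()

def f_measure(pred, gt, thresh=0.5, beta2=0.3):
    """F_beta at a single binarization threshold (beta^2 = 0.3 by convention)."""
    binary = pred >= thresh
    gt = gt.astype(bool)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```

A perfect prediction gives MAE 0 and $F_\beta = 1$; predicting all background against a GT with a quarter of its pixels salient gives MAE 0.25.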

6. Conclusion

In this paper, we argue that there are two difficult and inherently different problems in high-resolution SOD. From this perspective, we propose a novel deep learning framework that disentangles high-resolution SOD into two tasks: LRSCN and HRRN. LRSCN identifies the definite salient, background and uncertain regions at low resolution with sufficient semantics, while HRRN accurately refines the saliency value of pixels in the uncertain region to preserve a clear object boundary at high resolution with limited GPU memory. We also make the earliest effort to introduce uncertainty into SOD network training, which empowers HRRN to learn rich details without using any high-resolution training datasets. Extensive evaluations on high-resolution datasets and popular benchmark datasets not only verify the superiority of our method but also demonstrate the importance of disentanglement for SOD. We believe the novel disentanglement view in this work can contribute to other high-resolution computer vision tasks in the future.

DHQNet_result
