[Paper Translation] Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation


Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation


Abstract

We consider the problem of unsupervised domain adaptation in semantic segmentation. A key in this campaign consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. One of the common strategies is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the category-level joint distribution. A possible consequence of such global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped, thus leading to worse segmentation results in the target domain. To address this problem, we introduce a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level joint distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 → Cityscapes and SYNTHIA → Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.



1. Introduction

Semantic segmentation aims to assign each pixel of a photograph to a semantic class label. Currently, this achievement comes at the price of a large amount of dense pixel-level annotations obtained by expensive human labor [4, 23, 26]. An alternative would be resorting to simulated data, such as computer generated scenes [30, 31], so that an unlimited amount of labels is made available. However, models trained with the simulated images do not generalize well to realistic domains. The reason lies in the different data distributions of the two domains, typically known as domain shift [36]. To address this issue, domain adaptation approaches [34, 40, 14, 45, 17, 16, 13, 47] are proposed to bridge the gap between the source and target domains. A majority of recent methods [24, 39, 42, 41] aim to align the feature distributions of different domains. Works along this line are based on the theoretical insight in [1] that minimizing the divergence between domains lowers the upper bound of the error on the target domain. Among this cohort of domain adaptation methods, a common and pivotal step is minimizing some distance metric between the source and target feature distributions [24, 39]. Another popular choice, which borrows the idea from adversarial learning [10], is to minimize the accuracy of domain prediction. Through a minimax game between two adversarial networks, the generator is trained to produce features that confuse the discriminator, while the latter is required to correctly classify which domain the features are generated from.
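Concretely, the minimax game between the generator $G$ and the domain discriminator $D$ is often written as a pair of losses of the following standard form (a generic formulation for illustration, not quoted from this paper):

$$\mathcal{L}_{D} = -\,\mathbb{E}_{x_s}\big[\log D(G(x_s))\big] - \mathbb{E}_{x_t}\big[\log\big(1 - D(G(x_t))\big)\big],$$

$$\mathcal{L}_{adv} = -\,\mathbb{E}_{x_t}\big[\log D(G(x_t))\big],$$

where $D$ learns to output 1 on source features and 0 on target features, while $G$ minimizes $\mathcal{L}_{adv}$ so that target features become indistinguishable from source ones.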



Although the works along the path of adversarial learning have led to impressive results [38, 15, 22, 19, 42, 35], they suffer from a major limitation: when the generator network can perfectly fool the discriminator, it merely aligns the global marginal distribution of the features in the two domains (i.e., $P(F_s) \approx P(F_t)$, where $F_s$ and $F_t$ denote the features of the source and target domain in latent space) while ignoring the local joint distribution shift, which is closely related to the semantic consistency of each category (i.e., $P(F_s, Y_s) \neq P(F_t, Y_t)$, where $Y_s$ and $Y_t$ denote the categories of the features).

As a result, the de facto use of the adversarial loss may cause those target domain features, which are already well aligned to their semantic counterparts in the source domain, to be mapped to an incorrect semantic category (negative transfer). This side effect becomes more severe when a larger weight is placed on the adversarial loss.
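To see why marginal alignment alone is not enough, consider a toy example (our illustration, not from the paper) with a one-dimensional latent space and two classes. In the source domain, class A features sit at $-1$ and class B features at $+1$; after a purely global alignment, the target domain ends up with class A at $+1$ and class B at $-1$. Then

$$P(F_s) = P(F_t) = \tfrac{1}{2}\,\delta_{-1} + \tfrac{1}{2}\,\delta_{+1}, \qquad P(F_s, Y_s) \neq P(F_t, Y_t),$$

so a feature-level discriminator is perfectly fooled even though every target feature now lands on the wrong class, which is exactly the negative transfer described above.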



To address the limitation of global adversarial learning, we propose a category-level adversarial network (CLAN), prioritizing category-level alignment, which will naturally lead to global distribution alignment. A cartoon comparison of traditional adversarial learning and the proposed one is shown in Fig. 1. The key idea of CLAN is two-fold. First, we identify those classes whose features are already well aligned between the source and target domains, and protect this category-level alignment from the side effect of adversarial learning. Second, we identify the classes whose features are distributed differently between the two domains and increase the weight of the adversarial loss during training. In this process, we utilize co-training [46], which enables high-confidence predictions with two diverse classifiers, to predict how well each feature is semantically aligned between the source and target domains. Specifically, if the two classifiers give consistent predictions, it indicates that the feature is predictive and achieves good semantic alignment. In such a case, we reduce the influence of the adversarial loss in order to encourage the network to generate invariant features that can keep semantic consistency between domains. On the contrary, if the predictions disagree with each other, which indicates that the target feature is far from being correctly mapped, we increase the weight of the adversarial loss on that feature so as to accelerate the alignment. Note that 1) our adversarial learning scheme acts directly on the output space. By regarding the output predictions as features, the proposed method jointly promotes the optimization of both the classifier and the extractor; 2) our method does not guarantee rigorous joint distribution alignment between domains. Yet, compared with marginal distribution alignment, our method can map the target features closer (or with no negative transfer at worst) to the source features of the same categories.

Figure 1. (Best viewed in color.) Illustration of traditional and the proposed adversarial learning. The size of the solid gray arrow represents the weight of the adversarial loss. (a) Traditional adversarial learning ignores the semantic consistency when pursuing the marginal distribution alignment. As a result, the global movement might cause the well-aligned features (class A) to be mapped onto different joint distributions (negative transfer). (b) The proposed self-adaptive adversarial learning reweights the adversarial loss for each feature by a local alignment score. Our method reduces the influence of the adversaries when it discovers a high semantic alignment score on a feature, and vice versa. As shown, the proposed strategy encourages a category-level joint distribution alignment for both class A and class B.
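As an illustration of this reweighting scheme, the PyTorch-style sketch below measures classifier disagreement with a per-pixel cosine discrepancy and uses it to scale the adversarial loss on target predictions; the cosine measure and the floor value `eps` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def discrepancy_weight(p1, p2):
    """Per-pixel discrepancy between two diverse classifiers.

    p1, p2: (N, C, H, W) softmax prediction maps. High agreement
    (cosine similarity near 1) yields a weight near 0, shielding
    already-aligned features from the adversarial force; disagreement
    yields a weight near 1, strengthening it.
    """
    cos = F.cosine_similarity(p1, p2, dim=1)  # (N, H, W), in [0, 1] for softmax inputs
    return 1.0 - cos

def weighted_adv_loss(d_logits, weight, eps=0.4):
    """Adversarial loss on target predictions, reweighted per pixel.

    d_logits: (N, 1, H, W) discriminator logits on the target output;
    eps is an assumed floor keeping a minimum adversarial force everywhere.
    """
    bce = F.binary_cross_entropy_with_logits(
        d_logits, torch.ones_like(d_logits), reduction="none").squeeze(1)
    return ((eps + weight) * bce).mean()
```

During the generator update, well-aligned pixels (small discrepancy) then contribute little adversarial gradient, while poorly aligned pixels are pushed harder toward the source distribution.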



The main contributions are summarized below.


By proposing to adaptively weight the adversarial loss for different features, we emphasize the importance of category-level feature alignment in reducing domain shift.


Our results are on par with the state-of-the-art UDA methods on two transfer learning tasks, i.e., GTA5 [30] → Cityscapes [8] and SYNTHIA [31] → Cityscapes.


2. Related Works

This section will focus on adversarial learning and co-training techniques for unsupervised domain adaptation, which form the two main motivations of our method.



Adversarial learning. Ben-David et al. [1] proved that the adaptation loss is bounded by three terms, i.e., the expected loss on the source domain, the domain divergence, and the shared error of the ideal joint hypothesis on the source and target domains. Because the first term corresponds to the well-studied supervised learning problems and the third term is considered sufficiently low to achieve an accurate adaptation, the majority of recent works lay emphasis on the second term. Adversarial adaptation methods are good examples of this type of approach and can be investigated on different levels. Some methods focus on the distribution shift in the latent feature space [38, 15, 22, 19, 42, 35]. For example, Hoffman et al. [15] appended category statistic constraints to the adversarial model, aiming to improve semantic consistency in the target domain. Other methods address the adaptation problem on the pixel level [21, 3], relating to style transfer approaches [48, 7] that make images indistinguishable across domains. A joint consideration of pixel- and feature-level domain adaptation is studied in [14]. Besides alignment in the bottom feature layers, Tsai et al. [40] found that directly aligning the output space is more effective in semantic segmentation. Domain adaptation in the output space enables the joint optimization for both prediction and representation, so our method utilizes this advantage.
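As a sketch of what output-space alignment can look like in practice, the discriminator below consumes softmax segmentation maps instead of intermediate features, in the spirit of Tsai et al. [40]; the architecture details (channel widths, depth) are illustrative assumptions.

```python
import torch.nn as nn

class OutputSpaceDiscriminator(nn.Module):
    """Fully convolutional discriminator acting on softmax segmentation
    maps rather than on latent features (layer sizes are assumptions)."""

    def __init__(self, num_classes, ndf=64):
        super().__init__()
        layers = []
        in_ch = num_classes
        for out_ch in (ndf, ndf * 2, ndf * 4, ndf * 8):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            in_ch = out_ch
        # Final layer emits one domain logit per spatial patch.
        layers += [nn.Conv2d(in_ch, 1, kernel_size=4, stride=2, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, softmax_map):
        # softmax_map: (N, num_classes, H, W) prediction of the segmenter.
        return self.net(softmax_map)
```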



Co-training. Co-training [46] belongs to multi-view learning, in which learners are trained alternately on two distinct views with confident labels from the unlabeled data. In UDA, this line of methods [43, 5, 32, 25] is able to assign pseudo labels to unlabeled samples in the target domain, which enables directly measuring and minimizing the classification loss on the target domain. In general, co-training enforces the two classifiers to be diverse in the learned parameters, which can be achieved via dropout [33], con
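One way to make such a diversity constraint concrete: the sketch below penalizes the cosine similarity between the two classifiers' flattened weight vectors (this specific regularizer is an illustrative assumption, one possible mechanism besides dropout).

```python
import torch

def weight_discrepancy(classifier_1, classifier_2):
    """Cosine similarity between the flattened weights of two classifier
    heads. Adding this value as a penalty to the training objective
    pushes the two classifiers to stay diverse in parameter space."""
    w1 = torch.cat([p.view(-1) for p in classifier_1.parameters()])
    w2 = torch.cat([p.view(-1) for p in classifier_2.parameters()])
    return torch.dot(w1, w2) / (w1.norm() * w2.norm() + 1e-8)
```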
