[Translation] Causality Inspired Representation Learning for Domain Generalization

Today I found a better translation tool: https://www.deepl.com/translator

The translations below were all produced with the tool above.

0. Abstract

Domain generalization (DG) is essentially an out-of-distribution problem, aiming to generalize the knowledge learned from multiple source domains to an unseen target domain. The mainstream is to leverage statistical models to model the dependence between data and labels, intending to learn representations independent of domain. Nevertheless, the statistical models are superficial descriptions of reality since they are only required to model dependence instead of the intrinsic causal mechanism. When the dependence changes with the target distribution, the statistical models may fail to generalize. In this regard, we introduce a general structural causal model to formalize the DG problem. Specifically, we assume that each input is constructed from a mix of causal factors (whose relationship with the label is invariant across domains) and non-causal factors (category-independent), and only the former cause the classification judgments. Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms. However, the theoretical idea is far from practical for DG since the required causal/non-causal factors are unobserved. We highlight that ideal causal factors should meet three basic properties: separated from the non-causal ones, jointly independent, and causally sufficient for the classification. Based on that, we propose a Causality Inspired Representation Learning (CIRL) algorithm that enforces the representations to satisfy the above properties and then uses them to simulate the causal factors, which yields improved generalization ability. Extensive experimental results on several widely used datasets verify the effectiveness of our approach.


1. Introduction

In recent years, with the increasing complexity of tasks in the real world, the out-of-distribution (OOD) problem has raised a severe challenge for deep neural networks based on the i.i.d. hypothesis [29, 30, 36]. Directly applying a model trained on the source domains to an unseen target domain with a different distribution typically suffers from catastrophic performance degradation [17, 35, 37, 66]. In order to deal with the domain shift problem, Domain Generalization (DG) has attracted increasing attention, which aims to generalize the knowledge extracted from multiple source domains to an unseen target domain [2, 25, 28, 41].
In order to improve generalization capability, many DG methods have been proposed, which can be roughly categorized into invariant representation learning [12, 28, 31, 40], domain augmentation [62, 69, 73, 78], meta-learning [2, 9, 26], etc. Though promising results have been achieved, there exists one intrinsic problem with them. These efforts merely try to make up for the problems caused by OOD data and model the statistical dependence between data and labels without explaining the underlying causal mechanisms. It has been argued recently [51] that such practices may not be sufficient, and that generalizing well outside the i.i.d. setting requires learning not mere statistical dependence between variables, but an underlying causal model [4, 46, 50, 51, 58, 63]. For instance, in an image classification task, it is very likely that all the giraffes are on the grass, showing high statistical dependence, which could easily mislead the model into making wrong predictions when the background varies in the target domain. After all, it is the characteristics of giraffes such as the head, neck, etc., rather than the background, that make a giraffe a giraffe.
In this paper, we introduce a structural causal model (SCM) [57] to formalize the DG problem, aiming to excavate the intrinsic causal mechanisms between data and labels and achieve better generalization ability. Specifically, we assume the category-related information in data to be causal factors, whose relationship with the label is independent of domain, e.g., "shape" in digit recognition. The information independent of category is assumed to be non-causal factors, which is generally domain-related information, e.g., "handwriting style" in digit recognition. Each raw input X is constructed from a mix of causal factors S and non-causal factors U, and only the former causally affects the category label Y, as shown in Fig. 1. Our goal is to extract the causal factors S from the raw input X and then reconstruct the invariant causal mechanisms, which can be done with the aid of the causal intervention P(Y ∣ do(U), S). The do-operator do(⋅) [13] denotes intervention upon variables. Unfortunately, we cannot directly factorize the raw input as X = f(S, U) since the causal/non-causal factors are generally unobserved and cannot be formulated, which makes the causal inference particularly challenging [60, 64].
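To make the role of the intervention concrete: under the SCM of Fig. 1, once the causal factors S are fixed, intervening on the non-causal factors U cannot change the label distribution. Writing u and u′ for arbitrary intervention values (our notation, added for illustration),

P(Y ∣ do(U = u), S) = P(Y ∣ do(U = u′), S) = P(Y ∣ S)  for any u, u′.

For example, rewriting a digit in a different handwriting style should not change which digit it is; this is the invariance that the causal intervention module described below builds on.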

Figure 1. SCM of DG. The solid arrow indicates that the parent node causes the child one; while the dash arrow means there exists statistical dependence.


In order to put the theoretical idea into practice, we highlight that the causal factors S are expected to satisfy three properties based on the research in [51, 54, 58]: 1) separated from the non-causal factors U; 2) the factorization of S should be jointly independent; 3) causally sufficient for the classification task X → Y in the sense of containing all the causal information. As shown in Fig. 2 (a), the mixture with U causes S to contain underlying non-causal information, while a jointly dependent factorization makes S redundant, further leading to the loss of some underlying causal information. In contrast, the causal factors S in Fig. 2 (b) are ideal ones that meet all the requirements. Inspired by this, we propose a Causality Inspired Representation Learning (CIRL) algorithm, enforcing the learned representations to possess the above properties and then exploiting each dimension of the representations to mimic the factorization of causal factors, which yields stronger generalization ability.
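Written out more formally, with the factorization of S denoted s = (s_1, …, s_N), the three properties amount to the following (our own symbolic paraphrase of the text, not the paper's notation):

1) separation: the extracted factors carry no information about U, i.e., they remain unchanged when only the non-causal (domain-related) information of the input is perturbed;
2) joint independence: p(s_1, …, s_N) = ∏_i p(s_i);
3) causal sufficiency: X ⊥ Y ∣ S, i.e., conditioning on S screens the label off from the raw input, echoing the Common Cause Principle quoted in Sec. 3.1.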
Concisely, for each input, we first exploit a causal intervention module to separate the causal factors S from the non-causal factors U by generating new data with perturbed domain-related information. The generated data have different non-causal factors U but the same causal factors S compared with the original ones, so the representations are enforced to remain invariant. Besides, we propose a factorization module that makes each dimension of the representations jointly independent, so that the dimensions can be used to approximate the causal factors. Furthermore, to be causally sufficient for classification, we design an adversarial mask module which iteratively detects dimensions that contain relatively less causal information and forces them to contain more and novel causal information via adversarial learning between a masker and the representation generator (a rough sketch of these modules is given after the contribution list below). The contributions of our work are as follows:

• We point out the insufficiency of only modeling statistical dependence and introduce a causality-based view into DG to excavate the intrinsic causal mechanisms.

• We highlight three properties that the ideal causal factors should possess, and propose a CIRL algorithm to learn causal representations that can mimic the causal factors, which have better generalization ability.

• Extensive experiments on several widely used datasets and analytical results demonstrate the effectiveness and superiority of our method.

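As referenced above, the following is a rough PyTorch-style sketch of how the invariance, joint-independence, and masking constraints could be imposed on dimension-wise representations. It is an illustrative reconstruction under our own assumptions, not the paper's implementation: the helper names, the soft top-k masking, and the omission of the style perturbation, classification heads, and adversarial objective are all simplifications of ours.

```python
# Illustrative sketch (not the authors' code) of constraints on dimension-wise representations.
import torch
import torch.nn as nn
import torch.nn.functional as F


def cross_correlation(z_a, z_b):
    """Dimension-wise Pearson correlation between two batches of representations."""
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    return (z_a.t() @ z_b) / z_a.shape[0]          # shape: (dim, dim)


def factorization_loss(z, z_tilde):
    """z: representations of original inputs; z_tilde: representations of inputs whose
    domain-related (non-causal) information was perturbed, playing the role of do(U).
    Pushing the cross-correlation matrix toward identity encourages: diagonal -> 1,
    invariance to the intervention on U (separation); off-diagonal -> 0, jointly
    independent dimensions."""
    c = cross_correlation(z, z_tilde)
    return F.mse_loss(c, torch.eye(c.shape[0], device=c.device))


class AdversarialMasker(nn.Module):
    """Scores each representation dimension; low-scoring dimensions are treated as
    carrying less causal information and can be pushed, via adversarial training
    against the representation generator, to contain more and novel causal information."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z, keep_ratio=0.5):
        w = torch.sigmoid(self.score(z))               # per-dimension soft scores
        k = max(1, int(keep_ratio * z.shape[1]))
        thresh = w.topk(k, dim=1).values[:, -1:]       # per-sample top-k threshold
        sup_mask = (w >= thresh).float()               # "superior" (more causal) dims
        return z * sup_mask, z * (1.0 - sup_mask)      # superior / inferior views
```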

Figure 2. Illustration of the three properties of causal factors.

2. Related Work

Domain Generalization (DG) aims to extract knowledge from multiple source domains that generalizes well to unseen target domains. A promising and prevalent solution is to align the distributions of domains by learning domain-invariant representations via kernel-based optimization [11, 41], adversarial learning [28, 31, 40], second-order correlation [49], or Variational Bayes [72]. Data augmentation is also an important technique to endow the model with generalization ability by enriching source diversity. Several directions have been explored in previous works: [62] perturbs images according to adversarial gradients induced by a domain discriminator; [73, 78] mix the styles of training instances across domains by mixing feature statistics [78] or amplitude spectra [73]; [76] generates more synthetic training data by maximizing a divergence measure. Another popular line of work is meta-learning, which simulates domain shift by dividing the original source domains into meta-train and meta-test domains [2, 9, 26, 32]. Other DG works also explore low-rank decomposition [53], a secondary task of solving jigsaw puzzles [5], and gradient-guided dropout [18]. Different from all the methods above, we tackle the DG problem from a causal viewpoint. Our method focuses on excavating intrinsic causal mechanisms by learning causal representations, which shows better generalization ability.
Causal Mechanism [19, 47, 50] focuses on the fact that statistical dependence ("seeing people take medicine suggests that they are sick") cannot reliably predict the outcome of a counterfactual input ("stopping taking medicine does not make them healthy"). Generally, it can be viewed as a component of reasoning chains [24] that provide predictions for situations that are very far from the observed distribution. In that sense, excavating causal mechanisms means acquiring robust knowledge that holds beyond the support of observed data distributions [59]. The connection between causality and generalization has gained increasing interest in the past few years [39, 51]. Many causality-based methods have been proposed to obtain invariant causal mechanisms [16, 65, 71] or recover causal features [6, 13, 33, 55] and hence improve OOD generalization. It is worth noting that they generally rely on restrictive assumptions on the causal diagram or structural equations. Very recently, MatchDG [38] introduced causality into the DG literature by enforcing, via contrastive learning, that inputs across domains derived from the same object have the same representation. Our CIRL is related to MatchDG in its effort to learn causal representations. However, CIRL differs in that it explicitly exploits dimension-wise representations to mimic causal factors based on a more theoretical formulation, and it only relies on a more general structural causal model without restrictive assumptions. Essentially, CIRL can be seen as causal factorization with intervention, which is clearly different from the object-conditional MatchDG.

3. Method

In this section, we consider DG from the causal view with a general structural causal model, as Fig. 1 shows. We demonstrate that the intrinsic causal mechanisms (formalized as conditional distributions) are feasible to construct if the causal factors are given. However, as discussed in [1], it is hard to recover the causal factors exactly since they are unobservable. Therefore, we propose to learn causal representations that mimic the causal factors based on their properties, while inheriting their superior generalization ability.

3.1. DG from the Causal View

The mainstream of DG focuses on modeling the statistical dependence between observed inputs and corresponding labels, i.e., P(X, Y), which is assumed to vary across domains. To obtain an invariant dependence, these methods generally enforce the distribution to be domain-invariant marginally or conditionally, i.e., minimizing the gap across domains in P(X) or P(X ∣ Y). However, since the statistical dependence cannot explain the intrinsic causal mechanism between inputs and labels, it tends to vary with the domain. Therefore, the invariant dependence learned among source domains may still fail on an unseen target domain. Meanwhile, causal mechanisms usually keep stable across domains [51]. We first articulate the connection between causality and statistical dependence, as Reichenbach [54] claimed, in Principle 1.
Principle 1 ( [54]). Common Cause Principle: if two observables X and Y are statistically dependent, then there exists a variable S that causally influences both and explains all the dependence in the sense of making them independent when conditioned on S.
Based on Principle 1, we formalize the following structural causal model (SCM) to describe the DG problem:
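A minimal sketch of such an SCM, consistent with Fig. 1 and the assumptions above, is the following (the function symbols f, h and the exogenous noise variables V_X, V_Y are illustrative and not necessarily the paper's notation):

X := f(S, U, V_X),   Y := h(S, V_Y),   with V_X, V_Y mutually independent exogenous noise,

where S and U may be statistically dependent (the dashed arrow in Fig. 1) but only S enters the structural equation of Y. Consequently, the mechanism P(Y ∣ S) stays invariant across domains, while P(X), and hence the dependence P(X, Y), may shift as the non-causal factors U change.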
