Transfer Learning: From Getting Started to Giving Up (2)

Distant Domain Transfer Learning

The first Transfer Learning paper in this series covered an application to medical imaging; this one introduces the concept of distant domain transfer learning and walks through its learning process from a structural point of view.

Original

In this paper, we study a novel transfer learning problem termed Distant Domain Transfer Learning (DDTL). Different from existing transfer learning problems which assume that there is a close relation between the source domain and the target domain, in the DDTL problem, the target domain can be totally different from the source domain. For example, the source domain classifies face images but the target domain distinguishes plane images. Inspired by the cognitive process of human where two seemingly unrelated concepts can be connected by learning intermediate concepts gradually, we propose a Selective Learning Algorithm (SLA) to solve the DDTL problem with supervised autoencoder or supervised convolutional autoencoder as a base model for handling different types of inputs. Intuitively, the SLA algorithm selects useful unlabeled data gradually from intermediate domains as a bridge to break the large distribution gap for transferring knowledge between two distant domains. Empirical studies on image classification problems demonstrate the effectiveness of the proposed algorithm, and on some tasks the improvement in terms of the classification accuracy is up to 17% over “non-transfer” methods.


Notes

This paper proposes a new distant domain transfer learning algorithm, SLA, for the so-called distant-domain setting in which the target and source domains are not very similar, and validates its capability and accuracy. SLA is built on top of autoencoders or convolutional autoencoders and is used to solve the DDTL problem.

Introduction

Transfer Learning, which borrows knowledge from a source domain to enhance the learning ability in a target domain, has received much attention recently and has been demonstrated to be effective in many applications. An essential requirement for successful knowledge transfer is that the source domain and the target domain should be closely related. This relation can be in the form of related instances, features or models, and measured by the KL-divergence or A-distance (Blitzer et al. 2008). For two distant domains where no direct relation can be found, transferring knowledge between them forcibly will not work. In the worst case, it could lead to even worse performance than ‘non-transfer’ algorithms in the target domain, which is the ‘negative transfer’ phenomena (Rosenstein et al. 2005; Pan and Yang 2010). For example, online photo sharing communities, such as Flickr and Qzone, generate vast amount of images as well as their tags. However, due to the diverse interests of users, the tag distribution is often long-tailed, which can be verified by our analysis in Figure 1 on the tag distribution of the uploaded images at Qzone from January to April in 2016. For the tags in the head part, we can build accurate learners as there are plenty of labeled data but in the tail part, due to the scarce labeled data, the learner for each tag usually has no satisfactory performance. In this case, we can adopt transfer learning algorithms to build accurate classifiers for tags in the tail part by reusing knowledge in the head part. When the tag in the tail part is related to that in the head part, this strategy usually works very well. For example, as shown in Figure 1, we can build an accurate tiger classifier by transferring knowledge from cat images when we have few labeled tiger images, where the performance improvement is as large as 24% compared to some supervised learning algorithm learned from labeled tiger images only. However, if the two tags (e.g., face and airplane images) are totally unrelated from our perspective, existing transfer learning algorithms such as (Patel et al. 2015) fail as shown in Figure 1. One reason for the failure of existing transfer learning algorithms is that the two domains, face and airplane, do not share any common characteristic in shape or other aspects, and hence they are conceptually distant, which violates the assumption of existing transfer learning works that the source domain and the target domain are closely related.

In this paper, we focus on transferring knowledge between two distant domains, which is referred to as Distant Domain Transfer Learning (DDTL). The DDTL problem is critical as solving it can largely expand the application scope of transfer learning and help reuse as much previous knowledge as possible. Nonetheless, this is a difficult problem as the distribution gap between the source domain and the target domain is large. The motivation behind our solution to solve the DDTL problem is inspired by human’s ‘transitivity’ learning and inference ability (Bryant and Trabasso 1971). That is, people transfer knowledge between two seemingly unrelated concepts via one or more intermediate concepts as a bridge.

Along this line, there are several works aiming to solve the DDTL problem. For instance, Tan et al. (2015) introduce annotated images to bridge the knowledge transfer between text data in the source domain and image data in the target domain, and Xie et al. (2016) predict the poverty based on the daytime satellite imagery by transferring knowledge from an object classification task with the help of some nighttime light intensity information as an intermediate bridge. Those studies assume that there is only one intermediate domain and that all the data in the intermediate domain are helpful. However, in some cases the distant domains can only be related via multiple intermediate domains. Exploiting only one intermediate domain is not enough to help transfer knowledge across long-distant domains. Moreover, given multiple intermediate domains, it is highly possible that only a subset of data from each intermediate domain is useful for the target domain, and hence we need an automatic selection mechanism to determine the subsets.

In this paper, to solve the DDTL problem in a better way, we aim to transfer knowledge between distant domains by gradually selecting multiple subsets of instances from a mixture of intermediate domains as a bridge. We use the reconstruction error as a measure of distance between two domains. That is, if the data reconstruction error on some data points in the source domain is small based on a model trained on the target domain, then we consider that these data points in the source domain are helpful for the target domain. Based on this measure, we propose a Selective Learning Algorithm (SLA) for the DDTL problem, which simultaneously selects useful instances from the source and intermediate domains, learns high-level representations for selected data, and trains a classifier for the target domain. The learning process of SLA is an iterative procedure that selectively adds new data points from intermediate domains and removes unhelpful data in the source domain to revise the source-specific model changing towards a target-specific model step by step until some stopping criterion is satisfied.

The contributions of this paper are three-fold. Firstly, to our best knowledge, this is the first work that studies the DDTL problem by using a mixture of intermediate domains. Secondly, we propose an SLA algorithm for DDTL. Thirdly, we conduct extensive experiments on several real-world datasets to demonstrate the effectiveness of the proposed algorithm.


Notes

This part is mainly an overview of the paper's work. It first touches on the successes of transfer learning, then introduces the source domain, the target domain, and negative transfer, along with the quantities used to measure similarity between the source and target domains: KL divergence (Kullback-Leibler divergence) and the A-distance. The failed transfer from face images to airplane images is then used to motivate the concept of Distant Domain Transfer Learning (DDTL). After reviewing earlier work on the DDTL problem, the authors present the Selective Learning Algorithm (SLA) studied in this paper and describe its learning process. Finally, the paper's three main contributions are summarized (not listed here).

  • Source domain
    The domain that has knowledge and plenty of labeled data; it is the object to transfer from.
  • Target domain
    The domain to which knowledge and labels are ultimately to be given.
  • Negative transfer
    When two domains are largely dissimilar, transferring knowledge between them produces a negative effect.
  • KL divergence
    $D_{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$
    This is an asymmetric distance: $D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$.
  • A-distance
    The A-distance can be used to estimate the discrepancy between two distributions. It is defined through the hinge loss of a linear classifier built to distinguish two data domains. To compute it, we first train a binary classifier $h$ on the source and target domains so that it can tell which domain a sample comes from. Denoting the classifier's error by $err(h)$, the A-distance is defined as $A(\mathcal{D}_s, \mathcal{D}_t) = 2(1 - 2\,err(h))$. (Both measures are illustrated in the first sketch below.)
  • Transitive transfer learning
    Uses one intermediate domain.
  • Distant domain transfer learning
    Uses multiple intermediate domains.
  • Selective Learning Algorithm
    The algorithm simultaneously selects useful instances from the source and intermediate domains, learns high-level representations for the selected data, and trains a classifier for the target domain. The learning process of SLA is iterative: it selectively adds new data points from the intermediate domains and removes unhelpful data from the source domain, gradually revising the source-specific model into a target-specific model until some stopping criterion is satisfied (see the second sketch below).
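
To make the two distance measures concrete, here is a minimal Python sketch (my own illustration, not code from the paper). It assumes a hinge-loss linear classifier from scikit-learn as the domain discriminator, a 50/50 train/test split, and synthetic Gaussian data standing in for the two domains.

```python
# Minimal sketch of the two domain-distance measures (illustrative, not from the paper).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for two discrete distributions given as arrays."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def a_distance(xs, xt):
    """Proxy A-distance: train a hinge-loss linear domain classifier,
    estimate its error err(h) on held-out data, return 2 * (1 - 2 * err(h))."""
    x = np.vstack([xs, xt])
    y = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])  # 0 = source, 1 = target
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)
    h = LinearSVC(loss="hinge", max_iter=10000).fit(x_tr, y_tr)
    err = 1.0 - h.score(x_te, y_te)
    return 2.0 * (1.0 - 2.0 * err)

# Distant domains are easy to separate, so err -> 0 and the A-distance -> 2.
rng = np.random.default_rng(0)
faces = rng.normal(0.0, 1.0, size=(500, 64))    # stand-in for one domain
planes = rng.normal(3.0, 1.0, size=(500, 64))   # stand-in for a distant domain
print(kl_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))
print(a_distance(faces, planes))
```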
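
And here is a schematic of the SLA selection loop from the last bullet, again my paraphrase rather than the authors' released code: a plain PCA model stands in for the paper's supervised (convolutional) autoencoder, and the percentile-based threshold is an illustrative choice, not a rule from the paper.

```python
# Schematic of SLA's outer selection loop (a sketch of the procedure described
# above; PCA is a stand-in for the paper's supervised autoencoder base model).
import numpy as np
from sklearn.decomposition import PCA

def fit_model(points):
    """Stand-in for training the base model on the currently selected data."""
    return PCA(n_components=8).fit(points)

def recon_error(model, points):
    """Per-instance squared reconstruction error under the current model."""
    rec = model.inverse_transform(model.transform(points))
    return ((points - rec) ** 2).sum(axis=1)

def selective_learning(source_x, intermediate_x, target_x, iters=10):
    selected_src = source_x                              # start with all source data
    selected_mid = np.empty((0, source_x.shape[1]))      # and no intermediate data
    model = fit_model(np.vstack([selected_src, target_x]))
    for _ in range(iters):
        # Illustrative threshold: keep points the current model reconstructs
        # roughly as well as the target data itself.
        tau = np.percentile(recon_error(model, target_x), 90)
        # Add intermediate points the model reconstructs well ...
        selected_mid = intermediate_x[recon_error(model, intermediate_x) < tau]
        # ... and drop source points that are no longer helpful.
        if len(selected_src):
            selected_src = selected_src[recon_error(model, selected_src) < tau]
        model = fit_model(np.vstack([selected_src, selected_mid, target_x]))
    return model

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (200, 32))    # source domain (e.g., faces)
mid = rng.normal(1.5, 1.0, (1000, 32))   # mixture of intermediate domains
tgt = rng.normal(3.0, 1.0, (30, 32))     # distant target domain (e.g., airplanes)
final_model = selective_learning(src, mid, tgt)
```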

Figure 1

Cat-to-tiger transfer learning; face-to-airplane transfer learning

Description

Tag distribution of images uploaded to Qzone. In the first task, knowledge is transferred between cat and tiger images, and the transfer learning algorithm outperforms the supervised learning algorithm. In the second task, knowledge is transferred between face and airplane images; here the transfer learning algorithm fails, performing worse than the supervised learning algorithm. When our proposed SLA algorithm is applied, however, the model achieves better performance.

Notes

Figure 1 shows transfer learning on images collected from Qzone: transfer works well between near classes (cat and tiger), while for distant domains (face and airplane) only transfer learning with the SLA algorithm actually achieves a useful transfer.

Related work

Original

Typical transfer learning algorithms include instance weighting approaches (Dai et al. 2007) which select relevant data from the source domain to help the learning in the target domain, feature mapping approaches (Pan et al. 2011) which transform the data in both source and target domains into a common feature space where data from the two domains follow similar distributions, and model adaptation approaches (Aytar and Zisserman 2011) which adapt the model trained in the source domain to the target domain. However, these approaches cannot handle the DDTL problem as they assume that the source domain and the target domain are conceptually close. Recent studies (Yosinski et al. 2014; Oquab et al. 2014; Long et al. 2015) reveal that deep neural networks can learn transferable features for a target domain from a source domain but they still assume that the target domain is closely related to the source domain.

The transitive transfer learning (TTL) (Tan et al. 2015; Xie et al. 2016) also learns from the target domain with the help of a source domain and an intermediate domain. In TTL, there is only one intermediate domain, which is selected by users manually, and all intermediate domain data are used. Different from TTL, our work automatically selects subsets from a mixture of multiple intermediate domains as a bridge across the source domain and the target domain. Transfer Learning with Multiple Source Domains (TLMS) (Mansour, Mohri, and Rostamizadeh 2009; Tan et al. 2013) leverages multiple source domains to help learning in the target domain, and aims to combine knowledge simultaneously transferred from all the source domains. The difference between TLMS and our work is two-fold. First, all the source domains in TLMS have sufficient labeled data. Second, all the source domains in TLMS are close to the target domain.

Self-Taught Learning (STL) (Raina et al. 2007) aims to build a target classifier with limited target-domain labeled data by utilizing a large amount of unlabeled data from different domains to learn a universal feature representation. The difference between STL and our work is two-fold. First, there is no so-called source domain in STL. Second, STL aims to use all the unlabeled data from different domains to help learning in the target domain while our work aims to identify useful subsets of instances from the intermediate domains to bridge the source domain and the target domain.

Semi-supervised autoencoder (SSA) (Weston, Ratle, and Collobert 2008; Socher et al. 2011) also aims to minimize both the reconstruction error and the training loss while learning a feature representation. However, our work is different from SSA in three aspects. First, in SSA, both unlabeled and labeled data are from the same domain, while in our work, labeled data are from either the source domain or the target domain and unlabeled data are from a mixture of intermediate domains, whose distributions can be very different from each other. Second, SSA uses all the labeled and unlabeled data for learning, while our work selectively chooses some unlabeled data from the intermediate domains and removes some labeled data from the source domain for assisting the learning in the target domain. Third, SSA does not have convolutional layer(s), while our work uses convolutional filters if the input is a matrix or tensor.


Notes

In this part the authors review the principles of several traditional transfer learning algorithms and compare the similarities and differences of TTL, TLMS, STL, SSA, and the SLA algorithm.

Problem definition

Original

We denote by $S=\{(x_S^1, y_S^1), \cdots, (x_S^{n_S}, y_S^{n_S})\}$ the source domain labeled data of size $n_S$, which is assumed to be sufficient enough to train an accurate classifier for the source domain, and by $T=\{(x_T^1, y_T^1), \cdots, (x_T^{n_T}, y_T^{n_T})\}$ the target domain labeled data of size $n_T$, which is assumed to be too insufficient to learn an accurate classifier for the target domain. Moreover, we denote by $I=\{x_I^1, \cdots, x_I^{n_I}\}$ the mixture of unlabeled data of multiple intermediate domains, where $n_I$ is assumed to be large enough. In this work, a domain corresponds to a concept or class for a specific classification problem, such as face or airplane recognition from images. Without loss of generality, we suppose the classification problems in the source domain and the target domain are both binary. All data points are supposed to lie in the same feature space. Let $p_S(x)$, $p_S(y|x)$, $p_S(x, y)$ be the marginal, conditional and joint distributions of the source domain data, respectively, $p_T(x)$, $p_T(y|x)$, $p_T(x, y)$ be the parallel definitions for the target domain, and $p_I(x)$ be the marginal distribution for the intermediate domains. In a DDTL problem, we have $p_T(x) \neq p_S(x)$, $p_T(x) \neq p_I(x)$, and $p_T(y|x) \neq p_S(y|x)$. The goal of DDTL is to exploit the unlabeled data in the intermediate domains to build a bridge between the source and target domains, which are originally distant to each other, and train an accurate classifier for the target domain by transferring supervised knowledge from the source domain with the help of the bridge. Note that not all the data in the intermediate domains are supposed to be similar to the source domain data, and some of them may be quite different. Therefore, simply using all the intermediate data to build the bridge may fail to work.


Notes

This part introduces the notation used for DDTL in this paper, including the representations of the source, intermediate, and target domains and the probability relations that must hold between them. Notably, the choice of source, intermediate, and target domains rests on two assumptions (made concrete in the sketch after this list):

  • The classification problems in the source and target domains are binary.
  • All data points, regardless of which domain they belong to, lie in the same feature space.
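
To make the notation concrete, here is a minimal sketch of the data a DDTL learner receives; the sizes and variable names are my own illustration, not values from the paper.

```python
# Minimal sketch of the DDTL data setup (sizes and names are illustrative).
import numpy as np

d = 256                                  # shared feature space dimension
n_S, n_T, n_I = 5000, 20, 100_000        # n_S and n_I large, n_T very small

rng = np.random.default_rng(0)
S_x, S_y = rng.normal(size=(n_S, d)), rng.integers(0, 2, n_S)  # labeled source data
T_x, T_y = rng.normal(size=(n_T, d)), rng.integers(0, 2, n_T)  # scarce labeled target data
I_x = rng.normal(size=(n_I, d))                                # unlabeled intermediate mixture

# DDTL assumptions: labels are binary, all points share one feature space,
# and p_T(x) != p_S(x), p_T(x) != p_I(x), p_T(y|x) != p_S(y|x).
```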

The Selective Learning Algorithm

Original

In this section, we present the proposed SLA.


Auto-Encoders and Its Variant

原文

As a basis component in our proposed method to solve the DDTL problem is the autoencoder (Bengio 2009) and its variant, we first review them. An autoencoder is an unsupervised feed-forward neural network with an input layer, one or more hidden layers, and an output layer. It usually includes two processes: encoding and decoding. Given an input $x \in \mathbb{R}^q$, an autoencoder first encodes it through an encoding function $f_e(\cdot)$ to map it to a hidden representation, and then decodes it through a decoding function $f_d(\cdot)$ to reconstruct $x$. The process of the autoencoder can be summarized as encoding: $h = f_e(x)$, and decoding: $\hat{x} = f_d(h)$, where $\hat{x}$ is the reconstructed input to approximate $x$. The learning of the pair of encoding and decoding functions, $f_e(\cdot)$ and $f_d(\cdot)$, is done by minimizing the reconstruction error over all training data, i.e., $\min_{f_e, f_d} \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert_2^2$.
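
As a concrete illustration of the encode/decode pair and the reconstruction objective above, here is a minimal dense autoencoder in PyTorch. This is a sketch with my own layer sizes; the paper's base model is a supervised (convolutional) autoencoder, which adds a classification loss on top of this reconstruction loss.

```python
# Minimal autoencoder: h = f_e(x), x_hat = f_d(h), trained to minimize
# sum_i ||x_hat_i - x_i||_2^2. Layer sizes are illustrative choices.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, q=784, hidden=64):
        super().__init__()
        self.f_e = nn.Sequential(nn.Linear(q, hidden), nn.Sigmoid())  # encoder
        self.f_d = nn.Sequential(nn.Linear(hidden, q), nn.Sigmoid())  # decoder

    def forward(self, x):
        h = self.f_e(x)        # encoding: h = f_e(x)
        return self.f_d(h)     # decoding: x_hat = f_d(h)

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)        # a dummy batch standing in for real inputs
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x) - x) ** 2).sum(dim=1).mean()  # reconstruction error
    loss.backward()
    opt.step()
```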
