[cvpr2017]Joint Geometrical and Statistical Alignment for Visual Domain Adaptation

Latest recommended article published on 2022-08-04 20:45:17

MataFela

Category: domain adaptation; Tags: machine learning

Linear Discriminant Analysis (LDA)
 Summary of the principles of LDA
Three scatter matrices:

The three scatter matrices of LDA
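For reference, these are the three scatter matrices in their standard textbook form. The notation is chosen here and may differ from the linked article: $C$ classes, the samples $\mathcal{X}_c$ of class $c$ with $n_c$ elements and mean $m_c$, and the global mean $m$ over all $n$ samples.

$$
\begin{aligned}
S_w &= \sum_{c=1}^{C} \sum_{x \in \mathcal{X}_c} (x - m_c)(x - m_c)^\top \\
S_b &= \sum_{c=1}^{C} n_c\, (m_c - m)(m_c - m)^\top \\
S_t &= \sum_{i=1}^{n} (x_i - m)(x_i - m)^\top = S_w + S_b
\end{aligned}
$$

LDA looks for projections that make $S_b$ large and $S_w$ small. JGSA reuses exactly this idea for the source domain.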

introduction

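The authors propose a framework called Joint Geometrical and Statistical Alignment (JGSA) that reduces both the statistical and the geometrical differences (shifts) between the source domain and the target domain.
Concretely, JGSA learns two coupled projections, one for the source domain and one for the target domain. These map both domains into a low-dimensional subspace while reducing the geometrical shift and the distribution shift at the same time.
The paper only considers the unsupervised setting, in which the target domain has no labels.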
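Common domain adaptation approaches are instance-based adaptation, feature-representation-based adaptation and classifier-based adaptation. In the unsupervised setting there are no target labels, so classifier-based adaptation is not feasible.
- The distribution divergence is usually reduced either by instance-based adaptation, e.g. re-weighting the samples of the source domain,
- or by feature representation/transformation, i.e. projecting the source and target features into a third space in which the distribution divergence is small.
- Instance-based methods need fairly strict assumptions: 1) the conditional distributions of the source and target domains are the same; 2) some part of the source data can be reused, after re-weighting, for learning in the target domain.
- Feature-representation-based adaptation needs weaker assumptions. It only assumes that there exists a common space in which the source and target distributions are similar.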
This paper adopts the feature representation/transformation approach.
There are two main families of feature transformation methods: 1) data centric methods; 2) subspace centric methods.
- Data centric methods seek a unified transformation that projects the data of both domains into a domain-invariant space. This reduces the distributional divergence between the two domains while preserving the properties of the data in the original space.
- Data centric methods only exploit the features shared by both domains. They fail when the two domains have a large discrepancy, because a common space in which the two distributions are aligned may not exist.
- Subspace centric methods reduce the domain shift by manipulating the subspaces of the two domains, e.g. by learning a linear mapping between them or by going through a manifold such as the Grassmann manifold. In this way the subspace of each domain contributes to the final mapping.
- The authors argue that subspace centric methods only operate on the subspaces of the two domains and do not explicitly consider the distribution shift between the projected data. (However, the subspace centric methods only manipulate on the subspaces of the two domains without explicitly considering the distribution shift between projected data of two domains.)
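In JGSA the authors learn two coupled projections that map the source data and the target data into their respective subspaces. After the mapping:
- the variance of the target data is maximized, to preserve the characteristics of the target domain;
- the discriminative information of the source data is preserved, so that class information is transferred effectively;
- the (marginal and conditional) distribution divergence between the projected source and target data is minimized, which reduces the domain shift statistically;
- the divergence between the two projections, i.e. between the two subspaces, is small, which reduces the domain shift geometrically.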
- 1) the variance of target domain is maximized, 2) the discriminative information of source domain is preserved, 3) the divergence of source and target distributions is small, and 4) the divergence between source and target subspaces is small.

Advantages

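Unlike data centric methods, the authors' method does not need the strong assumption that a single unified transformation can reduce the distribution shift while preserving the data properties.
Unlike subspace centric methods, the authors' method reduces not only the shift of the subspace geometries but also the distribution shift between the two domains.
The authors state that the method extends easily to a kernelized version, which handles cases where the shift between the domains is nonlinear.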

Joint Geometrical and Statistical Alignment(JGSA)

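Definitions (no unified transformation is assumed): the source data $X_s$ and the target data $X_t$ are mapped by two different projections. $A$ is the projection for the source domain and $B$ is the projection for the target domain, giving $Z_s = A^\top X_s$ and $Z_t = B^\top X_t$.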

Target Variance Maximization

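To avoid projecting the features onto irrelevant dimensions, the authors encourage the variance of the target data to be maximized in the corresponding subspace:
- scatter matrix (Wikipedia)
- scatter matrix (translation)
- scatter matrix vs. covariance matrix (essentially the same, up to a constant factor)
- $\operatorname{Tr}(\cdot)$: the trace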
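A minimal NumPy sketch of the target-variance term. It assumes one sample per column ($X_t \in \mathbb{R}^{d \times n_t}$), the centering matrix $H_t = I - \frac{1}{n_t}\mathbf{1}\mathbf{1}^\top$ and the objective $\max_B \operatorname{Tr}(B^\top S_t B)$ with $S_t = X_t H_t X_t^\top$. The function names and the random data are illustrative only.

```python
import numpy as np

def target_scatter(Xt):
    """S_t = X_t H_t X_t^T for a d x n_t matrix with one target sample per column."""
    nt = Xt.shape[1]
    Ht = np.eye(nt) - np.ones((nt, nt)) / nt  # centering matrix H_t = I - (1/n_t) 1 1^T
    return Xt @ Ht @ Xt.T

def target_variance(B, Xt):
    """Objective Tr(B^T S_t B): total variance of the projected target data B^T X_t."""
    return np.trace(B.T @ target_scatter(Xt) @ B)

rng = np.random.default_rng(0)
Xt = rng.normal(size=(5, 40))                   # d = 5 features, n_t = 40 target samples
B = np.linalg.qr(rng.normal(size=(5, 2)))[0]    # some d x k projection with k = 2
St = target_scatter(Xt)
print(target_variance(B, Xt))
```

`St` equals `n_t * np.cov(Xt, bias=True)`. That is the sense in which a scatter matrix and a covariance matrix are "the same": they differ only by a constant factor.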

Source Discriminative Information Preservation

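The authors use the labels of the source domain to make the projected source data discriminative:
- maximize the between-class scatter and minimize the within-class scatter (the LDA criterion: samples of each class are pulled into compact, well-separated clusters)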
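A minimal sketch of the two source scatter matrices, assuming one sample per column and integer class labels. The source term is then $\max_A \operatorname{Tr}(A^\top S_b A)$ together with $\min_A \operatorname{Tr}(A^\top S_w A)$. The helper name `source_scatters` and the toy data are hypothetical.

```python
import numpy as np

def source_scatters(Xs, ys):
    """Within-class S_w and between-class S_b of a d x n_s source matrix (columns = samples)."""
    d = Xs.shape[0]
    m = Xs.mean(axis=1, keepdims=True)          # overall source mean
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(ys):
        Xc = Xs[:, ys == c]                     # source samples of class c
        mc = Xc.mean(axis=1, keepdims=True)     # class mean m_s^(c)
        Sw += (Xc - mc) @ (Xc - mc).T           # equals X_s^(c) H_s^(c) X_s^(c)^T
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T
    return Sw, Sb

rng = np.random.default_rng(1)
Xs = rng.normal(size=(4, 30))
ys = np.repeat([0, 1, 2], 10)
Xs[:, ys == 1] += 3.0                           # move class 1 away from the others
Sw, Sb = source_scatters(Xs, ys)
A = np.linalg.qr(rng.normal(size=(4, 2)))[0]    # some d x k projection
print(np.trace(A.T @ Sb @ A), np.trace(A.T @ Sw @ A))  # between- vs within-class spread along A
```

As in LDA, `Sw + Sb` equals the total scatter of the source data. This gives a quick sanity check for an implementation.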

Distribution Divergence Minimization

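(Note: everything below is computed on the projected data.)

MMD (Maximum Mean Discrepancy) is used to measure the difference between the source and target distributions.
Previous work proposed using pseudo labels for the target data, predicted by a classifier trained on the source domain, to represent the class-conditional data distributions of the target domain. These pseudo labels are then refined iteratively to further reduce the divergence between the conditional distributions of the two domains.
The marginal and conditional divergence terms are combined, and their sum is minimized: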
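A sketch of how the MMD terms can be collected into matrices $M_s, M_t, M_{st}, M_{ts}$ so that the divergence equals $\operatorname{Tr}\big(W^\top \begin{bmatrix} M_s & M_{st} \\ M_{ts} & M_t \end{bmatrix} W\big)$ with $W = \begin{bmatrix} A \\ B \end{bmatrix}$. This is a JDA-style construction that I assume matches the paper's. The helper `mmd_blocks` is hypothetical, and the random labels stand in for real target pseudo labels. The check at the end compares the trace form with the squared distances between projected means, computed directly.

```python
import numpy as np

def mmd_blocks(Xs, Xt, ys, yt_pseudo=None):
    """M_s, M_t, M_st, M_ts for marginal (+ class-conditional) MMD; columns are samples."""
    ns, nt = Xs.shape[1], Xt.shape[1]
    Ls = np.full((ns, ns), 1.0 / ns**2)         # marginal part, source-source
    Lt = np.full((nt, nt), 1.0 / nt**2)         # marginal part, target-target
    Lst = np.full((ns, nt), -1.0 / (ns * nt))   # marginal part, source-target
    if yt_pseudo is not None:
        for c in np.unique(ys):
            ntc = np.sum(yt_pseudo == c)
            if ntc == 0:
                continue  # no target sample pseudo-labelled as c: skip this class
            es = (ys == c) / np.sum(ys == c)    # 1/n_s^(c) on class-c source samples
            et = (yt_pseudo == c) / ntc         # 1/n_t^(c) on class-c target samples
            Ls += np.outer(es, es)
            Lt += np.outer(et, et)
            Lst -= np.outer(es, et)
    Mst = Xs @ Lst @ Xt.T
    return Xs @ Ls @ Xs.T, Xt @ Lt @ Xt.T, Mst, Mst.T

# Check: the trace form equals the squared distances between projected means.
rng = np.random.default_rng(2)
d, k = 6, 3
Xs, Xt = rng.normal(size=(d, 20)), rng.normal(loc=1.0, size=(d, 25))
ys, yt = rng.integers(0, 2, 20), rng.integers(0, 2, 25)
A, B = rng.normal(size=(d, k)), rng.normal(size=(d, k))
Ms, Mt, Mst, Mts = mmd_blocks(Xs, Xt, ys, yt)
W = np.vstack([A, B])
trace_form = np.trace(W.T @ np.block([[Ms, Mst], [Mts, Mt]]) @ W)

Zs, Zt = A.T @ Xs, B.T @ Xt
direct = np.sum((Zs.mean(axis=1) - Zt.mean(axis=1)) ** 2)        # marginal term
for c in np.unique(ys):
    if np.any(yt == c):                                          # conditional term per class
        direct += np.sum((Zs[:, ys == c].mean(axis=1) - Zt[:, yt == c].mean(axis=1)) ** 2)
print(np.isclose(trace_form, direct))  # True
```

Writing the divergence as a single quadratic form in $W$ is what lets it share a denominator with the other terms and be solved as an eigenproblem.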

Subspace Divergence Minimization

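The authors also want to reduce the divergence between the source and target subspaces used for the mapping, i.e. between the source projection $A$ and the target projection $B$:
- where $\|\cdot\|_F^2$ is the squared Frobenius norm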
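Written out, the subspace term is $\min_{A,B}\|A - B\|_F^2$. Expanding the trace, a short derivation added here rather than copied from the paper, shows that stacking $A$ and $B$ turns it into a quadratic form with a simple block matrix:

$$
\|A - B\|_F^2 = \operatorname{Tr}(A^\top A) - 2\operatorname{Tr}(A^\top B) + \operatorname{Tr}(B^\top B)
= \operatorname{Tr}\left( \begin{bmatrix} A \\ B \end{bmatrix}^\top \begin{bmatrix} I & -I \\ -I & I \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} \right)
$$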

Overall Objective Function

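Overall objective function:
Going one step further, the authors also want the target projection $B$ to be small in scale, i.e. they penalize $\operatorname{Tr}(B^\top B)$ (in my view this amounts to keeping the variance small):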
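Stacking $W = \begin{bmatrix} A \\ B \end{bmatrix}$, all terms fit into one trace ratio. The numerator holds $\mu\cdot$(target variance) $+\ \beta\cdot$(between-class scatter). The denominator holds distribution shift $+\ \lambda\cdot$(subspace shift) $+\ \beta\cdot$(within-class scatter) $+\ \mu\operatorname{Tr}(B^\top B)$. The block form below is my reconstruction from these terms, with trade-off weights $\lambda, \mu, \beta$, and should be checked against the paper:

$$
\max_{W}\ \frac{\operatorname{Tr}\left( W^\top \begin{bmatrix} \beta S_b & 0 \\ 0 & \mu S_t \end{bmatrix} W \right)}
{\operatorname{Tr}\left( W^\top \begin{bmatrix} M_s + \lambda I + \beta S_w & M_{st} - \lambda I \\ M_{ts} - \lambda I & M_t + (\lambda + \mu) I \end{bmatrix} W \right)}
$$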

Optimization

The objective is invariant to rescaling $W$ (multiplying $W$ by a constant does not change the ratio), so it can be rewritten as:
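Write the overall objective as a trace ratio $\operatorname{Tr}(W^\top P W) / \operatorname{Tr}(W^\top Q W)$. Here $P$ and $Q$ are shorthand introduced for this sketch: $P$ collects the symmetric terms to maximize (target variance, between-class scatter) and $Q$ the symmetric terms to minimize. Fixing the denominator and taking the usual Lagrangian with a diagonal multiplier matrix $\Phi$ gives:

$$
\max_W \operatorname{Tr}(W^\top P W) \quad \text{s.t.} \quad \operatorname{Tr}(W^\top Q W) = 1
$$

$$
\mathcal{L} = \operatorname{Tr}(W^\top P W) + \operatorname{Tr}\big((I - W^\top Q W)\,\Phi\big), \qquad
\frac{\partial \mathcal{L}}{\partial W} = 0 \ \Rightarrow\ P W = Q W \Phi
$$

$W$ is therefore formed by the $k$ generalized eigenvectors with the largest eigenvalues. $A$ and $B$ are its top and bottom halves.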
- $W$ can be obtained by generalized eigenvalue decomposition.
- Algorithm:

Kernelization Analysis

- JGSA can be extended to nonlinear mappings in a Reproducing Kernel Hilbert Space (RKHS).
- Representer Theorem (Wikipedia)
- Kernel methods
- With a kernel, $X$ is replaced by $\Phi(X)$.
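Below is a compact end-to-end sketch of the iterative procedure (the "Algorithm" step) that also covers the kernelized variant. Everything here is my assumption, not the authors' code:
- `jgsa`, `_mmd` and `_scatters` are hypothetical names;
- data matrices hold one sample per column, and both domains share the feature dimension $d$;
- the first pass uses only the marginal MMD;
- target pseudo labels come from a 1-NN classifier on the projected source data;
- the kernel variant uses an RBF kernel with a hypothetical `gamma` parameter;
- the default hyperparameter values are arbitrary placeholders.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def _mmd(Xs, Xt, ys, yt):
    """Block MMD matrix [[M_s, M_st], [M_ts, M_t]] (marginal + class-conditional)."""
    ns, nt = Xs.shape[1], Xt.shape[1]
    Ls = np.full((ns, ns), 1.0 / ns**2)
    Lt = np.full((nt, nt), 1.0 / nt**2)
    Lst = np.full((ns, nt), -1.0 / (ns * nt))
    if yt is not None:                        # conditional terms need target pseudo labels
        for c in np.unique(ys):
            if not np.any(yt == c):
                continue                      # no target sample predicted as class c
            es = (ys == c) / np.sum(ys == c)
            et = (yt == c) / np.sum(yt == c)
            Ls += np.outer(es, es)
            Lt += np.outer(et, et)
            Lst -= np.outer(es, et)
    Mst = Xs @ Lst @ Xt.T
    return np.block([[Xs @ Ls @ Xs.T, Mst], [Mst.T, Xt @ Lt @ Xt.T]])

def _scatters(Xs, ys, Xt):
    """Within-class S_w and between-class S_b (source), target scatter S_t."""
    d = Xs.shape[0]
    m = Xs.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(ys):
        Xc = Xs[:, ys == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T
    Dt = Xt - Xt.mean(axis=1, keepdims=True)
    return Sw, Sb, Dt @ Dt.T                  # Dt Dt^T equals X_t H_t X_t^T

def jgsa(Xs, ys, Xt, k=10, lam=1.0, mu=1.0, beta=0.1, T=10, gamma=None):
    """Hypothetical JGSA sketch; columns are samples; gamma (assumed) enables an RBF kernel."""
    if gamma is not None:
        # Representer theorem: A = Phi(X) A', B = Phi(X) B' with X = [X_s, X_t], so X_s and
        # X_t are replaced by the kernel columns Phi(X)^T Phi(X_s) and Phi(X)^T Phi(X_t).
        X = np.hstack([Xs, Xt])
        K = np.exp(-gamma * cdist(X.T, X.T, "sqeuclidean"))
        Xs, Xt = K[:, :Xs.shape[1]], K[:, Xs.shape[1]:]
    d = Xs.shape[0]
    I, O = np.eye(d), np.zeros((d, d))
    Sw, Sb, St = _scatters(Xs, ys, Xt)
    P = np.block([[beta * Sb, O], [O, mu * St]])           # terms to maximize
    R = np.block([[lam * I + beta * Sw, -lam * I],         # subspace shift, S_w, Tr(B^T B)
                  [-lam * I, (lam + mu) * I]])
    yt = None                                              # first pass: marginal MMD only
    for _ in range(T):
        Q = _mmd(Xs, Xt, ys, yt) + R                       # positive definite if lam, mu > 0
        _, vecs = eigh(P, Q)                               # solves P w = phi Q w, phi ascending
        W = vecs[:, ::-1][:, :k]                           # k leading eigenvectors
        A, B = W[:d], W[d:]
        Zs, Zt = A.T @ Xs, B.T @ Xt
        yt = ys[np.argmin(cdist(Zt.T, Zs.T), axis=1)]      # 1-NN pseudo labels (assumed)
    return A, B, yt
```

For example, `A, B, yt_pred = jgsa(Xs, ys, Xt, k=2, T=3)` returns the two projections and the final target pseudo labels. With `gamma` set, `A` and `B` have $n_s + n_t$ rows (kernel expansion coefficients) instead of $d$. Only the MMD block depends on the pseudo labels, so the sketch builds the other terms once, outside the loop.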

experiment