Notes: Domain Adaptation on the Statistical Manifold

Category: Domain Adaptation. Tags: optimization algorithms, domain adaptation, manifold learning

Article link: https://blog.csdn.net/choubaguaihailan/article/details/71155917

Domain Adaptation on the Statistical Manifold

Yvonne-fan
- Introduction
- Algorithm
- Optimization
- Evaluation

Introduction

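I start with the background needed for this paper, in two parts: domain adaptation and unsupervised domain adaptation.

Terminology

Sample: an object that can be described by a set of features.
Label: an attribute that unambiguously assigns a sample to a class.
Domain: a collection of samples.
Labeled sample: a sample that has a label.
Unlabeled sample: a sample that has no label.
Source domain: a domain with a reasonable number of labeled samples, from which information can be transferred to the target domain.
Target domain: the domain of the problem we actually want to solve; it has few labeled samples.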

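Domain Adaptation

The figure (not preserved here) illustrates the difference between domain adaptation and traditional machine learning.
Traditional machine learning learns from the samples of each domain separately, with no interaction between domains. In domain adaptation, when the target domain has too few labeled samples, or none at all, knowledge is learned from a source domain that resembles the target domain and is used to help with the target task. In short, it is a transfer of knowledge from the source domain to the target domain.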

Unsupervised Domain Adaptation

In the unsupervised setting, the target domain has no labeled samples at all.
The usual way to handle this is to match the distributions of the source and target samples. Two approaches have been proposed: sample re-weighting and subspace extraction.
Sample re-weighting, or selection, assigns weights to the source samples and optimizes those weights so as to minimize a distance measure between the (re-weighted) source and target distributions.
Subspace-based techniques try to find a linear transformation (or projection) of the source data such that a distance measure between the (transformed) source and target distributions is minimized.
A popular choice of distance between two distributions and, according to the paper's authors, the only one that had been used for domain adaptation, is the Maximum Mean Discrepancy (MMD).

Next, the MMD itself.

Maximum Mean Discrepancy (MMD)

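Before the algorithm, here is a measure of the difference between two distributions called the Maximum Mean Discrepancy (MMD). It is the measure against which the paper positions its own Hellinger-based approach.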

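[Equation image not preserved]
Let $\mathcal{F}$ be a class of functions $f$, let $p$ and $q$ be two distributions, and let $X$ and $Y$ be sample sets drawn from $p$ and $q$ respectively. The MMD and its empirical estimate are defined on these quantities.
The MMD is the supremum, over $f \in \mathcal{F}$, of the difference between two expectations of $f$, one taken under each distribution.
The empirical estimate replaces each expectation with the mean of the function values over the corresponding sample set.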
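
The images that carried these definitions did not survive, so the equation numbering used in this post cannot be checked. As a reference point, the standard form of the MMD and of its empirical estimate, with $m$ and $n$ as assumed names for the sizes of $X$ and $Y$, reads:

$$\mathrm{MMD}(\mathcal{F}, p, q) = \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim p}[f(x)] - \mathbb{E}_{y \sim q}[f(y)] \right)$$

$$\widehat{\mathrm{MMD}}(\mathcal{F}, X, Y) = \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{j=1}^{n} f(y_j) \right)$$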
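[Equation image not preserved]
To move to a Hilbert space, function evaluation is written as an inner product (this later extends to kernel functions, which I will not go into here).

Once in the Hilbert space, the supremum of the difference of expectations reduces to the norm of the difference between the two expected feature maps, which gives the form in Eq. (4) above.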
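
To make the Hilbert-space form concrete, here is a minimal NumPy sketch of the (biased) empirical squared MMD with a Gaussian kernel. The function names `gaussian_kernel` and `mmd2`, the bandwidth `h`, and the toy data are illustrative choices and do not come from the paper.

```python
import numpy as np


def gaussian_kernel(A, B, h=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 h^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h ** 2))


def mmd2(X, Y, h=1.0):
    """Biased estimate of MMD^2 = ||mean phi(x) - mean phi(y)||^2 in the RKHS."""
    return (gaussian_kernel(X, X, h).mean()
            + gaussian_kernel(Y, Y, h).mean()
            - 2.0 * gaussian_kernel(X, Y, h).mean())


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))        # samples from p
Y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as X
Y_shift = rng.normal(3.0, 1.0, size=(200, 2))  # shifted distribution
print(mmd2(X, Y_same), mmd2(X, Y_shift))
```

The shifted set should give a clearly larger value than the set drawn from the same distribution, which is the behaviour a distribution distance needs for domain adaptation.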

Algorithm

The algorithm section begins with the Hellinger distance on statistical manifolds.

Hellinger Distance on Statistical Manifolds

MMD does not take into account the fact that probability distributions lie on a Riemannian manifold.

What is the Riemannian manifold here?
Given a non-empty set $\mathcal{X}$ and a family of probability density functions $p(x \mid \theta)$ on $\mathcal{X}$, parametrized by $\theta$, the space $\mathcal{M} = \{ p(x \mid \theta) \mid \theta \in \mathbb{R}^d \}$ forms a Riemannian manifold.

In general, the parametrization of the PDFs of the data is unknown, and choosing a specific distribution may not reflect reality. An important family of similarity measures between probability distributions is the f-divergences, which can be expressed as:

$$D_f(p \,\|\, q) = \int f\!\left( \frac{p(x)}{q(x)} \right) q(x)\, dx$$

The (squared) Hellinger distance is a special case of the f-divergences, obtained by taking $f(t) = (\sqrt{t} - 1)^2$. It can thus be written as:

$$H^2(p, q) = \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx$$

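The paper's main algorithm follows.

Empirical Estimate of the Hellinger Distance

We now build an empirical estimate of the Hellinger distance.

[Equation image not preserved]
Drawing $n_p$ samples from distribution $p$ and $n_q$ samples from distribution $q$, Eq. (1) can be approximated as:

where $T = f/(f+g)$.
In the figure above, $p$, $q$, $f$ and $g$ all denote distribution functions.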
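
The approximation itself was an image that is no longer available. One reconstruction that is consistent with the definition $T = f/(f+g)$, under the assumption that $f$ and $g$ are the densities of $p$ and $q$ and that $x_i^p$, $x_j^q$ (names chosen here) are the drawn samples, is:

$$H^2(f, g) = \int \left( \sqrt{T(x)} - \sqrt{1 - T(x)} \right)^2 \left( f(x) + g(x) \right) dx$$

$$\hat{H}^2 = \frac{1}{n_p} \sum_{i=1}^{n_p} \left( \sqrt{T(x_i^p)} - \sqrt{1 - T(x_i^p)} \right)^2 + \frac{1}{n_q} \sum_{j=1}^{n_q} \left( \sqrt{T(x_j^q)} - \sqrt{1 - T(x_j^q)} \right)^2$$

The first line holds because $(\sqrt{f} - \sqrt{g})^2 = (f + g)\,(\sqrt{T} - \sqrt{1 - T})^2$; the second replaces the integrals against $f$ and $g$ with sample means.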

It follows that:

To estimate the PDFs $f(x)$ and $g(x)$, we begin with the following definitions:

where $h$ is the bandwidth, or smoothing parameter, and

Combined with the Gaussian kernel density estimate (KDE), these give the estimates of $f$ and $g$.
[Equation image not preserved]

In other words, kernel density estimation with a Gaussian kernel is used to model the source and target distributions:
[Equation image not preserved]
Hence $T(x)$ can be expressed as:

where $k(\cdot, \cdot)$ is the Gaussian kernel function.
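
As a rough illustration of this estimate, the sketch below plugs Gaussian KDEs into $T(x)$ and averages $(\sqrt{T} - \sqrt{1 - T})^2$ over both sample sets. The exact formulas in this post were images and are not available, so the form used here, the function names, the shared bandwidth `h` and the small constant `eps` are all assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, h=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 h^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h ** 2))


def t_hat(x, Xp, Xq, h=1.0, eps=1e-12):
    """KDE estimate of T(x) = f(x) / (f(x) + g(x)).

    With one shared bandwidth the Gaussian normalising constant cancels
    in the ratio, so unnormalised kernel means are sufficient.
    """
    f = gaussian_kernel(x, Xp, h).mean(axis=1)  # density estimate from p-samples
    g = gaussian_kernel(x, Xq, h).mean(axis=1)  # density estimate from q-samples
    return f / (f + g + eps)                    # eps avoids 0/0 far from all data


def hellinger2(Xp, Xq, h=1.0):
    """Empirical squared Hellinger distance between two sample sets."""
    def term(x):
        t = np.clip(t_hat(x, Xp, Xq, h), 0.0, 1.0)
        return ((np.sqrt(t) - np.sqrt(1.0 - t)) ** 2).mean()
    return float(term(Xp) + term(Xq))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(hellinger2(X, X), hellinger2(X, X + 100.0))
```

With this unnormalised convention the estimate ranges from 0 for identical sample sets to about 2 for sets with no overlap.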

Domain Adaptation on Statistical Manifolds

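This part introduces two unsupervised domain adaptation methods built on a distribution distance defined on the statistical manifold: sample re-weighting (selection) and subspace projection.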

Statistically Invariant Sample Selection (SISS)

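SISS assigns weights to the source samples and optimizes them by minimizing the difference between the (re-weighted) source distribution and the target distribution. Here the Hellinger distance measures that difference.
$\alpha$ indicates whether each source sample is selected into the reference set. It takes values in {0, 1} and is chosen so that the distribution of the reference set is as close as possible to that of the target domain.
The resulting problem is:
[Equation image not preserved]
where $y_{i,c}$ is a binary variable indicating whether the $i$-th source sample belongs to class $c$, and $C$ is the total number of classes. The last constraint ensures that the class proportions in the selected source set match those of the original source set.
[Equation image not preserved]
The weights $\alpha$ must also enter the density estimate, so $T(x)$ is rewritten with the weights included.
Because the binary constraint in Eq. (6) is hard to handle, Eq. (6) is relaxed to the following problem:

where $\beta_i$ is a variable that replaces $\alpha_i / \sum_j \alpha_j$.
MATLAB's solver fmincon is used to solve the nonlinear problem (7), and the binary weights $\alpha$ are then obtained by thresholding $\beta$.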
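
The sketch below only evaluates a relaxed, weighted version of the Hellinger estimate for a given $\beta$ and then thresholds $\beta$ into binary weights. It does not reproduce the fmincon optimization or the class-proportion constraint. The weighted form of $T(x)$, the function names and the threshold `tau` are assumptions made for illustration, since the original equations were images.

```python
import numpy as np


def gaussian_kernel(A, B, h=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 h^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h ** 2))


def weighted_hellinger2(Xs, Xt, beta, h=1.0, eps=1e-12):
    """Hellinger estimate between the beta-weighted source KDE and the target KDE.

    beta is assumed non-negative and summing to 1 (the relaxed weights).
    """
    def dist_term(x):
        f = gaussian_kernel(x, Xs, h) @ beta        # weighted source density
        g = gaussian_kernel(x, Xt, h).mean(axis=1)  # target density
        t = np.clip(f / (f + g + eps), 0.0, 1.0)
        return (np.sqrt(t) - np.sqrt(1.0 - t)) ** 2

    return float(beta @ dist_term(Xs) + dist_term(Xt).mean())


rng = np.random.default_rng(0)
Xt = rng.normal(0.0, 1.0, size=(50, 2))
Xs = np.vstack([Xt, Xt + 100.0])  # half matches the target, half is far away

uniform = np.full(100, 1.0 / 100)                                 # keep everything
focused = np.concatenate([np.full(50, 1.0 / 50), np.zeros(50)])   # keep matching half

tau = 0.5 / len(Xs)                      # hypothetical threshold on beta
alpha = (focused > tau).astype(int)      # binary selection weights
print(weighted_hellinger2(Xs, Xt, uniform), weighted_hellinger2(Xs, Xt, focused))
```

In this toy setup the weights that concentrate on the matching half of the source give a much smaller distance than uniform weights, which is the effect the selection is meant to exploit.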

Statistically Invariant Embedding (SIE)

The goal is to find a representation of the data that is invariant across the source and target domains. To this end, SIE seeks to project the data to a low-dimensional latent space shared by both domains:
[Equation image not preserved]
The projection matrix $W$ is required to be orthogonal, which makes for a well-behaved dimensionality reduction.

Encode the class information
Class information can be encoded by encouraging the source samples that belong to the same class to cluster in the latent space.
[Equation image not preserved]
Here $u$ denotes the mean of the samples of class $c$.
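
A minimal sketch of the two ingredients named above: a projection $W$ with orthonormal columns and a class-clustering penalty on the projected source samples. The penalty used here, the sum of squared distances of each projected sample to its projected class mean, is one plausible reading of the description; the actual Eq. (9) was an image, and the names `random_orthonormal` and `class_clustering` are made up for this example.

```python
import numpy as np


def random_orthonormal(D, d, seed=0):
    """Return a D x d matrix with orthonormal columns (W^T W = I) via reduced QR."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(D, d)))
    return Q


def class_clustering(W, Xs, ys):
    """Sum of squared distances to the class mean in the latent space Z = Xs W.

    Xs holds one source sample per row; ys holds the class label of each row.
    """
    Z = Xs @ W
    total = 0.0
    for c in np.unique(ys):
        Zc = Z[ys == c]
        total += ((Zc - Zc.mean(axis=0)) ** 2).sum()
    return float(total)


W = random_orthonormal(5, 2)
Xs = np.random.default_rng(1).normal(size=(20, 5))
ys = np.repeat([0, 1], 10)
print(class_clustering(W, Xs, ys))
```

A smaller value means that samples of the same class sit closer together after projection, so adding this term to the distribution-matching objective favours projections that keep classes compact.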

Optimization of Statistically Invariant Embedding (SIE)

Eqs. (8) and (9) are nonlinear, constrained problems. They are optimized with a conjugate gradient (CG) method, which proceeds in three steps:

(i) compute the gradient of the objective function on the manifold,
(ii) determine a search direction based on this gradient,
(iii) perform a line search along a geodesic on the manifold.
Note that the gradient on the manifold is obtained from the usual gradient of the objective function with respect to $W$.
With $W$ in hand, we can train an SVM classifier on the projected source samples and then apply the model to the target samples.
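
The three steps can be sketched on a toy problem. The code below is a simplified stand-in, not the paper's optimizer: it assumes the constraint set is the set of matrices with orthonormal columns, uses plain gradient descent instead of conjugate gradient, a fixed step instead of a line search, and a QR retraction instead of an exact geodesic. The toy objective $-\mathrm{tr}(W^\top A W)$ and all names are illustrative.

```python
import numpy as np


def riemannian_grad(W, G):
    """Project the Euclidean gradient G onto the tangent space at W: G - W W^T G."""
    return G - W @ (W.T @ G)


def retract(W):
    """Map a matrix back to orthonormal columns with a reduced QR factorisation."""
    Q, _ = np.linalg.qr(W)
    return Q


def objective(W, A):
    """Toy objective: minimising it pulls W toward the top eigenvectors of A."""
    return float(-np.trace(W.T @ A @ W))


def euclidean_grad(W, A):
    """Usual gradient of the toy objective with respect to W (A symmetric)."""
    return -2.0 * A @ W


A = np.diag([4.0, 3.0, 2.0, 1.0])
W = retract(np.random.default_rng(0).normal(size=(4, 2)))
f0 = objective(W, A)
for _ in range(500):
    R = riemannian_grad(W, euclidean_grad(W, A))  # step (i): gradient on the manifold
    W = retract(W - 0.02 * R)                     # steps (ii)-(iii), simplified
f1 = objective(W, A)
print(f0, f1)
```

The projection in `riemannian_grad` is what "obtained from the usual gradient with respect to $W$" looks like in this simplified setting: the Euclidean gradient is computed first and its component along $W$ is then removed.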

Experiment

Visual Object Recognition

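Dataset

Dataset	Visual Recognition
Domains	Caltech, Amazon, DSLR, Webcam

The image dataset contains four domains, and different pairs of domains take turns as source and target.

[Image not preserved]
Table 1 compares SISS with the competing methods. The first two use no adaptation; they are followed by LM as the baseline method, KMM, and KMM-LM, which replaces MMD with the Hellinger distance. The paper's algorithm reaches a higher classification accuracy than the others.
Table 2 compares SIE and SIE-CC with other algorithms, where SIE-CC is the method of Eq. (9). Again, the paper's algorithm reaches a higher classification accuracy than the others.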