Theoretical Foundations of Conditional Adversarial Domain Adaptation

Reproducing Kernel Hilbert Space

Given a nonempty set $\mathcal{X}$, let $\mathcal{H}$ be a Hilbert space of functions $f : \mathcal{X} \to \mathbb{R}$. Then $\mathcal{H}$ is called a Reproducing Kernel Hilbert Space, endowed with the inner product $\langle \cdot, \cdot \rangle$, if there exists a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with the following properties:

  1. $\langle f(\cdot), k(x, \cdot) \rangle = f(x)$; in particular, $\langle k(x, \cdot), k(x', \cdot) \rangle = k(x, x')$
  2. $k$ spans $\mathcal{H}$: $\mathcal{H} = \overline{\mathrm{span}}\{ k(x, \cdot) \mid x \in \mathcal{X} \} = \{ f(\cdot) = \sum_{i=1}^m \alpha_i k(x_i, \cdot) : m \in \mathbb{N},\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R} \}$
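
As a sanity check, the reproducing property can be verified numerically with a kernel whose feature map is known in closed form. Below is a minimal sketch, assuming the homogeneous polynomial kernel $k(x, x') = \langle x, x' \rangle^2$ on $\mathbb{R}^2$, whose explicit feature map is $\varphi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$; the kernel choice and all names are illustrative:

```python
import numpy as np

# For k(x, x') = <x, x'>^2 on R^2, an explicit feature map is
# phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2), so <phi(x), phi(x')> = k(x, x'):
# property 1's identity <k(x,.), k(x',.)> = k(x, x') made concrete.

def k(x, xp):
    return np.dot(x, xp) ** 2

def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

rng = np.random.default_rng(0)
x, xp = rng.normal(size=2), rng.normal(size=2)
assert np.isclose(np.dot(phi(x), phi(xp)), k(x, xp))

# An element f = sum_i alpha_i k(x_i, .) is evaluated by the reproducing
# property: <f, k(x,.)> = sum_i alpha_i k(x_i, x) = f(x).
xs = rng.normal(size=(5, 2))
alphas = rng.normal(size=5)
f_x = sum(a * k(xi, x) for a, xi in zip(alphas, xs))              # f(x)
inner = np.dot(sum(a * phi(xi) for a, xi in zip(alphas, xs)), phi(x))
assert np.isclose(inner, f_x)
```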

Hilbert Space Embedding

$\mathcal{X}$: the domain of observations
$\mathbf{P}_x$: a probability measure on $\mathcal{X}$

$\mathcal{Y}$: a second domain of observations
$\mathbf{P}_y$: a probability measure on $\mathcal{Y}$
$\mathbf{P}_{x \times y}$: a joint probability measure on $\mathcal{X} \times \mathcal{Y}$

Let $\mathcal{H}$ be a reproducing kernel Hilbert space (RKHS) of functions on $\mathcal{X}$ with kernel $k(x, x') := \langle \varphi(x), \varphi(x') \rangle$.

mean map $\mu$:
$$\mu[\mathbf{P}_x] := \mathbf{E}_x[k(x, \cdot)]$$
$$\mu[X] := \frac{1}{m} \sum_{i=1}^m k(x_i, \cdot)$$

Then $\mu[\mathbf{P}_x]$ is an element of the Hilbert space, so
$$\langle \mu[\mathbf{P}_x], f \rangle = \mathbf{E}_x[f(x)]$$
$$\langle \mu[X], f \rangle = \frac{1}{m} \sum_{i=1}^m f(x_i)$$
where $X = \{x_1, x_2, \cdots, x_m\}$ is assumed to be drawn independently and identically distributed from $\mathbf{P}_x$, and $\mu[X]$ is an estimate of the mean map. In matrix form,
$$\mu[X] = \frac{1}{m} \sum_{i=1}^m k(x_i, \cdot) = \frac{1}{m} \boldsymbol{\gamma} \mathbf{1}_m,$$
where $\boldsymbol{\gamma} := (k(x_1, \cdot), k(x_2, \cdot), \cdots, k(x_m, \cdot))$.
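
Because inner products with $\mu[X]$ reduce to averages of kernel evaluations, quantities such as $\langle \mu[X], \mu[X'] \rangle$ need only the Gram matrix. Below is a minimal sketch with a Gaussian kernel (bandwidth, sample sizes, and distributions are illustrative), which also computes the squared MMD, the standard distance between mean embeddings from the referenced "A Hilbert Space Embedding for Distributions":

```python
import numpy as np

def kernel_matrix(X, Xp, sigma=1.0):
    """Gaussian Gram matrix K[i, j] = k(x_i, x'_j)."""
    sq = np.sum((X[:, None, :] - Xp[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, size=(500, 2))    # sample from P_x
Xp = rng.normal(loc=0.5, size=(500, 2))   # sample from a shifted distribution

# <mu[X], mu[X']> = (1/(m m')) sum_{i,j} k(x_i, x'_j):
# inner products of mean embeddings need only kernel evaluations.
dot_XXp = kernel_matrix(X, Xp).mean()

# Squared MMD = ||mu[X] - mu[X']||^2, a distance between the embeddings.
mmd2 = kernel_matrix(X, X).mean() - 2 * dot_XXp + kernel_matrix(Xp, Xp).mean()
print(f"MMD^2 between the two samples: {mmd2:.4f}")
```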

Covariance operators

Given a joint probability measure $\mathbf{P}_{x \times y}$ on $\mathcal{X} \times \mathcal{Y}$, the uncentered covariance operator $\mathcal{C}_{XY}$ (Baker, 1973) is defined as $\mathcal{C}_{XY} := \mathbb{E}_{XY}[\varphi(x) \otimes \phi(y)]$, where $\otimes$ denotes the tensor product.

Given $m$ pairs of i.i.d. observations $\{(x^l, y^l)\}_{l=1}^m$, we write $\boldsymbol{\gamma} = (\varphi(x^1), \varphi(x^2), \cdots, \varphi(x^m))$ and $\Phi = (\phi(y^1), \phi(y^2), \cdots, \phi(y^m))$. Conceptually, the covariance operator $\mathcal{C}_{XY}$ can then be estimated as
$$\hat{\mathcal{C}}_{XY} = \frac{1}{m} \boldsymbol{\gamma} \Phi^{\top}.$$
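
With explicit, finite-dimensional feature maps, $\hat{\mathcal{C}}_{XY} = \frac{1}{m} \boldsymbol{\gamma} \Phi^{\top}$ is an ordinary matrix, which makes the formula easy to check. A minimal sketch follows; the feature maps `phi_x`, `phi_y` and the data-generating process are illustrative assumptions:

```python
import numpy as np

# Explicit feature maps keep the operator an ordinary matrix:
# C_hat_XY = (1/m) * Gamma @ Phi.T, of shape (dim F, dim G).

def phi_x(x):
    """Assumed feature map for X (linear plus interaction term)."""
    return np.array([x[0], x[1], x[0] * x[1]])

def phi_y(y):
    """Assumed feature map for Y (linear plus quadratic term)."""
    return np.array([y[0], y[0] ** 2])

rng = np.random.default_rng(0)
m = 1000
Xs = rng.normal(size=(m, 2))
Ys = Xs[:, :1] + 0.1 * rng.normal(size=(m, 1))     # y statistically depends on x

Gamma = np.stack([phi_x(x) for x in Xs], axis=1)   # columns are phi(x^l)
Phi = np.stack([phi_y(y) for y in Ys], axis=1)     # columns are phi(y^l)

C_XY_hat = (Gamma @ Phi.T) / m                     # shape (3, 2)
print(C_XY_hat)
```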
Notes on mean embeddings and covariance operators

Tensor product: Notes on Tensor Products and the Exterior Algebra

Tensor Product Kernels: Characteristic Property and Universality
Computing the tensor product

From functional analysis:
Let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space over a field $\mathbb{F}$. For each $x \in X$, define $\Vert x \Vert := \sqrt{\langle x, x \rangle}$. Then $\Vert \cdot \Vert$ defines a norm on $X$. That is, $(X, \Vert \cdot \Vert)$ is a normed linear space over $\mathbb{F}$.
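
The only non-obvious norm axiom here is the triangle inequality, which follows from the Cauchy–Schwarz inequality $|\langle x, y \rangle| \le \Vert x \Vert \Vert y \Vert$:
$$\Vert x + y \Vert^2 = \langle x + y, x + y \rangle = \Vert x \Vert^2 + 2\langle x, y \rangle + \Vert y \Vert^2 \le \Vert x \Vert^2 + 2\Vert x \Vert \Vert y \Vert + \Vert y \Vert^2 = (\Vert x \Vert + \Vert y \Vert)^2$$
(over $\mathbb{F} = \mathbb{C}$, replace $2\langle x, y \rangle$ with $2\,\mathrm{Re}\,\langle x, y \rangle$), so $\Vert x + y \Vert \le \Vert x \Vert + \Vert y \Vert$.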

Question:
The original paper mentions ...

Conditional embedding operators

By analogy with the embedding of marginal distributions, the conditional density $\mathbb{P}(Y|x)$ can also be represented as an RKHS element: $\mu[Y|x] := \mathbb{E}_{Y|x}[\phi(Y)]$, with each element corresponding to a particular value of $x$.
These conditional embeddings can be defined via a conditional embedding operator $\mathcal{C}_{Y|X} : \mathcal{F} \to \mathcal{G}$:
$$\mu[Y|x] = \mathcal{C}_{Y|X} \varphi(x) := \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \varphi(x)$$

Given $m$ pairs of i.i.d. observations $\{(x^l, y^l)\}_{l=1}^m$ from $\mathbb{P}_{x \times y}$, the conditional embedding operator can be estimated as
$$\hat{\mathcal{C}}_{Y|X} = \frac{\Phi \boldsymbol{\gamma}^{\top}}{m} \left( \frac{\boldsymbol{\gamma} \boldsymbol{\gamma}^{\top}}{m} + \lambda I \right)^{-1} = \Phi (K + \lambda m I)^{-1} \boldsymbol{\gamma}^{\top},$$
where $K := \boldsymbol{\gamma}^{\top} \boldsymbol{\gamma}$ with $(i,j)$-th entry $k(x_i, x_j)$.
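
In this kernel form, applying $\hat{\mathcal{C}}_{Y|X}$ to $\varphi(x)$ gives $\mu[Y|x] \approx \sum_i \beta_i(x) \phi(y^i)$ with weights $\beta(x) = (K + \lambda m I)^{-1} \boldsymbol{\gamma}^{\top} \varphi(x)$, since $\boldsymbol{\gamma}^{\top} \varphi(x)$ is just the vector $(k(x_i, x))_i$; conditional expectations then reduce to weighted sums over the sample. A minimal sketch, where the Gaussian kernel, toy regression data, and regularizer $\lambda$ are illustrative choices:

```python
import numpy as np

def gauss_K(A, B, sigma=1.0):
    """Gram matrix K[i, j] = k(a_i, b_j) for a Gaussian kernel."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
m, lam = 300, 1e-3
Xs = rng.uniform(-2, 2, size=(m, 1))
Ys = np.sin(Xs[:, 0]) + 0.05 * rng.normal(size=m)   # P(Y|x) centered at sin(x)

K = gauss_K(Xs, Xs)                                 # K_ij = k(x_i, x_j)

# Weights beta(x*) = (K + lam*m*I)^{-1} k_{x*}, with k_{x*} = gamma^T phi(x*).
x_star = np.array([[1.0]])
k_x = gauss_K(Xs, x_star)[:, 0]
beta = np.linalg.solve(K + lam * m * np.eye(m), k_x)

# E[g(Y) | x*] ~= sum_i beta_i * g(y_i); here g(y) = y.
print("estimated E[Y | x*=1]:", beta @ Ys, " vs sin(1) =", np.sin(1.0))
```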

References

A Hilbert Space Embedding for Distributions (PDF)
Hilbert Space Embeddings of Hidden Markov Models
Hilbert Space Embeddings of Hidden Markov Models (slides)

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

  • class of generators: $\{ G_u, u \in \mathcal{U} \}$, where $G_u$ is a function $\mathbb{R}^l \to \mathbb{R}^d$ and $u$ denotes the generator's parameters.
  • $x = G_u(h)$, where $h$ is drawn from an $l$-dimensional spherical Gaussian distribution.
  • class of discriminators: $\{ D_v, v \in \mathcal{V} \}$, where $D_v$ is a function $\mathbb{R}^d \to [0, 1]$.
  • $D_v(x)$ is usually interpreted as the probability that the sample $x$ comes from the real distribution $\mathcal{D}_{real}$.

Objective function:
$$\min_{u \in \mathcal{U}} \max_{v \in \mathcal{V}} \mathbb{E}_{x \sim \mathcal{D}_{real}} [\log D_v(x)] + \mathbb{E}_{x \sim \mathcal{D}_{G_u}} [\log(1 - D_v(x))]$$
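
A minimal training sketch of this minimax objective, assuming PyTorch, tiny MLPs for $G_u$ and $D_v$, and a synthetic Gaussian stand-in for $\mathcal{D}_{real}$; all layer sizes and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

# Tiny MLPs standing in for G_u and D_v; sizes are illustrative.
l_dim, d_dim = 4, 2
G = nn.Sequential(nn.Linear(l_dim, 32), nn.ReLU(), nn.Linear(32, d_dim))
D = nn.Sequential(nn.Linear(d_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
eps = 1e-8  # numerical safety inside the logs

for step in range(1000):
    x_real = 0.5 * torch.randn(64, d_dim) + 1.0   # stand-in for D_real
    h = torch.randn(64, l_dim)                    # l-dim spherical Gaussian
    x_fake = G(h)

    # Discriminator ascends E[log D_v(x)] + E[log(1 - D_v(G_u(h)))].
    loss_D = -(torch.log(D(x_real) + eps).mean()
               + torch.log(1 - D(x_fake.detach()) + eps).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator descends E[log(1 - D_v(G_u(h)))] over u.
    loss_G = torch.log(1 - D(G(h)) + eps).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```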

The original author's commentary (Zhihu)
Latest | Columbia Machine Learning Workshop: a summary of recent advances in GAN theory and applications (with video)
Mixture of Gaussian distributions
