Reproducing Kernel Hilbert Space
Given a nonempty set $\mathcal{X}$, let $\mathcal{H}$ be a Hilbert space of functions $f: \mathcal{X} \to \mathbb{R}$. Then $\mathcal{H}$ is called a Reproducing Kernel Hilbert Space endowed with the inner product $\langle \cdot, \cdot \rangle$ if there exists a function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with the following properties:
- $\langle f(\cdot), k(x,\cdot) \rangle = f(x)$; in particular, $\langle k(x,\cdot), k(x',\cdot) \rangle = k(x,x')$ (the reproducing property)
- $k$ spans $\mathcal{H}$: $\mathcal{H} = \operatorname{span}\{ k(x,\cdot) \mid x \in \mathcal{X} \} = \{ f(\cdot) = \sum_{i=1}^m \alpha_i k(x_i,\cdot) : m \in \mathbb{N},\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R} \}$
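As a concrete sanity check, the reproducing property can be verified numerically for functions in the span of the kernel. The Gaussian RBF kernel and the specific centers and coefficients below are assumed purely for illustration:

```python
import numpy as np

def k(xs, zs, sigma=1.0):
    # Gaussian RBF kernel (an assumed choice; any positive-definite kernel works)
    xs, zs = np.asarray(xs, float), np.asarray(zs, float)
    return np.exp(-(xs[:, None] - zs[None, :]) ** 2 / (2 * sigma**2))

def span_inner(a, xs, b, zs):
    # <sum_i a_i k(x_i,.), sum_j b_j k(z_j,.)> = sum_ij a_i b_j k(x_i, z_j)
    return a @ k(xs, zs) @ b

# f = sum_i alpha_i k(x_i, .), an element of H built from illustrative centers
alpha = np.array([0.5, -1.0, 2.0])
xs = np.array([-1.0, 0.0, 1.5])

x = np.array([0.7])
lhs = span_inner(alpha, xs, np.array([1.0]), x)  # <f, k(x,.)>
fx = float(alpha @ k(xs, x))                     # pointwise evaluation f(x)
assert np.isclose(lhs, fx)
```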
Hilbert Space Embedding
- $\mathcal{X}$: the domain of observations
- $\mathbf{P}_x$: a probability measure on $\mathcal{X}$
- $\mathcal{Y}$: a second domain of observations
- $\mathbf{P}_y$: a probability measure on $\mathcal{Y}$
- $\mathbf{P}_{x \times y}$: a joint probability measure on $\mathcal{X} \times \mathcal{Y}$
- $\mathcal{H}$: a reproducing kernel Hilbert space (RKHS) of functions on $\mathcal{X}$ with kernel $k(x,x') := \langle \varphi(x), \varphi(x') \rangle$
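For kernels with a known finite-dimensional feature map, the identity $k(x,x') = \langle \varphi(x), \varphi(x') \rangle$ can be checked directly. The sketch below uses the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$, whose explicit feature map is standard; the specific inputs are assumed for illustration:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 homogeneous polynomial kernel on R^2:
    # k(x, x') = (x . x')^2 = <phi(x), phi(x')>
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

k_direct = (x @ xp) ** 2       # kernel evaluated directly: (3 - 2)^2 = 1
k_feature = phi(x) @ phi(xp)   # inner product of explicit features
assert np.isclose(k_direct, k_feature)
```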
The mean map $\mu$ and its empirical estimate are defined as
$$\mu[\mathbf{P}_x] := \mathbf{E}_x[k(x,\cdot)], \qquad \mu[X] := \frac{1}{m} \sum_{i=1}^m k(x_i,\cdot)$$
Then $\mu[\mathbf{P}_x]$ is an element of the Hilbert space, so
$$\langle \mu[\mathbf{P}_x], f \rangle = \mathbf{E}_x[f(x)], \qquad \langle \mu[X], f \rangle = \frac{1}{m} \sum_{i=1}^m f(x_i)$$
where $X = \{x_1, x_2, \cdots, x_m\}$ is assumed to be drawn independently and identically distributed from $\mathbf{P}_x$, and $\mu[X]$ is an estimate of the mean map.
In matrix form,
$$\mu[X] = \frac{1}{m} \sum_{i=1}^m k(x_i,\cdot) = \frac{1}{m} \gamma \mathbf{1}_m, \quad \text{where} \quad \gamma := (k(x_1,\cdot), k(x_2,\cdot), \cdots, k(x_m,\cdot))$$
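A small numerical sketch of the identity $\langle \mu[X], f \rangle = \frac{1}{m} \sum_i f(x_i)$, assuming a Gaussian kernel and a test function $f$ in the span of the kernel (all specific choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def k(xs, zs, sigma=1.0):
    # Gaussian RBF kernel (assumed for illustration)
    xs, zs = np.asarray(xs, float), np.asarray(zs, float)
    return np.exp(-(xs[:, None] - zs[None, :]) ** 2 / (2 * sigma**2))

# Sample X = {x_1, ..., x_m}, i.i.d. draws standing in for P_x
m = 200
X = rng.normal(size=m)

# f = sum_j alpha_j k(z_j, .), a test function in H
alpha = np.array([1.0, -0.5])
zs = np.array([0.0, 1.0])

# <mu[X], f> = (1/m) sum_i sum_j alpha_j k(x_i, z_j)
lhs = (k(X, zs) @ alpha).mean()
# (1/m) sum_i f(x_i), evaluating f pointwise
rhs = np.mean([alpha @ k(zs, [x]).ravel() for x in X])
assert np.isclose(lhs, rhs)
```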
Covariance operators
Given a joint probability measure $\mathbf{P}_{x \times y}$ on $\mathcal{X} \times \mathcal{Y}$, the uncentered covariance operator $\mathcal{C}_{XY}$ (Baker, 1973) is defined as $\mathcal{C}_{XY} := \mathbb{E}_{XY}[\varphi(x) \otimes \phi(y)]$, where $\otimes$ denotes the tensor product.
Given $m$ pairs of i.i.d. observations $\{ (x^l, y^l) \}_{l=1}^m$, we denote by $\gamma = (\varphi(x^1), \varphi(x^2), \cdots, \varphi(x^m))$ and $\Phi = (\phi(y^1), \phi(y^2), \cdots, \phi(y^m))$. Conceptually, the covariance operator $\mathcal{C}_{XY}$ can then be estimated as
$$\hat{\mathcal{C}}_{XY} = \frac{1}{m} \gamma \Phi^T$$
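With an explicit finite-dimensional feature map (assumed here for illustration, so that the operator becomes an ordinary matrix), the estimator $\hat{\mathcal{C}}_{XY} = \frac{1}{m} \gamma \Phi^T$ can be computed and sanity-checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed explicit feature maps; with finite features, C_XY is a plain matrix
def varphi(x):  # features on X
    return np.array([x, x**2])

def phi(y):     # features on Y
    return np.array([y, np.sin(y)])

m = 500
xs = rng.normal(size=m)
ys = 2 * xs + rng.normal(scale=0.1, size=m)  # toy joint distribution

Gamma = np.stack([varphi(x) for x in xs], axis=1)  # gamma, shape (2, m)
Phi = np.stack([phi(y) for y in ys], axis=1)       # Phi,   shape (2, m)

# \hat C_XY = (1/m) gamma Phi^T, the empirical (uncentered) covariance operator
C_hat = Gamma @ Phi.T / m

# Sanity check: entry (i, j) is the empirical mean of varphi_i(x) * phi_j(y)
assert np.isclose(C_hat[0, 0], np.mean(xs * ys))
```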
Notes on mean embeddings and covariance operators
tensor product: Notes on Tensor Products and the Exterior Algebra
Tensor Product Kernels: Characteristic Property and Universality
Computing tensor products
From functional analysis we have:
Let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space over a field $\mathbb{F}$. For each $x \in X$, define $\Vert x \Vert := \sqrt{\langle x, x \rangle}$. Then $\Vert \cdot \Vert$ defines a norm on $X$; that is, $(X, \Vert \cdot \Vert)$ is a normed linear space over $\mathbb{F}$.
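The induced norm can be spot-checked numerically on random vectors; this is of course no proof, just a sanity check of the axioms for the standard dot product on $\mathbb{R}^n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(x):
    # ||x|| := sqrt(<x, x>), here with the standard dot product on R^n
    return np.sqrt(x @ x)

# Spot-check the norm axioms on random vectors
for _ in range(100):
    x, y = rng.normal(size=5), rng.normal(size=5)
    c = rng.normal()
    assert norm(x) >= 0                                      # non-negativity
    assert np.isclose(norm(c * x), abs(c) * norm(x))         # absolute homogeneity
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12          # triangle inequality (Cauchy-Schwarz)
```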
Question:
The original paper mentions:
Conditional embedding operators
By analogy with the embedding of marginal distributions, the conditional density $\mathbb{P}(Y|x)$ can also be represented as an RKHS element:
$$\mu[Y|x] := \mathbb{E}_{Y|x}[\phi(Y)]$$
with each element corresponding to a particular value of $x$.
These conditional embeddings can be defined via a conditional embedding operator $\mathcal{C}_{Y|X}: \mathcal{F} \to \mathcal{G}$:
$$\mu[Y|x] = \mathcal{C}_{Y|X} \varphi(x) := \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \varphi(x)$$
Given $m$ pairs of i.i.d. observations $\{ (x^l, y^l) \}_{l=1}^m$ from $\mathbb{P}_{x \times y}$, the conditional embedding operator can be estimated as
$$\hat{\mathcal{C}}_{Y|X} = \frac{\Phi \gamma^T}{m} \left( \frac{\gamma \gamma^T}{m} + \lambda I \right)^{-1} = \Phi (K + \lambda m I)^{-1} \gamma^T$$
where $K := \gamma^T \gamma$ with $(i,j)$th entry $k(x_i, x_j)$.
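A minimal sketch of this estimator, assuming a toy model $y = 2x + \varepsilon$ and a linear feature map $\phi(y) = y$ on the output side, in which case $\mu[Y|x] = \Phi (K + \lambda m I)^{-1} k_x$ reduces to an estimate of $\mathbb{E}[Y|x]$ (all specific choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gram(xs, zs, sigma=0.5):
    # Gaussian kernel on the input domain (assumed for illustration)
    xs, zs = np.asarray(xs, float), np.asarray(zs, float)
    return np.exp(-(xs[:, None] - zs[None, :]) ** 2 / (2 * sigma**2))

# Training pairs from the assumed toy model y = 2x + noise
m = 300
xs = rng.uniform(-1, 1, size=m)
ys = 2 * xs + rng.normal(scale=0.05, size=m)

lam = 1e-3
K = gram(xs, xs)  # K := gamma^T gamma, entries k(x_i, x_j)
# Weights (K + lam*m*I)^{-1} k_x at the query point x = 0.5
W = np.linalg.solve(K + lam * m * np.eye(m), gram(xs, [0.5])).ravel()

# With phi(y) = y, mu[Y|x] = Phi (K + lam m I)^{-1} k_x is a weighted sum
# of the training y's and estimates E[Y|x] = 2x = 1.0 at x = 0.5
mu_y_given_x = ys @ W
assert abs(mu_y_given_x - 1.0) < 0.1
```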
References
A Hilbert Space Embedding for Distributions PDF
Hilbert Space Embeddings of Hidden Markov Models
Hilbert Space Embeddings of Hidden Markov Models ppt
Generalization and Equilibrium in Generative Adversarial Nets (GANs)
- class of generators: $\{ G_u, u \in \mathcal{U} \}$, where $G$ is a function $\mathbb{R}^l \to \mathbb{R}^d$ and $u$ denotes the parameters of the generator
- $x = G_u(h)$, where $h$ is drawn from an $l$-dimensional spherical Gaussian distribution
- class of discriminators: $\{ D_v, v \in \mathcal{V} \}$, where $D$ is a function $\mathbb{R}^d \to [0,1]$
- $D_v(x)$ is usually interpreted as the probability that the sample $x$ comes from the real distribution $\mathcal{D}_{real}$
Objective function:
$$\min_{u \in \mathcal{U}} \max_{v \in \mathcal{V}} \mathbb{E}_{x \sim \mathcal{D}_{real}}[\log D_v(x)] + \mathbb{E}_{x \sim \mathcal{D}_{G_u}}[\log(1 - D_v(x))]$$
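The objective can be estimated by Monte Carlo for fixed (untrained) players; the one-dimensional generator and logistic discriminator below are assumed purely to make the expression concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch with hand-picked, untrained players (assumed for illustration):
# real data ~ N(2, 1); generator G_u(h) = u + h maps Gaussian noise to R
u = 0.0                                  # generator parameter (scalar here)
G = lambda h: u + h

def D(x, v=1.0):
    # logistic discriminator D_v(x) = sigmoid(v * (x - 1)): probability x is "real"
    return 1.0 / (1.0 + np.exp(-v * (x - 1.0)))

m = 10000
x_real = rng.normal(loc=2.0, size=m)     # samples from D_real
x_fake = G(rng.normal(size=m))           # samples from D_{G_u}, here N(0, 1)

# Monte Carlo estimate of E_real[log D_v(x)] + E_fake[log(1 - D_v(x))]
objective = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(x_fake)))
# Both log terms are nonpositive, so the estimate is negative
assert objective < 0.0
```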