Lemma 1 Let $(\mathcal{X}, d)$ be a metric space, and let $p$, $q$ be two Borel probability measures defined on $\mathcal{X}$. Then $p = q$ if and only if $\mathbf{E}_x(f(x)) = \mathbf{E}_y(f(y))$ for all $f \in C(\mathcal{X})$, where $C(\mathcal{X})$ is the space of bounded continuous functions on $\mathcal{X}$.
It is not practical, however, to work with such a rich function class in the finite sample setting. We thus define a more general class of statistic, for as yet unspecified function classes $\mathcal{F}$, to measure the disparity between $p$ and $q$.
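Definition 2 Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to \mathbb{R}$ and let $p, q, x, y, X, Y$ be defined as above. We define the maximum mean discrepancy (MMD) as
$$\mathrm{MMD}[\mathcal{F}, p, q] := \sup_{f \in \mathcal{F}} \left( \mathbf{E}_x[f(x)] - \mathbf{E}_y[f(y)] \right).$$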
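In the statistics literature, this is known as an integral probability metric. A biased empirical estimate of the MMD is obtained by replacing the population expectations with empirical expectations computed on the samples $X$ and $Y$,
$$\mathrm{MMD}_b[\mathcal{F}, X, Y] := \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{i=1}^{n} f(y_i) \right),$$
where $m$ and $n$ are the numbers of observations in $X$ and $Y$ respectively.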
We must therefore identify a function class that is rich enough to uniquely identify whether $p = q$, yet restrictive enough to provide useful finite sample estimates.
Lemma 3 If $k(\cdot,\cdot)$ is measurable and $\mathbf{E}_x \sqrt{k(x,x)} < \infty$ then $\mu_p \in \mathcal{H}$.
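One way to see why the condition in Lemma 3 suffices is sketched below. The sketch assumes that $\mathcal{H}$ is the RKHS with reproducing kernel $k$, and it introduces the notation $\phi(x) := k(x,\cdot)$ for the canonical feature map purely for this illustration. Under these assumptions the linear functional $f \mapsto \mathbf{E}_x f(x)$ is bounded on $\mathcal{H}$:

```latex
\begin{align*}
\left| \mathbf{E}_x f(x) \right|
  &\le \mathbf{E}_x \left| f(x) \right|
   = \mathbf{E}_x \left| \langle f, \phi(x) \rangle_{\mathcal{H}} \right|
   && \text{(reproducing property)} \\
  &\le \| f \|_{\mathcal{H}} \, \mathbf{E}_x \| \phi(x) \|_{\mathcal{H}}
   && \text{(Cauchy--Schwarz)} \\
  &= \| f \|_{\mathcal{H}} \, \mathbf{E}_x \sqrt{k(x,x)} < \infty .
\end{align*}
```

The Riesz representation theorem then yields an element $\mu_p \in \mathcal{H}$ with $\mathbf{E}_x f(x) = \langle f, \mu_p \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$.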
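Lemma 4 Assume the condition in Lemma 3 for the existence of the mean embeddings $\mu_p$, $\mu_q$ is satisfied. Then
$$\mathrm{MMD}^2[\mathcal{F}, p, q] = \left\| \mu_p - \mu_q \right\|_{\mathcal{H}}^2 .$$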
Theorem 5 Let $\mathcal{F}$ be a unit ball in a universal RKHS $\mathcal{H}$, defined on the compact metric space $\mathcal{X}$, with associated continuous kernel $k(\cdot,\cdot)$. Then $\mathrm{MMD}[\mathcal{F}, p, q] = 0$ if and only if $p = q$.
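When the function class is a unit ball in an RKHS, the MMD is the distance between the two mean embeddings, so a plug-in estimate can be computed from kernel evaluations alone. The following is a minimal sketch of that idea, not a reference implementation. It assumes real-valued observations and a Gaussian kernel with bandwidth `sigma`; the function names `gaussian_kernel` and `mmd_biased` are hypothetical and chosen only for this illustration.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel on real numbers; sigma is an assumed bandwidth."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_biased(xs, ys, kernel=gaussian_kernel):
    """Plug-in (biased) MMD estimate over the unit ball of the kernel's RKHS.

    Returns the RKHS distance between the two empirical mean embeddings,
    i.e. the square root of
        mean k(x_i, x_j) - 2 * mean k(x_i, y_j) + mean k(y_i, y_j),
    where each mean runs over all index pairs, including i == j.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(k_xx - 2.0 * k_xy + k_yy, 0.0))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
    same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]
    print("same distribution:   ", mmd_biased(same_a, same_b))
    print("shifted distribution:", mmd_biased(same_a, shifted))
```

On two samples drawn from the same distribution the estimate is typically small but nonzero, because the diagonal terms $k(x_i, x_i)$ and $k(y_i, y_i)$ are included in the averages. On samples from clearly different distributions it should be noticeably larger.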
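Lemma 6 Given $x$ and $x'$ independent random variables with distribution $p$, and $y$ and $y'$ independent random variables with distribution $q$, the squared population MMD is
$$\mathrm{MMD}^2[\mathcal{F}, p, q] = \mathbf{E}_{x,x'}\left[ k(x,x') \right] - 2\,\mathbf{E}_{x,y}\left[ k(x,y) \right] + \mathbf{E}_{y,y'}\left[ k(y,y') \right],$$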
where x′ is an independent copy of x with the same distribution, and