Problem
Given samples $(x_1, y_1), \dots, (x_n, y_n) \sim P(x, y)$,
determine whether $P(x, y) = P(x) \times P(y)$,
or measure the degree of dependence.
Below is the formal problem setting:
Let $P_{xy}$ be a Borel probability measure defined on a domain $X \times Y$, and let $P_x$ and $P_y$ be the respective marginal distributions on $X$ and $Y$. Given an i.i.d. sample $Z := (X, Y) = \{(x_1, y_1), \dots, (x_m, y_m)\}$ of size $m$ drawn from $P_{xy}$, does $P_{xy}$ factorize as $P_x P_y$?
Applications
- Independent component analysis;
- Dimensionality reduction and feature extraction;
- Statistical modeling.
Indirect Approach
- Estimate the joint density P(x, y)
- Check whether the estimate approximately factorizes
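The two steps above can be sketched with a simple histogram density estimate. This is a minimal illustration rather than a recommended test: the function name `factorization_gap`, the number of bins, and the use of total variation distance to quantify "approximately factorizes" are all assumptions made for this example. It uses only the Python standard library.

```python
import random
from collections import Counter


def factorization_gap(xs, ys, bins=4):
    """Compare a histogram estimate of P(x, y) with the product of its marginals.

    Returns the total variation distance between the two binned estimates:
    0 means the binned joint factorizes exactly, larger values suggest dependence.
    """
    n = len(xs)

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) or 1.0  # avoid division by zero for constant data
        return [min(int((v - lo) / width * bins), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    joint = Counter(zip(bx, by))       # estimate of P(x, y)
    px, py = Counter(bx), Counter(by)  # estimates of P(x) and P(y)
    gap = sum(
        abs(joint[(i, j)] / n - (px[i] / n) * (py[j] / n))
        for i in range(bins)
        for j in range(bins)
    )
    return 0.5 * gap


# Illustrative data (assumed): y_dep depends on x nonlinearly, y_indep does not.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y_indep = [rng.uniform(-1, 1) for _ in range(2000)]
y_dep = [xi * xi + 0.1 * rng.gauss(0, 1) for xi in x]

gap_indep = factorization_gap(x, y_indep)
gap_dep = factorization_gap(x, y_dep)
print(gap_indep, gap_dep)
```

The gap for the dependent pair should come out clearly larger than for the independent pair. The result depends on the binning, and density estimation becomes hard in higher dimensions, which is the usual drawback of this approach.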
Direct Approach
- Check properties of factorizing distributions, e.g. kurtosis or covariance operators.
The Hilbert-Schmidt Independence Criterion (HSIC) is defined as the squared Hilbert-Schmidt (HS) norm of the associated cross-covariance operator $C_{xy}$:
$$
{\rm HSIC}(P_{xy}, F, G) = \| C_{xy} \|^2_{HS}.
$$
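For intuition, HSIC can be estimated from a sample using Gram matrices. The sketch below assumes the biased empirical estimator from the HSIC literature, $\mathrm{tr}(KHLH)/(m-1)^2$ with centering matrix $H = I - \frac{1}{m}\mathbf{1}\mathbf{1}^\top$, and assumes Gaussian kernels with a hand-picked bandwidth `sigma`. Neither choice is stated in the text above, and the function names are illustrative only.

```python
import math
import random


def gaussian_gram(v, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in v] for a in v]


def center(K):
    """Return H K H with H = I - (1/m) 1 1^T (double centering)."""
    m = len(K)
    row = [sum(r) / m for r in K]
    col = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total = sum(row) / m
    return [[K[i][j] - row[i] - col[j] + total for j in range(m)] for i in range(m)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (m - 1)^2."""
    m = len(xs)
    Kc = center(gaussian_gram(xs, sigma))  # H K H
    L = gaussian_gram(ys, sigma)
    # trace(K H L H) = trace((H K H) L) by cyclicity of the trace.
    tr = sum(Kc[i][j] * L[j][i] for i in range(m) for j in range(m))
    return tr / (m - 1) ** 2


# Illustrative data (assumed): a nonlinear dependence versus an independent pair.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(200)]
y_indep = [rng.uniform(-1, 1) for _ in range(200)]
y_dep = [xi * xi + 0.1 * rng.gauss(0, 1) for xi in x]

h_indep = hsic(x, y_indep)
h_dep = hsic(x, y_dep)
print(h_indep, h_dep)
```

The estimate for the dependent pair should be noticeably larger than for the independent pair. Turning the raw value into a decision still requires a threshold, for example from a permutation test, which this sketch does not include.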
The definition above involves two other definitions. The first is the cross-covariance operator $C_{xy}$, defined as the operator satisfying, for all $f \in F$ and $g \in G$,
$$
\langle f, C_{xy} g \rangle_F = {\rm E}_{xy}([f(x) - {\rm E}_{x}(f(x))] [g(y) - {\rm E}_{y}(g(y))]).
$$
$C_{xy}$ is a linear operator from $G$ to $F$. It can be written explicitly as
$$
C_{xy} := {\rm E}_{xy}[(\phi(x) - \mu_x) \otimes (\psi(y) - \mu_y)],
$$
where $\phi$ and $\psi$ are the feature maps of $F$ and $G$, $\mu_x := {\rm E}_{x} \phi(x)$ and $\mu_y := {\rm E}_{y} \psi(y)$ are the corresponding mean elements, and $\otimes$ denotes the tensor product.
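In finite dimensions the tensor product is an outer product, so the operator becomes a matrix and its squared Hilbert-Schmidt norm is the sum of its squared entries. The sketch below illustrates this with small explicit feature maps; the maps `phi` and `psi`, the helper names, and the three-point data set are hypothetical choices made for illustration.

```python
def phi(x):
    """Hypothetical feature map for x: (x, x^2)."""
    return [x, x * x]


def psi(y):
    """Hypothetical feature map for y: (y, y^2)."""
    return [y, y * y]


def cross_covariance(feats_x, feats_y):
    """Empirical cross-covariance: mean of outer products of centered features."""
    m = len(feats_x)
    p, q = len(feats_x[0]), len(feats_y[0])
    mu_x = [sum(f[a] for f in feats_x) / m for a in range(p)]
    mu_y = [sum(f[b] for f in feats_y) / m for b in range(q)]
    C = [[0.0] * q for _ in range(p)]
    for fx, fy in zip(feats_x, feats_y):
        for a in range(p):
            for b in range(q):
                C[a][b] += (fx[a] - mu_x[a]) * (fy[b] - mu_y[b]) / m
    return C


def hs_norm_sq(C):
    """Squared Hilbert-Schmidt (Frobenius) norm of a matrix."""
    return sum(c * c for row in C for c in row)


# y is a deterministic function of x, yet x and y are linearly uncorrelated.
xs = [-1.0, 0.0, 1.0]
ys = [x * x for x in xs]

C = cross_covariance([phi(x) for x in xs], [psi(y) for y in ys])
hs = hs_norm_sq(C)
print(C)   # first row (linear feature of x) is zero, second row is 2/9
print(hs)  # 8/81
```

The first row of the matrix is zero because the plain covariance between x and y vanishes, while the row for the squared feature picks up the dependence. This is why richer feature spaces are needed for the norm to act as a dependence measure.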