Multiple-Output Gaussian Process
Working Situation
As the picture shows, we want to learn from the three sensors (with complete signal information) to recover the fourth one.
Dependencies between processes
Multiple-independent Output GP
$$
\begin{aligned}
f_{1}(\mathbf{x}) &\sim \mathcal{GP}\left(0, k_{1}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right), & f_{2}(\mathbf{x}) &\sim \mathcal{GP}\left(0, k_{2}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right) \\
\mathcal{D}_{1} &= \left\{\left(\mathbf{x}_{i,1}, y_{1}\left(\mathbf{x}_{i,1}\right)\right) \mid i=1, \ldots, N_{1}\right\}, & \mathcal{D}_{2} &= \left\{\left(\mathbf{x}_{i,2}, y_{2}\left(\mathbf{x}_{i,2}\right)\right) \mid i=1, \ldots, N_{2}\right\} \\
\mathbf{y}_{1} &\sim \mathcal{N}\left(\mathbf{0}, \mathbf{K}_{1}+\sigma_{1}^{2} \mathbf{I}\right), & \mathbf{y}_{2} &\sim \mathcal{N}\left(\mathbf{0}, \mathbf{K}_{2}+\sigma_{2}^{2} \mathbf{I}\right)
\end{aligned}
$$
$$
\begin{bmatrix} \mathbf{y}_{1} \\ \mathbf{y}_{2} \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} \mathbf{K}_{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{K}_{2} \end{bmatrix} + \begin{bmatrix} \sigma_{1}^{2} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \sigma_{2}^{2} \mathbf{I} \end{bmatrix}\right)
$$
How to encode the dependencies in the kernel design
$$
\mathbf{K}_{\mathbf{f}, \mathbf{f}} = \begin{bmatrix} \mathbf{K}_{1} & ? \\ ? & \mathbf{K}_{2} \end{bmatrix}
$$
Build a cross-covariance function $\operatorname{cov}[f_1(\mathbf{x}), f_2(\mathbf{x}^{\prime})]$ such that $\mathbf{K}_{\mathbf{f}, \mathbf{f}}$ is positive semi-definite.
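The positive semi-definiteness requirement is easy to violate. The sketch below is only an illustration under stated assumptions: it uses a squared-exponential kernel, sets $\mathbf{K}_1 = \mathbf{K}_2 = \mathbf{K}$, and tries a hypothetical cross-covariance block $c\,\mathbf{K}$ (the helper names `rbf_kernel` and `min_eigenvalue` are not from the text). In this particular setup the candidate is valid only when $|c| \le 1$.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def min_eigenvalue(c, K):
    """Smallest eigenvalue of the candidate joint covariance [[K, cK], [cK, K]]."""
    K_ff = np.block([[K, c * K], [c * K, K]])
    return np.linalg.eigvalsh(K_ff).min()

X = np.linspace(0.0, 5.0, 6)[:, None]
K = rbf_kernel(X, X)  # assumption: both outputs use the same kernel matrix

# The eigenvalues of the candidate are (1 + c) and (1 - c) times those of K.
valid = min_eigenvalue(0.5, K)    # all eigenvalues stay positive
invalid = min_eigenvalue(2.0, K)  # (1 - 2) * eig(K) gives negative eigenvalues

print(f"c = 0.5: min eigenvalue {valid:.4f}")
print(f"c = 2.0: min eigenvalue {invalid:.4f}")
```

A negative smallest eigenvalue means the matrix is not a valid covariance, which is why the cross-covariance cannot be chosen freely.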
Different input configurations of data
$$
\begin{array}{ll}
\mathcal{D}_{1}=\left\{\left(\mathbf{x}_{i}, f_{1}\left(\mathbf{x}_{i}\right)\right)\right\}_{i=1}^{N} & \qquad \mathcal{D}_{1}=\left\{\left(\mathbf{x}_{i,1}, f_{1}\left(\mathbf{x}_{i,1}\right)\right)\right\}_{i=1}^{N_{1}} \\
\mathcal{D}_{2}=\left\{\left(\mathbf{x}_{i}, f_{2}\left(\mathbf{x}_{i}\right)\right)\right\}_{i=1}^{N} & \qquad \mathcal{D}_{2}=\left\{\left(\mathbf{x}_{i,2}, f_{2}\left(\mathbf{x}_{i,2}\right)\right)\right\}_{i=1}^{N_{2}}
\end{array}
$$
In the left column all outputs are observed at the same inputs (isotopic data). In the right column each output has its own set of inputs (heterotopic data).
Intrinsic Coregionalization Model
Two outputs
Sample Once
Consider two outputs $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^{p}$.
- Sample from a GP $u(\mathbf{x}) \sim \mathcal{GP}\left(0, k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right)$ to obtain $u^{1}(\mathbf{x})$.
- Obtain $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ by linearly transforming $u^{1}(\mathbf{x})$:
$$
\begin{aligned}
f_{1}(\mathbf{x}) &= a_{1}^{1} u^{1}(\mathbf{x}) \\
f_{2}(\mathbf{x}) &= a_{2}^{1} u^{1}(\mathbf{x})
\end{aligned}
$$
For a fixed value of $\mathbf{x}$, we can group $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ in a vector:
$$
\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_{1}(\mathbf{x}) \\ f_{2}(\mathbf{x}) \end{bmatrix}
$$
This vector will be referred to as a **vector-valued function**.
The covariance of $\mathbf{f}(\mathbf{x})$ is computed as:
$$
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) = \mathbb{E}\left\{\mathbf{f}(\mathbf{x})\left[\mathbf{f}\left(\mathbf{x}^{\prime}\right)\right]^{\top}\right\} - \mathbb{E}\{\mathbf{f}(\mathbf{x})\}\left[\mathbb{E}\left\{\mathbf{f}\left(\mathbf{x}^{\prime}\right)\right\}\right]^{\top}
$$
The first term expands entry by entry:
$$
\mathbb{E}\left\{\begin{bmatrix} f_{1}(\mathbf{x}) \\ f_{2}(\mathbf{x}) \end{bmatrix} \begin{bmatrix} f_{1}(\mathbf{x}^{\prime}) & f_{2}(\mathbf{x}^{\prime}) \end{bmatrix}\right\} = \begin{bmatrix} \mathbb{E}\left\{f_{1}(\mathbf{x}) f_{1}(\mathbf{x}^{\prime})\right\} & \mathbb{E}\left\{f_{1}(\mathbf{x}) f_{2}(\mathbf{x}^{\prime})\right\} \\ \mathbb{E}\left\{f_{2}(\mathbf{x}) f_{1}(\mathbf{x}^{\prime})\right\} & \mathbb{E}\left\{f_{2}(\mathbf{x}) f_{2}(\mathbf{x}^{\prime})\right\} \end{bmatrix}
$$
$$
\begin{aligned}
\mathbb{E}\left\{f_{1}(\mathbf{x}) f_{1}(\mathbf{x}^{\prime})\right\} &= \mathbb{E}\left\{a_{1}^{1} u^{1}(\mathbf{x})\, a_{1}^{1} u^{1}(\mathbf{x}^{\prime})\right\} = \left(a_{1}^{1}\right)^{2} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} \\
\mathbb{E}\left\{f_{1}(\mathbf{x}) f_{2}(\mathbf{x}^{\prime})\right\} &= \mathbb{E}\left\{a_{1}^{1} u^{1}(\mathbf{x})\, a_{2}^{1} u^{1}(\mathbf{x}^{\prime})\right\} = a_{1}^{1} a_{2}^{1} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} \\
\mathbb{E}\left\{f_{2}(\mathbf{x}) f_{2}(\mathbf{x}^{\prime})\right\} &= \mathbb{E}\left\{a_{2}^{1} u^{1}(\mathbf{x})\, a_{2}^{1} u^{1}(\mathbf{x}^{\prime})\right\} = \left(a_{2}^{1}\right)^{2} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\}
\end{aligned}
$$
So the first term, $\mathbb{E}\left\{\mathbf{f}(\mathbf{x})\left[\mathbf{f}(\mathbf{x}^{\prime})\right]^{\top}\right\}$, can be written as:
$$
\begin{aligned}
\mathbb{E}\left\{\mathbf{f}(\mathbf{x})\left[\mathbf{f}\left(\mathbf{x}^{\prime}\right)\right]^{\top}\right\} &= \begin{bmatrix} \left(a_{1}^{1}\right)^{2} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} & a_{1}^{1} a_{2}^{1} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} \\ a_{1}^{1} a_{2}^{1} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} & \left(a_{2}^{1}\right)^{2} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} \end{bmatrix} \\
&= \begin{bmatrix} \left(a_{1}^{1}\right)^{2} & a_{1}^{1} a_{2}^{1} \\ a_{1}^{1} a_{2}^{1} & \left(a_{2}^{1}\right)^{2} \end{bmatrix} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\}
\end{aligned}
$$
The term $\mathbb{E}\{\mathbf{f}(\mathbf{x})\}$ is computed as:
$$
\mathbb{E}\left\{\begin{bmatrix} f_{1}(\mathbf{x}) \\ f_{2}(\mathbf{x}) \end{bmatrix}\right\} = \begin{bmatrix} \mathbb{E}\left\{f_{1}(\mathbf{x})\right\} \\ \mathbb{E}\left\{f_{2}(\mathbf{x})\right\} \end{bmatrix} = \begin{bmatrix} \mathbb{E}\left\{a_{1}^{1} u^{1}(\mathbf{x})\right\} \\ \mathbb{E}\left\{a_{2}^{1} u^{1}(\mathbf{x})\right\} \end{bmatrix} = \begin{bmatrix} a_{1}^{1} \\ a_{2}^{1} \end{bmatrix} \mathbb{E}\left\{u^{1}(\mathbf{x})\right\}
$$
Putting them together, the covariance of $\mathbf{f}(\mathbf{x})$ follows as:
$$
\begin{bmatrix} \left(a_{1}^{1}\right)^{2} & a_{1}^{1} a_{2}^{1} \\ a_{1}^{1} a_{2}^{1} & \left(a_{2}^{1}\right)^{2} \end{bmatrix} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} - \begin{bmatrix} a_{1}^{1} \\ a_{2}^{1} \end{bmatrix} \begin{bmatrix} a_{1}^{1} & a_{2}^{1} \end{bmatrix} \mathbb{E}\left\{u^{1}(\mathbf{x})\right\} \mathbb{E}\left\{u^{1}(\mathbf{x}^{\prime})\right\}
$$
Defining $\mathbf{a}=\left[a_{1}^{1} \;\; a_{2}^{1}\right]^{\top}$,
$$
\begin{aligned}
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) &= \mathbf{a} \mathbf{a}^{\top} \mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} - \mathbf{a} \mathbf{a}^{\top} \mathbb{E}\left\{u^{1}(\mathbf{x})\right\} \mathbb{E}\left\{u^{1}(\mathbf{x}^{\prime})\right\} \\
&= \mathbf{a} \mathbf{a}^{\top} \underbrace{\left[\mathbb{E}\left\{u^{1}(\mathbf{x}) u^{1}(\mathbf{x}^{\prime})\right\} - \mathbb{E}\left\{u^{1}(\mathbf{x})\right\} \mathbb{E}\left\{u^{1}(\mathbf{x}^{\prime})\right\}\right]}_{k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)} \\
&= \mathbf{a} \mathbf{a}^{\top} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
\end{aligned}
$$
We define $\mathbf{B}=\mathbf{a} \mathbf{a}^{\top}$, leading to
$$
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) = \mathbf{B}\, k\left(\mathbf{x}, \mathbf{x}^{\prime}\right) = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
and $\mathbf{B}$ has rank one (for $\mathbf{a} \neq \mathbf{0}$), since it is the outer product of a column vector with itself.
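The "sample once" construction can be sketched numerically as follows. This is a minimal illustration, assuming a squared-exponential kernel for $k$ and arbitrary values for $\mathbf{a}$; the helper `rbf_kernel` and all numeric values are hypothetical and not taken from the text.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
N = 30
X = np.linspace(0.0, 5.0, N)[:, None]
K = rbf_kernel(X, X)

# Draw one latent sample u^1 ~ N(0, K); the jitter keeps the Cholesky factor stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(N))
u1 = L @ rng.standard_normal(N)

a = np.array([1.5, -0.7])      # a = [a_1^1, a_2^1]^T, illustrative values
f1, f2 = a[0] * u1, a[1] * u1  # both outputs are rescaled copies of u^1

B = np.outer(a, a)             # coregionalization matrix B = a a^T
rank_B = np.linalg.matrix_rank(B, tol=1e-10)
print("rank of B:", rank_B)
```

Because both outputs are multiples of the same sample, $f_2$ is exactly $(a_2^1 / a_1^1)\, f_1$ here, which is what a rank-one $\mathbf{B}$ expresses.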
Sample Twice
Sample twice from a GP $u(\mathbf{x}) \sim \mathcal{GP}\left(0, k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right)$ to obtain $u^{1}(\mathbf{x})$ and $u^{2}(\mathbf{x})$.
Obtain the outputs by adding scaled versions of the two samples:
$$
\begin{aligned}
f_{1}(\mathbf{x}) &= a_{1}^{1} u^{1}(\mathbf{x}) + a_{1}^{2} u^{2}(\mathbf{x}) \\
f_{2}(\mathbf{x}) &= a_{2}^{1} u^{1}(\mathbf{x}) + a_{2}^{2} u^{2}(\mathbf{x})
\end{aligned}
$$
Notice that $u^{1}$ and $u^{2}$ are independent, although they share the same covariance function $k$.
In matrix form:
$$
\mathbf{f}(\mathbf{x}) = \begin{bmatrix} a_{1}^{1} & a_{1}^{2} \\ a_{2}^{1} & a_{2}^{2} \end{bmatrix} \begin{bmatrix} u^{1}(\mathbf{x}) \\ u^{2}(\mathbf{x}) \end{bmatrix}
$$
The vector-valued function can be written as $\mathbf{f}(\mathbf{x}) = \mathbf{a}^{1} u^{1}(\mathbf{x}) + \mathbf{a}^{2} u^{2}(\mathbf{x})$, where
$$
\mathbf{a}^{1}=\left[a_{1}^{1} \;\; a_{2}^{1}\right]^{\top} \text{ and } \mathbf{a}^{2}=\left[a_{1}^{2} \;\; a_{2}^{2}\right]^{\top}
$$
The covariance of $\mathbf{f}(\mathbf{x})$ is computed as:
$$
\begin{aligned}
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) &= \mathbf{a}^{1}\left(\mathbf{a}^{1}\right)^{\top} \operatorname{cov}\left(u^{1}(\mathbf{x}), u^{1}(\mathbf{x}^{\prime})\right) + \mathbf{a}^{2}\left(\mathbf{a}^{2}\right)^{\top} \operatorname{cov}\left(u^{2}(\mathbf{x}), u^{2}(\mathbf{x}^{\prime})\right) \\
&= \mathbf{a}^{1}\left(\mathbf{a}^{1}\right)^{\top} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right) + \mathbf{a}^{2}\left(\mathbf{a}^{2}\right)^{\top} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right) \\
&= \left[\mathbf{a}^{1}\left(\mathbf{a}^{1}\right)^{\top} + \mathbf{a}^{2}\left(\mathbf{a}^{2}\right)^{\top}\right] k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
\end{aligned}
$$
Notice that $u^{1}$ and $u^{2}$ are independent, so the cross-covariance terms vanish and their covariances can be added directly.
We define $\mathbf{B}=\mathbf{a}^{1}\left(\mathbf{a}^{1}\right)^{\top}+\mathbf{a}^{2}\left(\mathbf{a}^{2}\right)^{\top}$, leading to:
$$
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) = \mathbf{B}\, k\left(\mathbf{x}, \mathbf{x}^{\prime}\right) = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
Notice that $\mathbf{B}$ has rank two, provided $\mathbf{a}^{1}$ and $\mathbf{a}^{2}$ are linearly independent.
Observed Data:
$$
\begin{bmatrix} \mathbf{f}_{1} \\ \mathbf{f}_{2} \end{bmatrix} = \begin{bmatrix} f_{1}(\mathbf{x}_{1}) \\ \vdots \\ f_{1}(\mathbf{x}_{N}) \\ f_{2}(\mathbf{x}_{1}) \\ \vdots \\ f_{2}(\mathbf{x}_{N}) \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} b_{11} \mathbf{K} & b_{12} \mathbf{K} \\ b_{21} \mathbf{K} & b_{22} \mathbf{K} \end{bmatrix}\right)
$$
The matrix $\mathbf{K} \in \mathbb{R}^{N \times N}$ has elements $k(\mathbf{x}_i, \mathbf{x}_j)$.
Using the Kronecker product, this can be written compactly as:
$$
\begin{bmatrix} \mathbf{f}_{1} \\ \mathbf{f}_{2} \end{bmatrix} = \begin{bmatrix} f_{1}(\mathbf{x}_{1}) \\ \vdots \\ f_{1}(\mathbf{x}_{N}) \\ f_{2}(\mathbf{x}_{1}) \\ \vdots \\ f_{2}(\mathbf{x}_{N}) \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \mathbf{B} \otimes \mathbf{K}\right)
$$
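The equivalence between the block form and the Kronecker form can be checked directly. The sketch below assumes a squared-exponential kernel and illustrative values for the mixing coefficients; `rbf_kernel` and the numbers in `A` are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

N = 5
X = np.linspace(0.0, 4.0, N)[:, None]
K = rbf_kernel(X, X)

A = np.array([[1.0, 0.5],
              [-0.3, 2.0]])  # columns are a^1 and a^2 (illustrative values)
B = A @ A.T                  # B = a^1 (a^1)^T + a^2 (a^2)^T

# Covariance of the stacked vector [f_1; f_2], of size 2N x 2N.
K_ff = np.kron(B, K)

# The same matrix assembled block by block, as in the explicit formula.
K_blocks = np.block([[B[0, 0] * K, B[0, 1] * K],
                     [B[1, 0] * K, B[1, 1] * K]])

print("forms agree:", np.allclose(K_ff, K_blocks))
print("rank of B:", np.linalg.matrix_rank(B, tol=1e-10))
```

Note that `np.kron(B, K)` matches the output-major stacking used here (all of $\mathbf{f}_1$, then all of $\mathbf{f}_2$); stacking by input point instead would correspond to `np.kron(K, B)`.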
General Case
Consider a set of functions $\left\{f_{d}(\mathbf{x})\right\}_{d=1}^{D}$.
In the ICM,
$$
f_{d}(\mathbf{x}) = \sum_{i=1}^{R} a_{d}^{i} u^{i}(\mathbf{x})
$$
where the functions $u^{i}(\mathbf{x})$ are GPs sampled independently that share the same covariance function $k(\mathbf{x}, \mathbf{x}^{\prime})$.
For $\mathbf{f}(\mathbf{x})=\left[f_{1}(\mathbf{x}) \cdots f_{D}(\mathbf{x})\right]^{\top}$, the covariance is given as:
$$
\operatorname{cov}\left[\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right] = \mathbf{A} \mathbf{A}^{\top} k\left(\mathbf{x}, \mathbf{x}^{\prime}\right) = \mathbf{B}\, k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
where $\mathbf{A}=\left[\mathbf{a}^{1} \; \mathbf{a}^{2} \cdots \mathbf{a}^{R}\right]$ with columns $\mathbf{a}^{i}=\left[a_{1}^{i} \cdots a_{D}^{i}\right]^{\top}$, and the rank of $\mathbf{B}$ is $R$, provided the columns of $\mathbf{A}$ are linearly independent.
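The general ICM can be sketched as a single matrix product $\mathbf{F} = \mathbf{A}\mathbf{U}$. This is a minimal illustration under stated assumptions: a squared-exponential kernel, randomly drawn mixing coefficients, and the hypothetical choice $D = 3$, $R = 2$; none of these values come from the text.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
D, R, N = 3, 2, 20  # outputs, latent samples, input points (illustrative)
X = np.linspace(0.0, 5.0, N)[:, None]
K = rbf_kernel(X, X)
L = np.linalg.cholesky(K + 1e-6 * np.eye(N))  # jitter for numerical stability

A = rng.standard_normal((D, R))          # A = [a^1 ... a^R], entry (d, i) is a_d^i
U = (L @ rng.standard_normal((N, R))).T  # row i is an independent sample u^i(X)
F = A @ U                                # row d is f_d(X) = sum_i a_d^i u^i(X)

B = A @ A.T                              # D x D coregionalization matrix
rank_B = np.linalg.matrix_rank(B, tol=1e-10)
print("rank of B:", rank_B)
```

With $R < D$ the matrix $\mathbf{B}$ is rank deficient, so the three outputs here live in a two-dimensional subspace spanned by the latent samples.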
ICM: autokrigeability
If the outputs are considered to be noise-free, prediction using the ICM under an isotopic data case is equivalent to independent prediction over each output. This circumstance is also known as autokrigeability.
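Proof:
Assume that we only have two outputs, $f_1$ and $f_2$, observed at the same inputs, and that $\mathbf{B}$ is invertible. Let $\mathbf{K}_{*}$ hold the kernel evaluations between the test inputs and the training inputs, so that $\mathbf{K}_{\mathbf{f}_{*}, \mathbf{f}} = \mathbf{B} \otimes \mathbf{K}_{*}$.
The predictive mean can be written as:
$$
\boldsymbol{\mu} = \mathbf{K}_{\mathbf{f}_{*}, \mathbf{f}} \left(\mathbf{K}_{\mathbf{f}, \mathbf{f}}\right)^{-1} \mathbf{f}, \qquad \mathbf{K}_{\mathbf{f}, \mathbf{f}} = \mathbf{B} \otimes \mathbf{K}
$$
Using the inverse and mixed-product properties of the Kronecker product:
$$
\begin{aligned}
\boldsymbol{\mu} &= \left(\mathbf{B} \otimes \mathbf{K}_{*}\right)\left(\mathbf{B} \otimes \mathbf{K}\right)^{-1} \mathbf{f} \\
&= \left(\mathbf{B} \otimes \mathbf{K}_{*}\right)\left(\mathbf{B}^{-1} \otimes \mathbf{K}^{-1}\right) \mathbf{f} \\
&= \left(\mathbf{B} \mathbf{B}^{-1} \otimes \mathbf{K}_{*} \mathbf{K}^{-1}\right) \mathbf{f} \\
&= \left(\mathbf{I} \otimes \mathbf{K}_{*} \mathbf{K}^{-1}\right) \mathbf{f} \\
&= \begin{bmatrix} \mathbf{K}_{*} \mathbf{K}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{K}_{*} \mathbf{K}^{-1} \end{bmatrix} \begin{bmatrix} \mathbf{f}_{1} \\ \mathbf{f}_{2} \end{bmatrix}
\end{aligned}
$$
That is, the prediction of $f_1$ depends only on the data observed for $f_1$, and likewise for $f_2$.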
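Autokrigeability can also be checked numerically. The sketch below compares the joint ICM predictive mean with two independent single-output predictions. It assumes a squared-exponential kernel, a full-rank $\mathbf{B}$ with illustrative entries, and noise-free observations generated from sine and cosine functions; all of these are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

N = 8
X = np.linspace(0.0, 5.0, N)[:, None]     # shared training inputs (isotopic case)
X_star = np.array([[0.7], [2.2], [4.1]])  # test inputs
K = rbf_kernel(X, X)
K_star = rbf_kernel(X_star, X)

B = np.array([[2.0, 0.8],
              [0.8, 1.0]])                 # full-rank coregionalization matrix
f1, f2 = np.sin(X[:, 0]), np.cos(X[:, 0])  # noise-free observations per output
f = np.concatenate([f1, f2])

# Joint ICM prediction: (B kron K_*) (B kron K)^{-1} f
mu_joint = np.kron(B, K_star) @ np.linalg.solve(np.kron(B, K), f)

# Independent prediction for each output: K_* K^{-1} f_d
mu_indep = np.concatenate([K_star @ np.linalg.solve(K, f1),
                           K_star @ np.linalg.solve(K, f2)])

print("max difference:", np.abs(mu_joint - mu_indep).max())
```

The two predictions agree up to floating-point error. The identity relies on the noise-free, isotopic setting: adding output-specific noise or using different inputs per output breaks the Kronecker structure, and the outputs then do inform each other.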
Semiparametric Latent Factor Model (SLFM)
The ICM uses $R$ samples $u^{i}(\mathbf{x})$ from $u(\mathbf{x})$ with the same covariance function. The SLFM uses $Q$ samples from processes $u_{q}(\mathbf{x})$ with different covariance functions.
Two Outputs
- Sample from a GP $\mathcal{GP}\left(0, k_{1}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right)$ to obtain $u_1(\mathbf{x})$.
- Sample from a GP $\mathcal{GP}\left(0, k_{2}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right)$ to obtain $u_2(\mathbf{x})$.
- Obtain the outputs by adding scaled versions of the two samples:
$$
\begin{aligned}
f_{1}(\mathbf{x}) &= a_{1,1} u_{1}(\mathbf{x}) + a_{1,2} u_{2}(\mathbf{x}) \\
f_{2}(\mathbf{x}) &= a_{2,1} u_{1}(\mathbf{x}) + a_{2,2} u_{2}(\mathbf{x})
\end{aligned}
$$
Similarly, it can be written as:
$$
\mathbf{f}(\mathbf{x}) = \mathbf{a}_{1} u_{1}(\mathbf{x}) + \mathbf{a}_{2} u_{2}(\mathbf{x})
$$
with
$$
\mathbf{a}_{1}=\left[a_{1,1} \;\; a_{2,1}\right]^{\top} \text{ and } \mathbf{a}_{2}=\left[a_{1,2} \;\; a_{2,2}\right]^{\top}
$$
The covariance of $\mathbf{f}(\mathbf{x})$ is computed as:
$$
\begin{aligned}
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) &= \mathbf{a}_{1}\left(\mathbf{a}_{1}\right)^{\top} \operatorname{cov}\left(u_{1}(\mathbf{x}), u_{1}(\mathbf{x}^{\prime})\right) + \mathbf{a}_{2}\left(\mathbf{a}_{2}\right)^{\top} \operatorname{cov}\left(u_{2}(\mathbf{x}), u_{2}(\mathbf{x}^{\prime})\right) \\
&= \mathbf{a}_{1}\left(\mathbf{a}_{1}\right)^{\top} k_{1}\left(\mathbf{x}, \mathbf{x}^{\prime}\right) + \mathbf{a}_{2}\left(\mathbf{a}_{2}\right)^{\top} k_{2}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
\end{aligned}
$$
We define $\mathbf{B}_{1}=\mathbf{a}_{1}\left(\mathbf{a}_{1}\right)^{\top}$ and $\mathbf{B}_{2}=\mathbf{a}_{2}\left(\mathbf{a}_{2}\right)^{\top}$, leading to:
$$
\operatorname{cov}\left(\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right) = \mathbf{B}_{1} k_{1}\left(\mathbf{x}, \mathbf{x}^{\prime}\right) + \mathbf{B}_{2} k_{2}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
Notice that $\mathbf{B}_{1}$ and $\mathbf{B}_{2}$ have rank one.
$$
\begin{bmatrix} \mathbf{f}_{1} \\ \mathbf{f}_{2} \end{bmatrix} = \begin{bmatrix} f_{1}(\mathbf{x}_{1}) \\ \vdots \\ f_{1}(\mathbf{x}_{N}) \\ f_{2}(\mathbf{x}_{1}) \\ \vdots \\ f_{2}(\mathbf{x}_{N}) \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \mathbf{B}_{1} \otimes \mathbf{K}_{1} + \mathbf{B}_{2} \otimes \mathbf{K}_{2}\right)
$$
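The two-output SLFM covariance can be assembled in a few lines. The sketch below assumes two squared-exponential kernels that differ only in lengthscale and uses illustrative mixing vectors; the helper `rbf_kernel` and every numeric value are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

N = 6
X = np.linspace(0.0, 5.0, N)[:, None]
K1 = rbf_kernel(X, X, lengthscale=0.5)  # covariance of u_1 (short lengthscale)
K2 = rbf_kernel(X, X, lengthscale=2.0)  # covariance of u_2 (long lengthscale)

a1 = np.array([1.0, 0.5])   # a_1 = [a_{1,1}, a_{2,1}]^T (illustrative)
a2 = np.array([-0.4, 1.2])  # a_2 = [a_{1,2}, a_{2,2}]^T (illustrative)
B1, B2 = np.outer(a1, a1), np.outer(a2, a2)  # rank-one coregionalization matrices

# Joint covariance of the stacked vector [f_1; f_2].
K_ff = np.kron(B1, K1) + np.kron(B2, K2)
min_eig = np.linalg.eigvalsh(K_ff).min()
print("smallest eigenvalue:", min_eig)
```

Each Kronecker term is positive semi-definite, so their sum is as well; the smallest eigenvalue is non-negative up to rounding error.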
General Case:
Consider a set of functions $\left\{f_{d}(\mathbf{x})\right\}_{d=1}^{D}$.
In the SLFM,
$$
f_{d}(\mathbf{x}) = \sum_{q=1}^{Q} a_{d,q} u_{q}(\mathbf{x})
$$
where the functions $u_{q}(\mathbf{x})$ are GPs with covariance functions $k_{q}(\mathbf{x}, \mathbf{x}^{\prime})$.
For $\mathbf{f}(\mathbf{x})=\left[f_{1}(\mathbf{x}) \cdots f_{D}(\mathbf{x})\right]^{\top}$, the covariance is given as:
$$
\operatorname{cov}\left[\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right] = \sum_{q=1}^{Q} \mathbf{A}_{q} \mathbf{A}_{q}^{\top} k_{q}\left(\mathbf{x}, \mathbf{x}^{\prime}\right) = \sum_{q=1}^{Q} \mathbf{B}_{q} k_{q}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
where $\mathbf{A}_{q} = \mathbf{a}_{q}$.
The rank of each $\mathbf{B}_{q}$ is one.
Linear model of coregionalization (LMC)
The LMC generalizes the ICM and the SLFM by allowing several independent samples from GPs with different covariances.
Consider a set of functions $\left\{f_{d}(\mathbf{x})\right\}_{d=1}^{D}$. In the LMC,
$$
f_{d}(\mathbf{x}) = \sum_{q=1}^{Q} \sum_{i=1}^{R_{q}} a_{d,q}^{i} u_{q}^{i}(\mathbf{x})
$$
where the functions $u_{q}^{i}(\mathbf{x})$ are GPs with zero mean and covariance
$$
\operatorname{cov}\left[u_{q}^{i}(\mathbf{x}), u_{q^{\prime}}^{i^{\prime}}\left(\mathbf{x}^{\prime}\right)\right] = \begin{cases} k_{q}\left(\mathbf{x}, \mathbf{x}^{\prime}\right) & \text{if } i=i^{\prime} \text{ and } q=q^{\prime} \\ 0 & \text{otherwise} \end{cases}
$$
There are $Q$ groups of samples. For each group, there are $R_{q}$ samples obtained independently from the same GP with covariance $k_{q}(\mathbf{x}, \mathbf{x}^{\prime})$.
The LMC corresponds to the sum of $Q$ ICMs.
Suppose we have $D = 2$, $Q = 2$, and $R_q = 2$. According to the LMC:
$$
\begin{aligned}
f_{1}(\mathbf{x}) &= a_{1,1}^{1} u_{1}^{1}(\mathbf{x}) + a_{1,1}^{2} u_{1}^{2}(\mathbf{x}) + a_{1,2}^{1} u_{2}^{1}(\mathbf{x}) + a_{1,2}^{2} u_{2}^{2}(\mathbf{x}) \\
f_{2}(\mathbf{x}) &= a_{2,1}^{1} u_{1}^{1}(\mathbf{x}) + a_{2,1}^{2} u_{1}^{2}(\mathbf{x}) + a_{2,2}^{1} u_{2}^{1}(\mathbf{x}) + a_{2,2}^{2} u_{2}^{2}(\mathbf{x})
\end{aligned}
$$
For $\mathbf{f}(\mathbf{x})=\left[f_{1}(\mathbf{x}) \cdots f_{D}(\mathbf{x})\right]^{\top}$, the covariance $\operatorname{cov}\left[\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right]$ is given as:
$$
\operatorname{cov}\left[\mathbf{f}(\mathbf{x}), \mathbf{f}\left(\mathbf{x}^{\prime}\right)\right] = \sum_{q=1}^{Q} \mathbf{A}_{q} \mathbf{A}_{q}^{\top} k_{q}\left(\mathbf{x}, \mathbf{x}^{\prime}\right) = \sum_{q=1}^{Q} \mathbf{B}_{q} k_{q}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)
$$
where $\mathbf{A}_{q}=\left[\mathbf{a}_{q}^{1} \; \mathbf{a}_{q}^{2} \cdots \mathbf{a}_{q}^{R_{q}}\right]$.
The rank of each $\mathbf{B}_{q}$ is $R_{q}$, provided the columns of $\mathbf{A}_{q}$ are linearly independent.
The matrices $\mathbf{B}_{q}$ are known as the coregionalization matrices.
$$
\begin{bmatrix} \mathbf{f}_{1} \\ \mathbf{f}_{2} \end{bmatrix} = \begin{bmatrix} f_{1}(\mathbf{x}_{1}) \\ \vdots \\ f_{1}(\mathbf{x}_{N}) \\ f_{2}(\mathbf{x}_{1}) \\ \vdots \\ f_{2}(\mathbf{x}_{N}) \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \sum_{q=1}^{Q} \mathbf{B}_{q} \otimes \mathbf{K}_{q}\right)
$$
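The LMC covariance, and the way it reduces to the ICM and the SLFM, can be sketched with one small helper. The function name `lmc_covariance`, the squared-exponential kernels, and the randomly drawn mixing matrices are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix for inputs of shape (N, p)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def lmc_covariance(A_list, K_list):
    """Return sum_q (A_q A_q^T) kron K_q, with each A_q of shape (D, R_q)."""
    return sum(np.kron(A_q @ A_q.T, K_q) for A_q, K_q in zip(A_list, K_list))

rng = np.random.default_rng(0)
D, N = 2, 5
X = np.linspace(0.0, 4.0, N)[:, None]
K1 = rbf_kernel(X, X, lengthscale=0.5)
K2 = rbf_kernel(X, X, lengthscale=2.0)
A1 = rng.standard_normal((D, 2))  # R_1 = 2 samples in group q = 1
A2 = rng.standard_normal((D, 2))  # R_2 = 2 samples in group q = 2

K_lmc = lmc_covariance([A1, A2], [K1, K2])

# Q = 1 recovers the ICM: a single coregionalization matrix and one kernel.
K_icm = lmc_covariance([A1], [K1])

# R_q = 1 for every group recovers the SLFM: each B_q has rank one.
K_slfm = lmc_covariance([A1[:, :1], A2[:, :1]], [K1, K2])

print(K_lmc.shape, K_icm.shape, K_slfm.shape)
```

Writing all three models through the same function makes the relationship explicit: they differ only in the number of groups $Q$ and the number of samples $R_q$ per group.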