Basic Ideas
Why use SDE solvers for GPs?
- The $O(n^3)$ computational complexity is a challenge.
- What do we get:
  - $O(n)$ state-space methods for SDEs/SPDEs.
  - Sparse approximations developed for SPDEs.
  - Reduced rank Fourier/basis function approximations.
  - Path to non-Gaussian processes.
- Downsides:
  - We often need to approximate.
  - Mathematics can become messy.
Stochastic differential equations and Gaussian processes
Ornstein-Uhlenbeck process
The mean and covariance functions:
$$
\begin{aligned}
m(x) &= 0 \\
k\left(x, x^{\prime}\right) &= \sigma^{2} \exp\left(-\lambda\left|x-x^{\prime}\right|\right)
\end{aligned}
$$
This has a path representation as a stochastic differential equation (SDE):
$$
\frac{d f(t)}{d t} = -\lambda f(t) + w(t)
$$
where $w(t)$ is a white noise process and $x$ has been relabeled as $t$.
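As a rough numerical sanity check, the SDE can be simulated with the Euler-Maruyama scheme. The sketch below assumes the white noise has spectral density $q = 2\lambda\sigma^2$, the value for which the stationary variance of the process equals $\sigma^2$. The function name `simulate_ou` and all parameter values are illustrative choices, not part of the original notes.

```python
import numpy as np

def simulate_ou(lam, q, dt, n_steps, n_paths, rng):
    """Euler-Maruyama simulation of df/dt = -lam * f + w(t).

    w(t) is white noise with spectral density q. Returns an array of
    shape (n_steps + 1, n_paths); every path starts at f(0) = 0.
    """
    f = np.zeros((n_steps + 1, n_paths))
    for k in range(n_steps):
        # Brownian increment driving the SDE: variance q * dt per step.
        dbeta = np.sqrt(q * dt) * rng.standard_normal(n_paths)
        f[k + 1] = f[k] - lam * f[k] * dt + dbeta
    return f

lam, sigma2 = 1.0, 0.5
q = 2.0 * lam * sigma2           # assumed relation between q and sigma^2
dt, n_steps, n_paths = 0.01, 1000, 5000
rng = np.random.default_rng(0)
paths = simulate_ou(lam, q, dt, n_steps, n_paths, rng)

# Compare empirical moments at the end of the run (close to stationarity)
# with the OU covariance sigma^2 * exp(-lam * |tau|).
lag = 50                         # tau = lag * dt = 0.5
emp_var = np.mean(paths[-1] ** 2)
emp_cov = np.mean(paths[-1] * paths[-1 - lag])
print("variance:  ", emp_var, "vs", sigma2)
print("covariance:", emp_cov, "vs", sigma2 * np.exp(-lam * lag * dt))
```

The empirical values agree with the kernel only up to Monte Carlo error and the small bias of the Euler-Maruyama step.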
Proof sketch: take the Fourier transform (FT) of the SDE, form the spectral density of $f$, and apply the inverse Fourier transform (IFT):
$$
\begin{aligned}
\text{FT:} \quad (i\omega)\hat{f} &= -\lambda \hat{f} + \hat{w} \\
\hat{f} &= \frac{\hat{w}}{\lambda + i\omega} \\
\text{Spectral density:} \quad S(\omega) &= \frac{E\left[|\hat{w}|^{2}\right]}{\omega^{2}+\lambda^{2}} = \frac{q}{\omega^{2}+\lambda^{2}} \\
\text{IFT:} \quad h(\tau) &= \frac{1}{2\pi} \int \frac{q}{\omega^{2}+\lambda^{2}} \exp(i\omega\tau)\, d\omega = \frac{q}{2\lambda} \exp(-\lambda|\tau|)
\end{aligned}
$$
Here $q$ is the spectral density of the white noise $w(t)$ and $\tau = t - t^{\prime}$. The result $h(\tau)$ matches the covariance function $k$ with $\sigma^{2} = q / (2\lambda)$.
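The inverse transform in the last step can be checked numerically. The sketch below approximates the inverse Fourier transform of $q/(\omega^2+\lambda^2)$ with a trapezoidal rule on a truncated frequency grid and compares it with the closed form $\frac{q}{2\lambda}\exp(-\lambda|\tau|)$. The function names, truncation limit, grid spacing, and parameter values are arbitrary illustrative choices.

```python
import numpy as np

def ou_spectral_density(omega, lam, q):
    """Spectral density q / (omega^2 + lam^2) of the OU process."""
    return q / (omega ** 2 + lam ** 2)

def inverse_ft(tau, lam, q, w_max=2000.0, dw=0.01):
    """Approximate (1 / 2 pi) * integral of S(omega) * exp(i omega tau) d omega."""
    n = int(round(2.0 * w_max / dw)) + 1
    omega = np.linspace(-w_max, w_max, n)
    # S is even in omega, so the sine part cancels and only cos remains.
    integrand = ou_spectral_density(omega, lam, q) * np.cos(omega * tau)
    # Trapezoidal rule on the truncated interval [-w_max, w_max].
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(omega))
    return integral / (2.0 * np.pi)

lam, q = 1.5, 2.0
for tau in [0.0, 0.5, 1.0, 2.0]:
    closed_form = q / (2.0 * lam) * np.exp(-lam * abs(tau))
    print(tau, inverse_ft(tau, lam, q), closed_form)
```

The two columns should agree up to the truncation error of the frequency grid, which is largest at $\tau = 0$ where the integrand does not oscillate.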
Consider a Gaussian process regression problem:
$$
\begin{aligned}
f(x) &\sim \mathrm{GP}\left(0, \sigma^{2} \exp\left(-\lambda\left|x-x^{\prime}\right|\right)\right) \\
y_{k} &= f\left(x_{k}\right) + \varepsilon_{k}
\end{aligned}
$$
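For reference, the direct way to solve this regression problem forms the full $n \times n$ covariance matrix and factorizes it, which is where the $O(n^3)$ cost comes from. The following is a minimal NumPy sketch. The function names, the Gaussian noise assumption $\varepsilon_k \sim N(0, \sigma_n^2)$, and the single-point example are illustrative assumptions rather than anything specified in the notes.

```python
import numpy as np

def ou_kernel(x1, x2, sigma2, lam):
    """OU covariance sigma2 * exp(-lam * |x - x'|) for all pairs of inputs."""
    return sigma2 * np.exp(-lam * np.abs(x1[:, None] - x2[None, :]))

def gp_posterior(x_train, y, x_test, sigma2, lam, noise_var):
    """Posterior mean and variance of f at x_test (dense, O(n^3) in n = len(x_train))."""
    K = ou_kernel(x_train, x_train, sigma2, lam) + noise_var * np.eye(len(x_train))
    K_s = ou_kernel(x_test, x_train, sigma2, lam)
    L = np.linalg.cholesky(K)                        # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = sigma2 - np.sum(v ** 2, axis=0)            # prior variance minus reduction
    return mean, var

# One observation y = 1 at x = 0, prediction at the same point.
mean, var = gp_posterior(np.array([0.0]), np.array([1.0]), np.array([0.0]),
                         sigma2=1.0, lam=1.0, noise_var=0.1)
print(mean, var)
```

With a single observation the posterior mean at the observed point is $y \, \sigma^2 / (\sigma^2 + \sigma_n^2)$, which gives a quick way to check the implementation by hand.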
This regression problem is equivalent to the state-space model:
$$
\begin{aligned}
\frac{d f(t)}{d t} &= -\lambda f(t) + w(t) \\
y_{k} &= f\left(t_{k}\right) + \varepsilon_{k}
\end{aligned}
$$
that is, with $f_k = f(t_k)$ we have a Gauss-Markov model
$$
\begin{aligned}
f_{k+1} &\sim p\left(f_{k+1} \mid f_{k}\right) \\
y_{k} &\sim p\left(y_{k} \mid f_{k}\right)
\end{aligned}
$$
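For the Ornstein-Uhlenbeck model these densities are available in closed form. Assuming Gaussian measurement noise $\varepsilon_k \sim N(0, \sigma_n^2)$ (an assumption; the notes do not specify the noise model), a white noise spectral density of $q = 2\lambda\sigma^2$, and writing $\Delta t_k = t_{k+1} - t_k$, solving the linear SDE over one step gives:

$$
\begin{aligned}
p\left(f_{k+1} \mid f_{k}\right) &= N\left(f_{k+1} \mid e^{-\lambda \Delta t_k} f_{k},\; \sigma^{2}\left(1 - e^{-2\lambda \Delta t_k}\right)\right) \\
p\left(y_{k} \mid f_{k}\right) &= N\left(y_{k} \mid f_{k},\; \sigma_n^{2}\right)
\end{aligned}
$$

A stationary start corresponds to the initial distribution $f(t_1) \sim N(0, \sigma^2)$.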
This model can be solved in $O(n)$ time using a Kalman filter/smoother.
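A minimal sketch of this idea for the Ornstein-Uhlenbeck model follows. It runs a scalar Kalman filter and Rauch-Tung-Striebel smoother over sorted time points and compares the result with the dense GP posterior at the same points. The sketch assumes Gaussian measurement noise with variance `noise_var`, a stationary prior $N(0, \sigma^2)$ at the first time point, and the exact one-step transition $f_{k+1} \mid f_k \sim N(a f_k, \sigma^2(1 - a^2))$ with $a = \exp(-\lambda \Delta t)$. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def ou_kalman_smoother(t, y, sigma2, lam, noise_var):
    """Smoothed mean and variance of f(t_k) in O(n) time; t must be sorted."""
    n = len(t)
    m_p, P_p = np.zeros(n), np.zeros(n)     # predicted moments
    m_f, P_f = np.zeros(n), np.zeros(n)     # filtered moments
    a = np.ones(n)                          # a[k] maps f_{k-1} to f_k
    m, P = 0.0, sigma2                      # stationary prior at t[0]
    for k in range(n):
        if k > 0:
            # Prediction step with the exact OU discretization.
            a[k] = np.exp(-lam * (t[k] - t[k - 1]))
            m = a[k] * m
            P = a[k] ** 2 * P + sigma2 * (1.0 - a[k] ** 2)
        m_p[k], P_p[k] = m, P
        # Update step with the scalar measurement y[k] = f_k + noise.
        gain = P / (P + noise_var)
        m = m + gain * (y[k] - m)
        P = (1.0 - gain) * P
        m_f[k], P_f[k] = m, P
    # Backward (Rauch-Tung-Striebel) smoothing pass.
    m_s, P_s = m_f.copy(), P_f.copy()
    for k in range(n - 2, -1, -1):
        G = P_f[k] * a[k + 1] / P_p[k + 1]
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G ** 2 * (P_s[k + 1] - P_p[k + 1])
    return m_s, P_s

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, size=50))
y = np.sin(t) + 0.3 * rng.standard_normal(50)
sigma2, lam, noise_var = 1.0, 0.7, 0.09

m_s, P_s = ou_kalman_smoother(t, y, sigma2, lam, noise_var)

# Dense O(n^3) GP posterior at the training points, for comparison.
K = sigma2 * np.exp(-lam * np.abs(t[:, None] - t[None, :]))
A = K + noise_var * np.eye(len(t))
m_gp = K @ np.linalg.solve(A, y)
V_gp = K - K @ np.linalg.solve(A, K)
print("max mean difference:    ", np.max(np.abs(m_s - m_gp)))
print("max variance difference:", np.max(np.abs(P_s - np.diag(V_gp))))
```

Both printed differences should be at the level of floating-point round-off, since the smoother and the dense solve compute the same posterior; only the cost differs.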