Derivation of the Supervised Phase in GPS (Guided Policy Search)
Optimization problem of the supervised phase in GPS
$$
\pi_{\theta} \leftarrow \arg\min_{\theta} \sum_{t,i,j} D_{\mathrm{KL}}\left(\pi_{\theta}(\mathbf{u}_t \mid \mathbf{x}_{t,i,j}) \,\|\, p_i(\mathbf{u}_t \mid \mathbf{x}_{t,i,j})\right)
$$
where
$$
\pi_{\theta}(\mathbf{u}_t \mid \mathbf{x}_t) = \mathcal{N}\left(\mu^{\pi}(\mathbf{x}_t), \Sigma^{\pi}(\mathbf{x}_t)\right),
$$
$$
p_i(\mathbf{u}_t \mid \mathbf{x}_t) = \mathcal{N}\left(\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti}, \mathbf{C}_{ti}\right),
$$
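Here $i$ indexes the conditions and $j$ indexes the samples collected under each condition, so $\mathbf{x}_{t,i,j}$ is the state at time step $t$ of sample $j$ from condition $i$. Each $p_i$ is a time-varying linear-Gaussian controller for condition $i$.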
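As a minimal sketch (assuming NumPy; the function name `sample_linear_gaussian` and the numbers are hypothetical, not taken from any GPS codebase), drawing actions from one controller $p_i$ at a single time step amounts to adding Gaussian noise with covariance $\mathbf{C}_{ti}$ to the linear feedback term $\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti}$:

```python
import numpy as np

def sample_linear_gaussian(K, k, C, x, n_samples, rng):
    """Draw n_samples actions u ~ N(K @ x + k, C) at a single time step.

    K: (m, n) gain K_ti, k: (m,) offset k_ti, C: (m, m) covariance C_ti,
    x: (n,) state x_t. Returns an array of shape (n_samples, m).
    """
    mean = K @ x + k                          # controller mean K_ti x_t + k_ti
    L = np.linalg.cholesky(C)                 # C_ti = L @ L.T (requires C_ti positive definite)
    eps = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + eps @ L.T                   # each row has mean `mean` and covariance C_ti

# Illustrative values (not from the source)
rng = np.random.default_rng(0)
K = np.array([[1.0, 0.5]])
k = np.array([0.1])
C = np.array([[0.04]])
x = np.array([2.0, -1.0])
u = sample_linear_gaussian(K, k, C, x, 10000, rng)
print(u.mean(), u.var())  # close to 1.6 and 0.04
```

The Cholesky factor is used so that each row of `eps @ L.T` has covariance exactly $\mathbf{C}_{ti}$; this assumes $\mathbf{C}_{ti}$ is positive definite.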
Expanding $p_i(\mathbf{u}_t \mid \mathbf{x}_t)$, where $m$ is the dimension of the action $\mathbf{u}_t$, gives:
$$
\begin{aligned}
p_i(\mathbf{u}_t \mid \mathbf{x}_t) &= \mathcal{N}\left(\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti}, \mathbf{C}_{ti}\right) \\
&= \frac{1}{\sqrt{(2\pi)^{m}|\mathbf{C}_{ti}|}} \exp\left(-\frac{1}{2}\left(\mathbf{u}_t - (\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti})\right)^{T} \mathbf{C}_{ti}^{-1} \left(\mathbf{u}_t - (\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti})\right)\right)
\end{aligned}
$$
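The density above can be evaluated term by term. The NumPy sketch below (hypothetical helper `linear_gaussian_logpdf`, illustrative numbers) computes $\ln p_i(\mathbf{u}_t \mid \mathbf{x}_t)$, using `slogdet` and `solve` instead of an explicit determinant and inverse for numerical stability:

```python
import numpy as np

def linear_gaussian_logpdf(u, x, K, k, C):
    """ln p_i(u_t | x_t) for p_i = N(K_ti x_t + k_ti, C_ti)."""
    m = u.shape[0]                             # action dimension
    diff = u - (K @ x + k)                     # u_t - (K_ti x_t + k_ti)
    _, logdet = np.linalg.slogdet(C)           # ln|C_ti|, assumes C_ti positive definite
    quad = diff @ np.linalg.solve(C, diff)     # diff^T C_ti^{-1} diff without forming the inverse
    # ln of 1/sqrt((2*pi)^m |C_ti|) plus the exponent
    return -0.5 * (m * np.log(2 * np.pi) + logdet) - 0.5 * quad

# Illustrative 2-D example: C_ti = diag(2, 3), u_t is one unit from the mean in each coordinate
K = np.zeros((2, 2))
k = np.zeros(2)
C = np.diag([2.0, 3.0])
lp = linear_gaussian_logpdf(np.array([1.0, 1.0]), np.zeros(2), K, k, C)
expected = -0.5 * (2 * np.log(2 * np.pi) + np.log(6.0)) - 0.5 * (1 / 2 + 1 / 3)
print(lp, expected)
```

Note that the normalizer is $-\frac{1}{2}\ln\left((2\pi)^m|\mathbf{C}_{ti}|\right)$, which is exactly the term that reappears with a positive sign in $-\ln p_i$ in the KL expansion.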
The KL divergence for each sample then expands as:
$$
\begin{aligned}
&D_{\mathrm{KL}}\left(\pi_{\theta}(\mathbf{u}_t \mid \mathbf{x}_{t,i,j}) \,\|\, p_i(\mathbf{u}_t \mid \mathbf{x}_{t,i,j})\right) \\
&= \int \pi_{\theta} \ln\frac{\pi_{\theta}}{p_i} \, d\mathbf{u}_t \\
&= -\int \pi_{\theta} \ln p_i \, d\mathbf{u}_t - \left(-\int \pi_{\theta} \ln \pi_{\theta} \, d\mathbf{u}_t\right) \\
&= -\mathbb{E}_{\pi_{\theta}}\left[\ln p_i\right] - \mathcal{H}(\pi_{\theta}) \\
&= \mathbb{E}_{\pi_{\theta}}\left[\frac{1}{2}\ln\left((2\pi)^{m}|\mathbf{C}_{ti}|\right) + \frac{1}{2}\left(\mathbf{u}_t - (\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti})\right)^{T}\mathbf{C}_{ti}^{-1}\left(\mathbf{u}_t - (\mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti})\right)\right] - \mathcal{H}(\pi_{\theta})
\end{aligned}
$$
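Going from this expectation to a closed form relies on the identity $\mathbb{E}_{\pi_\theta}\left[(\mathbf{u}_t-\mu^p)^T\mathbf{C}^{-1}(\mathbf{u}_t-\mu^p)\right] = \operatorname{tr}(\mathbf{C}^{-1}\Sigma^\pi) + (\mu^\pi-\mu^p)^T\mathbf{C}^{-1}(\mu^\pi-\mu^p)$, with $\mu^p = \mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti}$. A quick Monte Carlo sanity check in NumPy, using illustrative values and hypothetical function names:

```python
import numpy as np

def expected_quadratic_closed_form(mu_pi, Sigma_pi, mu_p, C):
    """tr(C^{-1} Sigma_pi) + (mu_pi - mu_p)^T C^{-1} (mu_pi - mu_p)."""
    C_inv = np.linalg.inv(C)
    d = mu_pi - mu_p
    return np.trace(C_inv @ Sigma_pi) + d @ C_inv @ d

def expected_quadratic_monte_carlo(mu_pi, Sigma_pi, mu_p, C, n, rng):
    """Sample average of (u - mu_p)^T C^{-1} (u - mu_p) with u ~ N(mu_pi, Sigma_pi)."""
    u = rng.multivariate_normal(mu_pi, Sigma_pi, size=n)   # (n, m) samples from pi_theta
    diff = u - mu_p
    C_inv = np.linalg.inv(C)
    return np.mean(np.einsum("ni,ij,nj->n", diff, C_inv, diff))

# Illustrative values (not from the source)
rng = np.random.default_rng(0)
mu_pi = np.array([0.5, -0.2])
Sigma_pi = np.array([[0.3, 0.1], [0.1, 0.2]])
mu_p = np.array([0.0, 0.1])
C = np.array([[0.5, 0.0], [0.0, 0.25]])
exact = expected_quadratic_closed_form(mu_pi, Sigma_pi, mu_p, C)
approx = expected_quadratic_monte_carlo(mu_pi, Sigma_pi, mu_p, C, 100_000, rng)
print(exact, approx)  # exact is 2.26 up to rounding; approx should be close to it
```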
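Write $\mu^{p}_{ti}(\mathbf{x}) = \mathbf{K}_{ti}\mathbf{x} + \mathbf{k}_{ti}$ for the controller mean. Taking the expectation of the quadratic term under $\pi_{\theta}$, via $\mathbb{E}_{\pi_{\theta}}\left[(\mathbf{u}_t-\mu^{p}_{ti})^{T}\mathbf{C}_{ti}^{-1}(\mathbf{u}_t-\mu^{p}_{ti})\right] = \operatorname{tr}(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}) + (\mu^{\pi}-\mu^{p}_{ti})^{T}\mathbf{C}_{ti}^{-1}(\mu^{\pi}-\mu^{p}_{ti})$, and using the Gaussian entropy $\mathcal{H}(\pi_{\theta}) = \frac{1}{2}\ln|\Sigma^{\pi}| + \frac{m}{2}\ln(2\pi e)$, gives the KL divergence between the two multivariate Gaussians: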
$$
\begin{aligned}
&D_{\mathrm{KL}}\left(\pi_{\theta}(\mathbf{u}_t \mid \mathbf{x}_{t,i,j}) \,\|\, p_i(\mathbf{u}_t \mid \mathbf{x}_{t,i,j})\right) \\
&= \frac{1}{2}\ln\left((2\pi)^{m}|\mathbf{C}_{ti}|\right) + \frac{1}{2}\Big(\operatorname{tr}\left(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right) + \left(\mu^{\pi}(\mathbf{x}_{t,i,j}) - \mu^{p}_{ti}(\mathbf{x}_{t,i,j})\right)^{T}\mathbf{C}_{ti}^{-1}\left(\mu^{\pi}(\mathbf{x}_{t,i,j}) - \mu^{p}_{ti}(\mathbf{x}_{t,i,j})\right)\Big) \\
&\quad - \frac{1}{2}\ln\left|\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right| - \text{const}
\end{aligned}
$$

where $\text{const} = \frac{m}{2}\ln(2\pi e)$ comes from the entropy of $\pi_{\theta}$.
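A sketch of this closed-form KL in NumPy (hypothetical function `kl_pi_to_p`), keeping the same four pieces so each can be compared against the equation. The 1-D case $D_{\mathrm{KL}}(\mathcal{N}(0,1)\,\|\,\mathcal{N}(1,1)) = 0.5$ serves as a check:

```python
import numpy as np

def kl_pi_to_p(mu_pi, Sigma_pi, mu_p, C):
    """D_KL(N(mu_pi, Sigma_pi) || N(mu_p, C)), split into the terms of the equation above."""
    m = mu_pi.shape[0]
    C_inv = np.linalg.inv(C)
    d = mu_pi - mu_p
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_S = np.linalg.slogdet(Sigma_pi)
    log_norm = 0.5 * (m * np.log(2 * np.pi) + logdet_C)        # 1/2 ln((2 pi)^m |C_ti|)
    quad = 0.5 * (np.trace(C_inv @ Sigma_pi) + d @ C_inv @ d)  # 1/2 (trace term + mean term)
    const = 0.5 * m * np.log(2 * np.pi * np.e)                 # constant from the Gaussian entropy
    return log_norm + quad - 0.5 * logdet_S - const

# 1-D check: KL(N(0, 1) || N(1, 1)) = 0.5
print(kl_pi_to_p(np.zeros(1), np.eye(1), np.ones(1), np.eye(1)))  # 0.5 up to rounding
```

Combining the constants gives the more familiar form $\frac{1}{2}\left(\operatorname{tr}(\mathbf{C}^{-1}\Sigma^{\pi}) + (\mu^{\pi}-\mu^{p})^{T}\mathbf{C}^{-1}(\mu^{\pi}-\mu^{p}) - m + \ln\frac{|\mathbf{C}|}{|\Sigma^{\pi}|}\right)$, which the function reproduces numerically.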
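Neither $\frac{1}{2}\ln\left((2\pi)^m|\mathbf{C}_{ti}|\right)$ nor the constant depends on $\theta$, so dropping them and multiplying by 2 gives the supervised-phase objective: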
$$
\pi_{\theta} \leftarrow \arg\min_{\theta} \sum_{t,i,j} \Big(\operatorname{tr}\left(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right) + \left(\mu^{\pi}(\mathbf{x}_{t,i,j}) - \mu^{p}_{ti}(\mathbf{x}_{t,i,j})\right)^{T}\mathbf{C}_{ti}^{-1}\left(\mu^{\pi}(\mathbf{x}_{t,i,j}) - \mu^{p}_{ti}(\mathbf{x}_{t,i,j})\right) - \ln\left|\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right|\Big)
$$
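For a batch of samples flattened over $(t,i,j)$, the objective can be evaluated as below. This is a NumPy sketch under the assumption that $\mu^{\pi}$ and $\Sigma^{\pi}$ have already been produced by the policy for each state; an actual training loop would write the same loss in an autodiff framework so that gradients with respect to $\theta$ are available. The function name `supervised_loss` and the example batch are hypothetical.

```python
import numpy as np

def supervised_loss(mu_pi, Sigma_pi, mu_p, C):
    """Sum over samples of tr(C^{-1} Sigma_pi) + (mu_pi - mu_p)^T C^{-1} (mu_pi - mu_p) - ln|Sigma_pi|.

    Samples are flattened over (t, i, j) into a leading axis of size N:
    mu_pi, mu_p: (N, m); Sigma_pi, C: (N, m, m).
    """
    C_inv = np.linalg.inv(C)                                  # batched C_ti^{-1}
    d = mu_pi - mu_p                                          # mu^pi - mu^p_ti per sample
    trace_term = np.einsum("nij,nji->n", C_inv, Sigma_pi)     # tr(C_ti^{-1} Sigma^pi)
    mean_term = np.einsum("ni,nij,nj->n", d, C_inv, d)        # Mahalanobis distance between means
    _, logdet_S = np.linalg.slogdet(Sigma_pi)                 # ln|Sigma^pi| per sample
    return np.sum(trace_term + mean_term - logdet_S)

# Illustrative batch of N = 3 samples with m = 2, where the policy matches the controllers exactly
N, m = 3, 2
C = np.tile(2.0 * np.eye(m), (N, 1, 1))
mu_p = np.zeros((N, m))
loss_match = supervised_loss(mu_p.copy(), C.copy(), mu_p, C)
loss_wider = supervised_loss(mu_p.copy(), 1.5 * C, mu_p, C)
print(loss_match, loss_wider)  # loss_match equals 3 * (2 - ln 4) up to rounding
```

Setting the derivative of $\operatorname{tr}(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}) - \ln|\Sigma^{\pi}|$ with respect to $\Sigma^{\pi}$ to zero gives $\mathbf{C}_{ti}^{-1} - (\Sigma^{\pi})^{-1} = 0$, so for a single sample the covariance part is minimized at $\Sigma^{\pi} = \mathbf{C}_{ti}$, while the mean term pulls $\mu^{\pi}$ toward $\mu^{p}_{ti}$ with weight $\mathbf{C}_{ti}^{-1}$.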