Derivation of the Supervised Phase in the GPS Method

The optimization problem in the supervised phase of GPS (guided policy search) is

$$
\pi_{\theta} \leftarrow \arg \min _{\theta} \sum_{t, i, j} D_{\mathrm{KL}}\left(\pi_{\theta}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right) \,\|\, p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right)\right)
$$

where

$$
\pi_{\theta}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t}\right)=\mathcal{N}\left(\mu^{\pi}\left(\mathbf{x}_{t}\right), \Sigma^{\pi}\left(\mathbf{x}_{t}\right)\right), \qquad
p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t}\right)=\mathcal{N}\left(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i}, \mathbf{C}_{t i}\right),
$$

$i$ indexes the conditions, and $j$ indexes the samples.

Expanding $p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t}\right)$ gives:

$$
\begin{aligned}
p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t}\right) &= \mathcal{N}\left(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i}, \mathbf{C}_{t i}\right) \\
&= \frac{1}{\sqrt{(2\pi)^{m}|\mathbf{C}_{ti}|}}\exp\left(-\frac{1}{2}\left(\mathbf{u}_t-(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i})\right)^T\mathbf{C}_{t i}^{-1}\left(\mathbf{u}_t-(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i})\right)\right)
\end{aligned}
$$

where $m$ is the dimension of $\mathbf{u}_t$.

It follows that:

$$
\begin{aligned}
&D_{\mathrm{KL}}\left(\pi_{\theta}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right) \,\|\, p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right)\right) \\
&= \int\pi_{\theta}\ln\frac{\pi_\theta}{p} \\
&= -\int\pi_\theta\ln p - \left(-\int \pi_\theta\ln\pi_\theta\right) \\
&= -\mathbb{E}_{\pi_\theta}\left[\ln p\right] - \mathcal{H}(\pi_\theta) \\
&= \mathbb{E}_{\pi_\theta}\left[\frac{1}{2}\ln\left((2\pi)^m|\mathbf{C}_{ti}|\right)+\frac{1}{2}\left(\mathbf{u}_t-(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i})\right)^T\mathbf{C}_{t i}^{-1}\left(\mathbf{u}_t-(\mathbf{K}_{t i} \mathbf{x}_{t}+\mathbf{k}_{t i})\right)\right] - \mathcal{H}(\pi_\theta)
\end{aligned}
$$
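
The expectation and the entropy in the last line both have closed forms. As a reminder, the two standard Gaussian identities involved are written out below; they are stated here as general facts rather than taken from the original post. For $\mathbf{u} \sim \mathcal{N}(\mu, \Sigma)$ in $m$ dimensions, a fixed vector $\mathbf{a}$ and a fixed symmetric matrix $\mathbf{A}$:

$$
\begin{aligned}
\mathbb{E}\left[(\mathbf{u}-\mathbf{a})^T \mathbf{A}\,(\mathbf{u}-\mathbf{a})\right] &= \operatorname{tr}(\mathbf{A}\Sigma) + (\mu-\mathbf{a})^T \mathbf{A}\,(\mu-\mathbf{a}), \\
\mathcal{H}\left(\mathcal{N}(\mu,\Sigma)\right) &= \frac{1}{2}\ln|\Sigma| + \frac{m}{2}\left(1+\ln 2\pi\right).
\end{aligned}
$$

Taking $\mathbf{a} = \mathbf{K}_{ti}\mathbf{x}_t + \mathbf{k}_{ti}$, $\mathbf{A} = \mathbf{C}_{ti}^{-1}$, $\mu = \mu^{\pi}(\mathbf{x}_t)$ and $\Sigma = \Sigma^{\pi}(\mathbf{x}_t)$ yields a trace term and a quadratic term in the means, while the entropy contributes $-\frac{1}{2}\ln|\Sigma^{\pi}(\mathbf{x}_t)|$ plus a constant that does not depend on $\theta$.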

From the KL divergence between multivariate Gaussian distributions:

$$
\begin{aligned}
&D_{\mathrm{KL}}\left(\pi_{\theta}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right) \,\|\, p_{i}\left(\mathbf{u}_{t} \mid \mathbf{x}_{t, i, j}\right)\right) \\
&= \frac{1}{2}\ln\left((2\pi)^m|\mathbf{C}_{ti}|\right) + \frac{1}{2}\left[\operatorname{tr}\left(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right) +\left(\mu^{\pi}(\mathbf{x}_{t,i,j})-\mu^p_{ti}(\mathbf{x}_{t,i,j})\right)^T\mathbf{C}_{t i}^{-1}\left(\mu^{\pi}(\mathbf{x}_{t,i,j})-\mu^p_{ti}(\mathbf{x}_{t,i,j})\right)\right] \\
&\quad - \frac{1}{2}\ln|\Sigma^{\pi}(\mathbf{x}_{t,i,j})| - \text{const}
\end{aligned}
$$

where $\mu^p_{ti}(\mathbf{x}) = \mathbf{K}_{ti}\mathbf{x} + \mathbf{k}_{ti}$ is the mean of $p_i$, and $\text{const} = \frac{m}{2}(1+\ln 2\pi)$ is the part of the entropy $\mathcal{H}(\pi_\theta)$ that does not depend on $\theta$.
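
The closed form above can be checked numerically. The following is a minimal NumPy sketch, not code from the original post; the function name `gaussian_kl` and its argument names are illustrative assumptions. Collecting the constants in the expression above gives $\frac{1}{2}\left[\operatorname{tr}(\mathbf{C}^{-1}\Sigma^{\pi}) + \Delta^T\mathbf{C}^{-1}\Delta - m + \ln|\mathbf{C}| - \ln|\Sigma^{\pi}|\right]$ with $\Delta = \mu^{\pi} - \mu^{p}$, which is what the function evaluates.

```python
import numpy as np


def gaussian_kl(mu_pi, sigma_pi, mu_p, c):
    """KL( N(mu_pi, sigma_pi) || N(mu_p, c) ) for m-dimensional Gaussians.

    mu_pi, mu_p: mean vectors of shape (m,)
    sigma_pi, c: covariance matrices of shape (m, m), assumed positive definite
    (hypothetical helper; names are not from the original post)
    """
    m = mu_pi.shape[0]
    diff = mu_pi - mu_p
    # Solve linear systems instead of forming C^{-1} explicitly.
    trace_term = np.trace(np.linalg.solve(c, sigma_pi))
    quad_term = diff @ np.linalg.solve(c, diff)
    # slogdet returns (sign, log|det|); the sign is +1 for valid covariances.
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_sigma = np.linalg.slogdet(sigma_pi)
    return 0.5 * (trace_term + quad_term - m + logdet_c - logdet_sigma)
```

For identical distributions the function returns zero up to rounding, and in one dimension `gaussian_kl` of $\mathcal{N}(0, 1)$ against $\mathcal{N}(1, 2)$ evaluates to $\frac{1}{2}\ln 2$.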

Therefore, dropping the terms that do not depend on $\theta$ and the common factor $\frac{1}{2}$:

$$
\pi_{\theta} \leftarrow \arg \min _{\theta} \sum_{t, i, j} \left[\operatorname{tr}\left(\mathbf{C}_{ti}^{-1}\Sigma^{\pi}(\mathbf{x}_{t,i,j})\right) +\left(\mu^{\pi}(\mathbf{x}_{t,i,j})-\mu^p_{ti}(\mathbf{x}_{t,i,j})\right)^T\mathbf{C}_{t i}^{-1}\left(\mu^{\pi}(\mathbf{x}_{t,i,j})-\mu^p_{ti}(\mathbf{x}_{t,i,j})\right) - \ln|\Sigma^{\pi}(\mathbf{x}_{t,i,j})|\right]
$$
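
As an illustration, the final objective can be evaluated for a single condition $i$ with a few lines of NumPy. This is a sketch under stated assumptions, not the author's implementation: `supervised_objective`, `policy_mean`, `policy_cov` and the array layouts are hypothetical names chosen for this example.

```python
import numpy as np


def supervised_objective(policy_mean, policy_cov, X, K, k, C):
    """Sum over time steps t and samples j of
        tr(C_t^{-1} Sigma) + diff^T C_t^{-1} diff - ln|Sigma|
    for one condition, where diff = mu_pi(x) - (K_t x + k_t).

    Assumed (hypothetical) layout:
      X: (N, T, dx) sampled states, K: (T, du, dx), k: (T, du), C: (T, du, du)
      policy_mean(x) -> (du,), policy_cov(x) -> (du, du)
    """
    num_samples, horizon, _ = X.shape
    total = 0.0
    for t in range(horizon):
        for j in range(num_samples):
            x = X[j, t]
            sigma_pi = policy_cov(x)
            # Mean of the local controller p_i at this state.
            mu_p = K[t] @ x + k[t]
            diff = policy_mean(x) - mu_p
            _, logdet_sigma = np.linalg.slogdet(sigma_pi)
            total += (
                np.trace(np.linalg.solve(C[t], sigma_pi))
                + diff @ np.linalg.solve(C[t], diff)
                - logdet_sigma
            )
    return total
```

To cover several conditions, one would call this once per condition and add the results. For gradient-based training of $\theta$, the same expression would typically be written in an automatic-differentiation framework rather than plain NumPy.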
