Normal (Gaussian) Distribution

One-Dimensional Normal Distribution

If a random variable $X$ has the probability density function
$$
f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)
$$
then $X$ is said to follow a normal distribution, written $X\sim N(\mu,\sigma^2)$.

The density integrates to one:
$$
\begin{aligned}
\int_{-\infty}^{+\infty}f(x)\,{\rm d}x &= \int_{-\infty}^{+\infty}\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right){\rm d}x\\
&\xlongequal{t=\frac{x-\mu}{\sigma}} \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t=1
\end{aligned}
$$
The last step uses $\int_{-\infty}^{+\infty}\exp\left(-t^2/2\right){\rm d}t=\sqrt{2\pi}$, which follows from squaring the integral and switching to polar coordinates:
$$
\begin{aligned}
\left(\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t\right)^2 &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\exp\left(-\dfrac{t^2+u^2}{2}\right){\rm d}t\,{\rm d}u\\
&\xlongequal{t=\rho\cos\theta,\ u=\rho\sin\theta}\int_{0}^{2\pi}\left(\int_{0}^{+\infty}\exp\left(-\dfrac{\rho^2}{2}\right)\rho\,{\rm d}\rho\right){\rm d}\theta=\int_{0}^{2\pi}{\rm d}\theta=2\pi
\end{aligned}
$$
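The normalization above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `normal_pdf` and `integrate_trapezoid` and the parameter values are illustrative assumptions, not part of the original derivation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

mu, sigma = 1.5, 0.7  # arbitrary illustrative values
# Integrate over mu +/- 10 sigma; the tails beyond that range are negligible.
area = integrate_trapezoid(lambda x: normal_pdf(x, mu, sigma),
                           mu - 10 * sigma, mu + 10 * sigma, 20000)

# The Gaussian integral itself: integral of exp(-t^2 / 2) should be sqrt(2 pi).
gauss = integrate_trapezoid(lambda t: math.exp(-0.5 * t * t), -10.0, 10.0, 20000)

print(area, gauss, math.sqrt(2.0 * math.pi))
```

Truncating the range to ten standard deviations is a design choice for the sketch: the neglected tail mass is far below floating-point precision.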

Standard Normal Distribution

For $\mu=0$ and $\sigma=1$, the density $\varphi$ and the cumulative distribution function $\Phi$ are
$$
\begin{aligned}
\varphi(x)&=\dfrac{1}{\sqrt{2\pi}}\exp\left(-\dfrac{x^2}{2}\right)\\
\Phi(x)&=\dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\dfrac{t^2}{2}\right){\rm d}t
\end{aligned}
$$

Properties:

  • $\varphi(-x)=\varphi(x)$
  • $\Phi(-x)=1-\Phi(x)$
    Proof (using the symmetry of $\varphi$ in the substitution step):
    $$\Phi(-x)=\int_{-\infty}^{-x}\varphi(t)\,{\rm d}t\xlongequal{t=-u}\int_{x}^{+\infty}\varphi(u)\,{\rm d}u=\int_{-\infty}^{+\infty}\varphi(u)\,{\rm d}u-\int_{-\infty}^{x}\varphi(u)\,{\rm d}u=1-\Phi(x)$$
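Both properties are easy to confirm numerically. The sketch below assumes the standard identity $\Phi(x)=\tfrac12\left(1+\operatorname{erf}(x/\sqrt2)\right)$ and uses `math.erf` from the Python standard library; the function names `phi` and `Phi` are chosen here for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Largest violation of each symmetry property over a few sample points.
points = (0.0, 0.5, 1.0, 2.5)
density_gap = max(abs(phi(-x) - phi(x)) for x in points)
cdf_gap = max(abs(Phi(-x) - (1.0 - Phi(x))) for x in points)

print(density_gap, cdf_gap)
```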

Multivariate Normal Distribution

We start with a multivariate normal distribution whose components are mutually uncorrelated.

Let the $d$-dimensional data be $\boldsymbol{x}=\begin{bmatrix}x_1&x_2&\cdots &x_d\end{bmatrix}^T$, where the components have means $\mu_1,\mu_2,\cdots,\mu_d$ and standard deviations $\sigma_1,\sigma_2,\cdots,\sigma_d$.

The Gaussian probability density function can then be written as
$$
\begin{aligned}
p(\boldsymbol{x})&=p(x_1)p(x_2)\cdots p(x_d)=\dfrac{1}{(\sqrt{2\pi})^d\sigma_1\sigma_2\cdots\sigma_d}\exp\left(-\dfrac{1}{2}\left[\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2+\left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2+\cdots+\left(\dfrac{x_d-\mu_d}{\sigma_d}\right)^2\right]\right)\\
d^2(\boldsymbol{x},\boldsymbol{\mu})&=\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2+\left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2+\cdots+\left(\dfrac{x_d-\mu_d}{\sigma_d}\right)^2=(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\\
&=\begin{bmatrix}x_1-\mu_1&x_2-\mu_2&\cdots &x_d-\mu_d\end{bmatrix}\begin{bmatrix}\dfrac{1}{\sigma_1^2}&0&\cdots &0\\0&\dfrac{1}{\sigma_2^2}&\cdots &0\\\vdots&\vdots&\ddots &\vdots\\0&0&\cdots &\dfrac{1}{\sigma_d^2}\end{bmatrix}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\\\vdots \\x_d-\mu_d\end{bmatrix}
\end{aligned}
$$
Here $\boldsymbol{\Sigma}=\operatorname{diag}(\sigma_1^2,\sigma_2^2,\cdots,\sigma_d^2)$, so $|\boldsymbol{\Sigma}|^{\frac{1}{2}}=\sigma_1\sigma_2\cdots\sigma_d$. This gives the multivariate normal density:
$$
p(\boldsymbol{x})=\dfrac{1}{(\sqrt{2\pi})^d\sigma_1\sigma_2\cdots\sigma_d}\exp\left(-\dfrac{1}{2}d^2(\boldsymbol{x},\boldsymbol{\mu})\right)=\dfrac{1}{(2\pi)^{\frac{d}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}}\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)
$$
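For a diagonal covariance, the product of one-dimensional densities and the determinant form must agree. The sketch below compares the two for one arbitrary point; the function names and sample values are assumptions made for illustration, and only the diagonal case is covered.

```python
import math

def normal_pdf(x, mu, sigma):
    """One-dimensional density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def diag_mvn_pdf(x, mu, sigmas):
    """Density of N(mu, diag(sigmas^2)) via the determinant / Mahalanobis form."""
    d = len(x)
    det = 1.0    # |Sigma| = product of sigma_i^2
    maha2 = 0.0  # (x - mu)^T Sigma^{-1} (x - mu)
    for xi, mi, si in zip(x, mu, sigmas):
        det *= si * si
        maha2 += (xi - mi) ** 2 / (si * si)
    norm = (2.0 * math.pi) ** (d / 2.0) * math.sqrt(det)
    return math.exp(-0.5 * maha2) / norm

x = [0.3, -1.2, 2.0]
mu = [0.0, -1.0, 1.5]
sigmas = [1.0, 0.5, 2.0]

# Product of the d marginal densities.
product = 1.0
for xi, mi, si in zip(x, mu, sigmas):
    product *= normal_pdf(xi, mi, si)

joint = diag_mvn_pdf(x, mu, sigmas)
print(product, joint)
```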

Normalized Product of Gaussians

Let $p_1(x)=\mathcal{N}(x|\mu_1,\sigma_1^2)$ and $p_2(x)=\mathcal{N}(x|\mu_2,\sigma_2^2)$ both be distributions over the same variable $x$. Dropping factors that do not depend on $x$,
$$
\begin{aligned}
p_1(x)p_2(x)&\propto \exp\left(-\dfrac{1}{2\sigma_1^2}(x-\mu_1)^2\right)\exp\left(-\dfrac{1}{2\sigma_2^2}(x-\mu_2)^2\right)\\
&=\exp\left(-\dfrac{1}{2}\dfrac{(\sigma_1^2+\sigma_2^2)x^2-2(\mu_1\sigma_2^2+\mu_2\sigma_1^2)x+{\rm constant}}{\sigma_1^2\sigma_2^2}\right)\\
&\propto\exp\left(-\dfrac{1}{2}\dfrac{\sigma_1^2+\sigma_2^2}{\sigma_1^2\sigma_2^2}\left(x-\dfrac{\mu_1\sigma_2^2+\mu_2\sigma_1^2}{\sigma_1^2+\sigma_2^2}\right)^2\right)
\end{aligned}
$$
So the product of two Gaussian densities is, after normalization, again a Gaussian, with
$$
\begin{array}{l}
\mu=\dfrac{\mu_1\sigma_2^2+\mu_2\sigma_1^2}{\sigma_1^2+\sigma_2^2}\\\\
\sigma=\sqrt{\dfrac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}}
\end{array}\implies
\begin{array}{l}
\mu=\left(\dfrac{\mu_1}{\sigma_1^2}+\dfrac{\mu_2}{\sigma_2^2}\right)\sigma^2\\\\
\dfrac{1}{\sigma^2}=\dfrac{1}{\sigma_1^2}+\dfrac{1}{\sigma_2^2}
\end{array}
$$
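As a small worked example of the one-dimensional result, the sketch below fuses two Gaussians in precision form and checks that the pointwise product of the two densities is a constant multiple of the fused density. The function names `fuse_1d` and `normal_pdf` and the numbers are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """One-dimensional density of N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fuse_1d(mu1, var1, mu2, var2):
    """Mean and variance of the normalized product of N(mu1, var1) and N(mu2, var2)."""
    precision = 1.0 / var1 + 1.0 / var2    # 1/sigma^2 = 1/sigma_1^2 + 1/sigma_2^2
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)   # precision-weighted mean
    return mu, var

mu, var = fuse_1d(0.0, 4.0, 2.0, 1.0)

# If the product is proportional to N(mu, var), this ratio must not depend on x.
ratios = [normal_pdf(x, 0.0, 4.0) * normal_pdf(x, 2.0, 1.0) / normal_pdf(x, mu, var)
          for x in (-1.0, 0.5, 3.0)]

print(mu, var, ratios)
```

With these inputs the fused mean is about 1.6 and the fused variance about 0.8: the result sits closer to the more certain of the two inputs, and its variance is smaller than either input variance.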

Multivariate Gaussian Case

$$
\begin{aligned}
p_1(\boldsymbol{x})p_2(\boldsymbol{x})&\propto \exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_1)\right)\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}_2^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_2)\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}^T(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1})\boldsymbol{x}-2(\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1})\boldsymbol{x}+\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right]\right) \\
&=\exp\left(-\dfrac{\left[\boldsymbol{x}-\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]^T\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)\left[\boldsymbol{x}-\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]}{2}+{\rm constant}\right)
\end{aligned}
$$

So the product of two multivariate Gaussians is again Gaussian (after normalization), with
$$
\begin{aligned}
\boldsymbol{\mu}&=\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right) \\
\boldsymbol{\Sigma}&=\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)^{-1}
\end{aligned}
$$

Equivalently, in terms of the inverse covariance:
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1}&=\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\\
\left(\boldsymbol{\Sigma}_1^{-1}+\boldsymbol{\Sigma}_2^{-1}\right)\boldsymbol{\mu}&=\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2
\end{aligned}
$$
This extends to the normalized product of $K$ Gaussians:
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1} &= \sum_{k=1}^K \boldsymbol{\Sigma}_k^{-1} \\
\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} &= \sum_{k=1}^K \boldsymbol{\Sigma}_k^{-1}\boldsymbol{\mu}_k
\end{aligned}
$$
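The $K$-Gaussian formulas amount to summing inverse covariances and inverse-covariance-weighted means, then solving for $\boldsymbol{\mu}$. The following is a minimal sketch restricted to two-dimensional Gaussians so that it needs only the standard library; the helpers `inv2`, `matvec2`, and `fuse` and the input values are hypothetical names and data chosen for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def fuse(gaussians):
    """Normalized product of 2-D Gaussians given as (mean, covariance) pairs."""
    info_mat = [[0.0, 0.0], [0.0, 0.0]]   # accumulates sum_k Sigma_k^{-1}
    info_vec = [0.0, 0.0]                 # accumulates sum_k Sigma_k^{-1} mu_k
    for mean, cov in gaussians:
        prec = inv2(cov)
        weighted = matvec2(prec, mean)
        for i in range(2):
            info_vec[i] += weighted[i]
            for j in range(2):
                info_mat[i][j] += prec[i][j]
    cov = inv2(info_mat)                  # Sigma = (sum_k Sigma_k^{-1})^{-1}
    mean = matvec2(cov, info_vec)         # mu = Sigma * sum_k Sigma_k^{-1} mu_k
    return mean, cov

gaussians = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]]),
    ([2.0, 2.0], [[4.0, 0.0], [0.0, 1.0]]),
]
mean, cov = fuse(gaussians)
print(mean, cov)
```

The inputs here are diagonal so the answer can be checked by hand against the one-dimensional formulas (mean near `[0.4, 1.6]`, covariance near `0.8` times the identity), but `fuse` itself does not assume diagonal covariances.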

Normalized Product of Gaussians over Linear Transformations of the Random Variable

$$
\begin{aligned}
\exp\left(-\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)&\propto \exp\left(-\dfrac{1}{2}(\boldsymbol{G}_1\boldsymbol{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\boldsymbol{G}_1\boldsymbol{x}-\boldsymbol{\mu}_1)\right)\exp\left(-\dfrac{1}{2}(\boldsymbol{G}_2\boldsymbol{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}_2^{-1}(\boldsymbol{G}_2\boldsymbol{x}-\boldsymbol{\mu}_2)\right) \\
&=\exp\left(-\dfrac{1}{2}\left[\boldsymbol{x}^T(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2)\boldsymbol{x}-2(\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2)\boldsymbol{x}+\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right]\right) \\
&=\exp\left(-\dfrac{\left[\boldsymbol{x}-\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)^{-1}\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]^T\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)\left[\boldsymbol{x}-\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)^{-1}\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)\right]}{2}+{\rm constant}\right)
\end{aligned}
$$
Matching terms gives
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1}&=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\\
\left(\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{G}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{G}_2\right)\boldsymbol{\mu}=\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}&=\boldsymbol{G}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1+\boldsymbol{G}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2
\end{aligned}
$$
This extends to the normalized product of $K$ Gaussians:
$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1} &= \sum_{k=1}^K \boldsymbol{G}_k^T\boldsymbol{\Sigma}_k^{-1}\boldsymbol{G}_k \\
\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} &= \sum_{k=1}^K \boldsymbol{G}_k^T\boldsymbol{\Sigma}_k^{-1}\boldsymbol{\mu}_k
\end{aligned}
$$
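To illustrate the formulas with $\boldsymbol{G}_k$, the sketch below fuses three scalar observations of a two-dimensional $\boldsymbol{x}$, where each $\boldsymbol{G}_k$ is a $1\times2$ row vector and each $\boldsymbol{\Sigma}_k$ is a scalar variance. This special case, the function name `fuse_linear`, and the numbers are assumptions made to keep the example short; it also assumes the summed matrix $\sum_k \boldsymbol{G}_k^T\boldsymbol{\Sigma}_k^{-1}\boldsymbol{G}_k$ is invertible.

```python
def fuse_linear(observations):
    """Fuse scalar Gaussians over g . x for a 2-D x.

    observations: list of (g, mu_k, var_k), with g a length-2 row vector,
    mu_k the scalar mean and var_k the scalar variance of the k-th factor.
    """
    info_mat = [[0.0, 0.0], [0.0, 0.0]]   # sum_k G_k^T Sigma_k^{-1} G_k
    info_vec = [0.0, 0.0]                 # sum_k G_k^T Sigma_k^{-1} mu_k
    for g, mu_k, var_k in observations:
        for i in range(2):
            info_vec[i] += g[i] * mu_k / var_k
            for j in range(2):
                info_mat[i][j] += g[i] * g[j] / var_k
    (a, b), (c, d) = info_mat
    det = a * d - b * c                   # must be nonzero for a proper Gaussian
    cov = [[d / det, -b / det], [-c / det, a / det]]
    mean = [cov[0][0] * info_vec[0] + cov[0][1] * info_vec[1],
            cov[1][0] * info_vec[0] + cov[1][1] * info_vec[1]]
    return mean, cov

observations = [
    ([1.0, 0.0], 1.0, 1.0),   # observes x_1
    ([0.0, 1.0], 2.0, 1.0),   # observes x_2
    ([1.0, 1.0], 3.6, 1.0),   # observes x_1 + x_2
]
mean, cov = fuse_linear(observations)
print(mean, cov)
```

With these slightly inconsistent observations the fused mean comes out near `[1.2, 2.2]`, which spreads the disagreement evenly over the three factors because they all have the same variance.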

Joint Gaussian Probability Density: Factorization and Inference

Let $(\boldsymbol{x},\boldsymbol{y})$ be a pair of jointly Gaussian variables with joint density
$$
p\left( \boldsymbol{x},\boldsymbol{y}\right) =\mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix},\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\right)
$$
Note that $\boldsymbol{\Sigma}_{yx}=\boldsymbol{\Sigma}_{xy}^T$.
A joint density can always be factored into the product of a conditional and a marginal:
$$
p\left( \boldsymbol{x}, \boldsymbol{y}\right) =p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) p\left( \boldsymbol{y}\right)
$$
Block-diagonalizing the covariance matrix (a congruence transformation built from the Schur complement of $\boldsymbol{\Sigma}_{yy}$) gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}\\ \\
\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \\ \\
\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix}
\end{aligned}
$$

The quadratic form in the exponent of the joint density $p(\boldsymbol{x},\boldsymbol{y})$ is
$$
\begin{aligned}
&\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right)^{T}\begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}^{-1}\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right) \\
=&\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right)^{T}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx} & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy}^{-1} \end{bmatrix} \begin{bmatrix} \boldsymbol{I} & -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}\left( \begin{bmatrix} \boldsymbol{x} \\ \boldsymbol{y} \end{bmatrix}-\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix}\right) \\
=&\begin{bmatrix} \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \\ \boldsymbol{y}-\boldsymbol{\mu}_y \end{bmatrix}^{T} \begin{bmatrix} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma}_{yy}^{-1} \end{bmatrix} \begin{bmatrix} \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) -\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \\ \boldsymbol{y}-\boldsymbol{\mu}_y \end{bmatrix}\\
=&\left[ \boldsymbol{x}-\boldsymbol{\mu}_{x}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]^{T} \left( \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right)^{-1} \left[ \boldsymbol{x}-\boldsymbol{\mu}_{x}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right] +\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)
\end{aligned}
$$
This is a sum of two quadratic terms. Because $\exp(a+b)=\exp(a)\exp(b)$, and because the normalization constants split the same way ($|\boldsymbol{\Sigma}|=|\boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}|\,|\boldsymbol{\Sigma}_{yy}|$), the joint density factors as
$$
\begin{array}{l}
p\left( \boldsymbol{x},\boldsymbol{y}\right) =p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) p\left( \boldsymbol{y}\right) \\
p\left( \boldsymbol{x}\vert \boldsymbol{y}\right) = \mathcal{N}\left( \boldsymbol{\mu}_{x}+\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) ,\ \boldsymbol{\Sigma}_{xx}-\boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\right) \\
p\left( \boldsymbol{y}\right) =\mathcal{N}\left( \boldsymbol{\mu}_y,\boldsymbol{\Sigma}_{yy}\right)
\end{array}
$$
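The factorization can be verified numerically in the simplest case, where $\boldsymbol{x}$ and $\boldsymbol{y}$ are both scalars. The sketch below computes the conditional mean and variance and checks that $p(x,y)=p(x\vert y)\,p(y)$ at one point; the function names and the numbers are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """One-dimensional density of N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def joint_pdf(x, y, mu_x, mu_y, s_xx, s_xy, s_yy):
    """Bivariate normal density with covariance [[s_xx, s_xy], [s_xy, s_yy]]."""
    det = s_xx * s_yy - s_xy * s_xy
    dx, dy = x - mu_x, y - mu_y
    quad = (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def condition_on_y(mu_x, mu_y, s_xx, s_xy, s_yy, y):
    """Mean and variance of p(x | y) for jointly Gaussian scalars x and y."""
    gain = s_xy / s_yy                  # Sigma_xy Sigma_yy^{-1}
    mean = mu_x + gain * (y - mu_y)
    var = s_xx - gain * s_xy            # Schur complement of Sigma_yy
    return mean, var

mu_x, mu_y, s_xx, s_xy, s_yy = 1.0, 2.0, 4.0, 1.5, 3.0
y_obs = 4.0
mean, var = condition_on_y(mu_x, mu_y, s_xx, s_xy, s_yy, y_obs)

x_test = 0.5
lhs = joint_pdf(x_test, y_obs, mu_x, mu_y, s_xx, s_xy, s_yy)
rhs = normal_pdf(x_test, mean, var) * normal_pdf(y_obs, mu_y, s_yy)
print(mean, var, lhs, rhs)
```

Observing $y$ two units above its mean shifts the estimate of $x$ by `gain * 2` and reduces its variance from 4.0 to 3.25; the reduction does not depend on the observed value.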

Nonlinear Transformation of a Gaussian Random Variable

We now examine what happens when a Gaussian is passed through a stochastic nonlinear transformation, that is, we compute:
$$
p(\boldsymbol{y})=\int_{-\infty}^{\infty}p(\boldsymbol{y}|\boldsymbol{x})p(\boldsymbol{x})\,{\rm d}\boldsymbol{x}
$$
where
$$
\begin{aligned}
p(\boldsymbol{y}|\boldsymbol{x}) &=\mathcal{N}(\boldsymbol{g}(\boldsymbol{x}),\boldsymbol{R})\\
p(\boldsymbol{x}) &=\mathcal{N}(\boldsymbol{\mu}_x,\boldsymbol{\Sigma}_{xx})
\end{aligned}
$$
Here $\boldsymbol{g}(\cdot)$ denotes a nonlinear map $\boldsymbol{g}:\boldsymbol{x}\mapsto\boldsymbol{y}$ that is corrupted by zero-mean Gaussian noise with covariance $\boldsymbol{R}$. Stochastic nonlinear maps of this kind are needed later to model sensors.
Linearizing the nonlinear transformation about the mean of $\boldsymbol{x}$ gives
$$
\begin{aligned}
\boldsymbol{g}\left( \boldsymbol{x}\right) &\approx\boldsymbol{\mu}_{y}+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \\
\boldsymbol{G}&=\left.\dfrac{\partial \boldsymbol{g}\left( \boldsymbol{x}\right) }{\partial \boldsymbol{x}}\right\rvert_{\boldsymbol{x}=\boldsymbol{\mu}_{x}}\\
\boldsymbol{\mu}_{y}&=\boldsymbol{g}\left( \boldsymbol{\mu}_{x}\right)
\end{aligned}
$$

Substituting the linearization:
$$
\begin{aligned}
p\left( \boldsymbol{y}\right) &=\int_{-\infty }^{+\infty }p\left( \boldsymbol{y}| \boldsymbol{x}\right) p\left( \boldsymbol{x}\right) {\rm d}\boldsymbol{x}\\
&=\eta \int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \boldsymbol{y}-\left( \boldsymbol{\mu}_y+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right) \right]^{T}\boldsymbol{R}^{-1}\left[ \boldsymbol{y}-\left( \boldsymbol{\mu}_y+\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right) \right] \right\} \\
&\quad\times \exp \left[ -\dfrac{1}{2}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^{T}\boldsymbol{\Sigma}_{xx}^{-1}\left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) \right] {\rm d}\boldsymbol{x} \\
&=\eta \int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) +\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^T \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) -2\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^T \boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) +\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)^{T}\boldsymbol{\Sigma}_{xx}^{-1}\left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) \right] \right\} {\rm d}\boldsymbol{x} \\
&=\eta \exp \left[ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_{y}\right) \right] \int_{-\infty }^{+\infty }\exp \left\{ -\dfrac{1}{2}\left[ \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right)^T \left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) -2\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^T \boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right) \right] \right\} {\rm d}\boldsymbol{x}
\end{aligned}
$$
where $\eta$ is a normalization constant. Completing the square in the exponent inside the integral gives the quadratic expression
$$
\left[ \boldsymbol{x}-\boldsymbol{\mu}_x-\boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left[ \boldsymbol{x}-\boldsymbol{\mu}_x-\boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right] -\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \boldsymbol{F}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)
$$
The second term does not depend on $\boldsymbol{x}$, so its exponential can be moved outside the integral. The remaining integrand (the exponential of the first term) is an unnormalized Gaussian in $\boldsymbol{x}$, so integrating over $\boldsymbol{x}$ yields a constant, which is absorbed into $\eta$.
The matrix $\boldsymbol{F}$ is chosen so that the cross terms match:
$$
\begin{aligned}
\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \left( \boldsymbol{x}-\boldsymbol{\mu}_x\right) &=\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{x}-\boldsymbol{\mu}_{x}\right)\\
\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) &=\boldsymbol{R}^{-1}\boldsymbol{G}
\end{aligned}
$$
that is, $\boldsymbol{F}=\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right)^{-1}\boldsymbol{G}^{T}\boldsymbol{R}^{-1}$, since $\boldsymbol{R}$ and $\boldsymbol{\Sigma}_{xx}$ are symmetric.
Therefore:
$$
\begin{aligned}
p(\boldsymbol{y})&=\rho\exp \left\{ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\left[ \boldsymbol{R}^{-1}-\boldsymbol{F}^{T}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right) \boldsymbol{F}\right] \left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right\} \\
&=\rho\exp\left\{ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\underbrace{\left[\boldsymbol{R}^{-1}-\boldsymbol{R}^{-1}\boldsymbol{G}\left( \boldsymbol{G}^{T}\boldsymbol{R}^{-1}\boldsymbol{G}+\boldsymbol{\Sigma}_{xx}^{-1}\right)^{-1}\boldsymbol{G}^{T}\boldsymbol{R}^{-1}\right]}_{=\ \left( \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^{T}\right)^{-1}\ \text{by the matrix inversion lemma}}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right\}\\
&=\rho\exp \left[ -\dfrac{1}{2}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right)^{T}\left( \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^{T}\right)^{-1}\left( \boldsymbol{y}-\boldsymbol{\mu}_y\right) \right]
\end{aligned}
$$
where $\rho$ is a new normalization constant. This is a Gaussian distribution for $\boldsymbol{y}$:
$$
\boldsymbol{y}\sim\mathcal{N}(\boldsymbol{\mu}_y,\boldsymbol{\Sigma}_{yy})=\mathcal{N}(\underbrace{\boldsymbol{\mu}_y}_{\boldsymbol{g}(\boldsymbol{\mu}_x)},\ \boldsymbol{R}+\boldsymbol{G}\boldsymbol{\Sigma}_{xx}\boldsymbol{G}^T)
$$
The intermediate step uses the matrix inversion lemma, specifically identity (2) below with $\boldsymbol{A}=\boldsymbol{\Sigma}_{xx}$, $\boldsymbol{B}=\boldsymbol{G}^T$, $\boldsymbol{C}=\boldsymbol{G}$, and $\boldsymbol{D}=\boldsymbol{R}$.
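A scalar example makes the result concrete. The sketch below propagates a Gaussian through $g(x)=x^2$ with additive noise, using the linearized formulas $\mu_y=g(\mu_x)$ and $\Sigma_{yy}=R+G\Sigma_{xx}G^T$, and also checks the matrix-inversion-lemma step in its scalar form. The choice of $g$, the function name `propagate_linearized`, and the numbers are assumptions for illustration only.

```python
def propagate_linearized(g, dg, mu_x, var_x, noise_var):
    """First-order propagation of N(mu_x, var_x) through y = g(x) + noise."""
    G = dg(mu_x)                        # Jacobian evaluated at the prior mean
    mu_y = g(mu_x)                      # mu_y = g(mu_x)
    var_y = noise_var + G * var_x * G   # Sigma_yy = R + G Sigma_xx G^T
    return mu_y, var_y

mu_y, var_y = propagate_linearized(
    g=lambda x: x * x,
    dg=lambda x: 2.0 * x,
    mu_x=2.0, var_x=0.01, noise_var=0.04,
)

# Scalar form of the step that uses the matrix inversion lemma:
# R^{-1} - R^{-1} G (G^T R^{-1} G + Sigma_xx^{-1})^{-1} G^T R^{-1} = (R + G Sigma_xx G^T)^{-1}
G, R, S = 4.0, 0.04, 0.01
lhs = 1.0 / R - (G / R) * (1.0 / (G * G / R + 1.0 / S)) * (G / R)
rhs = 1.0 / (R + G * S * G)

print(mu_y, var_y, lhs, rhs)
```

The linearized mean here is 4.0, whereas the exact mean of $x^2$ under this prior is $\mu_x^2+\sigma_x^2=4.01$; the gap is small because the prior variance is small relative to the curvature of $g$, and it would grow for a wider prior.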

Matrix Inversion Lemma

An invertible block matrix can be factored into either a lower-diagonal-upper (LDU) form or an upper-diagonal-lower (UDL) form, provided the blocks being inverted are themselves invertible.

LDU

$$
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0}\\ \boldsymbol{C}\boldsymbol{A} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{A}\boldsymbol{B} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}
$$
Inverting both sides gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}^{-1} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{A}\boldsymbol{B} \\ \boldsymbol{0}& \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A} & \boldsymbol{0} \\ \boldsymbol{0} & \left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{C}\boldsymbol{A} & \boldsymbol{I} \end{bmatrix}\\
&=\begin{bmatrix} \boldsymbol{A}-\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} & \boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \\ -\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} & \left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} \end{bmatrix}
\end{aligned}
$$

UDL

$$
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}=\begin{bmatrix} \boldsymbol{I} & -\boldsymbol{B}\boldsymbol{D}^{-1} \\ \boldsymbol{0}& \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{I} \end{bmatrix}
$$
Inverting both sides gives:
$$
\begin{aligned}
\begin{bmatrix} \boldsymbol{A}^{-1} & -\boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}^{-1} &=\begin{bmatrix} \boldsymbol{I} & \boldsymbol{0} \\ -\boldsymbol{D}^{-1}\boldsymbol{C} & \boldsymbol{I} \end{bmatrix}\begin{bmatrix} \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}^{-1} \end{bmatrix}\begin{bmatrix} \boldsymbol{I} & \boldsymbol{B}\boldsymbol{D}^{-1} \\ \boldsymbol{0} & \boldsymbol{I} \end{bmatrix} \\
&= \begin{bmatrix} \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} \\ -\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} & \boldsymbol{D}^{-1}-\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} \end{bmatrix}
\end{aligned}
$$

SMW (Sherman-Morrison-Woodbury) Identities

Comparing the two expressions for the inverse block by block yields the following identities:
$$
\begin{aligned}
\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} &\equiv \boldsymbol{A}-\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} &(1) \\
\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} &\equiv \boldsymbol{D}^{-1}-\boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} &(2) \\
\boldsymbol{A}\boldsymbol{B}\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1} &\equiv \left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1}\boldsymbol{B}\boldsymbol{D}^{-1} &(3) \\
\left( \boldsymbol{D}+\boldsymbol{C}\boldsymbol{A}\boldsymbol{B}\right)^{-1}\boldsymbol{C}\boldsymbol{A} &\equiv \boldsymbol{D}^{-1}\boldsymbol{C}\left( \boldsymbol{A}^{-1}+\boldsymbol{B}\boldsymbol{D}^{-1}\boldsymbol{C}\right)^{-1} &(4)
\end{aligned}
$$
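The four identities can be spot-checked numerically. The sketch below does so for one arbitrary choice of $2\times2$ matrices using only the standard library; the helper functions and the matrix entries are assumptions for illustration, chosen so that every matrix being inverted is nonsingular and the matrices do not commute.

```python
def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def sub(x, y):
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]

def inv(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def chain(*ms):
    """Left-to-right product of several 2x2 matrices."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))

A = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0, 2.0], [0.0, 1.0]]
C = [[1.0, 0.0], [3.0, 1.0]]
D = [[3.0, 1.0], [1.0, 2.0]]

P = inv(add(inv(A), chain(B, inv(D), C)))   # (A^{-1} + B D^{-1} C)^{-1}
Q = inv(add(D, chain(C, A, B)))             # (D + C A B)^{-1}

errors = [
    max_abs_diff(P, sub(A, chain(A, B, Q, C, A))),                 # identity (1)
    max_abs_diff(Q, sub(inv(D), chain(inv(D), C, P, B, inv(D)))),  # identity (2)
    max_abs_diff(chain(A, B, Q), chain(P, B, inv(D))),             # identity (3)
    max_abs_diff(chain(Q, C, A), chain(inv(D), C, P)),             # identity (4)
]
print(errors)
```

All four residuals should be at the level of floating-point round-off. A single numerical instance is of course only a sanity check, not a proof; the proof is the block-by-block comparison above.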
