Machine Learning Whiteboard Derivation P2_4

Given a high-dimensional Gaussian distribution, find its marginal and conditional distributions.
$$
x=\begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \qquad
\mu=\begin{bmatrix} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{p} \end{bmatrix} \qquad
\Sigma=\begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}_{p \times p}
$$
Split the $p$-dimensional vector into two blocks, $x_a \in \mathbb{R}^m$, $x_b \in \mathbb{R}^n$, $m+n=p$, and view $x$ as the joint distribution of $x_a$ and $x_b$:
$$
x=\begin{bmatrix} x_{a} \\ x_{b} \end{bmatrix} \qquad
\mu=\begin{bmatrix} \mu_{a} \\ \mu_{b} \end{bmatrix} \qquad
\Sigma=\begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}
$$

Find $p(x_a)$, $p(x_b|x_a)$, $p(x_b)$, $p(x_a|x_b)$.
(Completing the square is the approach taken in PRML; it is not used here.)

Theorem
Given: $x \sim N(\mu, \Sigma)$ and $y = Ax + B$.
Conclusion: $y \sim N(A\mu + B,\ A\Sigma A^T)$.
$$E[y]=E[Ax+B]=AE[x]+B=A\mu+B$$
$$\mathrm{Var}[y]=\mathrm{Var}[Ax+B]=\mathrm{Var}[Ax]=A\,\mathrm{Var}[x]\,A^T=A\Sigma A^T$$
(The constant $B$ contributes nothing to the variance.)
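As a quick sanity check, the theorem can be verified empirically: transform Gaussian samples by $y = Ax + B$ and compare the empirical mean and covariance of $y$ against $A\mu + B$ and $A\Sigma A^T$. This is only a Monte Carlo sketch; all the numbers below are made up for illustration.

```python
import numpy as np

# Monte Carlo check: if x ~ N(mu, Sigma) and y = A x + B,
# then y ~ N(A mu + B, A Sigma A^T).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
B = np.array([4.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=500_000)  # rows are draws of x
y = x @ A.T + B                                       # y = A x + B, row-wise

emp_mean = y.mean(axis=0)
emp_cov = np.cov(y, rowvar=False)

# Both should agree with the theorem up to sampling noise.
assert np.allclose(emp_mean, A @ mu + B, atol=0.05)
assert np.allclose(emp_cov, A @ Sigma @ A.T, atol=0.1)
```

The tolerances are loose because the empirical moments only converge at the usual $O(1/\sqrt{N})$ Monte Carlo rate.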

Derivation of $p(x_a)$
$$
\begin{aligned}
x_a &= \begin{bmatrix} I_m & 0 \end{bmatrix}\begin{bmatrix} x_a \\ x_b \end{bmatrix} \\
E[x_a] &= \begin{bmatrix} I_m & 0 \end{bmatrix}\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} = \mu_a \\
\mathrm{Var}[x_a] &= \begin{bmatrix} I_m & 0 \end{bmatrix}\begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\begin{bmatrix} I_m \\ 0 \end{bmatrix}
= \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \end{bmatrix}\begin{bmatrix} I_m \\ 0 \end{bmatrix} = \Sigma_{aa} \\
x_a &\sim N(\mu_a, \Sigma_{aa})
\end{aligned}
$$
(Here $0$ denotes the $m \times n$ zero block.)
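The marginal derivation above amounts to slicing blocks out of $\mu$ and $\Sigma$: applying the selector $P = [\,I_m \; 0\,]$ gives $P\mu = \mu_a$ and $P\Sigma P^T = \Sigma_{aa}$. A minimal NumPy sketch (partition sizes and numbers are illustrative):

```python
import numpy as np

# Partition a 3-dimensional Gaussian into blocks of sizes m=2 and n=1.
m, n = 2, 1
mu = np.array([0.5, -1.0, 2.0])            # stacked [mu_a; mu_b]
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])

# Selector P = [I_m, 0] implements x_a = P x, so by the theorem
# E[x_a] = P mu = mu_a and Var[x_a] = P Sigma P^T = Sigma_aa.
P = np.hstack([np.eye(m), np.zeros((m, n))])
mu_a = P @ mu
Sigma_aa = P @ Sigma @ P.T

# The result is just the leading blocks of mu and Sigma.
assert np.allclose(mu_a, mu[:m])
assert np.allclose(Sigma_aa, Sigma[:m, :m])
```

In practice one would of course slice `mu[:m]` and `Sigma[:m, :m]` directly; the selector matrix is only written out to mirror the derivation.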

Derivation of $p(x_b|x_a)$
Define the variables (take this construction as given for now; it is chosen precisely so that the cross terms cancel):
$$x_{b.a}=x_b-\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
$$\mu_{b.a}=\mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a$$
$$\Sigma_{bb.a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \qquad \text{(the Schur complement)}$$
Then:
$$
\begin{aligned}
x_{b.a} &= \begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I_n \end{bmatrix}\begin{bmatrix} x_a \\ x_b \end{bmatrix} \\
E[x_{b.a}] &= \begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I_n \end{bmatrix}\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} = \mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a = \mu_{b.a} \\
\mathrm{Var}[x_{b.a}] &= \begin{bmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I_n \end{bmatrix}\begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\begin{bmatrix} -\Sigma_{aa}^{-1}\Sigma_{ab} \\ I_n \end{bmatrix} \\
&= \begin{bmatrix} 0 & \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \end{bmatrix}\begin{bmatrix} -\Sigma_{aa}^{-1}\Sigma_{ab} \\ I_n \end{bmatrix} \\
&= \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} = \Sigma_{bb.a} \\
x_{b.a} &\sim N(\mu_{b.a},\Sigma_{bb.a})
\end{aligned}
$$
(Note the right factor is the transpose of the left: $(-\Sigma_{ba}\Sigma_{aa}^{-1})^T = -\Sigma_{aa}^{-1}\Sigma_{ab}$, since $\Sigma$ is symmetric.)
Given
$$x_{b.a} \sim N(\mu_{b.a},\Sigma_{bb.a})$$
note also that $\mathrm{Cov}(x_{b.a}, x_a) = \Sigma_{ba} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa} = 0$; since $x_{b.a}$ and $x_a$ are jointly Gaussian, they are independent, so conditioning on $x_a$ does not change the distribution of $x_{b.a}$. From
$$x_{b.a}=x_b-\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
we get
$$x_{b}=x_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
Apply the theorem $y = Ax + B$ with $y = x_b$, $A = I$, $x = x_{b.a}$, and $B = \Sigma_{ba}\Sigma_{aa}^{-1}x_a$ (a constant once $x_a$ is given).
Conclusion
$$E[x_b|x_a]=\mu_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a$$
$$\mathrm{Var}[x_b|x_a]=\mathrm{Var}[x_{b.a}]=\Sigma_{bb.a}$$
$$x_b|x_a \sim N(\mu_{b.a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a,\ \Sigma_{bb.a})$$
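The conditional formulas can be checked numerically. The sketch below (all numbers illustrative) computes the conditional mean as $\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a - \mu_a)$, which is the same as $\mu_{b.a} + \Sigma_{ba}\Sigma_{aa}^{-1}x_a$ after expanding $\mu_{b.a}$, together with the Schur complement $\Sigma_{bb.a}$, and compares both against the classic bivariate special case where they reduce to scalar formulas.

```python
import numpy as np

# Bivariate Gaussian: x_a and x_b are scalars with correlation rho.
mu_a, mu_b = 1.0, -2.0
sa, sb, rho = 1.5, 2.0, 0.6
Sigma = np.array([[sa**2,     rho*sa*sb],
                  [rho*sa*sb, sb**2    ]])

# Partition Sigma into its four blocks (here each block is 1x1).
Saa = Sigma[:1, :1]; Sab = Sigma[:1, 1:]
Sba = Sigma[1:, :1]; Sbb = Sigma[1:, 1:]

xa = np.array([2.5])  # observed value of x_a

# E[x_b|x_a] = mu_b + Sigma_ba Sigma_aa^{-1} (x_a - mu_a)
cond_mean = mu_b + Sba @ np.linalg.solve(Saa, xa - mu_a)
# Var[x_b|x_a] = Sigma_bb - Sigma_ba Sigma_aa^{-1} Sigma_ab  (Schur complement)
cond_var = Sbb - Sba @ np.linalg.solve(Saa, Sab)

# Bivariate special case: mean mu_b + rho*(sb/sa)*(xa - mu_a),
# variance sb^2 * (1 - rho^2).
assert np.allclose(cond_mean, mu_b + rho*(sb/sa)*(xa - mu_a))
assert np.allclose(cond_var, sb**2 * (1 - rho**2))
```

Using `np.linalg.solve` instead of explicitly inverting $\Sigma_{aa}$ is the standard numerically stable choice; the block formulas work unchanged for any partition sizes $m$ and $n$.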
