Conditional and marginal distributions of a multivariate Gaussian

While reading up on Gaussian Processes (GPs), I decided it would be useful to be able to prove some of the basic facts about multivariate Gaussian distributions that are the building blocks for GPs.  Namely, how to prove that the conditional and marginal distributions of a multivariate Gaussian are also Gaussian, and to give their forms.

Preliminaries

First, we know that the density of a multivariate normal distribution with mean \mu and covariance \Sigma is given by

\frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right).
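As a quick sanity check of this formula, here is a minimal Python sketch (assuming NumPy and SciPy are available; the mean, covariance, and test point are arbitrary values chosen only for illustration) that evaluates the density directly and compares it against scipy.stats.multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary mean, covariance, and test point, purely for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, -1.2])

k = len(mu)
diff = x - mu
# (2*pi)^(-k/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))
norm_const = 1.0 / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma)))
density = norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

print(density)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree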

For simplicity of notation, I’ll now assume that the distribution has zero-mean, but everything should carry over in a straightforward manner to the more general case.

Writing out x as two components \left[ \begin{array}{c} a\\ b \end{array} \right], we are now interested in two distributions, the conditional p(a|b) and the marginal p(b).

Separate the covariance matrix \Sigma into a block matrix \left[ \begin{array}{cc} A & C^T \\ C & B \end{array}\right], such that A corresponds to the covariance of a, B to the covariance of b, and C contains the cross-covariance terms.

Rewriting the Joint

We’d now like to be able to write out the form for the inverse covariance matrix \left[ \begin{array}{cc} A & C^T \\ C & B \end{array}\right]^{-1}.  We can make use of the Schur complement and write this as

\left[ \begin{array}{cc} A & C^T \\ C & B  \end{array}\right]^{-1} = \left[ \begin{array}{cc} I & 0 \\ -B^{-1}C & I  \end{array}\right] \left[ \begin{array}{cc} (A-C^T B^{-1} C)^{-1} & 0 \\ 0 & B^{-1}  \end{array}\right] \left[ \begin{array}{cc} I & -C^T B^{-1} \\ 0 & I  \end{array}\right].

I’ll explain below how this can be derived.
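Before deriving it, it is easy to convince yourself numerically that this factorization really does equal the inverse. A minimal NumPy sketch, using an arbitrarily constructed positive-definite covariance purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary positive-definite covariance and partition it into blocks.
na, nb = 2, 3
M = rng.normal(size=(na + nb, na + nb))
Sigma = M @ M.T + (na + nb) * np.eye(na + nb)  # positive definite
A = Sigma[:na, :na]
B = Sigma[na:, na:]
C = Sigma[na:, :na]          # so the top-right block is C^T

Binv = np.linalg.inv(B)
schur = A - C.T @ Binv @ C   # Schur complement of B

I_a, I_b = np.eye(na), np.eye(nb)
left   = np.block([[I_a,                 np.zeros((na, nb))],
                   [-Binv @ C,           I_b]])
center = np.block([[np.linalg.inv(schur), np.zeros((na, nb))],
                   [np.zeros((nb, na)),   Binv]])
right  = np.block([[I_a,                 -C.T @ Binv],
                   [np.zeros((nb, na)),  I_b]])

print(np.allclose(left @ center @ right, np.linalg.inv(Sigma)))  # True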

Now, we know that the joint distribution can be written as

p(a,b) \propto \exp \left(-\frac{1}{2} \left[ \begin{array}{c} a\\ b \end{array} \right]^T \left[ \begin{array}{cc} A & C^T \\ C & B   \end{array}\right]^{-1} \left[ \begin{array}{c} a\\ b \end{array} \right] \right).

We can substitute in the above expression for the inverse of the block covariance matrix. Since B is symmetric, the left-most matrix in the factorization is just the transpose of the right-most one, so multiplying the outer matrices into the vector \left[ \begin{array}{c} a\\ b \end{array} \right] on both sides gives

p(a,b) \propto \exp \left(-\frac{1}{2} \left[ \begin{array}{c}  a - C^T B^{-1} b \\ b \end{array} \right]^T \left[ \begin{array}{cc} (A-C^T B^{-1} C)^{-1} & 0 \\ 0 & B^{-1}   \end{array}\right] \left[ \begin{array}{c}  a - C^T B^{-1} b \\ b \end{array} \right] \right).

Using the fact that the center matrix is block diagonal, we have

p(a,b) \propto \exp \left(-\frac{1}{2} (a  - C^T B^{-1} b)^T (A-C^T B^{-1} C)^{-1} (a  - C^T B^{-1} b)\right) \exp \left( -\frac{1}{2} b^T B^{-1} b\right).

Wrapping up

At this point, we’re pretty much done.  If we condition on b, the second exponential term drops out as a constant, and we have

p(a|b) \sim \mathcal{N}\left(C^T B^{-1} b, (A-C^T B^{-1} C)\right).
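As a quick numerical check of this result (a sketch with scalar a and b, and arbitrary illustrative values for A, B, and C): fix b, evaluate the joint density along a grid of a values, normalize it, and compare against the Gaussian above.

import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary 2x2 covariance for scalar a and b (illustrative values only).
A, B, C = 2.0, 1.5, 0.8           # Sigma = [[A, C], [C, B]]
Sigma = np.array([[A, C], [C, B]])
joint = multivariate_normal(mean=[0.0, 0.0], cov=Sigma)

b = 0.7                           # value we condition on
a_grid = np.linspace(-10, 10, 4001)
da = a_grid[1] - a_grid[0]

# Slice of the joint density at the fixed b, normalized over a.
joint_slice = joint.pdf(np.column_stack([a_grid, np.full_like(a_grid, b)]))
conditional_numeric = joint_slice / (joint_slice.sum() * da)

# Derived result: p(a|b) = N(C B^{-1} b, A - C B^{-1} C) for scalars.
conditional_formula = norm(loc=C / B * b, scale=np.sqrt(A - C**2 / B)).pdf(a_grid)

print(np.max(np.abs(conditional_numeric - conditional_formula)))  # ~0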

Note that if a and b are uncorrelated, C = 0, and we just get the marginal distribution of a.

If we marginalize over a, we can pull the second exponential factor outside the integral; the first factor is, up to a constant, a Gaussian density in a, so it integrates to a constant that does not depend on b, and we find that

p(b) = \int p(a,b)\, da \sim \mathcal{N}(0,B).
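A similarly quick check of the marginal, again with an arbitrary illustrative covariance: draw samples from the joint and confirm that the empirical covariance of the b components is close to B.

import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive-definite covariance, partitioned as before (illustrative).
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.5, 0.2],
                  [0.5, 0.2, 1.0]])
na = 1                                   # first block is a, the rest is b
B = Sigma[na:, na:]

samples = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
b_samples = samples[:, na:]              # keep only the b components

print(np.cov(b_samples, rowvar=False))   # should be close to B
print(B)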

Schur complement

Above, I wrote that you could use the Schur complement to get the block form of the inverse covariance matrix.  How would one actually derive that?  As mentioned on the Wikipedia page, the expression for the inverse can be derived using Gaussian elimination.

If you right-multiply the covariance by the left-most matrix in the expression, you obtain

\left[ \begin{array}{cc} A & C^T \\ C & B   \end{array}\right] \left[ \begin{array}{cc} I & 0 \\ -B^{-1}C & I    \end{array}\right] = \left[ \begin{array}{cc} A-C^T B^{-1} C & C^T \\ 0 & B    \end{array}\right]

zeroing out the bottom-left block.  Multiplying by the center matrix then turns the diagonal blocks into identities, and the right-most matrix zeros out the remaining top-right block, leaving the identity matrix, so the whole expression is indeed the inverse of the covariance matrix.
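To see these elimination steps concretely, here is a short NumPy sketch (the same kind of arbitrary partitioned covariance as above) that multiplies through one factor at a time:

import numpy as np

rng = np.random.default_rng(2)

# Arbitrary positive-definite covariance, partitioned into A, B, C (illustrative).
na, nb = 2, 2
M = rng.normal(size=(na + nb, na + nb))
Sigma = M @ M.T + (na + nb) * np.eye(na + nb)
A, B = Sigma[:na, :na], Sigma[na:, na:]
C = Sigma[na:, :na]
Binv = np.linalg.inv(B)
schur = A - C.T @ Binv @ C

left   = np.block([[np.eye(na), np.zeros((na, nb))], [-Binv @ C, np.eye(nb)]])
center = np.block([[np.linalg.inv(schur), np.zeros((na, nb))],
                   [np.zeros((nb, na)), Binv]])
right  = np.block([[np.eye(na), -C.T @ Binv], [np.zeros((nb, na)), np.eye(nb)]])

step1 = Sigma @ left       # bottom-left block becomes 0
step2 = step1 @ center     # diagonal blocks become identities
step3 = step2 @ right      # top-right block is cleared, leaving the identity

print(np.allclose(step1[na:, :na], 0))            # True
print(np.allclose(step3, np.eye(na + nb)))        # True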

Further Reading

I got started on this train of thought after reading the Wikipedia page on Gaussian processes.  The external link on that page to a gentle introduction to GPs was somewhat helpful as a quick primer.  The video lectures by MacKay and Rasmussen were both good and helped give a better understanding of GPs.

MacKay also has a nice short essay on the humble Gaussian distribution, which gives more intuition about the covariance and inverse covariance matrices of Gaussian distributions.  In particular, the inverse covariance matrix tells you the relationship between two variables conditioned on all the other variables, and therefore changes if you marginalize out some of the variables.  The sign of an off-diagonal element of the inverse covariance matrix is opposite the sign of the correlation between the corresponding two variables, conditioned on all the others.
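As a tiny illustration of that last point, with an arbitrary three-variable covariance chosen only for the example, one can compute the precision matrix and the partial correlations and compare signs:

import numpy as np

# Arbitrary three-variable covariance, just to illustrate the sign relationship.
Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.1],
                  [0.5, 0.1, 1.0]])
P = np.linalg.inv(Sigma)                     # precision (inverse covariance)

d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)           # off-diagonals: partial correlations
np.fill_diagonal(partial_corr, 1.0)

print(np.round(P, 3))
print(np.round(partial_corr, 3))             # opposite sign to P's off-diagonals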

To go deeper into Gaussian Processes, one can read the book Gaussian Processes for Machine Learning, by Rasmussen and Williams, which is available online.  The appendix contains useful facts and references on Gaussian identities and matrix identities, such as the matrix inversion lemma, another application of Gaussian elimination to determine the inverse, in this case the inverse of a matrix sum.
