Conditional and marginal distributions of a multivariate Gaussian

While reading up on Gaussian Processes (GPs), I decided it would be useful to be able to prove some of the basic facts about multivariate Gaussian distributions that are the building blocks for GPs.  Namely, how to prove that the conditional and marginal distributions of a multivariate Gaussian are also Gaussian, and to give their forms.

Preliminaries

First, we know that the density of a multivariate normal distribution with mean \mu and covariance \Sigma is given by

\frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right).
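As a quick sanity check of this formula, here is a minimal Python sketch (assuming NumPy and SciPy are available; the mean, covariance, and test point are arbitrary values chosen only for illustration) that evaluates the density directly and compares it against scipy.stats.multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary mean, covariance, and test point, purely for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, -1.2])

k = len(mu)
diff = x - mu
# (2*pi)^(-k/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))
norm_const = 1.0 / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma)))
density = norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

print(density)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree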

For simplicity of notation, I’ll now assume that the distribution has zero-mean, but everything should carry over in a straightforward manner to the more general case.

Writing out x as two components \left[ \begin{array}{c} a\\ b \end{array} \right], we are now interested in two distributions, the conditional p(a|b) and the marginal p(b).

Separate the covariance matrix \Sigma into a block matrix \left[ \begin{array}{cc} A & C^T \\ C & B \end{array}\right], such that A corresponds to the covariance of a, B to the covariance of b, and C contains the cross-covariance terms.

Rewriting the Joint

We’d now like to be able to write out the form for the inverse covariance matrix \left[ \begin{array}{cc} A & C^T \\ C & B \end{array}\right]^{-1}.  We can make use of the Schur complement and write this as

\left[ \begin{array}{cc} A & C^T \\ C & B  \end{array}\right]^{-1} = \left[ \begin{array}{cc} I & 0 \\ -B^{-1}C & I  \end{array}\right] \left[ \begin{array}{cc} (A-C^T B^{-1} C)^{-1} & 0 \\ 0 & B^{-1}  \end{array}\right] \left[ \begin{array}{cc} I & -C^T B^{-1} \\ 0 & I  \end{array}\right].

I’ll explain below how this can be derived.
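Before deriving it, it is easy to convince yourself numerically that this factorization really does equal the inverse. A minimal NumPy sketch, using an arbitrarily constructed positive-definite covariance purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary positive-definite covariance and partition it into blocks.
na, nb = 2, 3
M = rng.normal(size=(na + nb, na + nb))
Sigma = M @ M.T + (na + nb) * np.eye(na + nb)  # positive definite
A = Sigma[:na, :na]
B = Sigma[na:, na:]
C = Sigma[na:, :na]          # so the top-right block is C^T

Binv = np.linalg.inv(B)
schur = A - C.T @ Binv @ C   # Schur complement of B

I_a, I_b = np.eye(na), np.eye(nb)
left   = np.block([[I_a,                 np.zeros((na, nb))],
                   [-Binv @ C,           I_b]])
center = np.block([[np.linalg.inv(schur), np.zeros((na, nb))],
                   [np.zeros((nb, na)),   Binv]])
right  = np.block([[I_a,                 -C.T @ Binv],
                   [np.zeros((nb, na)),  I_b]])

print(np.allclose(left @ center @ right, np.linalg.inv(Sigma)))  # True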

Now, we know that the joint distribution can be written as

p(a,b) \propto \exp \left(-\frac{1}{2} \left[ \begin{array}{c} a\\ b \end{array} \right]^T \left[ \begin{array}{cc} A & C^T \\ C & B   \end{array}\right]^{-1} \left[ \begin{array}{c} a\\ b \end{array} \right] \right).

We can substitute in the above expression for the inverse of the block covariance matrix. Since B is symmetric, the left-most matrix in the factorization is just the transpose of the right-most one, so multiplying the outer matrices into the vector \left[ \begin{array}{c} a\\ b \end{array} \right] on both sides gives

p(a,b) \propto \exp \left(-\frac{1}{2} \left[ \begin{array}{c}  a - C^T B^{-1} b \\ b \end{array} \right]^T \left[ \begin{array}{cc} (A-C^T B^{-1} C)^{-1} & 0 \\ 0 & B^{-1}   \end{array}\right] \left[ \begin{array}{c}  a - C^T B^{-1} b \\ b \end{array} \right] \right).

Using the fact that the center matrix is block diagonal, we have

p(a,b) \propto \exp \left(-\frac{1}{2} (a  - C^T B^{-1} b)^T (A-C^T B^{-1} C)^{-1} (a  - C^T B^{-1} b)\right) \exp \left( -\frac{1}{2} b^T B^{-1} b\right).

Wrapping up

At this point, we’re pretty much done.  If we condition on b, the second exponential term drops out as a constant, and we have

p(a|b) \sim \mathcal{N}\left(C^T B^{-1} b, (A-C^T B^{-1} C)\right).
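As a quick numerical check of this result (a sketch with scalar a and b, and arbitrary illustrative values for A, B, and C): fix b, evaluate the joint density along a grid of a values, normalize it, and compare against the Gaussian above.

import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary 2x2 covariance for scalar a and b (illustrative values only).
A, B, C = 2.0, 1.5, 0.8           # Sigma = [[A, C], [C, B]]
Sigma = np.array([[A, C], [C, B]])
joint = multivariate_normal(mean=[0.0, 0.0], cov=Sigma)

b = 0.7                           # value we condition on
a_grid = np.linspace(-10, 10, 4001)
da = a_grid[1] - a_grid[0]

# Slice of the joint density at the fixed b, normalized over a.
joint_slice = joint.pdf(np.column_stack([a_grid, np.full_like(a_grid, b)]))
conditional_numeric = joint_slice / (joint_slice.sum() * da)

# Derived result: p(a|b) = N(C B^{-1} b, A - C B^{-1} C) for scalars.
conditional_formula = norm(loc=C / B * b, scale=np.sqrt(A - C**2 / B)).pdf(a_grid)

print(np.max(np.abs(conditional_numeric - conditional_formula)))  # ~0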

Note that if a and b are uncorrelated, C = 0, and we just get the marginal distribution of a.

If we marginalize over a, we can pull the second exponential factor outside the integral; the first factor is, up to a constant, a Gaussian density in a, so it integrates to a constant that does not depend on b, and we find that

p(b) = \int p(a,b)\, da \sim \mathcal{N}(0,B).
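A similarly quick check of the marginal, again with an arbitrary illustrative covariance: draw samples from the joint and confirm that the empirical covariance of the b components is close to B.

import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive-definite covariance, partitioned as before (illustrative).
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.5, 0.2],
                  [0.5, 0.2, 1.0]])
na = 1                                   # first block is a, the rest is b
B = Sigma[na:, na:]

samples = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
b_samples = samples[:, na:]              # keep only the b components

print(np.cov(b_samples, rowvar=False))   # should be close to B
print(B)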

Schur complement

Above, I wrote that you could use the Schur complement to get the block form of the inverse covariance matrix.  How would one actually derive that?  As mentioned on the Wikipedia page, the expression for the inverse can be derived using Gaussian elimination.

If you right-multiply the covariance by the left-most matrix in the expression, you obtain

\left[ \begin{array}{cc} A & C^T \\ C & B   \end{array}\right] \left[ \begin{array}{cc} I & 0 \\ -B^{-1}C & I    \end{array}\right] = \left[ \begin{array}{cc} A-C^T B^{-1} C & C^T \\ 0 & B    \end{array}\right]

zeroing out the bottom-left block.  Multiplying by the center matrix then turns the diagonal blocks into identities, and the right-most matrix zeros out the remaining top-right block, leaving the identity matrix, so the whole expression is indeed the inverse of the covariance matrix.
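To see these elimination steps concretely, here is a short NumPy sketch (the same kind of arbitrary partitioned covariance as above) that multiplies through one factor at a time:

import numpy as np

rng = np.random.default_rng(2)

# Arbitrary positive-definite covariance, partitioned into A, B, C (illustrative).
na, nb = 2, 2
M = rng.normal(size=(na + nb, na + nb))
Sigma = M @ M.T + (na + nb) * np.eye(na + nb)
A, B = Sigma[:na, :na], Sigma[na:, na:]
C = Sigma[na:, :na]
Binv = np.linalg.inv(B)
schur = A - C.T @ Binv @ C

left   = np.block([[np.eye(na), np.zeros((na, nb))], [-Binv @ C, np.eye(nb)]])
center = np.block([[np.linalg.inv(schur), np.zeros((na, nb))],
                   [np.zeros((nb, na)), Binv]])
right  = np.block([[np.eye(na), -C.T @ Binv], [np.zeros((nb, na)), np.eye(nb)]])

step1 = Sigma @ left       # bottom-left block becomes 0
step2 = step1 @ center     # diagonal blocks become identities
step3 = step2 @ right      # top-right block is cleared, leaving the identity

print(np.allclose(step1[na:, :na], 0))            # True
print(np.allclose(step3, np.eye(na + nb)))        # True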

Further Reading

I got started on this train of thought after reading the Wikipedia page on Gaussian processes.  The external link on that page to a gentle introduction to GPs was somewhat helpful as a quick primer.  The video lectures by MacKay and Rasmussen were both good and helped give a better understanding of GPs.

MacKay also has a nice short essay on the humble Gaussian distribution, which gives more intuition about the covariance and inverse covariance matrices of Gaussian distributions.  In particular, the inverse covariance matrix tells you the relationship between two variables conditioned on all the other variables, and therefore changes if you marginalize out some of the variables.  The sign of an off-diagonal element of the inverse covariance matrix is opposite the sign of the correlation between the corresponding two variables, conditioned on all the others.
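As a tiny illustration of that last point, with an arbitrary three-variable covariance chosen only for the example, one can compute the precision matrix and the partial correlations and compare signs:

import numpy as np

# Arbitrary three-variable covariance, just to illustrate the sign relationship.
Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.1],
                  [0.5, 0.1, 1.0]])
P = np.linalg.inv(Sigma)                     # precision (inverse covariance)

d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)           # off-diagonals: partial correlations
np.fill_diagonal(partial_corr, 1.0)

print(np.round(P, 3))
print(np.round(partial_corr, 3))             # opposite sign to P's off-diagonals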

To go deeper into Gaussian Processes, one can read the book Gaussian Processes for Machine Learning, by Rasmussen and Williams, which is available online.  The appendix contains useful facts and references on Gaussian identities and matrix identities, such as the matrix inversion lemma, another application of Gaussian elimination to determine the inverse, in this case the inverse of a matrix sum.
