原文来自师兄的博客:http://blog.csdn.net/wjj5881005/article/details/53535613
1. 均值和方差未知的多元正态分布的后验(Multivariate normal with unknown mean and variance)
假设有N个观测值
{xi|i=1,2,...,N}
,且服从均值为
μ
方差为
Σ
的多元正态分布,即:
均值和方差都未知的情况下,多元正态分布的共轭先验是正态逆威沙特分布(Normal-Inverse-Wishart),即有:
其中逆威沙特分布的概率密度函数为如下形式:
Γk(⋅) 是多变量Gamma分布, ν0 和 Λ0 分别是逆威沙特分布的自由度和尺度矩阵, k 是数据的维度。
依据文献[1],在观测到数据
其中:
得到了后验分布的概率密度函数,我们就可以通过其采样多元正态分布的均值 μ 和方差 Σ 了。
2. 从后验分布中采样均值 μ 和方差 Σ
均值 μ 的采样需要依赖于 Σ ,因此采样顺序一般为:首先采样 Σ∼Inv−Wishart(ν′,Λ′) ,然后采样 μ|Σ,x∼N(μ′,Σ/κ′) 。关于均值的采样,可以看这篇博客。下面介绍一下如何从逆威沙特分布中采样方差 Σ 。首先介绍一下Odell&Feiveson于1966年提出的基本采样思路[2],然后给出Java代码。
一、 假设 Vi(1≤i≤k) 是独立的随机变量,并且采样自自由度为 ν−i+1 的卡方分布,所有有 ν−k+1≤ν−i+1≤ν .
二、假设 Nij 是独立的采样自均值为0方差为1的正态分布中的随机变量,且有 1≤i≤j≤k , Nij 独立于 Vi .
三、定义随机变量 bij ,且 1≤i,j≤k ,当 1≤i≤j≤k 时,有 bji=bij ,我们通过如下公式构造 bij 。biibij=Vi+∑r=1i−1N2ri,1≤i≤k=NijVi−−√+∑r=1i−1NriNrj,i<j≤k
四、对方阵 Λ 进行Cholesky分解,即 LLT=Λ−1
五、构造矩阵 R=LBLT
六、则 Σ′=R−1 为该逆威沙特分布的样本。
至于为什么这么做,大家可参考文献[3]或者[2]。上面的过程已经很清晰了,下面我们给出具体的Java代码, 来源自GitHub,并且做了一点的修改(Note,下面的代码使用的依然是commons.math的3.0版本,事实上,现在已经更新到4.0版本的。)
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.math3.distribution.GammaDistribution;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.CholeskyDecomposition;
import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularMatrixException;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;
/**
* Inverse Wishart distribution implementation, to sample random covariances matrices for
* multivariate gaussian distributions.
* <p/>
* The sampling method follows the procedure described by Odell & Feiveson, 1966 to get samples
* from a Wishart distribution, and then computes the inverse of the obtained samples.
*
* @author Marc Pujol <mpujol@iiia.csic.es>
*/
public class InverseWishartDistribution {
private static final Logger LOG = Logger.getLogger(InverseWishartDistribution.class.getName());
private GammaDistribution[] gammas;
private double df;
private RealMatrix scaleMatrix;
private CholeskyDecomposition cholesky;
private RandomGenerator random;
/**
* Builds a new Inverse Wishart distribution with the given scale and degrees of freedom.
*
* @param scaleMatrix scale matrix(Λ)
* @param df degrees of freedom.
*/
public InverseWishartDistribution(RealMatrix scaleMatrix, double df) {
if (!scaleMatrix.isSquare()) {
throw new RuntimeException("scaleMatrix must be square.");
}
this.scaleMatrix = scaleMatrix;
this.df = df;
this.random = new Well19937c();
initialize();
}
private void initialize() {
final int dim = scaleMatrix.getColumnDimension();
// Build gamma distributions for the diagonal
gammas = new GammaDistribution[dim];
for (int i = 0; i < dim; i++) {
gammas[i] = new GammaDistribution((df-i+0.0)/2, 2);
}
// Build the cholesky decomposition
cholesky = new CholeskyDecomposition(inverseMatrix(scaleMatrix));
}
/**
* Reseeds the random generator.
*
* @param seed new random seed.
*/
public void reseedRandomGenerator(long seed) {
random.setSeed(seed);
for (int i = 0, len = scaleMatrix.getColumnDimension(); i < len; i++) {
gammas[i].reseedRandomGenerator(seed+i);
}
}
/**
* Returns the inverse matrix.
* @return inverted matrix.
*/
public RealMatrix inverseMatrix(RealMatrix A) {
RealMatrix result = new LUDecomposition(A).getSolver().getInverse();
return result;
}
/**
* Returns a sample matrix from this distribution.
* @return sampled matrix.
*/
public RealMatrix sample() {
for (int i=0; i<100; i++) {
try {
RealMatrix A = sampleWishart();
RealMatrix result = inverseMatrix(A);
LOG.log(Level.FINE, "Cov = {0}", result);
return result;
} catch (SingularMatrixException ex) {
LOG.finer("Discarding singular matrix generated by the wishart distribution.");
}
}
throw new RuntimeException("Unable to generate inverse wishart samples!");
}
private RealMatrix sampleWishart() {
final int dim = scaleMatrix.getColumnDimension();
// Build N_{ij}
double[][] N = new double[dim][dim];
for (int j = 0; j < dim; j++) {
for (int i = 0; i < j; i++) {
N[i][j] = random.nextGaussian();
}
}
if (LOG.isLoggable(Level.FINEST)) {
LOG.log(Level.FINEST, "N = {0}", Arrays.deepToString(N));
}
// Build V_j
double[] V = new double[dim];
for (int i = 0; i < dim; i++) {
V[i] = gammas[i].sample();
}
if (LOG.isLoggable(Level.FINEST)) {
LOG.log(Level.FINEST, "V = {0}", Arrays.toString(V));
}
// Build B
double[][] B = new double[dim][dim];
// b_{11} = V_1 (first j, where sum = 0 because i == j and the inner
// loop is never entered).
// b_{jj} = V_j + \sum_{i=1}^{j-1} N_{ij}^2, j = 2, 3, ..., p
for (int j = 0; j < dim; j++) {
double sum = 0;
for (int i = 0; i < j; i++) {
sum += Math.pow(N[i][j], 2);
}
B[j][j] = V[j] + sum;
}
if (LOG.isLoggable(Level.FINEST)) {
LOG.log(Level.FINEST, "B*_jj : = {0}", Arrays.deepToString(B));
}
// b_{1j} = N_{1j} * \sqrt V_1
for (int j = 1; j < dim; j++) {
B[0][j] = N[0][j] * Math.sqrt(V[0]);
B[j][0] = B[0][j];
}
if (LOG.isLoggable(Level.FINEST)) {
LOG.log(Level.FINEST, "B*_1j = {0}", Arrays.deepToString(B));
}
// b_{ij} = N_{ij} * \sqrt V_1 + \sum_{k=1}^{i-1} N_{ki}*N_{kj}
for (int j = 1; j < dim; j++) {
for (int i = 1; i < j; i++) {
double sum = 0;
for (int k = 0; k < i; k++) {
sum += N[k][i] * N[k][j];
}
B[i][j] = N[i][j] * Math.sqrt(V[i]) + sum;
B[j][i] = B[i][j];
}
}
if (LOG.isLoggable(Level.FINEST)) {
LOG.log(Level.FINEST, "B* = {0}", Arrays.deepToString(B));
}
RealMatrix BMat = new Array2DRowRealMatrix(B);
RealMatrix A = cholesky.getL().multiply(BMat).multiply(cholesky.getLT());
if (LOG.isLoggable(Level.FINER)) {
LOG.log(Level.FINER, "A* = {0}", Arrays.deepToString(N));
}
return A;
}
}
其中因为commons.math中的卡方分布没有采样函数,所以我们借助于commons.math提供的Gamma分布进行采样,事实上,卡方分布和Gamma概率密度函数非常相近。上述采样的核心其实是先从威沙特分布中采样一个方阵,然后求其逆矩阵,则得到逆威沙特分布的一个样本。代码中inverseMatrix(scaleMatrix)是将逆威沙特分布的尺度矩阵求逆,这样就得到威沙特分布的尺度矩阵。此外近一段时间找资料的过程还发现了其一些代码,如下:
Java代码:链接,其介绍文档。链接,其介绍文档。
c#代码:链接,其对应的介绍。
Matlab:其中有一个iwishrnd方法,其介绍在这里。
[1] Gelman, A., Carlin et al., Bayesian data analysis. London: Chapman & Hall
[2] Stanley Sawyer, Wishart Distributions and Inverse-Wishart Sampling
[3] Odell, P.L., and A.H. Feiveson (1966) A numerical procedure to generate a sample covariance matrix