Kaldi’s PLDA implementation is based on [1], the so-called two-covariance PLDA by [2]. The authors derive clean update formulas for the EM training and document the derivation in a detailed comment in the source code. Here we add some explanations to make the derivation easier to follow.
A pdf version of this note can be found here
1. Background
Recall that PLDA assumes a two-stage generative process:
1) generate the class center according to
$$x \sim \mathcal{N}(\mu,\ \Phi_b),$$
2) then, generate the observed data by:
$$z \sim \mathcal{N}(x,\ \Phi_w).$$
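As a concrete illustration, here is a minimal numpy sketch of this two-stage sampling process. The dimension, covariances, and sample count are arbitrary toy choices, not anything taken from Kaldi.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4                                  # feature dimension (arbitrary)
mu = np.zeros(dim)                       # global mean
Phi_b = np.diag([4.0, 3.0, 2.0, 1.0])    # between-class covariance (toy value)
Phi_w = 0.5 * np.eye(dim)                # within-class covariance (toy value)

# 1) draw a class center x ~ N(mu, Phi_b)
x = rng.multivariate_normal(mu, Phi_b)

# 2) draw n observed samples of that class, z ~ N(x, Phi_w)
n = 10
z = rng.multivariate_normal(x, Phi_w, size=n)   # shape (n, dim)
```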
Here, $\mu$ is estimated by the global mean value:
$$\mu = \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} z_{ki},$$
where $z_{ki}$ denotes the $i$-th sample of the $k$-th class, $n_k$ is the number of samples in class $k$, and $N = \sum_{k} n_k$ is the total number of samples.
Now let’s turn to the estimation of $\Phi_b$ and $\Phi_w$.
Note that, as $\mu$ is fixed, we can remove it from all samples. Hereafter, we assume all samples have been pre-processed by subtracting $\mu$ from them.
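In code, the mean estimate and the centering step amount to the following; `data` here is a toy stand-in for real per-class feature matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: a list of (n_k, dim) arrays, one per class
data = [rng.normal(size=(n_k, 4)) for n_k in (5, 8, 12)]

all_samples = np.vstack(data)            # every z_ki stacked: shape (N, dim)
mu_hat = all_samples.mean(axis=0)        # global mean over all N samples
data = [z_k - mu_hat for z_k in data]    # subtract mu from every sample
```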
The prior distribution of an arbitrary (centered) sample $z$ is:
$$z \sim \mathcal{N}(0,\ \Phi_b + \Phi_w).$$
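This follows directly from the generative model: a centered sample decomposes as
$$z = x + \epsilon, \qquad x \sim \mathcal{N}(0,\ \Phi_b),\quad \epsilon \sim \mathcal{N}(0,\ \Phi_w),$$
with $x$ and $\epsilon$ independent, so the covariances of the two Gaussian terms add.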
Let’s suppose the mean of a particular class is $m$, and that this class has $n$ examples. Then
$$m = \frac{1}{n} \sum_{i=1}^{n} z_i \sim \mathcal{N}\!\left(0,\ \Phi_b + \frac{\Phi_w}{n}\right),$$
i.e. $m$ is Gaussian-distributed with zero mean and covariance equal to the between-class covariance plus $1/n$ times the within-class covariance. Now, $m$ is observed (it is the average of the observed samples of the class).
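To see where the $\Phi_w/n$ term comes from, average the per-sample decomposition $z_i = x + \epsilon_i$ over the class:
$$m = \frac{1}{n}\sum_{i=1}^{n} z_i = x + \frac{1}{n}\sum_{i=1}^{n} \epsilon_i.$$
The averaged noise term is Gaussian with covariance $\frac{1}{n^2} \cdot n\,\Phi_w = \Phi_w/n$ and is independent of $x$, so the two covariances add.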
2. EM
We’re doing an E-M procedure where we treat $m$ as the sum of two hidden variables:
$$m = x + y,$$
where $x \sim \mathcal{N}(0,\ \Phi_b)$ is the class-center part and $y \sim \mathcal{N}(0,\ \Phi_w/n)$ is the averaged within-class noise.
The distribution of $x$ will contribute to the stats of $\Phi_b$, and that of $y$ to the stats of $\Phi_w$.
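To make the decomposition concrete, here is a rough numpy sketch of the per-class E-step it implies: given the observed class mean $m$, compute the Gaussian posteriors of $x$ and $y$, whose second moments are the stats that feed the $\Phi_b$ and $\Phi_w$ updates. This is only a sketch under the model above, not Kaldi’s actual implementation (which accumulates these stats over all classes in a simultaneously diagonalized basis); the function and variable names are made up for the example.

```python
import numpy as np

def split_class_mean(m, n, Phi_b, Phi_w):
    """Posteriors of x and y given the observed class mean m,
    where m = x + y, x ~ N(0, Phi_b), y ~ N(0, Phi_w / n)."""
    A, B = Phi_b, Phi_w / n
    K = A @ np.linalg.inv(A + B)    # gain mapping m to E[x | m]
    x_mean = K @ m                  # E[x | m]
    x_cov = A - K @ A               # Cov[x | m]
    y_mean = m - x_mean             # E[y | m], since y = m - x
    y_cov = x_cov                   # Cov[y | m] = Cov[x | m]
    return x_mean, x_cov, y_mean, y_cov

# Per class, E[x x^T | m] = x_cov + np.outer(x_mean, x_mean) is the stat that
# feeds the Phi_b update; the analogous second moment of y (scaled by n to
# undo the 1/n averaging) feeds the Phi_w update, together with the
# within-class scatter. The exact weighting is in Kaldi's plda.cc comments.
```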