Kaldi’s PLDA implementation is based on [1], the so-called two-covariance PLDA of [2]. The authors derive clean update formulas for EM training and give a detailed comment in the source code. Here we add some explanations to make the derivation easier to follow.
A PDF version of this note can be found here.
1. Background
Recall that PLDA assumes a two-stage generative process:

1) generate the class center according to:

$$m_k \sim \mathcal{N}(\mu, \Phi_b)$$

2) then, generate the observed data by:

$$z_{ki} \sim \mathcal{N}(m_k, \Phi_w)$$
Here, $\mu$ is estimated by the global mean value:

$$\mu = \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k} z_{ki}$$

where $z_{ki}$ denotes the $i$-th sample of the $k$-th class.
So let’s turn to the estimation of $\Phi_b$ and $\Phi_w$.

Note that, as $\mu$ is fixed, we can remove it from all samples. Hereafter, we assume all samples have been pre-processed by subtracting $\mu$ from them, so the class centers follow $m_k \sim \mathcal{N}(0, \Phi_b)$.
The prior distribution of an arbitrary sample $z$ is:

$$z \sim \mathcal{N}(0,\ \Phi_b + \Phi_w)$$
Let’s suppose the mean of a particular class is $m$, and suppose that the class has $n$ examples. Then

$$m \sim \mathcal{N}\!\left(0,\ \Phi_b + \frac{\Phi_w}{n}\right)$$

i.e. $m$ is Gaussian-distributed with zero mean and variance equal to the between-class variance plus $1/n$ times the within-class variance. Now, $m$ is observed: it is the average of all observed samples of that class.
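As a sanity check, here is a minimal Monte Carlo sketch of these two marginals. All values (the covariances `Phi_b`, `Phi_w`, the class size `n`, and the number of classes) are made-up toy choices for illustration, not anything taken from Kaldi:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi_b = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy between-class covariance
Phi_w = np.array([[1.0, 0.2], [0.2, 0.8]])   # toy within-class covariance
n, num_classes = 10, 100000

# Generative process (global mean already removed):
# class center m_k ~ N(0, Phi_b), then z_ki ~ N(m_k, Phi_w).
centers = rng.multivariate_normal(np.zeros(2), Phi_b, size=num_classes)
noise = rng.multivariate_normal(np.zeros(2), Phi_w, size=(num_classes, n))
samples = centers[:, None, :] + noise

# Marginal of a single sample: cov(z) approaches Phi_b + Phi_w.
print(np.cov(samples.reshape(-1, 2).T))

# Marginal of a class average over n samples: cov(m) approaches Phi_b + Phi_w / n.
print(np.cov(samples.mean(axis=1).T))
```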
2. EM
We’re doing an EM procedure where we treat $m$ as the sum of two variables:

$$m = x + y$$

where $x \sim \mathcal{N}(0, \Phi_b)$ and $y \sim \mathcal{N}(0, \Phi_w/n)$. The distribution of $x$ will contribute to the statistics of $\Phi_b$, and that of $y$ to $\Phi_w$.
2.1 E Step
Note that given $m$, there’s only one latent variable in effect: since $y = m - x$, we can focus on working out the distribution of $x$, and then we can very simply get the distribution of $y$.
Given $m$, the posterior distribution of $x$ is:

$$p(x \mid m) \propto p(m \mid x)\, p(x) = \mathcal{N}(m;\ x,\ \Phi_w/n)\ \mathcal{N}(x;\ 0,\ \Phi_b)$$
Hereafter, we drop the condition on $m$ for brevity.
Since the product of two Gaussians is again a Gaussian, we get:

$$x \mid m \sim \mathcal{N}(w, \hat{\Phi})$$

where $\hat{\Phi} = (\Phi_b^{-1} + n\Phi_w^{-1})^{-1}$ and $w = \hat{\Phi}\, n\Phi_w^{-1} m$.
$\hat{\Phi}$ and $w$ can be inferred by comparing the first- and second-order coefficients with the standard form of the log Gaussian, as Kaldi’s comment does:

$$\log p(x) = C - \frac{1}{2}\left[x^T \Phi_b^{-1} x + (m-x)^T\, n\Phi_w^{-1}\, (m-x)\right] = C - \frac{1}{2} x^T(\Phi_b^{-1} + n\Phi_w^{-1})\,x + x^T z$$
Note: the C is different from line to line.
where $z = n\Phi_w^{-1} m$, and we can write this as:

$$\log p(x) = C - \frac{1}{2}(x - w)^T(\Phi_b^{-1} + n\Phi_w^{-1})(x - w)$$
where $x^T(\Phi_b^{-1} + n\Phi_w^{-1})\,w = x^T z$, i.e.

$$(\Phi_b^{-1} + n\Phi_w^{-1})\, w = z = n\Phi_w^{-1} m,$$

so

$$w = (\Phi_b^{-1} + n\Phi_w^{-1})^{-1}\, n\Phi_w^{-1} m = \hat{\Phi}\, n\Phi_w^{-1} m.$$
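The result can be cross-checked against standard Gaussian conditioning on the joint distribution of $(x, m)$, where $\mathrm{Cov}(m) = \Phi_b + \Phi_w/n$ and $\mathrm{Cov}(x, m) = \Phi_b$. Below is a small NumPy sketch with assumed toy values; it verifies numerically that both routes give the same posterior:

```python
import numpy as np

inv = np.linalg.inv
Phi_b = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy between-class covariance
Phi_w = np.array([[1.0, 0.2], [0.2, 0.8]])   # toy within-class covariance
n = 10
m = np.array([0.7, -0.3])                    # an arbitrary observed class mean

# Posterior from completing the square, as derived above:
Phi_hat = inv(inv(Phi_b) + n * inv(Phi_w))   # (Phi_b^-1 + n Phi_w^-1)^-1
w = Phi_hat @ (n * inv(Phi_w) @ m)           # Phi_hat * n * Phi_w^-1 * m

# Same posterior via conditioning the joint Gaussian of (x, m):
K = Phi_b @ inv(Phi_b + Phi_w / n)           # regression coefficient of x on m
w_ref = K @ m                                # E[x | m]
Phi_ref = Phi_b - K @ Phi_b                  # Cov[x | m]

print(np.allclose(w, w_ref))                 # True
print(np.allclose(Phi_hat, Phi_ref))         # True
```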
2.2 M Step
The objective function of the EM update, written here for the $y$ part of a single class (the $x$ part is analogous), is:

$$\mathcal{Q}(\Phi_w/n) = \mathbb{E}\left[\log p(y)\right] = C - \frac{1}{2}\log\left|\Phi_w/n\right| - \frac{1}{2}\,\mathrm{tr}\!\left((\Phi_w/n)^{-1}\,\mathbb{E}[yy^T]\right)$$

where $\mathbb{E}[yy^T] = \hat{\Phi} + (m-w)(m-w)^T$, since $y = m - x$ has posterior mean $m - w$ and the same posterior covariance $\hat{\Phi}$ as $x$.
The derivative w.r.t. $\Phi_w/n$ is as follows:

$$\frac{\partial \mathcal{Q}}{\partial(\Phi_w/n)} = -\frac{1}{2}(\Phi_w/n)^{-1} + \frac{1}{2}(\Phi_w/n)^{-1}\,\mathbb{E}[yy^T]\,(\Phi_w/n)^{-1}$$
Setting it to zero, we have:

$$\frac{\Phi_w}{n} = \mathbb{E}[yy^T] = \hat{\Phi} + (m-w)(m-w)^T$$

i.e. this class contributes $n\left(\hat{\Phi} + (m-w)(m-w)^T\right)$ to the estimate of $\Phi_w$.
Similarly, for the $x$ part we have:

$$\Phi_b = \mathbb{E}[xx^T] = \hat{\Phi} + ww^T$$
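Since the zero-derivative argument only finds a stationary point, here is a quick numerical check (with an arbitrary SPD matrix standing in for $\mathbb{E}[yy^T]$, values assumed for illustration) that $\Phi = \mathbb{E}[yy^T]$ is indeed a maximum of $\mathcal{Q}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def Q(Phi, second_moment):
    """Q(Phi) = -1/2 log|Phi| - 1/2 tr(Phi^-1 E[y y^T]), dropping the constant C."""
    _, logdet = np.linalg.slogdet(Phi)
    return -0.5 * logdet - 0.5 * np.trace(np.linalg.inv(Phi) @ second_moment)

A = rng.standard_normal((3, 3))
Eyy = A @ A.T + 3.0 * np.eye(3)              # an arbitrary SPD "second moment"

q_at_optimum = Q(Eyy, Eyy)                   # objective at Phi = E[y y^T]
for _ in range(5):
    B = rng.standard_normal((3, 3))
    delta = 0.2 * (B + B.T) / 2.0            # small random symmetric perturbation
    print(Q(Eyy + delta, Eyy) <= q_at_optimum)   # True on every trial
```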
3. Summary
To recap: given the samples of the $k$-th class, we can calculate the following statistics:

$$n_k, \qquad m_k = \frac{1}{n_k}\sum_i z_{ki}, \qquad \hat{\Phi}_k = (\Phi_b^{-1} + n_k\Phi_w^{-1})^{-1}, \qquad w_k = \hat{\Phi}_k\, n_k\Phi_w^{-1} m_k$$
Given $K$ classes, the updated estimates via EM will be:

$$\Phi_b = \frac{1}{K}\sum_{k=1}^{K}\left(\hat{\Phi}_k + w_k w_k^T\right)$$

$$\Phi_w = \frac{1}{K}\sum_{k=1}^{K} n_k\left(\hat{\Phi}_k + (m_k - w_k)(m_k - w_k)^T\right)$$
Finally, Kaldi uses the following update formula for $\Phi_w$:

$$\Phi_w = \frac{1}{N}\left(S + \sum_{k=1}^{K} n_k\left(\hat{\Phi}_k + (m_k - w_k)(m_k - w_k)^T\right)\right)$$

where $N$ is the total number of samples, $S = \sum_k\sum_i (z_{ki} - c_k)(z_{ki} - c_k)^T$ is the within-class scatter matrix, and $c_k = \frac{1}{n_k}\sum_i z_{ki}$ is the mean of the samples of the $k$-th class.
Note that only the second term is the result of the EM derivation above, since $m = x + y$ takes just the pooled class means into consideration; the scatter matrix $S$ is added to account for the per-sample variation around each class mean.
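To tie the pieces together, here is a sketch of one full EM iteration implementing the update formulas above in NumPy. This is our own illustrative code, not Kaldi’s actual C++ implementation, and all names and toy values are ours:

```python
import numpy as np

def em_step(class_means, class_counts, S, Phi_b, Phi_w):
    """One EM iteration. class_means: (K, d) per-class means with the global
    mean removed; class_counts: (K,) counts n_k; S: within-class scatter."""
    inv = np.linalg.inv
    K, d = class_means.shape
    N = class_counts.sum()
    b_stats = np.zeros((d, d))
    w_stats = S.copy()                       # scatter-matrix term of Phi_w
    for m, n in zip(class_means, class_counts):
        Phi_hat = inv(inv(Phi_b) + n * inv(Phi_w))
        w = Phi_hat @ (n * inv(Phi_w) @ m)   # posterior mean of x
        b_stats += Phi_hat + np.outer(w, w)
        w_stats += n * (Phi_hat + np.outer(m - w, m - w))
    return b_stats / K, w_stats / N

# Toy usage: the estimates should move toward the true covariances.
rng = np.random.default_rng(2)
d, K, n = 2, 500, 20
true_b = np.array([[2.0, 0.5], [0.5, 1.0]])
true_w = np.array([[1.0, 0.2], [0.2, 0.8]])
centers = rng.multivariate_normal(np.zeros(d), true_b, size=K)
data = centers[:, None, :] + rng.multivariate_normal(np.zeros(d), true_w, size=(K, n))
mu = data.reshape(-1, d).mean(axis=0)        # global mean, removed up front
means = data.mean(axis=1) - mu
diffs = data - data.mean(axis=1, keepdims=True)
S = np.einsum('kid,kie->de', diffs, diffs)   # within-class scatter matrix
Phi_b, Phi_w = np.eye(d), np.eye(d)          # crude initialization
for _ in range(10):
    Phi_b, Phi_w = em_step(means, np.full(K, n), S, Phi_b, Phi_w)
print(Phi_b)                                 # ~ true_b
print(Phi_w)                                 # ~ true_w
```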
For other variants of EM training, see [2] and the references therein.
References