[Learning Kaldi] Reading Kaldi's C++ PLDA Implementation

ShaunSXLiu

Article link: https://blog.csdn.net/liusongxiang666/article/details/83024845

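PLDA (Probabilistic Linear Discriminant Analysis) is widely used in speaker verification. This post records my reading of the low-level C++ code that implements PLDA in Kaldi. Kaldi's implementation mainly follows Sergey Ioffe's paper "Probabilistic Linear Discriminant Analysis" (ECCV 2006). The code contains many useful comments that are essential for understanding the implementation. While writing this post I also consulted "A Note on Kaldi's PLDA Implementation", which summarizes the method very well; many thanks to its author.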

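Background
LDA is commonly used to extract linear features that maximize between-class separation while minimizing within-class variation. LDA can be derived by fitting a Gaussian mixture model to the training data: let $\bold x$ denote an observable sample and $\bold y$ the latent variable (the class center); the class-conditional probability is then
$P(\bold x | \bold y) = \mathcal{N}(\bold x| \bold y, \Phi_w)$
where the prior on $\bold y$ is
$P(\bold y) = \sum_{k=1}^K \pi_k\delta(\bold y - \mu_k)$
This mixture model can represent only a finite number $K$ of classes. We want to extend it so that it can also model classes that do not appear in the training data. To do so, we make the prior on $\bold y$ continuous, and for computational convenience we make it Gaussian (so this post covers Gaussian PLDA, also called G-PLDA; for other variants such as heavy-tailed PLDA, see the relevant papers):
$P(\bold y) = \mathcal{N}(\bold y|\bold m, \Phi_b)$
In these expressions, $\Phi_w$ is the within-class covariance, $\Phi_b$ is the between-class covariance, and $\bold m = \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}z_{ki}$ is the global mean, where $z_{ki}$ is the $i$-th sample of class $k$, $n_k$ is the number of samples in class $k$, and $N = \sum_k n_k$. We generally require $\Phi_w$ to be positive definite and $\Phi_b$ to be positive semidefinite. From linear algebra, $\Phi_w$ and $\Phi_b$ can then be diagonalized simultaneously: there is a nonsingular matrix $V$ such that
$V^T\Phi_wV=I \\ V^T\Phi_bV = \Psi$
$\Psi$ and $V$ are obtained by solving a generalized eigenproblem. Defining $A=V^{-T}$, we get $\Phi_w=AA^T$ and $\Phi_b=A\Psi A^T$. The model becomes
$\bold x = \bold m + A\bold u, \\ \text{where } \bold u \sim \mathcal{N}(\cdot | \bold v, \bold I), \\ \bold v \sim \mathcal{N}(\cdot |0, \Psi)$
This formulation is very important and helps a great deal when reading the code.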
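As a quick consistency check (a sketch that uses only the definitions above, with $\bold y = \bold m + A\bold v$ taken as the class center), the generative model reproduces both covariances:

```latex
\begin{aligned}
\mathrm{Cov}(\bold x \mid \bold v) &= A\,\mathrm{Cov}(\bold u \mid \bold v)\,A^T = A\,\bold I\,A^T = \Phi_w,\\
\mathrm{Cov}(\bold y) &= \mathrm{Cov}(\bold m + A\bold v) = A\,\Psi\,A^T = \Phi_b .
\end{aligned}
```

So in the space of $\bold u = A^{-1}(\bold x - \bold m)$ the within-class covariance is the identity and the between-class covariance is the diagonal matrix $\Psi$.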
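We need to estimate $\Phi_w$ and $\Phi_b$ from the training data. Note that $\bold m$ is the global mean of all speakers' i-vectors, so we can compute it first and subtract it from the training data. We keep using $z$ for an i-vector after $\bold m$ has been subtracted; its marginal distribution is
$z \sim \mathcal{N}(0, \Phi_b + \Phi_w)$
For a speaker $k$ in the training data, let $m_k$ be the mean of his or her i-vectors and $n_k$ the number of i-vectors; then
$m_k = \frac{1}{n_k}\sum_{i=1}^{n_k}z_{ki} \sim \mathcal{N}(0, \Phi_b+\frac{\Phi_w}{n_k})$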
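The covariance of $m_k$ follows in two lines. This is a sketch: the symbols $y_k$ (the centered speaker variable) and $\epsilon_{ki}$ (the within-speaker deviation) are introduced here for illustration and are assumed mutually independent.

```latex
\begin{aligned}
z_{ki} &= y_k + \epsilon_{ki}, \qquad y_k \sim \mathcal{N}(0, \Phi_b), \quad \epsilon_{ki} \sim \mathcal{N}(0, \Phi_w),\\
m_k &= y_k + \frac{1}{n_k}\sum_{i=1}^{n_k}\epsilon_{ki}
\;\Rightarrow\;
\mathrm{Cov}(m_k) = \Phi_b + \frac{1}{n_k^2}\, n_k\, \Phi_w = \Phi_b + \frac{\Phi_w}{n_k}.
\end{aligned}
```

Averaging leaves the speaker component untouched and shrinks the within-speaker noise by a factor of $n_k$.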
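We treat $m_k, k =1,\cdots , K$ as the observable data. The parameters to learn are $A$ and $\Psi$, or equivalently $\Phi_b$ and $\Phi_w$, since each pair can be converted into the other. Inference problems with latent variables are often solved with Expectation-Maximization (the EM algorithm). Here we treat $m_k$ as the sum of two random variables,
$m_k = x + y$
where $x \sim \mathcal{N}(0, \Phi_b)$ and $y \sim \mathcal{N}(0, \Phi_w/n_k)$. Note that $x$ and $y$ here are not the $\bold x$ and $\bold y$ of the model above.
The EM procedure is as follows (for the detailed derivation, see the references listed at the top of this post):

Initialization: set $\Phi_b$ and $\Phi_w$ (Kaldi initializes both to $\bold I$).
E-step:
for $k$ from 1 to $K$, compute
$\hat \Phi_k = (\Phi_b^{-1}+n_k\Phi_w^{-1})^{-1} \\ w_k = \hat \Phi_k n_k \Phi_w^{-1}m_k$
M-step:
$\Phi_w = \frac{1}{N}(S + \sum_kn_k(\hat \Phi_k + (w_k - m_k)(w_k - m_k)^T)) \\ \Phi_b = \frac{1}{K}\sum_k(\hat \Phi_k + w_kw_k^T)$
where $S = \sum_k\sum_i(z_{ki} - m_k)(z_{ki} - m_k)^T$ and $m_k = \frac{1}{n_k}\sum_i z_{ki}$.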
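The E-step and M-step are easier to see in one dimension, where every matrix becomes a scalar. The following is a minimal sketch under that assumption, not Kaldi's code: the names `SpeakerStat`, `Variances`, and `EmIteration` are hypothetical, and the posterior variance is taken as $\hat\Phi_k = (\Phi_b^{-1} + n_k\Phi_w^{-1})^{-1}$.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-speaker summary (hypothetical type): number of i-vectors n_k and the
// globally centered speaker mean m_k. Everything is scalar in this 1-D sketch.
struct SpeakerStat {
  int n;
  double mean;
};

struct Variances {
  double within;   // Phi_w
  double between;  // Phi_b
};

// One EM iteration. `scatter` is S, the pooled within-speaker scatter;
// `total` is N, the total number of i-vectors.
Variances EmIteration(const std::vector<SpeakerStat> &stats, double scatter,
                      int total, Variances cur) {
  assert(!stats.empty() && total > 0);
  double within_acc = scatter;
  double between_acc = 0.0;
  for (const SpeakerStat &s : stats) {
    // E-step: posterior variance and posterior mean of the speaker variable.
    double phi_hat = 1.0 / (1.0 / cur.between + s.n / cur.within);
    double w = phi_hat * s.n / cur.within * s.mean;
    // Accumulate the M-step statistics.
    between_acc += phi_hat + w * w;
    within_acc += s.n * (phi_hat + (w - s.mean) * (w - s.mean));
  }
  // M-step: normalize by N and K.
  return {within_acc / total, between_acc / stats.size()};
}
```

For two speakers with two i-vectors each, centered means +1 and -1, $S = 1$, and both variances initialized to 1, one call returns $\Phi_b = 7/9$ and $\Phi_w = 25/36$.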

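Reading Kaldi's PLDA training code
The code for Kaldi's PLDA implementation lives in four files: plda.h and plda.cc under src/ivector/, and ivector-compute-plda.cc and ivector-plda-scoring.cc under src/ivectorbin/.
For readability, I paste only the main parts below.
We start with ivector-compute-plda.cc, which shows which classes the implementation uses.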

PldaEstimationConfig plda_config; // sets the number of EM iterations (default: 10)
SequentialTokenVectorReader spk2utt_reader(spk2utt_rspecifier);
RandomAccessBaseFloatVectorReader ivector_reader(ivector_rspecifier);
PldaStats plda_stats;  // create a PldaStats instance, which mainly stores the data
for (; !spk2utt_reader.Done(); spk2utt_reader.Next()) {
  std::string spk = spk2utt_reader.Key();
  const std::vector<std::string> &uttlist = spk2utt_reader.Value();
  if (uttlist.empty()) {
    KALDI_ERR << "Speaker with no utterances.";
  }
  std::vector<Vector<BaseFloat> > ivectors;
  ivectors.reserve(uttlist.size());

  for (size_t i = 0; i < uttlist.size(); i++) {
    std::string utt = uttlist[i];
    if (!ivector_reader.HasKey(utt)) {
      KALDI_WARN << "No iVector present in input for utterance " << utt;
      num_utt_err++;
    } else {
      ivectors.resize(ivectors.size() + 1);
      ivectors.back() = ivector_reader.Value(utt);
      num_utt_done++;
    }
  }

  if (ivectors.size() == 0) {
    KALDI_WARN << "Not producing output for speaker " << spk
               << " since no utterances had iVectors";
    num_spk_err++;
  } else {
    Matrix<double> ivector_mat(ivectors.size(), ivectors[0].Dim());
    for (size_t i = 0; i < ivectors.size(); i++)
      ivector_mat.Row(i).CopyFromVec(ivectors[i]);
    double weight = 1.0; // The code supports weighting but
                         // we don't support this at the command-line
                         // level yet.
    plda_stats.AddSamples(weight, ivector_mat);
    num_spk_done++;
  }
}
plda_stats.Sort();  // Sort class_info_ to make num_examples in increasing order. 
PldaEstimator plda_estimator(plda_stats);
Plda plda;
plda_estimator.Estimate(plda_config, &plda);
WriteKaldiObject(plda, plda_wxfilename, binary);

This code uses three main classes: PldaStats, PldaEstimator, and Plda. We look at them one by one.
PldaStats stores the i-vector data and a few summary statistics. Its main data members are:

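// Assume every weight is the default 1.0
int32 dim_;                    // i-vector dimension
int64 num_classes_;     // number of speakers
int64 num_examples_; // total number of i-vectors over all speakers, N
double class_weight_; // sum of class weights; equals the number of speakers K
double example_weight_; // weighted total number of i-vectors; equals N
Vector<double> sum_;     // sum of the K per-speaker mean i-vectors
SpMatrix<double> offset_scatter_; // the matrix S from the first section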

PldaStats also has an important nested struct, ClassInfo; each element of std::vector<ClassInfo> class_info_ is a (weight, mean, num_examples) triple. The member function AddSamples adds data. The code follows, with notes added as comments:

void PldaStats::AddSamples(double weight,
                           const Matrix<double> &group) {
  if (dim_ == 0) {
    Init(group.NumCols());  // initialize all the PldaStats parameters. See line 327.
  } else {
    KALDI_ASSERT(dim_ == group.NumCols());
  }
  int32 n = group.NumRows(); // number of examples for this class
  Vector<double> *mean = new Vector<double>(dim_);
  mean->AddRowSumMat(1.0 / n, group);   // Does *this = 1.0/n * (sum of rows of M) + 1.0 * *this

  // The following two statements add M^T M - n * mean mean^T, i.e., the scatter matrix within one speaker.
  offset_scatter_.AddMat2(weight, group, kTrans, 1.0); //(*this) = 1.0*(*this) + weight * M^T * M
  // the following statement has the same effect as if we
  // had first subtracted the mean from each element of
  // the group before the statement above.
  offset_scatter_.AddVec2(-n * weight, *mean);  // rank-one update, this <– this + alpha v v'


  class_info_.push_back(ClassInfo(weight, mean, n));

  num_classes_ ++; 
  num_examples_ += n;               // \sum_{k=1}^K n_k
  class_weight_ += weight;          // K
  example_weight_ += weight * n;    // \sum_{k=1}^K n_k

  sum_.AddVec(weight, *mean);  // add mean_k to sum_
}

Note that group is a matrix holding all the i-vectors of one speaker, one i-vector per row, and mean computes $m_k$. Writing $M$ for this matrix, each call to AddSamples adds $M^TM - n_k m_k m_k^T$ to offset_scatter_, i.e. the scatter matrix of one speaker.
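The shortcut used by AddSamples rests on the identity $\sum_i (z_i - m_k)(z_i - m_k)^T = M^TM - n_k m_k m_k^T$, where the rows of $M$ are the i-vectors $z_i$. The sketch below checks it numerically with plain `std::vector` storage; the helper names `Mat`, `RowMean`, `DirectScatter`, and `ShortcutScatter` are hypothetical and are not part of Kaldi.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // one i-vector per row

// Mean of the rows of `group`.
std::vector<double> RowMean(const Mat &group) {
  assert(!group.empty());
  std::vector<double> mean(group[0].size(), 0.0);
  for (const auto &row : group)
    for (std::size_t d = 0; d < mean.size(); d++)
      mean[d] += row[d] / group.size();
  return mean;
}

// Scatter about the speaker mean, computed directly:
// sum_i (z_i - mean)(z_i - mean)^T.
Mat DirectScatter(const Mat &group) {
  std::vector<double> mean = RowMean(group);
  std::size_t dim = mean.size();
  Mat s(dim, std::vector<double>(dim, 0.0));
  for (const auto &row : group)
    for (std::size_t a = 0; a < dim; a++)
      for (std::size_t b = 0; b < dim; b++)
        s[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]);
  return s;
}

// The same quantity in the AddSamples style: M^T M - n * mean mean^T.
Mat ShortcutScatter(const Mat &group) {
  std::vector<double> mean = RowMean(group);
  std::size_t dim = mean.size();
  double n = static_cast<double>(group.size());
  Mat s(dim, std::vector<double>(dim, 0.0));
  for (const auto &row : group)
    for (std::size_t a = 0; a < dim; a++)
      for (std::size_t b = 0; b < dim; b++) s[a][b] += row[a] * row[b];
  for (std::size_t a = 0; a < dim; a++)
    for (std::size_t b = 0; b < dim; b++) s[a][b] -= n * mean[a] * mean[b];
  return s;
}
```

The shortcut needs only one pass over the raw data plus a rank-one correction, which is why the mean never has to be subtracted from each row first.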

PldaEstimator is a very important class: it does the actual PLDA training. Let us look at its main data members.

const PldaStats &stats_;
SpMatrix<double> within_var_;
SpMatrix<double> between_var_;

// These stats are reset on each iteration.
SpMatrix<double> within_var_stats_;
double within_var_count_; // count corresponding to within_var_stats_
SpMatrix<double> between_var_stats_;
double between_var_count_; // count corresponding to between_var_stats_

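Here within_var_ and between_var_ hold the $\Phi_w$ and $\Phi_b$ obtained at the end of each iteration. within_var_stats_, within_var_count_, between_var_stats_, and between_var_count_ act as per-iteration accumulators: intermediate results are written into them, and they are reset to zero before each iteration starts. In the default unit-weight case, within_var_count_ and between_var_count_ end each iteration equal to N and K respectively, and supply the $1 / N$ and $1 / K$ factors in the EM formulas above.
Now let us look at one iteration in detail: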

EstimateOneIter();
void PldaEstimator::EstimateOneIter() {
  ResetPerIterStats();
  GetStatsFromIntraClass();
  GetStatsFromClassMeans();
  EstimateFromStats();
  KALDI_VLOG(2) << "Objective function is " << ComputeObjf();
}

First, all statistics are reset: the elements of within_var_stats_ and between_var_stats_ are set to 0, and so are within_var_count_ and between_var_count_.

void PldaEstimator::ResetPerIterStats() {
  within_var_stats_.Resize(Dim());   // set elements to zeros.
  within_var_count_ = 0.0;
  between_var_stats_.Resize(Dim()); // set elements to zeros.
  between_var_count_ = 0.0;
}

Step 2: add offset_scatter_ (the matrix $S$) into the freshly zeroed within_var_stats_, and set within_var_count_ to $N - K$.

void PldaEstimator::GetStatsFromIntraClass() {
  within_var_stats_.AddSp(1.0, stats_.offset_scatter_);  // equivalent to copying stats_.offset_scatter_ to within_var_stats_: The value computed is (1.0 * within_var_stats_[i][j]) + offset_scatter_[i][j].
  // Note: in the normal case, the expression below will be equal to the sum
  // over the classes, of (n-1), where n is the #examples for that class.  That
  // is the rank of the scatter matrix that "offset_scatter_" has for that
  // class. [if weights other than 1.0 are used, it will be different.]
  within_var_count_ += (stats_.example_weight_ - stats_.class_weight_);  // N - K, to get the unbiased covariance estimator?
}

Step 3 is the core of PLDA training; it corresponds closely to the EM formulas listed above.

void PldaEstimator::GetStatsFromClassMeans() {
  SpMatrix<double> between_var_inv(between_var_);  // define \Phi_b^{-1} initialized with last-step \Phi_b
  between_var_inv.Invert();                        // now is \Phi_b^{-1}
  SpMatrix<double> within_var_inv(within_var_);    // the same as steps above
  within_var_inv.Invert();
  // mixed_var will equal (between_var^{-1} + n within_var^{-1})^{-1}.
  SpMatrix<double> mixed_var(Dim());               // define \hat \Phi
  int32 n = -1; // the current number of examples for the class.

  for (size_t i = 0; i < stats_.class_info_.size(); i++) {
    const ClassInfo &info = stats_.class_info_[i];
    double weight = info.weight;
    if (info.num_examples != n) {
      n = info.num_examples;
      mixed_var.CopyFromSp(between_var_inv);
      mixed_var.AddSp(n, within_var_inv);
      mixed_var.Invert();
    }
    Vector<double> m = *(info.mean); // the mean for this class.
    m.AddVec(-1.0 / stats_.class_weight_, stats_.sum_); // remove global mean
    Vector<double> temp(Dim()); // n within_var^{-1} m
    temp.AddSpVec(n, within_var_inv, m, 0.0); // symmetric (packed) matrix times vector: this <- n*within_var_inv*m.
    Vector<double> w(Dim()); // w, as defined in the comment.
    w.AddSpVec(1.0, mixed_var, temp, 0.0);  // w = (between_var^{-1} + n within_var^{-1})^{-1} * n within_var^{-1} m
    Vector<double> m_w(m); // m - w
    m_w.AddVec(-1.0, w);
    between_var_stats_.AddSp(weight, mixed_var);
    between_var_stats_.AddVec2(weight, w);    // accumulates (between_var^{-1} + n within_var^{-1})^{-1} + w w^T for \Phi_b
    between_var_count_ += weight;             // to count num of classes
    within_var_stats_.AddSp(weight * n, mixed_var);
    within_var_stats_.AddVec2(weight * n, m_w);   // accumulates n * ((between_var^{-1} + n within_var^{-1})^{-1} + (m-w)(m-w)^T) for \Phi_w
    within_var_count_ += weight;
  }
}

Step 4 computes the $\Phi_w$ and $\Phi_b$ for the end of this iteration and writes them to within_var_ and between_var_ respectively.

void PldaEstimator::EstimateFromStats() {
  within_var_.CopyFromSp(within_var_stats_);
  within_var_.Scale(1.0 / within_var_count_);
  between_var_.CopyFromSp(between_var_stats_);
  between_var_.Scale(1.0 / between_var_count_);
}

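The ComputeObjf() call on the last line of PldaEstimator::EstimateOneIter() computes and logs the value of $\text{ln}\ p(x)$ after each iteration; it does not affect training. To read that code, see the reference "A Note on Kaldi's PLDA Implementation" listed at the top of this post, or equations (7) and (8) in the paper.
After 10 iterations we have $\Phi_w$ and $\Phi_b$. To make PLDA scoring convenient later, we need the transform associated with $A$ and the diagonal matrix $\Psi$. This happens in the single call below; the important part is how it fills in the Plda class.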

GetOutput(plda);

The main data members of class Plda are:

Vector<double> mean_;  // mean of samples in original space.
Matrix<double> transform_; // of dimension Dim() by Dim();
                           // this transform makes within-class covar unit
                           // and diagonalizes the between-class covar.
Vector<double> psi_; // of dimension Dim().  The between-class
                     // (diagonal) covariance elements, in decreasing order.

Vector<double> offset_;  // derived variable: -1.0 * transform_ * mean_

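PldaEstimator::GetOutput converts the $\Phi_w$ and $\Phi_b$ produced by the iterations above into a transform and $\Psi$, and stores them in the data members of the Plda instance so that they can be written out. Note that the stored transform_ makes the within-class covariance unit, so it plays the role of $A^{-1}$ rather than $A$. I have added comments in the code, so it should be easy to follow.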

void PldaEstimator::GetOutput(Plda *plda) {
  plda->mean_ = stats_.sum_;
  plda->mean_.Scale(1.0 / stats_.class_weight_);
  KALDI_LOG << "Norm of mean of iVector distribution is "
            << plda->mean_.Norm(2.0);

  Matrix<double> transform1(Dim(), Dim());
  ComputeNormalizingTransform(within_var_, &transform1);
  // now transform is a matrix that if we project with it,
  // within_var_ becomes unit.

  // between_var_proj is between_var after projecting with transform1.
  SpMatrix<double> between_var_proj(Dim());
  between_var_proj.AddMat2Sp(1.0, transform1, kNoTrans, between_var_, 0.0);  // alpha * M * A * M^T.

  Matrix<double> U(Dim(), Dim());
  Vector<double> s(Dim());
  // Do symmetric eigenvalue decomposition between_var_proj = U diag(s) U^T,
  // where U is orthogonal.
  between_var_proj.Eig(&s, &U);

  KALDI_ASSERT(s.Min() >= 0.0);
  int32 n;
  s.ApplyFloor(0.0, &n);
  if (n > 0) {
    KALDI_WARN << "Floored " << n << " eigenvalues of between-class "
               << "variance to zero.";
  }
  // Sort from greatest to smallest eigenvalue.
  SortSvd(&s, &U);

  // The transform U^T will make between_var_proj diagonal with value s
  // (i.e. U^T U diag(s) U U^T = diag(s)).  The final transform that
  // makes within_var_ unit and between_var_ diagonal is U^T transform1,
  // i.e. first transform1 and then U^T.

  plda->transform_.Resize(Dim(), Dim());
  plda->transform_.AddMatMat(1.0, U, kTrans, transform1, kNoTrans, 0.0);  // U^T transform1
  plda->psi_ = s;

  KALDI_LOG << "Diagonal of between-class variance in normalized space is " << s;

  if (GetVerboseLevel() >= 2) { // at higher verbose levels, do a self-test
                                // (just tests that this function does what it
                                // should).
    SpMatrix<double> tmp_within(Dim());
    tmp_within.AddMat2Sp(1.0, plda->transform_, kNoTrans, within_var_, 0.0);
    KALDI_ASSERT(tmp_within.IsUnit(0.0001));
    SpMatrix<double> tmp_between(Dim());
    tmp_between.AddMat2Sp(1.0, plda->transform_, kNoTrans, between_var_, 0.0);
    KALDI_ASSERT(tmp_between.IsDiagonal(0.0001));
    Vector<double> psi(Dim());
    psi.CopyDiagFromSp(tmp_between);
    AssertEqual(psi, plda->psi_);
  }
  plda->ComputeDerivedVars();  // offset_ = -1.0 * transform_ * mean_
}
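The two-stage construction in GetOutput can be written out as follows. This is a sketch: $T_1$ stands for transform1 and $T$ for transform_, and taking $T_1$ to be the inverse Cholesky factor of $\Phi_w$ is an assumption about ComputeNormalizingTransform. Any $T_1$ with $T_1\Phi_wT_1^T = I$ gives the same argument.

```latex
\begin{aligned}
\Phi_w &= L L^T, \quad T_1 = L^{-1}
  &&\Rightarrow\; T_1 \Phi_w T_1^T = I,\\
T_1 \Phi_b T_1^T &= U\,\mathrm{diag}(s)\,U^T, \quad T = U^T T_1
  &&\Rightarrow\; T \Phi_w T^T = U^T U = I,\quad
     T \Phi_b T^T = \mathrm{diag}(s) = \Psi .
\end{aligned}
```

Comparing with $V^T\Phi_wV = I$ and $V^T\Phi_bV = \Psi$ from the background section, $T = V^T = A^{-1}$.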

At this point PLDA training is essentially done. Kaldi writes the trained PLDA model to disk in the form below, where mean_ is the global i-vector mean.

void Plda::Write(std::ostream &os, bool binary) const {
  WriteToken(os, binary, "<Plda>");
  mean_.Write(os, binary);
  transform_.Write(os, binary);
  psi_.Write(os, binary);
  WriteToken(os, binary, "</Plda>");
}

PLDA scoring
Kaldi's PLDA scoring follows the method in Sergey Ioffe's paper. Several samples of one class can be combined into a single model, which improves performance. Suppose a class has $n$ independent samples $u_{1 \dots n}^g$; then
$P(u^p | u^g_{1 \dots n}) = \mathcal{N} (u^p | \frac{n \Psi}{n \Psi + I} \bar{u}^g, I + \frac{\Psi}{n\Psi + I})$
Here the superscript $^p$ denotes the "probe" example (the test i-vector to be classified). We want the likelihood ratio $P(u^p | u^g_{1 \dots n}) / P(u^p)$, where the numerator is the probability of $u^p$ given that it's in that class, and the denominator is the probability of $u^p$ with no class assumption at all (e.g. in its own class). The expression above even works for $n = 0$ (e.g. the denominator of the likelihood ratio), where it gives us
$P(u^p) = \mathcal{N}(u^p | 0, I + \Psi)$
i.e. it's distributed with zero mean and covariance (within + between).
The likelihood ratio we want is:
$\frac{\mathcal{N}(u^p | \frac{n \Psi}{n \Psi + I} \bar{u}^g, I + \frac{\Psi}{n \Psi + I})} { \mathcal{N}(u^p | 0, I + \Psi)}$
where $\bar{u}^g$ is the mean of the "gallery examples"; and we can expand the log likelihood ratio as
$-0.5 [ (u^p - m)^T (I + \Psi/(n \Psi + I))^{-1} (u^p - m) + \text{logdet}(I + \Psi/(n \Psi + I)) ] \\ + 0.5 [ (u^p)^T (I + \Psi)^{-1} u^p + \text{logdet}(I + \Psi) ]$
where $m = \frac{n \Psi}{n \Psi + I} \bar{u}^g$ (this $m$ is the posterior mean for the class, not the global mean).
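Because $\Psi$ is diagonal, the log likelihood ratio splits into a sum over dimensions. The sketch below evaluates $\log \mathcal{N}(u^p | m, I + \Psi/(n\Psi + I)) - \log \mathcal{N}(u^p | 0, I + \Psi)$ that way. It is an illustration under stated assumptions, not Kaldi's function: the name `LogLikelihoodRatio` and its signature are hypothetical here, and all vectors are assumed to be already in the transformed space where the within-class covariance is $I$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log likelihood ratio for a probe against the mean `gallery_mean` of n
// enrollment vectors. `psi` holds the diagonal of Psi.
double LogLikelihoodRatio(const std::vector<double> &psi,
                          const std::vector<double> &gallery_mean, int n,
                          const std::vector<double> &probe) {
  assert(psi.size() == gallery_mean.size() && psi.size() == probe.size());
  double given_class = 0.0;  // log N(u^p | m, I + Psi/(n Psi + I))
  double no_class = 0.0;     // log N(u^p | 0, I + Psi)
  for (std::size_t d = 0; d < psi.size(); d++) {
    // Posterior mean and predictive variance for this dimension.
    double mean = n * psi[d] / (n * psi[d] + 1.0) * gallery_mean[d];
    double var = 1.0 + psi[d] / (n * psi[d] + 1.0);
    double diff = probe[d] - mean;
    given_class += -0.5 * (diff * diff / var + std::log(var));
    double var0 = 1.0 + psi[d];
    no_class += -0.5 * (probe[d] * probe[d] / var0 + std::log(var0));
  }
  // The (2*pi) normalizers are identical in both terms and cancel.
  return given_class - no_class;
}
```

In one dimension with $\Psi = 1$, $n = 1$, and $\bar{u}^g = 1$, a probe at $+1$ scores positive and a probe at $-1$ scores negative, as expected for a same-speaker versus different-speaker trial.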
Miscellaneous
Some odds and ends.
The code offers two kinds of length normalization:

simple length normalization: scales the i-vector so that its length equals $\sqrt {dim}$.
normalize length: ensures that $ivector^T (\Psi + \frac{\bold I}{n_k})^{-1}ivector = dim$.
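Both normalizations amount to a single scalar rescaling of the transformed i-vector. The sketch below shows the arithmetic under that reading; the function names `SimpleLengthNormalize` and `CovarianceLengthNormalize` are hypothetical, and `psi` is assumed to hold the diagonal of $\Psi$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scales v in place so that its Euclidean norm equals sqrt(dim).
void SimpleLengthNormalize(std::vector<double> *v) {
  assert(v != nullptr && !v->empty());
  double sumsq = 0.0;
  for (double x : *v) sumsq += x * x;
  assert(sumsq > 0.0);
  double scale = std::sqrt(v->size() / sumsq);
  for (double &x : *v) x *= scale;
}

// Scales v in place so that v^T (Psi + I/n)^{-1} v equals dim, where n is the
// number of i-vectors that were averaged to produce v.
void CovarianceLengthNormalize(const std::vector<double> &psi, int n,
                               std::vector<double> *v) {
  assert(v != nullptr && v->size() == psi.size() && n > 0);
  double quad = 0.0;
  for (std::size_t d = 0; d < psi.size(); d++)
    quad += (*v)[d] * (*v)[d] / (psi[d] + 1.0 / n);
  assert(quad > 0.0);
  double scale = std::sqrt(v->size() / quad);
  for (double &x : *v) x *= scale;
}
```

The second form uses $\Psi + \bold I/n_k$ because that is the covariance of an average of $n_k$ i-vectors in the transformed space, so the normalized vector has the squared length a typical sample from that distribution would have.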
