Speaker Recognition with i-vector/PLDA

I have recently been working on a speaker recognition project that uses i-vector/PLDA. After going through a range of references I now have a reasonably clear picture of how the whole pipeline fits together, so I am writing it down here.
The overall pipeline is GMM-UBM -> i-vector -> PLDA.

GMM-UBM

Steps:

  1. Background data -> UBM -> UBM mean supervector
  2. UBM + several enrolment utterances of a speaker -> MAP adaptation -> speaker-dependent GMM
  3. Speaker-dependent GMM -> GMM supervector (mean supervector)

Usage:
Test utterance of a speaker -> score against the speaker-dependent GMM -> threshold decision (see the sketch below)
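
A minimal sketch of the enrolment and scoring steps above, assuming MFCC-like feature matrices of shape (frames, dims) and using scikit-learn's GaussianMixture as the UBM. Only the means are MAP-adapted, and the test score is the average per-frame log-likelihood ratio against the UBM; the function names and the relevance factor value are illustrative choices, not taken from any particular toolkit.

```python
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64, seed=0):
    """Fit a diagonal-covariance UBM on pooled background features."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type='diag', random_state=seed)
    ubm.fit(background_feats)
    return ubm

def map_adapt_means(ubm, enroll_feats, relevance=16.0):
    """MAP-adapt only the UBM means to one speaker's enrolment frames."""
    gamma = ubm.predict_proba(enroll_feats)           # (T, C) responsibilities
    n_c = gamma.sum(axis=0) + 1e-10                   # zeroth-order statistics
    e_c = (gamma.T @ enroll_feats) / n_c[:, None]     # per-component posterior means
    alpha = n_c / (n_c + relevance)                   # adaptation coefficients
    spk_gmm = copy.deepcopy(ubm)
    spk_gmm.means_ = alpha[:, None] * e_c + (1 - alpha)[:, None] * ubm.means_
    return spk_gmm

def verify(spk_gmm, ubm, test_feats):
    """Average log-likelihood ratio of the test utterance: speaker GMM vs. UBM."""
    return np.mean(spk_gmm.score_samples(test_feats) - ubm.score_samples(test_feats))
```

An utterance is accepted if the score returned by `verify` exceeds a threshold tuned on development data.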

i-vector

Given an utterance of a speaker, the corresponding Gaussian mean supervector is modelled as:
$s = m + T\omega$
where:

  • s (supervector)
    • the Gaussian mean supervector of the given utterance (obtained from the speaker-dependent GMM)
  • m (UBM mean supervector)
    • the Gaussian mean supervector of the universal background model (UBM), independent of the particular speaker and channel
  • T (total variability matrix)
    • the total variability space matrix
  • $\omega$ (i-vector)
    • the factor in the total variability space

Estimation:

  • Estimating the total variability matrix T
    • assume every utterance comes from a different speaker
    • compute the Baum-Welch statistics of each speaker in the training database
    • estimate T with iterative EM (roughly 10 iterations)
  • i-vector extraction
    • compute the Baum-Welch statistics of each target speaker in the database
    • plug them into the total variability matrix T
    • the posterior mean of $\omega$ is the i-vector (see the sketch below)
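
A sketch of the extraction step, reusing the diagonal-covariance scikit-learn UBM from the previous sketch and assuming a total variability matrix T of shape (C*D, R) trained elsewhere: it accumulates the Baum-Welch statistics of one utterance and returns the posterior mean of $\omega$.

```python
import numpy as np

def baum_welch_stats(ubm, feats):
    """Zeroth- and centred first-order statistics of one utterance w.r.t. the UBM."""
    gamma = ubm.predict_proba(feats)                  # (T, C)
    N = gamma.sum(axis=0)                             # (C,)
    F = gamma.T @ feats - N[:, None] * ubm.means_     # (C, D), centred on UBM means
    return N, F

def extract_ivector(N, F, T_matrix, sigma):
    """Posterior mean of omega in s = m + T*omega.

    T_matrix: (C*D, R) total variability matrix
    sigma   : (C*D,)   stacked diagonal UBM covariances
    """
    R = T_matrix.shape[1]
    D = F.shape[1]
    T_sig = T_matrix / sigma[:, None]                 # Sigma^{-1} T
    NN = np.repeat(N, D)                              # expand N to supervector size
    precision = np.eye(R) + T_sig.T @ (NN[:, None] * T_matrix)
    return np.linalg.solve(precision, T_sig.T @ F.reshape(-1))
```

This implements the standard posterior-mean formula $\omega = (I + T^{T}\Sigma^{-1}NT)^{-1}\, T^{T}\Sigma^{-1}\tilde{F}$, where $\tilde{F}$ is the centred first-order statistic stacked as a supervector.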

PLDA

An i-vector contains both speaker and channel information; LDA/PLDA can be applied on top of it to reduce the influence of the channel.

$X_{ij} = \mu + Fh_{i} + Gw_{ij} + \epsilon_{ij}$
where:

  • $\mu + Fh_{i}$: the signal part, describing the differences between speakers (between-class variability)
    • $\mu$: the mean of all training data
    • F: the identity space, containing the information needed to represent different speakers
    • $h_{i}$: the identity of a particular speaker (the speaker's position in the identity space)
  • $Gw_{ij} + \epsilon_{ij}$: the noise part, describing the differences between utterances of the same speaker (within-class variability)
    • G: the error space, containing the information needed to represent the variation between utterances of the same speaker
    • $w_{ij}$: the position of one of the speaker's utterances in the space G
    • $\epsilon_{ij}$: the residual noise term, accounting for whatever is still unexplained

The model is thus fully described by the parameter set:
$\theta = [\mu, F, G, \Sigma]$
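
The following toy sketch (with made-up dimensions) samples data from this generative model; it shows that all utterances of one speaker share the same $h_{i}$, while $w_{ij}$ and the residual change from utterance to utterance.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_F, n_G = 20, 5, 5                       # i-vector dim, identity dim, error dim
mu = rng.standard_normal(dim)                  # global mean
F = rng.standard_normal((dim, n_F))            # identity (between-class) space
G = rng.standard_normal((dim, n_G))            # error (within-class) space
noise_std = 0.1

def sample_speaker(n_utts):
    """Generate n_utts utterances of one speaker: h_i is shared, w_ij and eps_ij are not."""
    h = rng.standard_normal(n_F)               # speaker identity factor h_i
    w = rng.standard_normal((n_utts, n_G))     # per-utterance factors w_ij
    eps = noise_std * rng.standard_normal((n_utts, dim))
    return mu + h @ F.T + w @ G.T + eps        # X_ij = mu + F h_i + G w_ij + eps_ij

X = sample_speaker(3)                          # three utterances of the same speaker
```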

PLDA model training
The input is a set of data $X_{ij}$ (several utterances from each of several speakers), and the parameters are estimated iteratively with EM:

  • Mean removal
    • compute the mean $\mu$ of all training data $X_{ij}$ and subtract it: $X_{ij} = X_{ij} - \mu$
    • for the N speakers in the training data, compute the N per-speaker means $N_{\mu}$
  • Initialisation
    • noise space G: random initialisation
    • identity space F: run PCA on the per-speaker means $N_{\mu}$ and use the leading components to initialise F
    • covariance $\Sigma$: initialise with a constant
  • EM iterations (a sketch of the initialisation step is given below)
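
A sketch of the initialisation described above, assuming the training data comes as a dict mapping speaker id to an array of that speaker's i-vectors; the EM updates themselves are omitted, and the subspace sizes n_F and n_G are illustrative.

```python
import numpy as np

def init_plda(data, n_F=100, n_G=100, seed=0):
    """Mean removal and initialisation of (mu, F, G, Sigma) before EM."""
    rng = np.random.default_rng(seed)
    all_x = np.vstack(list(data.values()))
    mu = all_x.mean(axis=0)                             # global mean
    centred = {spk: x - mu for spk, x in data.items()}  # X_ij <- X_ij - mu
    spk_means = np.vstack([x.mean(axis=0) for x in centred.values()])

    # F: leading PCA directions of the per-speaker means (at most n_speakers of them)
    m = spk_means - spk_means.mean(axis=0)
    _, _, Vt = np.linalg.svd(m, full_matrices=False)
    F = Vt[:n_F].T

    G = rng.standard_normal((all_x.shape[1], n_G))      # error space, random
    Sigma = np.eye(all_x.shape[1])                      # residual covariance, constant
    return mu, F, G, Sigma
```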

Simplified PLDA
$X_{ij} = \mu + Fh_{i} + \epsilon_{ij}$
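
With the simplified model, speaker verification can be scored as a log-likelihood ratio between the same-speaker and different-speaker hypotheses. A sketch, assuming two centred (and typically length-normalised) i-vectors x1 and x2 and a trained F and $\Sigma$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(x1, x2, F, Sigma):
    """log p(x1, x2 | same speaker) - log p(x1, x2 | different speakers)."""
    B = F @ F.T                                  # between-speaker covariance F F^T
    tot = B + Sigma                              # total covariance of one i-vector
    d = len(x1)
    joint = np.concatenate([x1, x2])
    # same speaker: both i-vectors share h_i, so they are correlated through B
    cov_same = np.block([[tot, B], [B, tot]])
    # different speakers: independent draws
    cov_diff = np.block([[tot, np.zeros((d, d))], [np.zeros((d, d)), tot]])
    return (multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_same)
            - multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_diff))
```

A positive score favours the same-speaker hypothesis; in practice the decision threshold is calibrated on development trials.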

Reference

  • Seyed Omid Sadjadi, Malcolm Slaney, and Larry Heck. MSR Identity Toolbox: A MATLAB Toolbox for Speaker Recognition Research, Version 1.0. Microsoft Research, Conversational Systems Research Center (CSRC).
