A Detailed Derivation of the I-Vector

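This article explains the I-Vector model, a technique for speaker verification. It starts from the background of the JFA model and explains how the I-Vector models speaker and channel effects. It then walks through the estimation of the total variability matrix T, including computing the Baum-Welch statistics and the E-step and M-step of the EM algorithm. Finally, it discusses how I-Vectors are extracted.
(Summary generated automatically by CSDN.)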

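While learning deep learning, I'm trying to understand each technique from the ground up. As a DL beginner, I'll keep posting notes on what I read here, and discussion is welcome.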

I. Preface

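The GMM-UBM system with MAP adaptation (UBM-MAP-GMM) is currently a standard approach to speaker verification. Under the Joint Factor Analysis (JFA) model, this generative model can be extended into a joint model with two parts: a speaker part (the eigenvoice matrix V) and a channel part (the eigenchannel matrix U). In practice, however, JFA cannot cleanly separate the speaker part from the channel part, and some interference between them remains. For example, the channel factors estimated by JFA still carry speaker information. Inspired by JFA, Dehak proposed extracting a more compact vector from the GMM mean supervector, called the I-Vector (Identity Vector).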

II. Overview of the I-Vector Model

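In the JFA model, modeling rests on two subspaces: the speaker space defined by the eigenvoice matrix V and the channel space defined by the eigenchannel matrix U.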
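To make the JFA decomposition concrete, here is a toy NumPy sketch of the supervector $M = \mu + Vy + Ux$, where y holds the speaker factors and x the channel factors. The factor names, all sizes, and the random V and U are assumptions for illustration only (trained matrices would be estimated from data). The diagonal residual term $Dz$ of the full JFA formulation is left out. The sketch checks that, in this idealized model, two recordings of the same speaker over different channels differ only within the column space of U.

```python
import numpy as np

# Toy sizes, chosen only for illustration: C Gaussians, P-dim features,
# Rv speaker factors, Ru channel factors.
C, P, Rv, Ru = 8, 3, 4, 2
CP = C * P

rng = np.random.default_rng(0)
mu = rng.standard_normal((CP, 1))  # UBM mean supervector, shape (CP, 1)
V = rng.standard_normal((CP, Rv))  # eigenvoice matrix: speaker subspace (random stand-in)
U = rng.standard_normal((CP, Ru))  # eigenchannel matrix: channel subspace (random stand-in)


def jfa_supervector(y, x):
    """Speaker/channel-dependent supervector M = mu + V y + U x.

    The diagonal residual term D z of full JFA is omitted in this sketch.
    """
    return mu + V @ y + U @ x


# Same speaker (same y) recorded over two different channels (x1, x2).
y = rng.standard_normal((Rv, 1))
x1 = rng.standard_normal((Ru, 1))
x2 = rng.standard_normal((Ru, 1))
M1 = jfa_supervector(y, x1)
M2 = jfa_supervector(y, x2)

# In this idealized model the two supervectors differ only inside span(U).
print(np.allclose(M1 - M2, U @ (x1 - x2)))
```

JFA assumes speaker and channel effects live in the separate subspaces V and U. As the preface notes, real JFA systems do not achieve this separation perfectly, which is what motivated the single total variability space.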

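The I-Vector model instead uses a single total variability space (Total Variability Space, T), which captures both the variability between speakers and the variability between channels. As a result, I-Vector modeling does not strictly separate speaker effects from channel effects in the GMM mean supervector.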

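Given an utterance h from speaker s, the new speaker- and channel-dependent GMM mean supervector is defined as:

$$M_{s,h} = \mu + T\,w_{s,h}$$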

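where $\mu$ is the speaker- and channel-independent mean supervector, i.e. the UBM mean supervector. It stacks the P-dimensional means of the C Gaussian components, so its shape is (CP, 1);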

$T$ is the total variability matrix, with shape (CP, Rw);

$w_{s,h}$ is the total variability factor, i.e. the I-Vector, with shape (Rw, 1) and a standard normal prior $N(0, I)$;

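Here, Rw denotes the dimension of the total variability space, i.e. the number of columns of T and the length of the I-Vector.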
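The following is a minimal NumPy sketch of the generative model $M_{s,h} = \mu + T\,w_{s,h}$ with the shapes defined above. It assumes toy sizes, a random T in place of a trained one, and draws w from the standard normal prior. The final least-squares step only illustrates that, in this idealized noise-free setting, the Rw numbers in w fully determine the CP-dimensional supervector. It is not how I-Vectors are extracted in practice: the supervector is never observed directly, and real extraction works from Baum-Welch statistics.

```python
import numpy as np

# Toy sizes for illustration only: C Gaussians, P-dim features, Rw = I-Vector dim.
C, P, Rw = 8, 3, 5
CP = C * P

rng = np.random.default_rng(1)
mu = rng.standard_normal((CP, 1))  # UBM mean supervector, shape (CP, 1)
T = rng.standard_normal((CP, Rw))  # total variability matrix (random stand-in), shape (CP, Rw)


def ivector_supervector(w):
    """M_{s,h} = mu + T w_{s,h}: one low-rank term for speaker and channel together."""
    return mu + T @ w


# Prior of the I-Vector model: w ~ N(0, I), shape (Rw, 1).
w = rng.standard_normal((Rw, 1))
M = ivector_supervector(w)
print(M.shape)  # (24, 1)

# A random Gaussian T with CP > Rw has full column rank (almost surely), so the
# idealized, noise-free M determines w; least squares recovers it exactly.
# (Illustration only: this is NOT the I-Vector extraction procedure.)
w_hat, *_ = np.linalg.lstsq(T, M - mu, rcond=None)
print(np.allclose(w_hat, w))
```

With realistic but purely illustrative sizes such as C = 2048 and P = 60, the supervector has 2048 × 60 = 122,880 entries, while Rw is usually a few hundred (e.g. 400). That gap is where the I-Vector's compactness comes from.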
