AS-norm, Z-norm, T-norm, ZT-norm, and S-norm in Speaker Verification and Speaker Identification

觉子

Last modified 2023-03-02 17:04:26


First published 2022-11-28 16:01:42

Original link: https://blog.csdn.net/weixin_44297731/article/details/128081227


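In speaker verification and speaker identification, the score between an enrollment utterance and a test utterance is affected by factors such as differences in recording environment and in spoken content. To set a decision threshold more reliably, the scores need to be normalized. Taking AS-norm as an example (once you understand this one, the others are easy), the steps are as follows: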

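Build an impostor (cohort) set whose speakers are different from those of the enrollment and test utterances;
Compute the cosine similarity scores of the enrollment utterance against the cohort set, and of the test utterance against the cohort set;
From each of the two score sequences, select the top-k scores and compute their mean and standard deviation;
Use the two resulting means and standard deviations to normalize the score between the enrollment and test utterances.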
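Written out as formulas, with $s(e,t)$ the raw score between enrollment utterance $e$ and test utterance $t$, and $\mu_e, \sigma_e$ ($\mu_t, \sigma_t$) the mean and standard deviation of the enrollment (test) side's cohort scores (this notation is introduced here for illustration, it is not from the original post), the normalizations are:

```latex
\begin{aligned}
s_{\mathrm{Z}} &= \frac{s(e,t) - \mu_e}{\sigma_e}, \qquad
s_{\mathrm{T}} = \frac{s(e,t) - \mu_t}{\sigma_t}, \\
s_{\mathrm{S}} &= \frac{1}{2}\left( \frac{s(e,t) - \mu_e}{\sigma_e} + \frac{s(e,t) - \mu_t}{\sigma_t} \right)
\end{aligned}
```

Z-norm uses only the enrollment side and T-norm only the test side. S-norm averages the two, with $\mu$ and $\sigma$ computed over the whole cohort; AS-norm is the same average, but each side's $\mu$ and $\sigma$ come only from its top-k cohort scores, which is exactly the last line of the code below. ZT-norm instead applies Z-norm first and then T-norm to the Z-normalized scores.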
The code is as follows:

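import torch
import torch.nn.functional as F


def AS_norm(score, enroll_embedding, test_embedding, cohort_embeddings, topk):
    # score: raw cosine score between the enrollment and test utterances
    # enroll_embedding / test_embedding: shape (1, D); cohort_embeddings: shape (N, D), the impostor (cohort) set
    # topk must satisfy 2 <= topk <= N so that a standard deviation can be computed
    # L2-normalize so that the dot products below are cosine similarities
    enroll_embedding = F.normalize(enroll_embedding, dim=-1)
    test_embedding = F.normalize(test_embedding, dim=-1)
    cohort_embeddings = F.normalize(cohort_embeddings, dim=-1)
    # scores of the enrollment utterance against the cohort; keep the top-k
    enroll_scores = torch.matmul(cohort_embeddings, enroll_embedding.T)[:, 0]
    enroll_scores = torch.topk(enroll_scores, topk, dim=0)[0]
    enroll_mean = torch.mean(enroll_scores, dim=0)
    enroll_std = torch.std(enroll_scores, dim=0)
    # scores of the test utterance against the cohort; keep the top-k
    test_scores = torch.matmul(cohort_embeddings, test_embedding.T)[:, 0]
    test_scores = torch.topk(test_scores, topk, dim=0)[0]
    test_mean = torch.mean(test_scores, dim=0)
    test_std = torch.std(test_scores, dim=0)
    # score norm: average of the enrollment-side and test-side normalized scores
    score = 0.5 * (score - enroll_mean) / enroll_std + 0.5 * (score - test_mean) / test_std
    return score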
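The title also names Z-norm, T-norm, S-norm, and ZT-norm, which differ from AS-norm only in which cohort statistics they use. Below is a minimal sketch of all five in plain Python (standard-library `statistics`, so it runs without PyTorch). It assumes the enrollment-vs-cohort and test-vs-cohort scores have already been computed as in the second step above; all function names, and the `cohort_z_stats` argument of ZT-norm, are hypothetical and not from the original post. For ZT-norm it follows the common definition in which the test-vs-cohort scores are themselves Z-normalized with each cohort model's own statistics before T-norm is applied.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def cohort_stats(scores: Sequence[float], topk: Optional[int] = None) -> Tuple[float, float]:
    """Mean and sample std of cohort scores; with topk, only the topk highest are used."""
    if topk is not None:
        scores = sorted(scores, reverse=True)[:topk]
    return statistics.mean(scores), statistics.stdev(scores)


def z_norm(score: float, enroll_cohort: Sequence[float]) -> float:
    # Z-norm: normalize with the enrollment side's cohort statistics
    mu, sd = cohort_stats(enroll_cohort)
    return (score - mu) / sd


def t_norm(score: float, test_cohort: Sequence[float]) -> float:
    # T-norm: normalize with the test side's cohort statistics
    mu, sd = cohort_stats(test_cohort)
    return (score - mu) / sd


def s_norm(score: float, enroll_cohort: Sequence[float], test_cohort: Sequence[float]) -> float:
    # S-norm: symmetric average of Z-norm and T-norm over the whole cohort
    return 0.5 * (z_norm(score, enroll_cohort) + t_norm(score, test_cohort))


def as_norm(score: float, enroll_cohort: Sequence[float], test_cohort: Sequence[float], topk: int) -> float:
    # AS-norm: S-norm where each side keeps only its topk highest cohort scores
    mu_e, sd_e = cohort_stats(enroll_cohort, topk)
    mu_t, sd_t = cohort_stats(test_cohort, topk)
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)


def zt_norm(score: float, enroll_cohort: Sequence[float], test_cohort: Sequence[float],
            cohort_z_stats: Sequence[Tuple[float, float]]) -> float:
    # ZT-norm: Z-norm the trial score, then T-norm it against test-vs-cohort scores
    # that were Z-normalized with each cohort model's own (mean, std) pair.
    z_test_cohort: List[float] = [(s - mu) / sd for s, (mu, sd) in zip(test_cohort, cohort_z_stats)]
    return t_norm(z_norm(score, enroll_cohort), z_test_cohort)


enroll_cohort = [0.1, 0.2, 0.3]  # toy enrollment-vs-cohort scores
test_cohort = [0.0, 0.2, 0.4]    # toy test-vs-cohort scores
print(s_norm(0.5, enroll_cohort, test_cohort))  # approximately 2.25
print(as_norm(0.5, enroll_cohort, test_cohort, topk=2))
```

Setting `topk` to the full cohort size makes `as_norm` identical to `s_norm`, which is a quick way to see that AS-norm is just S-norm restricted to the closest impostors.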

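Points to note:
1). The cohort set should have a distribution similar to that of the test (and enrollment) data, including scenario, language, channel, and gender;
2). Each speaker in the cohort set should, as far as possible, contribute only one speech segment;
3). A "safe" interval from -4 to +4 (or 5) standard deviations around the mean of the cohort scores can be set, and scores outside it removed/rejected as outliers;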
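Notes 2 and 3 can be applied while building the cohort. The sketch below is one possible reading, not the post's own code: `one_segment_per_speaker` keeps the first segment seen for each speaker, and `reject_outliers` treats note 3 as discarding cohort scores outside mean ± `n_std` standard deviations before the top-k statistics are computed (the referenced paper may apply the interval differently). Both function names are hypothetical.

```python
import statistics
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple, TypeVar

E = TypeVar("E")


def one_segment_per_speaker(segments: Iterable[Tuple[Hashable, E]]) -> List[E]:
    """Keep only the first segment seen for each speaker (note 2)."""
    first: Dict[Hashable, E] = {}
    for speaker_id, embedding in segments:
        first.setdefault(speaker_id, embedding)
    return list(first.values())  # dicts preserve insertion order


def reject_outliers(cohort_scores: Sequence[float], n_std: float = 4.0) -> List[float]:
    """Drop cohort scores outside mean +/- n_std * std (one reading of note 3)."""
    mu = statistics.mean(cohort_scores)
    sd = statistics.stdev(cohort_scores)
    low, high = mu - n_std * sd, mu + n_std * sd
    return [s for s in cohort_scores if low <= s <= high]


scores = [0.0] * 19 + [1.0]  # toy cohort scores with one extreme value
print(len(reject_outliers(scores, n_std=4.0)))  # 19: the 1.0 lies about 4.25 std above the mean
print(len(reject_outliers(scores, n_std=5.0)))  # 20: nothing is rejected
```

Note that the outlier inflates the standard deviation it is measured against, so with a small cohort a single extreme score can never exceed 4 standard deviations; the interval only becomes effective with reasonably large cohorts.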
Reference: Analysis of Score Normalization in Multilingual Speaker Recognition (https://www.isca-speech.org/archive_v0/Interspeech_2017/pdfs/0803.PDF)