---- Updated 2016.12.18 ----
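sidekit is quite good and simple to use, and its documentation essentially just hands you the source code. If you can get the environment set up and you already have some background, you should be able to get it running end to end within a day.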
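Below I analyze in detail the speaker adaptation and score computation in GMM-UBM. Some of the code has been rewritten along the way, because sidekit produces so many h5 files that following them gets tiresome. Before reading on, make sure you are already familiar with sidekit and understand the formats of its h5 files; otherwise there is no point reading further.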
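For reference, here is the standard relevance-MAP mean update used in GMM-UBM systems, written in my own notation. For a UBM with weights $w_c$, means $\mu_c$ and covariances $\Sigma_c$, and enrollment frames $x_1,\dots,x_T$, the posteriors give the zeroth-order statistics $N_c$ and the first-order statistics $F_c$. Only the means are then adapted, controlled by the relevance factor $r$, which plays the role of `regulation_factor` in the script. The last line is a common GMM-UBM trial score: the average frame log-likelihood ratio. It may not be exactly what a particular scoring routine computes.

$$
\gamma_t(c) = \frac{w_c\,\mathcal{N}(x_t;\mu_c,\Sigma_c)}{\sum_{k=1}^{C} w_k\,\mathcal{N}(x_t;\mu_k,\Sigma_k)},\qquad
N_c = \sum_{t=1}^{T}\gamma_t(c),\qquad
F_c = \sum_{t=1}^{T}\gamma_t(c)\,x_t
$$

$$
\alpha_c = \frac{N_c}{N_c + r},\qquad
\hat{\mu}_c = \alpha_c\,\frac{F_c}{N_c} + (1-\alpha_c)\,\mu_c
$$

$$
\mathrm{score}(X) = \frac{1}{T}\sum_{t=1}^{T}\Big[\log p(x_t\mid\lambda_{\text{spk}}) - \log p(x_t\mid\lambda_{\text{UBM}})\Big]
$$

Components that see little enrollment data ($N_c \ll r$) stay close to the UBM mean. Well-observed components move toward the data mean $F_c/N_c$.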
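Below is the source code for the adaptation part. utils is my own module; ignore the gmm_score and EER parts for now (they are covered later) and focus on the MAP part: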
import sidekit
import numpy as np
from utils import EER, gmm_score
import h5py
'''
this stand version can run the predicted result
'''
enroll_idmap = sidekit.IdMap('task/enroll_spks2utt.h5')
ubm = sidekit.Mixture()
ubm.read("task/ubm.h5")
nj = 10
server_eval = sidekit.FeaturesServer(feature_filename_structure="./mfcc_eval/{}.h5",
                                     dataset_list=["energy", "cep", "vad"],
                                     mask=None,
                                     feat_norm="cmvn",
                                     keep_all_features=False,
                                     delta=True,
                                     double_delta=True,
                                     rasta=True,
                                     context=None)
print('Compute the sufficient statistics')
enroll_stat = sidekit.StatServer(enroll_idmap, ubm)
enroll_stat.accumulate_stat(ubm=ubm, feature_server=server_eval,
                            seg_indices=range(enroll_stat.segset.shape[0]),
                            num_thread=nj)
enroll_stat.write('task/stat_enroll_stand.h5')
print('MAP adaptation of the speaker models')
regulation_factor = 3 # MAP regulation factor
enroll_sv = enroll_stat.adapt_mean_map_multisession(ubm, regulation_factor)
enroll_sv.write('task/map_enroll_stand.h5')
print('Compute trial scores')
enroll = sidekit.StatServer('task/map_enroll_stand.h5')
s = np.zeros((59, 1024))
gscore = gmm_score(ubm, enroll, server_eval, s)
scores = gscore.compute_scores()
eer = EER(scores)
eer.compute_eer()
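To make the two steps concrete, here is a minimal NumPy sketch of what computing sufficient statistics and MAP-adapting the means amount to. It runs on toy data with a diagonal-covariance UBM. This is not sidekit's implementation. The function names (`sufficient_stats`, `map_adapt_means`, `llr_score`, etc.) are hypothetical. The score is the usual average frame-level log-likelihood ratio, which may differ from what gmm_score in utils does. As I understand the multisession variant, the statistics of all sessions belonging to the same speaker are summed before the update, and the usage part below mimics that.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """Per-frame, per-component log N(x; mu_c, diag(var_c)); returns shape (T, C)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (T, C, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, C)
    log_det = np.sum(np.log(variances), axis=1)                # (C,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + maha)


def frame_loglik_and_posteriors(X, weights, means, variances):
    """Per-frame log p(x_t) and posteriors gamma_t(c), using log-sum-exp for stability."""
    log_wn = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    m = log_wn.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_wn - m).sum(axis=1))
    gamma = np.exp(log_wn - log_px[:, None])                   # rows sum to 1
    return log_px, gamma


def sufficient_stats(X, weights, means, variances):
    """Zeroth-order N_c and first-order F_c statistics of X against the UBM."""
    _, gamma = frame_loglik_and_posteriors(X, weights, means, variances)
    N = gamma.sum(axis=0)   # (C,)
    F = gamma.T @ X         # (C, D)
    return N, F


def map_adapt_means(N, F, ubm_means, r):
    """Relevance-factor MAP mean update: alpha*F/N + (1 - alpha)*mu (assumes r > 0)."""
    alpha = N / (N + r)
    # Components that saw no frames get alpha = 0; avoid dividing by zero for them
    safe_N = np.where(N > 0, N, 1.0)
    data_mean = F / safe_N[:, None]
    return alpha[:, None] * data_mean + (1 - alpha[:, None]) * ubm_means


def llr_score(X, weights, spk_means, ubm_means, variances):
    """Average per-frame log-likelihood ratio of the adapted model versus the UBM."""
    spk_ll, _ = frame_loglik_and_posteriors(X, weights, spk_means, variances)
    ubm_ll, _ = frame_loglik_and_posteriors(X, weights, ubm_means, variances)
    return float(np.mean(spk_ll - ubm_ll))


# Toy usage: a 4-component UBM in 3 dimensions, speaker with two enrollment sessions
rng = np.random.default_rng(0)
C, D = 4, 3
weights = np.full(C, 1.0 / C)
ubm_means = rng.normal(size=(C, D))
variances = np.ones((C, D))

sessions = [rng.normal(loc=0.5, size=(200, D)) for _ in range(2)]
stats = [sufficient_stats(S, weights, ubm_means, variances) for S in sessions]
N_total = sum(s[0] for s in stats)   # sum statistics over the speaker's sessions
F_total = sum(s[1] for s in stats)

spk_means = map_adapt_means(N_total, F_total, ubm_means, r=3)
test_frames = rng.normal(loc=0.5, size=(100, D))
print(llr_score(test_frames, weights, spk_means, ubm_means, variances))
```

Only the means are adapted here, and the weights and variances are shared with the UBM. Because of that sharing, the adapted model can be stored as a single mean supervector per speaker.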
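The adaptation script boils down to two methods: accumulate_stat(), which computes the sufficient statistics, and adapt_mean_map_multisession(), which updates them via MAP adaptation. Let's look at each in turn. Some of the parameter passing differs slightly from the original sidekit source; I had intended to rewrite these methods, but my version was not as good as theirs: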