0. Preface
kaldi-gop provides two scoring approaches: GOP and SVR.
The SVR approach generally gives better results than plain GOP scoring.
1. GOP
1.1 GOP-GMM
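In conventional GMM-HMM based systems, GOP was first proposed in (Witt et al., 2000).
It is defined as the duration-normalised log of the posterior:
$$GOP(p) = \frac{1}{t_e - t_s + 1} \log p(p \mid \mathbf{o})$$
where $\mathbf{o}$ is the input observations, $p$ is the canonical phone, and $t_s$, $t_e$ are the start and end frame indexes. Assuming equal priors for all phones, the posterior becomes
$$\log p(p \mid \mathbf{o}) = \log \frac{p(\mathbf{o} \mid p)}{\sum_{q \in Q} p(\mathbf{o} \mid q)}$$
where $Q$ is the whole phone set.
The numerator of the equation is computed from the forced-alignment result, while the denominator is computed from Viterbi decoding with an unconstrained phone loop.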
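As a small worked example of the definition above, the sketch below computes a GOP-GMM score from two log-likelihoods. The function name and the numbers are hypothetical, and it assumes the phone-loop Viterbi score stands in for the sum in the denominator.

```python
def gop_gmm(forced_loglik, loop_loglik, t_s, t_e):
    """Duration-normalised log posterior approximation.

    forced_loglik: log p(o | p) over frames t_s..t_e from forced alignment.
    loop_loglik:   log-likelihood of the same frames from unconstrained
                   phone-loop Viterbi decoding (stands in for the denominator).
    """
    num_frames = t_e - t_s + 1
    return (forced_loglik - loop_loglik) / num_frames


# Hypothetical numbers: a 10-frame segment.
score = gop_gmm(forced_loglik=-52.0, loop_loglik=-50.0, t_s=20, t_e=29)
print(score)  # -0.2
```

A score of 0 means the canonical phone was also the best path in the phone loop; more negative scores suggest a worse pronunciation.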
1.2 GOP-NN
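The definition of GOP-NN is slightly different from that of GOP-GMM. GOP-NN is defined as the log phone posterior ratio between the canonical phone and the phone with the highest score (Hu et al., 2015).
First, we define the Log Phone Posterior (LPP):
$$LPP(p) = \log p(p \mid \mathbf{o}; t_s, t_e)$$
Then we define GOP-NN using LPP:
$$GOP(p) = \log \frac{p(p \mid \mathbf{o}; t_s, t_e)}{\max_{q \in Q} p(q \mid \mathbf{o}; t_s, t_e)} = LPP(p) - \max_{q \in Q} LPP(q)$$
LPP can be computed as
$$LPP(p) \approx \frac{1}{t_e - t_s + 1} \sum_{t = t_s}^{t_e} \log p(p \mid o_t)$$
$$p(p \mid o_t) = \sum_{s \in p} p(s \mid o_t)$$
where $s$ is the senone label and $\{s \mid s \in p\}$ is the set of states belonging to those triphones whose current phone is $p$.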
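The LPP and GOP-NN definitions above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the `senone_to_phone` mapping, and the toy posteriors are hypothetical, and every phone is assumed to have a non-zero posterior in every frame (otherwise `math.log` fails).

```python
import math


def phone_posteriors(senone_post, senone_to_phone, num_phones):
    """Sum senone posteriors p(s|o_t) into phone posteriors p(p|o_t)."""
    post = [0.0] * num_phones
    for s, prob in enumerate(senone_post):
        post[senone_to_phone[s]] += prob
    return post


def lpp(frames, senone_to_phone, num_phones):
    """Average log phone posterior over the frames of one segment."""
    totals = [0.0] * num_phones
    for senone_post in frames:
        post = phone_posteriors(senone_post, senone_to_phone, num_phones)
        for p in range(num_phones):
            totals[p] += math.log(post[p])
    return [t / len(frames) for t in totals]


def gop_nn(lpp_values, canonical_phone):
    """LPP of the canonical phone minus the best LPP over all phones."""
    return lpp_values[canonical_phone] - max(lpp_values)


# Hypothetical toy setup: 4 senones, 2 phones, a 2-frame segment.
senone_to_phone = [0, 0, 1, 1]
frames = [[0.4, 0.3, 0.2, 0.1],
          [0.5, 0.2, 0.2, 0.1]]
values = lpp(frames, senone_to_phone, num_phones=2)
print(gop_nn(values, canonical_phone=0))      # 0.0: phone 0 scores best
print(gop_nn(values, canonical_phone=1) < 0)  # True
```

GOP-NN is never positive: it is 0 when the canonical phone is the top-scoring phone and negative otherwise.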
1.3 Phone-level Feature
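Classifier-based approaches usually perform better than GOP-based approaches.
Unlike GOP-based methods, they need an extra supervised training step. The input features for supervised training are phone-level, segmental features.
The phone-level feature is defined as:
$$[LPP(p_1), \cdots, LPP(p_M), LPR(p_1 \mid p_i), \cdots, LPR(p_j \mid p_i), \cdots]^T$$
where $M$ is the number of phones in the whole phone set,
and the Log Posterior Ratio (LPR) between phone $p_j$ and phone $p_i$ is defined as:
$$LPR(p_j \mid p_i) = \log p(p_j \mid \mathbf{o}; t_s, t_e) - \log p(p_i \mid \mathbf{o}; t_s, t_e)$$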
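To make the layout of this feature vector concrete, here is a minimal sketch that builds it from a list of LPP values. The function name and the numbers are hypothetical, and keeping the LPR entry for $p_j = p_i$ (always 0) is an assumption made for simplicity.

```python
def phone_level_feature(lpp_values, canonical_phone):
    """Concatenate LPPs of all phones with LPRs against the canonical phone."""
    lpp_i = lpp_values[canonical_phone]
    # LPR(p_j | p_i) = LPP(p_j) - LPP(p_i), for every phone p_j.
    lpr = [lpp_j - lpp_i for lpp_j in lpp_values]
    return list(lpp_values) + lpr


# Hypothetical LPP values for a 3-phone set; the canonical phone is index 1.
feature = phone_level_feature([-1.5, -0.5, -3.0], canonical_phone=1)
print(feature)  # [-1.5, -0.5, -3.0, -1.0, 0.0, -2.5]
```

With $M$ phones the vector has $2M$ entries under this assumption, so each per-phone regressor sees how the canonical phone compares with every competitor.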
2. SVR
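SVR stands for support vector regression. The SVR in kaldi-gop comes from sklearn.svm; for the underlying theory, see: link
3. C++ implementation
GOP itself has already been implemented by Junbo Zhang; link: git link
Here we focus on the C++ implementation of SVR. For C++ there is an open-source support vector machine library, libsvm,
which can perform SVR through its parameter settings. However, the Python SVR version saved each phone's model with pickle.dump, and I could not find a matching way to read pickle files from C++, so I decided to reimplement SVR on the Python side with libsvm.
Implementation code: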
# Copyright 2021 Xiaomi Corporation (Author: Junbo Zhang)
# Apache 2.0

# This script trains models to convert GOP-based feature into human
# expert scores.

# 1c is as 1b, but use SVR instead of random forest regression.
# Comparing with 1b, the f1-score of the class 0 is much improved.

# MSE: 0.16
# Corr: 0.45
#
#                precision    recall  f1-score   support
#
#            0       0.42      0.30      0.35      1339
#            1       0.16      0.36      0.22      1828
#            2       0.97      0.92      0.94     44079
#
#     accuracy                           0.88     47246
#    macro avg       0.52      0.53      0.50     47246
# weighted avg       0.92      0.88      0.90     47246

import os
import sys
import argparse

import kaldi_io
import numpy as np
from libsvm.svmutil import (svm_train, svm_predict,
                            svm_save_model, svm_load_model)
from utils import (load_phone_symbol_table,
                   load_human_scores,
                   add_more_negative_data)


def get_args():
    parser = argparse.ArgumentParser(
        description='Train per-phone SVR models (libsvm) to convert '
                    'GOP-based features into human expert scores',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--phone-symbol-table', type=str, default='',
                        help='Phone symbol table, used to detect mismatched '
                             'features and labels')
    parser.add_argument('--nj', type=int, default=1, help='Job number')
    parser.add_argument('feature_scp',
                        help='Input gop-based feature file, in Kaldi scp')
    parser.add_argument('human_scoring_json',
                        help='Input human scores file, in JSON format')
    parser.add_argument('model',
                        help='Output directory for the per-phone model files')
    sys.stderr.write(' '.join(sys.argv) + "\n")
    args = parser.parse_args()
    return args


def train_model_for_phone(ph, label_feat_pairs, model_save_path):
    labels, feats = list(zip(*label_feat_pairs))
    labels_list = list(labels)

    # gamma as sklearn's gamma='scale' computes it: 1 / (n_features * X.var()).
    # Only used by the commented-out svm_train call below.
    X = np.array(feats).reshape(-1, len(feats[0]))
    X_var = X.var()
    _gamma = 1.0 / (X.shape[1] * X_var) if X_var != 0 else 1.0

    # libsvm accepts each sample as an {index: value} dict.
    feats_dict_list = [dict(enumerate(f.tolist())) for f in feats]

    # -s 3: epsilon-SVR, -t 2: RBF kernel, -c: cost, -m: cache size in MB,
    # -g: kernel gamma.
    # model = svm_train(labels_list, feats_dict_list,
    #                   '-s 3 -t 2 -c 1 -m 200 -n 0 -g ' + str(_gamma))
    model = svm_train(labels_list, feats_dict_list,
                      '-s 3 -t 2 -c 2 -m 200 -n 0 -g 0.125')

    # One model file per phone, named after the phone id.
    os.makedirs(model_save_path, exist_ok=True)
    svm_save_model(os.path.join(model_save_path, str(ph)), model)
    return model


def predict_score_for_phone(ph, label_feat_pairs, model_save_path):
    labels, feats = list(zip(*label_feat_pairs))
    # Same {index: value} layout as used for training.
    feats_dict_list = [dict(enumerate(f.tolist())) for f in feats]
    model = svm_load_model(os.path.join(model_save_path, str(ph)))
    p_labs, p_acc, p_vals = svm_predict(list(labels), feats_dict_list, model)
    return p_labs


def predict_score_for_text(ph, feats, model_save_path):
    # feats is a list of {index: value} dicts; no reference labels are given.
    model = svm_load_model(os.path.join(model_save_path, str(ph)))
    p_labs, p_acc, p_vals = svm_predict([], feats, model)
    return p_labs


def main():
    args = get_args()

    # Phone symbol table
    _, phone_int2sym = load_phone_symbol_table(args.phone_symbol_table)

    # Human expert scores
    score_of, phone_of = load_human_scores(args.human_scoring_json, floor=1)

    # Prepare training data
    train_data_of = {}
    for ph_key, feat in kaldi_io.read_vec_flt_scp(args.feature_scp):
        if ph_key not in score_of:
            print(f'Warning: no human score for {ph_key}')
            continue
        ph = int(feat[0])
        if phone_int2sym is not None:
            if phone_int2sym[ph] != phone_of[ph_key]:
                print(f'Unmatch: {phone_int2sym[ph]} <--> {phone_of[ph_key]} ')
                continue
        score = score_of[ph_key]
        train_data_of.setdefault(ph, []).append((score, feat[1:]))

    # Make the dataset more balanced
    train_data_of = add_more_negative_data(train_data_of)

    # Train one model per phone
    for ph, pairs in train_data_of.items():
        train_model_for_phone(ph, pairs, args.model)
        # predict_score_for_phone(ph, pairs, args.model)
        # write_data_to_file(pairs, '/home/wangyanwei/work/kaldi-master/egs/gop_stagekids/data.txt')


if __name__ == "__main__":
    # Switch to the working directory
    os.chdir('/home/wangyanwei/work/kaldi-master/egs/gop_stagekids/s5')
    main()

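The parameters in '-s 3 -t 2 -c 2 -m 200 -n 0 -g 0.125' were obtained by running SVR from sklearn.svm; validation shows that accuracy is essentially the same before and after the change.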
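To show how such an option string maps onto SVR hyperparameters, here is a minimal sketch. The helper names `svr_options` and `scale_gamma` are hypothetical. The flags themselves are standard libsvm options; `-p 0.1` is libsvm's default epsilon and is written out only for clarity, and `-n` is left out because nu is not used by epsilon-SVR. `scale_gamma` mirrors the `1 / (n_features * X.var())` formula from the script above in plain Python.

```python
def svr_options(c, gamma, epsilon=0.1, cache_mb=200):
    """Build a libsvm option string for epsilon-SVR with an RBF kernel.

    -s 3: epsilon-SVR, -t 2: RBF kernel, -c: cost, -g: kernel gamma,
    -p: epsilon of the loss function, -m: kernel cache size in MB.
    """
    return f'-s 3 -t 2 -c {c} -g {gamma} -p {epsilon} -m {cache_mb}'


def scale_gamma(feats):
    """1 / (n_features * variance over all entries), or 1.0 if variance is 0."""
    flat = [v for row in feats for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return 1.0 / (len(feats[0]) * var) if var != 0 else 1.0


print(svr_options(c=2, gamma=0.125))
# -s 3 -t 2 -c 2 -g 0.125 -p 0.1 -m 200
print(scale_gamma([[0.0, 2.0], [2.0, 4.0]]))  # 0.25
```

Using a fixed gamma keeps all per-phone models on the same kernel width, whereas the data-dependent value changes from phone to phone.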
On the C++ side, simply implement prediction following the corresponding libsvm predict call used on the Python side.
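For reference when porting, the value libsvm returns for an epsilon-SVR model with an RBF kernel is the kernel-weighted sum over support vectors minus rho. The sketch below reproduces that formula in plain Python; the function names and the toy model are hypothetical. In a libsvm model file, the inputs correspond to the `gamma` and `rho` header lines and to the `SV` section, where each line starts with the coefficient followed by `index:value` pairs. On the C++ side, libsvm's own `svm_load_model` and `svm_predict` from `svm.h` can be called directly instead of reimplementing this.

```python
import math


def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2) for two dense vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, sv_coef, rho, gamma):
    """Decision value: sum_i sv_coef[i] * K(sv_i, x) - rho."""
    total = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, sv_coef))
    return total - rho


# Hypothetical model with two support vectors (values are made up).
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [0.5, -0.25]
value = svr_predict([0.0, 0.0], svs, coefs, rho=-1.0, gamma=0.125)
print(value)
```

Comparing this value against the Python-side `svm_predict` output for the same feature vector is a quick way to check that the C++ port reads the model file correctly.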
Reference:
https://blog.csdn.net/junxing2018_wu/article/details/126334989