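Speech recognition systems commonly use an HMM framework based on Gaussian mixture models (GMMs). Training such models (for example in the HTK code) involves computing a large number of probabilities, which are often very small floating-point values. Multiplying them together makes the result smaller still, so precision is lost and the product can eventually underflow. To preserve accuracy, the probabilities are converted to logarithms before they take part in any arithmetic. In other words, the code works with log-domain values (log denotes the natural logarithm throughout), i.e.: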
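p1' = log p1
p2' = log p2
In the log domain, multiplication and division become addition and subtraction: log(a*b) = log a + log b; log(a/b) = log a - log b.
So the product p = p1*p2 becomes, in the log domain: p' = log p = log(p1*p2) = log p1 + log p2 = p1' + p2'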
Addition, on the other hand, uses the log-add algorithm:
log(A + B) = log(A(1 + B/A)) = log A + log(1 + B/A), where A >= B; if A < B, swap A and B in the formula. The ratio B/A in the log(1 + B/A) term is obtained from the log-domain values:
log(B/A) = log B - log A
Therefore, the sum p = p1 + p2 becomes, in the log domain:
p' = log p = log(p1 + p2) = log(exp(p1') + exp(p2')) = log(exp(p1') * (1 + exp(p2' - p1'))) = p1' + log(1 + exp(p2' - p1'))
The corresponding HTK source code is shown below.
In the code, the parameter x corresponds to p1' and y corresponds to p2'.
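#include <math.h>   /* exp, log */

typedef double LogDouble;   /* log-domain value */

/* minLogExp, LSMALL and LZERO are defined elsewhere in HTK's HMath module */

/* EXPORT->LAdd: Return sum x + y on log scale,
   sum < LSMALL is floored to LZERO */
LogDouble LAdd(LogDouble x, LogDouble y)
{
   LogDouble temp, diff, z;

   /* swap so that x holds the larger value (log A) and y the smaller (log B) */
   if (x < y)
   {
      temp = x; x = y; y = temp;
   }
   diff = y - x;   /* log(B/A) = log B - log A, always <= 0 */
   /* B/A is negligible, so log(1 + B/A) is effectively 0 and the result is just log A */
   if (diff < minLogExp)
      return (x < LSMALL) ? LZERO : x;   /* if log A itself is below LSMALL, return LZERO, a predefined very negative value that stands for log(0) */
   else
   {
      z = exp(diff);
      return x + log(1.0 + z);
   }
}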
References:
http://www.ck365.cn/anli/201104/27/22441.html
Phaklen EhKan, Timothy Allen, and Steven F. Quigley, "FPGA Implementation for GMM-Based Speaker Identification," International Journal of Reconfigurable Computing, doi:10.1155/2011/420369