A Detailed Look at Mel Frequency
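The mel spectrum models the auditory response of the human ear and is widely used in speech processing. While recently using fbank features for pattern recognition with Python's librosa library, I found that there are two ways to convert Hz to mel, and that the default one does not follow the formula most of us know. So I took a close look at the mel-related source code in librosa. The focus here is on the math; once the math is clear, writing the code is not hard.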
Usually, a frequency $f$ in Hz and a mel value $m$ are converted with the following formulas, where $\lg$ is the base-10 logarithm:
$$m = 2595\lg\left(1+\frac{f}{700}\right), \qquad f = 700\left(10^{\frac{m}{2595}}-1\right).$$
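The pair of formulas above can be sketched in a few lines of plain Python. The function names `hz_to_mel_htk` and `mel_to_hz_htk` are made up for this illustration and are not part of any library; the loop simply checks that the second formula undoes the first.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """m = 2595 * lg(1 + f / 700), with lg the base-10 logarithm."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(m: float) -> float:
    """f = 700 * (10 ** (m / 2595) - 1), the inverse of the formula above."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Round trip: Hz -> mel -> Hz should give back the starting frequency.
for f in (0.0, 440.0, 1000.0, 8000.0):
    m = hz_to_mel_htk(f)
    print(f"{f:7.1f} Hz -> {m:8.2f} mel -> {mel_to_hz_htk(m):7.1f} Hz")
```

With these constants, 1000 Hz lands at almost exactly 1000 mel, which is a handy reference point for this formula.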
1. The Hz-to-mel code in Python's librosa library
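The function is shown below. Its inputs are (frequencies, htk): frequencies is the frequency to convert, and htk selects whether the HTK formula is used for the conversion; the default is htk=False. The official documentation describes it as: htk, bool, "If True, use HTK formula to convert Hz to mel. Otherwise (False), use Slaney's Auditory Toolbox." So how does Slaney's Auditory Toolbox compute mel frequency? Judging from the code, it works like this: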
Frequencies below 1000 Hz are converted linearly:
$$m = \frac{3f}{200}$$
Frequencies at or above 1000 Hz are converted logarithmically:
$$m = 15 + \frac{\ln\frac{f}{1000}}{\frac{\ln 6.4}{27}}$$
The 15 is there because $15 = \frac{3 \times 1000}{200}$, so the two branches meet at 1000 Hz. As for where 6.4 and 27 come from, I am not sure...
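As a quick sanity check of the two branches, it helps to plug in two frequencies by hand. The choice of 1000 Hz and 6400 Hz is mine; they are just the values at which the constants simplify.

```latex
% At f = 1000 Hz both branches give the same value, so the scale is continuous:
\frac{3 \cdot 1000}{200} = 15,
\qquad
15 + \frac{\ln\frac{1000}{1000}}{\frac{\ln 6.4}{27}} = 15 + 0 = 15.

% At f = 6400 Hz the logarithms cancel and the log branch advances by exactly 27:
m = 15 + \frac{\ln\frac{6400}{1000}}{\frac{\ln 6.4}{27}}
  = 15 + 27\,\frac{\ln 6.4}{\ln 6.4}
  = 42.
```

In other words, in the log region every factor of 6.4 in frequency corresponds to 27 mel steps.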
```python
import numpy as np


def hz_to_mel(frequencies, htk=False):
    """Convert Hz to Mels

    Examples
    --------
    >>> librosa.hz_to_mel(60)
    0.9
    >>> librosa.hz_to_mel([110, 220, 440])
    array([ 1.65, 3.3 , 6.6 ])

    Parameters
    ----------
    frequencies : number or np.ndarray [shape=(n,)] , float
        scalar or array of frequencies
    htk : bool
        use HTK formula instead of Slaney

    Returns
    -------
    mels : number or np.ndarray [shape=(n,)]
        input frequencies in Mels

    See Also
    --------
    mel_to_hz
    """
    frequencies = np.asanyarray(frequencies)

    if htk:
        return 2595.0 * np.log10(1.0 + frequencies / 700.0)

    # Fill in the linear part
    f_min = 0.0
    f_sp = 200.0 / 3
    mels = (frequencies - f_min) / f_sp

    # Fill in the log-scale part
    min_log_hz = 1000.0  # beginning of log region (Hz)
    min_log_mel = (min_log_hz - f_min) / f_sp  # same (Mels)
    logstep = np.log(6.4) / 27.0  # step size for log region

    if frequencies.ndim:
        # If we have array data, vectorize
        log_t = frequencies >= min_log_hz
        mels[log_t] = min_log_mel + np.log(frequencies[log_t] / min_log_hz) / logstep
    elif frequencies >= min_log_hz:
        # If we have scalar data, check directly
        mels = min_log_mel + np.log(frequencies / min_log_hz) / logstep

    return mels
```
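For the reverse direction, the two Slaney branches can be inverted by hand. The sketch below is my own scalar-only derivation from the formulas in this section, not a copy of librosa's `mel_to_hz`; the name `mel_to_hz_slaney` and the module-level constants are hypothetical.

```python
import math

F_SP = 200.0 / 3.0               # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0              # start of the log region (Hz)
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # the same point in mels (about 15)
LOGSTEP = math.log(6.4) / 27.0   # step size in the log region


def mel_to_hz_slaney(m: float) -> float:
    """Invert the piecewise Slaney-style mapping for a single mel value."""
    if m < MIN_LOG_MEL:
        # Linear branch: m = 3f / 200  =>  f = 200m / 3
        return F_SP * m
    # Log branch: m = 15 + ln(f / 1000) / LOGSTEP  =>  f = 1000 * exp(LOGSTEP * (m - 15))
    return MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


for m in (0.9, 6.6, 15.0, 42.0):
    print(f"{m:5.1f} mel -> {mel_to_hz_slaney(m):8.2f} Hz")
```

The values 0.9 and 6.6 are taken from the docstring examples above, so they should map back to roughly 60 Hz and 440 Hz.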
2. The Hz-to-mel code in MATLAB
hz2mel(frequency) uses the HTK formula by default.
Summary: two ways to compute mel frequency
- HTK formula:
$$m = 2595\lg\left(1+\frac{f}{700}\right).$$
- Slaney's Auditory Toolbox (the librosa default):
$$m = \frac{3f}{200}\ (f < 1000), \qquad m = 15+\frac{\ln\frac{f}{1000}}{\frac{\ln 6.4}{27}}\ (f \ge 1000).$$
To convert between mel and Hz, Python has the hz_to_mel/mel_to_hz functions in librosa, and MATLAB has the hz2mel/mel2hz functions. However, the two use different conversion rules by default.
MATLAB hz2mel(f) == Python librosa.hz_to_mel(f, htk=True)
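To see how far apart the two rules are in practice, here is a small side-by-side comparison. It reimplements both formulas from this article in plain Python rather than calling librosa or MATLAB, and the function names are made up for the illustration.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """HTK formula: m = 2595 * lg(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_mel_slaney(f_hz: float) -> float:
    """Slaney-style formula: linear below 1000 Hz, logarithmic from 1000 Hz up."""
    if f_hz < 1000.0:
        return 3.0 * f_hz / 200.0
    return 15.0 + math.log(f_hz / 1000.0) / (math.log(6.4) / 27.0)


# The two scales use very different units, so the raw numbers are not comparable.
for f in (60.0, 440.0, 1000.0, 4000.0):
    print(f"{f:6.0f} Hz  HTK: {hz_to_mel_htk(f):8.2f}  Slaney: {hz_to_mel_slaney(f):6.2f}")
```

At 1000 Hz, for example, the HTK formula gives about 1000 while the Slaney-style formula gives 15, so features computed with one rule should not be mixed with features computed with the other.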