# Audio Feature Extraction and Implementation Differences

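MFCC feature extraction steps:

Pre-emphasis: a first-order FIR high-pass filter that boosts the high-frequency components. Its transfer function and the corresponding difference equation are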

$H(z) = 1 - a z^{-1}$

$y(n) = x(n) - a\,x(n-1)$

With $a = 0.97$, the filter's frequency response can be plotted in MATLAB as follows:

    freqz([1,-0.97],1)

![Frequency response of the pre-emphasis filter](https://img-blog.csdn.net/20171228101410929?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMDU5Mjk5NQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
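
As a rough NumPy counterpart to the MATLAB `freqz` call, the sketch below applies the difference equation directly and evaluates $|H(e^{j\omega})|$ on the unit circle. The function names `pre_emphasis` and `magnitude_response` are illustrative only, and the first output sample assumes $x(-1) = 0$.

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """Apply y(n) = x(n) - a*x(n-1), assuming x(-1) = 0."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]
    return y

def magnitude_response(a=0.97, n_points=512):
    """Evaluate |H(e^{jw})| = |1 - a*e^{-jw}| for w in [0, pi]."""
    w = np.linspace(0.0, np.pi, n_points)
    h = 1.0 - a * np.exp(-1j * w)
    return w, np.abs(h)

w, mag = magnitude_response()
print(mag[0], mag[-1])  # gain at DC and at the Nyquist frequency
```

The gain is $1 - a$ at DC and $1 + a$ at the Nyquist frequency, which is why the filter attenuates low frequencies relative to high ones.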

STFT:

Mel filtering:

The mel scale defines a mapping from linear frequency $f$ (in Hz) to mel frequency $m$:

$m = 2595\log_{10}\left(1+\frac{f}{700}\right) \approx 1127\ln\left(1+\frac{f}{700}\right)$

In NumPy, with `frequencies` given in Hz:

    2595.0 * np.log10(1.0 + frequencies / 700.0)
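
A minimal sketch of the mapping and its inverse, obtained by solving the formula above for $f$. The function names `hz_to_mel` and `mel_to_hz` are chosen here for illustration.

```python
import numpy as np

def hz_to_mel(f):
    """Hz -> mel, using m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """mel -> Hz, the inverse mapping f = 700 * (10^(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

freqs = np.array([0.0, 300.0, 1000.0, 3700.0])
mels = hz_to_mel(freqs)
print(mels)
print(mel_to_hz(mels))  # recovers freqs up to rounding error
```

With these constants 1000 Hz maps to roughly 1000 mel. The natural-log form agrees only approximately, because $2595/\ln 10 \approx 1127.01$.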


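The mel filter bank is a set of triangular filters distributed along the mel scale. MATLAB's VOICEBOX toolbox provides a function, `melbankm`, that returns the filter bank directly:

    fs = 8000;                                     % sampling rate in Hz
    bank = melbankm(20,512,fs,300/fs,3700/fs,'w'); % 20 filters, 512-point FFT, band edges as fractions of fs
    bank = full(bank);                             % convert the sparse matrix to a dense one
    bank = bank/max(bank(:));                      % scale so that the largest weight is 1
    figure, plot(bank(10,:))                       % plot the 10th filter
    figure, plot(bank')                            % plot all filters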


![Plot of the mel filter bank](https://img-blog.csdn.net/20171228103924837?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMDU5Mjk5NQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)

![Plot of the mel filter bank](https://img-blog.csdn.net/20171228104039441?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMDU5Mjk5NQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
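
For readers without MATLAB, a triangular mel filter bank can be sketched in NumPy with the same parameters (20 filters, 512-point FFT, 8 kHz sampling rate, 300 to 3700 Hz). This is a textbook construction and is not guaranteed to reproduce `melbankm`'s weights exactly. The function name `mel_filter_bank` and its arguments are assumptions made for this example.

```python
import numpy as np

def mel_filter_bank(n_filters=20, n_fft=512, fs=8000, f_low=300.0, f_high=3700.0):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_filters + 2 edge frequencies: each filter spans three consecutive points.
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    hz_points = mel_to_hz(mel_points)

    # Centre frequencies of the non-negative FFT bins.
    fft_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft

    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # A triangle is the minimum of the two slopes, clipped at zero.
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

bank = mel_filter_bank()
print(bank.shape)  # (20, 257)
```

Each row holds one filter. Because the edges are equally spaced in mel, the triangles become wider in Hz as the centre frequency increases.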

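DCT:

$F(k) = C(k)\sum_{n=0}^{N-1} f(n)\cos\left(\frac{(2n+1)k\pi}{2N}\right)$

where $C(0) = \sqrt{1/N}$ and $C(k) = \sqrt{2/N}$ for $k > 0$. Collecting the basis vectors into a matrix $G$ with entries $G_{kn} = C(k)\cos\left(\frac{(2n+1)k\pi}{2N}\right)$ gives the matrix form

$F = G f$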

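Matrix dimensions in this example (rows × columns):

• $x$: $400 \times 268$ (framed signal)
• $xfft$: $512 \times 268$ (FFT of each frame)
• $melcoeff$: $20 \times 257$ (mel filter bank)
• $DCT$: $13 \times 20$ (DCT matrix)

A 512-point FFT of a real signal has 257 non-negative-frequency bins, which is why the mel filter bank has 257 columns. Multiplying the matrices in sequence yields a $13 \times 268$ MFCC matrix.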
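
The first two shapes can be reproduced with a small framing and FFT sketch. The frame length of 400 and the FFT size of 512 come from the dimensions listed here. The hop of 160 samples, the Hamming window and the random test signal are assumptions made only so that the example yields 268 frames.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice x into overlapping frames, one frame per column."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[:, None] + hop * np.arange(n_frames)[None, :]
    return x[idx]  # shape (frame_len, n_frames)

def power_spectrum(frames, n_fft=512):
    """Window each frame, zero-pad to n_fft and keep the non-negative bins."""
    window = np.hamming(frames.shape[0])[:, None]
    spectrum = np.fft.rfft(frames * window, n=n_fft, axis=0)
    return np.abs(spectrum) ** 2  # shape (n_fft // 2 + 1, n_frames)

rng = np.random.default_rng(0)
x = rng.standard_normal(400 + 160 * 267)  # length chosen to give 268 frames
frames = frame_signal(x)
power = power_spectrum(frames)
print(frames.shape, power.shape)  # (400, 268) (257, 268)
```

A $20 \times 257$ mel matrix can then be applied to `power`, and a $13 \times 20$ DCT matrix to the logarithm of the result.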

![Figure](https://img-blog.csdn.net/20171228111550002?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMDU5Mjk5NQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)

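The DCT matrix $G$ can be built with NumPy as follows (the arguments match the `filters.dct(n_filters, n_input)` call in the librosa code further down):

    import numpy as np

    def dct(n_filters, n_input):
        basis = np.empty((n_filters, n_input))
        # Row 0 is the constant basis vector, scaled by sqrt(1/N).
        basis[0, :] = 1.0 / np.sqrt(n_input)

        # Sample points (2n+1)*pi/(2N) for n = 0..N-1.
        samples = np.arange(1, 2*n_input, 2) * np.pi / (2.0 * n_input)

        for i in range(1, n_filters):
            basis[i, :] = np.cos(i*samples) * np.sqrt(2.0/n_input)

        return basis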
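
To check that the matrix form $F = G f$ agrees with the summation formula, the sketch below computes both on a small test vector. `dct_basis` is a vectorized rewrite written for this example, `dct_direct` evaluates the sum term by term, and the input vector `f` is arbitrary.

```python
import numpy as np

def dct_basis(n_filters, n_input):
    """First n_filters rows of the DCT matrix G (vectorized construction)."""
    n = np.arange(n_input)
    k = np.arange(n_filters)[:, None]
    G = np.cos((2 * n + 1) * k * np.pi / (2.0 * n_input))
    G[0, :] *= 1.0 / np.sqrt(n_input)   # C(0) = sqrt(1/N)
    G[1:, :] *= np.sqrt(2.0 / n_input)  # C(k) = sqrt(2/N) for k > 0
    return G

def dct_direct(f, k):
    """Evaluate F(k) straight from the summation formula."""
    N = len(f)
    c = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
    return c * sum(f[n] * np.cos((2 * n + 1) * k * np.pi / (2.0 * N)) for n in range(N))

f = np.log(np.arange(1.0, 21.0))  # arbitrary length-20 test vector
G = dct_basis(13, 20)
F = G @ f
print(np.allclose(F, [dct_direct(f, k) for k in range(13)]))  # True
```

With this scaling the rows of $G$ are orthonormal, so `G @ G.T` is the identity matrix.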


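Extracting MFCCs with librosa is simple: once the audio file has been loaded, a single line of code does the job. The internals of the `mfcc` function are shown below.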

    # -- Mel spectrogram and MFCCs -- #
    def mfcc(y=None, sr=22050, S=None, n_mfcc=20, **kwargs):
        """Mel-frequency cepstral coefficients

        Parameters
        ----------
        y     : np.ndarray [shape=(n,)] or None
            audio time series

        sr    : number > 0 [scalar]
            sampling rate of y

        S     : np.ndarray [shape=(d, t)] or None
            log-power Mel spectrogram

        n_mfcc: int > 0 [scalar]
            number of MFCCs to return

        kwargs : additional keyword arguments
            Arguments to melspectrogram, if operating
            on time series input

        Returns
        -------
        M     : np.ndarray [shape=(n_mfcc, t)]
            MFCC sequence

        See Also
        --------
        melspectrogram

        Examples
        --------
        Generate mfccs from a time series

        >>> librosa.feature.mfcc(y=y, sr=sr)
        array([[ -5.229e+02,  -4.944e+02, ...,  -5.229e+02,  -5.229e+02],
               [  7.105e-15,   3.787e+01, ...,  -7.105e-15,  -7.105e-15],
               ...,
               [  1.066e-14,  -7.500e+00, ...,   1.421e-14,   1.421e-14],
               [  3.109e-14,  -5.058e+00, ...,   2.931e-14,   2.931e-14]])

        Use a pre-computed log-power Mel spectrogram

        >>> S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128,
        ...                                    fmax=8000)
        >>> librosa.feature.mfcc(S=librosa.power_to_db(S))
        array([[ -5.207e+02,  -4.898e+02, ...,  -5.207e+02,  -5.207e+02],
               [ -2.576e-14,   4.054e+01, ...,  -3.997e-14,  -3.997e-14],
               ...,
               [  7.105e-15,  -3.534e+00, ...,   0.000e+00,   0.000e+00],
               [  3.020e-14,  -2.613e+00, ...,   3.553e-14,   3.553e-14]])

        Get more components

        >>> mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

        Visualize the MFCC series

        >>> import matplotlib.pyplot as plt
        >>> plt.figure(figsize=(10, 4))
        >>> librosa.display.specshow(mfccs, x_axis='time')
        >>> plt.colorbar()
        >>> plt.title('MFCC')
        >>> plt.tight_layout()

        """

        if S is None:
            S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))

        return np.dot(filters.dct(n_mfcc, S.shape[0]), S)
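
The last two statements of the function can be imitated without librosa. In the sketch below, `power_to_db_sketch` is a simplified stand-in for `power_to_db` (reference power 1.0, with a floor and an 80 dB dynamic-range clip that are believed to match the defaults but should be treated as assumptions). `mfcc_from_mel_power` and the random input are likewise made up for illustration.

```python
import numpy as np

def power_to_db_sketch(S, amin=1e-10, top_db=80.0):
    """Simplified dB conversion: 10*log10(S), floored at amin and clipped to top_db."""
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    return np.maximum(log_spec, log_spec.max() - top_db)

def mfcc_from_mel_power(mel_power, n_mfcc=20):
    """Log-compress a mel power spectrogram, then project onto the DCT basis."""
    S = power_to_db_sketch(mel_power)
    n_mels = S.shape[0]
    samples = np.arange(1, 2 * n_mels, 2) * np.pi / (2.0 * n_mels)
    basis = np.cos(np.outer(np.arange(n_mfcc), samples)) * np.sqrt(2.0 / n_mels)
    basis[0, :] = 1.0 / np.sqrt(n_mels)
    return np.dot(basis, S)

rng = np.random.default_rng(0)
mel_power = rng.random((128, 50)) + 1e-3  # stand-in for a 128-band mel spectrogram
M = mfcc_from_mel_power(mel_power)
print(M.shape)  # (20, 50)
```

The first row of `M` equals the sum of the log-mel energies in each frame divided by $\sqrt{128}$, so the zeroth coefficient mostly tracks overall loudness.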


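MFCC implementations commonly differ in the following respects:

• Mel mapping formula (e.g. HTK vs. Slaney)
• Normalization of the mel filters
• How the DCT coefficients are computed
• Number and width of the mel bands
• Mel frequency range
• Cepstral liftering method: RASTA, HTK, or none
• Short-time Fourier transform parameters
• Dithering or DC removal
• Pre-emphasis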
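
To make the first item concrete, the sketch below compares the HTK-style formula used earlier in this post with a Slaney-style mapping, which is linear below 1 kHz and logarithmic above it. The Slaney constants (200/3 Hz per mel below 1 kHz, 27 mel steps per factor of 6.4 above it) are the commonly quoted ones and should be treated as an assumption here, as should the function names.

```python
import numpy as np

def hz_to_mel_htk(f):
    """HTK-style mapping: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def hz_to_mel_slaney(f):
    """Slaney-style mapping: linear below 1 kHz, logarithmic above."""
    f = np.asarray(f, dtype=float)
    f_sp = 200.0 / 3.0                 # Hz per mel in the linear region
    min_log_hz = 1000.0                # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp    # equals 15 mel
    logstep = np.log(6.4) / 27.0       # log-frequency step per mel
    linear = f / f_sp
    # np.maximum keeps the logarithm well defined for f below 1 kHz.
    log_part = min_log_mel + np.log(np.maximum(f, min_log_hz) / min_log_hz) / logstep
    return np.where(f >= min_log_hz, log_part, linear)

freqs = np.array([500.0, 1000.0, 6400.0])
print(hz_to_mel_htk(freqs))
print(hz_to_mel_slaney(freqs))
```

The two scales use different units, so filter edges that are equally spaced on one scale do not fall at the same Hz values on the other. This is one reason MFCCs from different toolkits do not match numerically.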
