LibROSA库提取MFCC特征的过程解析

目录

源码解析

获取梅尔频谱

分帧

加窗

快速傅里叶变换

梅尔滤波器

取对数

离散余弦变换

总结


LibROSA(本文使用的版本是0.6.3)中的mfcc函数可以用来提取音频的梅尔频率倒谱系数(Mel-Frequency Cepstral Coefficients,MFCCs)特征,MFCC被广泛应用于语音识别。LibROSA的mfcc函数源码如下:

# -- Mel spectrogram and MFCCs -- #
def mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', **kwargs):
    if S is None:
        S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))

    return scipy.fftpack.dct(S, axis=0, type=dct_type, norm=norm)[:n_mfcc]

对于音频,经典的MFCC提取过程分为预加重、分帧、加窗、快速傅里叶变换(FFT)、梅尔滤波器组过滤、取对数、离散余弦变换(DCT)这几个步骤。从mfcc函数的代码能发掘的信息有限,因此我们需要进一步查看调用的函数代码。以下将从函数调用链的底层往上分析,直到mfcc函数。

源码解析

获取梅尔频谱

分帧

LibROSA提取MFCC的过程没有预加重步骤,而是直接进行了分帧。分帧函数为librosa.util.frame(),源码如下:

def frame(y, frame_length=2048, hop_length=512):
    '''Slice a time series into overlapping frames.

    This implementation uses low-level stride manipulation to avoid
    redundant copies of the time series data.

    Parameters
    ----------
    y : np.ndarray [shape=(n,)]
        Time series to frame. Must be one-dimensional and contiguous
        in memory.

    frame_length : int > 0 [scalar]
        Length of the frame in samples

    hop_length : int > 0 [scalar]
        Number of samples to hop between frames

    Returns
    -------
    y_frames : np.ndarray [shape=(frame_length, N_FRAMES)]
        An array of frames sampled from `y`:
        `y_frames[i, j] == y[j * hop_length + i]`

    Raises
    ------
    ParameterError
        If `y` is not contiguous in memory, not an `np.ndarray`, or
        not one-dimensional.  See `np.ascontiguous()` for details.

        If `hop_length < 1`, frames cannot advance.

        If `len(y) < frame_length`.

    '''

    if not isinstance(y, np.ndarray):
        raise ParameterError('Input must be of type numpy.ndarray, '
                             'given type(y)={}'.format(type(y)))

    if y.ndim != 1:
        raise ParameterError('Input must be one-dimensional, '
                             'given y.ndim={}'.format(y.ndim))

    if len(y) < frame_length:
        raise ParameterError('Buffer is too short (n={:d})'
                             ' for frame_length={:d}'.format(len(y), frame_length))

    if hop_length < 1:
        raise ParameterError('Invalid hop_length: {:d}'.format(hop_length))

    if not y.flags['C_CONTIGUOUS']:
        raise ParameterError('Input buffer must be contiguous.')

    # Compute the number of frames that will fit. The end may get truncated.
    n_frames = 1 + int((len(y) - frame_length) / hop_length)

    # Vertical stride is one sample
    # Horizontal stride is `hop_length` samples
    y_frames = as_strided(y, shape=(frame_length, n_frames),
                          strides=(y.itemsize, hop_length * y.itemsize))
    return y_frames

可以看到,LibROSA实际上调用了scipy库中的numpy.lib.stride_tricks.as_strided函数进行分帧(若不指定,帧长默认为2048,帧移默认为512),as_strided函数的实现就不细究了。LibROSA的frame是将X*1的音频向量处理成N*M的矩阵,即将一个时间序列转化为元素部分重叠的帧序列,若帧数不能整除,则最后补零成完整的帧。举个例子,假设一个音频向量为:[0, 1, 2, 3, 4, 5],若帧长为4,帧移为2,则分帧后得到的矩阵为:[[0, 2] , [1, 3], [2, 4], [3, 5]],每一帧都有4个基本元素。

加窗

对分帧后得到的矩阵,下一步进行加窗(奇怪的是,LibROSA是先加窗后分帧的,在下一个步骤:STFT的代码中会体现)。加窗的目的是为了一定程度消除分帧后出现的帧与帧之间的不连续性。LibROSA的librosa.filters.get_window即为加窗函数,源码如下:

def get_window(window, Nx, fftbins=True):
    '''Compute a window function.

    This is a wrapper for `scipy.signal.get_window` that additionally
    supports callable or pre-computed windows.

    Parameters
    ----------
    window : string, tuple, number, callable, or list-like
        The window specification:

        - If string, it's the name of the window function (e.g., `'hann'`)
        - If tuple, it's the name of the window function and any parameters
          (e.g., `('kaiser', 4.0)`)
        - If numeric, it is treated as the beta parameter of the `'kaiser'`
          window, as in `scipy.signal.get_window`.
        - If callable, it's a function that accepts one integer argument
          (the window length)
        - If list-like, it's a pre-computed window of the correct length `Nx`

    Nx : int > 0
        The length of the window

    fftbins : bool, optional
        If True (default), create a periodic window for use with FFT
        If False, create a symmetric window for filter design applications.

    Returns
    -------
    get_window : np.ndarray
        A window of length `Nx` and type `window`

    See Also
    --------
    scipy.signal.get_wind
  • 45
    点赞
  • 145
    收藏
    觉得还不错? 一键收藏
  • 18
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 18
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值