Speech Recognition Feature Extraction (Fbank and MFCC)


Questions and Answers

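Question 1: If an analog speech signal is sampled at 16000 Hz, what is the highest frequency the resulting discrete signal can contain?
Answer: 8000 Hz, which is half the sampling rate (the Nyquist frequency).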
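
As a quick numerical check of the answer above, here is a minimal sketch (the helper names `nyquist_frequency` and `dominant_frequency` are illustrative, not part of the assignment code). It samples a 3 kHz tone and a 9 kHz tone at 16 kHz and looks for the strongest frequency in each: the 3 kHz tone is recovered as is, while the 9 kHz tone, which lies above 8 kHz, shows up at 7 kHz.

```python
import numpy as np


def nyquist_frequency(fs):
    """Highest frequency representable at sampling rate fs."""
    return fs / 2.0


def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


fs = 16000
t = np.arange(fs) / fs  # one second of sample times, so bins are 1 Hz apart

tone_ok = np.sin(2 * np.pi * 3000 * t)     # below the Nyquist frequency
tone_alias = np.sin(2 * np.pi * 9000 * t)  # above the Nyquist frequency

print(nyquist_frequency(fs))
print(dominant_frequency(tone_ok, fs))
print(dominant_frequency(tone_alias, fs))  # folds back to fs - 9000 Hz
```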

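Question 2: When downsampling a discrete signal from 16 kHz to 8 kHz, why must it be low-pass filtered first?
Answer: Downsampling keeps one sample out of every few samples of the original sequence. Going from 16 kHz to 8 kHz lowers the highest representable frequency from 8 kHz to 4 kHz. The spectrum of the underlying signal does not change, but relative to the new, lower sampling rate it occupies twice as much of the band, so any content above 4 kHz folds back (aliases) into the 0-4 kHz range. To prevent this aliasing, apply an anti-aliasing low-pass filter with a cutoff below 4 kHz before discarding samples.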
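
The effect can be illustrated with a small sketch. It assumes SciPy is available and uses `scipy.signal.decimate`, which low-pass filters before keeping every second sample. The test signal and the helper `magnitude_at` are made up for illustration. A 6 kHz component aliases to 2 kHz when samples are simply dropped, and is strongly attenuated when the filter is applied first.

```python
import numpy as np
from scipy.signal import decimate

fs_in = 16000
factor = 2
fs_out = fs_in // factor

t = np.arange(fs_in) / fs_in
# 1 kHz survives downsampling to 8 kHz; 6 kHz is above the new 4 kHz limit.
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)

naive = x[::factor]             # drop samples without filtering
filtered = decimate(x, factor)  # anti-aliasing low-pass, then drop samples


def magnitude_at(signal, fs, freq_hz):
    """Amplitude-normalised spectrum magnitude at the bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) / (signal.size / 2)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


# The 6 kHz tone would alias to 8000 - 6000 = 2000 Hz.
alias_naive = magnitude_at(naive, fs_out, 2000)
alias_filtered = magnitude_at(filtered, fs_out, 2000)
kept_filtered = magnitude_at(filtered, fs_out, 1000)

print(alias_naive, alias_filtered, kept_filtered)
```

In this sketch the naive version keeps the aliased tone at close to full amplitude, while the filtered version keeps the 1 kHz tone and suppresses the alias.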

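  1. Why does sampling (discretization) in the time domain make the spectrum periodic in the frequency domain?
  2. Why does periodicity in the time domain make the spectrum discrete in the frequency domain?
    Answer: Both follow from the definition of the Fourier transform. Sampling multiplies the signal by an impulse train, and multiplication in time is convolution in frequency, so the spectrum is replicated at every multiple of the sampling rate. By duality, a signal that is periodic in time can be written as a Fourier series, so its spectrum exists only at multiples of the fundamental frequency.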
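
The two statements can be written out as a short derivation. This is the standard textbook argument rather than something taken from the course material; the symbols $T_s$ (sampling period), $f_s = 1/T_s$, $T_0$ (period of the time signal) and $c_k$ (Fourier series coefficients) are introduced here for illustration.

```latex
% 1. Sampling in time -> periodic spectrum
x_s(t) = x(t) \sum_{n=-\infty}^{\infty} \delta(t - nT_s)
\quad\Longleftrightarrow\quad
X_s(f) = X(f) * \frac{1}{T_s} \sum_{k=-\infty}^{\infty} \delta(f - k f_s)
       = f_s \sum_{k=-\infty}^{\infty} X(f - k f_s)

% X_s(f + f_s) = X_s(f): the spectrum repeats with period f_s.

% 2. Periodic in time -> discrete spectrum
x(t) = x(t + T_0) = \sum_{k=-\infty}^{\infty} c_k \, e^{\,j 2\pi k t / T_0}
\quad\Longleftrightarrow\quad
X(f) = \sum_{k=-\infty}^{\infty} c_k \, \delta\!\left(f - \frac{k}{T_0}\right)

% X(f) is non-zero only at multiples of 1/T_0.
```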

Feature Extraction

1. Download the assignment package

git clone https://github.com/nwpuaslp/ASR_Course.git

2. Write the feature extraction script mfcc.py

import librosa
import numpy as np
from scipy.fftpack import dct

# If you want to see the spectrogram picture
import matplotlib

matplotlib.use('Agg')
import matplotlib.pyplot as plt


def plot_spectrogram(spec, note, file_name):
    """Draw the spectrogram picture
        :param spec: a feature_dim by num_frames array(real)
        :param note: title of the picture
        :param file_name: name of the file
    """
    fig = plt.figure(figsize=(20, 5))
    heatmap = plt.pcolor(spec)
    fig.colorbar(mappable=heatmap)
    plt.xlabel('Frame index')
    plt.ylabel(note)
    plt.tight_layout()
    plt.savefig(file_name)


# preemphasis config
alpha = 0.97

# Enframe config
frame_len = 400  # 25ms, fs=16kHz
frame_shift = 160  # 10ms, fs=16kHz
fft_len = 512

# Mel filter config
num_filter = 23
num_mfcc = 12

# Read wav file
wav, fs = librosa.load('./test.wav', sr=None)


# Pre-emphasis: first-order high-pass filter applied before framing
def preemphasis(signal, coeff=alpha):
    """perform preemphasis on the input signal.

        :param signal: The signal to filter.
        :param coeff: The preemphasis coefficient. 0 is no filter, default is 0.97.
        :returns: the filtered signal.
    """
    # signal[1] - coeff*signal[0]
    # ...
    # signal[end] - coeff*signal[end-1]
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


def enframe(signal, frame_len=frame_len, frame_shift=frame_shift, win=np.hamming(frame_len)):
    """Enframe with Hamming window function.

        :param signal: The signal to be enframed
        :param win: window function, default Hamming
        :returns: the enframed signal, num_frames by frame_len array
    """

    num_samples = signal.size
    num_frames = np.floor((num_samples - frame_len) / frame_shift) + 1  # round down; a trailing partial frame is dropped
    frames = np.zeros((int(num_frames), frame_len))
    for i in range(int(num_frames)):
        frames[i, :] = signal[i * frame_shift:i * frame_shift + frame_len]
        frames[i, :] = frames[i, :] * win

    return frames


def get_spectrum(frames, fft_len=fft_len):
    """Get spectrum using fft
        :param frames: the enframed signal, num_frames by frame_len array
        :param fft_len: FFT length, default 512
        :returns: spectrum, a num_frames by fft_len/2+1 array (real)
    """
    cFFT = np.fft.fft(frames, n=fft_len)
    valid_len = int(fft_len / 2) + 1  # the spectrum of a real signal is symmetric, so keep only the first half
    spectrum = np.abs(cFFT[:, 0:valid_len])
    return spectrum


def fbank(spectrum, num_filter=num_filter):
    """Get mel filter bank feature from spectrum
        :param spectrum: a num_frames by fft_len/2+1 array(real)
        :param num_filter: mel filters number, default 23
        :returns: fbank feature, a num_frames by num_filter array
        DON'T FORGET LOG OPERATION AFTER MEL FILTER!
    """
    # print(spectrum.shape)

    """
        FINISH by YOURSELF
    """
    """
    mel = 2595 * log10(1 + f / 700)    # frequency (Hz) to mel
    f = 700 * (10^(m / 2595) - 1)      # mel to frequency (Hz)
    """
    feats = np.zeros((spectrum.shape[1], num_filter))
    # Step 1: place num_filter + 2 equally spaced points on the mel scale
    low_mel_freq = 0
    high_mel_freq = 2595 * np.log10(1 + fs / 2.0 / 700)
    mel_points = np.linspace(low_mel_freq, high_mel_freq, num_filter + 2)
    print(len(mel_points))
    # Step 2: convert the mel points back to frequencies in Hz
    freq_points = (700 * (np.power(10., (mel_points / 2595)) - 1))
    print(len(freq_points))

    # Step 3: build the triangular mel filter bank
    filter_edge = np.floor(freq_points * (fft_len + 1) / fs)  # map each frequency to an FFT bin index
    print(filter_edge)
    for m in range(1, 1 + num_filter):
        f_left = int(filter_edge[m - 1])
        f_center = int(filter_edge[m])
        f_right = int(filter_edge[m + 1])

        for k in range(f_left, f_center):
            feats[k, m - 1] = (k - f_left) / (f_center - f_left)

        for k in range(f_center, f_right):
            feats[k, m - 1] = (f_right - k) / (f_right - f_center)

    # Apply the filter bank
    # [num_frames, fft_len/2+1] x [fft_len/2+1, num_filter] = [num_frames, num_filter]
    feats = np.dot(spectrum, feats)
    feats = np.where(feats == 0, np.finfo(float).eps, feats)
    feats = 20 * np.log10(feats)
    # feats = np.transpose(feats)
    return feats


def mfcc(fbank, num_mfcc=num_mfcc):
    """Get mfcc feature from fbank feature
        :param fbank: a num_frames by  num_filter array(real)
        :param num_mfcc: mfcc number, default 12
        :returns: mfcc feature, a num_frames by num_mfcc array
    """

    # feats = np.zeros((fbank.shape[0],num_mfcc))
    """
        FINISH by YOURSELF
    """
    feats = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:(num_mfcc + 1)]
    # feats = np.transpose(feats)
    return feats


def write_file(feats, file_name):
    """Write the feature to file
        :param feats: a num_frames by feature_dim array(real)
        :param file_name: name of the file
    """
    # Use a context manager so the file is closed even if writing fails
    with open(file_name, 'w') as f:
        (row, col) = feats.shape
        for i in range(row):
            f.write('[')
            for j in range(col):
                f.write(str(feats[i, j]) + ' ')
            f.write(']\n')


def main():
    wav, fs = librosa.load('./test.wav', sr=None)
    signal = preemphasis(wav)
    frames = enframe(signal)
    spectrum = get_spectrum(frames)
    fbank_feats = fbank(spectrum)
    mfcc_feats = mfcc(fbank_feats)
    plot_spectrogram(np.transpose(fbank_feats), 'Filter Bank', 'fbank.png')
    write_file(fbank_feats, './test.fbank')
    plot_spectrogram(mfcc_feats.T, 'MFCC', 'mfcc.png')
    write_file(mfcc_feats, './test.mfcc')


if __name__ == '__main__':
    main()
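
The mel filter bank is the least obvious step in the script, so here is a standalone sketch for inspecting it in isolation. It mirrors the formulas used in `fbank` above (the same mel mapping and the same `floor(freq * (fft_len + 1) / fs)` bin mapping), but the function names `hz_to_mel`, `mel_to_hz` and `mel_filter_bank` are illustrative and do not appear in the assignment template. It also assumes fs = 16000 rather than reading a wav file.

```python
import numpy as np


def hz_to_mel(freq_hz):
    """Map frequency in Hz to the mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse mapping: f = 700 * (10^(mel / 2595) - 1)."""
    return 700.0 * (np.power(10.0, np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(num_filter=23, fft_len=512, fs=16000):
    """Build a (fft_len/2+1, num_filter) matrix of triangular mel filters."""
    # num_filter + 2 points: each filter needs a left edge, a centre and a right edge.
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), num_filter + 2)
    bins = np.floor(mel_to_hz(mel_points) * (fft_len + 1) / fs).astype(int)

    bank = np.zeros((fft_len // 2 + 1, num_filter))
    for m in range(1, num_filter + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope, 0 at left edge
            bank[k, m - 1] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, 1 at the centre bin
            bank[k, m - 1] = (right - k) / (right - center)
    return bank


bank = mel_filter_bank()
print(bank.shape)         # one column per filter
print(hz_to_mel(1000.0))  # close to 1000 mel by construction of the scale
```

Multiplying a `num_frames x (fft_len/2+1)` magnitude spectrum by `bank` and taking the log gives the Fbank features; applying a DCT along the filter axis then gives the MFCCs, as in the script.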

Code structure
[Figure: code structure]
MFCC
[Figure: MFCC features]
Fbank
[Figure: Fbank features]
