matlab 实现图层,Matlab中MFCC的几种实现方式

最新推荐文章于 2022-01-23 19:43:09 发布

usotsuki1993

最新推荐文章于 2022-01-23 19:43:09 发布

阅读量799

点赞数

文章标签： matlab 实现图层

相关的函数

melbankm、mfcc_m、melcepst、cepstralFeatureExtractor、mfcc、HelperComputePitchAndMFCC、 melSpectrogram

几种函数对比及说明

melbankm

由Voicebox提供，在Mel频率上设计平均分布的滤波器，此函数与音频信号没有关系，只是做MFCC前对滤波器的设计。

function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)

%MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

% Inputs:

% p number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]

% n length of fft

% fs sample rate in Hz

% fl low end of the lowest filter as a fraction of fs [default = 0]

% fh high end of highest filter as a fraction of fs [default = 0.5]

% w any sensible combination of the following:

% 可取代Mel频率的选项：

% 'b' = bark scale instead of mel

% 'e' = erb-rate scale

% 'l' = log10 Hz frequency scale

% 'f' = linear frequency scale

% 'c' = fl/fh specify centre of low and high filters

% 'h' = fl/fh are in Hz instead of fractions of fs

% 'H' = fl/fh are in mel/erb/bark/log10

% 滤波器形状：

% 't' = triangular shaped filters in mel/erb/bark domain (default)

% 'n' = hanning shaped filters in mel/erb/bark domain

% 'm' = hamming shaped filters in mel/erb/bark domain

% 'z' = highest and lowest filters taper down to zero [default]

% 'y' = lowest filter remains at 1 down to 0 frequency and

% highest filter remains at 1 up to nyquist freqency

% 'u' = scale filters to sum to unity

% 's' = single-sided: do not double filters to account for negative frequencies

% 输出滤波器组的响应曲线：

% 'g' = plot idealized filters [default if no output arguments present]

% Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.

% Some people instead define an asymmetric triangular filter in the frequency domain.

% If 'ty' or 'ny' is specified, the total power in the fft is preserved.

% Outputs: x a sparse matrix containing the filterbank amplitudes

% If the mn and mx outputs are given then size(x)=[p,mx-mn+1]

% otherwise size(x)=[p,1+floor(n/2)]

% Note that the peak filter values equal 2 to account for the power

% in the negative FFT frequencies.

% mc the filterbank centre frequencies in mel/erb/bark滤波器中心频率

% mn the lowest fft bin with a non-zero coefficient

% mx the highest fft bin with a non-zero coefficient

% Note: you must specify both or neither of mn and mx.mn与mx必须同时指定或者不指定

% =============================！用法举例(MFCC流程)==============================

% (a) Calcuate the Mel-frequency Cepstral Coefficients

% f=rfft(s); % rfft() returns only 1+floor(n/2) coefficients去除虚数部分

% x=melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted

% z=log(x*abs(f).^2); % multiply x by the power spectrum

% c=dct(z); % take the DCT

% (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently

% f=fft(s); % n is the fft length, p is the number of filters wanted

% [x,mc,na,nb]=melbankm(p,n,fs); % na:nb gives the fft bins that are needed

% z=log(x*(f(na:nb)).*conj(f(na:nb)));

% (c) Plot the calculated filterbanks

% plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency

% (d) Plot the idealized filterbanks (without output sampling)

% melbankm(p,n,fs);

该函数只是设计滤波器组，属于MFCC处理的一部分

mfcc_m

由宋知用老师书中提供，涉及到归一化Mel滤波器组系数、归一化倒谱提升窗口

bank=melbankm(p,frameSize,fs,0,0.5,'m');

% 归一化Mel滤波器组系数

bank=full(bank);

bank=bank/max(bank(:));

% 归一化倒谱提升窗口：对MFCC系数中某些谱线进行增强

w = 1 + 6 * sin(pi * [1:p2] ./ p2);

w = w/max(w);

需要修正的地方：

只有一阶差分系数

滤波器选择后并不能只截取想要的部分

归一化Mel滤波器组系数、归一化倒谱提升窗口有待考证

melcepst

属于voicebox工具箱，现在官方已经不提供了，程序中调用了melbankm函数。

function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)

%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)

% Simple use: (1) c=melcepst(s,fs) % calculate mel cepstrum with 12 coefs, 256 sample frames

% (2) c=melcepst(s,fs,'E0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs

% Inputs:

% s speech signal

% fs sample rate in Hz (default 11025)

% w mode string (see below)

% nc number of cepstral coefficients excluding 0'th coefficient [default 12] MFCC维数设定

% p number of filters in filterbank [default: floor(3*log(fs)) = approx 2.1 per ocatave] 滤波器数量

% n length of frame in samples [default power of 2 < (0.03*fs)] 帧长

% inc frame increment [default n/2] 帧移

% fl low end of the lowest filter as a fraction of fs [default = 0] 滤波器最低频率

% fh high end of highest filter as a fraction of fs [default = 0.5] 滤波器最高频率，通过fs归一化

% w any sensible combination of the following:

% 时域窗函数：

% 'R' rectangular window in time domain

% 'N' Hanning window in time domain

% 'M' Hamming window in time domain (default)

% 频域窗函数：

% 't' triangular shaped filters in mel domain (default)

% 'n' hanning shaped filters in mel domain

% 'm' hamming shaped filters in mel domain

% 'p' filters act in the power domain

% 'a' filters act in the absolute magnitude domain (default)

% MFCC除12维基本参数之外的选择：

% '0' include 0'th order cepstral coefficient

% 'E' include log energy

% 'd' include delta coefficients (dc/dt)

% 'D' include delta-delta coefficients (d^2c/dt^2)

% 滤波器频率设置：

% 'z' highest and lowest filters taper down to zero (default)

% 'y' lowest filter remains at 1 down to 0 frequency and

% highest filter remains at 1 up to nyquist freqency

% If 'ty' or 'ny' is specified, the total power in the fft is preserved.

% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the

% first element of each row followed by the delta and then the delta-delta

% coefficients.

% tc fractional time in samples at the centre of each frame

% with the first sample being 1.

% ==================================设置默认参数=================================

if nargin<2 fs=11025; end% 滤波器的最高频率

if nargin<3 w='M'; end% hamming窗

if nargin<4 nc=12; end% MFCC维数

if nargin<5 p=floor(3*log(fs)); end% p个滤波器

if nargin<6 n=pow2(floor(log2(0.03*fs))); end% n是一帧FFT后数据的长度

if nargin<9

fh=0.5;% 滤波器的最高频率，用fs归一化

if nargin<8

fl=0;% 设计滤波器的最低频率

if nargin<7

inc=floor(n/2);

end

if isempty(w)

w='M';

end

if any(w=='R')

[z,tc]=enframe(s,n,inc);

elseif any (w=='N')

[z,tc]=enframe(s,hanning(n),inc);

else

[z,tc]=enframe(s,hamming(n),inc);

end

% =================================！理论核心部分=================================

f=rfft(z.');

[m,a,b]=melbankm(p,n,fs,fl,fh,w);% m为滤波器的频域响应

pw=f(a:b,:).*conj(f(a:b,:));% 计算帧能量

pth=max(pw(:))*1E-20;

if any(w=='p')

y=log(max(m*pw,pth));

else

ath=sqrt(pth);

y=log(max(m*abs(f(a:b,:)),ath));

end

c=rdct(y).';% 得到13维系数

nf=size(c,1);

nc=nc+1;

if p>nc

c(:,nc+1:end)=[];% 当滤波器个数比所需维数多的时候，就将后面滤波器获得的参数删去

elseif p

c=[c zeros(nf,nc-p)];% 滤波器个数少的时候，用0补齐

end

if ~any(w=='0')

c(:,1)=[];

nc=nc-1;

end

if any(w=='E')

c=[log(max(sum(pw),pth)).' c];

nc=nc+1;

end

% ===============================计算一阶和二阶差分==============================

if any(w=='D')

vf=(4:-1:-4)/60;

af=(1:-1:-1)/2;

ww=ones(5,1);

cx=[c(ww,:); c; c(nf*ww,:)];

vx=reshape(filter(vf,1,cx(:)),nf+10,nc);

vx(1:8,:)=[];

ax=reshape(filter(af,1,vx(:)),nf+2,nc);

ax(1:2,:)=[];

vx([1 nf+2],:)=[];

if any(w=='d')

c=[c vx ax];

else

c=[c ax];

end

elseif any(w=='d')

vf=(4:-1:-4)/60;

ww=ones(4,1);

cx=[c(ww,:); c; c(nf*ww,:)];

vx=reshape(filter(vf,1,cx(:)),nf+8,nc);

vx(1:8,:)=[];

c=[c vx];

end

% =======================如果不输出任何参数，就会输出语谱图==========================

if nargout<1

[nf,nc]=size(c);

% t=((0:nf-1)*inc+(n-1)/2)/fs;

ci=(1:nc)-any(w=='0')-any(w=='E');

imh = imagesc(tc/fs,ci,c.');

axis('xy');

xlabel('Time (s)');

ylabel('Mel-cepstrum coefficient');

map = (0:63)'/63;

colormap([map map map]);

colorbar;

end

melcepst默认得到12维MFCC参数，时域中用hamming窗，频域中用三角窗，最低频率为0，最高频率为采样频率的一半(采样定理)，帧移为帧长的一半，帧长为2的次幂但是小于0.03*fs。

E：包括对数能量

0：包括0阶倒谱系数

d：包括一阶差分

D：包括二阶差分

melcepst对参数'0'的处理

if ~any(w=='0')

c(:,1)=[];

nc=nc-1;

end

如果不需要'0'阶系数，就将第一列删除，并得到13-1=12维数据，说明DCT后得到的是13维数据，默认将第一个元素，即0阶倒谱系数删去。第一维比后12维都大很多(直流项？)。

1c2742096382

默认12维参数

1c2742096382

DCT后13维参数('0')

1c2742096382

13维参数'E'

cepstralFeatureExtractor

由Audio Toolbox提供，需要先将音频分帧，每一列作为一帧，再将每一帧依次输入至cepstralFeatureExtractor，所以输入的第一帧的delta与deltaDelta都是0。

test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';

[x, fs] = audioread(test);

n=pow2(floor(log2(0.03*fs)));

inc=floor(n/2);

f = enframe(x,hamming(n),inc);

cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');

[coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');

参数设置中有FilterBankNormalization，选项为：Area，Bandwidth(默认)，None，用于滤波器组的权重分配

1c2742096382

滤波器归一化

cepstralFeatureExtractor类的部分代码：

classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine

%cepstralFeatureExtractor Cepstral Feature Extractor

% cepFeatures = cepstralFeatureExtractor returns a System object,

% cepFeatures, that calculates cepstral features. Columns of the input

% are treated as individual channels.

% cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a

% cepstralFeatureExtractor System object, cepFeatures, with each

% specified property name set to the specified value. You can specify

% additional name-value pair arguments in any order as

% (Name1,Value1,...NameN,ValueN).

% step method syntax内置的step()函数:

% [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral

% coefficients, the delta, and the delta-delta. The log energy is also

% returned in the COEFFS output based on the LogEnergy property. The

% DELTA and DELTADELTA are initialized as zero-vectors. X must be a

% real-valued, double-precision or single-precision matrix. Each column

% of X is treated as an independent channel.

% System objects may be called directly like a function instead of using

% the step method. For example, y = step(obj,x) and y = obj(x) are

% equivalent.

% 对象可以直接作为函数使用，所以step()与obj()功能一致

% cepstralFeatureExtractor methods:

% step - See above description for use of this method

% release - Allow property values and input characteristics to change

% clone - Create cepstralFeatureExtractor object with same property

% values

% isLocked - Locked status (logical)

% reset - Reset the internal states to initial conditions

% getFilters - Get filterbank used to calculate the cepstral

% coefficients

% cepstralFeatureExtractor properties:

% FilterBank - Filter bank ('Mel'/'Gammatone')

% InputDomain - Domain of input signal

% NumCoeffs - Number of coefficients to return

% FFTLength - FFT length

% LogEnergy - Log energy usage ('Append'/'Replace'/'Ignore')

% SampleRate - Sample rate (Hz)

% Advanced properties:

% BandEdges - Band edges of mel filter bank (Hz)

% FilterBankNormalization - Normalize filter bank

% FilterBankDesignDomain - Domain for mel filter bank design

% FrequencyRange - Gammatone filter bank frequency range

%#codegen

properties

%SampleRate Input sample rate (Hz)

% Specify the sampling rate of the input in Hertz as a real, finite

% numeric scalar. The default is 16000 Hz. This property is

% tunable.

SampleRate = 16000;

end

properties (Constant, Hidden)

% SampleRateSet is used to setup the choices for SampleRate

SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...

'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});

end

properties (Nontunable)

%BandEdges Band edges of Mel filter bank (Hz)

% Specify the band edges of the mel filter bank as a monotonically

% increasing vector in the range [0,fs/2]. The number of band edges

% must be in the range [4,160]. The default band edges are spaced

% linearly for the first ten and then logarithmically thereafter.

% This property applies when FilterBank is 'Mel'.

% 只有是Mel的时候，BandEdges属性才有用

BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();

%FFTLength FFT length 默认FFT长度是输入的行数，所以做好分帧！

FFTLength = [];

%NumCoeffs Number of coefficients to return 默认MFCC维数13

NumCoeffs = 13;

%InputDomain Domain of the input signal 默认输入数据是时域的

InputDomain = 'Time';

%FilterBankNormalization Filter bank normalization 默认以带宽设置滤波器权重

FilterBankNormalization = 'Bandwidth';

%LogEnergy Log energy usage 默认log能量参数是有的

LogEnergy = 'Append';

end

---------------------------------------------------------略-----------------------------------------------------------

end

mfcc

由Audio Toolbox提供，最低频率不是0，它用的是cepstralFeatureExtractor函数

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');

[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);

function varargout = mfcc(x, fs, varargin)

%MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal

% coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral

% coefficients over time for the audio input. Columns of the input are

% treated as individual channels. coeffs is returned as an L-by-M-by-N

% array.

% L - Number of frames the audio signal is partitioned into.

% This is determined by the WINDOWLENGTH and OVERLAPLENGTH

% properties.

% M - Number coefficients returned per frame.

% This is determined by the NUMCOEFFS property.

% N - Number of channels.

% 'WindowLength' defaults to round(0.030 * fs).

% 'OverlapLength' defaults to round(fs*0.02).

% 'NumCoeffs' If not specified, the number of coefficients is 13.

% 'FFTLength' By default, the FFT length is set to the WINDOWLENGTH.

% 'DeltaWindowLength' The default is 2.

% coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log

% energy is used. Specify log energy as a character vector:

% 'Append' - Adds the log-energy as the first element of the

% returned coefficients vector. This is the default

% setting.

% 'Replace' - Replaces the zeroth coefficient (first element of

% coeffs) with the log-energy.

% 'Ignore' - Ignores and does not return the log-energy.

% =========================验证输入数据的格式=============================

validateRequiredInputs(x, fs)

params = audio.internal.MFCCValidator(fs,size(x,1),varargin{:});% 输入默认的参数

hopLength = params.WindowLength - params.OverlapLength;% 帧移

% ==========================创建mfcc提取object============================

mfccObject = cepstralFeatureExtractor( ...

'SampleRate', fs, ...

'FFTLength', params.FFTLength, ...

'NumCoeffs', params.NumCoeffs, ...

'LogEnergy', params.LogEnergy);

% ====================验证所需要的mfcc维数比滤波器个数少===================

numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;

coder.internal.errorIf(numValidBands < params.NumCoeffs, ...

'audio:mfcc:BadNumCoeffs', ...

numValidBands,fs);

% ==========================mfcc参数获取=================================

[nRow,nChan] = size(x);% 一般都是单通道，audiorea读取到的是一列数据

N = params.WindowLength;

numHops = floor((nRow-N)/hopLength) + 1;

y = audio.internal.buffer(x,N,hopLength);

c = mfccObject(y);% mfccObject是cepstralFeatureExtractor类，所以，与cepstralFeatureExtractor求解方法一样

c2 = reshape(c , size(c,1) , size(c,2)/nChan , nChan );

coeffs = permute(c2 , [2 1 3]);% 将第1维与第2维转置，因为cepstralFeatureExtractor得到的特征是列排的

varargout{1} = coeffs;

%=========================一阶差分====================================

if nargout > 1

delta = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);

varargout{2} = delta;

end

% ============================二阶差分=================================

if nargout > 2

deltaDelta = audio.internal.computeDelta(delta,params.DeltaWindowLength);

varargout{3} = deltaDelta;

end

% -------------------------------------------------------------------------

% Output sample stamp -----------------------------------------------------

if nargout > 3

varargout{4} = ...

cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);

end

% -------------------------------------------------------------------------

% Validate required inputs

% -------------------------------------------------------------------------

function validateRequiredInputs(x,fs)

validateattributes(x,{'single','double'},...

{'nonempty','2d','real'}, ...

'mfcc','audioIn')

validateattributes(fs,{'single','double'}, ...

{'nonempty','positive','real','scalar','nonnan','finite'}, ...

'mfcc','fs');

end

默认有40个滤波器，得到14维参数，相当于melcepst中的'E0'，只是melcepst的最低频率从0Hz开始；delta与deltaDelta的第一行都是0；loc是每一帧的开始位置。

如何使delta与deltaDelta的首行不为0？设置DeltaWindowLength 参数即可

1c2742096382

差分的计算

1c2742096382

40组滤波器的频带范围

从滤波器组设置可以看出，每个滤波器的起点是上个滤波器带宽的中点。

HelperComputePitchAndMFCC

查看源码后，发现使用的是mfcc函数

1c2742096382

HelperComputePitchAndMFCC

melSpectrogram

output的第一维是Number of bandpass filters in filterbank，默认为32个滤波器；第二维是Number of frames in spectrogram，即帧数。

它不可以计算差分，只是spectrogram的一个小分支，若取40个滤波器，得到的结果与mfcc相近，只是需要转置一下

几种实现方式的对比

实现方式

MFCC

频谱图

mfcc

1c2742096382

cepstralFeatureExtractor

1c2742096382

melcepst

1c2742096382

melSpectrogram

1c2742096382

结论

可见，cepstralFeatureExtractor与mfcc所用算法基本一致，只是cepstralFeatureExtractor分帧求取，melcepst与它们的第2维数据有数量级的差异，暂时认为是滤波器归一化的原因。在mfcc中，log能量是作为额外系数默认附加的，通常Matlab会提供最好的性能，所以暂时按默认选项进行。melSpectrogram默认32个滤波器，mfcc默认40个滤波器，且melSpectrogram不能计算差分，所以mfcc总的来说，更合适作为以后的计算使用。