最近一直在做语音方面的工作,由于任务需要所以提取MFCC特征,并保存为语谱图,在以下的讲述中我并没有用到什么源码或者相关理论,但是它给了一个很好的诠释,让我在写自己的代码时有章可循,谢谢原著作者。
需要注意的是,下文中的data.txt的来源:
1.使用Matlab的函数audiorecorder进行录音
2.使用Matlab的函数getaudiodata获取音频数组
3.将获取的音频数组存入Data.txt文件。
原文参考链接:http://blog.csdn.net/xiao13yu14/article/details/46991581
MFCC参数提取步骤
——>预加重
——>分帧
——>对每一帧加窗
——>对每一帧补零
——>各帧信号的FFT变换及其功率谱
——>梅尔滤波(通过40个滤波器)
——>取对数
——>DCT变换
——>归一化
1.预加重
如果数据在低频的强度大于高频,就会不利于处理,因此需要通过一个传递函数为s[n]-a*s[n]的高通滤波器。滤去数据中的低频成分,使高频特性更加突现。
2.分帧
分帧就是将N个采样点集合成一个观测单位。我们设定每帧涵盖的时间是25ms,因为采样率是16000,所以得到每帧的样本点个数是400。
另外,为了避免相邻两帧的变化过大,因此会让两相邻帧之间有一段重叠区域。我们设定的重叠区域是15ms,所以就是每隔10ms取一帧。
3.对每一帧加窗
分帧后马上进行FFT,由于转换时会将帧内信号当作周期信号处理,所以在帧的两个端点处会发生突变,转换出来的频谱与原信号频谱差别很大。所以要对每一帧加窗,使帧内信号作FFT时的两个端点处不会发生突变。
我们采用的窗是汉明窗:(M为帧长,即400)
4.对每一帧补零
我们要对每一帧信号进行FFT,而FFT要求输入数据长度一定是2^K,现在一帧为400个采样点,所以补零至最接近的512位。
5.各帧信号的FFT变换及其功率谱
对分帧加窗后的各帧信号进行512点的FFT变换得到各帧的频谱。并对语音信号的频谱取模平方得到语音信号的功率谱。
6.梅尔滤波(通过40个滤波器)
40个三角滤波器在MEL谱上均匀分布,每两个滤波器间有50%的重叠部分。
所以要先把实际频率转换成梅尔频率,实际频率最小为0Hz,最大为16000 / 2 = 8000Hz
转换成梅尔频率后,我们要实现的是40个滤波器,所以计算这40个滤波器的梅尔频率分布,然后把梅尔频率转换成实际频率
然后根据实际频率的数组,计算出f数组,公式如下(h数组就是40个滤波器的实际频率分布数组):
最后根据以下公式,计算滤波器的输出(m为滤波器的个数)
以下为10个滤波器的图:
7.取对数
三角窗滤波器组的输出求取对数,可以得到近似于同态变换的结果。
8.DCT变换
对对数能量梅尔谱进行DCT变换,取前13维输出,得到梅尔倒谱。
公式为:
9.归一化
对所有的梅尔倒谱归一化。
先求出所有倒谱向量的均值向量,公式为
再用每一个倒谱向量减去均值向量,即
C++代码实现
- #include<iostream>
- #include<fstream>
- #include<complex>
- using namespace std;
-
- int filterNum = 40;
- int sampleRate = 16000;
-
- #define Win_Time 0.025//把25ms里的所有点作为一个点分析
- #define Hop_Time 0.01//每隔10ms分一次帧
- #define Pi 3.1415927
-
-
- double* pre_emphasizing(double *sample, int len, double factor)
- {
- double *Sample = new double[len];
- Sample[0] = sample[0];
- for(int i = 1; i < len; i++)
- {
-
- Sample[i] = sample[i] - factor * sample[i - 1];
- }
- return Sample;
- }
-
- void Hamming( double *hamWin, int hamWinSize )
- {
- for (int i = 0; i < hamWinSize; i++)
- {
- hamWin[i] = (double)(0.54 - 0.46 * cos(2 * Pi * (double)i / ((double)hamWinSize - 1) ));
- }
- }
-
-
- void mfccFFT(double *frameSample, double *FFTSample, int frameSize, int pos)
- {
-
-
- double dataR[frameSize];
- double dataI[frameSize];
- for(int i = 0; i < frameSize; i++)
- {
- dataR[i] = frameSample[i + pos];
- dataI[i] = 0.0f;
- }
-
- int x0, x1, x2, x3, x4, x5, x6, xx, x7, x8;
- int i, j, k, b, p, L;
- float TR, TI, temp;
-
- for(i = 0; i < frameSize; i++)
- {
- x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = 0;
- x0 = i & 0x01; x1 = (i / 2) & 0x01; x2 = (i / 4) & 0x01; x3 = (i / 8) & 0x01; x4 = (i / 16) & 0x01;
- x5 = (i / 32) & 0x01; x6 = (i / 64) & 0x01; x7 = (i / 128) & 0x01; x8 = (i / 256) & 0x01;
- xx = x0 * 256 + x1 * 128 + x2 * 64 + x3 * 32 + x4 * 16 + x5 * 8 + x6 * 4 + x7 * 2 + x8;
- dataI[xx] = dataR[i];
- }
- for(i = 0; i < frameSize; i++)
- {
- dataR[i] = dataI[i]; dataI[i] = 0;
- }
-
-
- for(L = 1; L <= 9; L++)
- {
- b = 1; i = L - 1;
- while(i > 0)
- {
- b = b * 2; i--;
- }
- for(j = 0; j <= b-1; j++)
- {
- p = 1; i = 9 - L;
- while(i > 0)
- {
- p = p * 2; i--;
- }
- p = p * j;
- for(k = j; k < 512; k = k + 2*b)
- {
- TR = dataR[k]; TI = dataI[k]; temp = dataR[k + b];
- dataR[k] = dataR[k] + dataR[k + b] * cos(2 * Pi * p / frameSize) + dataI[k + b] * sin(2 * Pi * p / frameSize);
- dataI[k] = dataI[k] - dataR[k + b] * sin(2 * Pi * p / frameSize) + dataI[k + b] * cos(2 * Pi * p / frameSize);
- dataR[k + b] = TR - dataR[k + b] * cos(2 * Pi * p / frameSize) - dataI[k + b] * sin(2 * Pi * p / frameSize);
- dataI[k + b] = TI + temp * sin(2 * Pi * p / frameSize) - dataI[k + b] * cos(2 * Pi * p / frameSize);
- }
- }
- }
- for(i = 0; i < frameSize / 2; i++)
- {
- FFTSample[i + pos] = (dataR[i] * dataR[i] + dataI[i] * dataI[i]);
- }
- }
-
-
-
- double* mfccFrame(double *frameSample, double *Sample, int *len, int frameSize, int &frameSampleLen)
- {
- double *hamWin;
- int hamWinSize = sampleRate * Win_Time;
- hamWin = new double[hamWinSize];
- Hamming(hamWin, hamWinSize);
-
- int hopStep = Hop_Time * sampleRate;
- int frameNum = ceil(double(*len) / hopStep);
- frameSampleLen = frameNum * frameSize;
- frameSample = new double[frameSampleLen];
- for(int i = 0; i < frameSampleLen; i++)
- frameSample[i] = 0;
-
- double *FFTSample = new double[frameSampleLen];
- for(int i = 0; i < frameSampleLen; i++)
- FFTSample[i] = 0;
-
- for(int i = 0; i * hopStep < *len; i++)
- {
- for(int j = 0; j < frameSize; j++)
- {
- if(j < hamWinSize && i * hopStep + j < *len)
- frameSample[i * frameSize + j] = Sample[i * hopStep + j] * hamWin[j];
- else
- frameSample[i * frameSize + j] = 0;
- }
- mfccFFT(frameSample, FFTSample, frameSize, i * frameSize);
- }
-
- ofstream fileFrame("C:\\Users\\john\\Desktop\\txt\\Frame.txt");
- for(int i = 0; i < frameSize; i++)
- fileFrame << frameSample[100 * frameSize + i] << endl;
-
- delete []hamWin;
- return FFTSample;
- }
-
- void DCT(double mel[400][40], double c[400][40], int frameNum)
- {
- for(int k = 0; k < frameNum; k++)
- {
- for(int i = 0; i < 13; i++)
- {
- for(int j = 0; j < filterNum; j++)
- {
- c[k][i] += mel[k][j] * cos(Pi * i / (2 * filterNum) * (2 * j + 1));
-
-
- }
- }
- }
-
- }
-
- void computeMel(double mel[400][40], int sampleRate, double *FFTSample, int frameNum, int frameSize)
- {
- double freMax = sampleRate / 2;
- double freMin = 0;
- double melFremax = 1125 * log(1 + freMax / 700);
- double melFremin = 1125 * log(1 + freMin / 700);
- double melFilters[filterNum][3];
- double k = (melFremax - melFremin) / (filterNum + 1);
-
- double *m = new double[filterNum + 2];
- double *h = new double[filterNum + 2];
- double *f = new double[filterNum + 2];
-
- for(int i = 0; i < filterNum + 2; i++)
- {
- m[i] = melFremin + k * i;
- h[i] = 700 * (exp(m[i] / 1125) - 1);
- f[i] = floor((frameSize + 1) * h[i] / sampleRate);
- }
-
- delete[] m;
- delete[] h;
-
-
- for(int k = 0; k < frameNum; k++)
- {
- for(int j = 0; j < filterNum; j++)
- mel[k][j] = 0;
- }
-
-
- for(int i = 0; i < frameNum; i++)
- {
- for(int j = 1; j <= filterNum; j++)
- {
- double temp = 0;
- for(int z = 0; z < frameSize; z++)
- {
- if(z < f[j - 1])
- temp = 0;
- else if(z >= f[j - 1] && z <= f[j])
- temp = (z - f[j - 1]) / (f[j] - f[j - 1]);
- else if(z >= f[j] && z <= f[j + 1])
- temp = (f[j + 1] - z) / (f[j + 1] - f[j]);
- else if(z > f[j + 1])
- temp = 0;
- mel[i][j - 1] += FFTSample[i * frameSize + z] * temp;
- }
- }
- }
-
- ofstream fileMel("C:\\Users\\john\\Desktop\\txt\\Mel.txt");
- for(int i = 1; i <= filterNum; i++)
- fileMel << mel[100][i] << endl;
-
-
- for(int i = 0; i < frameNum; i++)
- {
- for(int j = 0; j < filterNum; j++)
- {
- if(mel[i][j] <= 0.00000000001 || mel[i][j] >= 0.00000000001)
- mel[i][j] = log(mel[i][j]);
- }
- }
- }
-
- void writeToFile(int frameNum, int frameSize, int hopStep, double DCT[400][40], double *sample, double *Sample, double *frameSample, double *FFTSample, double mel[400][40])
- {
- ofstream fileDCT("C:\\Users\\john\\Desktop\\txt\\DCT.txt");
- ofstream filesample("C:\\Users\\john\\Desktop\\txt\\sample.txt");
- ofstream filePreemphasized("C:\\Users\\john\\Desktop\\txt\\Preemphasized.txt");
- ofstream fileFft("C:\\Users\\john\\Desktop\\txt\\Fft.txt");
- ofstream fileLogMel("C:\\Users\\john\\Desktop\\txt\\LogMel.txt");
- for(int i = 0; i < frameSize; i++)
- {
- filesample << sample[100 * hopStep + i] << endl;
- filePreemphasized << Sample[100 * hopStep + i] << endl;
- fileFft << FFTSample[100 * frameSize + i] << endl;
- }
-
- for(int i = 0; i < filterNum; i++)
- fileLogMel << mel[100][i] << endl;
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- for (int i = 0; i < 13; i++)
- {
- for (int j = 0; j < frameNum; j++)
- fileDCT << DCT[j][i] << " ";
- fileDCT << endl;
- }
- }
-
-
- void MFCC(double *sample, int len)
- {
- double factor = 0.95;
- double *Sample;
-
- Sample = pre_emphasizing(sample, len, factor);
-
-
- int frameSize = (int)pow(2, ceil( log(Win_Time * sampleRate) / log(2.0) ));
- double *frameSample = NULL, *FFTSample = NULL;
- int frameSampleLen;
-
-
- FFTSample = mfccFrame(frameSample, Sample, &len, frameSize, frameSampleLen);
-
- int hopStep = Hop_Time * sampleRate;
- int frameNum = ceil(double(len) / hopStep);
-
- double mel[400][40];
- computeMel(mel, sampleRate, FFTSample, frameNum, frameSize);
-
- double c[400][40];
- for(int i = 0; i < 400; i++)
- {
- for(int j = 0; j < 40; j++)
- c[i][j] = 0;
- }
- DCT(mel, c, frameNum);
-
- writeToFile(frameNum, frameSize, hopStep, c, sample, Sample, frameSample, FFTSample, mel);
- }
-
- int main()
- {
- ifstream filedata("C:\\Users\\john\\Desktop\\txt\\Data.txt");
- int len = 0;
-
- double *sample = new double[160000];
- while(!filedata.eof())
- {
- filedata >> sample[len];
- sample[len] = sample[len] * 1000;
- len++;
- }
- cout << len << endl;
- MFCC(sample, len);
- delete [] sample;
-
- return 0;
- }
Matlab画出结果图
1.feature矩阵的色图
- clc;
- clear all;
- original = importdata('C:\\Users\\john\\Desktop\\txt\\Data.txt');
- figure;
- plot(original * 1000);
- DCT = importdata('C:\\Users\\john\\Desktop\\txt\\DCT.txt');
- figure;
- imagesc(DCT);
- colorbar;
- xlabel('cepstrum after DCT');
2.单帧数据变化图
- sample = importdata('C:\\Users\\john\\Desktop\\txt\\sample.txt');
- subplot(2, 3, 1),
- plot(sample);
- axis([0, 400, -60, 60]);
- xlabel('400 sample segment from 16kHz signal');
-
- yujiazhong = importdata('C:\\Users\\john\\Desktop\\txt\\Preemphasized.txt');
- subplot(2, 3, 2),
- plot(yujiazhong);
- axis([0, 400, -20, 20]);
- xlabel('preemphasized');
-
- frame = importdata('C:\\Users\\john\\Desktop\\txt\\Frame.txt');
- subplot(2, 3, 3),
- plot(frame);
- axis([0, 400, -25, 25]);
- xlabel('windowed')
-
- fft = importdata('C:\\Users\\john\\Desktop\\txt\\Fft.txt');
- subplot(2, 3, 4),
- max(fft)
- plot(fft);
- axis([0, 256, 0, 1000000]);
- xlabel('Power spectrum');
-
- mel = importdata('C:\\Users\\john\\Desktop\\txt\\Mel.txt');
- subplot(2, 3, 5),
- plot(mel);
- xlabel('40 point Mel spectrum');
-
- duishu = importdata('C:\\Users\\john\\Desktop\\txt\\LogMel.txt');
- subplot(2, 3, 6),
- plot(duishu);
- axis([0, 40, 0, 15]);
- xlabel('Log Mel spectrum');
参考资料:
1.http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
2.http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf
3.http://blog.csdn.net/zouxy09/article/details/9156785
4.http://my.oschina.net/jamesju/blog/193343
最后,引用龙应台的一句话作为总结:
”有些事,只能一个人做;有些关,只能一个人过;有些路啊,只能一个人走。“