This post uses a support vector machine (SVM) to classify noise types, which in turn can improve speech enhancement and voice activity detection (VAD).
When training samples are given as input/output pairs, the learning method is called supervised learning.
When samples are given as inputs only, with no output labels, it is called unsupervised learning, which covers areas such as density estimation, distribution learning, and clustering.
There are also other paradigms such as query learning and reinforcement learning. By training procedure, methods can further be divided into batch learning, online learning, and so on.
By the form of the output, supervised problems are divided into binary classification, multi-class classification, and regression.
The most commonly used SVM toolkit for MATLAB is libSVM; MATLAB's built-in SVM functions are comparatively limited in scope.
1. Download, compile, and test libSVM (see the URL below)
2. Prepare the training data set
The noise data comes from the NOISEX-92 noise corpus. Each file is downsampled to an 8000 Hz signal, MFCC feature parameters are extracted, and each frame's feature vector is tagged with a class label.
Noise data address
The labeling code is as follows:
%{
Noise-class labeling, step by step:
1. Extract features (e.g., MFCC parameters) from the first half of each
   noise file in the corpus
2. Store the features of every frame in a matrix, tag each frame with its
   class label, and write the labels into a column vector
%}
clc
clear all
close all
% Class label to append
label = 15;
%% Read the audio file
% Noise file for each class label (1..15)
noiseFiles = {'babble8000.wav','buccaneer18000.wav','buccaneer28000.wav', ...
    'destroyerops8000.wav','f168000.wav','factory18000.wav', ...
    'factory28000.wav','hfchannel8000.wav','leopard8000.wav', ...
    'm1098000.wav','machinegun8000.wav','pink8000.wav', ...
    'volvo8000.wav','white8000.wav','destroyerengine8000.wav'};
noiseDir = 'G:\音频文件\principle_voice_en\pureNoise\';
[audioData,fs] = audioread([noiseDir noiseFiles{label}]);
%% Extract the MFCC audio features
% Audio preprocessing
smooth = 0;       % 1: smooth the input data, 0: leave it as-is
preEmphasis = 0;  % 1: apply pre-emphasis, 0: skip it
audioData = audioData/max(abs(audioData)); % normalize the audio data
% Convert to mono
if (size(audioData,2)>1)
    audioData = sum(audioData,2)/2; % average the two channels
end
% Optional smoothing
if (smooth == 1)
    audioData = smoothdata(audioData);
end
% Optional pre-emphasis
if preEmphasis == 1
    audioData = filter([1 -0.9375],1,audioData);
end
% Split the audio data into frames
numSamplesPerFrame = 256;
stStep = 80;
overlap = numSamplesPerFrame-stStep;
mtWinRatio = 40;   % mid-term window ratios (not used below)
mtStepRatio = 20;
numOfShortFeatures = 13; % number of MFCCs per frame
audioArray = buffer(audioData,numSamplesPerFrame,overlap,'nodelay');
[winLength,numOfFrames] = size(audioArray);
% Initialize the MFCC filter bank
mfccParams = feature_mfccs_init(winLength,fs);
% Short-time feature extraction
Ham = window(@hamming, winLength); % compute the window once, outside the loop
Features = zeros(numOfFrames,numOfShortFeatures);
for iloop1 = 1:numOfFrames
    curFrame = audioArray(:,iloop1);
    curFrameWin = curFrame .* Ham;        % windowed frame
    curFrameFFT = getDFT(curFrameWin,fs); % magnitude spectrum of the frame
    Features(iloop1,:) = feature_mfccs(curFrameFFT, mfccParams)';
end
%% Keep the first half of the MFCC data as the training set, label it, and save
labelVector = ones(floor(numOfFrames/2),1)*label;
instArray = Features(1:floor(numOfFrames/2),:);
if exist('noiseClassTrainingData.mat','file')
    % Append to any previously saved classes
    tmpData = load('noiseClassTrainingData.mat');
    if length(fieldnames(tmpData))==2
        labelVector = [tmpData.labelVector;labelVector];
        instArray = [tmpData.instArray;instArray];
    end
    save noiseClassTrainingData.mat labelVector instArray -append;
else
    save noiseClassTrainingData.mat labelVector instArray;
end
3. Training and classification
Pass the label vector and the feature matrix to libSVM's svmtrain function to obtain a trained model.
%{
SVM noise-classification training:
1. Load the training data
2. Train the model
3. Save the trained model to a .mat file
%}
clc
clear all
close all
%% Load the training data
load('noiseClassTrainingData.mat');
%% Train the model
modelNoiseClass = svmtrain(labelVector,instArray,'-c 1 -g 0.07'); %#ok<SVMTRAIN>
save trainModel.mat modelNoiseClass;
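As a cross-check, the same training call can be sketched in Python: scikit-learn's `SVC` also wraps libsvm, and its `C`/`gamma` parameters correspond to the `'-c 1 -g 0.07'` option string above. The feature matrix here is synthetic stand-in data, not the saved MFCC features:

```python
# RBF-kernel SVM training, mirroring svmtrain(labelVector, instArray, '-c 1 -g 0.07').
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two synthetic "noise classes" as well-separated 13-dimensional clusters
X = np.vstack([rng.normal(0.0, 0.2, (100, 13)),
               rng.normal(1.0, 0.2, (100, 13))])
y = np.array([1] * 100 + [2] * 100)

model = SVC(C=1.0, gamma=0.07, kernel="rbf")  # equivalent of '-c 1 -g 0.07'
model.fit(X, y)
print(model.score(X, y))
```

Persisting the model (the `save trainModel.mat` step) would be done with `joblib.dump` or `pickle` on the Python side.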
Pass the MFCC feature matrix of the noise to be classified, together with the trained model, to libSVM's svmpredict function to classify the noise.
%{
Noise-classification test:
1. Load the trained model
2. Read a specific noise recording
3. Extract its MFCC parameters
4. Classify the noise from the parameters
5. Measure the classification accuracy
%}
clc
clear all
close all
%% Read the audio data
% Choose the file to read
label = 13;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  1. babble;      2. buccaneer1;   3. buccaneer2;   4. destroyerops;
%  5. f16;         6. factory1;     7. factory2;     8. hfchannel;
%  9. leopard;    10. m109;        11. machinegun;  12. pink;
% 13. volvo;      14. white;       15. destroyerengine
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read the audio file
noiseFiles = {'babble8000.wav','buccaneer18000.wav','buccaneer28000.wav', ...
    'destroyerops8000.wav','f168000.wav','factory18000.wav', ...
    'factory28000.wav','hfchannel8000.wav','leopard8000.wav', ...
    'm1098000.wav','machinegun8000.wav','pink8000.wav', ...
    'volvo8000.wav','white8000.wav','destroyerengine8000.wav'};
noiseDir = 'G:\音频文件\principle_voice_en\pureNoise\';
[audioData,fs] = audioread([noiseDir noiseFiles{label}]);
%% Extract the MFCC features
% Audio preprocessing
smooth = 0;       % 1: smooth the input data, 0: leave it as-is
preEmphasis = 0;  % 1: apply pre-emphasis, 0: skip it
audioData = audioData/max(abs(audioData)); % normalize the audio data
% Convert to mono
if (size(audioData,2)>1)
    audioData = sum(audioData,2)/2; % average the two channels
end
% Optional smoothing
if (smooth == 1)
    audioData = smoothdata(audioData);
end
% Optional pre-emphasis
if preEmphasis == 1
    audioData = filter([1 -0.9375],1,audioData);
end
% Split the audio data into frames
numSamplesPerFrame = 256;
stStep = 80;
overlap = numSamplesPerFrame-stStep;
mtWinRatio = 40;   % mid-term window ratios (not used below)
mtStepRatio = 20;
numOfShortFeatures = 13; % number of MFCCs per frame
audioArray = buffer(audioData,numSamplesPerFrame,overlap,'nodelay');
[winLength,numOfFrames] = size(audioArray);
% Initialize the MFCC filter bank
mfccParams = feature_mfccs_init(winLength,fs);
% Short-time feature extraction
Ham = window(@hamming, winLength); % compute the window once, outside the loop
Features = zeros(numOfFrames,numOfShortFeatures);
for iloop1 = 1:numOfFrames
    curFrame = audioArray(:,iloop1);
    curFrameWin = curFrame .* Ham;        % windowed frame
    curFrameFFT = getDFT(curFrameWin,fs); % magnitude spectrum of the frame
    Features(iloop1,:) = feature_mfccs(curFrameFFT, mfccParams)';
end
%% Noise-classification test
% MFCC features of the second half of the recording (unseen during training)
testInstArray = Features(floor(numOfFrames/2)+1:end,:);
resultLabel = ones(size(testInstArray,1),1)*label;
load('trainModel.mat');
[resultPredict,accuracy,dec_value] = svmpredict(resultLabel,testInstArray,modelNoiseClass);
plot(resultPredict)
%% Result statistics
result1 = resultPredict - label;      % zero wherever the prediction is correct
result2 = (result1 == 0);
numZero = sum(result2);
rate = numZero/size(testInstArray,1); % per-frame classification accuracy
disp(rate)
Statistics of the classification result:
Accuracy = 99.9915% (11760/11761) (classification)
0.9999
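The accuracy bookkeeping in the test script (subtract the true label, count the zeros of the difference) reduces to a one-liner in NumPy; the predictions below are hypothetical stand-ins for `svmpredict`'s output:

```python
# Fraction of frames whose predicted label matches the true class label.
import numpy as np

true_label = 13
predicted = np.array([13, 13, 13, 12, 13])  # hypothetical per-frame predictions
rate = np.sum(predicted - true_label == 0) / predicted.size
print(rate)  # 4 of 5 frames correct -> 0.8
```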
Helper functions used above:
Discrete Fourier transform:
function [FFT, Freq] = getDFT(signal, Fs, PLOT)
%
% function [FFT, Freq] = getDFT(signal, Fs, PLOT)
%
% This function returns the DFT of a discrete signal and the
% respective frequency range.
%
% ARGUMENTS:
% - signal: vector containing the samples of the signal
% - Fs: the sampling frequency
% - PLOT: use this argument if the FFT (and the respective
% frequency values) need to be returned in the
% [-fs/2..fs/2] range. Otherwise, only half of
% the spectrum is returned.
%
% RETURNS:
% - FFT: the magnitude of the DFT coefficients
% - Freq: the corresponding frequencies (in Hz)
%
N = length(signal); % length of signal
% compute the magnitude of the spectrum
% (and normalize by the number of samples):
FFT = abs(fft(signal)) / N;
if nargin==2  % return the first half of the spectrum:
    FFT = FFT(1:ceil(N/2));
    Freq = (Fs/2) * (1:ceil(N/2)) / ceil(N/2); % define the frequency axis
elseif nargin==3
    % ... or return the whole spectrum
    % (in the range -fs/2 to fs/2)
    FFT = fftshift(FFT);
    if mod(N,2)==0 % define the frequency axis:
        Freq = -N/2:N/2-1;       % if N is even
    else
        Freq = -(N-1)/2:(N-1)/2; % if N is odd
    end
    Freq = (Fs/2) * Freq ./ ceil(N/2);
end
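The two-argument branch of getDFT (magnitude spectrum normalized by N, first half only) can be sketched in NumPy. Note that the frequency axis keeps the original code's convention, which is shifted one bin from the usual k·fs/N mapping:

```python
# NumPy equivalent of getDFT(signal, Fs): normalized magnitude spectrum,
# first half of the bins only.
import numpy as np

def get_dft(signal, fs):
    n = len(signal)
    mag = np.abs(np.fft.fft(signal)) / n          # magnitude, normalized by N
    half = int(np.ceil(n / 2))
    freq = (fs / 2) * np.arange(1, half + 1) / half  # getDFT's axis convention
    return mag[:half], freq

fs = 8000
t = np.arange(256) / fs
mag, freq = get_dft(np.sin(2 * np.pi * 1000 * t), fs)
print(np.argmax(mag))  # a 1000 Hz tone at fs=8000, N=256 lands in bin 32
```

A 1000 Hz tone spans exactly 32 cycles in 256 samples at 8 kHz, so all of its energy falls in bin 32 with no spectral leakage.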
MFCC parameter initialization:
function mfccParams = feature_mfccs_init(windowLength,fs)
% function mfccParams = feature_mfccs_init(windowLength,fs)
%
% This function initializes the quantities used in the
% MFCC calculation.
%
% ARGUMENTS:
% - windowLength: the length of the analysis window (in number
%   of samples)
% - fs: the sampling frequency
%
% RETURNS:
% mfccParams: returns a structure with the MFCC params:
%
% number of cepstral coefficients:
mfccParams.cepstralCoefficients = 13;
% fft resolution:
mfccParams.fftSize = round(windowLength/2);
% filter parameters:
mfccParams.lowestFrequency = 133.333;
mfccParams.linearFilters = 13;
mfccParams.linearSpacing = 66.666666;
mfccParams.logFilters = 27;
mfccParams.logSpacing = 1.0711703;
mfccParams.totalFilters = mfccParams.linearFilters + ...
mfccParams.logFilters;
mfccParams.freqs = mfccParams.lowestFrequency + ...
(0:mfccParams.linearFilters-1)*mfccParams.linearSpacing;
mfccParams.freqs(mfccParams.linearFilters+1:mfccParams.totalFilters+2) = ...
mfccParams.freqs(mfccParams.linearFilters)*...
mfccParams.logSpacing.^(1:mfccParams.logFilters+2);
mfccParams.lower = mfccParams.freqs(1:mfccParams.totalFilters);
mfccParams.center = mfccParams.freqs(2:mfccParams.totalFilters+1);
mfccParams.upper = mfccParams.freqs(3:mfccParams.totalFilters+2);
mfccParams.mfccFilterWeights = zeros(mfccParams.totalFilters,mfccParams.fftSize);
mfccParams.triangleHeight = 2./(mfccParams.upper-mfccParams.lower);
mfccParams.fftFreqs = (0:mfccParams.fftSize-1)/mfccParams.fftSize*fs;
for chan = 1:mfccParams.totalFilters % for each filter:
    % compute the respective triangular filter weights
    % (rising edge from lower to center, falling edge from center to upper):
    mfccParams.mfccFilterWeights(chan,:) = ...
        (mfccParams.fftFreqs > mfccParams.lower(chan) & ...
         mfccParams.fftFreqs <= mfccParams.center(chan)) .* ...
        mfccParams.triangleHeight(chan) .* ...
        (mfccParams.fftFreqs - mfccParams.lower(chan)) / ...
        (mfccParams.center(chan) - mfccParams.lower(chan)) + ...
        (mfccParams.fftFreqs > mfccParams.center(chan) & ...
         mfccParams.fftFreqs < mfccParams.upper(chan)) .* ...
        mfccParams.triangleHeight(chan) .* ...
        (mfccParams.upper(chan) - mfccParams.fftFreqs) / ...
        (mfccParams.upper(chan) - mfccParams.center(chan));
end
% matrix used in the DCT calculation :
mfccParams.mfccDCTMatrix = 1/sqrt(mfccParams.totalFilters/2)*...
cos((0:(mfccParams.cepstralCoefficients-1))'*...
(2*(0:(mfccParams.totalFilters-1))+1)*pi/2/mfccParams.totalFilters);
mfccParams.mfccDCTMatrix(1,:) = mfccParams.mfccDCTMatrix(1,:)*sqrt(2)/2;
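The filter-bank construction above (13 linearly spaced plus 27 log-spaced triangular filters, each with height 2/(upper−lower)) condenses to a NumPy sketch; the constants are copied from the MATLAB code, and `mfcc_filterbank` is an illustrative name:

```python
# Triangular filter-bank weights as built in feature_mfccs_init.
import numpy as np

def mfcc_filterbank(window_length, fs):
    fft_size = round(window_length / 2)
    lowest, lin_filters, lin_spacing = 133.333, 13, 66.666666
    log_filters, log_spacing = 27, 1.0711703
    total = lin_filters + log_filters
    # Edge frequencies: linear spacing, then geometric spacing
    freqs = np.zeros(total + 2)
    freqs[:lin_filters] = lowest + np.arange(lin_filters) * lin_spacing
    freqs[lin_filters:] = freqs[lin_filters - 1] * \
        log_spacing ** np.arange(1, log_filters + 3)
    lower, center, upper = freqs[:total], freqs[1:total + 1], freqs[2:total + 2]
    height = 2.0 / (upper - lower)                  # unit-area triangles
    fft_freqs = np.arange(fft_size) / fft_size * fs
    weights = np.zeros((total, fft_size))
    for ch in range(total):
        rising = (fft_freqs > lower[ch]) & (fft_freqs <= center[ch])
        falling = (fft_freqs > center[ch]) & (fft_freqs < upper[ch])
        weights[ch, rising] = height[ch] * \
            (fft_freqs[rising] - lower[ch]) / (center[ch] - lower[ch])
        weights[ch, falling] = height[ch] * \
            (upper[ch] - fft_freqs[falling]) / (upper[ch] - center[ch])
    return weights

w = mfcc_filterbank(256, 8000)
print(w.shape)  # 40 filters over 128 FFT bins
```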
Mel-cepstral coefficient extraction:
function ceps = feature_mfccs(windowFFT,mfccParams)
% function ceps = feature_mfccs(windowFFT,mfccParams)
% This function computes the mfccs using the provided DFT.
% The parameters (DCT,filter banks,etc) need to have been
% computed using the feature_mfccs_init function.
earMag = log10(mfccParams.mfccFilterWeights*windowFFT+eps);
ceps = mfccParams.mfccDCTMatrix*earMag;
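feature_mfccs itself is just a log filter-bank followed by a DCT. A NumPy sketch using the same DCT matrix as the init code; `mfcc_from_fft` is an illustrative name, and `window_fft`/`filter_weights` are assumed inputs:

```python
# Log filter-bank energies followed by a DCT, as in feature_mfccs.
import numpy as np

def mfcc_from_fft(window_fft, filter_weights, n_ceps=13):
    total = filter_weights.shape[0]
    # log10 of the filter-bank outputs; eps guards against log(0)
    ear_mag = np.log10(filter_weights @ window_fft + np.finfo(float).eps)
    # DCT matrix identical to mfccParams.mfccDCTMatrix
    n = np.arange(n_ceps)[:, None]
    k = np.arange(total)[None, :]
    dct = (1 / np.sqrt(total / 2)) * np.cos(n * (2 * k + 1) * np.pi / (2 * total))
    dct[0, :] *= np.sqrt(2) / 2
    return dct @ ear_mag

# Toy check: a flat filter-bank output (log10 = 1 in every band) puts all
# the energy in the 0th cepstral coefficient.
ceps = mfcc_from_fft(10 * np.ones(4), np.eye(4), n_ceps=3)
print(np.round(ceps, 6))
```

Because the higher DCT basis vectors sum to zero over the bands, a constant log-energy vector yields zeros everywhere except coefficient 0.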