Noise Classification with SVM

Use a support vector machine (SVM) to classify noise types, which can improve the performance of speech enhancement and voice activity detection (VAD).

When examples are given as input/output pairs, the learning method is called supervised learning.

When examples are given as inputs only, with no outputs, it is called unsupervised learning, which covers areas such as density estimation, distribution learning, and clustering.

There are also other settings such as query learning and reinforcement learning. Classified by learning process, methods can further be divided into batch learning, online learning, and so on.

By the type of output, learning problems are divided into binary classification, multi-class classification, and regression.

The most commonly used SVM toolkit for MATLAB is libSVM; MATLAB's built-in SVM functions are narrower in scope.

1. Download, compile, and test libSVM (see the link below)

Install and test libSVM

2. Prepare the training data set

The noise data come from the NOISEX-92 noise database. Each recording is downsampled to 8000 Hz, MFCC features are extracted, and every frame's feature vector is tagged with its class label.
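The downsampling step is not shown in the article's code; a hedged Python sketch of it, assuming SciPy, is below. NOISEX-92 recordings are sampled at 19.98 kHz, so resampling to 8000 Hz corresponds to the rational ratio 400/999:

```python
# Illustrative sketch only: a synthetic test tone stands in for a NOISEX-92 file.
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 19980, 8000
t = np.arange(fs_in) / fs_in                 # 1 second of signal
x = np.sin(2 * np.pi * 440 * t)              # 440 Hz test tone

# Polyphase resampling with built-in anti-aliasing filter
y = resample_poly(x, up=400, down=999)       # 8000/19980 = 400/999
print(len(y))                                # 8000 samples for 1 s of audio
```

`resample_poly` applies an anti-aliasing low-pass filter before decimation, which matters here because NOISEX-92 noises (e.g. white, pink) have energy above 4 kHz.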

Noise data download:

NOISEX-92

The labeling code is as follows:

%{
 Noise labeling steps:
 1. Extract features (e.g., MFCC parameters) from the first half of each
    noise file in the database.
 2. Store the per-frame features in one matrix, assign a class label to
    every frame, and write the labels into a column vector.
%}

clc
clear all
close all

% Class label to assign
label = 15;

%% Read the audio file
switch(label)
    case 1
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\babble8000.wav');
    case 2
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\buccaneer18000.wav');
    case 3
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\buccaneer28000.wav');
    case 4
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\destroyerops8000.wav');
    case 5
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\f168000.wav');
    case 6
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\factory18000.wav');
    case 7
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\factory28000.wav');
    case 8
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\hfchannel8000.wav');
    case 9
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\leopard8000.wav');
    case 10
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\m1098000.wav');
    case 11
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\machinegun8000.wav');
    case 12
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\pink8000.wav');
    case 13
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\volvo8000.wav');
    case 14
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\white8000.wav');
    case 15
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\destroyerengine8000.wav');   
end

%% Extract MFCC features

% Audio preprocessing
smooth = 0;                 % 1: smooth the input data, 0: no smoothing
preEmphasis = 0;            % 1: apply pre-emphasis, 0: skip it
% Convert to mono first so the normalization below sees a column vector
if (size(audioData,2)>1)
    audioData = mean(audioData,2);
end
audioData = audioData/max(abs(audioData));  % normalize to [-1,1]
% Optional smoothing
if (smooth == 1)
    audioData = smoothdata(audioData);
end
% Optional pre-emphasis
if preEmphasis == 1
    audioData = filter([1 -0.9375],1,audioData);
end

% Frame the audio data
numSamplesPerFrame = 256;   % frame length in samples
stStep = 80;                % hop size in samples
overlap = numSamplesPerFrame-stStep;
numOfShortFeatures = 13;    % number of MFCCs per frame
audioArray = buffer(audioData,numSamplesPerFrame,overlap,'nodelay');
[winLength,numOfFrames] = size(audioArray);

% Initialize the MFCC filter bank
mfccParams = feature_mfccs_init(winLength,fs);

% Short-time feature extraction
Features = zeros(numOfFrames,numOfShortFeatures);
Ham = window(@hamming, winLength);   % analysis window, computed once
                                     % (a flat-top window could be tried instead)
for iloop1 = 1:numOfFrames
    curFrame = audioArray(:,iloop1);
    curFrameWin  = curFrame .* Ham;                           % windowed frame
    curFrameFFT = getDFT(curFrameWin,fs);                     % one-sided magnitude spectrum
    Features(iloop1,:) = feature_mfccs(curFrameFFT, mfccParams)';
end

%% Keep the first half of the MFCC frames as training data, label them, and save
labelVector = ones(floor(numOfFrames/2),1)*label;
instArray = Features(1:floor(numOfFrames/2),:);
if exist('noiseClassTrainingData.mat','file')
    % Append to the data accumulated from previously processed noise types
    tmpData = load('noiseClassTrainingData.mat');
    if length(fieldnames(tmpData))==2
        labelVector = [tmpData.labelVector;labelVector];
        instArray = [tmpData.instArray;instArray];
    end
    save noiseClassTrainingData.mat labelVector instArray -append;
else
    save noiseClassTrainingData.mat labelVector instArray;
end
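The framing performed by MATLAB's `buffer` above (256-sample frames with an 80-sample hop, i.e. 176 samples of overlap) can be sketched in NumPy. This is an illustrative approximation: unlike `buffer`, it drops the final partial frame instead of zero-padding it.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=80):
    """Split x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

x = np.arange(1000, dtype=float)   # toy "signal" whose samples equal their index
frames = frame_signal(x)

print(frames.shape)        # (10, 256): 1 + (1000-256)//80 full frames
print(frames[1][0])        # 80.0 -- each frame starts one hop after the previous
```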

 

3. Training and classification

Pass the label vector and the feature matrix to svmtrain to obtain a trained model.

%{
SVM noise-classification training:
1. Load the training data.
2. Train the model.
3. Save the trained model to a .mat file.
%}

clc 
clear all
close all

%% Load the training data
load('noiseClassTrainingData.mat');

%% Train
% -c 1: penalty parameter C = 1; -g 0.07: RBF-kernel gamma = 0.07
modelNoiseClass = svmtrain(labelVector,instArray,'-c 1 -g 0.07'); %#ok<SVMTRAIN>
save trainModel.mat modelNoiseClass;
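The `-c 1 -g 0.07` options select an RBF-kernel SVM with C = 1 and gamma = 0.07. If libSVM's MATLAB interface is unavailable, the same training can be sketched in Python with scikit-learn (whose `SVC` wraps libSVM, so `C` and `gamma` map directly). The synthetic data below stand in for the MFCC feature matrix; this is an assumption for illustration, not the article's setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two synthetic "noise classes": 13-dim feature vectors around different means
X = np.vstack([rng.normal(0.0, 0.3, (200, 13)),
               rng.normal(2.0, 0.3, (200, 13))])
y = np.array([1] * 200 + [2] * 200)

# Equivalent of svmtrain(labelVector, instArray, '-c 1 -g 0.07')
model = SVC(C=1.0, gamma=0.07, kernel="rbf")
model.fit(X, y)

pred = model.predict(X)                 # equivalent of svmpredict
accuracy = (pred == y).mean()
print(accuracy)                         # well-separated classes => near-perfect
```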

Pass the MFCC feature matrix of the noise to be classified, together with the trained model, to svmpredict to classify the noise.

%{
Noise-classification test:
1. Load the trained model.
2. Read a specific noise recording.
3. Extract its MFCC parameters.
4. Classify each frame with the model.
5. Compute the classification accuracy.
%}
 
clc
clear all
close all

%% Read audio data
% Select the file to read (ground-truth class label)
label = 13;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1: babble;      2: buccaneer1;  3: buccaneer2;  4: destroyerops;
% 5: f16;         6: factory1;    7: factory2;    8: hfchannel;
% 9: leopard;     10: m109;       11: machinegun; 12: pink;
% 13: volvo;      14: white;      15: destroyerengine
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read the audio file
switch(label)
    case 1
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\babble8000.wav');
    case 2
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\buccaneer18000.wav');
    case 3
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\buccaneer28000.wav');
    case 4
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\destroyerops8000.wav');
    case 5
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\f168000.wav');
    case 6
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\factory18000.wav');
    case 7
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\factory28000.wav');
    case 8
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\hfchannel8000.wav');
    case 9
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\leopard8000.wav');
    case 10
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\m1098000.wav');
    case 11
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\machinegun8000.wav');
    case 12
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\pink8000.wav');
    case 13
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\volvo8000.wav');
    case 14
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\white8000.wav');
    case 15
        [audioData,fs] = audioread('G:\音频文件\principle_voice_en\pureNoise\destroyerengine8000.wav');   
end


%% Extract MFCC features
% Audio preprocessing
smooth = 0;                 % 1: smooth the input data, 0: no smoothing
preEmphasis = 0;            % 1: apply pre-emphasis, 0: skip it
% Convert to mono first so the normalization below sees a column vector
if (size(audioData,2)>1)
    audioData = mean(audioData,2);
end
audioData = audioData/max(abs(audioData));  % normalize to [-1,1]
% Optional smoothing
if (smooth == 1)
    audioData = smoothdata(audioData);
end
% Optional pre-emphasis
if preEmphasis == 1
    audioData = filter([1 -0.9375],1,audioData);
end

% Frame the audio data
numSamplesPerFrame = 256;   % frame length in samples
stStep = 80;                % hop size in samples
overlap = numSamplesPerFrame-stStep;
numOfShortFeatures = 13;    % number of MFCCs per frame
audioArray = buffer(audioData,numSamplesPerFrame,overlap,'nodelay');
[winLength,numOfFrames] = size(audioArray);

% Initialize the MFCC filter bank
mfccParams = feature_mfccs_init(winLength,fs);

% Short-time feature extraction
Features = zeros(numOfFrames,numOfShortFeatures);
Ham = window(@hamming, winLength);   % analysis window, computed once
for iloop1 = 1:numOfFrames
    curFrame = audioArray(:,iloop1);
    curFrameWin  = curFrame .* Ham;                           % windowed frame
    curFrameFFT = getDFT(curFrameWin,fs);                     % one-sided magnitude spectrum
    Features(iloop1,:) = feature_mfccs(curFrameFFT, mfccParams)';
end


%% Noise-classification test
% Use the second half of the MFCC frames (the first half was used for training)
testInstArray = Features(floor(numOfFrames/2)+1:end,:);
resultLabel = ones(size(testInstArray,1),1)*label;   % ground-truth labels
load('trainModel.mat');
[resultPredict,accuracy,dec_value] = svmpredict(resultLabel,testInstArray,modelNoiseClass);
plot(resultPredict)


%% Result statistics
numCorrect = sum(resultPredict == label);    % frames classified correctly
rate = numCorrect/size(testInstArray,1);
disp(rate)

Output of the run above:

Accuracy = 99.9915% (11760/11761) (classification)
    0.9999

Helper functions used above:

Discrete Fourier transform helper

function [FFT, Freq] = getDFT(signal, Fs, PLOT)

%
% function [FFT, Freq] = getDFT(signal, Fs, PLOT)
%
% This function returns the DFT of a discrete signal and the 
% respective frequency range.
% 
% ARGUMENTS:
% - signal: vector containing the samples of the signal
% - Fs:     the sampling frequency
% - PLOT:   use this argument if the FFT (and the respective 
%           frequency values) need to be returned in the 
%           [-fs/2..fs/2] range. Otherwise, only half of 
%           the spectrum is returned.
%
% RETURNS:
% - FFT:    the magnitude of the DFT coefficients
% - Freq:   the corresponding frequencies (in Hz)
%

N = length(signal);  % length of signal
% compute the magnitude of the spectrum
% (and normalize by the number of samples):
FFT = abs(fft(signal)) / N;

if nargin==2 % return the first half of the spectrum:
    FFT = FFT(1:ceil(N/2));    
    Freq = (Fs/2) * (1:ceil(N/2)) / ceil(N/2);  % define the frequency axis
else
    if (nargin==3) 
        % ... or return the whole spectrum 
        %     (in the range -fs/2 to fs/2)
        FFT = fftshift(FFT);        
        if mod(N,2)==0                      % define the frequency axis:
            Freq = -N/2:N/2-1;              % if N is even
        else
            Freq = -(N-1)/2:(N-1)/2;        % if N is odd
        end
        Freq = (Fs/2) * Freq ./ ceil(N/2);
    end
end
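The one-sided magnitude spectrum computed by `getDFT` can be checked with a quick NumPy sketch: a 1000 Hz tone sampled at 8000 Hz completes exactly 32 cycles in a 256-sample frame, so the peak should land in bin 32 with normalized magnitude 0.5 (the tone's energy splits between the positive- and negative-frequency bins).

```python
import numpy as np

fs, N = 8000, 256
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 1000 * t)    # exactly 32 cycles per frame

mag = np.abs(np.fft.fft(x)) / N     # magnitude normalized by N, as in getDFT
half = mag[: N // 2]                # one-sided spectrum (N is even here)
peak_bin = int(np.argmax(half))

print(peak_bin)                     # 32
print(peak_bin * fs / N)            # 1000.0 -- bin k maps to k*fs/N Hz
```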

MFCC initialization helper

function mfccParams = feature_mfccs_init(windowLength,fs)

% function mfccParams = feature_mfccs_init(windowLength,fs)
%
% This function initializes the quantities used in the
% MFCC calculation.
%
% ARGUMENTS:
% - windowLength:               the length of the analysis window (in number
%                               of samples)
% - fs:                         the sampling frequency
%
% RETURNS:
% - mfccParams:                 a structure with the MFCC parameters
%

% number of cepstral coefficients:
mfccParams.cepstralCoefficients = 13;

% fft resolution:
mfccParams.fftSize = round(windowLength/2);
% filter parameters:
mfccParams.lowestFrequency = 133.333;
mfccParams.linearFilters = 13;
mfccParams.linearSpacing = 66.666666;
mfccParams.logFilters = 27;
mfccParams.logSpacing = 1.0711703;
mfccParams.totalFilters = mfccParams.linearFilters + ...
    mfccParams.logFilters;
mfccParams.freqs = mfccParams.lowestFrequency + ...
    (0:mfccParams.linearFilters-1)*mfccParams.linearSpacing;
mfccParams.freqs(mfccParams.linearFilters+1:mfccParams.totalFilters+2) = ...
    mfccParams.freqs(mfccParams.linearFilters)*...
    mfccParams.logSpacing.^(1:mfccParams.logFilters+2);
mfccParams.lower = mfccParams.freqs(1:mfccParams.totalFilters);
mfccParams.center = mfccParams.freqs(2:mfccParams.totalFilters+1);
mfccParams.upper = mfccParams.freqs(3:mfccParams.totalFilters+2);
mfccParams.mfccFilterWeights = zeros(mfccParams.totalFilters,mfccParams.fftSize);
mfccParams.triangleHeight = 2./(mfccParams.upper-mfccParams.lower);
% getDFT returns only the first half of the spectrum, so the fftSize bins
% span 0..fs/2, not 0..fs
mfccParams.fftFreqs = (0:mfccParams.fftSize-1)/mfccParams.fftSize*fs/2;

for chan = 1:mfccParams.totalFilters % for each filter:
    % compute the triangular filter weights: a rising slope from lower to
    % center, and a falling slope from center to upper
    mfccParams.mfccFilterWeights(chan,:) = ...
        (mfccParams.fftFreqs > mfccParams.lower(chan) & ...
         mfccParams.fftFreqs <= mfccParams.center(chan)) .* ...
        mfccParams.triangleHeight(chan) .* ...
        (mfccParams.fftFreqs - mfccParams.lower(chan)) / ...
        (mfccParams.center(chan) - mfccParams.lower(chan)) + ...
        (mfccParams.fftFreqs > mfccParams.center(chan) & ...
         mfccParams.fftFreqs < mfccParams.upper(chan)) .* ...
        mfccParams.triangleHeight(chan) .* ...
        (mfccParams.upper(chan) - mfccParams.fftFreqs) / ...
        (mfccParams.upper(chan) - mfccParams.center(chan));
end

% matrix used in the DCT calculation :
mfccParams.mfccDCTMatrix = 1/sqrt(mfccParams.totalFilters/2)*...
    cos((0:(mfccParams.cepstralCoefficients-1))'*...
    (2*(0:(mfccParams.totalFilters-1))+1)*pi/2/mfccParams.totalFilters);
mfccParams.mfccDCTMatrix(1,:) = mfccParams.mfccDCTMatrix(1,:)*sqrt(2)/2;
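The DCT matrix built above (cosine rows scaled by 1/sqrt(M/2), first row further scaled by sqrt(2)/2) is exactly the first 13 rows of an orthonormal DCT-II matrix, so its rows should be orthonormal. A NumPy sketch of the same construction can verify this:

```python
import numpy as np

n_ceps, n_filters = 13, 40          # cepstralCoefficients, totalFilters

k = np.arange(n_ceps)[:, None]      # cepstral index (rows)
n = np.arange(n_filters)[None, :]   # filter index (columns)
D = (1 / np.sqrt(n_filters / 2)) * np.cos(k * (2 * n + 1) * np.pi / (2 * n_filters))
D[0, :] *= np.sqrt(2) / 2           # same first-row scaling as mfccDCTMatrix

# Rows of an orthonormal DCT-II matrix: D @ D.T should be the identity
print(np.allclose(D @ D.T, np.eye(n_ceps)))   # True
```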

Mel-cepstrum extraction helper

function ceps = feature_mfccs(windowFFT,mfccParams)

% function ceps = feature_mfccs(windowFFT,mfccParams)
% This function computes the mfccs using the provided DFT.
% The parameters (DCT,filter banks,etc) need to have been 
% computed using the feature_mfccs_init function.

% Apply the mel filter bank, take the log of the band energies, then the
% DCT; eps guards against log10(0)
earMag = log10(mfccParams.mfccFilterWeights*windowFFT+eps);
ceps = mfccParams.mfccDCTMatrix*earMag;
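Putting the pieces together, `feature_mfccs` is just two matrix operations: filter-bank energies, log, DCT. A shape-level NumPy sketch, using random stand-ins for the real filter weights, DCT matrix, and spectrum (all assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n_filters, fft_size, n_ceps = 40, 128, 13

W = rng.random((n_filters, fft_size))    # stand-in for mfccFilterWeights
D = rng.random((n_ceps, n_filters))      # stand-in for mfccDCTMatrix
spectrum = rng.random(fft_size)          # stand-in for the one-sided magnitude spectrum

eps = np.finfo(float).eps
ear_mag = np.log10(W @ spectrum + eps)   # log mel filter-bank energies (40,)
ceps = D @ ear_mag                       # 13 cepstral coefficients

print(ceps.shape)                        # (13,)
```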

 
