Link to the MathWorks example: vggish
This example uses two toolboxes: Audio Toolbox and Deep Learning Toolbox.
Dataset download link:
http://ssd.mathworks.com/supportfiles/audio/ESC-10.zip
Download and unzip the Audio Toolbox™ model for VGGish.
In the Command Window, enter:
vggish
If the Audio Toolbox model for VGGish is not installed, the function provides a link to download it. Click the link, extract the file to a location on the MATLAB path, and add that folder to the path.
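If you are unsure whether the model is already reachable, a quick check like the following can help. This is only a sketch: the folder name below is a placeholder for wherever you actually unzipped the model files.

```matlab
% Check that the VGGish model files are visible on the MATLAB path.
% The folder below is hypothetical; use the location you unzipped to.
addpath(fullfile(userpath,'vggish'));  % placeholder unzip location
net = vggish;                          % errors if the model is still missing
analyzeNetwork(net)                    % optional: inspect the architecture
```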
Step 1:
Download and unzip the ESC-10 environmental sound classification dataset. The dataset consists of recordings labeled as one of 10 different audio sound classes. Alternatively, run the following commands to download and unzip the dataset into your temporary directory.
url = 'http://ssd.mathworks.com/supportfiles/audio/ESC-10.zip';
downloadFolder = fullfile(tempdir,'ESC-10');
datasetLocation = tempdir;
if ~exist(fullfile(tempdir,'ESC-10'),'dir')
    loc = websave(downloadFolder,url);
    unzip(loc,fullfile(tempdir,'ESC-10'))
end
Step 2:
Create an audioDatastore object to manage the data and split it into training and validation sets. Call countEachLabel to display the distribution of sound classes and the number of unique labels.
ads = audioDatastore(downloadFolder,'IncludeSubfolders',true,'LabelSource','foldernames');
labelTable = countEachLabel(ads)
Determine the total number of classes:
numClasses = size(labelTable,1);
Split the dataset into training and validation sets, then view the label distributions of the two sets.
[adsTrain, adsValidation] = splitEachLabel(ads,0.8);
countEachLabel(adsTrain)
countEachLabel(adsValidation)
Step 3:
The VGGish network expects audio to be preprocessed into log-mel spectrograms. The support function vggishPreprocess (defined at the end of this example) takes an audioDatastore object and the overlap percentage between log-mel spectrograms as input, and returns matrices of predictors and responses suitable as inputs to the VGGish network.
overlapPercentage = 75;
[trainFeatures,trainLabels] = vggishPreprocess(adsTrain,overlapPercentage);
[validationFeatures,validationLabels,segmentsPerFile] = vggishPreprocess(adsValidation,overlapPercentage);
Step 4:
Load the VGGish model and convert it to a layerGraph object.
net = vggish;
lgraph = layerGraph(net.Layers);
Step 5:
Use removeLayers to remove the final regression output layer from the graph. After you remove the regression layer, the new final layer of the graph is a ReLU layer named 'EmbeddingBatch'.
lgraph = removeLayers(lgraph,'regressionoutput');
lgraph.Layers(end)
Use addLayers to add a fullyConnectedLayer, a softmaxLayer, and a classificationLayer to the graph.
lgraph = addLayers(lgraph,fullyConnectedLayer(numClasses,'Name','FCFinal'));
lgraph = addLayers(lgraph,softmaxLayer('Name','softmax'));
lgraph = addLayers(lgraph,classificationLayer('Name','classOut'));
Use connectLayers to attach the fully connected, softmax, and classification layers to the layer graph.
lgraph = connectLayers(lgraph,'EmbeddingBatch','FCFinal');
lgraph = connectLayers(lgraph,'FCFinal','softmax');
lgraph = connectLayers(lgraph,'softmax','classOut');
Define the training options using trainingOptions.
miniBatchSize = 128;
options = trainingOptions('adam', ...
    'MaxEpochs',5, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{validationFeatures,validationLabels}, ...
    'ValidationFrequency',50, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.5, ...
    'LearnRateDropPeriod',2);
Train the network using trainNetwork.
[trainedNet, netInfo] = trainNetwork(trainFeatures,trainLabels,lgraph,options);
Each audio file was split into several segments that are fed into the VGGish network. Combine the predictions for each file in the validation set using a majority-rule decision.
validationPredictions = classify(trainedNet,validationFeatures);
idx = 1;
validationPredictionsPerFile = categorical;
for ii = 1:numel(adsValidation.Files)
    validationPredictionsPerFile(ii,1) = mode(validationPredictions(idx:idx+segmentsPerFile(ii)-1));
    idx = idx + segmentsPerFile(ii);
end
Use confusionchart to evaluate the network's performance on the validation set.
figure('Units','normalized','Position',[0.2 0.2 0.5 0.5]);
cm = confusionchart(adsValidation.Labels,validationPredictionsPerFile);
cm.Title = sprintf('Confusion Matrix for Validation Data \nAccuracy = %0.2f %%',mean(validationPredictionsPerFile==adsValidation.Labels)*100);
cm.ColumnSummary = 'column-normalized';
cm.RowSummary = 'row-normalized';
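As a final sanity check, the fine-tuned network can be applied end-to-end to a single recording. The sketch below reuses the vggishPreprocess helper defined next and simply picks the first validation file; nothing here is part of the original example.

```matlab
% End-to-end classification of one file (sketch, not in the original example).
adsOne = subset(adsValidation,1);                              % first validation file
[features,~,~] = vggishPreprocess(adsOne,overlapPercentage);   % same preprocessing as training
segmentPredictions = classify(trainedNet,features);            % one prediction per segment
filePrediction = mode(segmentPredictions)                      % majority vote over segments
```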
function [predictor,response,segmentsPerFile] = vggishPreprocess(ads,overlap)
% This function is for example purposes only and may be changed or removed
% in a future release.

% Create filter bank
FFTLength = 512;
numBands = 64;
fs0 = 16e3;
filterBank = designAuditoryFilterBank(fs0, ...
    'FrequencyScale','mel', ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',[125 7500], ...
    'NumBands',numBands, ...
    'Normalization','none', ...
    'FilterBankDesignDomain','warped');

% Define STFT parameters
windowLength = 0.025 * fs0;
hopLength = 0.01 * fs0;
win = hann(windowLength,'periodic');

% Define spectrogram segmentation parameters
segmentDuration = 0.96; % seconds
segmentRate = 100; % hertz
segmentLength = segmentDuration*segmentRate; % Number of spectra per auditory spectrogram
segmentHopDuration = (100-overlap) * segmentDuration / 100; % Duration (s) advanced between auditory spectrograms
segmentHopLength = round(segmentHopDuration * segmentRate); % Number of spectra advanced between auditory spectrograms
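% Worked example (added for illustration, not part of the original code),
% with overlap = 75:
%   segmentLength      = 0.96*100          = 96 spectra per spectrogram
%   segmentHopDuration = (100-75)*0.96/100 = 0.24 s
%   segmentHopLength   = round(0.24*100)   = 24 spectra between segment starts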
% Preallocate cell arrays for the predictors and responses
numFiles = numel(ads.Files);
predictor = cell(numFiles,1);
response = predictor;
segmentsPerFile = zeros(numFiles,1);

% Extract predictors and responses for each file
for ii = 1:numFiles
    [audioIn,info] = read(ads);
    x = single(resample(audioIn,fs0,info.SampleRate));
    Y = stft(x, ...
        'Window',win, ...
        'OverlapLength',windowLength-hopLength, ...
        'FFTLength',FFTLength, ...
        'FrequencyRange','onesided');
    Y = abs(Y);
    logMelSpectrogram = log(filterBank*Y + single(0.01))';

    % Segment log-mel spectrogram
    numHops = floor((size(Y,2)-segmentLength)/segmentHopLength) + 1;
    segmentedLogMelSpectrogram = zeros(segmentLength,numBands,1,numHops);
    for hop = 1:numHops
        segmentedLogMelSpectrogram(:,:,1,hop) = logMelSpectrogram(1+segmentHopLength*(hop-1):segmentLength+segmentHopLength*(hop-1),:);
    end
    predictor{ii} = segmentedLogMelSpectrogram;
    response{ii} = repelem(info.Label,numHops);
    segmentsPerFile(ii) = numHops;
end

% Concatenate predictors and responses into arrays
predictor = cat(4,predictor{:});
response = cat(2,response{:});
end