VGGish Sound Classification

MathWorks example link: vggish
This example uses two toolboxes: Audio Toolbox and Deep Learning Toolbox.
Dataset download link:

http://ssd.mathworks.com/supportfiles/audio/ESC-10.zip

Download and unzip the Audio Toolbox™ model for VGGish.
In the command window, enter:

vggish

If the Audio Toolbox model for VGGish is not installed, the function provides a link to download the model. Click the link, unzip the file to a location on the MATLAB path, and add that location to the path.

Step 1:
Download and unzip the Environmental Sound Classification data set. The data set consists of recordings labeled as one of 10 different audio sound classes. Alternatively, run the following commands to download and unzip the data set to your temporary directory.

url = 'http://ssd.mathworks.com/supportfiles/audio/ESC-10.zip';
downloadFolder = fullfile(tempdir,'ESC-10');
datasetLocation = tempdir;

if ~exist(fullfile(tempdir,'ESC-10'),'dir')
    loc = websave(downloadFolder,url);
    unzip(loc,fullfile(tempdir,'ESC-10'))
end

Step 2:
Create an audioDatastore object to manage the data and split it into training and validation sets. Call countEachLabel to display the distribution of sound classes and the number of unique labels.

ads = audioDatastore(downloadFolder,'IncludeSubfolders',true,'LabelSource','foldernames');
labelTable = countEachLabel(ads)

Determine the total number of classes:

numClasses = size(labelTable,1);

Split the data set into training and validation sets, then view the label distribution in each set.

[adsTrain, adsValidation] = splitEachLabel(ads,0.8);

countEachLabel(adsTrain)


countEachLabel(adsValidation)

Step 3:
The VGGish network expects audio to be preprocessed into log-mel spectrograms. The support function vggishPreprocess (defined at the end of this example) takes an audioDatastore object and the overlap percentage between log-mel spectrograms as input, and returns matrices of predictors and responses suitable as input to the VGGish network.

overlapPercentage = 75;

[trainFeatures,trainLabels] = vggishPreprocess(adsTrain,overlapPercentage);
[validationFeatures,validationLabels,segmentsPerFile] = vggishPreprocess(adsValidation,overlapPercentage);

Step 4:
Load the VGGish model and convert it to a layerGraph object.

net = vggish;

lgraph = layerGraph(net.Layers);
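Optionally, you can inspect the resulting graph before editing it. This step is not part of the original example; analyzeNetwork is a standard Deep Learning Toolbox tool:

```matlab
% Open the Deep Learning Network Analyzer to browse the VGGish layers
% and confirm the names of the final layers before modifying the graph.
analyzeNetwork(lgraph)
```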

Step 5:
Use removeLayers to remove the final regression output layer from the graph. After the regression layer is removed, the final layer of the graph is a ReLU layer named 'EmbeddingBatch'.

lgraph = removeLayers(lgraph,'regressionoutput');
lgraph.Layers(end)

Use addLayers to add a fullyConnectedLayer, a softmaxLayer, and a classificationLayer to the graph.

lgraph = addLayers(lgraph,fullyConnectedLayer(numClasses,'Name','FCFinal'));
lgraph = addLayers(lgraph,softmaxLayer('Name','softmax'));
lgraph = addLayers(lgraph,classificationLayer('Name','classOut'));

Use connectLayers to attach the fully connected, softmax, and classification layers to the layer graph.

lgraph = connectLayers(lgraph,'EmbeddingBatch','FCFinal');
lgraph = connectLayers(lgraph,'FCFinal','softmax');
lgraph = connectLayers(lgraph,'softmax','classOut');

To define training options, use trainingOptions.

miniBatchSize = 128;
options = trainingOptions('adam', ...
    'MaxEpochs',5, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{validationFeatures,validationLabels}, ...
    'ValidationFrequency',50, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.5, ...
    'LearnRateDropPeriod',2);
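With the 'piecewise' schedule above, the learning rate halves every 2 epochs. A minimal sketch of the resulting per-epoch rates, assuming the default initial learning rate of 1e-3 that trainingOptions uses for 'adam':

```matlab
initialLearnRate = 1e-3;                    % trainingOptions default for 'adam'
epoch = 1:5;
lr = initialLearnRate * 0.5.^floor((epoch-1)/2)
% Epochs 1-2 train at 1e-3, epochs 3-4 at 5e-4, and epoch 5 at 2.5e-4.
```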

Train the network using trainNetwork.

[trainedNet, netInfo] = trainNetwork(trainFeatures,trainLabels,lgraph,options);

Each audio file was split into several segments that were fed to the VGGish network. Combine the segment predictions for each file in the validation set using a majority-rule decision.

validationPredictions = classify(trainedNet,validationFeatures);

idx = 1;
validationPredictionsPerFile = categorical;
for ii = 1:numel(adsValidation.Files)
    validationPredictionsPerFile(ii,1) = mode(validationPredictions(idx:idx+segmentsPerFile(ii)-1));
    idx = idx + segmentsPerFile(ii);
end
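The majority vote works because mode of a categorical array returns its most frequent category. A toy illustration with made-up labels:

```matlab
% Hypothetical segment predictions for one file (labels are invented):
segPreds = categorical(["dog";"dog";"rain";"dog"]);
filePred = mode(segPreds)    % majority vote selects dog
```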

Use confusionchart to evaluate the performance of the network on the validation set.

figure('Units','normalized','Position',[0.2 0.2 0.5 0.5]);
cm = confusionchart(adsValidation.Labels,validationPredictionsPerFile);
cm.Title = sprintf('Confusion Matrix for Validation Data \nAccuracy = %0.2f %%',mean(validationPredictionsPerFile==adsValidation.Labels)*100);
cm.ColumnSummary = 'column-normalized';
cm.RowSummary = 'row-normalized';


function [predictor,response,segmentsPerFile] = vggishPreprocess(ads,overlap)
% This function is for example purposes only and may be changed or removed
% in a future release.

% Create filter bank
FFTLength = 512;
numBands = 64;
fs0 = 16e3;
filterBank = designAuditoryFilterBank(fs0, ...
    'FrequencyScale','mel', ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',[125 7500], ...
    'NumBands',numBands, ...
    'Normalization','none', ...
    'FilterBankDesignDomain','warped');

% Define STFT parameters
windowLength = 0.025 * fs0;
hopLength = 0.01 * fs0;
win = hann(windowLength,'periodic');

% Define spectrogram segmentation parameters
segmentDuration = 0.96; % seconds
segmentRate = 100; % hertz
segmentLength = segmentDuration*segmentRate; % Number of spectra per auditory spectrogram
segmentHopDuration = (100-overlap) * segmentDuration / 100; % Duration (s) advanced between auditory spectrograms
segmentHopLength = round(segmentHopDuration * segmentRate); % Number of spectra advanced between auditory spectrograms
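% Worked example (values not shown in the original): with overlap = 75,
%   segmentLength      = 0.96*100          = 96 spectra per segment
%   segmentHopDuration = (100-75)*0.96/100 = 0.24 s
%   segmentHopLength   = round(0.24*100)   = 24 spectra
% so consecutive 96-frame segments start 24 frames apart.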

% Preallocate cell arrays for the predictors and responses
numFiles = numel(ads.Files);
predictor = cell(numFiles,1);
response = predictor;
segmentsPerFile = zeros(numFiles,1);

% Extract predictors and responses for each file
for ii = 1:numFiles
    [audioIn,info] = read(ads);

    x = single(resample(audioIn,fs0,info.SampleRate));

    Y = stft(x, ...
        'Window',win, ...
        'OverlapLength',windowLength-hopLength, ...
        'FFTLength',FFTLength, ...
        'FrequencyRange','onesided');
    Y = abs(Y);

    logMelSpectrogram = log(filterBank*Y + single(0.01))';
    
    % Segment log-mel spectrogram
    numHops = floor((size(Y,2)-segmentLength)/segmentHopLength) + 1;
    segmentedLogMelSpectrogram = zeros(segmentLength,numBands,1,numHops);
    for hop = 1:numHops
        segmentedLogMelSpectrogram(:,:,1,hop) = logMelSpectrogram(1+segmentHopLength*(hop-1):segmentLength+segmentHopLength*(hop-1),:);
    end

    predictor{ii} = segmentedLogMelSpectrogram;
    response{ii} = repelem(info.Label,numHops);
    segmentsPerFile(ii) = numHops;
end

% Concatenate predictors and responses into arrays
predictor = cat(4,predictor{:});
response = cat(2,response{:});
end