Introduction to Deep Learning (5): Handwritten Digit Recognition II (Audio as the Target Output)

A neural network can handle fairly large outputs, including audio. In this post we reuse the example from the previous post to verify this, but this time the target output is a 2983×1 audio waveform, far larger than the previous 10×1 label vector.

As before, we proceed step by step.

Step 1: Prepare Data

We again use MNIST_small as the input. (See the previous post for an introduction to MNIST_small.)

This time, however, we do not use the existing labels as the target output. Instead, the target output comes from pre-recorded audio clips of the ten digits (a 2983×10 matrix, one column per digit). (Audio and code-skeleton download link)

The code is as follows:

% prepare the data set
load mnist_small_matlab.mat;
train_size = 10000;
% split each 28x28 image into four 14x14 quadrants,
% one quadrant feeding each of the first four layers
X_train{1} = reshape(trainData(1:14,  1:14,  :), [], train_size);
X_train{2} = reshape(trainData(15:28, 1:14,  :), [], train_size);
X_train{3} = reshape(trainData(15:28, 15:28, :), [], train_size);
X_train{4} = reshape(trainData(1:14,  15:28, :), [], train_size);
% layers 5-8 receive no external input
X_train{5} = zeros(0, train_size);
X_train{6} = zeros(0, train_size);
X_train{7} = zeros(0, train_size);
X_train{8} = zeros(0, train_size);

test_size = 2000;
X_test{1} = reshape(testData(1:14,  1:14,  :), [], test_size);
X_test{2} = reshape(testData(15:28, 1:14,  :), [], test_size);
X_test{3} = reshape(testData(15:28, 15:28, :), [], test_size);
X_test{4} = reshape(testData(1:14,  15:28, :), [], test_size);
% layers 5-8 receive no external input
X_test{5} = zeros(0, test_size);
X_test{6} = zeros(0, test_size);
X_test{7} = zeros(0, test_size);
X_test{8} = zeros(0, test_size);

% prepare standard speech audio
% assumes ten recordings named so that dir sorts them in digit order
% (e.g. 0.wav ... 9.wav), each about 2983 samples long

audio_len = 2983;                          % dimension of the target output
audio_list = dir('audio/*.wav');
audio_data = zeros(audio_len, numel(audio_list));
for i = 1 : numel(audio_list)
    [aud, ~] = audioread(fullfile('audio', audio_list(i).name));
    aud = aud(:, 1);                       % keep the first channel if stereo
    n = min(numel(aud), audio_len);
    audio_data(1:n, i) = aud(1:n);         % truncate / zero-pad to audio_len
end

How these recordings are turned into target outputs is covered later.


Step 2: Design Network Architecture

[Figure: network architecture]

The code is as follows:

% define network architecture
% column 1: size of the external input injected into that layer
%           (196 = 14x14 image quadrant for layers 1-4, none afterwards)
% column 2: number of neurons in that layer

layer_size = [196 6000
              196 6000
              196 6000
              196 6000
                0 4000
                0 4000
                0 4000
                0 2983];
L = 8;   % number of layers

Step 3: Initialize Parameters

Initialize Weights

% Xavier (Glorot) uniform initialization
for l = 1 : L - 1
    w{l} = (rand(layer_size(l + 1, 2), sum(layer_size(l, :))) * 2 - 1) ...
           * sqrt(6 / (layer_size(l + 1, 2) + sum(layer_size(l, :))));
end
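This is Xavier (Glorot) uniform initialization: each entry of w{l} is drawn from U(-b, b) with b = sqrt(6 / (fan_in + fan_out)), where fan_in = sum(layer_size(l, :)) counts both the external input and the previous layer's activations, and fan_out = layer_size(l + 1, 2). A quick check of the resulting weight shapes:

% sanity-check weight dimensions: w{l} maps [x{l}; a{l}] to layer l + 1
for l = 1 : L - 1
    fprintf('w{%d}: %d x %d\n', l, size(w{l}, 1), size(w{l}, 2));
end
% expected, e.g., w{1}: 6000 x 6196 and w{7}: 2983 x 4000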

Choose Parameters

alpha = 1;        % learning rate
max_iter = 300;   % number of iterations
mini_batch = 100; % number of samples per mini-batch

Note: mini_batch is the number of samples processed per update. Instead of processing all 10000 training samples at once, each update uses 100 samples drawn at random from the 10000 (see the sketch below).
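A minimal sketch of how such mini-batches are drawn with randperm (this is exactly what the training loop in the complete code does; the variable batch is illustrative):

ind = randperm(train_size);   % random permutation of the sample indices
for k = 1 : ceil(train_size / mini_batch)
    % indices of the k-th mini-batch (the last one may be smaller)
    batch = ind((k - 1) * mini_batch + 1 : min(k * mini_batch, train_size));
    % X_train{l}(:, batch) then selects these samples for layer l
end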


Step 4: Run the Network

Activation function

Experience shows that using ReLU as the activation function usually gives good training results. In this experiment, every layer except the output layer uses ReLU.

In the previous post, the network output was a 10-element one-hot column vector: exactly one entry is 1 and its position encodes the digit (counting from 0, so a 1 in position 0 means the digit is 0). For example, with a digit '8' as input, the output was considered correct if the entry at position 8 was close to 1 and the rest were close to 0. Here, by contrast, the target output is a 2983-sample audio waveform.

Because the sigmoid function squashes its input into (0, 1), we use it as the activation of the output layer, so that every sample of the output waveform lies between 0 and 1.
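One caveat: audioread returns samples in [-1, 1], while a sigmoid output lies in (0, 1). For the quadratic cost below to be well matched, the reference audio presumably needs to be rescaled into [0, 1] before it is used as a target (and mapped back before listening). A minimal sketch of such a rescaling, assuming audio_data holds the raw clips as loaded in Step 1:

% rescale raw audio from [-1, 1] into the sigmoid output range (0, 1)
audio01 = (audio_data + 1) / 2;
% train against audio01 instead of audio_data; before listening, map an
% output back with aud_play = 2 * a_out - 1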

Forward computation

ReLU version:

function [a_next, z_next] = fc(w, a, x)
    % define the activation function
    f = @(s) max(0, s);

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % forward computing (either component or vector form)
    % stack the external input x on top of the previous layer's activations
    a = [x
         a];
    z_next = w * a;       % weighted sum
    a_next = f(z_next);   % ReLU activation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end

Sigmoid version:

function [a_next, z_next] = fc2(w, a, x)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % forward computing (either component or vector form)
    % stack the external input x on top of the previous layer's activations
    a = [x
         a];
    z_next = w * a;       % weighted sum
    a_next = f(z_next);   % sigmoid activation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
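As a quick sanity check of how the dimensions flow, here is one forward step through layer 1 on a dummy batch (an illustrative check, using the sizes defined in Step 2):

% one forward step for layer 1 on a dummy batch of 100 samples
a1 = zeros(layer_size(1, 2), 100);   % 6000 x 100: initial activations are zero
x1 = rand(layer_size(1, 1), 100);    % 196 x 100: external input (one quadrant)
[a2, z2] = fc(w{1}, a1, x1);         % both outputs are 6000 x 100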

Cost Function

% append the quadratic cost of the current mini-batch
J = [J 1/2/mini_batch * sum((a{L}(:) - y(:)).^2)];
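In equation form, with m = mini_batch, each value appended to J is

J_batch = 1/(2m) * sum over i of || a{L}(:, i) - y(:, i) ||^2,

i.e. the squared error summed over the mini-batch and halved, so the gradient with respect to a{L} is simply a{L} - y (the 1/m factor is applied later, when the weight update divides by mini_batch).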

Backward computation

ReLU version:

function delta = bc(w, z, delta_next)
    % define the activation function
    f = @(s) max(0, s);
    % define the derivative of activation function
    df = @(s) double(s > 0);   % ReLU': 1 where s > 0, 0 elsewhere

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % backward computing (either component or vector form)
    delta = w' * delta_next;
    % w multiplies the stack [x; a], so the rows of w' * delta_next that
    % belong to this layer's activations are the LAST size(z, 1) rows
    delta = delta(end - size(z, 1) + 1 : end, :) .* df(z);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end

Sigmoid version:

function delta = bc2(w, z, delta_next)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));
    % define the derivative of activation function
    df = @(s) f(s) .* (1 - f(s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % backward computing (either component or vector form)
    delta = w' * delta_next;
    % keep the rows that belong to this layer's activations (the stack is [x; a])
    delta = delta(end - size(z, 1) + 1 : end, :) .* df(z);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end

Update the weights

for l = 1 : L - 1
    % gradient of the cost w.r.t. w{l}, averaged over the mini-batch
    gw = delta{l + 1} * [x{l}; a{l}]' / mini_batch;
    w{l} = w{l} - alpha * gw;   % gradient descent step
end
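When training behaves badly (as reported at the end of this post), a finite-difference gradient check is a standard way to rule out backpropagation bugs: perturb a single weight, recompute the cost on a fixed mini-batch, and compare with the analytic gradient. A minimal sketch; cost_of and gw_stored are hypothetical helpers (cost_of reruns the forward pass on the same batch and returns the cost, gw_stored{l} is the gw computed above before the update):

% finite-difference check of one gradient entry (illustrative)
l = 1; i = 1; j = 1; eps_fd = 1e-5;
w_plus = w;   w_plus{l}(i, j)  = w_plus{l}(i, j)  + eps_fd;
w_minus = w;  w_minus{l}(i, j) = w_minus{l}(i, j) - eps_fd;
g_num = (cost_of(w_plus) - cost_of(w_minus)) / (2 * eps_fd);
fprintf('analytic %.6g vs numeric %.6g\n', gw_stored{l}(i, j), g_num);

The two numbers should agree to several significant digits; a large mismatch points at bc/bc2 or the weight update.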

Step 5: Evaluation

Acc = (number of correct predictions) / (number of samples)

Accuracy on the training set:

a{1} = zeros(layer_size(1, 2), train_size);
for l = 1 : L - 2
    a{l + 1} = fc(w{l}, a{l}, X_train{l});
end
a{L} = fc2(w{L - 1}, a{L - 1}, X_train{L - 1});   % sigmoid output, as in training
% peak-position proxy, as in the training loop: compare the peak sample of
% each predicted waveform with the peak of its target waveform
[~, digit_ind] = max(trainLabels);
[~, ind_train] = max(audio_data(:, digit_ind));
[~, ind_pred] = max(a{L});
train_acc = sum(ind_train == ind_pred) / train_size;
fprintf('Accuracy on training dataset is %f%%\n', train_acc * 100);

Accuracy on the test set:

a{1} = zeros(layer_size(1, 2), test_size);
for l = 1 : L - 2
    a{l + 1} = fc(w{l}, a{l}, X_test{l});
end
a{L} = fc2(w{L - 1}, a{L - 1}, X_test{L - 1});   % sigmoid output, as in training
% same peak-position proxy as above
[~, digit_ind] = max(testLabels);
[~, ind_test] = max(audio_data(:, digit_ind));
[~, ind_pred] = max(a{L});
test_acc = sum(ind_test == ind_pred) / test_size;
fprintf('Accuracy on testing dataset is %f%%\n', test_acc * 100);
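The peak-position match is a very crude proxy. A more meaningful accuracy for this task is to classify each predicted waveform as the digit whose reference recording is nearest in Euclidean distance. A sketch under the same assumptions as Step 1 (audio_data is the 2983x10 reference matrix, in the same scaling as the training targets; the subtraction relies on implicit expansion, R2016b+):

% classify each test output as the nearest reference recording
pred = zeros(1, test_size);
for i = 1 : test_size
    d = sum((audio_data - a{L}(:, i)).^2, 1);   % squared distance to each digit
    [~, pred(i)] = min(d);
end
[~, truth] = max(testLabels);   % 1..10, same column order as audio_data
nn_acc = sum(pred == truth) / test_size;
fprintf('Nearest-reference accuracy: %f%%\n', nn_acc * 100);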

Complete code (see the code for how the target output is constructed):

% clear workspace and close plot windows
clear;
close all;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code BELOW
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% prepare the data set
load mnist_small_matlab.mat;
train_size = 10000;
% split each 28x28 image into four 14x14 quadrants,
% one quadrant feeding each of the first four layers
X_train{1} = reshape(trainData(1:14,  1:14,  :), [], train_size);
X_train{2} = reshape(trainData(15:28, 1:14,  :), [], train_size);
X_train{3} = reshape(trainData(15:28, 15:28, :), [], train_size);
X_train{4} = reshape(trainData(1:14,  15:28, :), [], train_size);
% layers 5-8 receive no external input
X_train{5} = zeros(0, train_size);
X_train{6} = zeros(0, train_size);
X_train{7} = zeros(0, train_size);
X_train{8} = zeros(0, train_size);

test_size = 2000;
X_test{1} = reshape(testData(1:14,  1:14,  :), [], test_size);
X_test{2} = reshape(testData(15:28, 1:14,  :), [], test_size);
X_test{3} = reshape(testData(15:28, 15:28, :), [], test_size);
X_test{4} = reshape(testData(1:14,  15:28, :), [], test_size);
% layers 5-8 receive no external input
X_test{5} = zeros(0, test_size);
X_test{6} = zeros(0, test_size);
X_test{7} = zeros(0, test_size);
X_test{8} = zeros(0, test_size);

% prepare standard speech audio
% assumes ten recordings named so that dir sorts them in digit order
% (e.g. 0.wav ... 9.wav), each about 2983 samples long

audio_len = 2983;                          % dimension of the target output
audio_list = dir('audio/*.wav');
audio_data = zeros(audio_len, numel(audio_list));
for i = 1 : numel(audio_list)
    [aud, ~] = audioread(fullfile('audio', audio_list(i).name));
    aud = aud(:, 1);                       % keep the first channel if stereo
    n = min(numel(aud), audio_len);
    audio_data(1:n, i) = aud(1:n);         % truncate / zero-pad to audio_len
end

% choose parameters

alpha = 1;        % learning rate
max_iter = 300;   % number of iterations
mini_batch = 100; % number of samples per mini-batch
J = [];           % cost history
Acc = [];         % per-batch accuracy history (peak-position proxy)

% define network architecture
% column 1: size of the external input injected into that layer
% column 2: number of neurons in that layer

layer_size = [196 6000
              196 6000
              196 6000
              196 6000
                0 4000
                0 4000
                0 4000
                0 2983];
L = 8;   % number of layers

% initialize weights (Xavier / Glorot uniform)

for l = 1 : L - 1
    w{l} = (rand(layer_size(l + 1, 2), sum(layer_size(l, :))) * 2 - 1) ...
           * sqrt(6 / (layer_size(l + 1, 2) + sum(layer_size(l, :))));
end

% train

for iter = 1 : max_iter

    ind = randperm(train_size);

    for k = 1 : ceil(train_size / mini_batch)

        a{1} = zeros(layer_size(1, 2), mini_batch);   % layer-1 activations start at zero

        % external input for each layer in this mini-batch (empty for layers 5-8)
        for l = 1 : L
            x{l} = X_train{l}(:, ind((k - 1) * mini_batch + 1 : min(k * mini_batch, train_size)));
        end

        % target output: look up the reference recording for each digit label
        % (assumes column d+1 of audio_data holds the recording of digit d,
        % matching row d+1 of the one-hot trainLabels)
        y1 = trainLabels(:, ind((k - 1) * mini_batch + 1 : min(k * mini_batch, train_size)));
        [~, digit_ind] = max(y1);       % row index of the 1 in each one-hot column
        y = audio_data(:, digit_ind);   % 2983 x mini_batch target matrix

        % forward pass: ReLU for the hidden layers, sigmoid for the output layer
        for l = 1 : L - 2
            [a{l + 1}, z{l + 1}] = fc(w{l}, a{l}, x{l});
        end

        [a{L}, z{L}] = fc2(w{L - 1}, a{L - 1}, x{L - 1});

        J = [J 1/2/mini_batch * sum((a{L}(:) - y(:)).^2)];
        fprintf('J = %.4f\n', J(end));   % print the cost of the current batch

        % crude accuracy proxy: does the predicted waveform peak at the
        % same sample as the target waveform?
        [~, ind_y] = max(y);
        [~, ind_pred] = max(a{L});
        Acc = [Acc sum(ind_y == ind_pred) / mini_batch];

        % output-layer delta: (a - y) .* sigmoid'(z), using a .* (1 - a) = sigmoid'(z)
        delta{L} = (a{L} - y) .* a{L} .* (1 - a{L});

        delta{L - 1} = bc2(w{L - 1}, z{L - 1}, delta{L});

        for l = L - 2 : -1 : 2
            delta{l} = bc(w{l}, z{l}, delta{l + 1});
        end

        for l = 1 : L - 1
            gw = delta{l + 1} * [x{l}; a{l}]' / mini_batch;
            w{l} = w{l} - alpha * gw;
        end

    end

end

figure
plot(J);
title('Training cost J');

figure
plot(Acc);
title('Mini-batch accuracy (peak-position proxy)');

% save model

save model.mat w layer_size

% display/listen to some results pairs

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code ABOVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
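To actually listen to a result pair, run one test sample through the trained network and play the predicted waveform next to its reference. A minimal sketch; the sample rate fs is an assumption here (the real one is audioread's second output, which the loading loop discards):

% forward one test sample through the trained network and listen
j = 1;                                    % index of the test sample
a{1} = zeros(layer_size(1, 2), 1);
for l = 1 : L - 2
    a{l + 1} = fc(w{l}, a{l}, X_test{l}(:, j));
end
a{L} = fc2(w{L - 1}, a{L - 1}, X_test{L - 1}(:, j));

fs = 8000;                                % assumed sample rate
soundsc(a{L}, fs);                        % the network's output
pause(1.5);
[~, d] = max(testLabels(:, j));
soundsc(audio_data(:, d), fs);            % the reference recording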

My results were too poor to be worth posting.
I am not sure whether the network design is at fault.
Guidance would be appreciated!
