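This post is organized as follows: first the basic idea, then the inputs and outputs for a concrete example, then the role of each function, then the implementation of each function, and finally some notes.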
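I. Basic idea
1. Feed in sample data, train on it, then test.
2. Training a deep neural network: first initialize the network and set up its basic structure as required; then run the forward pass (feedforward), passing activations from layer to layer to obtain the error; then run backpropagation, which minimizes the error with stochastic gradient descent: differentiate the error with respect to each parameter to find the descent direction, then update the parameters (weights and biases). This is the same scheme as the BP algorithm for a single-hidden-layer feedforward network. One detail: a single sample can be used many times during training, because once the network has changed, what it learns from that sample changes too. (It is a bit like chewing sugar cane, or rereading a book. Take Ordinary World (平凡的世界): read it in junior high, in the last year of high school, in a repeat year, as an undergraduate, while working, and in graduate school, and each stage of life brings a different experience. I was delighted to notice this: a deep neural network resembles a growing person, with something like a life of its own.)
3. Forward pass: the hidden-layer output of the previous layer serves as the input of the current layer. For the principle, see the BP neural network; any differences caused by having multiple layers will be noted separately.
4. Backpropagation works the same way.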
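The point in item 2 about reusing a sample can be seen in a toy setting. The sketch below is not part of the toolbox: it assumes a single sigmoid unit, made-up input values, and a hypothetical learning rate of 0.5, and it runs the same sample through forward pass, backpropagation and update five times. Because the weights change after every step, the same sample yields a different error each time.

```matlab
% Toy illustration (not toolbox code): reuse one sample for several SGD steps.
x  = [1 0.5 -0.3];   % bias input followed by two made-up features
y  = 1;              % target value
W  = [0 0 0];        % weights, bias weight first
lr = 0.5;            % hypothetical learning rate
losses = zeros(1, 5);
for step = 1:5
    a = 1 / (1 + exp(-x * W'));   % forward pass through a sigmoid unit
    e = y - a;                    % error, same sign convention as nn.e
    losses(step) = 0.5 * e^2;     % squared-error loss for this step
    d = -e * a * (1 - a);         % derivative of the loss w.r.t. the pre-activation
    W = W - lr * d * x;           % gradient descent update
end
disp(losses);                     % the loss shrinks from one step to the next
```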
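II. Input and output
This post uses MNIST handwritten digit recognition as the running example. The input is 28*28-pixel images of handwritten digits, each flattened into a 784-element row; the mnist_uint8 data set loaded below contains 60,000 training images and 10,000 test images. The output is the class of each image: one of the ten digits, encoded as a 10-column one-hot row, so predicted class indices run from 1 to 10.
Other classification problems follow the same pattern.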
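III. Functions
1. function nn = nnsetup(architecture): initializes the network, which may have one layer or several; returns a neural network struct.
2. function [nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y): trains the network; returns the network with updated activations, errors, weights and biases, plus L, the loss of each minibatch.
3. function nn = nnff(nn, x, y): forward pass; returns the network struct with updated layer activations, error and loss.
4. function nn = nnbp(nn): backpropagation; returns the network struct with the weight gradients (nn.dW) filled in. The weights themselves are changed only in nnapplygrads.
5. function nn = nnapplygrads(nn): updates the parameters (weights and biases) from the computed gradients; returns the network struct with updated weights and biases.
6. function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y): evaluates the performance of the network; returns the updated loss struct.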
IV. Implementation
1. Main script
function test_example_NN
load mnist_uint8;
train_x = double(train_x) / 255;
test_x = double(test_x) / 255;
train_y = double(train_y);
test_y = double(test_y);
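% normalize: z-score the training set, then apply the training mu and sigma to the test set
[train_x, mu, sigma] = zscore(train_x);
test_x = normalize(test_x, mu, sigma); % toolbox helper called as (x, mu, sigma)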
%% ex1 vanilla neural net
rand('state',0)
nn = nnsetup([784 100 10]);
opts.numepochs = 1; % Number of full sweeps through data
opts.batchsize = 100; % Take a mean gradient step over this many samples
[nn, L] = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.08, 'Too big error');
%% ex2 neural net with L2 weight decay
rand('state',0)
nn = nnsetup([784 100 10]);
nn.weightPenaltyL2 = 1e-4; % L2 weight decay
opts.numepochs = 1; % Number of full sweeps through data
opts.batchsize = 100; % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
%% ex3 neural net with dropout
rand('state',0)
nn = nnsetup([784 100 10]);
nn.dropoutFraction = 0.5; % Dropout fraction
opts.numepochs = 1; % Number of full sweeps through data
opts.batchsize = 100; % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
%% ex4 neural net with sigmoid activation function
rand('state',0)
nn = nnsetup([784 100 10]);
nn.activation_function = 'sigm'; % Sigmoid activation function
nn.learningRate = 1; % Sigmoid requires a lower learning rate (the nnsetup default is 2)
opts.numepochs = 1; % Number of full sweeps through data
opts.batchsize = 100; % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
%% ex5 plotting functionality
rand('state',0)
nn = nnsetup([784 20 10]);
opts.numepochs = 5; % Number of full sweeps through data
nn.output = 'softmax'; % use softmax output
opts.batchsize = 1000; % Take a mean gradient step over this many samples
opts.plot = 1; % enable plotting
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
%% ex6 neural net with softmax output and plotting of validation and training error
% split training data into training and validation data
vx = train_x(1:10000,:);
tx = train_x(10001:end,:);
vy = train_y(1:10000,:);
ty = train_y(10001:end,:);
rand('state',0)
nn = nnsetup([784 20 10]);
nn.output = 'softmax'; % use softmax output
opts.numepochs = 5; % Number of full sweeps through data
opts.batchsize = 1000; % Take a mean gradient step over this many samples
opts.plot = 1; % enable plotting
nn = nntrain(nn, tx, ty, opts, vx, vy); % nntrain takes validation set as last two arguments (optionally)
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
2. function nn = nnsetup(architecture)
%NNSETUP creates a Feedforward Backpropagate Neural Network
% nn = nnsetup(architecture) returns a neural network structure with n=numel(architecture)
% layers, architecture being an n x 1 vector of layer sizes e.g. [784 100 10]
    nn.size = architecture;
    nn.n = numel(nn.size);
    nn.activation_function = 'tanh_opt'; % Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate = 2; % learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum = 0.5; % Momentum
    nn.scaling_learningRate = 1; % Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2 = 0; % L2 regularization
    nn.nonSparsityPenalty = 0; % Non sparsity penalty
    nn.sparsityTarget = 0.05; % Sparsity target
    nn.inputZeroMaskedFraction = 0; % Used for Denoising AutoEncoders
    nn.dropoutFraction = 0; % Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing = 0; % Internal variable. nntest sets this to one.
    nn.output = 'sigm'; % output unit 'sigm' (=logistic), 'softmax' and 'linear'
    for i = 2 : nn.n
        % weights and weight momentum; the extra (+1) column holds the bias weights
        nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));
        % average activations (for use with sparsity)
        nn.p{i} = zeros(1, nn.size(i));
    end
end
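The weight initialization line is dense, so here is how I read it: each weight is drawn uniformly from [-r, r] with r = 4 * sqrt(6 / (fan_in + fan_out)). The check below uses the 784 and 100 layer sizes from the example; the names fan_in, fan_out and r are mine, not the toolbox's.

```matlab
% Reading of the nnsetup initialization (variable names are illustrative).
fan_in  = 784;                                  % units in the previous layer
fan_out = 100;                                  % units in the current layer
r = 4 * sqrt(6 / (fan_in + fan_out));           % half-width of the uniform range, about 0.33
W = (rand(fan_out, fan_in + 1) - 0.5) * 2 * r;  % same expression as in nnsetup, with r factored out
assert(all(abs(W(:)) <= r));                    % every weight lies inside [-r, r]
assert(isequal(size(W), [100 785]));            % one extra column for the bias weights
```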
3. function [nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y)
%NNTRAIN trains a neural net
% [nn, L] = nntrain(nn, train_x, train_y, opts) trains the neural network nn
% with input train_x and output train_y for opts.numepochs epochs, with
% minibatches of size opts.batchsize. Returns a neural network nn with updated
% activations, errors and weights (nn.a, nn.e, nn.W; the biases are the first
% column of each nn.W{i}) and L, the loss (nn.L) of each training minibatch.
% Pass val_x and val_y as well to track the validation loss after every epoch.
    assert(isfloat(train_x), 'train_x must be a float');
    assert(nargin == 4 || nargin == 6,'number of input arguments must be 4 or 6')
    loss.train.e = [];
    loss.train.e_frac = [];
    loss.val.e = [];
    loss.val.e_frac = [];
    opts.validation = 0;
    if nargin == 6
        opts.validation = 1;
    end
    fhandle = [];
    if isfield(opts,'plot') && opts.plot == 1
        fhandle = figure();
    end
    m = size(train_x, 1);
    batchsize = opts.batchsize;
    numepochs = opts.numepochs;
    numbatches = m / batchsize;
    assert(rem(numbatches, 1) == 0, 'numbatches must be an integer');
    L = zeros(numepochs*numbatches,1);
    n = 1;
    for i = 1 : numepochs
        tic;
        % visit the samples in a new random order every epoch
        kk = randperm(m);
        for l = 1 : numbatches
            batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);
            %Add noise to input (for use in denoising autoencoder)
            if(nn.inputZeroMaskedFraction ~= 0)
                batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
            end
            batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);
            % one SGD step: forward pass, gradients, weight update
            nn = nnff(nn, batch_x, batch_y);
            nn = nnbp(nn);
            nn = nnapplygrads(nn);
            L(n) = nn.L;
            n = n + 1;
        end
        t = toc;
        if opts.validation == 1
            loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
            str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
        else
            loss = nneval(nn, loss, train_x, train_y);
            str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end));
        end
        if ishandle(fhandle)
            nnupdatefigures(nn, fhandle, loss, opts, i);
        end
        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
        nn.learningRate = nn.learningRate * nn.scaling_learningRate;
    end
end
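The minibatch indexing in nntrain is easier to follow on a small case. This sketch assumes 10 samples and a batch size of 5 (both made up) and checks that slicing a random permutation this way visits every sample exactly once per epoch, which is also why the sample count must be divisible by the batch size.

```matlab
% Minibatch slicing as used in nntrain, on a made-up problem size.
m = 10;                        % number of samples (illustrative)
batchsize = 5;                 % samples per minibatch (illustrative)
numbatches = m / batchsize;    % must be a whole number
kk = randperm(m);              % random visiting order for this epoch
seen = [];
for l = 1 : numbatches
    idx = kk((l - 1) * batchsize + 1 : l * batchsize);  % rows of the l-th minibatch
    seen = [seen idx];         % collect the row indices that were visited
end
assert(isequal(sort(seen), 1:m));  % every sample appears exactly once
```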
4. function nn = nnff(nn, x, y)
%NNFF performs a feedforward pass
% nn = nnff(nn, x, y) returns a neural network structure with updated
% layer activations, error and loss (nn.a, nn.e and nn.L)
    n = nn.n;
    m = size(x, 1);
    % prepend a column of ones so the first column of nn.W{1} acts as the bias
    x = [ones(m,1) x];
    nn.a{1} = x;
    %feedforward pass
    for i = 2 : n-1
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end
        %dropout
        if(nn.dropoutFraction > 0)
            if(nn.testing)
                % at test time keep all units and scale by the keep probability
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                % at training time zero out a random subset of units
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end
        %calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
        end
        %Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
    end
    switch nn.output
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            % subtract the row maximum before exp to avoid overflow
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end
    %error and loss
    nn.e = y - nn.a{n};
    switch nn.output
        case {'sigm', 'linear'}
            % half the squared error, averaged over the batch
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
        case 'softmax'
            % cross-entropy, averaged over the batch
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
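The softmax branch of nnff subtracts each row's maximum before exponentiating. The standalone sketch below, with made-up scores, suggests why: the shifted version stays finite and sums to one, while the naive formula overflows for large scores.

```matlab
% Softmax with and without the max-subtraction used in nnff (scores are made up).
z = [1000 1001 1002; -5 0 5];                 % two rows of raw output scores
z_shift = bsxfun(@minus, z, max(z, [], 2));   % largest entry of each row becomes 0
p = exp(z_shift);
p = bsxfun(@rdivide, p, sum(p, 2));           % normalize each row to sum to 1
assert(all(isfinite(p(:))));
assert(all(abs(sum(p, 2) - 1) < 1e-12));
naive = exp(z(1, :)) / sum(exp(z(1, :)));     % exp(1000) overflows to Inf, so Inf/Inf
assert(all(isnan(naive)));                    % the naive form yields NaN here
```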
5. function nn = nnbp(nn)
%NNBP performs backpropagation
% nn = nnbp(nn) returns a neural network structure with the weight
% gradients nn.dW filled in (nnapplygrads applies them to the weights)
    n = nn.n;
    sparsityError = 0;
    % delta of the output layer
    switch nn.output
        case 'sigm'
            d{n} = - nn.e .* (nn.a{n} .* (1 - nn.a{n}));
        case {'softmax','linear'}
            d{n} = - nn.e;
    end
    for i = (n - 1) : -1 : 2
        % Derivative of the activation function
        switch nn.activation_function
            case 'sigm'
                d_act = nn.a{i} .* (1 - nn.a{i});
            case 'tanh_opt'
                d_act = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * nn.a{i}.^2);
        end
        if(nn.nonSparsityPenalty>0)
            % p_avg: running mean activations, one copy per sample (avoids shadowing the built-in pi)
            p_avg = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ p_avg + (1 - nn.sparsityTarget) ./ (1 - p_avg))];
        end
        % Backpropagate first derivatives
        if i+1==n % d{n} has no bias column to remove
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; % Bishop (5.56)
        else % d{i + 1} carries a bias column that has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
        end
        if(nn.dropoutFraction>0)
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end
    end
    % gradients of the weights, averaged over the batch
    for i = 1 : (n - 1)
        if i+1==n
            nn.dW{i} = (d{i + 1}' * nn.a{i}) / size(d{i + 1}, 1);
        else
            nn.dW{i} = (d{i + 1}(:,2:end)' * nn.a{i}) / size(d{i + 1}, 1);
        end
    end
end
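The tanh_opt derivative in nnbp is written in terms of the activation value rather than the input. tanh_opt itself is not listed in this post; the sketch below assumes it is f(x) = 1.7159 * tanh(2/3 * x), which is what the constants in d_act imply, and compares the formula against a central difference.

```matlab
% Check of the d_act formula for tanh_opt (the definition of f is an assumption).
f = @(x) 1.7159 * tanh(2/3 * x);     % assumed form of tanh_opt
x = linspace(-2, 2, 9);              % a few test inputs
a = f(x);                            % activations, as stored in nn.a{i}
d_formula = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * a.^2);  % expression used in nnbp
h = 1e-6;                            % step for the central difference
d_numeric = (f(x + h) - f(x - h)) / (2 * h);
assert(max(abs(d_formula - d_numeric)) < 1e-6);
```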
6. function nn = nnapplygrads(nn)
%NNAPPLYGRADS updates weights and biases with calculated gradients
% nn = nnapplygrads(nn) returns a neural network structure with updated
% weights and biases
    for i = 1 : (nn.n - 1)
        if(nn.weightPenaltyL2>0)
            % L2 penalty on the weights only; the bias column gets zeros
            dW = nn.dW{i} + nn.weightPenaltyL2 * [zeros(size(nn.W{i},1),1) nn.W{i}(:,2:end)];
        else
            dW = nn.dW{i};
        end
        dW = nn.learningRate * dW;
        if(nn.momentum>0)
            % accumulate a velocity and step along it instead of the raw gradient
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;
            dW = nn.vW{i};
        end
        nn.W{i} = nn.W{i} - dW;
    end
end
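To see what the momentum branch does, here is the same update rule applied to a single scalar weight. The objective 0.5 * w^2 and the learning rate 0.1 are made up for the illustration; only the two update lines mirror nnapplygrads.

```matlab
% Momentum update from nnapplygrads on a scalar toy problem (values are illustrative).
lr = 0.1;            % hypothetical learning rate
momentum = 0.5;      % same default as nn.momentum
w = 1;               % starting weight
v = 0;               % velocity, plays the role of nn.vW{i}
grad = @(w) w;       % gradient of the toy objective 0.5 * w^2
for step = 1:3
    v = momentum * v + lr * grad(w);  % blend the previous velocity with the new scaled gradient
    w = w - v;                        % step along the velocity
end
disp(w);             % 0.614 after three steps: 1 -> 0.9 -> 0.76 -> 0.614
```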
7. function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y)
%NNEVAL evaluates performance of neural network
% Returns an updated loss struct
    assert(nargin == 4 || nargin == 6, 'Wrong number of arguments');
    nn.testing = 1;
    % training performance
    nn = nnff(nn, train_x, train_y);
    loss.train.e(end + 1) = nn.L;
    % validation performance
    if nargin == 6
        nn = nnff(nn, val_x, val_y);
        loss.val.e(end + 1) = nn.L;
    end
    nn.testing = 0;
    %calc misclassification rate if softmax
    if strcmp(nn.output,'softmax')
        [er_train, dummy] = nntest(nn, train_x, train_y);
        loss.train.e_frac(end+1) = er_train;
        if nargin == 6
            [er_val, dummy] = nntest(nn, val_x, val_y);
            loss.val.e_frac(end+1) = er_val;
        end
    end
end
8. function nnupdatefigures(nn,fhandle,L,opts,i)
%NNUPDATEFIGURES updates figures during training
    if i > 1 %don't plot the first point, it's only a point
        x_ax = 1:i;
        % create legend
        if opts.validation == 1
            M = {'Training','Validation'};
        else
            M = {'Training'};
        end
        %create data for plots
        if strcmp(nn.output,'softmax')
            plot_x = x_ax';
            plot_ye = L.train.e';
            plot_yfrac = L.train.e_frac';
        else
            plot_x = x_ax';
            plot_ye = L.train.e';
        end
        %add error on validation data if present
        if opts.validation == 1
            plot_x = [plot_x, x_ax'];
            plot_ye = [plot_ye,L.val.e'];
        end
        %add classification error on validation data if present
        if opts.validation == 1 && strcmp(nn.output,'softmax')
            plot_yfrac = [plot_yfrac, L.val.e_frac'];
        end
        % plotting
        figure(fhandle);
        if strcmp(nn.output,'softmax') %also plot classification error
            p1 = subplot(1,2,1);
            plot(plot_x,plot_ye);
            xlabel('Number of epochs'); ylabel('Error');
            title('Error')
            legend(p1, M,'Location','NorthEast');
            set(p1, 'Xlim',[0,opts.numepochs + 1])
            p2 = subplot(1,2,2);
            plot(plot_x,plot_yfrac);
            xlabel('Number of epochs'); ylabel('Misclassification rate');
            title('Misclassification rate')
            legend(p2, M,'Location','NorthEast');
            set(p2, 'Xlim',[0,opts.numepochs + 1])
        else
            p = plot(plot_x,plot_ye);
            xlabel('Number of epochs'); ylabel('Error');title('Error');
            legend(p, M,'Location','NorthEast');
            set(gca, 'Xlim',[0,opts.numepochs + 1])
        end
        drawnow;
    end
end
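9. function [er, bad] = nntest(nn, x, y)
%NNTEST returns the misclassification rate er and the indices bad of the
% misclassified samples
    labels = nnpredict(nn, x);
    % the column of the 1 in each one-hot row is the expected class index
    [dummy, expected] = max(y,[],2);
    bad = find(labels ~= expected);
    er = numel(bad) / size(x, 1);
end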
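nnpredict is not listed in this post. The sketch below assumes it returns, for each row, the index of the largest network output, which is how the comparison with max(y,[],2) in nntest reads. With made-up outputs for three samples and two classes, the error rate works out like this:

```matlab
% Misclassification rate as in nntest, on made-up network outputs.
out = [0.1 0.9; 0.8 0.2; 0.3 0.7];   % hypothetical outputs, one row per sample
y   = [0 1; 1 0; 1 0];               % one-hot targets
[~, labels]   = max(out, [], 2);     % assumed behavior of nnpredict: argmax per row
[~, expected] = max(y, [], 2);       % class index encoded by each target row
bad = find(labels ~= expected);      % indices of misclassified samples
er  = numel(bad) / size(y, 1);       % fraction misclassified
```

Here only the third sample is wrong, so bad is 3 and er is 1/3.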
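10. function nnchecknumgrad(nn, x, y)
%NNCHECKNUMGRAD compares the gradients in nn.dW with central differences
% of the loss, one weight at a time
    epsilon = 1e-6; % size of the perturbation
    er = 1e-7;      % largest accepted difference
    n = nn.n;
    for l = 1 : (n - 1)
        for i = 1 : size(nn.W{l}, 1)
            for j = 1 : size(nn.W{l}, 2)
                % perturb a single weight down and up
                nn_m = nn; nn_p = nn;
                nn_m.W{l}(i, j) = nn.W{l}(i, j) - epsilon;
                nn_p.W{l}(i, j) = nn.W{l}(i, j) + epsilon;
                % reset the random state so both passes see the same dropout mask
                rand('state',0)
                nn_m = nnff(nn_m, x, y);
                rand('state',0)
                nn_p = nnff(nn_p, x, y);
                dW = (nn_p.L - nn_m.L) / (2 * epsilon);
                e = abs(dW - nn.dW{l}(i, j));
                assert(e < er, 'numerical gradient checking failed');
            end
        end
    end
end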
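The same check can be tried without the toolbox on a single linear layer. This is only a sketch with random made-up data: the loss and the analytic gradient follow the 'linear' output case of nnff and nnbp (d = -e, dW = d' * a / m), and lossfn, Wp and Wm are names I introduced.

```matlab
% Standalone numerical gradient check for one linear layer (data is random and made up).
m = 4;                                   % batch size
x = [ones(m, 1) rand(m, 3)];             % inputs with a leading bias column
y = rand(m, 2);                          % targets
W = rand(2, 4) - 0.5;                    % weights, bias weights in column 1
lossfn = @(W) 0.5 * sum(sum((y - x * W').^2)) / m;  % same form as nn.L for 'linear'
e  = y - x * W';                         % error, as nn.e
dW = (-e)' * x / m;                      % analytic gradient, as nnbp forms nn.dW
epsilon = 1e-6;
maxdiff = 0;
for i = 1 : size(W, 1)
    for j = 1 : size(W, 2)
        Wp = W; Wm = W;
        Wp(i, j) = Wp(i, j) + epsilon;   % perturb one weight up
        Wm(i, j) = Wm(i, j) - epsilon;   % and down
        num = (lossfn(Wp) - lossfn(Wm)) / (2 * epsilon);  % central difference
        maxdiff = max(maxdiff, abs(num - dW(i, j)));
    end
end
assert(maxdiff < 1e-7, 'numerical gradient checking failed');
```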
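V. References
https://github.com/rasmusbergpalm/DeepLearnToolbox
Note: this deep learning toolbox targets MATLAB and works at the source-code level, so for graduate students its logic is clear and fairly easy to follow. In real engineering work, however, Python is more common, and there are more complete frameworks such as TensorFlow and Theano. This post is therefore meant only as an entry point: a way to understand the basic ideas alongside the papers.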