Reposted from: https://blog.csdn.net/dark_scope/article/details/9421061
==========================================================================================
I've been reading about Deep Learning lately and have gone through quite a few blogs and papers,
but to be honest this has been rather light on implementation: for one thing my computer isn't very powerful, and for another I'm not yet able to write a toolbox myself.
I've only written code on top of existing frameworks while following Andrew Ng's UFLDL tutorial (that code is on github).
Later I found a MATLAB Deep Learning toolbox whose code is very simple, which makes it well suited for studying the algorithms.
Also, a MATLAB implementation can omit a lot of data-structure code, so the algorithmic ideas come through very clearly.
So I want to read through this toolbox's code, consolidating what I've learned along the way and laying a foundation for the next step of hands-on practice.
(This article only explains the algorithms from the code's perspective; for the concrete theory you still need to read the papers.
I'll mention the names of some relevant papers in the text. The goal is to lay out the flow of each algorithm, not to dig into its principles and formulas.)
==========================================================================================
Code used: DeepLearnToolbox (download link in the original post); thanks to the toolbox's author.
==========================================================================================
Chapter 1 starts with an analysis of the NN (neural network) code, since this is the overall framework for all of deep learning; see UFLDL.
==========================================================================================
First, look at \tests\test_example_NN.m. Skipping the part that normalizes the data, the most important lines are:
nn = nnsetup([784 100 10]);
opts.numepochs = 1;    %  Number of full sweeps through data
opts.batchsize = 100;  %  Take a mean gradient step over this many samples
[nn, L] = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
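(For reference, the toolbox expects one sample per row. In the test script the MNIST data are, if I recall correctly, loaded from mnist_uint8 and scaled into [0,1], roughly as in this sketch of the skipped normalization:)

% Sketch of the skipped normalization: inputs N x 784 in [0,1], targets N x 10 one-hot.
train_x = double(train_x) / 255;   % scale uint8 pixels into [0,1]
train_y = double(train_y);         % one-hot labels, one row per sample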
Just a few simple steps train an NN, and clearly the most important functions here are nnsetup, nntrain and nntest.
Let's analyze each of them in turn, starting with \NN\nnsetup.m:
nnsetup
function nn = nnsetup(architecture)
% First get the overall structure of the network from the architecture argument;
% nn.n is the number of layers. Compare with the sample call nnsetup([784 100 10]) above.
    nn.size = architecture;
    nn.n    = numel(nn.size);

% Next comes a long list of parameters; each is explained where it is used.
    nn.activation_function     = 'tanh_opt';  %  Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate            = 2;           %  learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                = 0.5;         %  Momentum
    nn.scaling_learningRate    = 1;           %  Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2         = 0;           %  L2 regularization
    nn.nonSparsityPenalty      = 0;           %  Non sparsity penalty
    nn.sparsityTarget          = 0.05;        %  Sparsity target
    nn.inputZeroMaskedFraction = 0;           %  Used for Denoising AutoEncoders
    nn.dropoutFraction         = 0;           %  Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                 = 0;           %  Internal variable. nntest sets this to one.
    nn.output                  = 'sigm';      %  output unit 'sigm' (=logistic), 'softmax' and 'linear'

% Initialize each layer: three fields W, vW and p, of which W holds the main parameters.
% vW is the momentum term used when updating the weights; p is the so-called sparsity
% (the running average activation); more on both where they are used.
    for i = 2 : nn.n
        % weights and weight momentum
        nn.W{i - 1}  = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));

        % average activations (for use with sparsity)
        nn.p{i} = zeros(1, nn.size(i));
    end
end
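As a quick sanity check of the initialization (my own illustration, not toolbox code): for nnsetup([784 100 10]) the first weight matrix is 100 x 785 (the +1 column is the bias), and each entry is drawn uniformly from about [-0.33, 0.33]. The interval is the Glorot-style sqrt(6/(fan_in+fan_out)), scaled by 4 as suggested for sigmoid-family units.

% Initialization interval for layer 1 of nnsetup([784 100 10]):
r  = 4 * sqrt(6 / (100 + 784));           % ~0.3295
W1 = (rand(100, 784 + 1) - 0.5) * 2 * r;  % uniform in [-r, r], size 100 x 785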
nntrain
That is roughly what setup does; next comes training. Open \NN\nntrain.m.
We skip the code that validates the input arguments and go straight to the key part.
For the denoising part, refer to the paper: Extracting and Composing Robust Features with Denoising Autoencoders.
m = size(train_x, 1);
% m is the number of training samples.
% Note that we set opts in the caller; batchsize is the mini-batch size
% used for the gradient step.
batchsize = opts.batchsize; numepochs = opts.numepochs;

numbatches = m / batchsize;  % number of batches
assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');

L = zeros(numepochs*numbatches, 1);
n = 1;
% numepochs is the number of full passes over the data
for i = 1 : numepochs
    tic;

    kk = randperm(m);
    % train the batches in shuffled order; randperm(m) returns a random
    % permutation of 1..m
    for l = 1 : numbatches
        batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);

        % Add noise to input (for use in denoising autoencoder)
        % This is the part the denoising autoencoder needs; see the paper
        % "Extracting and Composing Robust Features with Denoising Autoencoders".
        % Concretely, some entries of the training samples are set to 0;
        % inputZeroMaskedFraction is the fraction affected.
        if(nn.inputZeroMaskedFraction ~= 0)
            batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
        end

        batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);

        % The three key functions: nnff does the feedforward pass, nnbp the
        % backpropagation, and nnapplygrads the gradient-descent update.
        % Their code is analyzed below.
        nn = nnff(nn, batch_x, batch_y);
        nn = nnbp(nn);
        nn = nnapplygrads(nn);

        L(n) = nn.L;
        n = n + 1;
    end

    t = toc;

    if ishandle(fhandle)
        if opts.validation == 1
            loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
        else
            loss = nneval(nn, loss, train_x, train_y);
        end
        nnupdatefigures(nn, fhandle, loss, opts, i);
    end

    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1))))]);
    nn.learningRate = nn.learningRate * nn.scaling_learningRate;
end
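To make the masking step concrete, here is a standalone illustration (not toolbox code): with inputZeroMaskedFraction = 0.3, each input entry survives with probability 0.7.

% Denoising-mask illustration: zero out roughly 30% of the inputs.
batch_x = rand(5, 4);                 % a tiny fake batch
mask    = rand(size(batch_x)) > 0.3;  % entries are 1 with probability 0.7
noisy_x = batch_x .* mask;            % ~30% of the entries are forced to 0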
Next we analyze the three functions nnff, nnbp and nnapplygrads.
nnff
nnff performs the feedforward pass. It is actually very simple: the whole network is just run forward once.
Of course, it also includes the dropout and sparsity computations;
for details see the paper "Improving Neural Networks with Dropout" and UFLDL's Autoencoders and Sparsity.
function nn = nnff(nn, x, y)
%NNFF performs a feedforward pass
%   nn = nnff(nn, x, y) returns a neural network structure with updated
%   layer activations, error and loss (nn.a, nn.e and nn.L)

    n = nn.n;
    m = size(x, 1);

    x = [ones(m,1) x];
    nn.a{1} = x;

    % feedforward pass
    for i = 2 : n-1
        % Forward-propagate with the chosen activation function; cf. the
        % activation_function parameter in nnsetup. 'sigm' is the sigmoid;
        % 'tanh_opt' is this toolbox's slightly modified tanh:
        % 1.7159*tanh(2/3.*A)
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end

        % dropout: dropoutFraction is a parameter that can be set in nnsetup
        if(nn.dropoutFraction > 0)
            if(nn.testing)
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

        % sparsity: nonSparsityPenalty is the penalty coefficient for units
        % whose average activation misses sparsityTarget;
        % calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
        end

        % Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
    end
    switch nn.output
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end

    % error and loss
    nn.e = y - nn.a{n};

    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
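The two bsxfun lines in the 'softmax' branch are the usual max-subtraction trick: subtracting each row's maximum before exponentiating leaves the result unchanged (the factor exp(-max) cancels in the ratio) while preventing overflow. A quick standalone check:

% Numerically stable softmax: the naive exp(z) would overflow to Inf here.
z = [1000 1001 1002];
s = exp(bsxfun(@minus, z, max(z, [], 2)));  % shift by the row max first
s = bsxfun(@rdivide, s, sum(s, 2))          % [0.0900 0.2447 0.6652]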
nnbp
Code: \NN\nnbp.m
nnbp carries out the back-propagation step. The procedure is fairly standard and essentially matches the Neural Network part of UFLDL.
What deserves attention is again the dropout and sparsity handling:
if(nn.nonSparsityPenalty>0)
    pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
    sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
end

% Backpropagate first derivatives
if i+1==n   % in this case in d{n} there is not the bias term to be removed
    d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act;  % Bishop (5.56)
else        % in this case in d{i} the bias term has to be removed
    d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
end

if(nn.dropoutFraction>0)
    d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
end
That is just the implementation; d{i} in the code is this layer's delta value, as explained in UFLDL.
dW{i} is essentially the gradient, except that a few extra terms are added and some adjustments made afterwards.
For the underlying theory, see the paper "Improving Neural Networks with Dropout" and UFLDL's Autoencoders and Sparsity.
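Incidentally, the sparsityError term in the excerpt above is the derivative of the KL sparsity penalty KL(ρ‖p) = ρ·log(ρ/p) + (1−ρ)·log((1−ρ)/(1−p)) with respect to the mean activation p, namely −ρ/p + (1−ρ)/(1−p), scaled by nonSparsityPenalty. A quick numerical check of that derivative (my own sketch, not toolbox code):

% Verify d/dp KL(rho||p) = -rho/p + (1-rho)/(1-p) by finite differences.
rho = 0.05; p = 0.2; h = 1e-6;
KL  = @(p) rho*log(rho./p) + (1-rho)*log((1-rho)./(1-p));
analytic = -rho/p + (1-rho)/(1-p);        % 0.9375
numeric  = (KL(p+h) - KL(p-h)) / (2*h);   % agrees to ~1e-9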
nnapplygrads
Code file: \NN\nnapplygrads.m
for i = 1 : (nn.n - 1)
    if(nn.weightPenaltyL2>0)
        dW = nn.dW{i} + nn.weightPenaltyL2 * nn.W{i};
    else
        dW = nn.dW{i};
    end

    dW = nn.learningRate * dW;

    if(nn.momentum>0)
        nn.vW{i} = nn.momentum*nn.vW{i} + dW;
        dW = nn.vW{i};
    end

    nn.W{i} = nn.W{i} - dW;
end
This part is simple. nn.weightPenaltyL2 is the weight-decay term, another parameter that can be set in nnsetup;
if it is nonzero, a weight penalty is added to guard against overfitting, the step is then adjusted according to the momentum, and finally nn.W{i} is updated.
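A scalar sketch of what momentum does (illustration only): with a constant gradient g, the recurrence vW = momentum·vW + lr·g converges to lr·g/(1−momentum), so momentum effectively enlarges the step along directions where the gradient is consistent.

% Momentum on a constant gradient: the step grows toward lr*g/(1-momentum).
lr = 2; momentum = 0.5; g = 1; vW = 0;
for k = 1:10
    vW = momentum*vW + lr*g;   % same recurrence as in nnapplygrads
end
vW                             % ~4, i.e. lr*g/(1-momentum)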
nntest
nntest couldn't be simpler: it just calls nnpredict and compares the predictions against the test set.
function [er, bad] = nntest(nn, x, y)
    labels = nnpredict(nn, x);
    [~, expected] = max(y,[],2);
    bad = find(labels ~= expected);
    er = numel(bad) / size(x, 1);
end
nnpredict
Code file: \NN\nnpredict.m
function labels = nnpredict(nn, x)
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
    nn.testing = 0;

    [~, i] = max(nn.a{end},[],2);
    labels = i;
end
Still very simple: predict is nothing more than one run of nnff to get the final output.
max(nn.a{end},[],2) returns each row's maximum together with the column it occurs in, so labels is just the predicted class index.
(This test seems to be written specifically for classification problems; for other tasks it is enough to take the output that nnff computes.)
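To close, a minimal end-to-end sketch of the whole pipeline (it assumes DeepLearnToolbox is on the MATLAB path; the random data are only there to exercise the API, so the resulting error rate is meaningless):

% Train and test on random data, just to exercise nnsetup/nntrain/nntest.
rand('state', 0);
train_x = rand(1000, 784);                            % fake inputs in [0,1]
labels  = randi(10, 1000, 1);                         % a random class 1..10 per sample
train_y = full(sparse(1:1000, labels, 1, 1000, 10));  % one-hot targets, 1000 x 10

nn = nnsetup([784 100 10]);
nn.output = 'softmax';           % classification output layer
opts.numepochs = 1;              % one sweep through the data
opts.batchsize = 100;            % 10 mini-batches of 100 samples
[nn, L]   = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, train_x, train_y);  % er should sit near chance (~0.9)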