DeepLearnToolbox (master): a MATLAB deep learning toolbox

This post walks through the DBN example in the MATLAB DeepLearnToolbox and adds brief comments to the functions involved, to help understand the code; the notes will later be reused when applying the toolbox to other data. The comments are deliberately simple and somewhat verbose.

The function below is the DBN example: it prepares the data, sets the DBN parameters, and so on.

test_example_DBN calls dbnsetup (builds the DBN), dbntrain (trains the DBN), dbnunfoldtonn (unfolds the DBN into a feed-forward NN), nntrain, and nntest. Each of these is shown in turn below.


%https://github.com/rasmusbergpalm/DeepLearnToolbox
function test_example_DBN
load mnist_uint8;

train_x = double(train_x) / 255;
test_x = double(test_x) / 255;
train_y = double(train_y);
test_y = double(test_y);

%% ex1 train a 100 hidden unit RBM and visualize its weights
%set the number of hidden units to 100; the number of visible units equals the input dimensionality and is determined by the data
rand('state',0)
dbn.sizes = [100];
opts.numepochs = 1;   %number of passes over the training set (epochs); with the same data, others report about 11.41% error with 1 epoch, 4.2% with 5, 2.73% with 10
opts.batchsize = 100; %mini-batch size: the weights are updated once per batchsize samples, instead of once per full pass over all samples
opts.momentum = 0;    %momentum
opts.alpha = 1;       %learning rate
dbn = dbnsetup(dbn, train_x, opts);   %build the DBN and return it
dbn = dbntrain(dbn, train_x, opts);   %train the DBN on the training samples
figure; visualize(dbn.rbm{1}.W');     % Visualize the RBM weights

%% ex2 train a 100-100 hidden unit DBN and use its weights to initialize a NN
rand('state',0)
%train dbn
dbn.sizes = [100 100];
opts.numepochs = 1;
opts.batchsize = 100;
opts.momentum = 0;
opts.alpha = 1;
dbn = dbnsetup(dbn, train_x, opts);
dbn = dbntrain(dbn, train_x, opts);

%unfold dbn to nn
nn = dbnunfoldtonn(dbn, 10);     %10 is the number of output-layer units
nn.activation_function = 'sigm'; %nnsetup itself sets a default activation function,
                                 %but it is overridden here for this application

%train nn
opts.numepochs = 1;
opts.batchsize = 100;
%finally, fine-tuning just means training the resulting NN
nn = nntrain(nn, train_x, train_y, opts);
%evaluate on the test samples
[er, bad] = nntest(nn, test_x, test_y);

assert(er < 0.10, 'Too big error');
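Since the plan is to reuse this on other data later, here is a hedged sketch of what that takes (my_x, my_y and num_classes are hypothetical placeholders, not toolbox names). The real constraints are inputs scaled to [0,1], one row per sample, one-hot labels, and a batchsize that divides the number of samples.

% a minimal sketch of plugging in other data (my_x, my_y, num_classes are hypothetical)
train_x = double(my_x) / 255;                 % rbmtrain asserts that all inputs lie in [0,1]
train_y = zeros(size(my_x,1), num_classes);   % one row per sample, one-hot labels
train_y(sub2ind(size(train_y), (1:size(my_x,1))', my_y(:))) = 1;
dbn.sizes = [100];                            % only the hidden sizes; the visible size is taken from size(train_x,2)
opts.batchsize = 100;                         % must divide the number of samples exactly (see the asserts in rbmtrain/nntrain)
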
dbnsetup.m mainly assigns initial values to each RBM; it does not call any other functions.

function dbn = dbnsetup(dbn, x, opts)  %build the DBN
n = size(x, 2);
%the number of columns is the input dimensionality, i.e. the number of input features and therefore the number of visible units
dbn.sizes = [n, dbn.sizes]; %visible-layer size followed by the hidden-layer sizes

%numel(A) returns the number of elements in array A
for u = 1 : numel(dbn.sizes) - 1   %how many RBMs? here dbn.sizes = [784,100], numel(...) = 2, so there is one RBM
    %in general dbn.sizes is [visible units of the first RBM (rbm1.v), rbm1.h, rbm2.h, rbm3.h, ...]:
    %the visible layer of each RBM is the hidden layer of the previous one, so it is not listed again,
    %and the number of RBMs is numel(dbn.sizes) - 1; below, each RBM gets its parameters
    dbn.rbm{u}.alpha    = opts.alpha;     %learning rate
    dbn.rbm{u}.momentum = opts.momentum;  %momentum

    dbn.rbm{u}.W  = zeros(dbn.sizes(u + 1), dbn.sizes(u)); %weights: (hidden units, visible units)
    dbn.rbm{u}.vW = zeros(dbn.sizes(u + 1), dbn.sizes(u)); %weight momentum: (hidden units, visible units)

    %biases
    dbn.rbm{u}.b  = zeros(dbn.sizes(u), 1);     %visible-layer bias, one per visible unit
    dbn.rbm{u}.vb = zeros(dbn.sizes(u), 1);

    dbn.rbm{u}.c  = zeros(dbn.sizes(u + 1), 1); %hidden-layer bias, one per hidden unit; dbn.sizes(u + 1) is the number of hidden units
    dbn.rbm{u}.vc = zeros(dbn.sizes(u + 1), 1);
end

end
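As a quick sanity check (my own addition, not part of the toolbox), this is roughly what dbnsetup produces for ex2 on MNIST, assuming train_x has 784 columns and dbn.sizes = [100 100] before the call:

% a minimal sketch, assuming MNIST inputs (784 features) and dbn.sizes = [100 100]
dbn.sizes = [100 100];
dbn = dbnsetup(dbn, train_x, opts);   % dbn.sizes becomes [784 100 100], i.e. two RBMs
size(dbn.rbm{1}.W)   % ans = [100 784]   (hidden units x visible units)
size(dbn.rbm{1}.b)   % ans = [784 1]     (visible bias)
size(dbn.rbm{1}.c)   % ans = [100 1]     (hidden bias)
size(dbn.rbm{2}.W)   % ans = [100 100]   (second RBM: 100 visible, 100 hidden)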

dbntrain.m trains the DBN by training each RBM in turn; it calls rbmtrain and rbmup.

function dbn = dbntrain(dbn, x, opts)
    n = numel(dbn.rbm);  %number of RBMs
    dbn.rbm{1} = rbmtrain(dbn.rbm{1}, x, opts); %train the first RBM first
    %the first argument is the RBM structure, the second the training data, the third the training options
    for i = 2 : n
        x = rbmup(dbn.rbm{i - 1}, x); %link the RBMs: the output of the previous RBM becomes the input of the next one
        dbn.rbm{i} = rbmtrain(dbn.rbm{i}, x, opts); %then train the next RBM
    end
end

Below are the two functions it calls.

rbmtrain.m trains a single RBM with one-step contrastive divergence (CD-1), updating the RBM parameters W, vW, c, vc, b, vb.

function rbm = rbmtrain(rbm, x, opts)  %train a single RBM
    assert(isfloat(x), 'x must be a float');
    assert(all(x(:)>=0) && all(x(:)<=1), 'all data in x must be in [0:1]');
    m = size(x, 1);  %number of samples
    numbatches = m / opts.batchsize; %batchsize is the number of samples per weight update; numbatches is how many updates it takes to use all samples once
    assert(rem(numbatches, 1) == 0, 'numbatches not integer'); %the number of batches must be an integer

    for i = 1 : opts.numepochs %number of passes over the data
        kk = randperm(m); %a random permutation of 1..m; each batch takes out batchsize samples from it to update the weights
        err = 0;          %reconstruction error
        for l = 1 : numbatches
            batch = x(kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize), :); %take out one batch of samples

            v1 = batch; %input, i.e. the visible layer
            %repmat replicates and tiles a matrix, i.e. it copies the hidden bias once per sample
            h1 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v1 * rbm.W'); %sample the hidden units given v1
            v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h1 * rbm.W);  %reconstruct the visible units given h1
            h2 = sigm(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');    %hidden probabilities given the reconstruction

            c1 = h1' * v1; %positive-phase statistics
            c2 = h2' * v2; %negative-phase statistics

            rbm.vW = rbm.momentum * rbm.vW + rbm.alpha * (c1 - c2)     / opts.batchsize;
            rbm.vb = rbm.momentum * rbm.vb + rbm.alpha * sum(v1 - v2)' / opts.batchsize;
            rbm.vc = rbm.momentum * rbm.vc + rbm.alpha * sum(h1 - h2)' / opts.batchsize;

            rbm.W = rbm.W + rbm.vW;
            rbm.b = rbm.b + rbm.vb;
            rbm.c = rbm.c + rbm.vc;

            err = err + sum(sum((v1 - v2) .^ 2)) / opts.batchsize;
        end

        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs)  '. Average reconstruction error is: ' num2str(err / numbatches)]);

    end

end
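rbmtrain relies on two small utility functions from the toolbox, sigm and sigmrnd. Roughly (check util/ in the repository for the exact files), they behave like this:

function X = sigm(P)
    % element-wise logistic sigmoid
    X = 1 ./ (1 + exp(-P));
end

function X = sigmrnd(P)
    % binary sample: 1 with probability sigm(P), 0 otherwise
    X = double(1 ./ (1 + exp(-P)) > rand(size(P)));
end

So h1 and v2 are stochastic binary samples while h2 uses the probabilities, which is the usual CD-1 recipe.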

rbmup.m prepares the data for training the next RBM.

function x = rbmup(rbm, x) %arguments: the previous RBM and the training data
    x = sigm(repmat(rbm.c', size(x, 1), 1) + x * rbm.W');
    %rbm.c' is the transposed hidden bias; size(x, 1) is the number of samples
    %repmat replicates and tiles a matrix, i.e. it copies the hidden bias once per sample
    %so this computes sigm(W*x + c) for every sample,
    %which is how data passes between RBMs: the output of one RBM becomes the input of the next
end
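A hedged usage example (assuming the 60000 x 784 MNIST training set and a first RBM with 100 hidden units):

% a minimal sketch: propagate the data up through the first trained RBM
h = rbmup(dbn.rbm{1}, train_x);   % h is 60000 x 100, hidden activations in (0,1)
% dbntrain then uses h as the training data for dbn.rbm{2}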
After dbntrain has finished training the DBN, dbnunfoldtonn.m uses the learned weights to initialize an NN; it calls nnsetup.

function nn = dbnunfoldtonn(dbn, outputsize)
%DBNUNFOLDTONN Unfolds a DBN to a NN
%   dbnunfoldtonn(dbn, outputsize) returns the unfolded dbn with a final
%   layer of size outputsize added.
%   In other words, the DBN only provides an unsupervised weight initialization; the supervised part is still done by the NN
    if(exist('outputsize','var'))
        size = [dbn.sizes outputsize]; %append the number of output units to the vector of layer sizes
    else
        size = [dbn.sizes];
    end
    nn = nnsetup(size); %build the network from this architecture
    %copy each unfolded layer's weights into the corresponding NN weights
    %note that dbn.rbm{i}.c is used to initialize the bias terms
    for i = 1 : numel(dbn.rbm)  %1, 2
        nn.W{i} = [dbn.rbm{i}.c dbn.rbm{i}.W];
        %W1 = [rbm1.c rbm1.W], W2 = [rbm2.c rbm2.W]; c has one entry per hidden unit
        %c is (hidden units, 1) and W is (hidden units, visible units), so the combined W{i} is (hidden units, visible units + 1)
        %i.e. c holds the biases of the hidden units
    end
end
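For ex2 the resulting shapes look roughly like this (a sanity check of my own, assuming dbn.sizes = [784 100 100] after dbnsetup and outputsize = 10):

% a minimal sketch of the resulting weight shapes
nn = dbnunfoldtonn(dbn, 10);   % nn.size = [784 100 100 10]
size(nn.W{1})   % ans = [100 785] = [c (100x1), W (100x784)] from the first RBM
size(nn.W{2})   % ans = [100 101] = [c (100x1), W (100x100)] from the second RBM
size(nn.W{3})   % ans = [10 101]; there is no RBM for the output layer, so this keeps nnsetup's random initialization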
nnsetup.m initializes the network parameters, including W and vW.

function nn = nnsetup(architecture)
%NNSETUP creates a Feedforward Backpropagate Neural Network
% nn = nnsetup(architecture) returns an neural network structure with n=numel(architecture)
% layers, architecture being a n x 1 vector of layer sizes e.g. [784 100 10]
%the overall structure of the network comes from the architecture argument; compare the example call nnsetup([784 100 10])
    nn.size   = architecture;
    nn.n      = numel(nn.size); %number of layers: one input layer, several hidden layers, one output layer; 4 layers for ex2
    %next come a number of hyper-parameters; they are explained where they are actually used
    nn.activation_function              = 'tanh_opt';   %  Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate                     = 2;            %  learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                         = 0.5;          %  Momentum
    nn.scaling_learningRate             = 1;            %  Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2                  = 0;            %  L2 regularization
    nn.nonSparsityPenalty               = 0;            %  Non sparsity penalty
    nn.sparsityTarget                   = 0.05;         %  Sparsity target
    nn.inputZeroMaskedFraction          = 0;            %  Used for Denoising AutoEncoders
    nn.dropoutFraction                  = 0;            %  Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                          = 0;            %  Internal variable. nntest sets this to one.
    nn.output                           = 'sigm';       %  output unit: 'sigm' (=logistic), 'softmax' or 'linear'; should this be changed to 'linear' for some tasks?
    %initialize each layer's parameters: W, vW and p; W is the main parameter,
    %vW is a temporary variable used when updating the weights, and p is the running sparsity estimate (used in nnff)
    for i = 2 : nn.n   %for ex2 this runs over i = 2, 3, 4
        % weights and weight momentum; the +1 column is for the bias term
        %W{1} connects the input layer and the first hidden layer
        %W{2} connects the first and the second hidden layer
        %W{3} connects the second hidden layer and the output layer
        %W{1} is (units in first hidden layer, input units + 1); the long expression below draws the weights uniformly from [-4*sqrt(6/(n_in+n_out)), +4*sqrt(6/(n_in+n_out))]
        nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));  %same shape as W
        % average activations (for use with sparsity)
        %p{2}, p{3}, p{4}: one row per layer, with one entry per unit of that layer (1x100, 1x100, 1x10 for ex2)
        nn.p{i}     = zeros(1, nn.size(i));
    end

end
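For reference, the 'tanh_opt' default is a scaled tanh. A rough sketch of the toolbox's tanh_opt (the exact file lives in util/):

function f = tanh_opt(A)
    % scaled hyperbolic tangent: f(A) = 1.7159 * tanh(2/3 * A)
    f = 1.7159 * tanh(2/3 .* A);
end

Note that the example above overrides the activation to 'sigm' after unfolding, so the DBN-initialized weights match the sigmoid units of the RBMs.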

nntrain.m trains the NN.

function [nn, L]  = nntrain(nn, train_x, train_y, opts, val_x, val_y)
%NNTRAIN trains a neural net
% [nn, L] = nntrain(nn, x, y, opts) trains the neural network nn with input x and
% output y for opts.numepochs epochs, with minibatches of size
% opts.batchsize. Returns a neural network nn with updated activations,
% errors, weights and biases, (nn.a, nn.e, nn.W, nn.b) and L, the sum
% squared error for each training minibatch.

assert(isfloat(train_x), 'train_x must be a float');
assert(nargin == 4 || nargin == 6, 'number of input arguments must be 4 or 6')

loss.train.e = [];      %stores the training loss: the training data is passed forward through the network and the loss of the resulting outputs is saved
                        %updated in nneval: loss.train.e(end + 1) = nn.L;
loss.train.e_frac = []; %for classification problems, stores the misclassification rate on the training data:
                        %the network's predicted classes are compared with the true labels
                        %updated in nneval: loss.train.e_frac(end+1) = er_train;
loss.val.e = [];        %the same quantities for the validation set
loss.val.e_frac = [];
opts.validation = 0;
if nargin == 6
    opts.validation = 1; %with six arguments a validation set is used
end

fhandle = [];
if isfield(opts,'plot') && opts.plot == 1
    fhandle = figure();
end
%skip the input-checking code and go straight to the important part
%for the denoising part see the paper: Extracting and Composing Robust Features with Denoising Autoencoders
m = size(train_x, 1); %m is the number of training samples

%opts was set in the caller; batchsize is the mini-batch size used for the batch gradient steps
batchsize = opts.batchsize;
numepochs = opts.numepochs;

numbatches = m / batchsize; %number of batches

assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');

L = zeros(numepochs*numbatches,1); %L stores the sum squared error for each training minibatch
n = 1; %n is the index into L
%numepochs is the number of epochs
for i = 1 : numepochs
    tic; %time this epoch

    kk = randperm(m);
    %shuffle the batches: randperm(m) generates a random permutation of 1..m
    for l = 1 : numbatches
        batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);  %take out the training inputs for this batch
    
        %Add noise to input (for use in denoising autoencoder)
        %this is the part needed by denoising autoencoders;
        %see the paper "Extracting and Composing Robust Features with Denoising Autoencoders"
        %concretely, a fraction inputZeroMaskedFraction of the input values is set to 0
        if(nn.inputZeroMaskedFraction ~= 0)  %this parameter was set to 0 above, so this branch is skipped
            batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
            %(... > ...) is either 1 or 0, so each input value is either kept unchanged or set to 0, i.e. noise is added
        end
        %the three key functions:
        %nnff is the forward pass, nnbp the backward pass, nnapplygrads the gradient-descent step
        %their code is analysed below
        batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);  %take out the training targets for this batch

        nn = nnff(nn, batch_x, batch_y); %forward pass through all layers; computes the outputs, the error and the loss
        nn = nnbp(nn);                   %compute the gradients dW from the input layer to the last hidden layer
        nn = nnapplygrads(nn);           %update every layer's weights and biases

        L(n) = nn.L;  %record the loss

        n = n + 1;
    end  %continue training with the next batch

    t = toc;

    %after the batches, evaluate the network with nneval on the training (and optionally validation) data
    if opts.validation == 1   %i.e. six input arguments were given
        loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
        str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
    else
        loss = nneval(nn, loss, train_x, train_y);
        %nneval evaluates the network, again on the training data, and stores the loss and the misclassification rate in loss,
        %i.e. it updates the four fields mentioned above: loss.train.e, loss.train.e_frac, loss.val.e, loss.val.e_frac
        str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end)); %full-batch training error
    end
    if ishandle(fhandle)
        nnupdatefigures(nn, fhandle, loss, opts, i);   %plot the training curves
    end

    %progress report
    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
    nn.learningRate = nn.learningRate * nn.scaling_learningRate; %update the learning rate

end
end

nntrain mainly calls nnff, nnbp, nnapplygrads and nneval.
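A hedged usage note on the six-argument form (here the test set simply stands in for a validation set; opts.plot is the flag checked via isfield above):

% a minimal sketch: train with a held-out set and live plotting
opts.numepochs = 1;
opts.batchsize = 100;
opts.plot      = 1;   % opens a figure that nnupdatefigures keeps updating
[nn, L] = nntrain(nn, train_x, train_y, opts, test_x, test_y);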

nnff.m
%nnff performs the feedforward pass: essentially the network is just run forward once,
%plus the dropout and sparsity computations
%for details see the paper "Improving Neural Networks with Dropout" and the notes on Autoencoders and Sparsity

function nn = nnff(nn, x, y)
%forward pass through each layer; computes the layer outputs, the network error and the loss (nn.a, nn.e and nn.L)
%NNFF performs a feedforward pass
% nn = nnff(nn, x, y) returns an neural network structure with updated
% layer activations, error and loss (nn.a, nn.e and nn.L)
    n = nn.n;       %number of layers: one input layer, several hidden layers, one output layer; 4 for ex2
    m = size(x, 1); %number of samples

    x = [ones(m,1) x]; %prepend a column of ones: the bias column
    nn.a{1} = x;       %nn.a{i} holds the activations (outputs) of layer i;
    %a{1} is the input layer, i.e. the samples themselves, ready for computing the next layer

    %feedforward pass
    for i = 2 : n-1   %compute the outputs of the hidden layers
        %the forward computation depends on the chosen activation function;
        %compare the activation_function parameter in nnsetup:
        %sigm is the sigmoid function, tanh_opt is a slightly modified tanh,
        %tanh_opt(A) = 1.7159*tanh(2/3.*A)
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                %the input is the previous layer's output; W{1}, W{2} were set in dbnunfoldtonn
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
                %a{i-1} is (samples, features+1) and W{i-1} is (hidden units, input units+1),
                %so W{i-1}' is (input units+1, hidden units); the number of features equals the number of input units
                %hence a{i} is sigm of a (samples, hidden units) matrix, i.e. this hidden layer's output
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end

        %dropout; dropoutFraction is a parameter that can be set in nnsetup
        if(nn.dropoutFraction > 0) %nnsetup set this parameter to 0, so this is skipped here
            if(nn.testing)
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

        %sparsity; nonSparsityPenalty is the penalty for units that miss the sparsityTarget
        %calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0) %also set to 0 in nnsetup, so skipped here
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);   %so p never actually gets used in this example
        end

        %Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
        %a{i} computed above is (samples, hidden units); a column of ones is prepended for the bias
    end
    switch nn.output  %compute the output layer; nn.output was set in nnsetup
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2))); %subtract the row maximum before exp for numerical stability
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end

    %error and loss
    nn.e = y - nn.a{n};

    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;    %mean squared error (times 1/2)
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m; %cross-entropy loss
    end

end
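As a side note (my own, not toolbox code): the two bsxfun lines in the 'softmax' branch are the usual numerically stable softmax. A tiny standalone illustration:

% a minimal sketch of the stable softmax trick used in nnff
z = [1000 1001 1002];    % naive exp(z) would overflow to Inf
s = exp(z - max(z));     % subtracting the row maximum does not change the result
s = s / sum(s)           % s = [0.0900 0.2447 0.6652]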

nnbp.m

%nnbp performs the back-propagation step; the procedure is fairly standard and matches the Neural Network notes in UFLDL
%the dropout and sparsity parts are again worth noting
function nn = nnbp(nn)
%NNBP performs backpropagation
% nn = nnbp(nn) returns an neural network structure with updated weights
    n = nn.n; %number of layers: one input layer, several hidden layers, one output layer; 4 for ex2
    sparsityError = 0;
    switch nn.output  %d{i} is the delta of layer i
        case 'sigm'
            d{n} = - nn.e .* (nn.a{n} .* (1 - nn.a{n})); %computed from the error and the network output
        case {'softmax','linear'}
            d{n} = - nn.e;
    end
    for i = (n - 1) : -1 : 2  %n-1 is the last hidden layer, 2 is the first hidden layer
        % Derivative of the activation function
        switch nn.activation_function
            case 'sigm'
                d_act = nn.a{i} .* (1 - nn.a{i}); %computed from each hidden layer's output
            case 'tanh_opt'
                d_act = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * nn.a{i}.^2);
        end

        if(nn.nonSparsityPenalty>0)  %this parameter was set to 0
            pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
        end

        % Backpropagate first derivatives
        if i+1==n % in this case in d{n} there is not the bias term to be removed; layer i+1 is the output layer, which has no bias column
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; % Bishop (5.56)
        else % in this case in d{i} the bias term has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act; %the first column of d{i + 1} corresponds to the bias and is removed
        end

        if(nn.dropoutFraction>0) %this parameter was set to 0
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

    end

    for i = 1 : (n - 1)  %from the input layer to the last hidden layer; dW{i} is essentially the gradient
        if i+1==n
            nn.dW{i} = (d{i + 1}' * nn.a{i}) / size(d{i + 1}, 1); %d{i + 1} is the output layer, so there is no bias column to remove
        else
            nn.dW{i} = (d{i + 1}(:,2:end)' * nn.a{i}) / size(d{i + 1}, 1); %there is a bias column, remove it
        end
    end

end
%d{i} in the code is the delta of layer i, as explained in the UFLDL notes
%dW{i} is essentially the gradient; nnapplygrads adds a few more terms to it before the update

%for the underlying principles see "Improving Neural Networks with Dropout" and the notes on Autoencoders and Sparsity
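A quick way to convince yourself that nnbp computes the right thing is a finite-difference gradient check. This sketch is my own addition, not part of the toolbox; it assumes nnff/nnbp are on the path, a batch_x/batch_y pair is available, and dropout, sparsity and L2 penalties are left at their default value of 0:

% a minimal sketch of a numerical gradient check for one weight, nn.W{1}(1,1)
epsilon = 1e-4;
nn   = nnff(nn, batch_x, batch_y);  nn = nnbp(nn);
g_bp = nn.dW{1}(1,1);                    % gradient from backpropagation

nnp = nn;  nnp.W{1}(1,1) = nnp.W{1}(1,1) + epsilon;
nnm = nn;  nnm.W{1}(1,1) = nnm.W{1}(1,1) - epsilon;
nnp = nnff(nnp, batch_x, batch_y);       % loss with the weight nudged up
nnm = nnff(nnm, batch_x, batch_y);       % loss with the weight nudged down
g_fd = (nnp.L - nnm.L) / (2 * epsilon);  % numerical gradient

disp(abs(g_bp - g_fd));                  % should be tiny, on the order of 1e-6 or less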

nnapplygrads.m

function nn = nnapplygrads(nn)
%NNAPPLYGRADS updates weights and biases with calculated gradients
%uses the gradients dW from nnbp to update the weights and biases
% nn = nnapplygrads(nn) returns an neural network structure with updated
% weights and biases
    for i = 1 : (nn.n - 1)  %update each layer's weights and biases
        if(nn.weightPenaltyL2>0)  %nnsetup set this parameter to 0
            dW = nn.dW{i} + nn.weightPenaltyL2 * [zeros(size(nn.W{i},1),1) nn.W{i}(:,2:end)]; %L2 penalty, excluding the bias column
        else
            dW = nn.dW{i};
        end

        dW = nn.learningRate * dW;

        if(nn.momentum>0)
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;
            dW = nn.vW{i};
        end

        nn.W{i} = nn.W{i} - dW;
    end

end
%this part is simple: nn.weightPenaltyL2 is the weight-decay term, another parameter set in nnsetup;
%if present it adds a weight penalty to prevent overfitting, then momentum is applied, and finally nn.W{i} is updated

nneval.m

function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y)
%NNEVAL evaluates performance of neural network
% Returns an updated loss struct
assert(nargin == 4 || nargin == 6, 'Wrong number of arguments');

nn.testing = 1;
% training performance
nn = nnff(nn, train_x, train_y);
%forward pass through all layers: computes the outputs, the error and the loss (nn.a, nn.e and nn.L)
loss.train.e(end + 1) = nn.L; %append; nn.L is a scalar computed by nnff

% validation performance
if nargin == 6
    nn = nnff(nn, val_x, val_y);
    loss.val.e(end + 1) = nn.L;
end
nn.testing = 0;
%calc misclassification rate if softmax
if strcmp(nn.output,'softmax') %only computed for the softmax output
    [er_train, dummy] = nntest(nn, train_x, train_y); %the first return value is the misclassification rate, the second the indices of the misclassified samples
    loss.train.e_frac(end+1) = er_train; %append; e_frac was defined in nntrain and stores the misclassification rate

<span class="hljs-keyword">if</span> nargin == <span class="hljs-number">6</span>
    [er_val, dummy]             = nntest(nn, val_x, val_y);
    loss.val.e_frac(<span class="hljs-keyword">end</span>+<span class="hljs-number">1</span>)  = er_val;
<span class="hljs-keyword">end</span>

end

end

nntest.m

function [er, bad] = nntest(nn, x, y) %returns er, the misclassification rate, and bad, the indices of the misclassified samples
    labels = nnpredict(nn, x);  %labels is the predicted class of every sample
    [dummy, expected] = max(y,[],2);
    %y has 10 columns; max(y,[],2) returns, for every row (sample), the maximum value dummy and its column index expected, i.e. the true class
    %likewise max(nn.a{end},[],2) in nnpredict returns the column of the largest output, so labels is the predicted class index
    bad = find(labels ~= expected);    %indices of the misclassified samples
    er = numel(bad) / size(x, 1);      %misclassification rate
end
%nntest could not be simpler: it just calls nnpredict and compares against the test labels
nnpredict.m

function labels = nnpredict(nn, x)
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
    %forward pass through all layers to get the outputs, the error and the loss (nn.a, nn.e and nn.L); the targets are dummy zeros here
    nn.testing = 0;
    [dummy, i] = max(nn.a{end},[],2); %a{end} is the output layer
    labels = i;

end
%again very simple: predict is just one nnff pass to get the final output
%max(nn.a{end},[],2) returns each row's maximum and its column index, so labels is the predicted class index
%(this test is really meant for classification problems; for other tasks nnff's output itself is what you want) - at this point I have thoroughly confused myself...
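A hedged usage example on the test set (assuming the trained network from ex2 and the 10000-sample mnist_uint8 test set):

% a minimal sketch: predict classes for the test set by hand
labels = nnpredict(nn, test_x);       % roughly 10000 x 1 vector of class indices in 1..10
[~, expected] = max(test_y, [], 2);   % true class indices recovered from the one-hot labels
mean(labels ~= expected)              % misclassification rate; the same value nntest returns as er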

To summarize, here is the call hierarchy of the functions covered above:

test_example_DBN
    dbnsetup
    dbntrain
        rbmtrain
        rbmup
    dbnunfoldtonn
        nnsetup
    nntrain
        nnff
        nnbp
        nnapplygrads
        nneval
            nntest
                nnpredict
    nntest




