This post walks through the DBN example in the MATLAB deep learning toolbox (DeepLearnToolbox), adding brief comments on the functions involved to help understand the program; later the same pipeline will be applied to other data. Some of the comments are fairly simple and verbose.
The function below is the DBN example itself: it prepares the data, sets the DBN parameters, and so on.
test_example_DBN calls dbnsetup (builds the DBN), dbntrain (trains the DBN), dbnunfoldtonn, nntrain and nntest. Each of them is shown below.
%https://github.com/rasmusbergpalm/DeepLearnToolbox
function test_example_DBN
load mnist_uint8;

train_x = double(train_x) / 255;
test_x  = double(test_x)  / 255;
train_y = double(train_y);
test_y  = double(test_y);

%% ex1 train a 100 hidden unit RBM and visualize its weights
% the hidden layer has 100 units; the visible-layer size equals the input
% dimensionality and is determined by the data
rand('state',0)
dbn.sizes = [100];
opts.numepochs =   1;   % number of passes over the training set, i.e. how many times the
                        % weights and biases are adjusted based on the output error;
                        % with the same data set, reported errors are roughly 11.41% for
                        % 1 epoch, 4.2% for 5 epochs and 2.73% for 10 epochs
opts.batchsize = 100;   % pick one batch of batchsize samples at a time: the weights are
                        % updated after every batchsize samples instead of only after the
                        % error over the whole training set has been computed
opts.momentum  =   0;   % momentum
opts.alpha     =   1;   % learning rate
dbn = dbnsetup(dbn, train_x, opts);     % build the DBN and return it
dbn = dbntrain(dbn, train_x, opts);     % train the network on the training samples
figure; visualize(dbn.rbm{1}.W');   %  Visualize the RBM weights

%% ex2 train a 100-100 hidden unit DBN and use its weights to initialize a NN
rand('state',0)
%train dbn
dbn.sizes = [100 100];
opts.numepochs =   1;
opts.batchsize = 100;
opts.momentum  =   0;
opts.alpha     =   1;
dbn = dbnsetup(dbn, train_x, opts);
dbn = dbntrain(dbn, train_x, opts);

%unfold dbn to nn
nn = dbnunfoldtonn(dbn, 10);    % 10 is the number of output-layer nodes
nn.activation_function = 'sigm';    % nnsetup sets its own default activation function;
                                    % it is overridden here for this application

%train nn
opts.numepochs =  1;
opts.batchsize = 100;
% fine-tuning: simply train the NN on top of the pretrained weights
nn = nntrain(nn, train_x, train_y, opts);
% test with the test samples
[er, bad] = nntest(nn, test_x, test_y);

assert(er < 0.10, 'Too big error');
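Since the plan is to reuse this pipeline on other data, here is a minimal sketch of how the same calls would look. The variables my_x (samples x features, scaled to [0,1]) and my_y (one-hot labels) are hypothetical placeholders, not part of the toolbox example:

% A minimal sketch, assuming my_x (n_samples x n_features, values in [0,1])
% and one-hot labels my_y (n_samples x n_classes) are already loaded.
rand('state',0)
dbn.sizes      = [100 100];     % two hidden layers; adjust to the problem
opts.numepochs = 10;
opts.batchsize = 50;            % must divide the number of samples evenly
opts.momentum  = 0;
opts.alpha     = 1;
dbn = dbnsetup(dbn, my_x, opts);
dbn = dbntrain(dbn, my_x, opts);
nn  = dbnunfoldtonn(dbn, size(my_y, 2));    % output-layer size = number of classes
nn.activation_function = 'sigm';
nn  = nntrain(nn, my_x, my_y, opts);
[er, bad] = nntest(nn, my_x, my_y);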
dbnsetup.m mainly assigns initial values to each RBM; it does not call any other function.

function dbn = dbnsetup(dbn, x, opts)   % build the dbn
    n = size(x, 2);     % the number of columns is the input dimensionality, i.e. the
                        % number of visible-layer units
    dbn.sizes = [n, dbn.sizes];     % visible-layer size followed by the hidden-layer sizes

    % numel(A) returns the number of elements of array A
    for u = 1 : numel(dbn.sizes) - 1    % how many RBMs are there? Here dbn.sizes = [784, 100],
                                        % numel(...) = 2, so there is one RBM
        % In general dbn.sizes holds [visible units of rbm1, rbm1 hidden, rbm2 hidden, rbm3 hidden, ...]:
        % the visible layer of each RBM is the hidden layer of the previous one, so it is not written
        % out again, and the number of RBMs is therefore numel(dbn.sizes) - 1.
        % Below, parameters are assigned to each RBM.
        dbn.rbm{u}.alpha    = opts.alpha;       % learning rate
        dbn.rbm{u}.momentum = opts.momentum;    % momentum

        dbn.rbm{u}.W  = zeros(dbn.sizes(u + 1), dbn.sizes(u));  % weights: (hidden units, visible units)
        dbn.rbm{u}.vW = zeros(dbn.sizes(u + 1), dbn.sizes(u));  % (hidden units, visible units)

        % biases
        dbn.rbm{u}.b  = zeros(dbn.sizes(u), 1);  % visible-layer bias, one entry per visible unit
        dbn.rbm{u}.vb = zeros(dbn.sizes(u), 1);

        dbn.rbm{u}.c  = zeros(dbn.sizes(u + 1), 1);  % hidden-layer bias, one entry per hidden unit;
                                                     % dbn.sizes(u + 1) is the hidden-layer size
        dbn.rbm{u}.vc = zeros(dbn.sizes(u + 1), 1);
    end
end
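To make the layout concrete, a quick sanity check of the expected sizes (a sketch only; the sizes follow from dbnsetup above, assuming the 784-dimensional MNIST input of ex1):

% After dbn.sizes = [100] and dbnsetup(dbn, train_x, opts) with 784-dim MNIST input,
% dbn.sizes becomes [784 100] and a single RBM is created:
size(dbn.rbm{1}.W)   % expected: 100 784   (hidden x visible)
size(dbn.rbm{1}.b)   % expected: 784 1     (visible bias)
size(dbn.rbm{1}.c)   % expected: 100 1     (hidden bias)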
dbntrain.m trains the DBN by training each RBM in turn; it calls rbmtrain and rbmup.

function dbn = dbntrain(dbn, x, opts)
    n = numel(dbn.rbm);     % number of RBMs

    dbn.rbm{1} = rbmtrain(dbn.rbm{1}, x, opts);     % train the first RBM first
    % first argument: the RBM structure; second: the training data; third: the training options
    for i = 2 : n
        x = rbmup(dbn.rbm{i - 1}, x);   % connect the RBMs and pass the data on: the output of
                                        % the previous RBM becomes the input of the next one
        dbn.rbm{i} = rbmtrain(dbn.rbm{i}, x, opts);     % then train the next RBM
    end
end
Here are the two functions it calls.
rbmtrain.m trains a single RBM, updating the RBM parameters W, vW, c, vc, b, vb.

function rbm = rbmtrain(rbm, x, opts)   % train a single RBM
    assert(isfloat(x), 'x must be a float');
    assert(all(x(:)>=0) && all(x(:)<=1), 'all data in x must be in [0:1]');
    m = size(x, 1);     % number of samples
    numbatches = m / opts.batchsize;    % batchsize is the number of samples used per weight update;
                                        % numbatches is how many updates are needed so that every
                                        % sample takes part in training

    assert(rem(numbatches, 1) == 0, 'numbatches not integer');  % the number of batches must be an integer

    for i = 1 : opts.numepochs      % number of passes over the data
        kk = randperm(m);   % a random permutation of the numbers 1..m; each update takes the next
                            % batchsize samples from it to adjust the weights
        err = 0;    % reconstruction error
        for l = 1 : numbatches
            batch = x(kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize), :);    % take one batch of samples

            v1 = batch;     % input units, i.e. the visible layer
            % repmat replicates and tiles a matrix, i.e. one copy of the hidden-layer bias per sample
            h1 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v1 * rbm.W');
            v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h1 * rbm.W);
            h2 = sigm(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');

            c1 = h1' * v1;
            c2 = h2' * v2;

            rbm.vW = rbm.momentum * rbm.vW + rbm.alpha * (c1 - c2)     / opts.batchsize;
            rbm.vb = rbm.momentum * rbm.vb + rbm.alpha * sum(v1 - v2)' / opts.batchsize;
            rbm.vc = rbm.momentum * rbm.vc + rbm.alpha * sum(h1 - h2)' / opts.batchsize;

            rbm.W = rbm.W + rbm.vW;
            rbm.b = rbm.b + rbm.vb;
            rbm.c = rbm.c + rbm.vc;

            err = err + sum(sum((v1 - v2) .^ 2)) / opts.batchsize;
        end

        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs)  '. Average reconstruction error is: ' num2str(err / numbatches)]);

    end
end
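For reference, the updates in the inner loop are the standard one-step contrastive divergence (CD-1) rules; in the notation of the code above (this is just a restatement of the code, not part of the toolbox):

\[
h_1 \sim \operatorname{sigm}(v_1 W^\top + c), \qquad
v_2 \sim \operatorname{sigm}(h_1 W + b), \qquad
h_2 = \operatorname{sigm}(v_2 W^\top + c)
\]
\[
\Delta W = \frac{\alpha}{\text{batchsize}}\left(h_1^\top v_1 - h_2^\top v_2\right), \qquad
\Delta b = \frac{\alpha}{\text{batchsize}}\sum(v_1 - v_2), \qquad
\Delta c = \frac{\alpha}{\text{batchsize}}\sum(h_1 - h_2)
\]

with the sums over the samples of the batch, and a momentum term added when momentum > 0.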
rbmup.m prepares the data for training the next RBM.

function x = rbmup(rbm, x)  % arguments: the previous RBM and the training data
    x = sigm(repmat(rbm.c', size(x, 1), 1) + x * rbm.W');
    % rbm.c' is the transposed hidden-layer bias; size(x, 1) is the number of samples;
    % repmat replicates and tiles the matrix, i.e. one copy of the hidden bias per sample,
    % so this computes sigm(x*W' + c) for every sample.
    % This implements the hand-off between RBMs: the output of the previous RBM becomes
    % the input of the next one.
end
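sigm and sigmrnd come from the toolbox's util directory (each in its own file). As far as I recall their definitions are essentially the following; this is a sketch for reference, check the toolbox source if in doubt:

% Logistic sigmoid, applied element-wise.
function X = sigm(P)
    X = 1 ./ (1 + exp(-P));
end

% Sigmoid followed by Bernoulli sampling: used for the stochastic hidden/visible states in CD-1.
function X = sigmrnd(P)
    X = double(1 ./ (1 + exp(-P)) > rand(size(P)));
end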
After dbntrain finishes training the DBN, dbnunfoldtonn.m uses the learned weights to initialize an NN; it calls nnsetup.

function nn = dbnunfoldtonn(dbn, outputsize)
%DBNUNFOLDTONN Unfolds a DBN to a NN
%   dbnunfoldtonn(dbn, outputsize) returns the unfolded dbn with a final
%   layer of size outputsize added.
%   In other words, it initializes the weights: the DBN part is unsupervised learning,
%   the final supervised step is still done by the NN.
    if(exist('outputsize','var'))
        size = [dbn.sizes outputsize];  % append the output-layer size to the vector of layer sizes
    else
        size = [dbn.sizes];
    end
    nn = nnsetup(size);     % build the network from this architecture

    % copy the unfolded weights of each layer into the NN weights;
    % note that dbn.rbm{i}.c is used to initialize the bias terms
    for i = 1 : numel(dbn.rbm)      % 1, 2 for ex2
        nn.W{i} = [dbn.rbm{i}.c dbn.rbm{i}.W];
        % W1 = [rbm1.c rbm1.W], W2 = [rbm2.c rbm2.W]; c has one entry per hidden unit.
        % c is (hidden units, 1) and W is (hidden units, visible units), so the combined
        % nn.W{i} is (hidden units, visible units + 1).
        % In other words, c is the bias of each hidden-layer node.
    end
end
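For ex2 (architecture [784 100 100 10]) the unfolded weights therefore look like this (a sketch of the expected sizes, following directly from the code above):

% After nn = dbnunfoldtonn(dbn, 10) with dbn.sizes = [784 100 100]:
size(nn.W{1})   % expected: 100 785  -> [rbm1.c rbm1.W], pretrained
size(nn.W{2})   % expected: 100 101  -> [rbm2.c rbm2.W], pretrained
size(nn.W{3})   % expected:  10 101  -> randomly initialized by nnsetup (output layer)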
nnsetup.m initializes the parameters, including W and vW.

function nn = nnsetup(architecture)
%NNSETUP creates a Feedforward Backpropagate Neural Network
% nn = nnsetup(architecture) returns an neural network structure with n=numel(architecture)
% layers, architecture being a n x 1 vector of layer sizes e.g. [784 100 10]
% The overall structure of the network comes from the architecture argument;
% the sample call nnsetup([784 100 10]) above makes this easier to follow.

    nn.size = architecture;
    nn.n    = numel(nn.size);   % nn.n is the number of layers: 1 input layer, several hidden
                                % layers, 1 output layer; for ex2 this is 4 layers
    % next come a number of parameters; they are explained where they are actually used
    nn.activation_function     = 'tanh_opt';   % Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate            = 2;            % learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                = 0.5;          % Momentum
    nn.scaling_learningRate    = 1;            % Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2         = 0;            % L2 regularization
    nn.nonSparsityPenalty      = 0;            % Non sparsity penalty
    nn.sparsityTarget          = 0.05;         % Sparsity target
    nn.inputZeroMaskedFraction = 0;            % Used for Denoising AutoEncoders
    nn.dropoutFraction         = 0;            % Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                 = 0;            % Internal variable. nntest sets this to one.
    nn.output                  = 'sigm';       % output unit: 'sigm' (=logistic), 'softmax' and 'linear'
                                               % (should this perhaps be changed to 'linear' for regression?)

    % initialize each layer; three parameters per layer: W, vW and p, W being the main one;
    % vW is a temporary variable used when updating the weights, p is the sparsity term
    % (discussed again when it shows up in the code)
    for i = 2 : nn.n    % for ex2 this runs over 2, 3, 4
        % weights and weight momentum; the +1 adds the bias term
        % W1, W2, W3 and vW1, vW2, vW3:
        % W1 connects the input layer to the first hidden layer,
        % W2 connects the first hidden layer to the second hidden layer,
        % W3 connects the second hidden layer to the output layer.
        % W1 is (first hidden layer size, input layer size + 1); the long expression after rand
        % rescales the weights to a small symmetric range around zero (see the note below)
        nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));    % same size as W

        % average activations (for use with sparsity)
        % p2, p3, p4: each a row vector with one entry per unit of that layer
        nn.p{i} = zeros(1, nn.size(i));
    end
end
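The long expression in the weight initialization simply draws each entry uniformly in a symmetric interval whose width depends on the fan-in and fan-out of the layer (the normalized initialization of Glorot & Bengio, scaled by 4 as is often recommended for sigmoid units). Written out, (rand - 0.5) * 2 gives a draw from U(-1, 1) and the remaining factor sets the radius r:

\[
W_{ij} \sim \mathcal{U}(-r,\ r), \qquad
r = 4\sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}
\]

where n_in and n_out are nn.size(i-1) and nn.size(i).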
Next is nntrain.m, which trains the NN.

function [nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y)
%NNTRAIN trains a neural net
% [nn, L] = nnff(nn, x, y, opts) trains the neural network nn with input x and
% output y for opts.numepochs epochs, with minibatches of size
% opts.batchsize. Returns a neural network nn with updated activations,
% errors, weights and biases, (nn.a, nn.e, nn.W, nn.b) and L, the sum
% squared error for each training minibatch.

assert(isfloat(train_x), 'train_x must be a float');
assert(nargin == 4 || nargin == 6, 'number of input arguments must be 4 or 6')

loss.train.e      = [];     % the loss obtained by a forward pass over the training data;
                            % updated in nneval: loss.train.e(end + 1) = nn.L;
loss.train.e_frac = [];     % for classification problems: the network is tested on the training
                            % data, the predicted classes are compared with the true labels, and
                            % the fraction of misclassified samples is stored;
                            % updated in nneval: loss.train.e_frac(end+1) = er_train;
loss.val.e        = [];     % the same, for the validation set
loss.val.e_frac   = [];
opts.validation = 0;
if nargin == 6
    opts.validation = 1;    % with 6 arguments, a validation set is used
end

fhandle = [];
if isfield(opts,'plot') && opts.plot == 1
    fhandle = figure();
end

% skipping the input-checking code, straight to the key part.
% For the denoising part see the paper: Extracting and Composing Robust Features with Denoising Autoencoders
m = size(train_x, 1);   % m is the number of training samples

% batchsize and numepochs were set in opts when calling nntrain;
% batchsize is the minibatch size used for batch gradient steps
batchsize = opts.batchsize;
numepochs = opts.numepochs;

numbatches = m / batchsize;     % number of batches

assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');

L = zeros(numepochs*numbatches,1);  % L stores the sum squared error for each training minibatch
n = 1;  % n is the index into L
% numepochs is the number of passes over the data
for i = 1 : numepochs
    tic;    % time one epoch

    kk = randperm(m);   % train on the batches in random order; randperm(m) is a random permutation of 1..m
    for l = 1 : numbatches
        batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);  % take the training inputs

        % Add noise to input (for use in denoising autoencoder)
        % this part is only needed by denoising autoencoders; see the paper
        % "Extracting and Composing Robust Features with Denoising Autoencoders".
        % Concretely, some entries of the training samples are set to 0;
        % inputZeroMaskedFraction is the fraction that is zeroed out.
        if(nn.inputZeroMaskedFraction ~= 0)     % the parameter was set to 0 earlier, so this is skipped
            batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
            % (... > ...) is either 1 or 0, so each entry is either kept or set to 0,
            % i.e. noise is added
        end

        % the three key functions:
        % nnff is the forward pass, nnbp is backpropagation, nnapplygrads performs the gradient step;
        % they are analysed below
        batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);  % take the training targets

        nn = nnff(nn, batch_x, batch_y);    % forward pass through all layers: computes the output, error and loss
        nn = nnbp(nn);                      % computes the gradients dW from the input layer up to the last hidden layer
        nn = nnapplygrads(nn);              % updates the weights and biases of each layer

        L(n) = nn.L;    % record the loss

        n = n + 1;
    end     % continue training with the next batch

    t = toc;

    % after each epoch, evaluate the network with nneval on the training data
    if opts.validation == 1     % if 6 arguments were given
        loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
        str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
    else
        loss = nneval(nn, loss, train_x, train_y);
        % nneval evaluates the network, again on the training data, obtaining the loss and the
        % misclassification rate, which are stored in loss by updating the four variables mentioned
        % above: loss.train.e, loss.train.e_frac, loss.val.e, loss.val.e_frac
        str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end));   % full-batch training error
    end
    if ishandle(fhandle)
        nnupdatefigures(nn, fhandle, loss, opts, i);    % plotting
    end

    % progress output
    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
    nn.learningRate = nn.learningRate * nn.scaling_learningRate;   % update the learning rate
end
end
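Since nntrain optionally takes a validation set and a plot flag, here is a minimal sketch of that form of the call (assuming val_x/val_y are held-out data in the same format as train_x/train_y; these names are placeholders):

% Sketch: fine-tuning with a validation set and live plotting.
opts.numepochs = 10;
opts.batchsize = 100;
opts.plot      = 1;    % nntrain then opens a figure and calls nnupdatefigures each epoch
[nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y);
% with 6 arguments nneval also tracks the validation loss, shown in the per-epoch summary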
nntrain mainly calls nnff, nnbp, nnapplygrads and nneval.
nnff.m
%nnff is the feedforward pass; it is actually very simple, just one forward run through the whole network,
%plus the dropout and sparsity computations.
%For details see the papers "Improving Neural Networks with Dropout" and the Autoencoders and Sparsity material.

function nn = nnff(nn, x, y)
% forward pass through all layers, computing the layer outputs and the network error and loss (nn.a, nn.e and nn.L)
%NNFF performs a feedforward pass
% nn = nnff(nn, x, y) returns an neural network structure with updated
% layer activations, error and loss (nn.a, nn.e and nn.L)

    n = nn.n;   % nn.n is the number of layers: 1 input layer, several hidden layers,
                % 1 output layer; 4 layers for ex2
    m = size(x, 1);     % number of samples

    x = [ones(m,1) x];  % prepend a column of ones: this extra column is the bias term
    nn.a{1} = x;    % nn.a{i} holds the outputs (activations) of the units of layer i;
                    % a{1} is the input layer, i.e. the samples themselves, ready to be
                    % fed into the next layer

    %feedforward pass
    for i = 2 : n-1     % compute the outputs of the hidden layers
        % the forward computation depends on the chosen activation function;
        % see the activation_function parameter in nnsetup:
        % sigm is the sigmoid, tanh_opt is the slightly modified tanh used by this toolbox,
        % tanh_opt(A) = 1.7159 * tanh(2/3 .* A)
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                % the weights W1, W2 were set in dbnunfoldtonn; the input is the previous layer's output
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
                % a{i-1} is (samples, features + 1), W{i-1} is (hidden units, input units + 1),
                % so W{i-1}' is (input units + 1, hidden units); the number of features equals the
                % number of input units, so a{i} is sigm of a (samples, hidden units) matrix,
                % i.e. the output of this hidden layer
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end

        % dropout; dropoutFraction is a parameter set in nnsetup
        %dropout
        if(nn.dropoutFraction > 0)  % set to 0 in nnsetup, so this is skipped
            if(nn.testing)
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

        % sparsity; nonSparsityPenalty is the penalty on units that miss the sparsityTarget
        %calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)     % also set to 0 in nnsetup, so skipped
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);     % so the p parameter is not actually used here
        end

        %Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
        % the a{i} computed above is (samples, hidden units); a column of ones is prepended
        % for the bias of the next layer
    end
    switch nn.output    % compute the output-layer values; nn.output was set in nnsetup
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end

    %error and loss
    nn.e = y - nn.a{n};     % compute the error

    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
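In formulas, and omitting the dropout/sparsity branches that are disabled here, one forward pass computes (this is just a restatement of the code above):

\[
a^{(1)} = [\,1,\ x\,], \qquad
a^{(i)} = [\,1,\ f(a^{(i-1)} W^{(i-1)\top})\,]\ \ (2 \le i \le n-1), \qquad
a^{(n)} = g(a^{(n-1)} W^{(n-1)\top})
\]
\[
e = y - a^{(n)}, \qquad
L = \frac{1}{2m}\sum e^2 \ \ \text{('sigm'/'linear')}, \qquad
L = -\frac{1}{m}\sum y \log a^{(n)} \ \ \text{('softmax')}
\]

where f is the hidden activation (sigm or tanh_opt) and g is the output unit.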
nnbp.m

%nnbp performs back propagation; the procedure is fairly standard and essentially the same as the
%Neural Network material in UFLDL. The parts worth noting are again dropout and sparsity.

function nn = nnbp(nn)
%NNBP performs backpropagation
% nn = nnbp(nn) returns an neural network structure with updated weights

    n = nn.n;   % number of layers: 1 input layer, several hidden layers, 1 output layer; 4 for ex2
    sparsityError = 0;
    switch nn.output    % d{i} is the delta of layer i
        case 'sigm'
            d{n} = - nn.e .* (nn.a{n} .* (1 - nn.a{n}));    % computed from the error and the network output
        case {'softmax','linear'}
            d{n} = - nn.e;
    end
    for i = (n - 1) : -1 : 2    % n-1 is the last hidden layer, 2 is the first hidden layer
        % Derivative of the activation function
        switch nn.activation_function
            case 'sigm'
                d_act = nn.a{i} .* (1 - nn.a{i});   % computed from the output of each hidden layer
            case 'tanh_opt'
                d_act = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * nn.a{i}.^2);
        end

        if(nn.nonSparsityPenalty>0)     % this parameter was set to 0
            pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
        end

        % Backpropagate first derivatives
        if i+1==n % in this case in d{n} there is not the bias term to be removed
                  % layer i+1 is the output layer, which has no bias unit, so nothing to remove
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; % Bishop (5.56)
        else % in this case in d{i} the bias term has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;  % the first column of d{i+1}
                                                                            % is the bias and is removed
        end

        if(nn.dropoutFraction>0)    % this was set to 0
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

    end

    for i = 1 : (n - 1)     % from the input layer up to the last hidden layer, compute dW;
                            % dW{i} is essentially the gradient
        if i+1==n
            nn.dW{i} = (d{i + 1}' * nn.a{i}) / size(d{i + 1}, 1);   % d{i+1} is the output layer, so no bias to remove
        else
            nn.dW{i} = (d{i + 1}(:,2:end)' * nn.a{i}) / size(d{i + 1}, 1);  % a bias column exists and has to be removed
        end
    end
end

%d{i} in the code is simply the delta of layer i, as described in the UFLDL notes.
%dW{i} is essentially the gradient; a few extra terms are added to it and it is modified later.

%For the details see the paper "Improving Neural Networks with Dropout" and the Autoencoders and Sparsity material.
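The delta recursion implemented above is the standard one (again just restating the code, with the disabled sparsity and dropout terms omitted):

\[
\delta^{(n)} = -e \odot a^{(n)} \odot (1 - a^{(n)})\ \ \text{('sigm' output)}, \qquad
\delta^{(n)} = -e\ \ \text{('softmax'/'linear')}
\]
\[
\delta^{(i)} = \big(\tilde{\delta}^{(i+1)} W^{(i)}\big) \odot f'(a^{(i)}), \qquad
\nabla W^{(i)} = \frac{1}{m}\,\tilde{\delta}^{(i+1)\top} a^{(i)}
\]

where \tilde{\delta}^{(i+1)} is \delta^{(i+1)} with its bias column removed whenever layer i+1 is a hidden layer, m is the batch size, and f'(a) = a(1-a) for the sigmoid.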
nnapplygrads.m

function nn = nnapplygrads(nn)
%NNAPPLYGRADS updates weights and biases with calculated gradients
% uses the gradients dW obtained from nnbp to update the weights and biases
% nn = nnapplygrads(nn) returns an neural network structure with updated
% weights and biases

    for i = 1 : (nn.n - 1)      % update the weights and biases of each layer
        if(nn.weightPenaltyL2>0)    % this parameter was set to 0 in nnsetup
            dW = nn.dW{i} + nn.weightPenaltyL2 * [zeros(size(nn.W{i},1),1) nn.W{i}(:,2:end)];
        else
            dW = nn.dW{i};
        end

        dW = nn.learningRate * dW;

        if(nn.momentum>0)
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;
            dW = nn.vW{i};
        end

        nn.W{i} = nn.W{i} - dW;
    end
end

%This one is simple: nn.weightPenaltyL2 is the weight decay term, another parameter set in nnsetup;
%if it is non-zero the weight penalty is added to prevent overfitting, the step is then adjusted
%according to the momentum, and finally nn.W{i} is updated.
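Written out, the update applied per layer is (restating the code; the L2 term skips the bias column, and the momentum step is only taken when mu > 0, otherwise the raw step is used):

\[
\Delta W^{(i)} = \eta\left(\nabla W^{(i)} + \lambda\,[\,0,\ W^{(i)}_{:,2:\text{end}}\,]\right), \qquad
v^{(i)} \leftarrow \mu\,v^{(i)} + \Delta W^{(i)}, \qquad
W^{(i)} \leftarrow W^{(i)} - v^{(i)}
\]

with learning rate eta = nn.learningRate, weight decay lambda = nn.weightPenaltyL2 and momentum mu = nn.momentum.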
nneval.m

function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y)
%NNEVAL evaluates performance of neural network
% Returns a updated loss struct
assert(nargin == 4 || nargin == 6, 'Wrong number of arguments');

nn.testing = 1;
% training performance
nn = nnff(nn, train_x, train_y);
% forward pass through all layers, computing the network output, error and loss (nn.a, nn.e and nn.L)
loss.train.e(end + 1) = nn.L;   % append to the end; nn.L is a scalar computed by nnff

% validation performance
if nargin == 6
    nn = nnff(nn, val_x, val_y);
    loss.val.e(end + 1) = nn.L;
end
nn.testing = 0;
%calc misclassification rate if softmax
if strcmp(nn.output,'softmax')      % only executed when the output unit is softmax
    [er_train, dummy] = nntest(nn, train_x, train_y);
    % the first return value is the misclassification rate, the second the indices of the misclassified samples
    loss.train.e_frac(end+1) = er_train;    % append the misclassification rate (the variable defined in nntrain)

    if nargin == 6
        [er_val, dummy] = nntest(nn, val_x, val_y);
        loss.val.e_frac(end+1) = er_val;
    end
end

end
nntest.m

function [er, bad] = nntest(nn, x, y)   % er is the misclassification rate, bad holds the indices of the misclassified samples
    labels = nnpredict(nn, x);  % labels is the final predicted class for each sample
    [dummy, expected] = max(y,[],2);
    % y has 10 columns; max(y,[],2) returns, for each row (i.e. each sample), the maximum value dummy
    % and the column expected in which it occurs; the column index is the class label
    bad = find(labels ~= expected);     % indices of the misclassified samples
    er = numel(bad) / size(x, 1);       % misclassification rate
end

%nntest could hardly be simpler: it just calls nnpredict and compares the result with the test labels
nnpredict.m

function labels = nnpredict(nn, x)
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
    % forward pass through all layers, computing the activations, error and loss (nn.a, nn.e and nn.L)
    nn.testing = 0;

    [dummy, i] = max(nn.a{end},[],2);   % a{end} is the output of the output layer
    labels = i;
end

%again very simple: predict is just one nnff pass to get the final output.
%max(nn.a{end},[],2) returns the maximum of each row and the column it occurs in,
%so labels is simply the predicted class index.
%(This is really meant for testing classification problems; otherwise the output of nnff is all we need.)
I have managed to confuse myself by now...
Here is a table of the functions and what they call:
test_example_DBN
├─ dbnsetup
├─ dbntrain
│  ├─ rbmtrain
│  └─ rbmup
├─ dbnunfoldtonn
│  └─ nnsetup
├─ nntrain
│  ├─ nnff
│  ├─ nnbp
│  ├─ nnapplygrads
│  └─ nneval
│     └─ nntest
│        └─ nnpredict
└─ nntest