I have recently been studying DBNs (deep belief networks) and the accompanying code, namely the code released with Hinton's paper "Reducing the Dimensionality of Data with Neural Networks". Here I use some spare time to write up what I have learned so far. The notes are not exhaustive, and they also draw on other bloggers' articles. Since I have only been studying this for a short while, there may be mistakes; corrections are welcome. Note that the RBM model as implemented in the code differs somewhat from the theoretical formulation. The basic framework is as follows:
A walkthrough of rbm.m
% Version 1.000
%
% Code provided by Geoff Hinton and Ruslan Salakhutdinov
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied. As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application. All use of these programs is entirely at the user's own risk.
% This program trains Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.
% The program assumes that the following variables are set externally:
% maxepoch -- maximum number of epochs
% numhid -- number of hidden units
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart -- set to 1 if learning starts from beginning
epsilonw = 0.1; % Learning rate for weights
epsilonvb = 0.1; % Learning rate for biases of visible units
epsilonhb = 0.1; % Learning rate for biases of hidden units
weightcost = 0.0002; % Weight-decay coefficient
initialmomentum = 0.5;
finalmomentum = 0.9; %% Role of the momentum variable: each weight update retains a fraction of the previous update's increment
[numcases numdims numbatches]=size(batchdata); %% 100, 784, 600
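The momentum scheme used here can be sketched in a few lines of NumPy (a toy illustration of the update rule, not Hinton's code; the function name `momentum_step` is my own): each new increment keeps `momentum` times the previous increment plus the fresh gradient step.

```python
import numpy as np

def momentum_step(inc_prev, grad, lr, momentum):
    """New parameter increment: retain a fraction of the previous increment."""
    return momentum * inc_prev + lr * grad

inc = np.zeros(3)
grad = np.ones(3)
inc = momentum_step(inc, grad, lr=0.1, momentum=0.5)  # first step: 0.1 * grad
inc = momentum_step(inc, grad, lr=0.1, momentum=0.5)  # second step keeps half of the first
```

With a constant gradient the increment grows toward `lr * grad / (1 - momentum)`, which is why a larger final momentum (0.9) effectively enlarges the step size once training has stabilized.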
if restart ==1,
restart=0;
epoch=1;
% Initializing symmetric weights and biases.
vishid = 0.1*randn(numdims,numhid); %% 784*1000; randomly initialized visible-to-hidden weight matrix
hidbiases = zeros(1,numhid); %% initialize the hidden-unit biases
visbiases = zeros(1,numdims); %% initialize the visible-unit biases
poshidprobs = zeros(numcases,numhid); %% 100*1000; hidden-layer output probabilities for one batch in the positive phase
neghidprobs = zeros(numcases,numhid); %% hidden-layer output probabilities in the negative phase
posprods = zeros(numdims,numhid); %% 784*1000; positive-phase products of visible data and hidden probabilities
negprods = zeros(numdims,numhid); %% negative-phase products
vishidinc = zeros(numdims,numhid); %% weight increments between visible and hidden units
hidbiasinc = zeros(1,numhid); %% increments for the hidden biases
visbiasinc = zeros(1,numdims); %% increments for the visible biases
batchposhidprobs=zeros(numcases,numhid,numbatches); %% 100*1000*600; stores the hidden probabilities computed for each batch, to be used as input to the next RBM
end
for epoch = epoch:maxepoch, %% 10 epochs in total; start the pre-training iterations
fprintf(1,'epoch %d\r',epoch);
errsum=0; %% initialize the reconstruction error to 0
for batch = 1:numbatches, %% 600 batches; process one batch of data per iteration
fprintf(1,'epoch %d batch %d\r',epoch,batch);
%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
data = batchdata(:,:,batch); %% 100*784; extract one batch for pre-training. Each row is one sample (28*28 pixels, 100 samples per batch), stored as doubles and not yet binarized
poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); %% hidden-layer output probabilities via the sigmoid function
batchposhidprobs(:,:,batch)=poshidprobs; %% the hidden-layer result becomes the visible layer of the next RBM
posprods = data' * poshidprobs; %% used in the gradient computation; each element is the product of a visible unit's value and a hidden unit's output probability
poshidact = sum(poshidprobs); %% sum of the positive-phase hidden probabilities: accumulate the hidden activations over the 100 samples in the batch (summing down each column)
posvisact = sum(data); %% sum of the sample values: accumulate the visible-layer data over the 100 samples (summing down each column)
%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
poshidstates = poshidprobs > rand(numcases,numhid); %% binarize the hidden output probabilities: set to 1 where they exceed a uniform random number, otherwise 0; rand(m,n) produces an m*n matrix of values uniformly distributed in (0,1)
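The positive phase above translates almost line for line into NumPy. The sketch below uses toy dimensions in place of 100/784/1000 (my own choice, purely illustrative); `@` replaces MATLAB's matrix product, and broadcasting replaces `repmat`.

```python
import numpy as np
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy shapes standing in for numcases=100, numdims=784, numhid=1000.
numcases, numdims, numhid = 4, 6, 5
data = rng.random((numcases, numdims))                  # one mini-batch, values in [0, 1]
vishid = 0.1 * rng.standard_normal((numdims, numhid))   # visible-to-hidden weights
hidbiases = np.zeros(numhid)

# Positive phase: hidden probabilities, then a binary sample of the hidden units.
poshidprobs = sigmoid(data @ vishid + hidbiases)        # 1./(1+exp(-data*vishid - repmat(...)))
poshidstates = (poshidprobs > rng.random(poshidprobs.shape)).astype(float)

posprods = data.T @ poshidprobs      # numdims x numhid, positive-phase statistics
poshidact = poshidprobs.sum(axis=0)  # per-hidden-unit activity over the batch
posvisact = data.sum(axis=0)         # per-visible-unit activity over the batch
```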
%%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); %% reconstruct the visible-layer activations from the sampled hidden states
neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); %% regenerate the hidden layer from the reconstructed visible layer
negprods = negdata'*neghidprobs; %% negative-phase statistics: products of the reconstructed data and hidden probabilities
neghidact = sum(neghidprobs); % sum of the hidden probabilities produced from the reconstructed visible activations
negvisact = sum(negdata); %% sum of the reconstructed data
%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
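The negative phase is the mirror image: go back down through the transposed weights (`vishid'` in MATLAB, `vishid.T` in NumPy) to reconstruct the visible layer, then up again. A toy-dimension sketch (dimensions and random states are my own, for illustration only):

```python
import numpy as np
rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

numcases, numdims, numhid = 4, 6, 5
vishid = 0.1 * rng.standard_normal((numdims, numhid))
visbiases = np.zeros(numdims)
hidbiases = np.zeros(numhid)
poshidstates = (rng.random((numcases, numhid)) > 0.5).astype(float)  # sampled hidden states

# Reconstruct the visible layer from the hidden states (note the transpose),
# then re-infer the hidden probabilities from that reconstruction.
negdata = sigmoid(poshidstates @ vishid.T + visbiases)
neghidprobs = sigmoid(negdata @ vishid + hidbiases)

negprods = negdata.T @ neghidprobs   # negative-phase statistics
neghidact = neghidprobs.sum(axis=0)
negvisact = negdata.sum(axis=0)
```

Note that the reconstruction uses the *sampled* binary hidden states, while the statistics on the way back up use probabilities; this is the standard CD-1 recipe the original code follows.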
err= sum(sum( (data-negdata).^2 )); %% reconstruction error for this batch
errsum = err + errsum; %% accumulated error over the epoch
if epoch>5, %% adjust the momentum according to the epoch
momentum=finalmomentum; %% after epoch 5, momentum is 0.9
else
momentum=initialmomentum; %% during the first 5 epochs, momentum is 0.5
end;
%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
vishidinc = momentum*vishidinc + ...
epsilonw*((posprods-negprods)/numcases - weightcost*vishid); %% weight increment: gradient averaged over the batch, with weight decay
visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); %% visible-bias increment
hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); %% hidden-bias increment
vishid = vishid + vishidinc; %% parameter updates: visible-to-hidden weights
visbiases = visbiases + visbiasinc; %% visible-bias update
hidbiases = hidbiases + hidbiasinc; %% hidden-bias update
%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
fprintf(1, 'epoch %4i error %6.1f \n', epoch, errsum);
end;
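Putting the pieces together, the whole of rbm.m fits into one short NumPy function. This is my own minimal re-sketch (the name `train_rbm` and the toy shapes are mine, and it omits the `restart` bookkeeping and per-batch printing), intended only to show the CD-1 control flow end to end:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(batchdata, numhid, maxepoch=10, epsilon=0.1,
              weightcost=0.0002, seed=0):
    """Minimal CD-1 trainer mirroring rbm.m; returns final parameters and error."""
    rng = np.random.default_rng(seed)
    numcases, numdims, numbatches = batchdata.shape
    vishid = 0.1 * rng.standard_normal((numdims, numhid))
    visbiases = np.zeros(numdims)
    hidbiases = np.zeros(numhid)
    vishidinc = np.zeros_like(vishid)
    visbiasinc = np.zeros_like(visbiases)
    hidbiasinc = np.zeros_like(hidbiases)

    for epoch in range(maxepoch):
        momentum = 0.9 if epoch >= 5 else 0.5   # initial vs. final momentum
        errsum = 0.0
        for b in range(numbatches):
            data = batchdata[:, :, b]
            # Positive phase: infer and sample the hidden units.
            poshidprobs = sigmoid(data @ vishid + hidbiases)
            poshidstates = (poshidprobs > rng.random(poshidprobs.shape)).astype(float)
            # Negative phase: one Gibbs step down and back up.
            negdata = sigmoid(poshidstates @ vishid.T + visbiases)
            neghidprobs = sigmoid(negdata @ vishid + hidbiases)
            errsum += np.sum((data - negdata) ** 2)
            # Updates with momentum and weight decay.
            vishidinc = momentum * vishidinc + epsilon * (
                (data.T @ poshidprobs - negdata.T @ neghidprobs) / numcases
                - weightcost * vishid)
            visbiasinc = momentum * visbiasinc + (epsilon / numcases) * (
                data.sum(0) - negdata.sum(0))
            hidbiasinc = momentum * hidbiasinc + (epsilon / numcases) * (
                poshidprobs.sum(0) - neghidprobs.sum(0))
            vishid += vishidinc
            visbiases += visbiasinc
            hidbiases += hidbiasinc
    return vishid, visbiases, hidbiases, errsum
```

For example, `train_rbm(batchdata, numhid=1000, maxepoch=10)` on MNIST-shaped `batchdata` of size 100*784*600 would reproduce the training schedule of the original script; stacking such calls, feeding each layer's hidden probabilities in as the next layer's data, gives the DBN pre-training described at the start of this post.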