[RBM] Code Study: DeepLearningToolBox

Latest recommended article published 2020-03-20 22:35:21

haoji007


Download: DeepLearningToolBox

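Before studying the RBM code, you need some basic knowledge of RBMs.

There are many references online. I relied on one well-written series of blog posts, and after reading it the code was mostly easy to follow. Blog: http://blog.csdn.net/itplus/article/details/19168937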

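The series' table of contents:

(5) Gradient computation formulas

After working through that series you should have a basic grasp of how RBMs work. Next we study the corresponding code in the toolbox. This post draws on a number of blogs, and I thank their original authors.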

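1. mnistdeepauto

This code accompanies Hinton's Science paper "Reducing the Dimensionality of Data with Neural Networks".

It trains a deep autoencoder on the MNIST database to compress the images. Training is unsupervised. Performance is measured by the squared reconstruction error, which backprop sums over pixels and averages over images.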

 % Version 1.000  
 %  
 % Code provided by Ruslan Salakhutdinov and Geoff Hinton    
 %  
 % Permission is granted for anyone to copy, use, modify, or distribute this  
 % program and accompanying programs and documents for any purpose, provided  
 % this copyright notice is retained and prominently displayed, along with  
 % a note saying that the original programs are available from our   
 % web page.   
 % The programs and documents are distributed without any warranty, express or  
 % implied.  As the programs were written for research purposes only, they have  
 % not been tested to the degree that would be advisable in any important  
 % application.  All use of these programs is entirely at the user's own risk.  
   
   
 % This program pretrains a deep autoencoder for MNIST dataset  
 % You can set the maximum number of epochs for pretraining each layer  
 % and you can set the architecture of the multilayer net.  
   
 clc  
 clear all  
 close all  
   
 maxepoch=10; %In the Science paper we use maxepoch=50, but it works just fine.   
 numhid=1000; numpen=500; numpen2=250; numopen=30;  
   
 fprintf(1,'Converting Raw files into Matlab format \n');  
 converter; % convert the raw data files into MATLAB format
   
 fprintf(1,'Pretraining a deep autoencoder. \n');  
 fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);  
   
 makebatches;  
 [numcases numdims numbatches]=size(batchdata);  
   
 fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);  
 restart=1;  
 rbm;  
 hidrecbiases=hidbiases; % hidbiases are the hidden-layer biases
 save mnistvh vishid hidrecbiases visbiases;
 % save this layer's variables: weights, hidden biases, visible biases
   
 fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);  
 batchdata=batchposhidprobs; % batchposhidprobs: hidden-unit probabilities output by the first RBM
 numhid=numpen;  
 restart=1;  
 rbm;  
 hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;  
 save mnisthp hidpen penrecbiases hidgenbiases;  
 % mnisthp is the name of the saved file
   
 fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);  
 batchdata=batchposhidprobs;  
 numhid=numpen2;  
 restart=1;  
 rbm;  
 hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;  
 save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2;  
   
 fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);  
 batchdata=batchposhidprobs;  
 numhid=numopen;   
 restart=1;  
 rbmhidlinear;  
 hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;  
 save mnistpo hidtop toprecbiases topgenbiases;  
   
 backprop; %Finetune  

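This script trains a deep autoencoder with 4 hidden layers. The input layer has 784 dimensions, and the 4 hidden layers have 1000, 500, 250 and 30 dimensions. The network weights are obtained as follows:

1. Train the first RBM on the training samples with the RBM learning rule. This RBM is formed by the 784-dimensional input layer and the first 1000-unit hidden layer. After training, compute the hidden-layer outputs for the training samples.
2. Use the outputs from step 1 as the input for training the second RBM in the same way, and compute its outputs. Train the third and fourth RBMs the same way. The fourth is trained with rbmhidlinear, so its 30 hidden units are linear.
3. Unroll the four RBMs into one new network consisting of an encoder and a decoder. Initialize it with the weights obtained in steps 1 and 2.
4. The new network's output layer has as many units as its input layer, so the original input can serve as the target output. Backpropagation then computes the network's cost and its partial derivatives.
5. Optimize the whole network with the conjugate gradient method to obtain the final weights. Start from the initial values of step 3 and use the cost and derivatives of step 4. The entire process is unsupervised.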
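Step 3 (unrolling) can be sketched in Python. The sketch assumes each trained RBM is given as a `(vishid, hidbiases, visbiases)` triple of nested lists. The encoder stacks `[vishid; hidbiases]`. The decoder stacks `[vishid'; visbiases]` in reverse order, mirroring how `w1` to `w8` are built in backprop. `unroll`, `transpose` and `shape` are illustrative helpers, not toolbox functions, and the zero matrices only stand in for trained weights.

```python
def transpose(M):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def shape(M):
    return (len(M), len(M[0]))

def unroll(rbms):
    """rbms: (vishid, hidbiases, visbiases) triples, bottom RBM first.
    Encoder layer k is [vishid; hidbiases] (bias as an extra last row);
    decoder layers are [vishid'; visbiases] in reverse order."""
    encoder = [W + [hb] for W, hb, _ in rbms]
    decoder = [transpose(W) + [vb] for W, _, vb in reversed(rbms)]
    return encoder + decoder

# Placeholder parameters (zeros) standing in for the four trained RBMs.
sizes = [784, 1000, 500, 250, 30]
rbms = [([[0.0] * h for _ in range(v)], [0.0] * h, [0.0] * v)
        for v, h in zip(sizes[:-1], sizes[1:])]
weights = unroll(rbms)
for k, w in enumerate(weights, start=1):
    print(f"w{k}: {shape(w)}")
# w1: (785, 1000)  w2: (1001, 500)  w3: (501, 250)  w4: (251, 30)
# w5: (31, 250)    w6: (251, 500)   w7: (501, 1000)  w8: (1001, 784)
```

Every matrix has one extra row for the biases. This is why backprop recovers the layer sizes with `size(w,1)-1`.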

2. rbm

 % Version 1.000   
 %  
 % Code provided by Geoff Hinton and Ruslan Salakhutdinov   
 %  
 % Permission is granted for anyone to copy, use, modify, or distribute this  
 % program and accompanying programs and documents for any purpose, provided  
 % this copyright notice is retained and prominently displayed, along with  
 % a note saying that the original programs are available from our  
 % web page.  
 % The programs and documents are distributed without any warranty, express or  
 % implied.  As the programs were written for research purposes only, they have  
 % not been tested to the degree that would be advisable in any important  
 % application.  All use of these programs is entirely at the user's own risk.  
   
 % This program trains Restricted Boltzmann Machine in which  
 % visible, binary, stochastic pixels are connected to  
 % hidden, binary, stochastic feature detectors using symmetrically  
 % weighted connections. Learning is done with 1-step Contrastive Divergence.     
 % The program assumes that the following variables are set externally:  
 % maxepoch  -- maximum number of epochs  
 % numhid    -- number of hidden units   
 % batchdata -- the data that is divided into batches (numcases numdims numbatches)  
 % restart   -- set to 1 if learning starts from beginning   
   
 epsilonw      = 0.1;   % Learning rate for weights   
 epsilonvb     = 0.1;   % Learning rate for biases of visible units   
 epsilonhb     = 0.1;   % Learning rate for biases of hidden units   
 weightcost  = 0.0002;     
 initialmomentum  = 0.5;  
 finalmomentum    = 0.9;  
  % Note: hidden and visible units have separate biases, while the weight matrix is shared (symmetric)
    
 [numcases numdims numbatches]=size(batchdata); % [100,784,600] for the first layer
   
 if restart ==1,  
   restart=0;  
   epoch=1;  
   
 % Initializing symmetric weights and biases.   
   vishid     = 0.1*randn(numdims, numhid); % small random initial weights, 784x1000 for the first layer
   hidbiases  = zeros(1,numhid);  
   visbiases  = zeros(1,numdims);  
   
   poshidprobs = zeros(numcases,numhid); % 100x1000: hidden-unit probabilities for one batch in the positive phase
   neghidprobs = zeros(numcases,numhid);  
   posprods    = zeros(numdims,numhid);  
   negprods    = zeros(numdims,numhid);  
   vishidinc  = zeros(numdims,numhid);  
   hidbiasinc = zeros(1,numhid);  
   visbiasinc = zeros(1,numdims);  
   batchposhidprobs=zeros(numcases,numhid,numbatches);  
   % hidden-unit probabilities for all batches in the positive phase (the next RBM's input)
 end  
   
 for epoch = epoch:maxepoch,  
  fprintf(1,'epoch %d\r',epoch);   
  errsum=0;  
  for batch = 1:numbatches, % every epoch sweeps over all batches
  fprintf(1,'epoch %d batch %d\r',epoch,batch);   
   
 %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   data = batchdata(:,:,batch);  
   % take one batch; each row is one sample (pixels are real-valued, not 0/1; strictly speaking they should be binarized)
   poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));   
   % hidden-unit probabilities given the data (bottom-up pass)
   batchposhidprobs(:,:,batch)=poshidprobs;  
   posprods    = data' * poshidprobs;  
   % 784x1000: element (i,j) is the batch sum of v_i*p_j, the data-dependent statistic of the weight gradient
   poshidact   = sum(poshidprobs); % sum over the samples in the batch
   posvisact = sum(data);  
   
 %%%%%%%%% END OF POSITIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   poshidstates = poshidprobs > rand(numcases,numhid);  
   % sample binary hidden states from their probabilities (done after posprods is computed)
   
 %%%%%%%%% START NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
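   negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % reconstructed visible probabilities (top-down pass)
   neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); % hidden probabilities driven by the reconstruction (bottom-up again)
   negprods  = negdata'*neghidprobs; % 784x1000: the reconstruction statistic of the weight gradient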
   neghidact = sum(neghidprobs);  
   negvisact = sum(negdata);   
   
 %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   err= sum(sum( (data-negdata).^2 )); % squared reconstruction error of this batch
   errsum = err + errsum;  
   
    if epoch>5,  
      momentum=finalmomentum;  
      % momentum: fraction of the previous increment kept in this update; 0.5 for the first 5 epochs, 0.9 afterwards
    else  
      momentum=initialmomentum;  
    end;  
   
 %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
     vishidinc = momentum*vishidinc + ...  
                 epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);  
     visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);  
     hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);  
   
     vishid = vishid + vishidinc;  
     visbiases = visbiases + visbiasinc;  
     hidbiases = hidbiases + hidbiasinc;  
   
 %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
   
   end  
   fprintf(1, 'epoch %4i error %6.1f  \n', epoch, errsum);   
 end;  

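The following outlines how the program optimizes the RBM weights. It assumes a two-layer RBM, i.e. only a visible layer and a hidden layer, with binary variables on both:

1. Initialize the weight matrix w with small random values, and initialize the bias vectors b (rbm.m sets the biases to zero).
2. Propagate the visible input matrix v upward to compute the hidden output matrix h. Compute the mean matrix of the products of corresponding v and h units.
3. The h from step 2 holds probabilities; sample it into binary values.
4. Propagate the binary h from step 3 downward to compute the visible matrix v'. (In principle v' should also be binarized; rbm.m keeps it as probabilities.)
5. Propagate v' upward to compute the hidden matrix h'. Compute the mean matrix of the products of corresponding v' and h' units.
6. Subtract the mean matrix of step 5 from that of step 2. The result is the weight increment matrix.
7. Update the weights with this increment, scaled by the learning rate.
8. Repeat steps 2 to 7 until convergence (rbm.m simply runs for maxepoch epochs).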
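Steps 2 to 8 can be sketched in plain Python (no NumPy) for a tiny binary RBM. This illustrative CD-1 loop assumes list-of-rows matrices. As in rbm.m, the reconstruction v' is kept as probabilities. Momentum and weight decay are left out, and the loop runs a fixed number of times. `cd1_weight_grad` and the toy data are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add_bias_sigmoid(Z, bias):
    """Add a bias vector to every row, then apply the logistic function."""
    return [[sigmoid(z + b) for z, b in zip(row, bias)] for row in Z]

def cd1_weight_grad(data, W, hbias, vbias, rng):
    """One CD-1 step; returns (posprods - negprods) / numcases."""
    n = len(data)
    pos_h = add_bias_sigmoid(matmul(data, W), hbias)                 # step 2: p(h|v)
    posprods = matmul(transpose(data), pos_h)                        # step 2: summed v*h products
    h_states = [[1.0 if p > rng.random() else 0.0 for p in row]      # step 3: sample binary h
                for row in pos_h]
    neg_v = add_bias_sigmoid(matmul(h_states, transpose(W)), vbias)  # step 4: v' (probabilities)
    neg_h = add_bias_sigmoid(matmul(neg_v, W), hbias)                # step 5: p(h'|v')
    negprods = matmul(transpose(neg_v), neg_h)                       # step 5: summed v'*h' products
    return [[(p - q) / n for p, q in zip(pr, nr)]                    # step 6: difference of means
            for pr, nr in zip(posprods, negprods)]

rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]    # 4 toy binary samples, 6 visible units
n_vis, n_hid = 6, 3
W = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(n_hid)] for _ in range(n_vis)]
hbias, vbias = [0.0] * n_hid, [0.0] * n_vis
epsilon_w = 0.1                                    # same value as epsilonw in rbm.m
for _ in range(50):                                # step 8: repeat (fixed count here)
    grad = cd1_weight_grad(data, W, hbias, vbias, rng)
    W = [[w + epsilon_w * dw for w, dw in zip(wr, gr)]    # step 7: weight update
         for wr, gr in zip(W, grad)]
```

For real batch sizes such as 100×784, an array library is the practical choice. The explicit loops here only make each step visible.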

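The biases are optimized as follows:

1. Initialize the weight matrix w with small random values, and initialize the bias vectors b (rbm.m sets the biases to zero).
2. Propagate v upward to compute h. Compute the mean vector of the v samples and the mean vector of h.
3. The h from step 2 holds probabilities; sample it into binary values.
4. Propagate the binary h from step 3 downward to compute the visible matrix v'.
5. Propagate v' upward to compute h'. Compute the mean vectors of v' and h'.
6. Subtract the v' mean vector of step 5 from the v mean vector of step 2; the result is the increment of the visible biases. Likewise, subtract the h' mean vector of step 5 from the h mean vector of step 2 to obtain the increment of the hidden biases.
7. Update the biases with these increments, scaled by their learning rates.
8. Repeat steps 2 to 7 until convergence.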
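The bias part (steps 6 and 7) reduces to differences of column means. Below is a small hypothetical Python sketch with hand-picked values. `bias_increments` mirrors `(epsilonvb/numcases)*(posvisact-negvisact)` and its hidden-bias counterpart from rbm.m, again without momentum.

```python
def column_means(M):
    """Mean of each column of a list-of-rows matrix."""
    n = len(M)
    return [sum(col) / n for col in zip(*M)]

def bias_increments(data, pos_h, neg_v, neg_h, eps_vb=0.1, eps_hb=0.1):
    # Step 6: data-side mean minus reconstruction-side mean.
    # Step 7: scale by the learning rate (momentum omitted in this sketch).
    dvb = [eps_vb * (p - q) for p, q in zip(column_means(data), column_means(neg_v))]
    dhb = [eps_hb * (p - q) for p, q in zip(column_means(pos_h), column_means(neg_h))]
    return dvb, dhb

data = [[1, 0], [1, 1]]             # 2 samples, 2 visible units
neg_v = [[0.5, 0.5], [0.5, 0.5]]    # reconstructed visible probabilities
pos_h = [[0.8], [0.6]]              # hidden probabilities from the data
neg_h = [[0.4], [0.2]]              # hidden probabilities from the reconstructions
dvb, dhb = bias_increments(data, pos_h, neg_v, neg_h)
# dvb is approximately [0.05, 0.0] and dhb approximately [0.04]
```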

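Weight and bias updates are applied together in every iteration, so the two are learned jointly. The weight update rule can also be modified slightly. One common change is to add momentum, so that each update keeps a fraction of the previous increment. rbm.m does this, and it also subtracts a small weight-decay term (weightcost).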
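The UPDATE WEIGHTS AND BIASES section of rbm.m gives the following update rules. The notation is introduced here for readability:

- V is the data batch (N×D).
- P = poshidprobs, Ṽ = negdata and P̃ = neghidprobs.
- v_n, ṽ_n, p_n and p̃_n are the rows of V, Ṽ, P and P̃.
- m is the momentum: 0.5 for epochs 1 to 5 and 0.9 afterwards.
- λ = weightcost = 0.0002.
- ε_w = ε_vb = ε_hb = 0.1.

```latex
\begin{aligned}
\Delta W &\leftarrow m\,\Delta W + \varepsilon_w\left(\frac{V^{\top}P - \tilde V^{\top}\tilde P}{N} - \lambda W\right), & W &\leftarrow W + \Delta W,\\
\Delta b_v &\leftarrow m\,\Delta b_v + \frac{\varepsilon_{vb}}{N}\sum_{n=1}^{N}\left(v_n - \tilde v_n\right), & b_v &\leftarrow b_v + \Delta b_v,\\
\Delta b_h &\leftarrow m\,\Delta b_h + \frac{\varepsilon_{hb}}{N}\sum_{n=1}^{N}\left(p_n - \tilde p_n\right), & b_h &\leftarrow b_h + \Delta b_h.
\end{aligned}
```

The weight-decay term λW only enters the weight update; the biases are not decayed.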

3. converter

Converts the MNIST sample files from the raw .ubyte format into .ascii text files, and then into .mat files.
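The raw MNIST files use the IDX format. Each file starts with a big-endian header: magic number 2051 for image files or 2049 for label files, then the item count, and for images the row and column counts. Unsigned bytes follow the header. The Python sketch below parses that format directly. It is a shortcut for illustration and does not reproduce converter's intermediate .ascii step. The function names are hypothetical.

```python
import struct

IMAGE_MAGIC, LABEL_MAGIC = 2051, 2049   # IDX magic numbers of the MNIST files

def parse_idx_images(buf):
    """Return (images, rows, cols); each image is a flat list of rows*cols pixels (0-255)."""
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])   # big-endian header
    if magic != IMAGE_MAGIC:
        raise ValueError(f"not an IDX image file (magic={magic})")
    size = rows * cols
    images = [list(buf[16 + k * size:16 + (k + 1) * size]) for k in range(count)]
    return images, rows, cols

def parse_idx_labels(buf):
    """Return the list of labels stored after the 8-byte header."""
    magic, count = struct.unpack(">II", buf[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"not an IDX label file (magic={magic})")
    return list(buf[8:8 + count])

# Tiny in-memory example: two 2x2 images and two labels.
img_buf = struct.pack(">IIII", IMAGE_MAGIC, 2, 2, 2) + bytes([0, 255, 10, 20, 1, 2, 3, 4])
lbl_buf = struct.pack(">II", LABEL_MAGIC, 2) + bytes([7, 3])
images, rows, cols = parse_idx_images(img_buf)
labels = parse_idx_labels(lbl_buf)
# images == [[0, 255, 10, 20], [1, 2, 3, 4]], labels == [7, 3]
```

To read a real file, pass its bytes instead. For example, open `train-images-idx3-ubyte` in binary mode and call `parse_idx_images(fh.read())`.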

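4. makebatches

Splits the dataset into batches. The original 2-D data matrix (one sample per row) becomes a 3-D array whose third dimension indexes the batch. batchdata has size numcases × numdims × numbatches, i.e. [100, 784, 600] for the MNIST training set.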
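Here is a hypothetical Python equivalent of the batching step. Shuffling with a fixed seed is an assumption of this sketch, not something stated above. Python nests lists the other way round, so `batches[b][i]` corresponds to `batchdata(i,:,b)` in MATLAB.

```python
import random

def make_batches(samples, batch_size=100, seed=0):
    """Split samples (each a list of numdims values) into numbatches batches.
    Samples beyond numbatches * batch_size are dropped."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)        # assumption: shuffle with a fixed seed
    numbatches = len(samples) // batch_size
    return [[samples[order[b * batch_size + i]] for i in range(batch_size)]
            for b in range(numbatches)]

samples = [[float(i), float(i) / 2] for i in range(250)]   # 250 toy samples, numdims = 2
batches = make_batches(samples)
# 2 batches of 100 samples each; the remaining 50 samples are dropped
```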

5. backprop

Backpropagates the error to fine-tune the whole autoencoder.

 % Version 1.000  
 %  
 % Code provided by Ruslan Salakhutdinov and Geoff Hinton  
 %  
 % Permission is granted for anyone to copy, use, modify, or distribute this  
 % program and accompanying programs and documents for any purpose, provided  
 % this copyright notice is retained and prominently displayed, along with  
 % a note saying that the original programs are available from our  
 % web page.  
 % The programs and documents are distributed without any warranty, express or  
 % implied.  As the programs were written for research purposes only, they have  
 % not been tested to the degree that would be advisable in any important  
 % application.  All use of these programs is entirely at the user's own risk.  
   
 % This program fine-tunes an autoencoder with backpropagation.  
 % Weights of the autoencoder are going to be saved in mnist_weights.mat  
 % and trainig and test reconstruction errors in mnist_error.mat  
 % You can also set maxepoch, default value is 200 as in our paper.    
   
 maxepoch=200;  
 fprintf(1,'\nFine-tuning deep autoencoder by minimizing cross entropy error. \n'); % fine-tuning minimizes the cross-entropy error
 fprintf(1,'60 batches of 1000 cases each. \n');  
   
 load mnistvh % load the parameters of the 4 RBMs
 load mnisthp  
 load mnisthp2  
 load mnistpo   
   
 makebatches;  
 [numcases numdims numbatches]=size(batchdata);  
 N=numcases;   
   
 %%%% PREINITIALIZE WEIGHTS OF THE AUTOENCODER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
 w1=[vishid; hidrecbiases]; % stack each layer's weights and biases into one matrix (bias as the last row)
 w2=[hidpen; penrecbiases];  
 w3=[hidpen2; penrecbiases2];  
 w4=[hidtop; toprecbiases];  
 w5=[hidtop'; topgenbiases];   
 w6=[hidpen2'; hidgenbiases2];   
 w7=[hidpen'; hidgenbiases];   
 w8=[vishid'; visbiases];  
   
 %%%%%%%%%% END OF PREINITIALIZATIO OF WEIGHTS  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   
 l1=size(w1,1)-1; % number of units in each layer (the -1 drops the bias row)
 l2=size(w2,1)-1;  
 l3=size(w3,1)-1;  
 l4=size(w4,1)-1;  
 l5=size(w5,1)-1;  
 l6=size(w6,1)-1;  
 l7=size(w7,1)-1;  
 l8=size(w8,1)-1;  
 l9=l1;  % the output layer has as many units as the input layer
 test_err=[];  
 train_err=[];  
   
   
 for epoch = 1:maxepoch  
   
 %%%%%%%%%%%%%%%%%%%% COMPUTE TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
 err=0;   
 [numcases numdims numbatches]=size(batchdata);  
 N=numcases;  
  for batch = 1:numbatches  
   data = [batchdata(:,:,batch)];  
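   data = [data ones(N,1)];  % append a column of ones to multiply the bias row of w1
   w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)];
   % forward pass: compute each layer's output and append a constant column of ones for the next bias row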
   w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];  
   w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];  
   w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];  
   w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];  
   w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];  
   w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];  
   dataout = 1./(1 + exp(-w7probs*w8));  
   err= err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 )); % reconstruction error
   end  
  train_err(epoch)=err/numbatches; % average reconstruction error on the training set
   
 %%%%%%%%%%%%%% END OF COMPUTING TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   
 %%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS %%%%%%%%%%%%%%%%%%%%%%%%%  
 fprintf(1,'Displaying in figure 1: Top row - real data, Bottom row -- reconstructions \n');  
 output=[];  
  for ii=1:15  
   output = [output data(ii,1:end-1)' dataout(ii,:)']; % 15 digits are shown; each adds 2 columns: original and reconstruction
  end  
    if epoch==1   
    close all   
    figure('Position',[100,600,1000,200]);  
    else   
    figure(1)  
    end   
    mnistdisp(output);  
    drawnow;  
   
 %%%%%%%%%%%%%%%%%%%% COMPUTE TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
 [testnumcases testnumdims testnumbatches]=size(testbatchdata);  
 N=testnumcases;  
 err=0;  
 for batch = 1:testnumbatches  
   data = [testbatchdata(:,:,batch)];  
   data = [data ones(N,1)];  
   w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)];  
   w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];  
   w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];  
   w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];  
   w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];  
   w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];  
   w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];  
   dataout = 1./(1 + exp(-w7probs*w8));  
   err = err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 ));  
   end  
  test_err(epoch)=err/testnumbatches;  
  fprintf(1,'Before epoch %d Train squared error: %6.3f Test squared error: %6.3f \t \t \n',epoch,train_err(epoch),test_err(epoch));  
   
 %%%%%%%%%%%%%% END OF COMPUTING TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   
  tt=0;  
  for batch = 1:numbatches/10 % 600 training minibatches of 100 -> 60 batches of 1000
  fprintf(1,'epoch %d batch %d\r',epoch,batch);  
   
 %%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
  tt=tt+1;   
  data=[];  
  for kk=1:10  
   data=[data   
         batchdata(:,:,(tt-1)*10+kk)];   
  end   
   
 %%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%% conjugate gradient line searches
   max_iter=3;
   VV = [w1(:)' w2(:)' w3(:)' w4(:)' w5(:)' w6(:)' w7(:)' w8(:)']';
   % flatten all weights (biases included) into one long column vector
   Dim = [l1; l2; l3; l4; l5; l6; l7; l8; l9];
   % number of units in each layer (biases excluded)
   [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data); % optimize the parameters X by conjugate gradient
     
   w1 = reshape(X(1:(l1+1)*l2),l1+1,l2);  
   xxx = (l1+1)*l2;  
   w2 = reshape(X(xxx+1:xxx+(l2+1)*l3),l2+1,l3);  
   xxx = xxx+(l2+1)*l3;  
   w3 = reshape(X(xxx+1:xxx+(l3+1)*l4),l3+1,l4);  
   xxx = xxx+(l3+1)*l4;  
   w4 = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5);  
   xxx = xxx+(l4+1)*l5;  
   w5 = reshape(X(xxx+1:xxx+(l5+1)*l6),l5+1,l6);  
   xxx = xxx+(l5+1)*l6;  
   w6 = reshape(X(xxx+1:xxx+(l6+1)*l7),l6+1,l7);  
   xxx = xxx+(l6+1)*l7;  
   w7 = reshape(X(xxx+1:xxx+(l7+1)*l8),l7+1,l8);  
   xxx = xxx+(l7+1)*l8;  
   w8 = reshape(X(xxx+1:xxx+(l8+1)*l9),l8+1,l9);  
 % unpack the optimized parameter vector back into w1..w8
 %%%%%%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
   
  end  
   
  save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8 % the first argument is the file name
  save mnist_error test_err train_err;  
   
 end  
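The error-computing loops above rely on one trick. Every weight matrix carries its biases as the last row, so a column of ones is appended to each layer's input. Below is a plain-Python sketch of that forward pass and of the per-batch squared error. It assumes list-of-rows matrices. `forward` and `squared_error_per_case` are illustrative names, and linear layer index 3 corresponds to `w4`.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(data, weights, linear_layer=3):
    """Forward pass through the unrolled autoencoder. Each weight matrix is a list
    of rows whose last row holds the biases; the layer at index linear_layer
    (0-based, 3 = w4) stays linear, all other layers are logistic."""
    x = data
    for k, W in enumerate(weights):
        xb = [row + [1.0] for row in x]                       # like [x ones(N,1)]
        z = [[sum(a * w for a, w in zip(row, col)) for col in zip(*W)] for row in xb]
        x = z if k == linear_layer else [[sigmoid(v) for v in r] for r in z]
    return x

def squared_error_per_case(data, out):
    """1/N*sum(sum((data-out).^2)), the per-batch term added to err in backprop."""
    total = sum((a - b) ** 2 for r1, r2 in zip(data, out) for a, b in zip(r1, r2))
    return total / len(data)

# Toy 2-1-2 network with all-zero weights (bias rows included).
w1 = [[0.0], [0.0], [0.0]]         # (2 inputs + bias) x 1 hidden unit
w2 = [[0.0, 0.0], [0.0, 0.0]]      # (1 hidden + bias) x 2 outputs
data = [[1.0, 0.0]]
out = forward(data, [w1, w2], linear_layer=None)
# out == [[0.5, 0.5]] and squared_error_per_case(data, out) == 0.5
```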

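6. CG_MNIST

The function CG_MNIST has the form:

function [f, df] = CG_MNIST(VV,Dim,XX);

It computes the network's cost f and the partial derivatives df of f with respect to every network parameter. Weights and biases are handled together. VV is the column vector of all network parameters, Dim is the vector of layer sizes, and XX is the training data. f is the cost, which, according to the message printed by backprop, is the cross-entropy reconstruction error; df is its gradient.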
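CG_MNIST, like backprop itself, has to turn VV back into w1 to w8 using Dim. MATLAB's `w(:)` flattens column by column, so a Python sketch of the packing and unpacking must use column-major order. `pack` and `unpack` are hypothetical helpers, checked here on a toy Dim.

```python
def pack(weights):
    """Flatten each matrix column by column and concatenate, like VV = [w1(:)' ... w8(:)']'."""
    return [w[i][j] for w in weights for j in range(len(w[0])) for i in range(len(w))]

def unpack(vv, dim):
    """Inverse of pack: layer k is a (dim[k] + 1) x dim[k + 1] matrix whose last row
    holds the biases, mirroring w1 = reshape(X(1:(l1+1)*l2), l1+1, l2) and so on."""
    weights, pos = [], 0
    for n_in, n_out in zip(dim[:-1], dim[1:]):
        rows, cols = n_in + 1, n_out
        chunk = vv[pos:pos + rows * cols]
        # column-major storage: element (i, j) sits at offset j * rows + i
        weights.append([[chunk[j * rows + i] for j in range(cols)] for i in range(rows)])
        pos += rows * cols
    if pos != len(vv):
        raise ValueError("length of vv does not match Dim")
    return weights

dim = [2, 3, 2]                                  # toy Dim: 2 inputs, 3 hidden, 2 outputs
w1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]           # (2 + 1) x 3
w2 = [[10, 11], [12, 13], [14, 15], [16, 17]]    # (3 + 1) x 2
vv = pack([w1, w2])
# vv starts with [1, 4, 7], the first column of w1
```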

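7. minimize: the conjugate gradient optimizer

[X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )

This function optimizes the parameters X with conjugate gradients, so X holds the network's parameters as one column vector. f is the name of a function that computes the cost and its partial derivatives with respect to X. It is called with X followed by the extra arguments P1, P2, P3, ... passed to minimize. A positive length is the maximum number of line searches; a negative length limits the number of function evaluations instead. The function returns:

- X: the parameters found.
- fX: a vector of cost values recording the progress of the optimization.
- i: the number of iterations (line searches) used.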
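The sketch below illustrates minimize's calling convention in Python. The inputs are X, a function returning the cost and gradient, a maximum number of line searches, and extra arguments passed through to the function. It is a minimal Polak-Ribière (PR+) conjugate gradient with a simple Armijo backtracking line search and a restart every len(x) line searches. It is not a port of minimize.m, whose line search is more elaborate. Only a positive `length` is supported, and the quadratic test function is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def backtrack(f, x, fx, d, slope, args, c=0.1):
    """Armijo backtracking: halve the step until f drops by at least c*step*|slope|."""
    step = 1.0
    while step > 1e-12:
        x_new = [xi + step * di for xi, di in zip(x, d)]
        f_new, g_new = f(x_new, *args)
        if f_new <= fx + c * step * slope:
            return x_new, f_new, g_new
        step *= 0.5
    return None  # no acceptable step found

def minimize_cg(x, f, length, *args):
    """PR+ conjugate gradient with at most `length` line searches.
    f(x, *args) must return (cost, gradient). Returns (x, cost history, iterations)."""
    fx, g = f(x, *args)
    d = [-gi for gi in g]
    history, i = [fx], 0
    while i < length:
        i += 1
        slope = dot(g, d)
        result = backtrack(f, x, fx, d, slope, args) if slope < 0 else None
        if result is None:
            d = [-gi for gi in g]      # fall back to steepest descent
            slope = dot(g, d)
            if slope == 0:
                break                  # zero gradient: stationary point
            result = backtrack(f, x, fx, d, slope, args)
            if result is None:
                break                  # no further progress (e.g. limited by rounding)
        x_new, f_new, g_new = result
        if i % len(x) == 0:
            beta = 0.0                 # periodic restart with steepest descent
        else:
            beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, fx, g = x_new, f_new, g_new
        history.append(fx)
    return x, history, i

def quadratic(x, A, b):
    """Hypothetical test function f(x) = 0.5 x'Ax - b'x with gradient Ax - b."""
    Ax = [dot(row, x) for row in A]
    return 0.5 * dot(x, Ax) - dot(b, x), [ai - bi for ai, bi in zip(Ax, b)]

A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x, fX, i = minimize_cg([0.0, 0.0], quadratic, 100, A, b)
# the exact minimizer solves Ax = b, i.e. x = [0.2, 0.4] with cost -0.3
```

As in backprop, the extra arguments (here `A` and `b`, there `Dim` and `data`) are simply forwarded to the cost function on every evaluation.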

This article draws on the following blog post:

http://www.cnblogs.com/tornadomeet/archive/2013/04/30/3052349.html