My goal:
Train a CNN with theano, save the final stable network weights, then implement the CNN forward pass in C++, load theano's weights, and reproduce theano's test results.
What I ended up with:
1. The forward pass of a convolutional neural network
2. The forward and backward passes of an MLP, which can therefore actually be trained on samples
One thing to note:
to reproduce theano's test results, the hidden layer activation must be tanh;
for training the MLP, however, the activation must be sigmoid, because the backward pass below hard-codes the sigmoid derivative (sketched right after this note).
The results:
The figure below shows theano's training and test run; the error rate on the validation samples is 9.23%.
Below that is the output of my C++ program; its validation error rate is also 9.23%, a perfect reproduction of theano's result.
In short there were two hard parts:
1. How do theano's weights and test samples get exchanged with C++?
2. When theano convolves, how are the featuremaps of the previous layer combined and mapped onto each pixel of the current layer?
Solving these two points involved plenty of detours:
To reproduce theano's test results in C++, the C++ side has to read the weights and test samples that theano saves.
My analysis went as follows:
1. theano's weights are in numpy format, and letting C++ consume them directly is hard: the numpy format is awkward to parse and there is very little material about it online.
2. Use python as an intermediary to meet requirement 1). Reading the theano code I then noticed that the training samples loaded into python need not be converted to numpy arrays; plain python objects suffice. But a file dumped by python's cPickle carries a lot of extra formatting and is unsuitable for exchange with C++.
3. Convert via json: both python and cpp have json libraries, so write everything as json and exchange that. The catch is that after training theano's weights are numpy arrays, which must become python lists before json can write them to a file. The question became: how to turn a numpy array into a python list?
4. To solve 3 I searched for an entire day and finally found numpy's tolist interface, which converts a numpy array into a python list.
5. With python and C++ both speaking json, I studied the jsoncpp library for reading the python-generated json files. Testing showed jsoncpp is unsuitable for large files: it easily exhausts memory and is extremely slow, so it was ruled out.
6. Write the json parsing functions in C++ myself. And when generating training and test samples from the pot files, generate them directly in C++ too, with no numpy conversion step.
The analysis above settled difficulty 1:
weights and test samples travel between theano and C++ as json, parsed on the C++ side by hand-written functions.
For difficulty 2, consider a typical CNN diagram:

Difficulty 2 in detail:
- When theano goes from S2 to C3, how does it choose which featuremaps of S2 to combine? A fixed selection every time, or a dynamic combination driven by some algorithm?
- In theano's pooling step from C3 to S4, with a poolsize of (2*2), how do each 4 pixels of C3 become one pixel of S4?
Extensive analysis and comparative testing produced the following conclusions (made concrete in the sketch after this list):
- Going from S2 to C3, theano combines all of S2's featuremaps
- Pooling from C3 to S4 with poolsize (2*2), theano takes the maximum of each 4 pixels of C3 as the corresponding pixel of S4
With this analysis the theory was essentially clear. Next came writing code from the theory, and the truly time-consuming part was debugging: the program simply would not reproduce theano's test results.
More than once I concluded it could not be reproduced; heaven knows what theano does internally.
Today the code finally came through, I am delighted, and hence this post.
Two bugs stood between me and the result: one theoretical, not grasping the details of theano's convolution; one plain carelessness, a wrong variable initialisation in my own code. Specifically (both are demonstrated in the sketch after this list):
- When convolving from S2 to C3, theano first rotates the kernel by 180 degrees and only then convolves as in the figure below (I was new to this area and genuinely had no idea...)

- When taking the pixel maximum from C3 to S4, I took it for granted that pixels are positive and initialised the running maximum to 0, so the maximum search went wrong (this bug took the longest to find by far; a lesson paid for in blood...)
Python CNN code: Here.
Then add the following code.
The function that writes out theano's weights. Note that it saves each convolution kernel rotated by 180 degrees, and that a two-dimensional weight matrix gets its rows and columns swapped (to match the C++ weight layout):
- def getDataJson(layers):
- data = []
- i = 0
- for layer in layers:
- w, b = layer.params
- # print '..layer is', i
- w, b = w.get_value(), b.get_value()
- wshape = w.shape
- # print '...the shape of w is', wshape
- if len(wshape) == 2:
- w = w.transpose()
- else:
- for k in xrange(wshape[0]):
- for j in xrange(wshape[1]):
- w[k][j] = numpy.rot90(w[k][j], 2)
- w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
- w = w.tolist()
- b = b.tolist()
- data.append([w, b])
- i += 1
- return data
- def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
- print ('writefile is ' + name)
- f = open(name, "wb")
- json.dump(data,f)
- f.close()
Reading the weights back into theano:
- def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
- # Load the dataset
- print ('readfile is ' + name)
- f = open(name, 'rb')
- data = json.load(f)
- f.close()
- readwb(data, layers, nkerns)
- def readwb(data, layers, nkerns):
- i = 0
- kernSize = len(nkerns)
- inputnum = 1
- for layer in layers:
- w, b = data[i]
- w = numpy.array(w, dtype='float32')
- b = numpy.array(b, dtype='float32')
- # print '..layer is', i
- # print w.shape
- if i >= kernSize:
- w = w.transpose()
- else:
- w = w.reshape((nkerns[i], inputnum, 5, 5))
- for k in xrange(nkerns[i]):
- for j in xrange(inputnum):
- c = w[k][j]
- w[k][j] = numpy.rot90(c, 2)
- inputnum = nkerns[i]
- # print '..readwb ,transpose and rot180'
- # print w.shape
- layer.W.set_value(w, borrow=True)
- layer.b.set_value(b, borrow=True)
- i += 1
The test samples are generated from the mnist handwritten-digit database; the core code is below:
- def mnist2json_small(cnnName = 'mnist_small.json', validNumber = 10):
- dataset = '../../data/mnist.pkl'
- print '... loading data', dataset
- # Load the dataset
- f = open(dataset, 'rb')
- train_set, valid_set, test_set = cPickle.load(f)
- #print test_set
- f.close()
- def np2listSmall(train_set, number):
- trainfile = []
- trains, labels = train_set
- # comment out the next line to generate only 'number' validation samples
- number = len(labels)
- for one in trains[:number]:
- one = one.tolist()
- trainfile.append(one)
- labelfile = labels[:number].tolist()
- datafile = [trainfile, labelfile]
- return datafile
- smallData = valid_set
- print len(smallData)
- valid, validlabel = np2listSmall(smallData, validNumber)
- datafile = [valid, validlabel]
- basedir = '../../tmp/src/data/'
- # basedir = './'
- json.dump(datafile, open(basedir + cnnName, 'wb'))
What I took away personally:
- Faced with a hard task, break it down step by step and defeat the pieces one by one
- While problem-solving, if one road is blocked, look for another immediately, just like doing proof problems in maths back in school
- Stay positive, do not give up easily, and see the task through
- When debugging, build a reasonably complete set of test cases first; it makes locating bugs much faster
main.cpp
- #include <iostream>
- #include "mlp.h"
- #include "util.h"
- #include "testinherit.h"
- #include "neuralNetwork.h"
- using namespace std;
- /************************************************************************/
- /* This program implements
- 1. the forward pass of a convolutional neural network
- 2. the forward and backward passes of an MLP, which can therefore be trained on samples
- Note:
- to reproduce theano's test results, the hidden layer activation must be tanh;
- for training the mlp, the activation must be sigmoid
- */
- /************************************************************************/
- int main()
- {
- cout << "****cnn****" << endl;
- TestCnnTheano(28 * 28, 10);
- // TestMlpMnist runs MLP training on the mnist samples
- //TestMlpMnist(28 * 28, 500, 10);
- return 0;
- }
neuralNetwork.h
- #ifndef NEURALNETWORK_H
- #define NEURALNETWORK_H
- #include "mlp.h"
- #include "cnn.h"
- #include <vector>
- using std::vector;
- /************************************************************************/
- /* A convolutional neural network */
- /************************************************************************/
- class NeuralNetWork
- {
- public:
- NeuralNetWork(int iInput, int iOut);
- ~NeuralNetWork();
- void Predict(double** in_data, int n);
- double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);
- void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);
- void SetTrainNum(int iNum);
- int Predict(double *pInputData);
- // void Forward_propagation(double** ppdata, int n);
- double* Forward_propagation(double *);
- private:
- int m_iSampleNum; //number of samples
- int m_iInput; //input dimension
- int m_iOut; //output dimension
- vector<CnnLayer *> vecCnns;
- Mlp *m_pMlp;
- };
- void TestCnnTheano(const int iInput, const int iOut);
- #endif
neuralNetwork.cpp
- #include "neuralNetwork.h"
- #include <iostream>
- #include "util.h"
- #include <iomanip>
- using namespace std;
- NeuralNetWork::NeuralNetWork(int iInput, int iOut):m_iSampleNum(0), m_iInput(iInput), m_iOut(iOut), m_pMlp(NULL)
- {
- int iFeatureMapNumber = 20, iPoolWidth = 2, iInputImageWidth = 28, iKernelWidth = 5, iInputImageNumber = 1;
- CnnLayer *pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
- vecCnns.push_back(pCnnLayer);
- iInputImageNumber = 20;
- iInputImageWidth = 12;
- iFeatureMapNumber = 50;
- pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
- vecCnns.push_back(pCnnLayer);
- const int ihiddenSize = 1;
- int phidden[ihiddenSize] = {500};
- // construct LogisticRegression
- m_pMlp = new Mlp(m_iSampleNum, iFeatureMapNumber * 4 * 4, m_iOut, ihiddenSize, phidden);
- }
- NeuralNetWork::~NeuralNetWork()
- {
- for (vector<CnnLayer*>::iterator it = vecCnns.begin(); it != vecCnns.end(); ++it)
- {
- delete *it;
- }
- delete m_pMlp;
- }
- void NeuralNetWork::SetTrainNum(int iNum)
- {
- m_iSampleNum = iNum;
- for (size_t i = 0; i < vecCnns.size(); ++i)
- {
- vecCnns[i]->SetTrainNum(iNum);
- }
- m_pMlp->SetTrainNum(iNum);
- }
- int NeuralNetWork::Predict(double *pdInputdata)
- {
- double *pdPredictData = NULL;
- pdPredictData = Forward_propagation(pdInputdata);
- int iResult = -1;
- iResult = m_pMlp->m_pLogisticLayer->Predict(pdPredictData);
- return iResult;
- }
- double* NeuralNetWork::Forward_propagation(double *pdInputData)
- {
- double *pdPredictData = pdInputData;
- vector<CnnLayer*>::iterator it;
- CnnLayer *pCnnLayer;
- for (it = vecCnns.begin(); it != vecCnns.end(); ++it)
- {
- pCnnLayer = *it;
- pCnnLayer->Forward_propagation(pdPredictData);
- pdPredictData = pCnnLayer->GetOutputData();
- }
- //pCnnLayer now points at the last convolution layer and pdPredictData holds its final output
- //finish with the mlp forward pass
- pdPredictData = m_pMlp->Forward_propagation(pdPredictData);
- return pdPredictData;
- }
- void NeuralNetWork::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)
- {
- for (size_t i = 0; i < vecCnns.size(); ++i)
- {
- vecCnns[i]->Setwb(vvAllw[i], vvAllb[i]);
- }
- size_t iLayerNum = vvAllw.size();
- for (size_t i = vecCnns.size(); i < iLayerNum - 1; ++i)
- {
- int iHiddenIndex = 0;
- m_pMlp->m_ppHiddenLayer[iHiddenIndex]->Setwb(vvAllw[i], vvAllb[i]);
- ++iHiddenIndex;
- }
- m_pMlp->m_pLogisticLayer->Setwb(vvAllw[iLayerNum - 1], vvAllb[iLayerNum - 1]);
- }
- double NeuralNetWork::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
- {
- cout << "Predict------------" << endl;
- int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
- //iValidNumber = 1;
- for (int i = 0; i < iValidNumber; ++i)
- {
- int iResult = Predict(vecvalid[i]);
- //cout << i << ",valid is " << iResult << " label is " << vecValidlabel[i] << endl;
- if (iResult != vecValidlabel[i])
- {
- ++iErrorNumber;
- }
- }
- cout << "the num of error is " << iErrorNumber << endl;
- double dErrorRate = (double)iErrorNumber / iValidNumber;
- cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
- return dErrorRate;
- }
- /************************************************************************/
- /*
- The test samples come from the mnist database. The cnn structure matches the theano tutorial:
- the input is a 28*28 image, followed by 2 convolution stages (convolution + pooling) with 20 and 50 featuremaps respectively,
- then a fully connected layer of 500 neurons and finally an output layer of 10 neurons
- */
- /************************************************************************/
- void TestCnnTheano(const int iInput, const int iOut)
- {
- //build the convolutional neural network
- NeuralNetWork neural(iInput, iOut);
- //holds theano's weights
- vector< vector<double*> > vvAllw;
- vector< vector<double> > vvAllb;
- //holds the test samples and their labels
- vector<double*> vecValid;
- vector<WORD> vecLabel;
- //files storing theano's weights and the test samples
- const char *szTheanoWeigh = "../../data/theanocnn.json", *szTheanoTest = "../../data/mnist_validall.json";
- //record the second dimension (column count) of each weight matrix, needed when reading the json file
- vector<int> vecSecondDimOfWeigh;
- vecSecondDimOfWeigh.push_back(5 * 5);
- vecSecondDimOfWeigh.push_back(20 * 5 * 5);
- vecSecondDimOfWeigh.push_back(50 * 4 * 4);
- vecSecondDimOfWeigh.push_back(500);
- cout << "loadwb ---------" << endl;
- //load the weights
- LoadWeighFromJson(vvAllw, vvAllb, szTheanoWeigh, vecSecondDimOfWeigh);
- //install the weights into the cnn
- neural.Setwb(vvAllw, vvAllb);
- //load the test file
- LoadTestSampleFromJson(vecValid, vecLabel, szTheanoTest, iInput);
- //set the total number of test samples
- int iVaildNum = vecValid.size();
- neural.SetTrainNum(iVaildNum);
- //run the cnn forward pass, compute and print the error rate
- neural.CalErrorRate(vecValid, vecLabel);
- //free the memory allocated for the test samples
- for (vector<double*>::iterator cit = vecValid.begin(); cit != vecValid.end(); ++cit)
- {
- delete [](*cit);
- }
- }
cnn.h
- #ifndef CNN_H
- #define CNN_H
- #include "featuremap.h"
- #include "poollayer.h"
- #include <vector>
- using std::vector;
- typedef unsigned short WORD;
- /**
- *This convolution layer mimics theano's test-time behaviour.
- *When the input layer has num featuremaps, this convolution layer is assumed to have featureNum featuremaps.
- *Each pixel of this layer is computed from a combination of all num featuremaps of the previous layer, with no bias;
- *the output then goes to the pooling layer, which takes the maximum within each poolsize window and adds the bias, one bias value per featuremap
- */
- class CnnLayer
- {
- public:
- CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
- int iKernelWidth, int iPoolWidth);
- ~CnnLayer();
- void Forward_propagation(double *pdInputData);
- void Back_propagation(double* , double* , double );
- void Train(double *x, WORD y, double dLr);
- int Predict(double *);
- void Setwb(vector<double*> &vpdw, vector<double> &vdb);
- void SetInputAllData(double **ppInputAllData, int iInputNum);
- void SetTrainNum(int iSampleNumber);
- void PrintOutputData();
- double* GetOutputData();
- private:
- int m_iSampleNum;
- FeatureMap *m_pFeatureMap;
- PoolLayer *m_pPoolLayer;
- //values needed during backpropagation
- double **m_ppdDelta;
- double *m_pdInputData;
- double *m_pdOutputData;
- };
- void TestCnn();
- #endif // CNN_H
cnn.cpp
- #include "cnn.h"
- #include "util.h"
- #include <cassert>
- CnnLayer::CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
- int iKernelWidth, int iPoolWidth):
- m_iSampleNum(iSampleNum), m_pdInputData(NULL), m_pdOutputData(NULL)
- {
- m_pFeatureMap = new FeatureMap(iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth);
- int iFeatureMapWidth = iInputImageWidth - iKernelWidth + 1;
- m_pPoolLayer = new PoolLayer(iFeatureMapNumber, iPoolWidth, iFeatureMapWidth);
- }
- CnnLayer::~CnnLayer()
- {
- delete m_pFeatureMap;
- delete m_pPoolLayer;
- }
- void CnnLayer::Forward_propagation(double *pdInputData)
- {
- m_pFeatureMap->Convolute(pdInputData);
- m_pPoolLayer->Convolute(m_pFeatureMap->GetFeatureMapValue());
- m_pdOutputData = m_pPoolLayer->GetOutputData();
- /************************************************************************/
- /* debug output for each stage of the convolution; remove once everything works */
- /************************************************************************/
- /*m_pFeatureMap->PrintOutputData();
- m_pPoolLayer->PrintOutputData();*/
- }
- void CnnLayer::SetInputAllData(double **ppInputAllData, int iInputNum)
- {
- }
- double* CnnLayer::GetOutputData()
- {
- assert(NULL != m_pdOutputData);
- return m_pdOutputData;
- }
- void CnnLayer::Setwb(vector<double*> &vpdw, vector<double> &vdb)
- {
- m_pFeatureMap->SetWeigh(vpdw);
- m_pPoolLayer->SetBias(vdb);
- }
- void CnnLayer::SetTrainNum( int iSampleNumber )
- {
- m_iSampleNum = iSampleNumber;
- }
- void CnnLayer::PrintOutputData()
- {
- m_pFeatureMap->PrintOutputData();
- m_pPoolLayer->PrintOutputData();
- }
- void TestCnn()
- {
- const int iFeatureMapNumber = 2, iPoolWidth = 2, iInputImageWidth = 8, iKernelWidth = 3, iInputImageNumber = 2;
- double *pdImage = new double[iInputImageWidth * iInputImageWidth * iInputImageNumber];
- double arrInput[iInputImageNumber][iInputImageWidth * iInputImageWidth];
- MakeCnnSample(arrInput, pdImage, iInputImageWidth, iInputImageNumber);
- double *pdKernel = new double[3 * 3 * iInputImageNumber];
- double arrKernel[3 * 3 * iInputImageNumber];
- MakeCnnWeigh(pdKernel, iInputImageNumber) ;
- CnnLayer cnn(3, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
- vector <double*> vecWeigh;
- vector <double> vecBias;
- for (int i = 0; i < iFeatureMapNumber; ++i)
- {
- vecBias.push_back(1.0);
- }
- vecWeigh.push_back(pdKernel);
- for (int i = 0; i < 3 * 3 * 2; ++i)
- {
- arrKernel[i] = i;
- }
- vecWeigh.push_back(arrKernel);
- cnn.Setwb(vecWeigh, vecBias);
- cnn.Forward_propagation(pdImage);
- cnn.PrintOutputData();
- delete []pdKernel;
- delete []pdImage;
- }
featuremap.h
- #ifndef FEATUREMAP_H
- #define FEATUREMAP_H
- #include <cassert>
- #include <vector>
- using std::vector;
- class FeatureMap
- {
- public:
- FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth);
- ~FeatureMap();
- void Forward_propagation(double* );
- void Back_propagation(double* , double* , double );
- void Convolute(double *pdInputData);
- int GetFeatureMapSize()
- {
- return m_iFeatureMapSize;
- }
- int GetFeatureMapWidth()
- {
- return m_iFeatureMapWidth;
- }
- double* GetFeatureMapValue()
- {
- assert(m_pdOutputValue != NULL);
- return m_pdOutputValue;
- }
- void SetWeigh(const vector<double *> &vecWeigh);
- void PrintOutputData();
- double **m_ppdWeigh;
- double *m_pdBias;
- private:
- int m_iInputImageNumber;
- int m_iInputImageWidth;
- int m_iInputImageSize;
- int m_iFeatureMapNumber;
- int m_iFeatureMapWidth;
- int m_iFeatureMapSize;
- int m_iKernelWidth;
- // double m_dBias;
- double *m_pdOutputValue;
- };
- #endif // FEATUREMAP_H
featuremap.cpp
- #include "featuremap.h"
- #include "util.h"
- #include <cassert>
- FeatureMap::FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth):
- m_iInputImageNumber(iInputImageNumber),
- m_iInputImageWidth(iInputImageWidth),
- m_iFeatureMapNumber(iFeatureMapNumber),
- m_iKernelWidth(iKernelWidth)
- {
- m_iFeatureMapWidth = m_iInputImageWidth - m_iKernelWidth + 1;
- m_iInputImageSize = m_iInputImageWidth * m_iInputImageWidth;
- m_iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;
- int iKernelSize;
- iKernelSize = m_iKernelWidth * m_iKernelWidth;
- double dbase = 1.0 / m_iInputImageSize;
- srand((unsigned)time(NULL));
- m_ppdWeigh = new double*[m_iFeatureMapNumber];
- m_pdBias = new double[m_iFeatureMapNumber];
- for (int i = 0; i < m_iFeatureMapNumber; ++i)
- {
- m_ppdWeigh[i] = new double[m_iInputImageNumber * iKernelSize];
- for (int j = 0; j < m_iInputImageNumber * iKernelSize; ++j)
- {
- m_ppdWeigh[i][j] = uniform(-dbase, dbase);
- }
- //m_pdBias[i] = uniform(-dbase, dbase);
- //theano's convolution layer apparently uses no bias of its own; the bias is applied in the pooling layer
- m_pdBias[i] = 0;
- }
- m_pdOutputValue = new double[m_iFeatureMapNumber * m_iFeatureMapSize];
- // m_dBias = uniform(-dbase, dbase);
- }
- FeatureMap::~FeatureMap()
- {
- delete []m_pdOutputValue;
- delete []m_pdBias;
- for (int i = 0; i < m_iFeatureMapNumber; ++i)
- {
- delete []m_ppdWeigh[i];
- }
- delete []m_ppdWeigh;
- }
- void FeatureMap::SetWeigh(const vector<double *> &vecWeigh)
- {
- assert(vecWeigh.size() == (DWORD)m_iFeatureMapNumber);
- for (int i = 0; i < m_iFeatureMapNumber; ++i)
- {
- delete []m_ppdWeigh[i];
- m_ppdWeigh[i] = vecWeigh[i];
- }
- }
- /*
- Convolution
- pdInputData: a flat vector holding all the input images
- */
- void FeatureMap::Convolute(double *pdInputData)
- {
- for (int iMapIndex = 0; iMapIndex < m_iFeatureMapNumber; ++iMapIndex)
- {
- double dBias = m_pdBias[iMapIndex];
- //for each featuremap
- for (int i = 0; i < m_iFeatureMapWidth; ++i)
- {
- for (int j = 0; j < m_iFeatureMapWidth; ++j)
- {
- double dSum = 0.0;
- int iInputIndex, iKernelIndex, iInputIndexStart, iKernelStart, iOutIndex;
- //index into the output vector
- iOutIndex = iMapIndex * m_iFeatureMapSize + i * m_iFeatureMapWidth + j;
- //accumulate the contribution of each input image
- for (int k = 0; k < m_iInputImageNumber; ++k)
- {
- //start of the input-image patch this kernel position covers
- //iInputIndexStart = k * m_iInputImageSize + j * m_iInputImageWidth + i;
- iInputIndexStart = k * m_iInputImageSize + i * m_iInputImageWidth + j;
- //start position of the kernel
- iKernelStart = k * m_iKernelWidth * m_iKernelWidth;
- for (int m = 0; m < m_iKernelWidth; ++m)
- {
- for (int n = 0; n < m_iKernelWidth; ++n)
- {
- //iKernelIndex = iKernelStart + n * m_iKernelWidth + m;
- iKernelIndex = iKernelStart + m * m_iKernelWidth + n;
- //not entirely sure the expression below is correct
- iInputIndex = iInputIndexStart + m * m_iInputImageWidth + n;
- dSum += pdInputData[iInputIndex] * m_ppdWeigh[iMapIndex][iKernelIndex];
- }//end n
- }//end m
- }//end k
- //add the bias
- //dSum += dBias;
- m_pdOutputValue[iOutIndex] = dSum;
- }//end j
- }//end i
- }//end iMapIndex
- }
- void FeatureMap::PrintOutputData()
- {
- for (int i = 0; i < m_iFeatureMapNumber; ++i)
- {
- cout << "featuremap " << i <<endl;
- for (int m = 0; m < m_iFeatureMapWidth; ++m)
- {
- for (int n = 0; n < m_iFeatureMapWidth; ++n)
- {
- cout << m_pdOutputValue[i * m_iFeatureMapSize +m * m_iFeatureMapWidth +n] << ' ';
- }
- cout << endl;
- }
- cout <<endl;
- }
- }
poollayer.h
- #ifndef POOLLAYER_H
- #define POOLLAYER_H
- #include <vector>
- using std::vector;
- class PoolLayer
- {
- public:
- PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth);
- ~PoolLayer();
- void Convolute(double *pdInputData);
- void SetBias(const vector<double> &vecBias);
- double* GetOutputData();
- void PrintOutputData();
- private:
- int m_iOutImageNumber;
- int m_iPoolWidth;
- int m_iFeatureMapWidth;
- int m_iPoolSize;
- int m_iOutImageEdge;
- int m_iOutImageSize;
- double *m_pdOutData;
- double *m_pdBias;
- };
- #endif // POOLLAYER_H
poollayer.cpp
- #include "poollayer.h"
- #include "util.h"
- #include <cassert>
- PoolLayer::PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth):
- m_iOutImageNumber(iOutImageNumber),
- m_iPoolWidth(iPoolWidth),
- m_iFeatureMapWidth(iFeatureMapWidth)
- {
- m_iPoolSize = m_iPoolWidth * m_iPoolWidth;
- m_iOutImageEdge = m_iFeatureMapWidth / m_iPoolWidth;
- m_iOutImageSize = m_iOutImageEdge * m_iOutImageEdge;
- m_pdOutData = new double[m_iOutImageNumber * m_iOutImageSize];
- m_pdBias = new double[m_iOutImageNumber];
- /*for (int i = 0; i < m_iOutImageNumber; ++i)
- {
- m_pdBias[i] = 1;
- }*/
- }
- PoolLayer::~PoolLayer()
- {
- delete []m_pdOutData;
- delete []m_pdBias;
- }
- void PoolLayer::Convolute(double *pdInputData)
- {
- int iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;
- for (int iOutImageIndex = 0; iOutImageIndex < m_iOutImageNumber; ++iOutImageIndex)
- {
- double dBias = m_pdBias[iOutImageIndex];
- for (int i = 0; i < m_iOutImageEdge; ++i)
- {
- for (int j = 0; j < m_iOutImageEdge; ++j)
- {
- double dValue = 0.0;
- int iInputIndex, iInputIndexStart, iOutIndex;
- /************************************************************************/
- /* This was the biggest bug: dMaxPixel was initialised to 0 and the maximum searched from there.
- ** The problem: pixel values can be negative, which silently corrupted everything downstream; extremely hard to find.
- /************************************************************************/
- double dMaxPixel = INT_MIN ;
- iOutIndex = iOutImageIndex * m_iOutImageSize + i * m_iOutImageEdge + j;
- iInputIndexStart = iOutImageIndex * iFeatureMapSize + (i * m_iFeatureMapWidth + j) * m_iPoolWidth;
- for (int m = 0; m < m_iPoolWidth; ++m)
- {
- for (int n = 0; n < m_iPoolWidth; ++n)
- {
- // int iPoolIndex = m * m_iPoolWidth + n;
- //not entirely sure the expression below is correct
- iInputIndex = iInputIndexStart + m * m_iFeatureMapWidth + n;
- if (pdInputData[iInputIndex] > dMaxPixel)
- {
- dMaxPixel = pdInputData[iInputIndex];
- }
- }//end n
- }//end m
- dValue = dMaxPixel + dBias;
- assert(iOutIndex < m_iOutImageNumber * m_iOutImageSize);
- //m_pdOutData[iOutIndex] = (dMaxPixel);
- m_pdOutData[iOutIndex] = mytanh(dValue);
- }//end j
- }//end i
- }//end iOutImageIndex
- }
- void PoolLayer::SetBias(const vector<double> &vecBias)
- {
- assert(vecBias.size() == (DWORD)m_iOutImageNumber);
- for (int i = 0; i < m_iOutImageNumber; ++i)
- {
- m_pdBias[i] = vecBias[i];
- }
- }
- double* PoolLayer::GetOutputData()
- {
- assert(NULL != m_pdOutData);
- return m_pdOutData;
- }
- void PoolLayer::PrintOutputData()
- {
- for (int i = 0; i < m_iOutImageNumber; ++i)
- {
- cout << "pool image " << i <<endl;
- for (int m = 0; m < m_iOutImageEdge; ++m)
- {
- for (int n = 0; n < m_iOutImageEdge; ++n)
- {
- cout << m_pdOutData[i * m_iOutImageSize + m * m_iOutImageEdge + n] << ' ';
- }
- cout << endl;
- }
- cout <<endl;
- }
- }
mlp.h
- #ifndef MLP_H
- #define MLP_H
- #include "hiddenLayer.h"
- #include "logisticRegression.h"
- class Mlp
- {
- public:
- Mlp(int n, int n_i, int n_o, int nhl, int *hls);
- ~Mlp();
- // void Train(double** in_data, double** in_label, double dLr, int epochs);
- void Predict(double** in_data, int n);
- void Train(double *x, WORD y, double dLr);
- void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);
- double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);
- void Writewb(const char *szName);
- void Readwb(const char *szName);
- void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);
- void SetTrainNum(int iNum);
- int Predict(double *pInputData);
- // void Forward_propagation(double** ppdata, int n);
- double* Forward_propagation(double *);
- int* GetHiddenSize();
- int GetHiddenNumber();
- double *GetHiddenOutputData();
- HiddenLayer **m_ppHiddenLayer;
- LogisticRegression *m_pLogisticLayer;
- private:
- int m_iSampleNum; //number of samples
- int m_iInput; //input dimension
- int m_iOut; //output dimension
- int m_iHiddenLayerNum; //number of hidden layers
- int* m_piHiddenLayerSize; //hidden layer sizes, e.g. {3,4} means two hidden layers, the first with 3 nodes and the second with 4
- };
- void mlp();
- void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut);
- void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut);
- #endif
mlp.cpp
- #include <iostream>
- #include "mlp.h"
- #include "util.h"
- #include <cassert>
- #include <iomanip>
- using namespace std;
- const int m_iSamplenum = 8, innode = 3, outnode = 8;
- Mlp::Mlp(int n, int n_i, int n_o, int nhl, int *hls)
- {
- m_iSampleNum = n;
- m_iInput = n_i;
- m_iOut = n_o;
- m_iHiddenLayerNum = nhl;
- m_piHiddenLayerSize = hls;
- //build the network structure
- m_ppHiddenLayer = new HiddenLayer* [m_iHiddenLayerNum];
- for(int i = 0; i < m_iHiddenLayerNum; ++i)
- {
- if(i == 0)
- {
- m_ppHiddenLayer[i] = new HiddenLayer(m_iInput, m_piHiddenLayerSize[i]);//the first hidden layer
- }
- else
- {
- m_ppHiddenLayer[i] = new HiddenLayer(m_piHiddenLayerSize[i-1], m_piHiddenLayerSize[i]);//the other hidden layers
- }
- }
- if (m_iHiddenLayerNum > 0)
- {
- m_pLogisticLayer = new LogisticRegression(m_piHiddenLayerSize[m_iHiddenLayerNum - 1], m_iOut, m_iSampleNum);//the final softmax layer
- }
- else
- {
- m_pLogisticLayer = new LogisticRegression(m_iInput, m_iOut, m_iSampleNum);//the final softmax layer
- }
- }
- Mlp::~Mlp()
- {
- //what a double pointer owns is not necessarily a 2-d array
- for(int i = 0; i < m_iHiddenLayerNum; ++i)
- delete m_ppHiddenLayer[i]; //no [] when deleting a single object
- delete[] m_ppHiddenLayer;
- //log_layer is a plain object pointer, so it must not be deleted as an array
- delete m_pLogisticLayer; //again, no [] here
- }
- void Mlp::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
- {
- cout << "Mlp::TrainAllSample" << endl;
- for (int j = 0; j < m_iSampleNum; ++j)
- {
- Train(vecTrain[j], vectrainlabel[j], dLr);
- }
- }
- void Mlp::Train(double *pdTrain, WORD usLabel, double dLr)
- {
- // cout << "******pdLabel****" << endl;
- // printArrDouble(ppdinLabel, m_iSampleNum, m_iOut);
- double *pdLabel = new double[m_iOut];
- MakeOneLabel(usLabel, pdLabel, m_iOut);
- //forward pass
- for(int n = 0; n < m_iHiddenLayerNum; ++ n)
- {
- if(n == 0) //the first hidden layer consumes the input data directly
- {
- m_ppHiddenLayer[n]->Forward_propagation(pdTrain);
- }
- else //other hidden layers consume the previous layer's output
- {
- m_ppHiddenLayer[n]->Forward_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata);
- }
- }
- //the softmax layer consumes the last hidden layer's output
- m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
- //backward pass
- m_pLogisticLayer->Back_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata, pdLabel, dLr);
- for(int n = m_iHiddenLayerNum-1; n >= 1; --n)
- {
- if(n == m_iHiddenLayerNum-1)
- {
- m_ppHiddenLayer[n]->Back_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata,
- m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
- }
- else
- {
- double *pdInputData;
- pdInputData = m_ppHiddenLayer[n-1]->m_pdOutdata;
- m_ppHiddenLayer[n]->Back_propagation(pdInputData,
- m_ppHiddenLayer[n+1]->m_pdDelta, m_ppHiddenLayer[n+1]->m_ppdW, m_ppHiddenLayer[n+1]->m_iOut, dLr);
- }
- }
- //the first hidden layer: its next layer is either the second hidden layer or the softmax layer
- if (m_iHiddenLayerNum > 1)
- m_ppHiddenLayer[0]->Back_propagation(pdTrain,
- m_ppHiddenLayer[1]->m_pdDelta, m_ppHiddenLayer[1]->m_ppdW, m_ppHiddenLayer[1]->m_iOut, dLr);
- else
- m_ppHiddenLayer[0]->Back_propagation(pdTrain,
- m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
- delete []pdLabel;
- }
- void Mlp::SetTrainNum(int iNum)
- {
- m_iSampleNum = iNum;
- }
- double* Mlp::Forward_propagation(double* pData)
- {
- double *pdForwardValue = pData;
- for(int n = 0; n < m_iHiddenLayerNum; ++ n)
- {
- if(n == 0) //the first hidden layer consumes the input data directly
- {
- pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pData);
- }
- else //other hidden layers consume the previous layer's output
- {
- pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pdForwardValue);
- }
- }
- return pdForwardValue;
- //the softmax layer consumes the last hidden layer's output (done in Predict below)
- // m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
- // m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
- }
- int Mlp::Predict(double *pInputData)
- {
- Forward_propagation(pInputData);
- int iResult = m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
- return iResult;
- }
- void Mlp::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)
- {
- for (int i = 0; i < m_iHiddenLayerNum; ++i)
- {
- m_ppHiddenLayer[i]->Setwb(vvAllw[i], vvAllb[i]);
- }
- m_pLogisticLayer->Setwb(vvAllw[m_iHiddenLayerNum], vvAllb[m_iHiddenLayerNum]);
- }
- void Mlp::Writewb(const char *szName)
- {
- for(int i = 0; i < m_iHiddenLayerNum; ++i)
- {
- m_ppHiddenLayer[i]->Writewb(szName);
- }
- m_pLogisticLayer->Writewb(szName);
- }
- double Mlp::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
- {
- int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
- for (int i = 0; i < iValidNumber; ++i)
- {
- int iResult = Predict(vecvalid[i]);
- if (iResult != vecValidlabel[i])
- {
- ++iErrorNumber;
- }
- }
- cout << "the num of error is " << iErrorNumber << endl;
- double dErrorRate = (double)iErrorNumber / iValidNumber;
- cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
- return dErrorRate;
- }
- void Mlp::Readwb(const char *szName)
- {
- long dcurpos = 0, dreadsize = 0;
- for(int i = 0; i < m_iHiddenLayerNum; ++i)
- {
- dreadsize = m_ppHiddenLayer[i]->Readwb(szName, dcurpos);
- cout << "hiddenlayer " << i + 1 << " read bytes: " << dreadsize << endl;
- if (-1 != dreadsize)
- dcurpos += dreadsize;
- else
- {
- cout << "read wb error from HiddenLayer" << endl;
- return;
- }
- }
- dreadsize = m_pLogisticLayer->Readwb(szName, dcurpos);
- if (-1 != dreadsize)
- dcurpos += dreadsize;
- else
- {
- cout << "read wb error from sofmaxLayer" << endl;
- return;
- }
- }
- int* Mlp::GetHiddenSize()
- {
- return m_piHiddenLayerSize;
- }
- double* Mlp::GetHiddenOutputData()
- {
- assert(m_iHiddenLayerNum > 0);
- return m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata;
- }
- int Mlp::GetHiddenNumber()
- {
- return m_iHiddenLayerNum;
- }
- //double **makeLabelSample(double **label_x)
- double **makeLabelSample(double label_x[][outnode])
- {
- double **pplabelSample;
- pplabelSample = new double*[m_iSamplenum];
- for (int i = 0; i < m_iSamplenum; ++i)
- {
- pplabelSample[i] = new double[outnode];
- }
- for (int i = 0; i < m_iSamplenum; ++i)
- {
- for (int j = 0; j < outnode; ++j)
- pplabelSample[i][j] = label_x[i][j];
- }
- return pplabelSample;
- }
- double **maken_train(double train_x[][innode])
- {
- double **ppn_train;
- ppn_train = new double*[m_iSamplenum];
- for (int i = 0; i < m_iSamplenum; ++i)
- {
- ppn_train[i] = new double[innode];
- }
- for (int i = 0; i < m_iSamplenum; ++i)
- {
- for (int j = 0; j < innode; ++j)
- ppn_train[i][j] = train_x[i][j];
- }
- return ppn_train;
- }
- void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut)
- {
- const int ihiddenSize = 1;
- int phidden[ihiddenSize] = {ihidden};
- // construct LogisticRegression
- Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);
- vector<double*> vecTrain, vecvalid;
- vector<WORD> vecValidlabel, vectrainlabel;
- LoadTestSampleFromJson(vecvalid, vecValidlabel, "../../data/mnist.json", m_iInput);
- LoadTestSampleFromJson(vecTrain, vectrainlabel, "../../data/mnisttrain.json", m_iInput);
- // test
- int itrainnum = vecTrain.size();
- neural.SetTrainNum(itrainnum);
- const int iepochs = 1;
- const double dLr = 0.1;
- neural.CalErrorRate(vecvalid, vecValidlabel);
- for (int i = 0; i < iepochs; ++i)
- {
- neural.TrainAllSample(vecTrain, vectrainlabel, dLr);
- neural.CalErrorRate(vecvalid, vecValidlabel);
- }
- for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
- {
- delete [](*cit);
- }
- for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)
- {
- delete [](*cit);
- }
- }
- void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut)
- {
- const int ihiddenSize = 1;
- int phidden[ihiddenSize] = {ihidden};
- // construct LogisticRegression
- Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);
- vector<double*> vecTrain, vecw;
- vector<double> vecb;
- vector<WORD> vecLabel;
- vector< vector<double*> > vvAllw;
- vector< vector<double> > vvAllb;
- const char *pcfilename = "../../data/theanomlp.json";
- vector<int> vecSecondDimOfWeigh;
- vecSecondDimOfWeigh.push_back(m_iInput);
- vecSecondDimOfWeigh.push_back(ihidden);
- LoadWeighFromJson(vvAllw, vvAllb, pcfilename, vecSecondDimOfWeigh);
- LoadTestSampleFromJson(vecTrain, vecLabel, "../../data/mnist_validall.json", m_iInput);
- cout << "loadwb ---------" << endl;
- int itrainnum = vecTrain.size();
- neural.SetTrainNum(itrainnum);
- neural.Setwb(vvAllw, vvAllb);
- cout << "Predict------------" << endl;
- neural.CalErrorRate(vecTrain, vecLabel);
- for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
- {
- delete [](*cit);
- }
- }
- void mlp()
- {
- //input samples
- double X[m_iSamplenum][innode]= {
- {0,0,0},{0,0,1},{0,1,0},{0,1,1},{1,0,0},{1,0,1},{1,1,0},{1,1,1}
- };
- double Y[m_iSamplenum][outnode]={
- {1, 0, 0, 0, 0, 0, 0, 0},
- {0, 1, 0, 0, 0, 0, 0, 0},
- {0, 0, 1, 0, 0, 0, 0, 0},
- {0, 0, 0, 1, 0, 0, 0, 0},
- {0, 0, 0, 0, 1, 0, 0, 0},
- {0, 0, 0, 0, 0, 1, 0, 0},
- {0, 0, 0, 0, 0, 0, 1, 0},
- {0, 0, 0, 0, 0, 0, 0, 1},
- };
- WORD pdLabel[outnode] = {0, 1, 2, 3, 4, 5, 6, 7};
- const int ihiddenSize = 2;
- int phidden[ihiddenSize] = {5, 5};
- //printArr(phidden, 1);
- Mlp neural(m_iSamplenum, innode, outnode, ihiddenSize, phidden);
- double **train_x, **ppdLabel;
- train_x = maken_train(X);
- //printArrDouble(train_x, m_iSamplenum, innode);
- ppdLabel = makeLabelSample(Y);
- for (int i = 0; i < 3500; ++i)
- {
- for (int j = 0; j < m_iSamplenum; ++j)
- {
- neural.Train(train_x[j], pdLabel[j], 0.1);
- }
- }
- cout<<"trainning complete..."<<endl;
- for (int i = 0; i < m_iSamplenum; ++i)
- neural.Predict(train_x[i]);
- //szName is the file the weights are written to
- // const char *szName = "mlp55new.wb";
- // neural.Writewb(szName);
- // Mlp neural2(m_iSamplenum, innode, outnode, ihiddenSize, phidden);
- // cout<<"Readwb start..."<<endl;
- // neural2.Readwb(szName);
- // cout<<"Readwb end..."<<endl;
- // cout << "----------after readwb________" << endl;
- // for (int i = 0; i < m_iSamplenum; ++i)
- // neural2.Forward_propagation(train_x[i]);
- for (int i = 0; i != m_iSamplenum; ++i)
- {
- delete []train_x[i];
- delete []ppdLabel[i];
- }
- delete []train_x;
- delete []ppdLabel;
- cout<<endl;
- }
hiddenLayer.h
- #ifndef HIDDENLAYER_H
- #define HIDDENLAYER_H
- #include "neuralbase.h"
- class HiddenLayer: public NeuralBase
- {
- public:
- HiddenLayer(int n_i, int n_o);
- ~HiddenLayer();
- double* Forward_propagation(double* input_data);
- void Back_propagation(double *pdInputData, double *pdNextLayerDelta,
- double** ppdnextLayerW, int iNextLayerOutNum, double dLr);
- };
- #endif
hiddenLayer.cpp
- #include <cmath>
- #include <cassert>
- #include <cstdlib>
- #include <ctime>
- #include <iostream>
- #include "hiddenLayer.h"
- #include "util.h"
- using namespace std;
- HiddenLayer::HiddenLayer(int n_i, int n_o): NeuralBase(n_i, n_o, 0)
- {
- }
- HiddenLayer::~HiddenLayer()
- {
- }
- /************************************************************************/
- /* Note:
- to reproduce theano's test results, the hidden layer activation must be tanh;
- for training the mlp, the activation must be sigmoid */
- /************************************************************************/
- double* HiddenLayer::Forward_propagation(double* pdInputData)
- {
- NeuralBase::Forward_propagation(pdInputData);
- for(int i = 0; i < m_iOut; ++i)
- {
- // m_pdOutdata[i] = sigmoid(m_pdOutdata[i]);
- m_pdOutdata[i] = mytanh(m_pdOutdata[i]);
- }
- return m_pdOutdata;
- }
- void HiddenLayer::Back_propagation(double *pdInputData, double *pdNextLayerDelta,
- double** ppdnextLayerW, int iNextLayerOutNum, double dLr)
- {
- /*
- pdInputData: the input data
- *pdNextLayerDelta: the next layer's delta (residual), an array of size iNextLayerOutNum
- **ppdnextLayerW: the weights from this layer to the next layer
- iNextLayerOutNum: simply the next layer's n_out
- dLr: the learning rate
- m_iSampleNum: the total number of training samples
- */
- //sigma must have exactly as many elements as this layer has units; the code circulating online gets this wrong
- //(and its author clearly never tested it)
- //double* sigma = new double[iNextLayerOutNum];
- double* sigma = new double[m_iOut];
- //double sigma[10];
- for(int i = 0; i < m_iOut; ++i)
- sigma[i] = 0.0;
- for(int i = 0; i < iNextLayerOutNum; ++i)
- {
- for(int j = 0; j < m_iOut; ++j)
- {
- sigma[j] += ppdnextLayerW[i][j] * pdNextLayerDelta[i];
- }
- }
- //compute this layer's delta
- for(int i = 0; i < m_iOut; ++i)
- {
- m_pdDelta[i] = sigma[i] * m_pdOutdata[i] * (1 - m_pdOutdata[i]);
- }
- //update this layer's weights w
- for(int i = 0; i < m_iOut; ++i)
- {
- for(int j = 0; j < m_iInput; ++j)
- {
- m_ppdW[i][j] += dLr * m_pdDelta[i] * pdInputData[j];
- }
- m_pdBias[i] += dLr * m_pdDelta[i];
- }
- delete[] sigma;
- }
logisticRegression.h
- #ifndef LOGISTICREGRESSIONLAYER
- #define LOGISTICREGRESSIONLAYER
- #include "neuralbase.h"
- typedef unsigned short WORD;
- class LogisticRegression: public NeuralBase
- {
- public:
- LogisticRegression(int n_i, int i_o, int);
- ~LogisticRegression();
- double* Forward_propagation(double* input_data);
- void Softmax(double* x);
- void Train(double *pdTrain, WORD usLabel, double dLr);
- void SetOldWb(double ppdWeigh[][3], double arriBias[8]);
- int Predict(double *);
- void MakeLabels(int* piMax, double (*pplabels)[8]);
- };
- void Test_lr();
- void Testwb();
- void Test_theano(const int m_iInput, const int m_iOut);
- #endif
logisticRegression.cpp
- #include <cmath>
- #include <cassert>
- #include <iomanip>
- #include <ctime>
- #include <iostream>
- #include "logisticRegression.h"
- #include "util.h"
- using namespace std;
- LogisticRegression::LogisticRegression(int n_i, int n_o, int n_t): NeuralBase(n_i, n_o, n_t)
- {
- }
- LogisticRegression::~LogisticRegression()
- {
- }
- void LogisticRegression::Softmax(double* x)
- {
- double _max = 0.0;
- double _sum = 0.0;
- for(int i = 0; i < m_iOut; ++i)
- {
- if(_max < x[i])
- _max = x[i];
- }
- for(int i = 0; i < m_iOut; ++i)
- {
- x[i] = exp(x[i]-_max);
- _sum += x[i];
- }
- for(int i = 0; i < m_iOut; ++i)
- {
- x[i] /= _sum;
- }
- }
- double* LogisticRegression::Forward_propagation(double* pdinputdata)
- {
- NeuralBase::Forward_propagation(pdinputdata);
- /************************************************************************/
- /* debugging */
- /************************************************************************/
- //cout << "Forward_propagation from LogisticRegression" << endl;
- //PrintOutputData();
- //cout << "over\n";
- Softmax(m_pdOutdata);
- return m_pdOutdata;
- }
- int LogisticRegression::Predict(double *pdtest)
- {
- Forward_propagation(pdtest);
- /************************************************************************/
- /* for debugging */
- /************************************************************************/
- //PrintOutputData();
- int iResult = getMaxIndex(m_pdOutdata, m_iOut);
- return iResult;
- }
- void LogisticRegression::Train(double *pdTrain, WORD usLabel, double dLr)
- {
- Forward_propagation(pdTrain);
- double *pdLabel = new double[m_iOut];
- MakeOneLabel(usLabel, pdLabel);
- Back_propagation(pdTrain, pdLabel, dLr);
- delete []pdLabel;
- }
- //double LogisticRegression::CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel)
- //{
- // int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
- // for (int i = 0; i < iValidNumber; ++i)
- // {
- // int iResult = Predict(vecvalid[i]);
- // if (iResult != vecValidlabel[i])
- // {
- // ++iErrorNumber;
- // }
- // }
- // cout << "the num of error is " << iErrorNumber << endl;
- // double dErrorRate = (double)iErrorNumber / iValidNumber;
- // cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
- // return dErrorRate;
- //}
- void LogisticRegression::SetOldWb(double ppdWeigh[][3], double arriBias[8])
- {
- for (int i = 0; i < m_iOut; ++i)
- {
- for (int j = 0; j < m_iInput; ++j)
- m_ppdW[i][j] = ppdWeigh[i][j];
- m_pdBias[i] = arriBias[i];
- }
- cout << "Setwb----------" << endl;
- printArrDouble(m_ppdW, m_iOut, m_iInput);
- printArr(m_pdBias, m_iOut);
- }
- //void LogisticRegression::TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
- //{
- // for (int j = 0; j < m_iSamplenum; ++j)
- // {
- // Train(vecTrain[j], vectrainlabel[j], dLr);
- // }
- //}
- void LogisticRegression::MakeLabels(int* piMax, double (*pplabels)[8])
- {
- for (int i = 0; i < m_iSamplenum; ++i)
- {
- for (int j = 0; j < m_iOut; ++j)
- pplabels[i][j] = 0;
- int k = piMax[i];
- pplabels[i][k] = 1.0;
- }
- }
- void Test_theano(const int m_iInput, const int m_iOut)
- {
- // construct LogisticRegression
- LogisticRegression classifier(m_iInput, m_iOut, 0);
- vector<double*> vecTrain, vecvalid, vecw;
- vector<double> vecb;
- vector<WORD> vecValidlabel, vectrainlabel;
- LoadTestSampleFromJson(vecvalid, vecValidlabel, "../../data/mnist.json", m_iInput);
- LoadTestSampleFromJson(vecTrain, vectrainlabel, "../../data/mnisttrain.json", m_iInput);
- // test
- int itrainnum = vecTrain.size();
- classifier.m_iSamplenum = itrainnum;
- const int iepochs = 5;
- const double dLr = 0.1;
- for (int i = 0; i < iepochs; ++i)
- {
- classifier.TrainAllSample(vecTrain, vectrainlabel, dLr);
- if (i % 2 == 0)
- {
- cout << "Predict------------" << i + 1 << endl;
- classifier.CalErrorRate(vecvalid, vecValidlabel);
- }
- }
- for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
- {
- delete [](*cit);
- }
- for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)
- {
- delete [](*cit);
- }
- }
- void Test_lr()
- {
- srand(0);
- double learning_rate = 0.1;
- double n_epochs = 200;
- int test_N = 2;
- const int trainNum = 8, m_iInput = 3, m_iOut = 8;
- //int m_iOut = 2;
- double train_X[trainNum][m_iInput] = {
- {1, 1, 1},
- {1, 1, 0},
- {1, 0, 1},
- {1, 0, 0},
- {0, 1, 1},
- {0, 1, 0},
- {0, 0, 1},
- {0, 0, 0}
- };
- //sziMax holds the index of the maximum for each sample
- int sziMax[trainNum];
- for (int i = 0; i < trainNum; ++i)
- sziMax[i] = trainNum - i - 1;
- // construct LogisticRegression
- LogisticRegression classifier(m_iInput, m_iOut, trainNum);
- // Train online
- for(int epoch=0; epoch<n_epochs; epoch++) {
- for(int i=0; i<trainNum; i++) {
- //classifier.trainEfficient(train_X[i], train_Y[i], learning_rate);
- classifier.Train(train_X[i], sziMax[i], learning_rate);
- }
- }
- const char *pcfile = "test.wb";
- classifier.Writewb(pcfile);
- LogisticRegression logistic(m_iInput, m_iOut, trainNum);
- logistic.Readwb(pcfile, 0);
- // test data
- double test_X[2][m_iInput] = {
- {1, 0, 1},
- {0, 0, 1}
- };
- // test
- cout << "before Readwb ---------" << endl;
- for(int i=0; i<test_N; i++) {
- classifier.Predict(test_X[i]);
- cout << endl;
- }
- cout << "after Readwb ---------" << endl;
- for(int i=0; i<trainNum; i++) {
- logistic.Predict(train_X[i]);
- cout << endl;
- }
- cout << "*********\n";
- }
- void Testwb()
- {
- // int test_N = 2;
- const int trainNum = 8, m_iInput = 3, m_iOut = 8;
- //int m_iOut = 2;
- double train_X[trainNum][m_iInput] = {
- {1, 1, 1},
- {1, 1, 0},
- {1, 0, 1},
- {1, 0, 0},
- {0, 1, 1},
- {0, 1, 0},
- {0, 0, 1},
- {0, 0, 0}
- };
- double arriBias[m_iOut] = {1, 2, 3, 3, 3, 3, 2, 1};
- // construct LogisticRegression
- LogisticRegression classifier(m_iInput, m_iOut, trainNum);
- classifier.SetOldWb(train_X, arriBias);
- const char *pcfile = "test.wb";
- classifier.Writewb(pcfile);
- LogisticRegression logistic(m_iInput, m_iOut, trainNum);
- logistic.Readwb(pcfile, 0);
- }
neuralbase.h
- #ifndef NEURALBASE_H
- #define NEURALBASE_H
- #include <vector>
- using std::vector;
- typedef unsigned short WORD;
- class NeuralBase
- {
- public:
- NeuralBase(int , int , int);
- virtual ~NeuralBase();
- virtual double* Forward_propagation(double* );
- virtual void Back_propagation(double* , double* , double );
- virtual void Train(double *x, WORD y, double dLr);
- virtual int Predict(double *);
- void Callbackwb();
- void MakeOneLabel(int iMax, double *pdLabel);
- void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);
- double CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel);
- void Printwb();
- void Writewb(const char *szName);
- long Readwb(const char *szName, long);
- void Setwb(vector<double*> &vpdw, vector<double> &vdb);
- void PrintOutputData();
- int m_iInput;
- int m_iOut;
- int m_iSamplenum;
- double** m_ppdW;
- double* m_pdBias;
- //this layer's forward output, which is also the final prediction
- double* m_pdOutdata;
- //values needed during backpropagation
- double* m_pdDelta;
- private:
- void _callbackwb();
- };
- #endif // NEURALBASE_H
neuralbase.cpp
- #include "neuralbase.h"
- #include <cmath>
- #include <cassert>
- #include <ctime>
- #include <iomanip>
- #include <iostream>
- #include "util.h"
- using namespace std;
- NeuralBase::NeuralBase(int n_i, int n_o, int n_t):m_iInput(n_i), m_iOut(n_o), m_iSamplenum(n_t)
- {
- m_ppdW = new double* [m_iOut];
- for(int i = 0; i < m_iOut; ++i)
- {
- m_ppdW[i] = new double [m_iInput];
- }
- m_pdBias = new double [m_iOut];
- double a = 1.0 / m_iInput;
- srand((unsigned)time(NULL));
- for(int i = 0; i < m_iOut; ++i)
- {
- for(int j = 0; j < m_iInput; ++j)
- m_ppdW[i][j] = uniform(-a, a);
- m_pdBias[i] = uniform(-a, a);
- }
- m_pdDelta = new double [m_iOut];
- m_pdOutdata = new double [m_iOut];
- }
- NeuralBase::~NeuralBase()
- {
- Callbackwb();
- delete[] m_pdOutdata;
- delete[] m_pdDelta;
- }
- void NeuralBase::Callbackwb()
- {
- _callbackwb();
- }
- double NeuralBase::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
- {
- int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
- for (int i = 0; i < iValidNumber; ++i)
- {
- int iResult = Predict(vecvalid[i]);
- if (iResult != vecValidlabel[i])
- {
- ++iErrorNumber;
- }
- }
- cout << "the num of error is " << iErrorNumber << endl;
- double dErrorRate = (double)iErrorNumber / iValidNumber;
- cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
- return dErrorRate;
- }
- int NeuralBase::Predict(double *)
- {
- cout << "NeuralBase::Predict(double *)" << endl;
- return 0;
- }
- void NeuralBase::_callbackwb()
- {
- for(int i=0; i < m_iOut; i++)
- delete []m_ppdW[i];
- delete[] m_ppdW;
- delete[] m_pdBias;
- }
- void NeuralBase::Printwb()
- {
- cout << "'****m_ppdW****\n";
- for(int i = 0; i < m_iOut; ++i)
- {
- for(int j = 0; j < m_iInput; ++j)
- cout << m_ppdW[i][j] << ' ';
- cout << endl;
- }
- cout << "'****m_pdBias****\n";
- for(int i = 0; i < m_iOut; ++i)
- {
- cout << m_pdBias[i] << ' ';
- }
- cout << endl;
- cout << "'****output****\n";
- for(int i = 0; i < m_iOut; ++i)
- {
- cout << m_pdOutdata[i] << ' ';
- }
- cout << endl;
- }
- double* NeuralBase::Forward_propagation(double* input_data)
- {
- for(int i = 0; i < m_iOut; ++i)
- {
- m_pdOutdata[i] = 0.0;
- for(int j = 0; j < m_iInput; ++j)
- {
- m_pdOutdata[i] += m_ppdW[i][j]*input_data[j];
- }
- m_pdOutdata[i] += m_pdBias[i];
- }
- return m_pdOutdata;
- }
- void NeuralBase::Back_propagation(double* input_data, double* pdlabel, double dLr)
- {
- for(int i = 0; i < m_iOut; ++i)
- {
- m_pdDelta[i] = pdlabel[i] - m_pdOutdata[i] ;
- for(int j = 0; j < m_iInput; ++j)
- {
- m_ppdW[i][j] += dLr * m_pdDelta[i] * input_data[j] / m_iSamplenum;
- }
- m_pdBias[i] += dLr * m_pdDelta[i] / m_iSamplenum;
- }
- }
- void NeuralBase::MakeOneLabel(int imax, double *pdlabel)
- {
- for (int j = 0; j < m_iOut; ++j)
- pdlabel[j] = 0;
- pdlabel[imax] = 1.0;
- }
- void NeuralBase::Writewb(const char *szName)
- {
- savewb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput);
- }
- long NeuralBase::Readwb(const char *szName, long dstartpos)
- {
- return loadwb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput, dstartpos);
- }
- void NeuralBase::Setwb(vector<double*> &vpdw, vector<double> &vdb)
- {
- assert(vpdw.size() == (DWORD)m_iOut);
- for (int i = 0; i < m_iOut; ++i)
- {
- delete []m_ppdW[i];
- m_ppdW[i] = vpdw[i];
- m_pdBias[i] = vdb[i];
- }
- }
- void NeuralBase::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
- {
- for (int j = 0; j < m_iSamplenum; ++j)
- {
- Train(vecTrain[j], vectrainlabel[j], dLr);
- }
- }
- void NeuralBase::Train(double *x, WORD y, double dLr)
- {
- (void)x;
- (void)y;
- (void)dLr;
- cout << "NeuralBase::Train(double *x, WORD y, double dLr)" << endl;
- }
- void NeuralBase::PrintOutputData()
- {
- for (int i = 0; i < m_iOut; ++i)
- {
- cout << m_pdOutdata[i] << ' ';
- }
- cout << endl;
- }
util.h
- #ifndef UTIL_H
- #define UTIL_H
- #include <iostream>
- #include <cstdio>
- #include <cstdlib>
- #include <ctime>
- #include <vector>
- using namespace std;
- typedef unsigned char BYTE;
- typedef unsigned short WORD;
- typedef unsigned int DWORD;
- double sigmoid(double x);
- double mytanh(double dx);
- typedef struct stShapeWb
- {
- stShapeWb(int w, int h):width(w), height(h){}
- int width;
- int height;
- }ShapeWb_S;
- void MakeOneLabel(int iMax, double *pdLabel, int m_iOut);
- double uniform(double _min, double _max);
- //void printArr(T *parr, int num);
- //void printArrDouble(double **pparr, int row, int col);
- void initArr(double *parr, int num);
- int getMaxIndex(double *pdarr, int num);
- void Printivec(const vector<int> &ivec);
- void savewb(const char *szName, double **m_ppdW, double *m_pdBias,
- int irow, int icol);
- long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,
- int irow, int icol, long dstartpos);
- void TestLoadJson(const char *pcfilename);
- bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);
- bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);
- bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);
- bool LoadwbByByte(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);
- bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,
- const int m_iInput, const int ihidden, const int m_iOut);
- bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,
- const char *filename, const vector<int> &vecSecondDimOfWeigh);
- void MakeCnnSample(double arr[2][64], double *pdImage, int iImageWidth, int iNumOfImage );
- void MakeCnnWeigh(double *, int iNumOfKernel);
- template <typename T>
- void printArr(T *parr, int num)
- {
- cout << "****printArr****" << endl;
- for (int i = 0; i < num; ++i)
- cout << parr[i] << ' ';
- cout << endl;
- }
- template <typename T>
- void printArrDouble(T **pparr, int row, int col)
- {
- cout << "****printArrDouble****" << endl;
- for (int i = 0; i < row; ++i)
- {
- for (int j = 0; j < col; ++j)
- {
- cout << pparr[i][j] << ' ';
- }
- cout << endl;
- }
- }
- #endif
util.cpp
- #include "util.h"
- #include <iostream>
- #include <ctime>
- #include <cmath>
- #include <cassert>
- #include <fstream>
- #include <cstring>
- #include <stack>
- #include <iomanip>
- using namespace std;
- int getMaxIndex(double *pdarr, int num)
- {
- double dmax = -1;
- int iMax = -1;
- for(int i = 0; i < num; ++i)
- {
- if (pdarr[i] > dmax)
- {
- dmax = pdarr[i];
- iMax = i;
- }
- }
- return iMax;
- }
- double sigmoid(double dx)
- {
- return 1.0/(1.0+exp(-dx));
- }
- double mytanh(double dx)
- {
- double e2x = exp(2 * dx);
- return (e2x - 1) / (e2x + 1);
- }
- double uniform(double _min, double _max)
- {
- return rand()/(RAND_MAX + 1.0) * (_max - _min) + _min;
- }
- void initArr(double *parr, int num)
- {
- for (int i = 0; i < num; ++i)
- parr[i] = 0.0;
- }
- void savewb(const char *szName, double **m_ppdW, double *m_pdBias,
- int irow, int icol)
- {
- FILE *pf;
- if( (pf = fopen(szName, "ab" )) == NULL )
- {
- printf( "File coulkd not be opened " );
- return;
- }
- int isizeofelem = sizeof(double);
- for (int i = 0; i < irow; ++i)
- {
- if (fwrite((const void*)m_ppdW[i], isizeofelem, icol, pf) != icol)
- {
- fputs ("Writing m_ppdW error",stderr);
- return;
- }
- }
- if (fwrite((const void*)m_pdBias, isizeofelem, irow, pf) != irow)
- {
- fputs ("Writing m_ppdW error",stderr);
- return;
- }
- fclose(pf);
- }
- long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,
- int irow, int icol, long dstartpos)
- {
- FILE *pf;
- long dtotalbyte = 0, dreadsize;
- if( (pf = fopen(szName, "rb" )) == NULL )
- {
- printf( "File coulkd not be opened " );
- return -1;
- }
- //seek the file pointer to the correct position
- fseek(pf, dstartpos , SEEK_SET);
- int isizeofelem = sizeof(double);
- for (int i = 0; i < irow; ++i)
- {
- dreadsize = fread((void*)m_ppdW[i], isizeofelem, icol, pf);
- if (dreadsize != icol)
- {
- fputs ("Reading m_ppdW error",stderr);
- return -1;
- }
- //add every successful read to dtotalbyte, returned at the end
- dtotalbyte += dreadsize;
- }
- dreadsize = fread(m_pdBias, isizeofelem, irow, pf);
- if (dreadsize != irow)
- {
- fputs ("Reading m_pdBias error",stderr);
- return -1;
- }
- dtotalbyte += dreadsize;
- dtotalbyte *= isizeofelem;
- fclose(pf);
- return dtotalbyte;
- }
- void Printivec(const vector<int> &ivec)
- {
- for (vector<int>::const_iterator it = ivec.begin(); it != ivec.end(); ++it)
- {
- cout << *it << ' ';
- }
- cout << endl;
- }
- void TestLoadJson(const char *pcfilename)
- {
- vector<double *> vpdw;
- vector<double> vdb;
- vector< vector<double*> > vvAllw;
- vector< vector<double> > vvAllb;
- int m_iInput = 28 * 28, ihidden = 500, m_iOut = 10;
- LoadallwbByByte(vvAllw, vvAllb, pcfilename, m_iInput, ihidden, m_iOut );
- }
- //read vt from mnist, format is [[[], [],..., []],[1, 3, 5,..., 7]]
- bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)
- {
- cout << "loadvtFromJson" << endl;
- const int ciStackSize = 10;
- const int ciFeaturesize = m_iInput;
- int arriStack[ciStackSize], iTop = -1;
- ifstream ifs;
- ifs.open(filename, ios::in);
- assert(ifs.is_open());
- BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;
- ucLeftbrace = '[';
- ucRightbrace = ']';
- ucComma = ',';
- ucSpace = '0'; //note: deliberately '0', not ' ': operator>> skips real whitespace, and this branch swallows the leading zero of values like 0.25
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- //the stack holds only left brackets: 1 means pushed, 0 means popped
- arriStack[++iTop] = 1;
- //the sample list begins
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- arriStack[++iTop] = 1;//iTop is 1
- int iIndex;
- bool isdigit = false;
- double dread, *pdvt = NULL; //pdvt must start as NULL: it is tested below before the first '[' allocates it
- //load vt sample
- while (iTop > 0)
- {
- if (isdigit == false)
- {
- ifs >> ucRead;
- isdigit = true;
- if (ucRead == ucComma)
- {
- //next char is space or leftbrace
- // ifs >> ucRead;
- isdigit = false;
- continue;
- }
- if (ucRead == ucSpace)
- {
- //if pdvt is null, next char is leftbrace;
- //else next char is double value
- if (pdvt == NULL)
- isdigit = false;
- continue;
- }
- if (ucRead == ucLeftbrace)
- {
- pdvt = new double[ciFeaturesize];
- memset(pdvt, 0, ciFeaturesize * sizeof(double));
- //iIndex indexes into the array
- iIndex = 0;
- arriStack[++iTop] = 1;
- continue;
- }
- if (ucRead == ucRightbrace)
- {
- if (pdvt != NULL)
- {
- assert(iIndex == ciFeaturesize);
- vecTrain.push_back(pdvt);
- pdvt = NULL;
- }
- isdigit = false;
- arriStack[iTop--] = 0;
- continue;
- }
- }
- else
- {
- ifs >> dread;
- pdvt[iIndex++] = dread;
- isdigit = false;
- }
- };
- //the next character is the comma separating samples from labels
- ifs >> ucRead;
- assert(ucRead == ucComma);
- cout << vecTrain.size() << endl;
- //read the labels
- WORD usread;
- isdigit = false;
- while (iTop > -1 && ifs.eof() == false)
- {
- if (isdigit == false)
- {
- ifs >> ucRead;
- isdigit = true;
- if (ucRead == ucComma)
- {
- //next char is space or leftbrace
- // ifs >> ucRead;
- // isdigit = false;
- continue;
- }
- if (ucRead == ucSpace)
- {
- //if pdvt is null, next char is leftbrace;
- //else next char is double value
- if (pdvt == NULL)
- isdigit = false;
- continue;
- }
- if (ucRead == ucLeftbrace)
- {
- arriStack[++iTop] = 1;
- continue;
- }
- //the character after this right bracket is another right bracket (the final one)
- if (ucRead == ucRightbrace)
- {
- isdigit = false;
- arriStack[iTop--] = 0;
- continue;
- }
- }
- else
- {
- ifs >> usread;
- vecLabel.push_back(usread);
- isdigit = false;
- }
- };
- assert(vecLabel.size() == vecTrain.size());
- assert(iTop == -1);
- ifs.close();
- return true;
- }
- bool testjsonfloat(const char *filename)
- {
- vector<double> vecTrain;
- cout << "testjsondouble" << endl;
- const int ciStackSize = 10;
- int arriStack[ciStackSize], iTop = -1;
- ifstream ifs;
- ifs.open(filename, ios::in);
- assert(ifs.is_open());
- BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma;
- ucLeftbrace = '[';
- ucRightbrace = ']';
- ucComma = ',';
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- //the stack holds only left brackets: 1 means pushed, 0 means popped
- arriStack[++iTop] = 1;
- //the sample list begins
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- arriStack[++iTop] = 1;//iTop is 1
- double fread;
- bool isdigit = false;
- while (iTop > -1)
- {
- if (isdigit == false)
- {
- ifs >> ucRead;
- isdigit = true;
- if (ucRead == ucComma)
- {
- //next char is space or leftbrace
- // ifs >> ucRead;
- isdigit = false;
- continue;
- }
- if (ucRead == ' ')
- continue;
- if (ucRead == ucLeftbrace)
- {
- arriStack[++iTop] = 1;
- continue;
- }
- if (ucRead == ucRightbrace)
- {
- isdigit = false;
- //the character after this right bracket is another right bracket (the final one)
- arriStack[iTop--] = 0;
- continue;
- }
- }
- else
- {
- ifs >> fread;
- vecTrain.push_back(fread);
- isdigit = false;
- }
- }
- ifs.close();
- return true;
- }
- bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput)
- {
- cout << "loadvtFromJson" << endl;
- const int ciStackSize = 10;
- const int ciFeaturesize = m_iInput;
- int arriStack[ciStackSize], iTop = -1;
- ifstream ifs;
- ifs.open(filename, ios::in);
- assert(ifs.is_open());
- BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;
- ucLeftbrace = '[';
- ucRightbrace = ']';
- ucComma = ',';
- ucSpace = '0';
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- //栈中全部存放左括号,用1代表,0说明清除
- arriStack[++iTop] = 1;
- //样本train开始
- ifs >> ucRead;
- assert(ucRead == ucLeftbrace);
- arriStack[++iTop] = 1;//iTop is 1
- int iIndex;
- bool isdigit = false;
- double dread, *pdvt;
- //load vt sample
- while (iTop > 0)
- {
- if (isdigit == false)
- {
- ifs >> ucRead;
- isdigit = true;
- if (ucRead == ucComma)
- {
- //next char is space or leftbrace
- // ifs >> ucRead;
- isdigit = false;
- continue;
- }
- if (ucRead == ucSpace)
- {
- //if pdvt is null, next char is leftbrace;
- //else next char is double value
- if (pdvt == NULL)
- isdigit = false;
- continue;
- }
- if (ucRead == ucLeftbrace)
- {
- pdvt = new double[ciFeaturesize];
- memset(pdvt, 0, ciFeaturesize * sizeof(double));
- //iIndex数组下标
- iIndex = 0;
- arriStack[++iTop] = 1;
- continue;
- }
- if (ucRead == ucRightbrace)
- {
- if (pdvt != NULL)
- {
- assert(iIndex == ciFeaturesize);
- vecTrain.push_back(pdvt);
- pdvt = NULL;
- }
- isdigit = false;
- arriStack[iTop--] = 0;
- continue;
- }
- }
- else
- {
- ifs >> dread;
- pdvt[iIndex++] = dread;
- isdigit = false;
- }
- };
- //next char is dot
- ifs >> ucRead;
- assert(ucRead == ucComma);
- cout << vecTrain.size() << endl;
- //读取label
- double usread;
- isdigit = false;
- while (iTop > -1 && ifs.eof() == false)
- {
- if (isdigit == false)
- {
- ifs >> ucRead;
- isdigit = true;
- if (ucRead == ucComma)
- {
- //next char is space or leftbrace
- // ifs >> ucRead;
- // isdigit = false;
- continue;
- }
- if (ucRead == ucSpace)
- {
- //if pdvt is null, next char is leftbrace;
- //else next char is double value
- if (pdvt == NULL)
- isdigit = false;
- continue;
- }
- if (ucRead == ucLeftbrace)
- {
- arriStack[++iTop] = 1;
- continue;
- }
- //右括号的下一个字符是右括号(最后一个字符)
- if (ucRead == ucRightbrace)
- {
- isdigit = false;
- arriStack[iTop--] = 0;
- continue;
- }
- }
- else
- {
- ifs >> usread;
- vecLabel.push_back(usread);
- isdigit = false;
- }
- };
- assert(vecLabel.size() == vecTrain.size());
- assert(iTop == -1);
- ifs.close();
- return true;
- }
- // convert the accumulated ASCII characters of one number into a double;
- // returns false if no characters were accumulated
- bool vec2double(vector<BYTE> &vecDigit, double &dvalue)
- {
-     if (vecDigit.empty())
-         return false;
-     int ivecsize = vecDigit.size();
-     const int iMaxlen = 50;
-     char szdigit[iMaxlen];
-     assert(iMaxlen > ivecsize);
-     memset(szdigit, 0, iMaxlen);
-     int i;
-     for (i = 0; i < ivecsize; ++i)
-         szdigit[i] = vecDigit[i];
-     szdigit[i] = '\0';
-     vecDigit.clear();
-     dvalue = atof(szdigit);
-     return true;
- }
- // same as vec2double, but for the integer labels
- bool vec2short(vector<BYTE> &vecDigit, WORD &usvalue)
- {
-     if (vecDigit.empty())
-         return false;
-     int ivecsize = vecDigit.size();
-     const int iMaxlen = 50;
-     char szdigit[iMaxlen];
-     assert(iMaxlen > ivecsize);
-     memset(szdigit, 0, iMaxlen);
-     int i;
-     for (i = 0; i < ivecsize; ++i)
-         szdigit[i] = vecDigit[i];
-     szdigit[i] = '\0';
-     vecDigit.clear();
-     usvalue = atoi(szdigit);
-     return true;
- }
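A quick sanity check of the helper (my own illustrative snippet, not from the original project): the byte-based parser accumulates the raw characters of a number and converts them in one call, which is what keeps signs and leading digits intact:
- // hypothetical check of vec2double: accumulate "-0.5" byte by byte
- vector<BYTE> vecDigit;
- const char *szNum = "-0.5";
- for (int i = 0; szNum[i] != '\0'; ++i)
-     vecDigit.push_back(szNum[i]);
- double dvalue;
- if (vec2double(vecDigit, dvalue))
-     cout << dvalue << endl; // prints -0.5; vecDigit has been cleared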
- // consume one character from ifs and advance the parser state;
- // bFirstlist is true while reading the sample vectors and false
- // while reading the label list
- void readDigitFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<WORD> &vecLabel,
-     vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,
-     const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)
- {
-     BYTE ucRead;
-     WORD usvalue;
-     double dvalue;
-     const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
-     ifs.read((char*)(&ucRead), 1);
-     switch (ucRead)
-     {
-     case ucLeftbrace:
-     {
-         if (bFirstlist)
-         {
-             pdvt = new double[ciFeaturesize];
-             memset(pdvt, 0, ciFeaturesize * sizeof(double));
-             iIndex = 0;
-         }
-         arrStack[++iTop] = 1;
-         break;
-     }
-     case ucComma:
-     {
-         // a comma terminates the current number, if any
-         if (bFirstlist)
-         {
-             if (vec2double(vecDigit, dvalue))
-                 pdvt[iIndex++] = dvalue;
-         }
-         else
-         {
-             if (vec2short(vecDigit, usvalue))
-                 vecLabel.push_back(usvalue);
-         }
-         break;
-     }
-     case ucSpace:
-         break;
-     case ucRightbrace:
-     {
-         // a right bracket also terminates the current number
-         if (bFirstlist)
-         {
-             if (pdvt != NULL)
-             {
-                 if (vec2double(vecDigit, dvalue))
-                     pdvt[iIndex++] = dvalue;
-                 assert(iIndex == ciFeaturesize);
-                 vecTrain.push_back(pdvt);
-                 pdvt = NULL;
-             }
-         }
-         else
-         {
-             if (vec2short(vecDigit, usvalue))
-                 vecLabel.push_back(usvalue);
-         }
-         arrStack[iTop--] = 0;
-         break;
-     }
-     default:
-     {
-         // part of a number: accumulate the character
-         vecDigit.push_back(ucRead);
-         break;
-     }
-     }
- }
- // same state machine as readDigitFromJson, but the second list
- // holds doubles (bias values) instead of integer labels
- void readDoubleFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<double> &vecLabel,
-     vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,
-     const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)
- {
-     BYTE ucRead;
-     double dvalue;
-     const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
-     ifs.read((char*)(&ucRead), 1);
-     switch (ucRead)
-     {
-     case ucLeftbrace:
-     {
-         if (bFirstlist)
-         {
-             pdvt = new double[ciFeaturesize];
-             memset(pdvt, 0, ciFeaturesize * sizeof(double));
-             iIndex = 0;
-         }
-         arrStack[++iTop] = 1;
-         break;
-     }
-     case ucComma:
-     {
-         // a comma terminates the current number, if any
-         if (bFirstlist)
-         {
-             if (vec2double(vecDigit, dvalue))
-                 pdvt[iIndex++] = dvalue;
-         }
-         else
-         {
-             if (vec2double(vecDigit, dvalue))
-                 vecLabel.push_back(dvalue);
-         }
-         break;
-     }
-     case ucSpace:
-         break;
-     case ucRightbrace:
-     {
-         // a right bracket also terminates the current number
-         if (bFirstlist)
-         {
-             if (pdvt != NULL)
-             {
-                 if (vec2double(vecDigit, dvalue))
-                     pdvt[iIndex++] = dvalue;
-                 assert(iIndex == ciFeaturesize);
-                 vecTrain.push_back(pdvt);
-                 pdvt = NULL;
-             }
-         }
-         else
-         {
-             if (vec2double(vecDigit, dvalue))
-                 vecLabel.push_back(dvalue);
-         }
-         arrStack[iTop--] = 0;
-         break;
-     }
-     default:
-     {
-         // part of a number: accumulate the character
-         vecDigit.push_back(ucRead);
-         break;
-     }
-     }
- }
- // load the two [w, b] pairs of a fixed two-layer mlp (m_iOut is kept
- // in the signature for symmetry but is not needed while parsing)
- bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,
-     const int m_iInput, const int ihidden, const int m_iOut)
- {
-     cout << "LoadallwbByByte" << endl;
-     const int ciStackSize = 10;
-     const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
-     int arrStack[ciStackSize], iTop = -1, iIndex = 0;
-     ifstream ifs;
-     ifs.open(filename, ios::in | ios::binary);
-     assert(ifs.is_open());
-     double *pdvt = NULL;
-     BYTE ucRead;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     // the stack holds only left brackets: 1 means pushed, 0 means popped
-     arrStack[++iTop] = 1;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 1
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 2
-     vector<BYTE> vecDigit;
-     vector<double *> vpdw;
-     vector<double> vdb;
-     // first layer: weight rows of length m_iInput
-     while (iTop > 1 && ifs.eof() == false)
-         readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, true);
-     // next char is the comma separating w from b
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucComma);
-     cout << vpdw.size() << endl;
-     // first layer: biases
-     while (iTop > 0 && ifs.eof() == false)
-         readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, false);
-     assert(vpdw.size() == vdb.size());
-     assert(iTop == 0);
-     vvAllw.push_back(vpdw);
-     vvAllb.push_back(vdb);
-     // clear vpdw's and vdb's contents; the pushed copies keep the data
-     vpdw.clear();
-     vdb.clear();
-     // next char is the comma separating the two layers
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucComma);
-     // next char is a space
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucSpace);
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 1
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 2
-     // second layer: weight rows of length ihidden
-     while (iTop > 1 && ifs.eof() == false)
-         readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, true);
-     // next char is the comma separating w from b
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucComma);
-     cout << vpdw.size() << endl;
-     // second layer: biases
-     while (iTop > -1 && ifs.eof() == false)
-         readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, false);
-     assert(vpdw.size() == vdb.size());
-     assert(iTop == -1);
-     vvAllw.push_back(vpdw);
-     vvAllb.push_back(vdb);
-     vpdw.clear();
-     vdb.clear();
-     ifs.close();
-     return true;
- }
- // generalization of LoadallwbByByte: load any number of [w, b] pairs;
- // vecSecondDimOfWeigh[i] is the length of one row of layer i's w
- bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,
-     const char *filename, const vector<int> &vecSecondDimOfWeigh)
- {
-     cout << "LoadWeighFromJson" << endl;
-     const int ciStackSize = 10;
-     const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
-     int arrStack[ciStackSize], iTop = -1, iIndex = 0;
-     ifstream ifs;
-     ifs.open(filename, ios::in | ios::binary);
-     assert(ifs.is_open());
-     double *pdvt = NULL;
-     BYTE ucRead;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     // the stack holds only left brackets: 1 means pushed, 0 means popped
-     arrStack[++iTop] = 1;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 1
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 2
-     int iVecWeighSize = vecSecondDimOfWeigh.size();
-     vector<BYTE> vecDigit;
-     vector<double *> vpdw;
-     vector<double> vdb;
-     // read iVecWeighSize [w, b] pairs
-     for (int i = 0; i < iVecWeighSize; ++i)
-     {
-         int iDimensionOfWeigh = vecSecondDimOfWeigh[i];
-         while (iTop > 1 && ifs.eof() == false)
-             readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimensionOfWeigh, arrStack, iTop, true);
-         // next char is the comma separating w from b
-         ifs.read((char*)(&ucRead), 1);
-         assert(ucRead == ucComma);
-         cout << vpdw.size() << endl;
-         while (iTop > 0 && ifs.eof() == false)
-             readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimensionOfWeigh, arrStack, iTop, false);
-         assert(vpdw.size() == vdb.size());
-         assert(iTop == 0);
-         vvAllw.push_back(vpdw);
-         vvAllb.push_back(vdb);
-         // clear vpdw's and vdb's contents; the pushed copies keep the data
-         vpdw.clear();
-         vdb.clear();
-         // after the last [w, b] pair the next character is the closing ']'
-         if (i >= iVecWeighSize - 1)
-             break;
-         // otherwise: comma, space, then two left brackets open the next pair
-         ifs.read((char*)(&ucRead), 1);
-         assert(ucRead == ucComma);
-         ifs.read((char*)(&ucRead), 1);
-         assert(ucRead == ucSpace);
-         ifs.read((char*)(&ucRead), 1);
-         assert(ucRead == ucLeftbrace);
-         arrStack[++iTop] = 1; // iTop is 1
-         ifs.read((char*)(&ucRead), 1);
-         assert(ucRead == ucLeftbrace);
-         arrStack[++iTop] = 1; // iTop is 2
-     }
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucRightbrace);
-     --iTop;
-     assert(iTop == -1);
-     ifs.close();
-     return true;
- }
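For reference, a minimal calling sketch (my own untested illustration, not from the original project). The second dimensions are derived from the LeNet in cnn_mlp_theano.py below: two 5x5 convolution layers with nkerns=[20, 50], a 500-unit hidden layer and 10 outputs, so the flattened rows of w have lengths 1*5*5 and 20*5*5, then 800 and 500 for the two transposed mlp matrices:
- // hedged usage sketch for LoadWeighFromJson; the dimensions assume the
- // network of cnn_mlp_theano.py (nkerns=[20, 50], 5x5 kernels,
- // 500 hidden units, 10 outputs)
- vector< vector<double*> > vvAllw;
- vector< vector<double> > vvAllb;
- vector<int> vecDim;
- vecDim.push_back(1 * 5 * 5);   // layer0 w: (20, 25)
- vecDim.push_back(20 * 5 * 5);  // layer1 w: (50, 500)
- vecDim.push_back(50 * 4 * 4);  // hidden layer w, transposed: (500, 800)
- vecDim.push_back(500);         // logistic layer w, transposed: (10, 500)
- if (LoadWeighFromJson(vvAllw, vvAllb, "../../tmp/src/data/theanocnn.json", vecDim))
- {
-     // vvAllw[i][j] is row j of layer i's weight matrix;
-     // each row was allocated with new double[], so delete [] it when done
- }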
- // read test samples from the mnist json; format is [[[], [], ..., []], [1, 3, 5, ..., 7]]
- bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)
- {
-     cout << "LoadTestSampleFromJson" << endl;
-     const int ciStackSize = 10;
-     const int ciFeaturesize = m_iInput;
-     const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
-     int arrStack[ciStackSize], iTop = -1, iIndex = 0;
-     ifstream ifs;
-     ifs.open(filename, ios::in | ios::binary);
-     assert(ifs.is_open());
-     double *pdvt = NULL;
-     BYTE ucRead;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     // the stack holds only left brackets: 1 means pushed, 0 means popped
-     arrStack[++iTop] = 1;
-     ifs.read((char*)(&ucRead), 1);
-     assert(ucRead == ucLeftbrace);
-     arrStack[++iTop] = 1; // iTop is 1
-     vector<BYTE> vecDigit;
-     // sample vectors
-     while (iTop > 0 && ifs.eof() == false)
-         readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, true);
-     // next char is the comma separating samples from labels
-     ifs >> ucRead;
-     assert(ucRead == ucComma);
-     cout << vecTrain.size() << endl;
-     // labels
-     while (iTop > -1 && ifs.eof() == false)
-         readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, false);
-     assert(vecLabel.size() == vecTrain.size());
-     assert(iTop == -1);
-     ifs.close();
-     return true;
- }
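A matching sketch for the test samples (again my own illustration): MNIST images are 28*28 = 784 features, the file name follows the mnist2json_small default, and the cleanup loop is needed because every sample row is allocated with new double[]:
- // hypothetical call for MNIST test samples (28 * 28 = 784 features)
- vector<double*> vecTest;
- vector<WORD> vecLabel;
- if (LoadTestSampleFromJson(vecTest, vecLabel, "mnist_small.json", 28 * 28))
- {
-     // ... run the forward pass over vecTest here ...
-     for (size_t i = 0; i < vecTest.size(); ++i)
-         delete [] vecTest[i];
- }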
- // build a one-hot label: all outputs zero except position iMax
- void MakeOneLabel(int iMax, double *pdLabel, int m_iOut)
- {
-     for (int j = 0; j < m_iOut; ++j)
-         pdLabel[j] = 0;
-     pdLabel[iMax] = 1.0;
- }
- // build deterministic test images for debugging the convolution;
- // image k has pixel value 1 + i + j (minus 1 for k > 0)
- void MakeCnnSample(double arrInput[2][64], double *pdImage, int iImageWidth, int iNumOfImage)
- {
-     int iImageSize = iImageWidth * iImageWidth;
-     for (int k = 0; k < iNumOfImage; ++k)
-     {
-         int iStart = k * iImageSize;
-         for (int i = 0; i < iImageWidth; ++i)
-         {
-             for (int j = 0; j < iImageWidth; ++j)
-             {
-                 int iIndex = iStart + i * iImageWidth + j;
-                 pdImage[iIndex] = 1;
-                 pdImage[iIndex] += i + j;
-                 if (k > 0)
-                     pdImage[iIndex] -= 1;
-                 arrInput[k][i * iImageWidth + j] = pdImage[iIndex];
-             }
-         }
-     }
-     cout << "input image is\n";
-     for (int k = 0; k < iNumOfImage; ++k)
-     {
-         cout << "k is " << k << endl;
-         for (int i = 0; i < iImageWidth; ++i)
-         {
-             for (int j = 0; j < iImageWidth; ++j)
-                 cout << arrInput[k][i * iImageWidth + j] << ' ';
-             cout << endl;
-         }
-         cout << endl;
-     }
-     cout << endl;
- }
- // build deterministic test kernels for debugging the convolution:
- // kernel 0 is i + j + 2, later kernels use arrKernel
- void MakeCnnWeigh(double *pdKernel, int iNumOfKernel)
- {
-     const int iKernelWidth = 3;
-     double dSum = 0;
-     double arrKernel[iKernelWidth][iKernelWidth] = {{4, 7, 1},
-                                                     {3, 8, 5},
-                                                     {3, 2, 3}};
-     // alternative test kernel, unused in this build
-     double arr2[iKernelWidth][iKernelWidth] = {{6, 5, 4},
-                                                {5, 4, 3},
-                                                {4, 3, 2}};
-     for (int k = 0; k < iNumOfKernel; ++k)
-     {
-         int iStart = k * iKernelWidth * iKernelWidth;
-         for (int i = 0; i < iKernelWidth; ++i)
-         {
-             for (int j = 0; j < iKernelWidth; ++j)
-             {
-                 int iIndex = i * iKernelWidth + j + iStart;
-                 pdKernel[iIndex] = i + j + 2;
-                 if (k > 0)
-                     pdKernel[iIndex] = arrKernel[i][j];
-                 dSum += pdKernel[iIndex];
-             }
-         }
-     }
-     cout << "sum is " << dSum << endl;
-     for (int k = 0; k < iNumOfKernel; ++k)
-     {
-         cout << "kernel :" << k << endl;
-         int iStart = k * iKernelWidth * iKernelWidth;
-         for (int i = 0; i < iKernelWidth; ++i)
-         {
-             for (int j = 0; j < iKernelWidth; ++j)
-                 cout << pdKernel[i * iKernelWidth + j + iStart] << ' ';
-             cout << endl;
-         }
-         cout << endl;
-     }
-     cout << endl;
- }
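The two helpers above feed a tiny, fully deterministic input through both implementations, which is how I compared my convolution against theano's step by step. A possible driver (my own sketch, with sizes chosen to match the 2x64 array and 3x3 kernels above):
- // hypothetical debug driver for MakeCnnSample / MakeCnnWeigh
- double arrInput[2][64];                   // two 8x8 test images
- double *pdImage = new double[2 * 8 * 8];
- double *pdKernel = new double[2 * 3 * 3]; // two 3x3 test kernels
- MakeCnnSample(arrInput, pdImage, 8, 2);
- MakeCnnWeigh(pdKernel, 2);
- // ... convolve pdImage with pdKernel in both c++ and theano, compare ...
- delete [] pdImage;
- delete [] pdKernel;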
The script below trains for two epochs and writes out the theano weights.
cnn_mlp_theano.py
- #coding=utf-8
- import cPickle
- import gzip
- import os
- import sys
- import time
- import json
- import numpy
- import theano
- import theano.tensor as T
- from theano.tensor.signal import downsample
- from theano.tensor.nnet import conv
- from logistic_sgd import LogisticRegression, load_data
- from mlp import HiddenLayer
- class LeNetConvPoolLayer(object):
-     """Pool Layer of a convolutional network """
-     def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
-         """
-         Allocate a LeNetConvPoolLayer with shared variable internal parameters.
-         :type rng: numpy.random.RandomState
-         :param rng: a random number generator used to initialize weights
-         :type input: theano.tensor.dtensor4
-         :param input: symbolic image tensor, of shape image_shape
-         :type filter_shape: tuple or list of length 4
-         :param filter_shape: (number of filters, num input feature maps,
-                               filter height, filter width)
-         :type image_shape: tuple or list of length 4
-         :param image_shape: (batch size, num input feature maps,
-                              image height, image width)
-         :type poolsize: tuple or list of length 2
-         :param poolsize: the downsampling (pooling) factor (#rows, #cols)
-         """
-         assert image_shape[1] == filter_shape[1]
-         self.input = input
-         # there are "num input feature maps * filter height * filter width"
-         # inputs to each hidden unit
-         fan_in = numpy.prod(filter_shape[1:])
-         # each unit in the lower layer receives a gradient from:
-         # "num output feature maps * filter height * filter width" /
-         # pooling size
-         fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
-                    numpy.prod(poolsize))
-         # initialize weights with random weights
-         W_bound = numpy.sqrt(6. / (fan_in + fan_out))
-         self.W = theano.shared(numpy.asarray(
-             rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
-             dtype=theano.config.floatX),
-             borrow=True)
-         # the bias is a 1D tensor -- one bias per output feature map
-         b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
-         self.b = theano.shared(value=b_values, borrow=True)
-         # convolve input feature maps with filters
-         conv_out = conv.conv2d(input=input, filters=self.W,
-                                filter_shape=filter_shape, image_shape=image_shape)
-         # downsample each feature map individually, using maxpooling
-         pooled_out = downsample.max_pool_2d(input=conv_out,
-                                             ds=poolsize, ignore_border=True)
-         # add the bias term. Since the bias is a vector (1D array), we first
-         # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
-         # thus be broadcasted across mini-batches and feature map
-         # width & height
-         self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
-         # store parameters of this layer
-         self.params = [self.W, self.b]
- def getDataNumpy(layers):
-     data = []
-     for layer in layers:
-         wb = layer.params
-         w, b = wb[0].get_value(), wb[1].get_value()
-         data.append([w, b])
-     return data
- def getDataJson(layers):
-     data = []
-     i = 0
-     for layer in layers:
-         w, b = layer.params
-         # print '..layer is', i
-         w, b = w.get_value(), b.get_value()
-         wshape = w.shape
-         # print '...the shape of w is', wshape
-         if len(wshape) == 2:
-             w = w.transpose()
-         else:
-             for k in xrange(wshape[0]):
-                 for j in xrange(wshape[1]):
-                     w[k][j] = numpy.rot90(w[k][j], 2)
-             w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
-         w = w.tolist()
-         b = b.tolist()
-         data.append([w, b])
-         i += 1
-     return data
- def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
-     print ('writefile is ' + name)
-     f = open(name, "wb")
-     json.dump(data, f)
-     f.close()
- def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
-     # Load the dataset
-     print ('readfile is ' + name)
-     f = open(name, 'rb')
-     data = json.load(f)
-     f.close()
-     readwb(data, layers, nkerns)
- def readwb(data, layers, nkerns):
-     i = 0
-     kernSize = len(nkerns)
-     inputnum = 1
-     for layer in layers:
-         w, b = data[i]
-         w = numpy.array(w, dtype='float32')
-         b = numpy.array(b, dtype='float32')
-         # print '..layer is', i
-         # print w.shape
-         if i >= kernSize:
-             w = w.transpose()
-         else:
-             w = w.reshape((nkerns[i], inputnum, 5, 5))
-             for k in xrange(nkerns[i]):
-                 for j in xrange(inputnum):
-                     c = w[k][j]
-                     w[k][j] = numpy.rot90(c, 2)
-             inputnum = nkerns[i]
-         # print '..readwb, transpose and rot180'
-         # print w.shape
-         layer.W.set_value(w, borrow=True)
-         layer.b.set_value(b, borrow=True)
-         i += 1
- def loadwb(classifier, name='theanocnn.json'):
-     data = json.load(open(name, 'rb'))
-     w, b = data
-     print type(w)
-     w = numpy.array(w, dtype='float32').transpose()
-     classifier.W.set_value(w, borrow=True)
-     classifier.b.set_value(b, borrow=True)
- def savewb(classifier, name='theanocnn.json'):
-     w, b = classifier.params
-     w = w.get_value().transpose().tolist()
-     b = b.get_value().tolist()
-     data = [w, b]
-     json.dump(data, open(name, 'wb'))
- def evaluate_lenet5(learning_rate=0.1, n_epochs=2,
-                     dataset='../../data/mnist.pkl',
-                     nkerns=[20, 50], batch_size=500):
-     """ Demonstrates lenet on MNIST dataset
-     :type learning_rate: float
-     :param learning_rate: learning rate used (factor for the stochastic
-                           gradient)
-     :type n_epochs: int
-     :param n_epochs: maximal number of epochs to run the optimizer
-     :type dataset: string
-     :param dataset: path to the dataset used for training / testing (MNIST here)
-     :type nkerns: list of ints
-     :param nkerns: number of kernels on each layer
-     """
-     rng = numpy.random.RandomState(23455)
-     datasets = load_data(dataset)
-     train_set_x, train_set_y = datasets[0]
-     valid_set_x, valid_set_y = datasets[1]
-     test_set_x, test_set_y = datasets[2]
-     # compute number of minibatches for training, validation and testing
-     n_train_batches = train_set_x.get_value(borrow=True).shape[0]
-     n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
-     n_test_batches = test_set_x.get_value(borrow=True).shape[0]
-     n_train_batches /= batch_size
-     n_valid_batches /= batch_size
-     n_test_batches /= batch_size
-     # allocate symbolic variables for the data
-     index = T.lscalar()  # index to a [mini]batch
-     x = T.matrix('x')    # the data is presented as rasterized images
-     y = T.ivector('y')   # the labels are presented as a 1D vector of [int] labels
-     ishape = (28, 28)    # this is the size of MNIST images
-     ######################
-     # BUILD ACTUAL MODEL #
-     ######################
-     print '... building the model'
-     # Reshape matrix of rasterized images of shape (batch_size, 28*28)
-     # to a 4D tensor, compatible with our LeNetConvPoolLayer
-     layer0_input = x.reshape((batch_size, 1, 28, 28))
-     # Construct the first convolutional pooling layer:
-     # filtering reduces the image size to (28-5+1, 28-5+1) = (24, 24)
-     # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
-     # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
-     layer0 = LeNetConvPoolLayer(rng, input=layer0_input,
-             image_shape=(batch_size, 1, 28, 28),
-             filter_shape=(nkerns[0], 1, 5, 5), poolsize=(2, 2))
-     # Construct the second convolutional pooling layer
-     # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
-     # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
-     # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
-     layer1 = LeNetConvPoolLayer(rng, input=layer0.output,
-             image_shape=(batch_size, nkerns[0], 12, 12),
-             filter_shape=(nkerns[1], nkerns[0], 5, 5), poolsize=(2, 2))
-     # the tanh layer being fully-connected, it operates on 2D matrices of
-     # shape (batch_size, num_pixels) (i.e. matrix of rasterized images).
-     # This will generate a matrix of shape
-     # (batch_size, nkerns[1] * 4 * 4) = (500, 800)
-     layer2_input = layer1.output.flatten(2)
-     # construct a fully-connected tanh layer
-     layer2 = HiddenLayer(rng, input=layer2_input, n_in=nkerns[1] * 4 * 4,
-                          n_out=500, activation=T.tanh)
-     # classify the values of the fully-connected tanh layer
-     layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)
-     # the cost we minimize during training is the NLL of the model
-     cost = layer3.negative_log_likelihood(y)
-     # create a function to compute the mistakes that are made by the model
-     test_model = theano.function([index], layer3.errors(y),
-             givens={
-                 x: test_set_x[index * batch_size: (index + 1) * batch_size],
-                 y: test_set_y[index * batch_size: (index + 1) * batch_size]})
-     validate_model = theano.function([index], layer3.errors(y),
-             givens={
-                 x: valid_set_x[index * batch_size: (index + 1) * batch_size],
-                 y: valid_set_y[index * batch_size: (index + 1) * batch_size]})
-     # create a list of all model parameters to be fit by gradient descent
-     params = layer3.params + layer2.params + layer1.params + layer0.params
-     # create a list of gradients for all model parameters
-     grads = T.grad(cost, params)
-     layers = [layer0, layer1, layer2, layer3]
-     # train_model is a function that updates the model parameters by
-     # SGD. Since this model has many parameters, it would be tedious to
-     # manually create an update rule for each model parameter. We thus
-     # create the updates list by automatically looping over all
-     # (params[i], grads[i]) pairs.
-     updates = []
-     for param_i, grad_i in zip(params, grads):
-         updates.append((param_i, param_i - learning_rate * grad_i))
-     train_model = theano.function([index], cost, updates=updates,
-             givens={
-                 x: train_set_x[index * batch_size: (index + 1) * batch_size],
-                 y: train_set_y[index * batch_size: (index + 1) * batch_size]})
-     ###############
-     # TRAIN MODEL #
-     ###############
-     print '... training'
-     # early-stopping parameters
-     patience = 10000  # look at this many examples regardless
-     patience_increase = 2  # wait this much longer when a new best is found
-     improvement_threshold = 0.995  # a relative improvement of this much is
-                                    # considered significant
-     validation_frequency = min(n_train_batches, patience / 2)
-                                    # go through this many minibatches before
-                                    # checking the network on the validation
-                                    # set; in this case we check every epoch
-     best_params = None
-     best_validation_loss = numpy.inf
-     best_iter = 0
-     test_score = 0.
-     start_time = time.clock()
-     epoch = 0
-     done_looping = False
-     while (epoch < n_epochs) and (not done_looping):
-         epoch = epoch + 1
-         print '...epoch is', epoch, 'writefile'
-         writefile(getDataJson(layers))
-         for minibatch_index in xrange(n_train_batches):
-             iter = (epoch - 1) * n_train_batches + minibatch_index
-             if iter % 100 == 0:
-                 print 'training @ iter = ', iter
-             cost_ij = train_model(minibatch_index)
-             if (iter + 1) % validation_frequency == 0:
-                 # compute zero-one loss on validation set
-                 validation_losses = [validate_model(i) for i
-                                      in xrange(n_valid_batches)]
-                 this_validation_loss = numpy.mean(validation_losses)
-                 print('epoch %i, minibatch %i/%i, validation error %f %%' %
-                       (epoch, minibatch_index + 1, n_train_batches,
-                        this_validation_loss * 100.))
-                 # if we got the best validation score until now
-                 if this_validation_loss < best_validation_loss:
-                     # improve patience if loss improvement is good enough
-                     if this_validation_loss < best_validation_loss * \
-                        improvement_threshold:
-                         patience = max(patience, iter * patience_increase)
-                     # save best validation score and iteration number
-                     best_validation_loss = this_validation_loss
-                     best_iter = iter
-                     # test it on the test set
-                     test_losses = [test_model(i) for i in xrange(n_test_batches)]
-                     test_score = numpy.mean(test_losses)
-                     print((' epoch %i, minibatch %i/%i, test error of best '
-                            'model %f %%') %
-                           (epoch, minibatch_index + 1, n_train_batches,
-                            test_score * 100.))
-             if patience <= iter:
-                 done_looping = True
-                 break
-     end_time = time.clock()
-     print('Optimization complete.')
-     print('Best validation score of %f %% obtained at iteration %i, '
-           'with test performance %f %%' %
-           (best_validation_loss * 100., best_iter + 1, test_score * 100.))
-     print >> sys.stderr, ('The code for file ' +
-                           os.path.split(__file__)[1] +
-                           ' ran for %.2fm' % ((end_time - start_time) / 60.))
-     # reload the saved weights and re-check the validation error
-     readfile(layers, nkerns)
-     validation_losses = [validate_model(i) for i
-                          in xrange(n_valid_batches)]
-     this_validation_loss = numpy.mean(validation_losses)
-     print('validation error %f %%' % (this_validation_loss * 100.))
- if __name__ == '__main__':
-     evaluate_lenet5()
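A final note on the kernel flip: getDataJson above saves the kernels already rotated by 180 degrees (numpy.rot90(w, 2)), so the C++ side can use them directly. If you would rather save theano's raw weights and flip them on the C++ side instead, a 180-degree rotation of a row-major square kernel is just a reversal of the flattened array; a minimal sketch (my own illustration, not code from the project):
- // rotate a square kernel by 180 degrees in place: element (i, j) swaps
- // with (n-1-i, n-1-j), i.e. the flattened array is simply reversed;
- // equivalent to numpy.rot90(w, 2)
- void Rot180(double *pdKernel, int iWidth)
- {
-     int iSize = iWidth * iWidth;
-     for (int i = 0; i < iSize / 2; ++i)
-     {
-         double dTmp = pdKernel[i];
-         pdKernel[i] = pdKernel[iSize - 1 - i];
-         pdKernel[iSize - 1 - i] = dTmp;
-     }
- }
Calling something like Rot180(pdKernel, 5) on each 5x5 kernel would reapply or undo the flip, depending on which convention the saved file uses.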