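My requirements were:
Train a convolutional neural network (CNN) with Theano and save the final, converged weights. Then implement the CNN forward pass in C++, load the Theano weights, and reproduce Theano's test results.
What I ended up with:
1. The forward pass of a convolutional neural network
2. The forward and backward passes of an MLP, so it can also be used for training
Note:
To reproduce Theano's test results, the hidden-layer activation function must be tanh;
otherwise, to train the MLP, choose sigmoid.
Results:
The figure below shows Theano's training and test output; the validation error rate is 9.23%.
Below is the output of my C++ program; its validation error rate is also 9.23%, which reproduces Theano's result exactly.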
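In short, there were two difficulties:
1. How can Theano's weights and test samples be exchanged with C++?
2. When Theano convolves, how are the previous layer's feature maps combined and mapped onto each pixel of the current layer?
I took many detours while solving these two problems.
To reproduce Theano's test results in C++, the C++ code must be able to read the weights and test samples saved by Theano.
My reasoning went as follows:
1. Theano's weights are NumPy arrays. Reading them directly from C++ is hard: the NumPy format is awkward to parse and there is little material online.
2. Use Python as an intermediate converter to meet the requirement in 1. Reading the Theano code later, I found that the training samples loaded into Python do not need to be converted to NumPy arrays; plain Python lists are enough. But a file dumped by cPickle carries a lot of extra formatting and is not suitable for exchange with C++.
3. Convert through JSON. Python and C++ both have JSON libraries, so both sides can convert to JSON and exchange data that way. However, the weights Theano produces after training are NumPy arrays, and they must become Python lists before json can write them to a file. The question then was how to turn a NumPy array into a Python list.
4. To solve 3, I searched for a day and finally found the tolist method of NumPy arrays, which converts a NumPy array into a Python list.
5. Now both Python and C++ can use JSON. I studied the jsoncpp library in order to read the JSON file written by Python. Testing showed that jsoncpp is not suitable for large files: it easily runs out of memory and is very slow, so I dropped it.
6. Write C++ functions to parse the JSON file myself. In addition, when generating training and test samples from pot files, generate them directly in C++ with no conversion to NumPy arrays.
With this analysis, difficulty 1 was solved:
JSON is used to exchange weights and test samples between C++ and Theano, and the JSON files are parsed by hand-written functions.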
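As a rough illustration of the exchange format, the sketch below writes a list of [w, b] pairs to JSON text and reads it back. Only the `[[w, b], ...]` layout is taken from the code later in this post; the layer sizes and values are made up, and plain nested lists stand in for what `tolist()` would return (Python 3 syntax).

```python
import json

# Hypothetical tiny network: one "conv" layer with 2 kernels of 1*2*2 weights,
# then one fully connected layer. All values are made up for illustration.
layers = [
    [[[0.1, -0.2, 0.3, 0.4], [0.5, 0.6, -0.7, 0.8]], [0.0, 0.1]],
    [[[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]],
]

text = json.dumps(layers)     # what would be written to the .json file
restored = json.loads(text)   # what a reader on the other side parses

for w, b in restored:
    print(len(w), "rows x", len(w[0]), "cols, bias length", len(b))
```

Because every layer is just a pair of nested number lists, a hand-written C++ parser only needs to handle brackets, commas and numbers.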
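For difficulty 2, consider a typical CNN diagram.
Difficulty 2 in detail:
- When Theano goes from S2 to C3, how does it choose which S2 feature maps to combine? Is the selection fixed each time, or combined dynamically by some algorithm?
- In the pooling step from C3 to S4 with poolsize (2*2), how are every 4 pixels of C3 turned into one pixel of S4?
After a lot of analysis and comparison, I reached these conclusions:
- From S2 to C3, Theano combines all of the S2 feature maps.
- In the pooling step from C3 to S4 with poolsize (2*2), the maximum of every 4 pixels of C3 becomes one pixel of S4.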
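The two conclusions above can be sketched in a few lines. This is a minimal pure-Python illustration, not Theano's implementation: `correlate_all_maps` and `max_pool` are hypothetical helper names, the inputs are made up, and the kernels are applied as plain sliding-window products, assuming any kernel rotation has already been applied to the weights.

```python
def correlate_all_maps(inputs, kernels):
    """Valid sliding-window sum over ALL input maps, for one output map."""
    n = len(inputs[0])      # each input map is n x n
    k = len(kernels[0])     # each kernel is k x k
    out_w = n - k + 1
    out = [[0.0] * out_w for _ in range(out_w)]
    for fmap, kern in zip(inputs, kernels):   # every input map contributes
        for r in range(out_w):
            for c in range(out_w):
                for i in range(k):
                    for j in range(k):
                        out[r][c] += fmap[r + i][c + j] * kern[i][j]
    return out

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_w = len(fmap) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_w)]

# Two made-up 5x5 input maps and one 2x2 kernel per input map.
map_a = [[1] * 5 for _ in range(5)]
map_b = [[r] * 5 for r in range(5)]
conv = correlate_all_maps([map_a, map_b], [[[1, 1], [1, 1]], [[1, 0], [0, 0]]])
pooled = max_pool(conv)
print(pooled)
```

Here every pixel of the 4x4 convolution output is the sum of contributions from both input maps, and pooling reduces it to 2x2.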
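With this analysis the theory was more or less clear. What remained was writing the code, and the truly time-consuming part was debugging: I kept failing to reproduce Theano's test results.
More than once I thought it simply could not be reproduced; who knows what Theano does internally.
Today the code finally works, and I was excited enough to write this post.
Two bugs stood in my way. One came from a gap in my understanding of the details of Theano's convolution; the other was my own carelessness, a wrongly initialized variable:
- When convolving from S2 to C3, Theano rotates the kernel by 180 degrees before convolving as shown in the figure below (I am new to this and really did not know...)
- When taking the maximum pixel from C3 to S4, I assumed all pixels were positive and initialized the variable to 0, so the maximum came out wrong (this bug took the longest to find, a painful lesson...)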
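Both pitfalls are easy to show on toy data. The sketch below is only an illustration with made-up values: `rot180`, `max_wrong` and `max_right` are hypothetical helpers, not functions from the original program.

```python
def rot180(kernel):
    """Rotate a 2D list by 180 degrees: reverse the rows, then each row."""
    return [row[::-1] for row in kernel[::-1]]

def max_wrong(values):
    best = 0.0              # bug: silently assumes some value is positive
    for v in values:
        if v > best:
            best = v
    return best

def max_right(values):
    best = values[0]        # start from an actual element of the window
    for v in values[1:]:
        if v > best:
            best = v
    return best

kernel = [[1, 2], [3, 4]]
print(rot180(kernel))

# A pooling window whose values are all negative.
window = [-0.9, -0.2, -0.5, -0.7]
print(max_wrong(window), max_right(window))
```

With an all-negative window the zero-initialized version returns 0.0, a value that is not in the window at all, while the second version returns the true maximum.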
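The Theano-side functions that write the weights. Note that the kernels are saved after being rotated by 180 degrees, and 2-D weights are transposed (to match the weight layout used on the C++ side).
import json
import numpy

def getDataJson(layers):
    data = []
    i = 0
    for layer in layers:
        w, b = layer.params
        # print '..layer is', i
        w, b = w.get_value(), b.get_value()
        wshape = w.shape
        # print '...the shape of w is', wshape
        if len(wshape) == 2:
            # fully connected layer: swap rows and columns
            w = w.transpose()
        else:
            # convolution layer: rotate every kernel by 180 degrees;
            # copy so the source is not overwritten while it is being read
            for k in xrange(wshape[0]):
                for j in xrange(wshape[1]):
                    w[k][j] = numpy.rot90(w[k][j], 2).copy()
            # flatten to (number of kernels, inputs * height * width)
            w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
        w = w.tolist()
        b = b.tolist()
        data.append([w, b])
        i += 1
    return data

def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
    print ('writefile is ' + name)
    f = open(name, "wb")
    json.dump(data,f)
    f.close()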
Reading the weights back into Theano:
def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
    # Load the saved weights
    print ('readfile is ' + name)
    f = open(name, 'rb')
    data = json.load(f)
    f.close()
    readwb(data, layers, nkerns)

def readwb(data, layers, nkerns):
    i = 0
    kernSize = len(nkerns)
    inputnum = 1
    for layer in layers:
        w, b = data[i]
        w = numpy.array(w, dtype='float32')
        b = numpy.array(b, dtype='float32')
        # print '..layer is', i
        # print w.shape
        if i >= kernSize:
            # fully connected layer: undo the transpose done when saving
            w = w.transpose()
        else:
            # convolution layer: restore the 4-D shape (5x5 kernels), then rotate back
            w = w.reshape((nkerns[i], inputnum, 5, 5))
            for k in xrange(nkerns[i]):
                for j in xrange(inputnum):
                    c = w[k][j]
                    w[k][j] = numpy.rot90(c, 2).copy()
            inputnum = nkerns[i]
        # print '..readwb ,transpose and rot180'
        # print w.shape
        layer.W.set_value(w, borrow=True)
        layer.b.set_value(b, borrow=True)
        i += 1
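For anyone writing the C++ reader, it may help to see where a given kernel element ends up in the flattened row. The sketch below redoes the export step on nested lists, assuming the row-major flattening that NumPy's reshape uses by default; `export_conv_weights` and `flat_index` are hypothetical names and the weights are made up so that each value encodes its own position.

```python
def rot180(kernel):
    """Rotate a 2D list by 180 degrees."""
    return [row[::-1] for row in kernel[::-1]]

def export_conv_weights(w):
    """w[k][j] is a 2D kernel; returns one flat row per output map k."""
    rows = []
    for per_output in w:
        flat = []
        for kernel in per_output:          # input maps in order
            for row in rot180(kernel):     # rotated kernel, row by row
                flat.extend(row)
        rows.append(flat)
    return rows

def flat_index(j, r, c, kw):
    """Column of rotated element (r, c) of input map j, kernel width kw."""
    return (j * kw + r) * kw + c

# Made-up weights with shape (2 output maps, 3 input maps, 2, 2).
w = [[[[1000 * k + 100 * j + 10 * r + c for c in range(2)]
       for r in range(2)] for j in range(3)] for k in range(2)]
rows = export_conv_weights(w)
print(len(rows), len(rows[0]))
print(rows[1][flat_index(2, 0, 1, 2)], w[1][2][1][0])
```

The last line prints the same value twice: rotated position (0, 1) of a 2x2 kernel holds the original element (1, 0).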
The test samples are generated from the MNIST handwritten-digit dataset. The core code:
import cPickle
import json

def mnist2json_small(cnnName = 'mnist_small.json', validNumber = 10):
    dataset = '../../data/mnist.pkl'
    print '... loading data', dataset
    # Load the dataset
    f = open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    #print test_set
    f.close()

    def np2listSmall(train_set, number):
        trains, labels = train_set
        trainfile = []
        # Comment out the next line to export only `number` validation samples
        number = len(labels)
        for one in trains[:number]:
            one = one.tolist()
            trainfile.append(one)
        labelfile = labels[:number].tolist()
        datafile = [trainfile, labelfile]
        return datafile

    smallData = valid_set
    print len(smallData)
    valid, validlabel = np2listSmall(smallData, validNumber)
    datafile = [valid, validlabel]
    basedir = '../../tmp/src/data/'
    # basedir = './'
    with open(basedir + cnnName, 'wb') as out:
        json.dump(datafile, out)
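The exported file is therefore a two-element list, `[samples, labels]`. As a small sketch of how such a file can be consumed, the snippet below parses a made-up stand-in and computes an error rate as mismatches divided by the number of labels. `error_rate` and `toy_predict` are hypothetical; the toy classifier is unrelated to the CNN and exists only to make the example runnable.

```python
import json

def error_rate(samples, labels, predict):
    """Fraction of samples whose predicted label differs from the true label."""
    errors = sum(1 for x, y in zip(samples, labels) if predict(x) != y)
    return errors / len(labels)

# Made-up stand-in for the exported file: [samples, labels].
text = json.dumps([[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.9, 0.1]],
                   [1, 0, 1, 1]])
samples, labels = json.loads(text)

def toy_predict(x):
    # Hypothetical classifier: index of the largest input value.
    return x.index(max(x))

print(error_rate(samples, labels, toy_predict))
```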
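What I learned:
- When facing a hard task, break it down step by step and solve the pieces one at a time
- If one approach is a dead end, look for another right away, just like working on a proof in a math class
- Stay positive, don't give up easily, and do your best to finish the task
- When debugging, first build a reasonably complete set of test cases so that bugs can be located quickly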
#include <iostream>
#include "mlp.h"
#include "util.h"
#include "testinherit.h"
#include "neuralNetwork.h"
using namespace std;
/************************************************************************/
/* This program implements
   1. the forward pass of a convolutional neural network
   2. the forward and backward passes of an MLP, so it can be used for training
   Note:
   To reproduce Theano's test results, the hidden-layer activation must be tanh;
   otherwise, to train the MLP, choose sigmoid.
*/
/************************************************************************/
int main()
{
    cout << "****cnn****" << endl;
    TestCnnTheano(28 * 28, 10);
    // TestMlpMnist tests the MLP with training samples
    //TestMlpMnist(28 * 28, 500, 10);
    return 0;
}
neuralNetwork.h
#ifndef NEURALNETWORK_H
#define NEURALNETWORK_H
#include "mlp.h"
#include "cnn.h"
#include <vector>
using std::vector;
/************************************************************************/
/* A convolutional neural network                                       */
/************************************************************************/
class NeuralNetWork
{
public:
    NeuralNetWork(int iInput, int iOut);
    ~NeuralNetWork();
    void Predict(double** in_data, int n);
    double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);
    void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);
    void SetTrainNum(int iNum);
    int Predict(double *pInputData);
    // void Forward_propagation(double** ppdata, int n);
    double* Forward_propagation(double *);
private:
    int m_iSampleNum; // number of samples
    int m_iInput;     // input dimension
    int m_iOut;       // output dimension
    vector<CnnLayer *> vecCnns;
    Mlp *m_pMlp;
};
void TestCnnTheano(const int iInput, const int iOut);
#endif
neuralNetwork.cpp
#include "neuralNetwork.h"
#include <iostream>
#include "util.h"
#include <iomanip>
using namespace std;
NeuralNetWork::NeuralNetWork(int iInput, int iOut):m_iSampleNum(0), m_iInput(iInput), m_iOut(iOut), m_pMlp(NULL)
{
    int iFeatureMapNumber = 20, iPoolWidth = 2, iInputImageWidth = 28, iKernelWidth = 5, iInputImageNumber = 1;
    CnnLayer *pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
    vecCnns.push_back(pCnnLayer);
    iInputImageNumber = 20;
    iInputImageWidth = 12;
    iFeatureMapNumber = 50;
    pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
    vecCnns.push_back(pCnnLayer);
    const int ihiddenSize = 1;
    int phidden[ihiddenSize] = {500};
    // construct the MLP (hidden layer plus logistic regression output)
    m_pMlp = new Mlp(m_iSampleNum, iFeatureMapNumber * 4 * 4, m_iOut, ihiddenSize, phidden);
}
NeuralNetWork::~NeuralNetWork()
{
    for (vector<CnnLayer*>::iterator it = vecCnns.begin(); it != vecCnns.end(); ++it)
    {
        delete *it;
    }
    delete m_pMlp;
}
void NeuralNetWork::SetTrainNum(int iNum)
{
    m_iSampleNum = iNum;
    for (size_t i = 0; i < vecCnns.size(); ++i)
    {
        vecCnns[i]->SetTrainNum(iNum);
    }
    m_pMlp->SetTrainNum(iNum);
}
int NeuralNetWork::Predict(double *pdInputdata)
{
    double *pdPredictData = NULL;
    pdPredictData = Forward_propagation(pdInputdata);
    int iResult = -1;
    iResult = m_pMlp->m_pLogisticLayer->Predict(pdPredictData);
    return iResult;
}
double* NeuralNetWork::Forward_propagation(double *pdInputData)
{
    double *pdPredictData = pdInputData;
    vector<CnnLayer*>::iterator it;
    CnnLayer *pCnnLayer;
    for (it = vecCnns.begin(); it != vecCnns.end(); ++it)
    {
        pCnnLayer = *it;
        pCnnLayer->Forward_propagation(pdPredictData);
        pdPredictData = pCnnLayer->GetOutputData();
    }
    // pCnnLayer now points to the last convolution layer and pdPredictData is its output;
    // feed that output through the MLP
    pdPredictData = m_pMlp->Forward_propagation(pdPredictData);
    return pdPredictData;
}
void NeuralNetWork::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)
{
    for (size_t i = 0; i < vecCnns.size(); ++i)
    {
        vecCnns[i]->Setwb(vvAllw[i], vvAllb[i]);
    }
    size_t iLayerNum = vvAllw.size();
    // declared outside the loop so that it advances from one hidden layer to the next
    int iHiddenIndex = 0;
    for (size_t i = vecCnns.size(); i < iLayerNum - 1; ++i)
    {
        m_pMlp->m_ppHiddenLayer[iHiddenIndex]->Setwb(vvAllw[i], vvAllb[i]);
        ++iHiddenIndex;
    }
    m_pMlp->m_pLogisticLayer->Setwb(vvAllw[iLayerNum - 1], vvAllb[iLayerNum - 1]);
}
double NeuralNetWork::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
{
    cout << "Predict------------" << endl;
    int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
    //iValidNumber = 1;
    for (int i = 0; i < iValidNumber; ++i)
    {
        int iResult = Predict(vecvalid[i]);
        //cout << i << ",valid is " << iResult << " label is " << vecValidlabel[i] << endl;
        if (iResult != vecValidlabel[i])
        {
            ++iErrorNumber;
        }
    }
    cout << "the num of error is " << iErrorNumber << endl;
    double dErrorRate = (double)iErrorNumber / iValidNumber;
    cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
    return dErrorRate;
}
/************************************************************************/
/*
Test samples come from MNIST. The structure of this CNN matches the Theano tutorial:
the input is a 28*28 image, followed by 2 convolution layers (convolution + pooling) with 20 and 50 feature maps,
then a fully connected layer (500 neurons), and finally an output layer with 10 neurons.
*/
/************************************************************************/
void TestCnnTheano(const int iInput, const int iOut)
{
    // build the convolutional neural network
    NeuralNetWork neural(iInput, iOut);
    // holds the Theano weights
    vector< vector<double*> > vvAllw;
    vector< vector<double> > vvAllb;
    // holds the test samples and labels
    vector<double*> vecValid;
    vector<WORD> vecLabel;
    // files containing the Theano weights and the test samples
    const char *szTheanoWeigh = "../../data/theanocnn.json", *szTheanoTest = "../../data/mnist_validall.json";
    // second dimension (column count) of each layer's weights, needed to read the JSON file
    vector<int> vecSecondDimOfWeigh;
    vecSecondDimOfWeigh.push_back(5 * 5);
    vecSecondDimOfWeigh.push_back(20 * 5 * 5);
    vecSecondDimOfWeigh.push_back(50 * 4 * 4);
    vecSecondDimOfWeigh.push_back(500);
    cout << "loadwb ---------" << endl;
    // load the weights
    LoadWeighFromJson(vvAllw, vvAllb, szTheanoWeigh, vecSecondDimOfWeigh);
    // set the weights in the CNN
    neural.Setwb(vvAllw, vvAllb);
    // load the test samples
    LoadTestSampleFromJson(vecValid, vecLabel, szTheanoTest, iInput);
    // set the total number of test samples
    int iVaildNum = vecValid.size();
    neural.SetTrainNum(iVaildNum);
    // run the forward pass and print the CNN's error rate
    neural.CalErrorRate(vecValid, vecLabel);
    // free the memory allocated for the test samples
    for (vector<double*>::iterator cit = vecValid.begin(); cit != vecValid.end(); ++cit)
    {
        delete [](*cit);
    }
}
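The sizes hard-coded above (input width 12 for the second convolution layer, the 4 * 4 factor, and the four values pushed into vecSecondDimOfWeigh) follow from simple arithmetic on the 28*28 input with 5*5 kernels and 2*2 pooling. A small check, written as a sketch with a hypothetical helper `conv_pool_width`:

```python
def conv_pool_width(input_width, kernel_width=5, pool_width=2):
    """Width after a valid convolution followed by non-overlapping pooling."""
    return (input_width - kernel_width + 1) // pool_width

w1 = conv_pool_width(28)    # width fed to the second convolution layer
w2 = conv_pool_width(w1)    # width fed to the fully connected layer

# Column counts of the four weight matrices, in layer order.
second_dims = [1 * 5 * 5, 20 * 5 * 5, 50 * w2 * w2, 500]
print(w1, w2, second_dims)
```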
cnn.h
#ifndef CNN_H
#define CNN_H
#include "featuremap.h"
#include "poollayer.h"
#include <vector>
using std::vector;
typedef unsigned short WORD;
/**
 * This convolution layer mimics Theano's test-time computation.
 * When the input consists of num feature maps, this layer has featureNum feature maps.
 * Each pixel of this layer combines all num feature maps of the previous layer, with no bias.
 * The result is passed to the pooling layer, which takes the maximum over each poolsize window
 * and then adds the bias; there are featureNum bias values in total.
 */
class CnnLayer
{
public:
    CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
        int iKernelWidth, int iPoolWidth);
    ~CnnLayer();
    void Forward_propagation(double *pdInputData);
    void Back_propagation(double* , double* , double );
    void Train(double *x, WORD y, double dLr);
    int Predict(double *);
    void Setwb(vector<double*> &vpdw, vector<double> &vdb);
    void SetInputAllData(double **ppInputAllData, int iInputNum);
    void SetTrainNum(int iSampleNumber);
    void PrintOutputData();
    double* GetOutputData();
private:
    int m_iSampleNum;
    FeatureMap *m_pFeatureMap;
    PoolLayer *m_pPoolLayer;
    // values needed for back-propagation
    double **m_ppdDelta;
    double *m_pdInputData;
    double *m_pdOutputData;
};
void TestCnn();
#endif // CNN_H
cnn.cpp
#include "cnn.h"
#include "util.h"
#include <cassert>
CnnLayer::CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
    int iKernelWidth, int iPoolWidth):
    m_iSampleNum(iSampleNum), m_ppdDelta(NULL), m_pdInputData(NULL), m_pdOutputData(NULL)
{
    m_pFeatureMap = new FeatureMap(iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth);
    int iFeatureMapWidth = iInputImageWidth - iKernelWidth + 1;
    m_pPoolLayer = new PoolLayer(iFeatureMapNumber, iPoolWidth, iFeatureMapWidth);
}
CnnLayer::~CnnLayer()
{
    delete m_pFeatureMap;
    delete m_pPoolLayer;
}
void CnnLayer::Forward_propagation(double *pdInputData)
{
    m_pFeatureMap->Convolute(pdInputData);
    m_pPoolLayer->Convolute(m_pFeatureMap->GetFeatureMapValue());
    m_pdOutputData = m_pPoolLayer->GetOutputData();
    /************************************************************************/
    /* Prints the result of each convolution stage for debugging;           */
    /* remove once everything works                                         */
    /************************************************************************/
    /*m_pFeatureMap->PrintOutputData();
    m_pPoolLayer->PrintOutputData();*/
}
void CnnLayer::SetInputAllData(double **ppInputAllData, int iInputNum)
{
}
double* CnnLayer::GetOutputData()
{
    assert(NULL != m_pdOutputData);
    return m_pdOutputData;
}
void CnnLayer::Setwb(vector<double*> &vpdw, vector<double> &vdb)
{
    m_pFeatureMap->SetWeigh(vpdw);
    m_pPoolLayer->SetBias(vdb);
}
void CnnLayer::SetTrainNum( int iSampleNumber )
{
    m_iSampleNum = iSampleNumber;
}
void CnnLayer::PrintOutputData()
{
    m_pFeatureMap->PrintOutputData();
    m_pPoolLayer->PrintOutputData();
}
void TestCnn()
{
    const int iFeatureMapNumber = 2, iPoolWidth = 2, iInputImageWidth = 8, iKernelWidth = 3, iInputImageNumber = 2;
    double *pdImage = new double[iInputImageWidth * iInputImageWidth * iInputImageNumber];
    double arrInput[iInputImageNumber][iInputImageWidth * iInputImageWidth];
    MakeCnnSample(arrInput, pdImage, iInputImageWidth, iInputImageNumber);
    double *pdKernel = new double[3 * 3 * iInputImageNumber];
    double arrKernel[3 * 3 * iInputImageNumber];
    MakeCnnWeigh(pdKernel, iInputImageNumber) ;
    CnnLayer cnn(3, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
    vector <double*> vecWeigh;
    vector <double> vecBias;
    for (int i = 0; i < iFeatureMapNumber; ++i)
    {
        vecBias.push_back(1.0);
    }
    vecWeigh.push_back(pdKernel);
    for (int i = 0; i < 3 * 3 * 2; ++i)
    {
        arrKernel[i] = i;
    }
    vecWeigh.push_back(arrKernel);
    cnn.Setwb(vecWeigh, vecBias);
    cnn.Forward_propagation(pdImage);
    cnn.PrintOutputData();
    delete []pdKernel;
    delete []pdImage;
}
featuremap.h
#ifndef FEATUREMAP_H
#define FEATUREMAP_H
#include <cassert>
#include <vector>
using std::vector;
class FeatureMap
{
public:
FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKer