Experimental environment: VS2013, Python 2.7 (Miniconda2), Windows 10, CUDA 8.0
In hindsight, doing deep learning on Windows feels like a mistake.
I used to build new layers through Caffe's Python interface (Python layers), but Python layers do not run on the GPU, so I had to write the GPU C++/CUDA code myself. Two problems came with that: first, I was not very familiar with the various C++ interfaces; second, I was used to the Python-side input pipeline, e.g. feeding HDF5 data or writing a Python layer that reads data batch by batch.
When using the Python interface with HDF5 data, the first dimension (the number of samples) does not seem to be checked. For example, suppose an HDF5 file stores datasets of the following shapes:
data: 64 * 3 * 224 * 224 (nsample * channel * height * width)
label: 64 * 5 (nsample * label_distribution)
boxes: 128 * 4 (nbound_box * position)
Reading the HDF5 file specified in the .prototxt with the Caffe C++ binary raises an error: bottom[0].shape[0] vs
bottom[2].shape[0] (64 vs 128) do not match.
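Caffe's HDF5 data layer requires every dataset in the file to share the same first (sample) dimension. The check that fires is equivalent to this small sketch (a hypothetical Python reimplementation for illustration, not Caffe code):

```python
def check_batch_axis(shapes):
    """All blobs must share the same first (sample) dimension,
    mirroring the CHECK that Caffe performs on HDF5 datasets."""
    items = list(shapes.items())
    base_name, base_shape = items[0]
    for name, shape in items[1:]:
        if shape[0] != base_shape[0]:
            raise ValueError("%s vs %s (%d vs %d)"
                             % (base_name, name, base_shape[0], shape[0]))
    return True

# The layout from the text: boxes has 128 rows while data/label have 64,
# so loading this file fails even though each dataset is individually valid.
shapes = {"data": (64, 3, 224, 224), "label": (64, 5), "boxes": (128, 4)}
```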
For a quick test of your own layer, Caffe provides the DummyData layer:
layer {
  name: "data"
  type: "DummyData"
  top: "data"
  top: "labels"
  top: "roi"
  dummy_data_param {
    data_filler {
      type: "constant"
      value: 0.01
    }
    shape {
      dim: 16
      dim: 3
      dim: 224
      dim: 224
    }
    data_filler {
      type: "constant"
      value: 0.02
    }
    shape {
      dim: 16
      dim: 1
      dim: 1
      dim: 1
    }
    data_filler {
      type: "constant"
      value: 0.03
    }
    shape {
      dim: 48
      dim: 5
      dim: 1
      dim: 1
    }
  }
}
Caffe offers only a few filler types, which feels sufficient only for simple prediction or smoke tests. To test a new module of your own, you usually need specific data to expose problems, so you still need access to the blobs' memory before the forward pass to fill in customized data. The three fillers above correspond, in order, to the three tops.
An Input layer can declare blobs of different shapes, for example:
layer {
  name: "data"
  type: "Input"
  top: "data"
  top: "labels"
  top: "rois"
  input_param {
    shape {
      dim: 16
      dim: 3
      dim: 224
      dim: 224
    }
    shape {
      dim: 16
      dim: 1
      dim: 1
      dim: 1
    }
    shape {
      dim: 48
      dim: 5
      dim: 1
      dim: 1
    }
  }
}
# The deprecated input definition looks like this:
input: "data"
input_dim: 16
input_dim: 3
input_dim: 224
input_dim: 224
input: "labels"
input_dim: 16
input_dim: 1
input_dim: 1
input_dim: 1
input: "rois"
input_dim: 48
input_dim: 5
input_dim: 1
input_dim: 1
The three blobs above, i.e. data, labels, and rois, can be reshaped at training time and then fed with data.
Feeding customized input data to Caffe C++ to test a custom module
I am used to preparing image data in MATLAB and Python; in C++, the OpenCV library can handle image input.
Below, the test data is packed into a MATLAB .mat file, then read in C++ (VS) and assigned to the blobs of the Input layer.
Create a new VS project and add MATLAB's include and library directories:
Include directories:
D:\MATLAB\R2014a\extern\include
Library files:
D:\MATLAB\R2014a\extern\lib\win64\microsoft\libmx.lib
D:\MATLAB\R2014a\extern\lib\win64\microsoft\libmex.lib
D:\MATLAB\R2014a\extern\lib\win64\microsoft\libmat.lib
D:\MATLAB\R2014a\extern\lib\win64\microsoft\libeng.lib
Add the Release build of the Caffe library to the project (the Debug build kept producing unexpected errors, and fortunately your own code can still be debugged in Release mode), along with its headers:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
E:\caffe-windows2\include
E:\caffe-windows2\scripts\build\include
E:\caffe-windows2\scripts\build\libraries\include
E:\caffe-windows2\scripts\build\libraries\include\boost-1_61
Library files (linker input):
E:\caffe-windows2\scripts\build\lib\Release\caffe.lib
E:\caffe-windows2\scripts\build\lib\Release\proto.lib
E:\caffe-windows2\scripts\build\lib\Release\gtest.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_system-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_thread-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_chrono-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_date_time-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_atomic-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\boost_python-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\glog.lib
E:\caffe-windows2\scripts\build\libraries\lib\gflags.lib
shlwapi.lib
E:\caffe-windows2\scripts\build\libraries\lib\libprotobuf.lib
E:\caffe-windows2\scripts\build\libraries\lib\caffehdf5_hl.lib
E:\caffe-windows2\scripts\build\libraries\lib\caffehdf5.lib
E:\caffe-windows2\scripts\build\libraries\lib\lmdb.lib
ntdll.lib
E:\caffe-windows2\scripts\build\libraries\lib\leveldb.lib
E:\caffe-windows2\scripts\build\libraries\lib\snappy_static.lib
E:\caffe-windows2\scripts\build\libraries\lib\caffezlib.lib
E:\caffe-windows2\scripts\build\libraries\x64\vc12\lib\opencv_highgui310.lib
E:\caffe-windows2\scripts\build\libraries\x64\vc12\lib\opencv_imgcodecs310.lib
E:\caffe-windows2\scripts\build\libraries\x64\vc12\lib\opencv_imgproc310.lib
E:\caffe-windows2\scripts\build\libraries\x64\vc12\lib\opencv_core310.lib
E:\caffe-windows2\scripts\build\libraries\lib\libopenblas.dll.a
D:\python27\libs\python27.lib
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cudart.lib
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\curand.lib
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cublas.lib
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cublas_device.lib
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64\cudnn.lib
E:\caffe-windows2\scripts\build\libraries\lib\libboost_filesystem-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\libboost_system-vc120-mt-1_61.lib
E:\caffe-windows2\scripts\build\libraries\lib\libboost_date_time-vc120-mt-1_61.lib
Some of the libraries above may not actually be needed, but including everything is the safe option.
The first time I wrote a .cu file, calling caffe_set<Dtype> inside it produced an error like this:
0xC0000006: in-page error while writing to location 0x0000000C1C000000
The reason: when the functions in a .cu file execute, the data usually lives in GPU memory, so the GPU counterpart must be used. Caffe wraps it as caffe_gpu_set<Dtype>.
Sometimes you may also hit this error:
0xC0000005: access violation while writing to location 0x000000001D2B1000.
It is caused by a copy where the src and dst buffers have different sizes.
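caffe_copy(n, src, dst) trusts the caller: it copies n elements without knowing how large either buffer really is. A plain-Python sketch of the bookkeeping that prevents this access violation (blob_count mirrors Blob::count(); the helper names are illustrative, not Caffe API):

```python
def blob_count(shape):
    """Product of all dimensions, like Blob::count()."""
    n = 1
    for d in shape:
        n *= d
    return n

def copy_is_safe(n, src_shape, dst_shape):
    """caffe_copy(n, src, dst) is only safe when both buffers
    hold at least n elements; otherwise the copy runs past a buffer."""
    return n <= blob_count(src_shape) and n <= blob_count(dst_shape)
```

Checking the element counts of source and destination blobs before every copy is a cheap way to catch this class of crash early.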
The code for testing the layer is as follows:
#include<iostream>
#include<string>
#include<vector>
#include<opencv2/opencv.hpp>
#include<caffe/caffe.hpp>
#include"caffe_reg.h"
#include"sin_layer.hpp"
#include<memory>
#include<mat.h>
using namespace std;
using namespace cv;
using namespace caffe;
void print_shape(vector<int> & shape)
{
for (int i = 0; i < shape.size(); i++)
cout << shape[i] << " ";
cout << endl;
}
void read_data_from_mat(string filename, float * * feat_ptr, float * * head_ptr, float * *mult_ptr,
int & feat_num, int & head_num, int & mult_num)
{
MATFile *data_pmat = NULL;
mxArray * feat_var = NULL;
mxArray * head_var = NULL;
mxArray * mult_var = NULL;
const char **var_names; // variable names in the .mat file: testFeatures, testHeadposes, testMultiboxes
int nvars;
data_pmat = matOpen(filename.c_str(), "r");
if (data_pmat == NULL)
{
cout << "cannot open .mat file !" << endl;
exit(-1);
}
// read the variables in the .mat file
var_names = (const char **)matGetDir(data_pmat, &nvars);
if (var_names == NULL)
{
cout << "The .mat file contains 0 variables, please check!" << endl;
exit(-1);
}
// output the number of variables in the .mat file
cout << "The number of variables in the .mat file is: " << nvars << endl;
feat_var = matGetVariable(data_pmat, var_names[0]); // nsample * c * h * w
head_var = matGetVariable(data_pmat, var_names[1]); //nsample * 1 * 1 * 1
mult_var = matGetVariable(data_pmat, var_names[2]); // (nsample * 3) * 5
for (int i = 0; i < nvars; i++)
cout << var_names[i] << " ";
cout << endl;
int feat_dim = mxGetNumberOfDimensions(feat_var);
int head_dim = mxGetNumberOfDimensions(head_var);
int mult_dim = mxGetNumberOfDimensions(mult_var);
cout << "dim1: " << feat_dim << "; dim2: " << head_dim << "; dim3: " << mult_dim << endl;
const size_t * feat_dims = mxGetDimensions(feat_var);
const size_t * head_dims = mxGetDimensions(head_var);
const size_t * mult_dims = mxGetDimensions(mult_var);
// output
feat_num = feat_dims[0] * feat_dims[1] * feat_dims[2] * feat_dims[3];
head_num = head_dims[0] * head_dims[1];
mult_num = mult_dims[0] * mult_dims[1];
// note: the three variables in the .mat file are stored as single-precision float
*feat_ptr = (float*)mxGetData(feat_var);
*head_ptr = (float*)mxGetData(head_var);
*mult_ptr = (float*)mxGetData(mult_var);
// the mxArrays are intentionally not destroyed here: the returned pointers
// reference their data, which must remain valid for the caller
}
//int main_test_sampleadd_train()
int main(void)
{
// for train
Caffe::set_mode(Caffe::GPU);
bool status_GPU = true;
SolverParameter solver_param;
string solver_file = "ZOO_VGG16/solver_sampleadd.prototxt";
//string train_proto = "ZOO_VGG16/train_sinlayer.prototxt";
ReadSolverParamsFromTextFileOrDie(solver_file, &solver_param);
//boost::shared_ptr<Solver<float> > solver(
// SolverRegistry<float>::CreateSolver(solver_param));
Solver<float> * solver = SolverRegistry<float>::CreateSolver(solver_param);
boost::shared_ptr<Net<float> > net = solver->net(); //train net
// read testFeatures, testHeadposes, testMultiboxes from the .mat file
float * feat_ptr = NULL;
float * head_ptr = NULL;
float * mult_ptr = NULL;
int feat_num; // the number of elements in blob
int head_num;
int mult_num;
string filename = "data/debug_data6.mat";
read_data_from_mat(filename, &feat_ptr, &head_ptr, &mult_ptr, feat_num, head_num, mult_num);
// copy the data in mat to blobs
Blob<float> * feat_blob = net->input_blobs()[0];
Blob<float> * head_blob = net->input_blobs()[1];
Blob<float> * mult_blob = net->input_blobs()[2];
int nsample = 6;
int nbox = 3;
feat_blob->Reshape(nsample, 3, 224, 224);
head_blob->Reshape(nsample, 1, 1, 1);
mult_blob->Reshape(nsample * nbox, 5, 1, 1);
net->Reshape();
float * feat_blob_ptr = NULL;
float * head_blob_ptr = NULL;
float * mult_blob_ptr = NULL;
if (status_GPU)
{
feat_blob_ptr = feat_blob->mutable_gpu_data();
head_blob_ptr = head_blob->mutable_gpu_data();
mult_blob_ptr = mult_blob->mutable_gpu_data();
}
else
{
feat_blob_ptr = feat_blob->mutable_cpu_data();
head_blob_ptr = head_blob->mutable_cpu_data();
mult_blob_ptr = mult_blob->mutable_cpu_data();
}
cout << "check the data in memory.... " << endl;
int mini_batch = 10;
// the following data lives in CPU memory, i.e. the pointers (feat_ptr, head_ptr, ...) point to host memory
for (int i = 0; i < mini_batch; i++)
cout << feat_ptr[i] << " ";
cout << endl;
for (int i = 0; i < mini_batch; i++)
cout << head_ptr[i] << " ";
cout << endl;
caffe_copy(feat_num, feat_ptr, feat_blob_ptr);
caffe_copy(head_num, head_ptr, head_blob_ptr);
caffe_copy(mult_num, mult_ptr, mult_blob_ptr);
// feat_blob_ptr points to GPU memory
cout << "after copy: " << endl;
if (status_GPU)
{
// to display the results we must read from CPU memory
const float *temp_feat = feat_blob->cpu_data();
const float *temp_head = head_blob->cpu_data();
for (int i = 0; i < mini_batch; i++)
cout << temp_feat[i] << " ";
cout << endl;
for (int i = 0; i < mini_batch; i++)
cout << temp_head[i] << " ";
cout << endl;
}
else
{
for (int i = 0; i < mini_batch; i++)
cout << feat_blob_ptr[i] << " ";
cout << endl;
for (int i = 0; i < mini_batch; i++)
cout << head_blob_ptr[i] << " ";
cout << endl;
}
//solver->Step(1);
cout << "forward... " << endl;
net->Forward(); // forward once
//check forward data
//const float *sample = net->blob_by_name("sampleadd5")->gpu_data();
//const float * roi = net->blob_by_name("roi_pool5")->gpu_data();
const float *sample = NULL;
const float *roi = NULL;
if (status_GPU)
{
sample = net->blob_by_name("sampleadd5")->gpu_data();
roi = net->blob_by_name("roi_pool5")->gpu_data();
}
else
{
sample = net->blob_by_name("sampleadd5")->cpu_data();
roi = net->blob_by_name("roi_pool5")->cpu_data();
}
vector<int> sample_shape = net->blob_by_name("sampleadd5")->shape();
vector<int> roi_shape = net->blob_by_name("roi_pool5")->shape();
cout << " The shape of input: " << endl;
print_shape(roi_shape);
cout << "The shape of output: " << endl;
print_shape(sample_shape);
cout << "start check the partial results for forward...." << endl;
int volume = sample_shape[1] * sample_shape[2] * sample_shape[3];
int sample_size = sample_shape[0] * volume;
int roi_size = roi_shape[0] * roi_shape[1] * roi_shape[2] * roi_shape[3];
int index = 0;
cudaError_t cudaStatus;
float * temp_sample = NULL;
float * temp_roi = NULL;
if (status_GPU)
{
temp_sample = new float[sample_size];
temp_roi = new float[roi_size];
// copy gpu data to cpu
cudaStatus = cudaMemcpy(temp_sample, sample, sample_size * sizeof(float), cudaMemcpyDeviceToHost);
cudaStatus = cudaMemcpy(temp_roi, roi, roi_size * sizeof(float), cudaMemcpyDeviceToHost);
sample = temp_sample;
roi = temp_roi;
}
int out_size = 3;
cout << "val1: " << endl;
for (int i = 0; i < out_size; i++)
cout << sample[i] << " ";
cout << endl;
float *res = new float[out_size];
cout << "val2: " << endl;
for (int i = 0; i < out_size; i++)
{
res[i] = 0; // initial its values to zeros
for (int j = 0; j < nbox; j++)
{
res[i] += roi[i + j*volume];
}
cout << res[i] << " ";
}
cout << endl;
delete[] res;
if (status_GPU)
{
delete[] temp_sample;
delete[] temp_roi;
}
cout << "backward..." << endl;
net->Backward();// backward once
cout << "start check the partial result for backward.... " << endl;
const float * top_diff = NULL;
const float * bottom_diff = NULL;
if (status_GPU)
{
bottom_diff = net->blob_by_name("roi_pool5")->gpu_diff();
top_diff = net->blob_by_name("sampleadd5")->gpu_diff();
// check the data in cpu == gpu?
const float * deb_bottom = net->blob_by_name("roi_pool5")->cpu_diff();
const float * deb_top = net->blob_by_name("sampleadd5")->cpu_diff();
// print the first 10 values on the CPU side
cout << "The data in cpu: " << endl;
for (int i = 0; i < 10; i++)
cout << deb_bottom[i] << " ";
cout << endl;
float * temp_top = new float[sample_size];
float * temp_bottom = new float[roi_size];
// copy gpu data to cpu
cudaStatus = cudaMemcpy(temp_top, top_diff, sample_size * sizeof(float), cudaMemcpyDeviceToHost);
cudaStatus = cudaMemcpy(temp_bottom, bottom_diff, roi_size * sizeof(float), cudaMemcpyDeviceToHost);
// print the first 10 values copied back from the GPU
cout << "The data in gpu: " << endl;
for (int i = 0; i < 10; i++)
cout << temp_bottom[i] << " ";
cout << endl;
bottom_diff = temp_bottom;
top_diff = temp_top;
}
else
{
bottom_diff = net->blob_by_name("roi_pool5")->cpu_diff();
top_diff = net->blob_by_name("sampleadd5")->cpu_diff();
}
cout << "the values below should be the same:" << endl;
cout << "diff1: " << endl;
for (int i = 0; i < out_size; i++)
cout << top_diff[i] << " ";
cout << endl;
cout << "diff2: " << endl;
for (int i = 0; i < out_size; i++)
cout << bottom_diff[i + 0 * volume] << " " << bottom_diff[i + 1 * volume] << " " << bottom_diff[i + 2 * volume] << endl;
return 0;
}
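One caveat with read_data_from_mat above that is easy to miss: MATLAB stores arrays column-major, while Caffe blobs are row-major, so the raw pointer from mxGetData only lines up with the blob layout if the array was permuted (or saved already flattened in the right order) on the MATLAB side. The two layouts place the same multi-index at different flat offsets, as this small sketch shows:

```python
def c_order_offset(idx, shape):
    """Flat offset in row-major (C) order -- how a Caffe blob is laid out."""
    off = 0
    for i, s in zip(idx, shape):
        off = off * s + i
    return off

def f_order_offset(idx, shape):
    """Flat offset in column-major (Fortran) order -- how MATLAB stores arrays."""
    off, stride = 0, 1
    for i, s in zip(idx, shape):
        off += i * stride
        stride *= s
    return off

# Element (1, 0, 0) of a 2x3x2 array sits at offset 6 in C order
# but offset 1 in MATLAB's column-major order.
```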
If you run into this error:
math_functions.cpp:91] Check failed: error == cudaSuccess (77 vs. 0)
try a smaller batch_size. Opinions online differ on what causes it; error 77 is cudaErrorIllegalAddress, i.e. a kernel touched an illegal address, so it is not necessarily out-of-memory. It may well be a bug in your own .cu file, e.g. an out-of-bounds index, that only surfaces at certain batch sizes. That was quite possible in my case.
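For what it's worth, when reasoning about batch size it helps to have a rough estimate of blob memory; the sketch below covers the data blob alone (single-precision floats; real usage is far higher once every intermediate feature map is counted, so this is only a lower bound):

```python
def blob_megabytes(shape, dtype_size=4):
    """Approximate memory of one float blob of the given shape, in MB."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_size / (1024.0 * 1024.0)

# The 16 x 3 x 224 x 224 data blob from the prototxt above is ~9.2 MB.
```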
When debugging GPU code, the usual breakpoint-and-inspect workflow does not show the values inside device memory. Lacking better tooling knowledge, I fell back on the crude approach of copying the GPU data back to the CPU, i.e.:
cudaError_t cudaStatus;
cudaStatus = cudaMemcpy(out_result, out_p, 1 * sizeof(float), cudaMemcpyDeviceToHost);
cudaStatus = cudaMemcpy(in_result, in_p, 1 * sizeof(float), cudaMemcpyDeviceToHost);
Other notes
If a run complains about an unknown (self-defined) layer type, you may need to add a header,
e.g. caffe_new_head.hpp:
#include <caffe/common.hpp>
#include <caffe/fast_rcnn_layers.hpp>
#include "sample_add_layer.hpp"
#include "my_softmax_layer.hpp"
namespace caffe
{
  extern INSTANTIATE_CLASS(ROIPoolingLayer);
  extern INSTANTIATE_CLASS(SampleAddLayer);
  extern INSTANTIATE_CLASS(MySoftmaxWithLossLayer);
}
When writing the .cpp, add the class instantiation and layer registration macros even for functions that are not fully implemented yet. The layer then passes Caffe's initialization checks, and you can debug the functions by feeding data through the net:
INSTANTIATE_CLASS(MySoftmaxWithLossLayer);
REGISTER_LAYER_CLASS(MySoftmaxWithLoss);