PyTorch to ncnn and to Caffe model conversion (an image caption implementation)
Converting the PyTorch model
Method 1: convert directly with Python code
Convert by tracing the back-propagation computation flow:
The PyTorch version used to train the model must be below 1.0; with PyTorch >= 1.0 this approach fails because next_functions is always NoneType. For conversion code, see [this link](https://github.com/starimeL/PytorchConverter) or the code.
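For illustration, a minimal sketch (my own, not taken from the PytorchConverter repo) of what walking the autograd graph via `grad_fn.next_functions` looks like, on a PyTorch version where this traversal works:

```python
import torch
import torch.nn as nn

# Walk the autograd graph backwards from the output through
# grad_fn.next_functions, collecting backward-op names. Converters built on
# this traversal map each backward op back to a forward layer.
def walk_graph(output):
    seen = set()
    ops = []
    def visit(fn):
        if fn is None or fn in seen:
            return
        seen.add(fn)
        ops.append(type(fn).__name__)  # e.g. AddmmBackward0, ReluBackward0
        for next_fn, _ in getattr(fn, "next_functions", []):
            visit(next_fn)
    visit(output.grad_fn)
    return ops

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
y = model(torch.randn(1, 4))
print(walk_graph(y))  # backward op names, one per traced op/parameter
```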
Write conversion code based on the model's forward computation graph:
See this [code](https://github.com/kouxichao/pytorch2ncnn), which extracts the relevant layer information from the PyTorch computation graph to build the ncnn or Caffe graph.
Method 2: use ONNX (Open Neural Network Exchange)
TODO: still to be tested.
Implementing the ncnn encoder/decoder (image caption)
Basic idea: build the ncnn computation flow dynamically. ncnn's support for the layers used in image captioning is incomplete, with many errors and conflicts that need fixing; in particular it does not support operations like broadcasting. Repeated computation should also be kept to a minimum.
- Save computation: the Python version contains considerable redundancy, mainly from the beam-size trick used at inference time to improve accuracy. The beam size determines how many output sequences inference produces (the highest-scoring sequence is chosen as the final result), and this leads to unnecessary repeated computation. For example, the attention module's encoding of the encoder output only needs to be computed once, and at decoder step 1 only a single output needs to be computed, no matter how large the beam size is.
- Implement the missing ops: e.g. LSTMCell, adaptive pooling, operations along a specific dimension, etc.
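As a sketch of the first saving above (illustrative NumPy; the names and shapes are not from the actual model): the attention-side projection of the encoder output depends only on the image, so it can be hoisted out of the beam loop and computed once.

```python
import numpy as np

# Attention-side linear encoding of the image features; depends only on the
# encoder output, not on the decoder step or the beam.
def project(encoder_out, W):
    return encoder_out @ W

rng = np.random.default_rng(0)
encoder_out = rng.standard_normal((196, 512))  # e.g. 14x14 spatial features
W = rng.standard_normal((512, 512))

enc_att = project(encoder_out, W)  # computed once, before decoding

beam_size, steps = 3, 20
# Inside the decode loop, every beam at every step reuses enc_att instead of
# recomputing project(encoder_out, W) up to beam_size * steps times.
```

The same reasoning gives the step-1 saving: all beams start from the same `<start>` token, so the first decoder step has only one distinct input.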
Implementing the missing ncnn layers
LSTMCell layer
The number of caption words is not fixed; generation stops when the LSTMCell output is `<end>`. This requires an LSTMCell layer that performs a single cell step, so it can be run in a loop with the stopping time decided by the output.
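Before the C++ layer itself, the stop-on-`<end>` loop that a single-step cell enables can be sketched in plain Python (all names here are illustrative, not from the actual model):

```python
END = 0  # assumed token id for <end>

# Run one LSTMCell step at a time; stop as soon as the cell emits <end>
# or the caption reaches max_len.
def greedy_decode(step_fn, h, c, start_token, max_len=20):
    tokens, tok = [], start_token
    for _ in range(max_len):
        tok, h, c = step_fn(tok, h, c)  # one cell step + word argmax
        if tok == END:
            break
        tokens.append(tok)
    return tokens

# Toy step_fn that counts down to <end>, standing in for the real
# LSTMCell forward + output projection.
def step_fn(tok, h, c):
    return tok - 1, h, c

print(greedy_decode(step_fn, None, None, 4))  # [3, 2, 1]
```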
int LSTMCell::forward(const std::vector<Mat>& bottom_blobs, std::vector<Mat>& top_blobs, const Option& opt) const
{
    const Mat& input_blob = bottom_blobs[0];
    const Mat& hidden_blob = bottom_blobs[1];
    const Mat& cell_blob = bottom_blobs[2];
    size_t elemsize = bottom_blobs[0].elemsize;

    /**
     * index_indicator handles the case of several input samples
     * sharing a single h and a single c, which usually happens
     * when the lstm starts; this trick saves input size.
     */
    int index_indicator = hidden_blob.h > 1 ? 1 : 0;

    Mat& top_blob_h = top_blobs[0];
    Mat& top_blob_c = top_blobs[1];
    top_blob_h.create(hidden_size, input_blob.h, elemsize, opt.blob_allocator);
    top_blob_c.create(hidden_size, input_blob.h, elemsize, opt.blob_allocator);
    if (top_blob_h.empty() || top_blob_c.empty())
        return -100;

    // gate weights for the input (weight order I, F, G, O, same as pytorch's)
    const float* ih_I = (const float*)weight_ih_data;
    const float* ih_F = (const float*)weight_ih_data + input_size * hidden_size;
    const float* ih_G = (const float*)weight_ih_data + input_size * hidden_size * 2;
    const float* ih_O = (const float*)weight_ih_data + input_size * hidden_size * 3;

    // gate weights for the previous hidden state (same I, F, G, O order)
    const float* hh_I = (const float*)weight_hh_data;
    const float* hh_F = (const float*)weight_hh_data + hidden_size * hidden_size;
    const float* hh_G = (const float*)weight_hh_data + hidden_size * hidden_size * 2;
    const float* hh_O = (const float*)weight_hh_data + hidden_size * hidden_size * 3;

    /** lstm unit
     * i_t = sigmoid(I), f_t = sigmoid(F), o_t = sigmoid(O), g_t = tanh(G)
     * c_t := f_t .* c_{t-1} + i_t .* g_t
     * h_t := o_t .* tanh(c_t)
     */
    // fprintf(stderr, "num_threads:%d\n", opt.num_threads);
    #pragma omp parallel for num_threads(4)
    for (int out_ele = 0; out_ele < hidden_size; out_ele++)
    {
        const float* weight_ih_data_I = ih_I + input_size * out_ele;
        const float* weight_hh_data_I = hh_I + hidden_size * out_ele;
        const float* weight_ih_data_F = ih_F + input_size * out_ele;
        const float* weight_hh_data_F = hh_F + hidden_size * out_ele;
        const float* weight_ih_data_O = ih_O + input_size * out_ele;
        const float* weight_hh_data_O = hh_O + hidden_size * out_ele;
        const float* weight_ih_data_G = ih_G + input_size * out_ele;
        const float* weight_hh_data_G = hh_G + hidden_size * out_ele;
        if(input_blob.h != hidden_blob.h && hidden