PyTorch to ncnn and to Caffe model conversion (an image caption implementation)
Converting the PyTorch model
Method 1: convert directly with Python code
Convert by tracing the back-propagation computation flow:
The PyTorch version used to train the model must be below 1.0; with PyTorch >= 1.0 this approach fails because next_functions is always NoneType. For conversion code, see [this link](https://github.com/starimeL/PytorchConverter) or the code.
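For illustration, a minimal sketch (my own, not taken from the PytorchConverter repo) of what walking the autograd graph via `grad_fn.next_functions` looks like, on a PyTorch version where this traversal works:

```python
import torch
import torch.nn as nn

# Walk the autograd graph backwards from the output through
# grad_fn.next_functions, collecting backward-op names. Converters built on
# this traversal map each backward op back to a forward layer.
def walk_graph(output):
    seen = set()
    ops = []
    def visit(fn):
        if fn is None or fn in seen:
            return
        seen.add(fn)
        ops.append(type(fn).__name__)  # e.g. AddmmBackward0, ReluBackward0
        for next_fn, _ in getattr(fn, "next_functions", []):
            visit(next_fn)
    visit(output.grad_fn)
    return ops

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
y = model(torch.randn(1, 4))
print(walk_graph(y))  # backward op names, one per traced op/parameter
```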
Write conversion code based on the model's forward computation graph:
See this [code](https://github.com/kouxichao/pytorch2ncnn), which extracts the relevant layer information from the PyTorch computation graph to build the ncnn or Caffe graph.
Method 2: use ONNX (Open Neural Network Exchange)
TODO: still to be tested.
Implementing the ncnn encoder/decoder (image caption)
Basic idea: build the ncnn computation flow dynamically. ncnn's support for the layers used in image captioning is incomplete, with many errors and conflicts that need fixing; in particular it does not support operations like broadcasting. Repeated computation should also be kept to a minimum.
- Save computation: the Python version contains considerable redundancy, mainly from the beam-size trick used at inference time to improve accuracy. The beam size determines how many output sequences inference produces (the highest-scoring sequence is chosen as the final result), and this leads to unnecessary repeated computation. For example, the attention module's encoding of the encoder output only needs to be computed once, and at decoder step 1 only a single output needs to be computed, no matter how large the beam size is.
- Implement the missing ops: e.g. LSTMCell, adaptive pooling, operations along a specific dimension, etc.
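As a sketch of the first saving above (illustrative NumPy; the names and shapes are not from the actual model): the attention-side projection of the encoder output depends only on the image, so it can be hoisted out of the beam loop and computed once.

```python
import numpy as np

# Attention-side linear encoding of the image features; depends only on the
# encoder output, not on the decoder step or the beam.
def project(encoder_out, W):
    return encoder_out @ W

rng = np.random.default_rng(0)
encoder_out = rng.standard_normal((196, 512))  # e.g. 14x14 spatial features
W = rng.standard_normal((512, 512))

enc_att = project(encoder_out, W)  # computed once, before decoding

beam_size, steps = 3, 20
# Inside the decode loop, every beam at every step reuses enc_att instead of
# recomputing project(encoder_out, W) up to beam_size * steps times.
```

The same reasoning gives the step-1 saving: all beams start from the same `<start>` token, so the first decoder step has only one distinct input.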
Implementing the missing ncnn layers
LSTMCell layer
The number of caption words is not fixed; generation stops when the LSTMCell output is `<end>`. This requires an LSTMCell layer that performs a single cell step, so it can be run in a loop with the stopping time decided by the output.
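Before the C++ layer itself, the stop-on-`<end>` loop that a single-step cell enables can be sketched in plain Python (all names here are illustrative, not from the actual model):

```python
END = 0  # assumed token id for <end>

# Run one LSTMCell step at a time; stop as soon as the cell emits <end>
# or the caption reaches max_len.
def greedy_decode(step_fn, h, c, start_token, max_len=20):
    tokens, tok = [], start_token
    for _ in range(max_len):
        tok, h, c = step_fn(tok, h, c)  # one cell step + word argmax
        if tok == END:
            break
        tokens.append(tok)
    return tokens

# Toy step_fn that counts down to <end>, standing in for the real
# LSTMCell forward + output projection.
def step_fn(tok, h, c):
    return tok - 1, h, c

print(greedy_decode(step_fn, None, None, 4))  # [3, 2, 1]
```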
int LSTMCell::forward(const std::vector<Mat>& bottom_blobs, std::vector<Mat>& top_blobs, const Option& opt) const
{
    const Mat& input_blob = bottom_blobs[0];
    const Mat& hidden_blob = bottom_blobs[1];
    const Mat& cell_blob = bottom_blobs[2];
    size_t elemsize = bottom_blobs[0].elemsize;

    /**
     * index_indicator handles the case of several input samples
     * sharing a single h and a single c, which usually happens
     * when the lstm starts; this trick saves input size.
     */
    int index_indicator = hidden_blob.h > 1 ? 1 : 0;

    Mat& top_blob_h = top_blobs[0];
    Mat& top_blob_c = top_blobs[1];
    top_blob_h.create(hidden_size, input_blob.h, elemsize, opt.blob_allocator);
    top_blob_c.create(hidden_size, input_blob.h, elemsize, opt.blob_allocator);
    if (top_blob_h.empty() || top_blob_c.empty())
        return -100;

    // gate weights for the input (weight order I, F, G, O, same as pytorch's)
    const float* ih_I = (const float*)weight_ih_data;
    const float* ih_F = (const float*)weight_ih_data + input_size * hidden_size;
    const float* ih_G = (const float*)weight_ih_data + input_size * hidden_size * 2;
    const float* ih_O = (const float*)weight_ih_data + input_size * hidden_size * 3;

    // gate weights for the previous hidden state (same I, F, G, O order)
    const float* hh_I = (const float*)weight_hh_data;
    const float* hh_F = (const float*)weight_hh_data + hidden_size * hidden_size;
    const float* hh_G = (const float*)weight_hh_data + hidden_size * hidden_size * 2;
    const float* hh_O = (const float*)weight_hh_data + hidden_size * hidden_size * 3;

    /** lstm unit
     * i_t = sigmoid(I), f_t = sigmoid(F), o_t = sigmoid(O), g_t = tanh(G)
     * c_t := f_t .* c_{t-1} + i_t .* g_t
     * h_t := o_t .* tanh(c_t)
     */
    // fprintf(stderr, "num_threads:%d\n", opt.num_threads);
    #pragma omp parallel for num_threads(4)
    for (int out_ele = 0; out_ele < hidden_size; out_ele++)
    {
        const float* weight_ih_data_I = ih_I + input_size * out_ele;
        const float* weight_hh_data_I = hh_I + hidden_size * out_ele;
        const float* weight_ih_data_F = ih_F + input_size * out_ele;
        const float* weight_hh_data_F = hh_F + hidden_size * out_ele;
        const float* weight_ih_data_O = ih_O + input_size * out_ele;
        const float* weight_hh_data_O = hh_O + hidden_size * out_ele;
        const float* weight_ih_data_G = ih_G + input_size * out_ele;
        const float* weight_hh_data_G = hh_G + hidden_size * out_ele;
        if(input_blob.h != hidden_blob.h && hidden