Converting a CRNN+BiLSTM+CTC model to ncnn and calling it successfully

whatsuo

Last modified: 2024-01-23 10:48:27


Categories: Deep Learning, OCR. Tags: c++, ncnn

First published: 2021-08-12 17:44:13

Article link: https://blog.csdn.net/whatsuo/article/details/119647979


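OCR training code: crnn+bilstm+ctc
Tencent ncnn toolkit
Reference: nihui's article "详细记录超轻量中文OCR LSTM模型ncnn实现" (a detailed record of implementing an ultra-lightweight Chinese OCR LSTM model in ncnn)
Step 1
Train the CRNN model to get best.pth.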

Step 2

import onnx
import onnxsim
import torch

import models.crnn as net  # model definition from the crnn training code
import params              # training config: alphabet, imgH, nc, nh

# export input size; must match the size used in training (params.imgH)
img_height = 32
img_width = 100
img_channel = 1


def export(model_path, model_name):
    nclass = len(params.alphabet) + 1  # +1 for the CTC blank
    model = net.CRNN(params.imgH, params.nc, nclass, params.nh)
    model.load_state_dict(torch.load(
        model_path, map_location=torch.device('cpu')))

    model.eval()
    img = torch.zeros(1, img_channel, img_height, img_width)
    with torch.no_grad():
        preds = model(img)
    print(preds.shape)  # [T, 1, nclass]

    torch.onnx.export(model, img, model_name, verbose=False, input_names=['data'],
                      output_names=['pred_fc'])


def convert(model_name):
    # the exported graph has a fixed 1x1x32x100 input, so no input shapes need to be passed
    model_opt, check_ok = onnxsim.simplify(onnx.load(model_name))
    assert check_ok, "the simplified model failed onnxsim's check"

    onnx.save(model_opt, model_name[:-5] + "-sim.onnx")
    print("the model was simplified successfully")


if __name__ == "__main__":
    model_path = 'best.pth'
    model_name = 'netCRNN.onnx'

    export(model_path, model_name)
    convert(model_name)

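Run the script above (export_onnx.py) to export the model; this produces netCRNN.onnx and netCRNN-sim.onnx.
Step 3:
Convert the model with ncnn's onnx2ncnn tool to get netCRNN.bin and netCRNN-sim.param.
Problem 1:
log_softmax is not supported
This happens because ncnn does not support the log_softmax operator, so the obvious idea is to replace it with something else.
Attempt 1:
Replace log_softmax with softmax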

    def forward(self, input):
        # conv features
        conv = self.cnn(input)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

        # rnn features
        output = self.rnn(conv)
        # add log_softmax to converge output
        # output = F.log_softmax(output, dim=2)  # replaced with:
        output = F.softmax(output, dim=2)
        return output

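This converts successfully, but training breaks: the loss stays negative and validation accuracy never gets above 50%, so the converted model is useless. The likely reason is that the CTC loss (like PyTorch's nn.CTCLoss) expects log-probabilities; softmax outputs are non-negative, so every path score exp(sum of inputs) is at least 1 and the negative log-likelihood drops below zero.
log_softmax is just log(softmax(x)), a variant of softmax with no learnable parameters.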
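To see why feeding softmax output into a CTC loss pushes the loss below zero, here is a minimal pure-Python CTC forward (alpha) pass. It is an illustrative sketch, not the loss used by the training code: like PyTorch's nn.CTCLoss it assumes the inputs are per-time-step log-probabilities and that class 0 is the blank. The function names and the toy numbers are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC, via the forward recursion.

    log_probs: T rows of per-class scores, assumed to be log-probabilities.
    target: label ids without blanks.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]                  # blank-interleaved target: _ a _ b _
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for row in log_probs[1:]:
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])     # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])     # skip a blank between two different labels
            new[s] = logsumexp(cands) + row[ext[s]]
        alpha = new
    return -logsumexp(alpha[-2:] if S > 1 else alpha)


# toy output: 3 time steps, classes [blank, a, b], each row sums to 1
probs = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
log_probs = [[math.log(p) for p in row] for row in probs]

print(ctc_loss(log_probs, [1, 2]))  # -ln(0.33), about 1.109: log-prob input, loss > 0
print(ctc_loss(probs, [1, 2]))      # softmax output fed in directly: loss < 0
```

The five alignments of "ab" over three steps (aab, abb, a_b, _ab, ab_) have total probability 0.33, which is what the first call recovers; with raw probabilities every alignment scores above 1, so the loss is negative no matter how good the prediction is.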
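Attempt 2:
Modify the ncnn source, mainly /ncnn/tools/onnx/onnx2ncnn.cpp, so that it handles log_softmax:
add a LogSoftmax branch right after the Softmax one. It does nothing else, but that is enough for the conversion to succeed.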

        else if (op == "Softmax")
        {
            fprintf(pp, "%-16s", "Softmax");
        }
        else if (op == "LogSoftmax")
        {
            fprintf(pp, "%-16s", "LogSoftmax");
        }

This is the lazy way. The more reliable way is to add a proper LogSoftmax layer (declaration and implementation) under /ncnn/src/layer and have onnx2ncnn emit it correctly.
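For reference, this is the computation such a LogSoftmax layer would have to perform: log_softmax along the class axis of every time step. The Python sketch below is hypothetical and is not ncnn's Layer API; it assumes the output blob is a 2-D ncnn::Mat with w = class_size and h = time steps, stored row by row in one flat buffer.

```python
import math


def log_softmax_rows(data, w, h):
    """In-place log_softmax over each row of a flat h x w buffer (row y = data[y*w:(y+1)*w])."""
    for y in range(h):
        row = data[y * w:(y + 1) * w]
        m = max(row)                                           # subtract the max so exp() cannot overflow
        lse = m + math.log(sum(math.exp(v - m) for v in row))  # log(sum(exp(row)))
        for x in range(w):
            data[y * w + x] -= lse                             # log_softmax(v) = v - logsumexp(row)
    return data


out = log_softmax_rows([1000.0, 1001.0, 0.0, 0.0], w=2, h=2)
print(out)  # second row is [-ln 2, -ln 2]; the large logits in row 0 do not overflow
```

Subtracting the row maximum before exponentiating is the usual trick for keeping this stable; a C++ layer would follow the same per-row loop.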
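Attempt 3:
There is an even simpler way: in crnn.py, comment out the log_softmax call and return output directly. Only the exported graph should lose log_softmax; training still needs it in front of the CTC loss.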

    def forward(self, input):
        # conv features
        conv = self.cnn(input)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

        # rnn features
        output = self.rnn(conv)
        # add log_softmax to converge output
        # output = F.log_softmax(output, dim=2)
        return output

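Re-export to ONNX with this change and the model no longer contains log_softmax; then run

./onnx2ncnn crnn-pytorch/netCRNN-sim.onnx

to get the corresponding netCRNN.bin and netCRNN-sim.param (given only the input model, onnx2ncnn writes ncnn.param and ncnn.bin, the names loaded by the C++ code below; pass output paths as the second and third arguments to choose other names).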
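Dropping log_softmax from the exported graph is safe for greedy (best-path) decoding: log_softmax only subtracts one constant, logsumexp(row), from every score of a time step, so the per-step argmax does not change. A small check with made-up logits (the helper names are hypothetical):

```python
import math


def argmax_per_step(scores):
    """Index of the highest score in every time step (row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


logits = [[2.0, -1.0, 0.5], [-3.0, 4.0, 1.0], [0.1, 0.2, 3.0]]  # made-up raw network output
print(argmax_per_step(logits))                              # [0, 1, 2]
print(argmax_per_step([log_softmax(r) for r in logits]))    # [0, 1, 2]
```

Beam search, or anything that needs calibrated probabilities, would still have to apply softmax or log_softmax to the raw output after inference.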

Step 4:

Problem: the ncnn model's output does not match PyTorch's.
The PyTorch model outputs 26 × batch_size × class_size,
while the ncnn model outputs only 1 × 1 × class_size.
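The 26 time steps follow from the layer parameters in the .param listing below: for a 100 × 32 input, only the four MaxPool layers and the final 2 × 2 Conv_16 change the spatial size (the 3 × 3 convolutions use padding 1 and stride 1). A quick Python check of that arithmetic, using the floor formula out = (in + 2·pad - kernel) // stride + 1 with kernel, stride and padding values read from that listing; the helper name is hypothetical.

```python
def out_size(size, kernel, stride, pad):
    """Output length along one axis of a conv/pool window (floor mode, symmetric padding)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride_w, stride_h, pad_w, pad_h), taken from the Pooling/Convolution params
layers = [
    ("MaxPool_2", 2, 2, 2, 0, 0),
    ("MaxPool_5", 2, 2, 2, 0, 0),
    ("MaxPool_10", 2, 1, 2, 1, 0),
    ("MaxPool_15", 2, 1, 2, 1, 0),
    ("Conv_16", 2, 1, 1, 0, 0),
]

w, h = 100, 32
for name, k, sw, sh, pw, ph in layers:
    w = out_size(w, k, sw, pw)
    h = out_size(h, k, sh, ph)
    print(f"{name:<11} w={w:<3} h={h}")
# final: w=26 (time steps), h=1 (what `assert h == 1` in crnn.py requires)
```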
Comparing the output shapes layer by layer: below is the forward code of the BiLSTM block in crnn, with the shape at each step in PyTorch and in ncnn.


    def forward(self, input):
        recurrent, _ = self.rnn(input)
        # recurrent: PyTorch [26, batch_size, 512]    ncnn [1, 26, 512]
        T, b, h = recurrent.size()
        t_rec = recurrent.view(T * b, h)
        # t_rec:     PyTorch [26 * batch_size, 512]   ncnn [1, 1, 13312]
        output = self.embedding(t_rec)  # [T * b, nOut]
        # output:    PyTorch [26 * batch_size, 256]   ncnn [1, 1, 256]
        output = output.view(T, b, -1)
        # output:    PyTorch [26, batch_size, 256]    ncnn [1, 1, 256]
        return output

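The shapes diverge right after recurrent.view(T * b, h). ncnn blobs have no batch dimension, so this reshape is unnecessary there; I therefore deleted the corresponding Reshape layers (Reshape_46, Reshape_51, Reshape_78 and Reshape_83) from netCRNN-sim.param.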
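Why removing these Reshape layers is harmless here: with batch_size = 1, view(T * b, h) followed by the embedding Linear and view(T, b, -1) applies the same Linear to each of the T time steps, which is what a row-wise InnerProduct on a [T, h] blob computes. The sketch below checks this in plain Python; it assumes the ncnn InnerProduct treats each row of a 2-D input blob as one sample, and the helper names and numbers are hypothetical.

```python
def linear(vec, weight, bias):
    """y = W x + b for one feature vector (weight has shape [out][in])."""
    return [sum(w * x for w, x in zip(w_row, vec)) + b for w_row, b in zip(weight, bias)]


def pytorch_embedding(recurrent, weight, bias):
    """recurrent is [T][b][h]: view(T*b, h) -> Linear -> view(T, b, -1)."""
    T, b = len(recurrent), len(recurrent[0])
    flat = [recurrent[t][i] for t in range(T) for i in range(b)]  # view(T * b, h)
    out = [linear(v, weight, bias) for v in flat]                 # self.embedding
    return [out[t * b:(t + 1) * b] for t in range(T)]             # view(T, b, -1)


def rowwise_embedding(rows, weight, bias):
    """rows is [T][h] (no batch axis): apply the Linear to every time step."""
    return [linear(v, weight, bias) for v in rows]


weight, bias = [[1.0, 2.0], [0.5, -1.0]], [0.1, 0.2]
steps = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]                     # T = 3, h = 2
batched = pytorch_embedding([[s] for s in steps], weight, bias)  # b = 1
print([r[0] for r in batched] == rowwise_embedding(steps, weight, bias))  # True
```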
Before:

7767517
30 34
Input            data                     0 1 data
Convolution      Conv_0                   1 1 data 50 0=64 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=576
ReLU             Relu_1                   1 1 50 51
Pooling          MaxPool_2                1 1 51 52 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution      Conv_3                   1 1 52 53 0=128 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=73728
ReLU             Relu_4                   1 1 53 54
Pooling          MaxPool_5                1 1 54 55 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution      Conv_6                   1 1 55 250 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=294912
ReLU             Relu_7                   1 1 250 58
Convolution      Conv_8                   1 1 58 59 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=589824
ReLU             Relu_9                   1 1 59 60
Pooling          MaxPool_10               1 1 60 61 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution      Conv_11                  1 1 61 253 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1179648
ReLU             Relu_12                  1 1 253 64
Convolution      Conv_13                  1 1 64 65 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=2359296
ReLU             Relu_14                  1 1 65 66
Pooling          MaxPool_15               1 1 66 67 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution      Conv_16                  1 1 67 256 0=512 1=2 11=2 2=1 12=1 3=1 13=1 4=0 14=0 15=0 16=0 5=1 6=1048576
ReLU             Relu_17                  1 1 256 70
Squeeze          Squeeze_18               1 1 70 71 -23303=1,2
Permute          Transpose_19             1 1 71 72 0=1
LSTM             LSTM_29                  1 3 72 139 135 136 0=256 1=1048576 2=2
Reshape          Reshape_46               1 1 139 153 0=512
InnerProduct     Gemm_47                  1 1 153 154 0=256 1=1 2=131072
Reshape          Reshape_51               1 1 154 160 0=-1 1=1
LSTM             LSTM_61                  1 3 160 227 223 224 0=256 1=524288 2=2
Reshape          Reshape_78               1 1 227 241 0=512
InnerProduct     Gemm_79                  1 1 241 242 0=11 1=1 2=5632
Reshape          Reshape_83               1 1 242 248 0=-1 1=1
Softmax          Softmax_84               1 1 248 pred_fc 0=1 1=1

After:

7767517
26 33
Input            data                     0 1 data
Convolution      Conv_0                   1 1 data 50 0=64 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=576
ReLU             Relu_1                   1 1 50 51
Pooling          MaxPool_2                1 1 51 52 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution      Conv_3                   1 1 52 53 0=128 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=73728
ReLU             Relu_4                   1 1 53 54
Pooling          MaxPool_5                1 1 54 55 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution      Conv_6                   1 1 55 249 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=294912
ReLU             Relu_7                   1 1 249 58
Convolution      Conv_8                   1 1 58 59 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=589824
ReLU             Relu_9                   1 1 59 60
Pooling          MaxPool_10               1 1 60 61 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution      Conv_11                  1 1 61 252 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1179648
ReLU             Relu_12                  1 1 252 64
Convolution      Conv_13                  1 1 64 65 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=2359296
ReLU             Relu_14                  1 1 65 66
Pooling          MaxPool_15               1 1 66 67 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution      Conv_16                  1 1 67 255 0=512 1=2 11=2 2=1 12=1 3=1 13=1 4=0 14=0 15=0 16=0 5=1 6=1048576
ReLU             Relu_17                  1 1 255 70
Squeeze          Squeeze_18               1 1 70 71 -23303=1,2
Permute          Transpose_19             1 1 71 72 0=1
LSTM             LSTM_29                  1 3 72 139 135 136 0=256 1=1048576 2=2
InnerProduct     Gemm_47                  1 1 139 154 0=256 1=1 2=131072
LSTM             LSTM_61                  1 3 154 227 223 224 0=256 1=524288 2=2
InnerProduct     Gemm_79                  1 1 227 248 0=11 1=1 2=5632
Softmax          Softmax_84               1 1 248 pred_fc 0=1 1=1
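Editing the .param by hand means two things for every removed Reshape: the next layer must read the Reshape's input blob instead of its output, and the layer and blob counts on the second line must be lowered. A hypothetical Python helper that automates this for 1-input/1-output layers, assuming the plain-text .param format shown above (magic number, counts, one layer per line), that none of the removed layers produces a graph output such as pred_fc, and that you have already checked the removed layers are no-ops for your model:

```python
def drop_layers(param_text, layer_type="Reshape"):
    """Remove every 1-in/1-out `layer_type` layer from an ncnn .param text and rewire its consumers."""
    lines = param_text.strip().splitlines()
    magic = lines[0]
    layer_count, blob_count = (int(v) for v in lines[1].split())
    rename = {}  # output blob of a removed layer -> blob that replaces it
    kept = []
    for line in lines[2:]:
        tok = line.split()
        n_in, n_out = int(tok[2]), int(tok[3])
        old_blobs = tok[4:4 + n_in + n_out]
        blobs = [rename.get(b, b) for b in old_blobs[:n_in]] + old_blobs[n_in:]
        if tok[0] == layer_type and n_in == 1 and n_out == 1:
            rename[blobs[1]] = blobs[0]  # later layers read this layer's input instead
            layer_count -= 1
            blob_count -= 1
            continue
        if blobs == old_blobs:
            kept.append(line)            # untouched layer: keep the original formatting
        else:
            kept.append(" ".join(tok[:4] + blobs + tok[4 + n_in + n_out:]))
    return "\n".join([magic, f"{layer_count} {blob_count}"] + kept) + "\n"


sample = """7767517
5 7
Input            data       0 1 data
LSTM             LSTM_29    1 3 data 139 135 136 0=256 1=1048576 2=2
Reshape          Reshape_46 1 1 139 153 0=512
InnerProduct     Gemm_47    1 1 153 154 0=256 1=1 2=131072
Softmax          Softmax_84 1 1 154 pred_fc 0=1 1=1
"""
print(drop_layers(sample))
```

ncnn's parser splits layer lines on whitespace, so the rewritten lines lose the column alignment but stay valid.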

After that, the ncnn model can be called from C++.
C++ inference code:

#include "net.h"
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Load the .param/.bin pair into an already allocated ncnn::Net; returns 0 on success
int resnet_init(ncnn::Net* net, const char* paramfile, const char* binfile)
{
    if (net == NULL) return -1;
    if (net->load_param(paramfile) != 0) return -1;
    if (net->load_model(binfile) != 0) return -1;
    return 0;
}

void resnet_release(ncnn::Net* net)
{
    if (net == NULL) return;
    net->clear();
    delete net;
}

// Debug helper: print the topk (index, score) pairs of one score vector
static int print_topk(const std::vector<float>& cls_scores, int topk)
{
    int size = (int)cls_scores.size();
    topk = std::min(topk, size);
    std::vector<std::pair<float, int> > vec(size);
    for (int i = 0; i < size; i++)
        vec[i] = std::make_pair(cls_scores[i], i);

    std::partial_sort(vec.begin(), vec.begin() + topk, vec.end(),
                      std::greater<std::pair<float, int> >());

    for (int i = 0; i < topk; i++)
        fprintf(stderr, "%d = %f\n", vec[i].second, vec[i].first);

    return 0;
}

// Debug helper: in-place softmax (the max is subtracted so exp() cannot overflow)
void softmax(std::vector<float>& scores)
{
    if (scores.empty()) return;
    float max_score = *std::max_element(scores.begin(), scores.end());
    float sum = 0.0f;
    for (size_t i = 0; i < scores.size(); i++)
    {
        scores[i] = std::exp(scores[i] - max_score);
        sum += scores[i];
    }
    for (size_t i = 0; i < scores.size(); i++)
        scores[i] /= sum;
}

// Debug helper: print the largest score and its index
void findmax(const std::vector<float>& scores)
{
    if (scores.empty()) return;
    size_t index = std::max_element(scores.begin(), scores.end()) - scores.begin();
    printf("max: %.3f, id = %d\n", scores[index], (int)index);
}

// Debug helper: print every channel of an ncnn::Mat row by row
void pretty_print(const ncnn::Mat& m)
{
    for (int q = 0; q < m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y = 0; y < m.h; y++)
        {
            for (int x = 0; x < m.w; x++)
                printf("%f ", ptr[x]);
            ptr += m.w;
            printf("\n");
        }
        printf("------------------------\n");
    }
}

// Best-path step: for every time step (row, w = class_size) take the class with the highest score.
// Starting from the row's first score instead of 0.0 keeps this correct for negative
// scores (log_softmax output or raw logits).
std::vector<int> select_label(const ncnn::Mat& m)
{
    std::vector<int> labels;
    for (int q = 0; q < m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y = 0; y < m.h; y++)
        {
            int label = 0;
            float max_score = ptr[0];
            for (int x = 1; x < m.w; x++)
            {
                if (ptr[x] > max_score)
                {
                    max_score = ptr[x];
                    label = x;
                }
            }
            labels.push_back(label);
            ptr += m.w;
        }
    }
    return labels;
}

// CTC greedy decoding: drop blanks (class 0) and collapse repeated labels.
// Class k >= 1 is mapped to the digit k - 1 (11 classes = blank + "0123456789").
std::string decode(const ncnn::Mat& m)
{
    std::vector<int> labels = select_label(m);
    std::string text;
    for (size_t i = 0; i < labels.size(); i++)
    {
        if (labels[i] == 0)
            continue;  // blank
        if (i > 0 && labels[i - 1] == labels[i])
            continue;  // repeat of the previous time step
        text.push_back(char(labels[i] - 1 + '0'));
    }
    return text;
}

// gray: 8-bit single-channel image; it is resized to the 100x32 training size
std::string forward(ncnn::Net* net, const cv::Mat& gray)
{
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(gray.data, ncnn::Mat::PIXEL_GRAY, gray.cols, gray.rows, 100, 32);

    // (x - 127.5) / 127.5 maps pixels to [-1, 1]; only the first value is used for one channel
    const float mean_vals[3] = {0.5f*255.f, 0.5f*255.f, 0.5f*255.f};
    const float norm_vals[3] = {1/0.5f/255.f, 1/0.5f/255.f, 1/0.5f/255.f};
    // const float mean_vals[3] = {0.485f*255.f, 0.456f*255.f, 0.406f*255.f};
    // const float norm_vals[3] = {1/0.229f/255.f, 1/0.224f/255.f, 1/0.225f/255.f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Mat out;
    ncnn::Extractor ex = net->create_extractor();
    ex.set_light_mode(true);
    ex.set_num_threads(4);
    ex.input("data", in);
    ex.extract("pred_fc", out);  // h = time steps, w = class_size
    // pretty_print(out);
    return decode(out);
}

int main(int argc, char** argv)
{
    const char* imagepath = argc > 1 ? argv[1] : "00029506308_0.png";
    // cv::Mat src = cv::Mat::zeros(cv::Size(100,32),CV_8UC1);
    cv::Mat src = cv::imread(imagepath, cv::IMREAD_GRAYSCALE);
    if (src.empty())
    {
        fprintf(stderr, "cv::imread %s failed\n", imagepath);
        return -1;
    }

    ncnn::Net* net = new ncnn::Net;
    if (resnet_init(net, "../../ncnn.param", "../../ncnn.bin") != 0)
    {
        resnet_release(net);
        return -1;
    }

    // time a few runs
    for (int i = 0; i < 10; i++)
    {
        double time1 = static_cast<double>(cv::getTickCount());
        std::string text = forward(net, src);
        double time2 = (static_cast<double>(cv::getTickCount()) - time1) / cv::getTickFrequency();
        std::cout << "result: " << text << ", time per image: " << time2 << " s" << std::endl;
    }

    resnet_release(net);
    return 0;
}

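Update:
Previously, if I ran ./ncnnoptimize on the model and then deleted the Reshape layers, the results in C++ were wrong. The real cause was the following two operations; fix them as described in nihui's article: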

        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

Convert the model to ncnn and then run ncnnoptimize; the resulting param is:

7767517
23 27
Input                    data                     0 1 data
Convolution              Conv_0                   1 1 data 51 0=32 1=3 4=1 5=1 6=288 9=1
Pooling                  MaxPool_2                1 1 51 52 1=2 2=2 5=1
Convolution              Conv_3                   1 1 52 54 0=64 1=3 4=1 5=1 6=18432 9=1
Pooling                  MaxPool_5                1 1 54 55 1=2 2=2 5=1
Convolution              Conv_6                   1 1 55 58 0=128 1=3 4=1 5=1 6=73728 9=1
Convolution              Conv_8                   1 1 58 60 0=128 1=3 4=1 5=1 6=147456 9=1
Pooling                  MaxPool_10               1 1 60 61 1=2 12=2 3=1 13=0 5=1
Convolution              Conv_11                  1 1 61 64 0=256 1=3 4=1 5=1 6=294912 9=1
Convolution              Conv_13                  1 1 64 66 0=256 1=3 4=1 5=1 6=589824 9=1
Pooling                  MaxPool_15               1 1 66 67 1=2 12=2 3=1 13=0 5=1
Convolution              Conv_16                  1 1 67 70 0=256 1=2 5=1 6=262144 9=1
Squeeze                  Squeeze_18               1 1 70 71 -23300=1,2
Permute                  Transpose_19             1 1 71 72 0=1
LSTM                     LSTM_36                  1 3 72 142 138 139 0=128 1=262144 2=2
Reshape                  Reshape_40               1 1 142 150 0=256
InnerProduct             Gemm_41                  1 1 150 151 0=128 1=1 2=32768
Reshape                  Reshape_42               1 1 151 157 0=-1 1=1
LSTM                     LSTM_59                  1 3 157 227 223 224 0=128 1=131072 2=2
Reshape                  Reshape_63               1 1 227 235 0=256
InnerProduct             Gemm_64                  1 1 235 236 0=3898 1=1 2=997888
Reshape                  Reshape_65               1 1 236 242 0=-1 1=1
LogSoftmax               LogSoftmax_66            1 1 242 pred_fc

Change

Squeeze                  Squeeze_18               1 1 70 71 -23300=1,2
Permute                  Transpose_19             1 1 71 72 0=1

to

Reshape                  Squeeze_18               1 1 70 71 0=-1 1=256 2=-233
Permute                  Transpose_19             1 1 71 72 0=1

and then carry on with the remaining steps; the resulting model works correctly.
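Why the Reshape + Permute pair reproduces squeeze(2).permute(2, 0, 1): ncnn stores the conv output as c = 256 channels of h = 1 by w = T values; Reshape with 0=-1 1=256 2=-233 turns it into a 2-D blob of 256 rows with the width inferred (T), and Permute 0=1 then swaps w and h, giving T rows of 256 features, the [w, c] layout the LSTM expects once the batch axis is gone. The Python sketch below simulates that index shuffling on a toy 4-channel, 3-step feature map. It assumes that -1 means "infer" and -233 means "no such dimension" in the Reshape parameters and that Permute 0=1 transposes a 2-D blob; the helper names are hypothetical.

```python
def flatten_chw(fmap):
    """Flatten a [c][h][w] feature map channel by channel (ncnn order, ignoring channel alignment padding)."""
    return [v for channel in fmap for row in channel for v in row]


def reshape_rows(flat, h):
    """Reshape 0=-1 1=h 2=-233: a 2-D blob with h rows, width inferred."""
    w = len(flat) // h
    if w * h != len(flat):
        raise ValueError("size is not divisible by h")
    return [flat[y * w:(y + 1) * w] for y in range(h)]


def transpose(rows):
    """Permute 0=1 on a 2-D blob: swap w and h."""
    return [list(col) for col in zip(*rows)]


C, T = 4, 3                                                  # toy sizes (the model above uses C = 256)
fmap = [[[10 * c + t for t in range(T)]] for c in range(C)]  # conv output [c][h=1][w=T]
seq = transpose(reshape_rows(flatten_chw(fmap), C))          # ncnn: Reshape, then Permute

# PyTorch: squeeze(2) -> [b, c, w], permute(2, 0, 1) -> [w, b, c]; with b = 1, seq[t][c] = fmap[c][0][t]
expected = [[fmap[c][0][t] for c in range(C)] for t in range(T)]
print(seq == expected)  # True
```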
