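OCR training code: CRNN + BiLSTM + CTC loss
Tencent's ncnn toolkit
Reference: nihui's detailed write-up "Ultra-lightweight Chinese OCR LSTM model: ncnn implementation"
Step 1
Train the CRNN model to obtain best.pth.
Step 2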
import numpy as np
import onnxruntime as rt
import utils
import onnx
import params
import models.crnn as net
import torch.nn as nn
import torch
from PIL import Image
import dataset
from torch.autograd import Variable
import sys
import onnxsim

# Shape of the dummy input used for export: 1 x 1 x 32 x 100.
img_height = 32
img_width = 100
img_channel = 1


def export(model_path, model_name):
    converter = utils.strLabelConverter(params.alphabet)
    # Number of classes = alphabet size + 1 (the CTC blank).
    nclass = len(params.alphabet) + 1
    model = net.CRNN(params.imgH, params.nc, nclass, params.nh)
    model.load_state_dict(torch.load(
        model_path, map_location=torch.device('cpu')))
    model.eval()
    img = torch.zeros(1, img_channel, img_height, img_width)
    preds = model(img)
    print(preds.shape)
    torch.onnx.export(model, img, model_name, verbose=False, input_names=['data'],
                      output_names=['pred_fc'])


def convert(model_name):
    # Fix the input shape so the simplifier can fold shape-dependent ops.
    input_shapes = {None: [1, img_channel, img_height, img_width]}
    model_opt, check_ok = onnxsim.simplify(model_name, input_shapes=input_shapes)
    # Only report success if the simplified model passed the simplifier's check.
    assert check_ok, "the simplified ONNX model failed validation"
    onnx.save(model_opt, model_name[:-5] + "-sim.onnx")
    print("the model was simplified successfully")


if __name__ == "__main__":
    model_path = 'best.pth'
    model_name = 'netCRNN.onnx'
    export(model_path, model_name)
    convert(model_name)
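Running export_onnx.py converts the model and produces netCRNN.onnx and netCRNN-sim.onnx.
Step 3:
Convert the model with ncnn to obtain netCRNN.bin and netCRNN-sim.param.
Problem 1:
log_softmax is not supported
This error appears because ncnn does not support the log_softmax function, so the natural idea is to replace it with something else.
Attempt 1:
Change log_softmax to softmax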
def forward(self, input):
    # conv features
    conv = self.cnn(input)
    b, c, h, w = conv.size()
    assert h == 1, "the height of conv must be 1"
    conv = conv.squeeze(2)
    conv = conv.permute(2, 0, 1)  # [w, b, c]
    # rnn features
    output = self.rnn(conv)
    # add log_softmax to converge output
    # output = F.log_softmax(output, dim=2)  # changed to:
    output = F.softmax(output, dim=2)
    return output
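The conversion then succeeds, but during training the loss stays negative and validation accuracy never gets above 50%, so the converted model is of no use. A negative loss is what one would expect if the CTC loss takes log-probabilities as input and is given positive probabilities instead.
log_softmax is a variant of softmax (the logarithm of its output) and has no parameters.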
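To make the negative loss concrete, here is a minimal pure-Python sketch. It assumes the training loss treats the network output as log-probabilities (as `torch.nn.CTCLoss` does); the helper names and the sample logits are made up for illustration and are not taken from the training code.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    # log_softmax(x)_i = x_i - logsumexp(x); it has no learnable parameters.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


logits = [2.0, 0.5, -1.0]          # hypothetical scores for one time step
probs = softmax(logits)            # values in (0, 1)
log_probs = log_softmax(logits)    # values <= 0

# A negative-log-likelihood style loss negates the value it is given.
nll_from_log_probs = -log_probs[0]  # non-negative, as a loss should be
nll_from_probs = -probs[0]          # negative: probabilities were read as log-probs

print(nll_from_log_probs, nll_from_probs)
```

Feeding softmax output into such a loss therefore pushes the value below zero, which matches the behaviour described above.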
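Attempt 2:
Modify the ncnn source code, mainly /ncnn/tools/onnx/onnx2ncnn.cpp, and add handling for log_softmax there:
Add a LogSoftmax declaration below Softmax. Nothing else is actually done, but this is enough for the conversion to succeed.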
    else if (op == "Softmax")
    {
        fprintf(pp, "%-16s", "Softmax");
    }
    else if (op == "LogSoftmax")
    {
        fprintf(pp, "%-16s", "LogSoftmax");
    }
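This is the lazy approach. A more reliable one would be to add the declaration and implementation of a corresponding logsoftmax layer under /ncnn/src/layer and to invoke it correctly in onnx2ncnn.
Attempt 3:
There is also a simple way: in crnn.py, comment out the log_softmax call and return output directly.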
def forward(self, input):
    # conv features
    conv = self.cnn(input)
    b, c, h, w = conv.size()
    assert h == 1, "the height of conv must be 1"
    conv = conv.squeeze(2)
    conv = conv.permute(2, 0, 1)  # [w, b, c]
    # rnn features
    output = self.rnn(conv)
    # add log_softmax to converge output
    # output = F.log_softmax(output, dim=2)
    return output
Re-exporting to ONNX this way gives a model that no longer contains log_softmax. Then running
./onnx2ncnn crnn-pytorch/netCRNN-sim.onnx
produces the corresponding netCRNN.bin and netCRNN-sim.param.
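Dropping log_softmax at export time should not change which class wins at each time step, because log_softmax only subtracts the same constant from every score in a frame. The sketch below illustrates this with made-up scores; `raw_frames` and the helper names are assumptions for the example, not part of the model code.

```python
import math


def log_softmax(logits):
    # Subtracts logsumexp(logits) from every entry, so the ordering is preserved.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def argmax(values):
    # Index of the largest entry.
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical raw network outputs: 3 time steps, 3 classes each.
raw_frames = [
    [0.1, 2.3, -1.0],
    [1.5, 1.4, -0.2],
    [-3.0, -2.0, -2.5],
]

labels_raw = [argmax(frame) for frame in raw_frames]
labels_log = [argmax(log_softmax(frame)) for frame in raw_frames]
print(labels_raw, labels_log)
```

If the actual log-probabilities are needed downstream, the same `log_softmax` can be applied to the ncnn output after inference.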
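Step 4:
Problem: the output of the ncnn model does not match PyTorch.
The PyTorch model outputs a tensor of shape 26 × batch_size × class_size,
while the ncnn model outputs 1 × 1 × class_size.
Comparing the output shape of each layer leads to the forward code of the CRNN Bi-LSTM, where the dimensions change as follows (PyTorch shape first, ncnn shape second):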
def forward(self, input):
    recurrent, _ = self.rnn(input)
    # print(recurrent.shape)  PyTorch: [26, batch_size, 512]    ncnn: [1, 26, 512]
    T, b, h = recurrent.size()
    t_rec = recurrent.view(T * b, h)
    # print(t_rec.shape)      PyTorch: [26*batch_size, 512]     ncnn: [1, 1, 13312]
    output = self.embedding(t_rec)  # [T * b, nOut]
    # print(output.shape)     PyTorch: [26*batch_size, 256]     ncnn: [1, 1, 256]
    output = output.view(T, b, -1)
    # print(output.shape)     PyTorch: [26, batch_size, 256]    ncnn: [1, 1, 256]
    return output
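The shapes diverge right after recurrent.view(T * b, h). ncnn has no batch_size dimension, so this operation is unnecessary there; I therefore deleted the corresponding Reshape operations directly from netCRNN-sim.param.
Before deletion: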
7767517
30 34
Input data 0 1 data
Convolution Conv_0 1 1 data 50 0=64 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=576
ReLU Relu_1 1 1 50 51
Pooling MaxPool_2 1 1 51 52 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution Conv_3 1 1 52 53 0=128 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=73728
ReLU Relu_4 1 1 53 54
Pooling MaxPool_5 1 1 54 55 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution Conv_6 1 1 55 250 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=294912
ReLU Relu_7 1 1 250 58
Convolution Conv_8 1 1 58 59 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=589824
ReLU Relu_9 1 1 59 60
Pooling MaxPool_10 1 1 60 61 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution Conv_11 1 1 61 253 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1179648
ReLU Relu_12 1 1 253 64
Convolution Conv_13 1 1 64 65 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=2359296
ReLU Relu_14 1 1 65 66
Pooling MaxPool_15 1 1 66 67 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution Conv_16 1 1 67 256 0=512 1=2 11=2 2=1 12=1 3=1 13=1 4=0 14=0 15=0 16=0 5=1 6=1048576
ReLU Relu_17 1 1 256 70
Squeeze Squeeze_18 1 1 70 71 -23303=1,2
Permute Transpose_19 1 1 71 72 0=1
LSTM LSTM_29 1 3 72 139 135 136 0=256 1=1048576 2=2
Reshape Reshape_46 1 1 139 153 0=512
InnerProduct Gemm_47 1 1 153 154 0=256 1=1 2=131072
Reshape Reshape_51 1 1 154 160 0=-1 1=1
LSTM LSTM_61 1 3 160 227 223 224 0=256 1=524288 2=2
Reshape Reshape_78 1 1 227 241 0=512
InnerProduct Gemm_79 1 1 241 242 0=11 1=1 2=5632
Reshape Reshape_83 1 1 242 248 0=-1 1=1
Softmax Softmax_84 1 1 248 pred_fc 0=1 1=1
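The numbers in this listing can be cross-checked with a little arithmetic. The sketch below assumes my reading of the ncnn parameter ids (for Pooling: 1/11 = kernel w/h, 2/12 = stride w/h, 3/14 = pad left/right, 13/15 = pad top/bottom; for Convolution: 1 = kernel, 4/14/15/16 = pads) and that the LSTM weight count is input_size × hidden_size × 4 gates × directions. The 3×3 convolutions with padding 1 keep the size unchanged and are left out. Function and variable names are hypothetical.

```python
def out_size(size, kernel, stride=1, pad_before=0, pad_after=0):
    # Standard output-size formula for convolution and pooling.
    return (size + pad_before + pad_after - kernel) // stride + 1


# (name, (kernel, stride, pad_before, pad_after) along width, same along height)
SIZE_CHANGING_LAYERS = [
    ("MaxPool_2", (2, 2, 0, 0), (2, 2, 0, 0)),
    ("MaxPool_5", (2, 2, 0, 0), (2, 2, 0, 0)),
    ("MaxPool_10", (2, 1, 1, 1), (2, 2, 0, 0)),
    ("MaxPool_15", (2, 1, 1, 1), (2, 2, 0, 0)),
    ("Conv_16", (2, 1, 0, 0), (2, 1, 0, 0)),
]


def feature_map_size(width=100, height=32):
    # Walk the input size through every layer that changes it.
    for _, w_cfg, h_cfg in SIZE_CHANGING_LAYERS:
        width = out_size(width, *w_cfg)
        height = out_size(height, *h_cfg)
    return width, height


def lstm_weight_count(input_size, hidden_size, directions=2):
    # 4 gates per direction, each with an input_size x hidden_size matrix.
    return input_size * hidden_size * 4 * directions


print(feature_map_size())            # sequence length and remaining height
print(lstm_weight_count(512, 256))   # compare with 1=1048576 on LSTM_29
print(lstm_weight_count(256, 256))   # compare with 1=524288 on LSTM_61
print(512 * 256, 512 * 11)           # compare with 2=131072 and 2=5632 on the Gemm layers
```

Under these assumptions a 100 × 32 input ends up as 26 time steps of height 1, which is where the 26 in the PyTorch output shape comes from.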
After deletion:
7767517
26 33
Input data 0 1 data
Convolution Conv_0 1 1 data 50 0=64 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=576
ReLU Relu_1 1 1 50 51
Pooling MaxPool_2 1 1 51 52 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution Conv_3 1 1 52 53 0=128 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=73728
ReLU Relu_4 1 1 53 54
Pooling MaxPool_5 1 1 54 55 0=0 1=2 11=2 2=2 12=2 3=0 13=0 14=0 15=0 5=1
Convolution Conv_6 1 1 55 249 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=294912
ReLU Relu_7 1 1 249 58
Convolution Conv_8 1 1 58 59 0=256 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=589824
ReLU Relu_9 1 1 59 60
Pooling MaxPool_10 1 1 60 61 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution Conv_11 1 1 61 252 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=1179648
ReLU Relu_12 1 1 252 64
Convolution Conv_13 1 1 64 65 0=512 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=2359296
ReLU Relu_14 1 1 65 66
Pooling MaxPool_15 1 1 66 67 0=0 1=2 11=2 2=1 12=2 3=1 13=0 14=1 15=0 5=1
Convolution Conv_16 1 1 67 255 0=512 1=2 11=2 2=1 12=1 3=1 13=1 4=0 14=0 15=0 16=0 5=1 6=1048576
ReLU Relu_17 1 1 255 70
Squeeze Squeeze_18 1 1 70 71 -23303=1,2
Permute Transpose_19 1 1 71 72 0=1
LSTM LSTM_29 1 3 72 139 135 136 0=256 1=1048576 2=2
InnerProduct Gemm_47 1 1 139 154 0=256 1=1 2=131072
LSTM LSTM_61 1 3 154 227 223 224 0=256 1=524288 2=2
InnerProduct Gemm_79 1 1 227 248 0=11 1=1 2=5632
Softmax Softmax_84 1 1 248 pred_fc 0=1 1=1
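The manual edit above can also be scripted. The following is only a sketch of one way to do it, assuming each layer line has the form `type name input_count output_count inputs... outputs... params...`; the function name `remove_layers` is hypothetical. It drops the named single-input, single-output layers, reconnects each consumer to the removed layer's input blob, and recomputes the two header counts from the lines that remain.

```python
def remove_layers(param_text, names_to_remove):
    lines = [ln for ln in param_text.splitlines() if ln.strip()]
    magic, layer_lines = lines[0], lines[2:]   # lines[1] is the old header
    rename = {}   # output blob of a removed layer -> the blob that replaces it
    kept = []
    for ln in layer_lines:
        tok = ln.split()
        layer_type, name = tok[0], tok[1]
        n_in, n_out = int(tok[2]), int(tok[3])
        # Reconnect inputs that pointed at a removed layer's output.
        ins = [rename.get(b, b) for b in tok[4:4 + n_in]]
        outs = tok[4 + n_in:4 + n_in + n_out]
        params = tok[4 + n_in + n_out:]
        if name in names_to_remove:
            if n_in != 1 or n_out != 1:
                raise ValueError("can only bypass 1-in/1-out layers: " + name)
            rename[outs[0]] = ins[0]
            continue
        kept.append([layer_type, name, str(n_in), str(n_out), *ins, *outs, *params])
    # Every blob is produced by exactly one layer, so count the outputs.
    blob_count = sum(int(row[3]) for row in kept)
    header = f"{len(kept)} {blob_count}"
    return "\n".join([magic, header] + [" ".join(row) for row in kept]) + "\n"


sample = """7767517
6 8
Input data 0 1 data
LSTM L1 1 3 data a b c 0=256
Reshape R1 1 1 a d 0=512
InnerProduct G1 1 1 d e 0=256
Reshape R2 1 1 e f 0=-1 1=1
Softmax S 1 1 f pred_fc 0=1 1=1
"""
print(remove_layers(sample, {"R1", "R2"}))
```

For the listing above, the names to pass would be Reshape_46, Reshape_51, Reshape_78 and Reshape_83. The surviving blob names and the blob count in the header may differ from the hand-edited file, although the connectivity is the same.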
After that, the ncnn model can be called from C++.
C++ calling code
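#include "net.h"
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

void resnet_init(ncnn::Net* net, const char* paramfile, const char* binfile)
{
    // The pointer is passed by value, so allocating a Net here would never reach the caller.
    if (net == NULL) return;
    // load_param and load_model return 0 on success.
    if (net->load_param(paramfile) != 0 || net->load_model(binfile) != 0)
        fprintf(stderr, "failed to load %s / %s\n", paramfile, binfile);
}

void resnet_release(ncnn::Net* net)
{
    if (net == NULL) return;
    net->clear();
    delete net;
}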
static int print_topk(const std::vector<float>& cls_scores, int topk)
{
    // partial sort topk with index
    int size = cls_scores.size();
    std::vector<std::pair<float, int> > vec;
    vec.resize(size);
    for (int i = 0; i < size; i++)
    {
        vec[i] = std::make_pair(cls_scores[i], i);
    }
    std::partial_sort(vec.begin(), vec.begin() + topk, vec.end(),
                      std::greater<std::pair<float, int> >());
    // print topk and score
    for (int i = 0; i < topk; i++)
    {
        float score = vec[i].first;
        int index = vec[i].second;
        fprintf(stderr, "%d = %f\n", index, score);
    }
    return 0;
}
void softmax(std::vector<float>& scores)
{
    float sum = 0.0f;
    for (size_t i = 0; i < scores.size(); i++)
    {
        scores[i] = std::exp(scores[i]);
        sum += scores[i];
    }
    printf("sum = %.3f\n", sum);
    for (size_t i = 0; i < scores.size(); i++)
    {
        scores[i] /= sum;
    }
}
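void findmax(const std::vector<float>& scores)
{
    if (scores.empty()) return;
    // Start from the first element so that all-negative scores are handled too.
    float max = scores[0];
    int index = 0;
    for (size_t i = 1; i < scores.size(); i++)
    {
        if (scores[i] > max)
        {
            max = scores[i];
            index = (int)i;
        }
    }
    printf("max: %.3f, id = %d\n", max, index);
}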
void pretty_print(const ncnn::Mat& m)
{
    for (int q = 0; q < m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y = 0; y < m.h; y++)
        {
            for (int x = 0; x < m.w; x++)
            {
                printf("%f ", ptr[x]);
            }
            ptr += m.w;
            printf("\n");
        }
        printf("------------------------\n");
    }
}
void select_label(const ncnn::Mat& m, int* labels)
{
    for (int q = 0; q < m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y = 0; y < m.h; y++)
        {
            // Each row holds the class scores of one time step; take the argmax.
            // Starting from ptr[0] also works when every score is negative
            // (raw or log_softmax outputs).
            float max_score = ptr[0];
            int label = 0;
            for (int x = 1; x < m.w; x++)
            {
                if (ptr[x] > max_score)
                {
                    max_score = ptr[x];
                    label = x;
                }
            }
            labels[y] = label;
            ptr += m.w;
        }
    }
}
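void decode(const ncnn::Mat& m)
{
    // One label per time step (26 for a 100-pixel-wide input).
    std::vector<int> labels(m.h, 0);
    select_label(m, labels.data());
    std::string text;
    for (int i = 0; i < m.h; i++)
    {
        // Label 0 is the CTC blank.
        if (labels[i] == 0)
            continue;
        // Collapse consecutive repeats of the same label.
        if (i > 0 && labels[i - 1] == labels[i])
            continue;
        // Labels 1..10 map to the digits '0'..'9'.
        text += char(labels[i] - 1 + '0');
    }
    printf("%s\n", text.c_str());
}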
void forward(ncnn::Net* net, const cv::Mat& image)
{
    int w = image.cols;
    int h = image.rows;
    // Resize the grayscale image to the 100 x 32 input the model was exported with.
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_GRAY, w, h, 100, 32);
    // Normalize pixels to [-1, 1]: (x - 127.5) / 127.5.
    const float mean_vals[3] = {0.5f*255.f, 0.5f*255.f, 0.5f*255.f};
    const float norm_vals[3] = {1/0.5f/255.f, 1/0.5f/255.f, 1/0.5f/255.f};
    // const float mean_vals[3] = {0.485f*255.f, 0.456f*255.f, 0.406f*255.f};
    // const float norm_vals[3] = {1/0.229f/255.f, 1/0.224f/255.f, 1/0.225f/255.f};
    in.substract_mean_normalize(mean_vals, norm_vals);
    // pretty_print(in);
    ncnn::Mat out;
    ncnn::Extractor ex = net->create_extractor();
    ex.set_light_mode(true);
    ex.set_num_threads(4);
    ex.input("data", in);
    ex.extract("pred_fc", out);
    // pretty_print(out);
    decode(out);
}
int main(int argc, char** argv)
{
    ncnn::Net* net = new ncnn::Net;
    resnet_init(net, "../../ncnn.param", "../../ncnn.bin");
    // cv::Mat src = cv::Mat::zeros(cv::Size(100,32),CV_8UC1);
    cv::Mat src = cv::imread("00029506308_0.png");
    // Check that the image loaded before converting it; cvtColor fails on an empty Mat.
    if (src.empty())
    {
        resnet_release(net);
        return 0;
    }
    cv::cvtColor(src, src, cv::COLOR_BGR2GRAY);
    // Timing loop: runs until the process is interrupted.
    while (1)
    {
        double time1 = static_cast<double>(cv::getTickCount());
        forward(net, src);
        double time2 = (static_cast<double>(cv::getTickCount()) - time1) / cv::getTickFrequency();
        std::cout << "Time to process one image: " << time2 << " s" << std::endl;
    }
    delete net;
    return 0;
}
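For reference, the greedy CTC decoding done by select_label and decode above can be written compactly in Python, which is handy for checking the C++ result against the PyTorch side. This is a sketch under the same assumptions as the C++ code: label 0 is the blank and labels 1..10 map to the digits 0..9. The function name and the sample scores are made up.

```python
def ctc_greedy_decode(scores, alphabet, blank=0):
    # scores: one list of class scores per time step.
    labels = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars = []
    prev = None
    for label in labels:
        # Skip blanks and consecutive repeats of the same label.
        if label != blank and label != prev:
            chars.append(alphabet[label - 1])
        prev = label
    return "".join(chars)


def one_hot(index, size=11):
    # Hypothetical helper that builds a score row whose argmax is `index`.
    return [1.0 if i == index else 0.0 for i in range(size)]


# Per-step labels 1,1,0,1,2,2,0: the blank between the two 1s keeps both.
frames = [one_hot(i) for i in (1, 1, 0, 1, 2, 2, 0)]
decoded = ctc_greedy_decode(frames, "0123456789")
print(decoded)
```

The blank between two identical labels is what allows a repeated character to survive the collapsing step.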
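Update:
Previously I ran ./ncnnoptimize on the model and then deleted the Reshape operations, and the result was wrong when called from C++. The real cause is the operations below, which can be modified by following nihui's article.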
conv = conv.squeeze(2)
conv = conv.permute(2, 0, 1) # [w, b, c]
After exporting the model to ncnn and running ncnnoptimize, the resulting param is
7767517
23 27
Input data 0 1 data
Convolution Conv_0 1 1 data 51 0=32 1=3 4=1 5=1 6=288 9=1
Pooling MaxPool_2 1 1 51 52 1=2 2=2 5=1
Convolution Conv_3 1 1 52 54 0=64 1=3 4=1 5=1 6=18432 9=1
Pooling MaxPool_5 1 1 54 55 1=2 2=2 5=1
Convolution Conv_6 1 1 55 58 0=128 1=3 4=1 5=1 6=73728 9=1
Convolution Conv_8 1 1 58 60 0=128 1=3 4=1 5=1 6=147456 9=1
Pooling MaxPool_10 1 1 60 61 1=2 12=2 3=1 13=0 5=1
Convolution Conv_11 1 1 61 64 0=256 1=3 4=1 5=1 6=294912 9=1
Convolution Conv_13 1 1 64 66 0=256 1=3 4=1 5=1 6=589824 9=1
Pooling MaxPool_15 1 1 66 67 1=2 12=2 3=1 13=0 5=1
Convolution Conv_16 1 1 67 70 0=256 1=2 5=1 6=262144 9=1
Squeeze Squeeze_18 1 1 70 71 -23300=1,2
Permute Transpose_19 1 1 71 72 0=1
LSTM LSTM_36 1 3 72 142 138 139 0=128 1=262144 2=2
Reshape Reshape_40 1 1 142 150 0=256
InnerProduct Gemm_41 1 1 150 151 0=128 1=1 2=32768
Reshape Reshape_42 1 1 151 157 0=-1 1=1
LSTM LSTM_59 1 3 157 227 223 224 0=128 1=131072 2=2
Reshape Reshape_63 1 1 227 235 0=256
InnerProduct Gemm_64 1 1 235 236 0=3898 1=1 2=997888
Reshape Reshape_65 1 1 236 242 0=-1 1=1
LogSoftmax LogSoftmax_66 1 1 242 pred_fc
Change
Squeeze Squeeze_18 1 1 70 71 -23300=1,2
Permute Transpose_19 1 1 71 72 0=1
to
Reshape Squeeze_18 1 1 70 71 0=-1 1=256 2=-233
Permute Transpose_19 1 1 71 72 0=1
Then carry on with the subsequent steps, and the resulting model works correctly.
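What the replaced pair of layers is meant to do can be sketched with plain nested lists. This assumes my reading of the ncnn parameters: on Reshape, 0 = w, 1 = h, 2 = c, with -1 meaning "infer" and -233 meaning "dimension not used", so the conv output (c=256, h=1, w=T) becomes a 2D blob with 256 rows of T values; Permute 0=1 then swaps the two axes, giving T rows of 256 features, one row per time step for the LSTM. The function name and the toy sizes are hypothetical.

```python
def reshape_then_permute(feature):
    # feature[c][h][w] with h == 1, as produced by the last convolution.
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    assert height == 1, "the height of conv must be 1"
    # Reshape: drop the height axis -> [channels][width].
    two_d = [feature[c][0] for c in range(channels)]
    # Permute: swap the axes -> [width][channels], one row per time step.
    return [[two_d[c][t] for c in range(channels)] for t in range(width)]


# Toy example: 3 channels, 4 time steps, value encodes (channel, step).
toy = [[[c * 10 + t for t in range(4)]] for c in range(3)]
sequence = reshape_then_permute(toy)
print(sequence)
```

Each row of `sequence` is the feature vector of one time step, mirroring the `[w, b, c]` layout that `conv.permute(2, 0, 1)` produces in PyTorch once the batch axis is dropped.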