paddleOCRv3, part 1: deploying the rec (recognition) stage with openVINO (C++)

paddleOCRv3 openVINO deployment

1. Introduction

This post covers deploying the recognition part (rec) of PaddleOCRv3 with C++ on openVINO, focusing on the stages from the end of training through to deployment on openVINO. The software versions used along the way are:

  • paddleOCR (release 2.5); release 2.5 is referred to as paddleOCRv3 throughout

  • openVINO 2021.4

  • opencv 4.4.0

  • paddlepaddle-gpu: 2.2.2.post111

  • paddle2onnx: 0.9.8

  • Result example

  • Speed test (CPU)

openVINO version 2021.4.2, CPU i5-10400 @ 2.90 GHz. The times below are the average over 1000 runs of pre/post-processing plus model inference.

| Model input shape | Precision | Time (ms) | CPU usage (approx.) |
| ----------------- | --------- | --------- | ------------------- |
| 1x3x48x320        | FP32      | 15.668    | 56%                 |
| 1x3x48x200        | FP32      | 9.157     | 60%                 |
| 1x3x48x160        | FP32      | 7.968     | 71%                 |
| 1x3x48x320        | FP16      | 3.897     | 66%                 |
| 1x3x48x200        | FP16      | 2.724     | 67%                 |
| 1x3x48x160        | FP16      | 2.440     | 68%                 |

2. Converting the paddleOCR model to ONNX

2.1 Converting the paddle dynamic-graph model to a static-graph (inference) model

The paddleOCRv3 project provides a conversion script, tools/export_model.py. Run it directly from the command line, specifying the config file, the trained model file, and the output directory, as follows:

The trained model is under ./output/myOCR_model.

python tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.pretrained_model=./output/myOCR_model/best_accuracy  Global.save_inference_dir=./inference/myOCR_model/


The steps above produce the inference model (inference.pdmodel and inference.pdiparams) under ./inference/myOCR_model/.


2.2 Converting the model to ONNX format with paddle2onnx

paddle2onnx --model_dir D:\myAPP\pythonDoc\PaddleOCRv3\inference\myOCR_model --model_filename inference.pdmodel --params_file inference.pdiparams --save_file D:\myAPP\pythonDoc\PaddleOCRv3\inference\myOCR_model/onnx/PaddleOCRv3.onnx --opset_version 11


This produces the ONNX model PaddleOCRv3.onnx.


3. Converting the ONNX model to an IR model with openVINO

Use netron to check the input node of the ONNX model.


Note: in the ONNX model the batch size and image width of the input are dynamic (shown as -1), but openVINO 2021.4 does not support automatic shape inference, so the input shape must be fixed; specify it during conversion with --input_shape=[1,3,48,320]. The model optimizer's options are not covered in detail here; use --help to see the full usage.

python "G:\openVINO\install\openvino_2021.4.752\deployment_tools\model_optimizer\mo.py" --input_model="K:\model\PaddleOCR\onnx\ppocr.onnx" --output_dir="K:\model\PaddleOCR\onnx\opv" --model_name="ppocr" --data_type=FP32 --input_shape=[1,3,48,320]


This produces the IR model files ppocr.xml and ppocr.bin.

4. Deployment with openVINO + openCV

4.1 Visual Studio configuration

Configuration options: the configuration is the same across the openVINO 2021 series, and openVINO 2021.4 is Intel's long-term support release. opencv is only used for image handling and has nothing to do with the inference itself; the preprocessing stage uses cv::copyMakeBorder for padding, which every opencv 4.x version seems to support (I am not sure from which 3.x version it became available), so if you just want to copy the code and run it, use a 4.x version.

I use openVINO 2021.4 and opencv 4.4.0.

  • Additional include directories

  • Additional library directories

  • Linker / Input (additional dependencies)

  • DLL dependencies
    The DLLs can be made available in three ways: 1) add their locations to the system PATH environment variable; 2) copy them into the same directory as the exe; 3) set them in the VS project configuration under Debugging / Environment by appending the DLL paths. Mine is the following (typical values for the properties above are sketched after it):

path=G:\openVINO\install\openvino_2021.4.752\deployment_tools\ngraph\lib;D:\opencv440\opencv\build\x64\vc15\bin;G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\external\tbb\bin;G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\bin\intel64\Release;
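
For reference, a typical set of values for the properties listed above, assuming the install locations that appear in the path line and the stock library names shipped with openVINO 2021.4 and the prebuilt opencv 4.4.0 package (a sketch, not my exact project file; adjust to your own paths):

    Additional include directories:
        G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\include
        D:\opencv440\opencv\build\include
    Additional library directories:
        G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\lib\intel64\Release
        D:\opencv440\opencv\build\x64\vc15\lib
    Linker > Input > Additional Dependencies:
        inference_engine.lib
        opencv_world440.lib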


4.2 Deployment

The openVINO calling flow: load the model, prepare the input and output blobs, then loop over fill the input blob with data ==> infer ==> postprocess. See the official documentation for the details. At the time I searched online for something ready-made to reuse, but nothing fit well; laziness does not pay, so I sat down and read the documentation patiently, and the more I read the easier the English became.
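
Before diving into the full code in section 5, here is a condensed sketch of that flow with the Inference Engine API (the node names "x" and "softmax_2.tmp_0" are the ones from my model; preprocessing and error handling are omitted):

    // Condensed sketch of the Inference Engine flow used in the full code below
    #include "inference_engine.hpp"

    void sketch()
    {
    	InferenceEngine::Core core;
    	// load the IR (the .bin is found next to the .xml) onto the CPU plugin
    	auto execNet = core.LoadNetwork("ppocr.xml", "CPU");
    	auto request = execNet.CreateInferRequest();

    	// blobs are addressed by the node names seen in netron
    	auto inputBlob  = request.GetBlob("x");                 // 1x3x48x320, FP32, NCHW
    	auto outputBlob = request.GetBlob("softmax_2.tmp_0");   // 1x40x39, FP32

    	float* in = inputBlob->buffer().as<float*>();
    	// ... fill `in` with the preprocessed image in channel-planar (CHW) order ...

    	request.Infer();                                        // synchronous inference

    	const float* out = outputBlob->buffer().as<float*>();
    	// ... greedy CTC decoding of `out` (see paddleOCRPostProcess below) ...
    }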

In the postprocessing stage the output shape is [batch, 40, 39]: 40 means at most 40 characters can be recognized, and 39 is the dictionary size + 1, where the first element is blank and the rest is your dictionary. My dictionary here is std::string dict = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "; before training I did not notice an extra blank line in the txt dictionary file, so the total character count is 1 + 10 + 26 + 1 = 38. During training a blank token is automatically inserted at the first position, which is why the prediction at index 0 has to be excluded when decoding the output.

As for decoding the characters: treat the output as a 40-row by 39-column matrix and take the column index of the maximum value in each row, giving 40 indices. Check whether the first index is zero (zero means the prediction is blank); if it is not, keep it. Then, from the second index onward, compare each index with the previous one and keep it only if it differs from the previous index and is not zero. After walking through all 40 indices you have the final index list.

Example

maxIndex: the index of the maximum value in each row, length 40

[0, 23, 0, 0, 4, 4, 4, 0, 0, 35, ...]

The kept indices are

[23, 4, 35, ...]
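
To make the rule concrete, here is a minimal standalone sketch of this greedy collapse, run on the example sequence above (the full version, with probabilities and the dictionary lookup, is paddleOCRPostProcess in section 5):

    #include <iostream>
    #include <vector>

    int main()
    {
    	// per-row argmax indices from the example above (0 means blank)
    	std::vector<int> maxIndex = { 0, 23, 0, 0, 4, 4, 4, 0, 0, 35 };

    	std::vector<int> kept;
    	for (size_t i = 0; i < maxIndex.size(); ++i)
    	{
    		// keep an index only if it is not blank and differs from the previous raw index
    		if (maxIndex[i] != 0 && (i == 0 || maxIndex[i] != maxIndex[i - 1]))
    			kept.push_back(maxIndex[i]);
    	}

    	for (int idx : kept)
    		std::cout << idx << " ";   // prints: 23 4 35
    	std::cout << std::endl;
    	return 0;
    }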

See the postprocessing code for details. Talk is cheap, show me the code.


5. Full source code

  • paddleOCR.h

    #pragma once
    #ifndef _PADDLEOCR_H_
    #define _PADDLEOCR_H_
    
    #include "opencv.hpp"
    #include "inference_engine.hpp"
    
    void showImage(const cv::Mat & image, std::string name, int waitMode = 1, int windowMode = 1);
    
    /*
    Function name:
    	normalizeImage
    Parameters:
    	@image:	Source image
    	@out:	Destination image
    	@mean:	per-channel mean of the training dataset
    	@stdv:	per-channel standard deviation of the training dataset

    Converts image from [0,255] to [0,1], then image -= mean, image /= stdv.
    */
    void normalizeImage(const cv::Mat & image, cv::Mat & out, std::vector<double>mean, std::vector<double>stdv);
    
    /*
    Function name:
    	paddingImage
    Parameters:
    	@image:	Source image
    	@out:	Destination image of the same type as src and of size Size(src.cols+left+right,
    				src.rows+top+bottom)
    	@top:	top pixels
    	@left:	left pixels
    	@bottom:	bottom pixels
    	@right:	right pixels
    	@borderType:	commonly cv::BORDER_REPLICATE or cv::BORDER_CONSTANT,
    				more details: https://docs.opencv.org/3.4/d2/de8/group__core__array.html#ga209f2f4869e304c82d07739337eae7c5
    	@value:	border value if borderType==BORDER_CONSTANT
    This function uses opencv cv::copyMakeBorder to pad the image.
    More details:	https://docs.opencv.org/3.4/d2/de8/group__core__array.html#ga2ac1049c2c3dd25c2b41bffe17658a36
    */
    void paddingImage(const cv::Mat & image, cv::Mat & out,
    	int top, int left, int bottom, int right,
    	int borderType, const cv::Scalar& value = cv::Scalar());
    
    
    /*
    Function name:
    	paddleOCRPreprocess
    Parameters:
    	@image:		Source image
    	@out:		Destination image
    	@targetHeight:	target image height; in paddleOCRv3 the input height is 48
    	@targetWidth:	target image width; in paddleOCRv3 the input width is 320
    	@mean:	per-channel mean of the training set
    	@stdv:	per-channel standard deviation of the training set; mean.size() and stdv.size() must equal image.channels()
    Briefs:
    	image ==> resize ==> normalize ==> padding
    */
    void paddleOCRPreprocess(const cv::Mat & image, cv::Mat & out, const int targetHeight, const int targetWidth,
    											std::vector<double>mean, std::vector<double>stdv);
    
    
    void paddleOCRPostProcess(cv::Mat &output, std::string &result, float &prob);
    
    void demo();
    
    #endif // !_PADDLEOCR_H_
    
    
    
  • paddleOCR.cpp

#include <iostream>
#include <time.h>

#include "opencv.hpp"

#include "paddleOCR.h"

#define SPEED_TEST


void showImage(const cv::Mat & image, std::string name, int waitMode, int windowMode)
{
	if (image.empty())
	{
		std::cout << "ERROR: In showImage the input image is empty!\n";
		return;
	}
	if (waitMode < 0)
		waitMode = 0;
	if (windowMode != 0 && windowMode != 1)
		windowMode = cv::WINDOW_AUTOSIZE;


	cv::namedWindow(name, windowMode);
	cv::imshow(name, image);
	cv::waitKey(waitMode);
}


void normalizeImage(const cv::Mat &image, cv::Mat & out, std::vector<double>mean, std::vector<double>stdv)
{
	if (image.empty())
		throw "normalizeImage input image is empty()!";
	if (mean.size() != stdv.size())
		throw "normalizeImage mean.size() != stdv.size()!";
	if (mean.size() != image.channels())
		throw "normalizeImage mean.size() != image.channels()";

	for (double stdv_item : stdv)
	{
		//a zero standard deviation means all pixels in that channel are identical
		if (stdv_item == 0)
			throw "normalizeImage stdv is zero";
	}

	image.convertTo(out, CV_32F, 1.0 / 255.0f, 0);	//scale [0,255] to [0,1]

	if (out.channels() == 1)
	{
		out -= mean[0];
		out /= stdv[0];
	}
	else if (out.channels() > 1)
	{
		std::vector<cv::Mat> channelImage;
		cv::split(out, channelImage);
		for (int i = 0; i < out.channels(); i++)
		{
			channelImage[i] -= mean[i];
			channelImage[i] /= stdv[i];
		}
		cv::merge(channelImage, out);
	}

	return;
}


void paddingImage(const cv::Mat & image, cv::Mat & out,
	int top, int left, int bottom, int right,
	int borderType, const cv::Scalar& value)
{
	if (image.empty())
		throw  "padding input image is empty()!";

	//note: cv::copyMakeBorder takes the borders in the order top, bottom, left, right
	cv::copyMakeBorder(image, out, top, bottom, left, right, borderType, value);

	return;
}



void paddleOCRPreprocess(const cv::Mat & image, cv::Mat & out, const int targetHeight, const int targetWidth,
											std::vector<double>mean,std::vector<double>stdv)
{
	if (image.empty())
		throw "paddleOCRPreprocess : input image is empty()\n";
	if (targetHeight <= 0 || targetWidth <= 0)
		throw "paddleOCRPreprocess target size error targetHeight<=0 || targetWidth<=0";

	//Resize image 
	//Adjust the height of the original image to match the height of the target image
	int sourceWidth = image.cols;
	int sourceHeight = image.rows;
	//double targetWHRatio = (double)targetWidth / targetHeight;
	double sourceWHRatio = (double)sourceWidth / sourceHeight;

	int newHeight = targetHeight;
	int newWidth = newHeight * sourceWHRatio;
	
	if (newWidth > targetWidth)
		newWidth = targetWidth;
	cv::resize(image, out, cv::Size(newWidth, newHeight));

	//Normalize image
	normalizeImage(out, out, mean, stdv);
	
	//Padding image
	//the resized image's height always equals targetHeight, but the width may be smaller
	if (newWidth < targetWidth)
	{
		int right = targetWidth - newWidth;
		//paddingImage(out, out, 0, 0, 0, right, cv::BORDER_REPLICATE);// pad by replicating the border pixels
		paddingImage(out, out, 0, 0, 0, right, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));// pad with zeros
	}
	//showImage(out, "padding",1,0);
}


void paddleOCRPostProcess(cv::Mat &output, std::string &result, float &prob)
{
	//index 0 is the CTC blank placeholder ('.'); the trailing space is the last dictionary character
	std::string dict = ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
	if (output.empty())
		return ;

	result = "";

	int h = output.rows;
	int w = output.cols;
	std::vector<int> maxIndex;
	std::vector<float>maxProb;

	double maxVal;
	cv::Point maxLoc;
	for (int row = 0; row < h; row++)
	{
		cv::Mat temp(1, w,CV_32FC1,output.ptr<float>(row));
		cv::minMaxLoc(temp, NULL, &maxVal, NULL, &maxLoc);
		maxIndex.push_back(maxLoc.x);
		maxProb.push_back((float)maxVal);
	}

	std::vector<int>selectedIndex;
	std::vector<float>selectedProb;
	//keep the positions in maxIndex whose value differs from the previous index and is not 0
	//check the first element separately
	if (maxIndex.size() != 0 && maxIndex[0] != 0)
	{
		selectedIndex.push_back(maxIndex[0]);
		selectedProb.push_back(maxProb[0]);        
	}
	for (int i = 1; i < maxIndex.size() ; i++)
	{
		if (maxIndex[i] != maxIndex[i - 1] && maxIndex[i] != 0)
		{
			selectedIndex.push_back(maxIndex[i]);
			selectedProb.push_back(maxProb[i]);
		}
	}

	double meanProb = 0;
	for (int i = 0; i < selectedIndex.size(); i++)
	{
		result += dict[selectedIndex[i]];
		meanProb += selectedProb[i];
	}
	if (selectedIndex.size() == 0)
		meanProb = 0;
	else
		meanProb /= selectedIndex.size();
	prob = meanProb;
	return ;
}


void demo()
{
	using namespace std;
	string xmlPath, binPath, imageDirs;
	xmlPath = "K:\\model\\PaddleOCR\\onnx\\opv\\ppocr.xml";
	binPath = "K:\\model\\PaddleOCR\\onnx\\opv\\ppocr.bin";
	imageDirs = "K:\\imageData\\OCR\\ocr_dataset\\test";
	//imageDirs = "\\\\192.168.1.247\\Pictures\\imageAndModel\\paddle_OCR_dataset\\test\\real\\t2x";
	//imageDirs = "K:\\imageData\\OCR\\ocr_dataset\\bg_black";

	string inputNodeName = "x", outputNodeName = "softmax_2.tmp_0";
	      
	vector<double>mean = { 0.5,0.5,0.5 };
	vector<double>stdv = { 0.5,0.5,0.5 };
	const int targetHeight = 48;
	const int targetWidth = 320;

	vector<cv::String> imagePathList;
	cv::glob(imageDirs, imagePathList,1);


	//1. Create Inference Engine Core
	InferenceEngine::Core core;
	//InferenceEngine::CNNNetwork network;
	InferenceEngine::ExecutableNetwork executable_network;

	//2. (Optional). Configure Input and Output of the Model

	//3. Load the Model to the Device
	executable_network = core.LoadNetwork(xmlPath, "CPU");
		// show some information
	InferenceEngine::ConstInputsDataMap inputInfo = executable_network.GetInputsInfo();
	for (auto inputIter : inputInfo)
	{
		cout << "input node name :" << inputIter.first << endl;;
	}
	InferenceEngine::ConstOutputsDataMap outputInfo = executable_network.GetOutputsInfo();
	for (auto outputIter : outputInfo)
	{
		cout << "output node name : " << outputIter.first << endl;
		//with netron you can see that the output node name is: softmax_2.tmp_0
	}

	//4. Create an Inference Request
	InferenceEngine::InferRequest inferRequest =	executable_network.CreateInferRequest();
	inferRequest.Infer(); //warmup

	//5.1 Prepare input blob
		//input data must be aligned (resized manually) with a given blob size and have a correct color format
	InferenceEngine::Blob::Ptr inputBlobPtr = inferRequest.GetBlob(inputNodeName);
	InferenceEngine::SizeVector inputSize = inputBlobPtr->getTensorDesc().getDims();
	auto inputdata = inputBlobPtr->buffer()
		.as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type *>(); 
		//let's see the input size (NCHW)
	cout << "the input blob size: ";
	for (size_t item : inputSize)
		cout << item << ",";
	cout << endl;

	//5.2 Prepare output blob
	InferenceEngine::Blob::Ptr outputBlobPtr = inferRequest.GetBlob(outputNodeName);
	auto outputData = outputBlobPtr->buffer().
		as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type *>();//could this be fetched once up front instead of on every call?
	InferenceEngine::SizeVector outputSize = outputBlobPtr->getTensorDesc().getDims();
	cout << "output blob size:";
	for (size_t item : outputSize)
		cout << item << ",";
	cout << endl;


	cv::Mat image, input, output;
	for (cv::String imageDir : imagePathList)
	{
		image = cv::imread(imageDir);
		if (image.empty())
			continue;

		showImage(image, "original image",1,0);
#ifdef SPEED_TEST
		static double totalTime = 0;
		static double totalNum = 0;
		clock_t  start_time = clock();
#endif // SPEED_TEST

		size_t channels		= inputSize[1];
		size_t inputHeight	= inputSize[2];
		size_t inputWidth	= inputSize[3];
		size_t imageSize	= inputHeight * inputWidth;

		//Preprocess
		paddleOCRPreprocess(image, input, targetHeight, targetWidth, mean, stdv);

		//Prepare input data
		for (size_t pid = 0; pid < imageSize; ++pid)
		{
			for (size_t ch = 0; ch < channels; ++ch)
			{
				inputdata[imageSize*ch + pid] = input.at<cv::Vec3f>(pid)[ch];
			}
		}

		//6. Infer
		//for(int i=0;i<100;i++)
		inferRequest.Infer();
		 
		 cv::Mat temp(outputSize[1],outputSize[2],CV_32FC1,outputData );
		 output = temp;
		 //cout << output;
		 std::string result;
		 float prob;

		 //7. Postprocess
		 paddleOCRPostProcess(output,result,prob);

#ifdef SPEED_TEST
		 clock_t end_time = clock();
		 double  run_time = 1000.0 * (end_time - start_time) / CLOCKS_PER_SEC;
		 totalNum+=1;
		 totalTime += run_time;
		 cout << "run time:" << run_time << endl;;
		 cout << "total time: " << totalTime <<", totalNum: "<<totalNum<<", mean time: "<< totalTime/totalNum<<endl;;
#endif // SPEED_TEST


		 cv::Mat textScore = cv::Mat::zeros(200, 200, CV_8UC3);
		 cv::putText(textScore, "text:" + result, cv::Point(10, 50), 1, 1, cv::Scalar(0, 255, 0));
		 cv::putText(textScore, "score:" + std::to_string((int)(prob*100)), cv::Point(10, 120), 1, 1, cv::Scalar(0, 2, 250));
		 showImage(textScore, "textScore");

		 int c = cv::waitKey(0);
		 if (c == 27)
			 break;

	}

	return;
}
  • main.cpp

    #include <iostream>
    #include "paddleOCR.h"
    
    
    int main()
    {
    	std::cout << "Hello World!\n";
    	demo();
    
    	//testFunction();
    }
    
    
    