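1. Local environment setup
1.1 First install CUDA Toolkit, TensorRT, cuDNN, OpenCV, Paddle Inference, PaddleOCR, CMake, Visual Studio 2019, Qt, and Inno Setup. The versions of items 1, 2, 3, 5, and 6 in this list (CUDA Toolkit, TensorRT, cuDNN, Paddle Inference, PaddleOCR) must be compatible with each other; otherwise building the exe or dll fails. On my first install I did not strictly follow the versions listed in 1.2, and the exe build failed with: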
2>Generating Code...
2>E:\paddle_ocr\projects\paddle_inference\paddle\lib\paddle_inference.dll : fatal error LNK1107: invalid or corrupt file: cannot read at 0x3C0
2>Done building project "ppocr.vcxproj" -- FAILED.
3>------ Skipped Rebuild All: Project: ALL_BUILD, Configuration: Release x64 ------
3>Project not selected to build for this solution configuration
========== Rebuild All: 1 succeeded, 1 failed, 1 skipped ==========
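My guess is that the failure came from mismatched version dependencies; for the exact compatibility matrix, see the version table in this article. LNK1107 means the linker could not parse one of its inputs as an object file or library; since the file named here is a .dll rather than a .lib, it is also worth checking what the CMake configuration passes to the linker. After installing the packages above, configure the corresponding system environment variables.
1.2 The exact versions I installed are: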
- cudatoolkit: 11.2.0
- tensorrt: 8.0.1.6
- cudnn: 8.2.0.53
- opencv: 4.5.2
- paddle inference: 2.3.2
- paddle ocr: 2.5
- cmake: 3.26.3
- visual studio 2019
- qt: 5.12.10
1.3 The relevant system environment variables are configured as follows:
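As a quick sanity check of the environment variables, a small helper can report which ones are missing before you start a build. This is only a sketch: the function name MissingEnvVars is hypothetical, and the variable names you pass in are your own choice (CUDA_PATH is set by the CUDA installer; any others depend on how you configured your machine).

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Returns the names from `required` that are not set (or are set to an
// empty string) in the environment of the current process.
std::vector<std::string> MissingEnvVars(const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        // std::getenv returns nullptr when the variable does not exist.
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}
```

For example, MissingEnvVars({"CUDA_PATH"}) returns an empty vector when CUDA_PATH is set. MSVC may warn that std::getenv is deprecated (C4996); the call itself is standard C++.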
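2. Following the official documentation, configure the build with cmake-gui and generate the DLL with VS2019
2.1 In cmake-gui, select the PaddleOCR source folder to build and the folder for the build output.
2.2 Click the Configure button at the bottom of the window. On the first click a dialog asks for the Visual Studio configuration: select your Visual Studio version, choose x64 as the target platform, and click Finish to start the configuration. The first Configure run reads CMakeLists.txt and fails because it cannot find the required libraries. Set the paths of the third-party libraries to where you actually installed them. Because I am building the GPU version, the four options "WITH_GPU", "WITH_MKL", "WITH_STATIC_LIB", and "WITH_TENSORRT" must all be checked. Clicking Configure again should then succeed. Next click "Generate"; once "Generate done" appears, click "Open Project" to open the project in VS2019.
My cmake-gui configuration is as follows:
2.3 Configure the project property pages in VS2019. On the General page set "Configuration Type" to "Dynamic Library (.dll)", and under Advanced set "Target File Extension" to ".dll". Other settings, such as "Additional Include Directories" under C/C++ > General and "Additional Library Directories" under Linker, were already filled in when cmake-gui generated the project and do not need to be added by hand. My property pages look like this:
2.4 Because the code is exported as a DLL for other projects to call, main.cpp in the PaddleOCR source has to be modified to define an exported image-processing function, ImageProcess(). The function instantiates a PPOCR object, runs text detection and text recognition on the image, and returns the results; it also uses the utility visualization function VisualizeBboxes() to draw the detection boxes on the input image. Besides modifying main.cpp, an extra header, ppocr.h, is added to declare the exported function. The source of both files follows.
2.4.1 Contents of ppocr.h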
#pragma once
#include <vector>
#include <string>
#ifndef IMAGE_API
#define IMAGE_API
struct TextDetectionResult {
std::vector<std::vector<int>> boxes;
};
struct TextRecognitionResult {
std::string text;
double score;
};
extern "C" {
// Image inference
__declspec(dllexport) void ImageProcess(const char* image_dir, TextDetectionResult*** detection_results, int* num_detection_results,
TextRecognitionResult*** recognition_results, int* num_recognition_results);
/*__declspec(dllexport) void FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results);*/
}
#endif
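One caveat with the header above: TextDetectionResult and TextRecognitionResult contain std::vector and std::string, whose memory layout depends on the compiler version and build settings, so the DLL and every caller need to be built with matching toolsets and configurations. A more conservative alternative is to pass only plain C data across the DLL boundary. The sketch below is not what this project does; OcrItemC, ToCItem, and FreeCItem are hypothetical names, and it assumes each box has four [x, y] corner points.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Plain-C result record: four corner points (x, y), a heap-allocated
// NUL-terminated string, and a score. No STL type crosses the boundary.
struct OcrItemC {
    int box[4][2];
    char* text;
    double score;
};

// Copies one box and its text into the C record. Points beyond the fourth
// are ignored; missing points or coordinates are left as zero.
OcrItemC ToCItem(const std::vector<std::vector<int>>& box,
                 const std::string& text, double score) {
    OcrItemC item{};  // zero-initializes box, text, and score
    for (std::size_t i = 0; i < box.size() && i < 4; ++i) {
        if (box[i].size() >= 2) {
            item.box[i][0] = box[i][0];
            item.box[i][1] = box[i][1];
        }
    }
    item.text = new char[text.size() + 1];
    std::memcpy(item.text, text.c_str(), text.size() + 1);  // includes the NUL
    item.score = score;
    return item;
}

// Releases the string owned by the record. This should run in the same
// module (the DLL) that called ToCItem.
void FreeCItem(OcrItemC* item) {
    delete[] item->text;
    item->text = nullptr;
}
```

With a layout like this, the release function would be exported from the DLL alongside ImageProcess() so that allocation and deallocation happen in the same module.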
2.4.2 Contents of main.cpp
#include <algorithm>  // std::replace
#include <iostream>
#include <include/paddleocr.h>
#include <include/args.h>
#include "ppocr.h"
using namespace PaddleOCR;
// Function that processes an image
void ImageProcess(const char* image_dir, TextDetectionResult*** detection_results, int* num_detection_results,
TextRecognitionResult*** recognition_results, int* num_recognition_results)
{
std::cout << "--------" << image_dir << "-------" << std::endl;
std::string dir(image_dir);
std::replace(dir.begin(), dir.end(), '/', '\\');
std::cout << "--------" << dir << "-------" << std::endl;
std::vector<cv::String> cv_all_img_names;
cv::glob(image_dir, cv_all_img_names);
std::cout << "total images num: " << cv_all_img_names.size() << std::endl;
PPOCR ocr = PPOCR();
std::cout << "begin process" << std::endl;
std::vector<std::vector<OCRPredictResult>> ocr_results = ocr.ocr(cv_all_img_names, FLAGS_det, FLAGS_rec, FLAGS_cls);
std::cout << "finish process" << std::endl;
auto ocr_result = ocr_results[0];
std::vector<TextDetectionResult> detectionResults;
std::vector<TextRecognitionResult> recognitionResults;
for (int i = 0; i < ocr_result.size(); i++) {
if (ocr_result[i].score != -1.0) {
TextDetectionResult detectionResult;
detectionResult.boxes = ocr_result[i].box;
TextRecognitionResult recognitionResult;
recognitionResult.text = ocr_result[i].text;
recognitionResult.score = ocr_result[i].score;
detectionResults.push_back(detectionResult);
recognitionResults.push_back(recognitionResult);
}
}
*num_detection_results = detectionResults.size();
*detection_results = new TextDetectionResult * [*num_detection_results];
for (int i = 0; i < *num_detection_results; i++) {
(*detection_results)[i] = new TextDetectionResult(detectionResults[i]);
}
*num_recognition_results = recognitionResults.size();
*recognition_results = new TextRecognitionResult * [*num_recognition_results];
for (int i = 0; i < *num_recognition_results; i++) {
(*recognition_results)[i] = new TextRecognitionResult(recognitionResults[i]);
}
std::cout << "in the end" << std::endl;
if (*num_recognition_results == 0) {
std::cout << "result is null" << std::endl;
}
// const char* img_dir = "E:\\paddlepaddle\\projects\\PaddleOCR-release-2.5\\deploy\\cpp_infer\\qt_project\\qt4ocr\\imgs\\1.jpg";
cv::Mat srcimg = cv::imread(dir, cv::IMREAD_COLOR);
if (!srcimg.data) {
std::cerr << "[ERROR] image read failed! image path: " << dir << std::endl;
// Return instead of calling exit(): exit() would terminate the host process that loaded the DLL.
return;
}
std::string file_name = Utility::basename(image_dir);
Utility::VisualizeBboxes(srcimg, ocr_results[0],
FLAGS_output + file_name);
std::cout << "***************************" << std::endl;
}
void FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results)
{
for (int i = 0; i < num_detection_results; i++) {
delete detection_results[i];
}
delete[] detection_results;
for (int i = 0; i < num_recognition_results; i++) {
delete recognition_results[i];
}
delete[] recognition_results;
}
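The path handling in ImageProcess() is easy to get wrong, so here is a minimal stand-alone sketch of the two steps it performs: converting forward slashes to backslashes before the image is read, and deriving the output file name that VisualizeBboxes() writes to. ToWindowsPath, BaseName, and OutputPath are hypothetical helpers; BaseName only imitates what Utility::basename is used for here, and "./output/" is an assumed value for FLAGS_output.

```cpp
#include <algorithm>
#include <string>

// Replaces forward slashes with backslashes, as ImageProcess() does
// before passing the path to cv::imread.
std::string ToWindowsPath(std::string path) {
    std::replace(path.begin(), path.end(), '/', '\\');
    return path;
}

// Returns the part after the last '/' or '\\', or the whole string when
// there is no separator (a stand-in for Utility::basename).
std::string BaseName(const std::string& path) {
    const std::string::size_type pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Builds the path of the visualized image: output directory + file name.
std::string OutputPath(const std::string& output_dir, const std::string& image_path) {
    return output_dir + BaseName(image_path);
}
```

For an input of "E:/imgs/1.jpg" and an output directory of "./output/", the result image would be written to "./output/1.jpg", which is the location the Qt program in section 4 loads it from.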
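2.5 The paths of the text detection and text recognition models are written into args.cpp as relative paths, so the detection-model and recognition-model drop-down boxes in the Qt program cannot actually be used to choose a model. This is something to improve.
2.6 Right-click the solution and build it to generate the dll and lib files in the output directory. At first, when I defined the exported function ImageProcess(), its return type was a C++ type; because the function is declared C-style (extern "C"), only the dll was generated and no lib file, so the first time I called the dll I could not get the detection and recognition results.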
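Some background that may help when debugging this step: the test program in section 3 looks up ImageProcess by name with GetProcAddress, which only works when the exported name is not C++-mangled, and that is what extern "C" provides. As far as I know, the linker also writes an import .lib only when the DLL exports at least one symbol, so a missing .lib is a hint that nothing was actually marked __declspec(dllexport). A common pattern is an export macro shared by the DLL and its clients. The sketch below uses hypothetical names (PPOCR_API, PPOCR_BUILD_DLL, PpocrAbiVersion) that are not part of this project.

```cpp
// This file plays the DLL side, so it defines the (hypothetical) build
// macro itself; in a real project the DLL's build settings would define it
// and client code would leave it undefined.
#define PPOCR_BUILD_DLL

#if defined(_WIN32)
#  if defined(PPOCR_BUILD_DLL)
#    define PPOCR_API __declspec(dllexport)  // building the DLL
#  else
#    define PPOCR_API __declspec(dllimport)  // using the DLL
#  endif
#else
#  define PPOCR_API  // nothing to annotate on other platforms in this sketch
#endif

extern "C" {
// Hypothetical exported function with C linkage: a client could look it up
// with GetProcAddress(hModule, "PpocrAbiVersion").
PPOCR_API int PpocrAbiVersion() { return 1; }
}
```

Keeping the parameter and return types of extern "C" functions to plain C types also avoids compiler warnings about C linkage functions that use C++ class types.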
3. Testing the DLL
This part is based on this blog post: https://blog.csdn.net/weixin_45052870/article/details/126491550
3.1 In VS2019, create a new project test_dll with a single source file, main.cpp, with the following contents:
#include <iostream>
#include <vector>
#include <string>
#include <Windows.h>
struct TextDetectionResult {
std::vector<std::vector<int>> boxes;
};
struct TextRecognitionResult {
std::string text;
double score;
};
typedef void (*ImageProcessFunc)(const char* image_dir, TextDetectionResult*** detection_results, int* num_detection_results,
TextRecognitionResult*** recognition_results, int* num_recognition_results);
void FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results)
{
for (int i = 0; i < num_detection_results; i++) {
delete detection_results[i];
}
delete[] detection_results;
for (int i = 0; i < num_recognition_results; i++) {
delete recognition_results[i];
}
delete[] recognition_results;
}
int main()
{
system("chcp 65001");
const char* image_dir = "..//imgs//11.jpg";
const char* dll_path = ".//ppocr.dll";
HMODULE hModule = LoadLibraryA(dll_path);
if (hModule == NULL) {
std::cout << "Failed to load the DLL." << std::endl;
return 1;
}
ImageProcessFunc ImageProcess = (ImageProcessFunc)GetProcAddress(hModule, "ImageProcess");
if (ImageProcess == NULL) {
std::cout << "Failed to get the function address." << std::endl;
FreeLibrary(hModule);
return 1;
}
TextDetectionResult** detection_results = nullptr;
int num_detection_results = 0;
TextRecognitionResult** recognition_results = nullptr;
int num_recognition_results = 0;
ImageProcess(image_dir, &detection_results, &num_detection_results, &recognition_results, &num_recognition_results);
if (num_detection_results > 0) {
std::cout << "get the result" << std::endl;
for (int i = 0; i < num_detection_results; i++) {
std::vector<std::vector<int>> boxes = detection_results[i]->boxes;
std::cout << "det boxes: [";
for (int n = 0; n < boxes.size(); n++) {
std::cout << '[' << boxes[n][0] << ',' << boxes[n][1] << "]";
if (n != boxes.size() - 1) {
std::cout << ',';
}
}
std::string recognitionResult = recognition_results[i]->text;
std::cout << "] " << " " << " recognition text : " << recognitionResult << std::endl;
}
}
std::cout << "begin clear" << std::endl;
// Free the memory
FreeMemory(detection_results, num_detection_results, recognition_results, num_recognition_results);
// Unload the DLL
FreeLibrary(hModule);
std::cout << "clear over" << std::endl;
return 0;
}
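In the test program, FreeMemory() has to be called by hand on every exit path. One way to make that harder to forget is to hand the raw T** arrays to a std::unique_ptr with a custom deleter. This is a sketch with hypothetical names (ResultArrayDeleter, ResultArray, AdoptResults); it changes only who remembers to free the memory, not which module frees it, so it is only safe when the DLL and the caller share the same C++ runtime.

```cpp
#include <memory>

// Deleter for the T** arrays filled in by ImageProcess(): deletes every
// element, then the array itself, mirroring FreeMemory().
template <typename T>
struct ResultArrayDeleter {
    explicit ResultArrayDeleter(int n = 0) : count(n) {}
    void operator()(T** items) const {
        if (items == nullptr) return;
        for (int i = 0; i < count; ++i) delete items[i];
        delete[] items;
    }
    int count;
};

// Owning handle for an array of T* that remembers its element count.
template <typename T>
using ResultArray = std::unique_ptr<T*[], ResultArrayDeleter<T>>;

// Takes ownership of `items`; the array is released when the returned
// handle goes out of scope.
template <typename T>
ResultArray<T> AdoptResults(T** items, int count) {
    return ResultArray<T>(items, ResultArrayDeleter<T>(count));
}
```

After the ImageProcess() call, something like `auto det = AdoptResults(detection_results, num_detection_results);` would replace the explicit FreeMemory() call for the detection array, and `det[i]->boxes` reads an element as before.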
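3.2 The test_dll project configuration is as follows:
3.3 Copy ppocr.dll, ppocr.lib, and the other required dll files into the directory containing test_dll.exe. Double-clicking test_dll.exe makes the console window flash and close, so open cmd in that directory and run "test_dll.exe .\imgs\11.jpg". Note that the test program above hard-codes the image path in image_dir, so the command-line argument is not actually read. The terminal then prints the results, including the detection and recognition output.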
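Since main() in the test program takes no arguments, the path given on the command line has no effect. If you want the argument to be honored, a small helper along these lines could choose between it and the built-in default. PickImagePath is a hypothetical name, and main() would need the signature int main(int argc, char** argv).

```cpp
#include <string>

// Returns argv[1] when a non-empty image path was passed on the command
// line, otherwise the given fallback path.
std::string PickImagePath(int argc, const char* const* argv,
                          const std::string& fallback) {
    if (argc > 1 && argv[1] != nullptr && argv[1][0] != '\0') {
        return argv[1];
    }
    return fallback;
}
```

In the test program this would be used as `std::string image = PickImagePath(argc, argv, "..//imgs//11.jpg");`, passing image.c_str() to ImageProcess().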
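4. Writing the Qt demo program
4.1 In VS2019, create a new "Qt Widgets Application" project.
4.2 In the project properties, under C/C++ > Additional Include Directories, add the same paths as in the ppocr project plus the Qt include folders.
4.3 Under Linker > General, add the directory containing the DLL generated by the ppocr project, and under Linker > Input add ppocr.lib.
The configuration is as follows:
4.4 The UI layout is as follows:
4.5 Some of the interface styling is set through "Change styleSheet". I imported a few resources prepared in advance, such as the logo, custom images for the minimize, maximize, and close buttons, and the .ico file shown for the running exe.
4.6 The Qt program consists mainly of qt4ocr.cpp and qt4ocr.h. qt4ocr.cpp contains the layout setup, the memory cleanup function, the functions for dragging the window with the mouse, and the slots for several buttons: NormalWindow() restores the normal window size, MaxWindow() maximizes the window, UploadImage() uploads an image, and ProcessImage() processes it.
4.6.1 Source of qt4ocr.cpp: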
#include "qt4ocr.h"
void QT4OCR::FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results)
{
for (int i = 0; i < num_detection_results; i++) {
delete detection_results[i];
}
delete[] detection_results;
for (int i = 0; i < num_recognition_results; i++) {
delete recognition_results[i];
}
delete[] recognition_results;
}
QT4OCR::QT4OCR(QWidget *parent)
: QWidget(parent)
{
ui.setupUi(this);
// Remove the native window frame
setWindowFlags(Qt::FramelessWindowHint);
// Vertical layout holding head and body
auto vlay = new QVBoxLayout();
// Margins around the layout
vlay->setContentsMargins(0, 0, 0, 0);
// Spacing between items
vlay->setSpacing(0);
vlay->addWidget(ui.head);
vlay->addWidget(ui.body);
this->setLayout(vlay);
// Horizontal layout
auto hlay = new QHBoxLayout();
ui.body->setLayout(hlay);
// Margins around the layout
hlay->setContentsMargins(0, 0, 0, 0);
hlay->addWidget(ui.left); // task list on the left
hlay->addWidget(ui.show); // preview area on the right
}
void QT4OCR::MaxWindow()
{
ui.max->setVisible(false);
ui.normal->setVisible(true);
showMaximized();
}
void QT4OCR::NormalWindow()
{
ui.max->setVisible(true);
ui.normal->setVisible(false);
showNormal();
}
void QT4OCR::UploadImage()
{
QString filePath = QFileDialog::getOpenFileName(this, " please select img", "", "image (*.png *.jpg *.jpeg)");
qDebug() << filePath;
imagePath = filePath;
qDebug() << imagePath;
// Check whether the user cancelled the file dialog
if (filePath.isEmpty()) {
return;
}
// Load and display the image
QPixmap pixmap_1(filePath);
ui.imgLabel->setPixmap(pixmap_1.scaled(ui.imgLabel->size(), Qt::KeepAspectRatio));
qDebug() << "show src img success";
}
void QT4OCR::ProcessImage()
{
system("chcp 65001");
qDebug() << imagePath;
// Check that an image has been selected
if (imagePath.isEmpty()) {
qDebug() << "please select img first";
return;
}
// Get the image file name
QFileInfo fileInfo(imagePath);
QString imageName = fileInfo.fileName();
qDebug() << imageName;
const char* dllPath = "./ppocr.dll";
HMODULE hModule = LoadLibraryA(dllPath);
if (hModule == NULL) {
qDebug() << "failed to load the dll";
return;
}
ImageProcessFunc ImageProcess = (ImageProcessFunc)GetProcAddress(hModule, "ImageProcess");
if (ImageProcess == NULL) {
qDebug() << "failed to get the function address";
FreeLibrary(hModule);
return;
}
TextDetectionResult** detectionResults = nullptr;
int numDetectionResults = 0;
TextRecognitionResult** recognitionResults = nullptr;
int numRecognitionResults = 0;
qDebug() << "before process image";
qDebug() << imagePath;
//const char* image_dir = "E:\\paddlepaddle\\projects\\PaddleOCR-release-2.5\\deploy\\cpp_infer\\qt_project\\qt4ocr\\bin\\imgs\\1.jpg"; //imagePath.toUtf8().constData();
ImageProcess(imagePath.toUtf8().constData(), &detectionResults, &numDetectionResults, &recognitionResults, &numRecognitionResults);
qDebug() << "after process image";
detectionResult = ""; // clear detectionResult
recognitionResult = ""; // clear recognitionResult
if (detectionResults != nullptr && recognitionResults != nullptr) {
for (int i = 0; i < numDetectionResults; i++) {
std::vector<std::vector<int>> boxes = detectionResults[i]->boxes;
detectionResult += "Detboxes" + QString::number(i) + ": [";
for (int n = 0; n < boxes.size(); n++) {
detectionResult += "[" + QString::number(boxes[n][0]) + ", " + QString::number(boxes[n][1]) + "]";
if (n != boxes.size() - 1) {
detectionResult += ", ";
}
}
detectionResult += "]";
detectionResult += "\n";
recognitionResult += " Recogtext" + QString::number(i) + ": " + QString::fromStdString(recognitionResults[i]->text) + "\n";
}
// Set the font and color
QFont font("Arial", 12); // Arial, size 12
QColor color(255, 255, 255); // white
// Apply the font and color to the detection results
ui.detLabel->setText(detectionResult);
ui.detLabel->setFont(font);
ui.detLabel->setStyleSheet("color: " + color.name() + ";");
// Apply the font and color to the recognition results
ui.recogLabel->setText(recognitionResult);
ui.recogLabel->setFont(font);
ui.recogLabel->setStyleSheet("color: " + color.name() + ";");
QString parentPath = ".//output//"; // directory containing the result image
QString detimagePath = parentPath + imageName;
qDebug() << detimagePath;
QPixmap pixmap_2(detimagePath);
ui.detLabel_2->setPixmap(pixmap_2.scaled(ui.imgLabel->size(), Qt::KeepAspectRatio));
}
qDebug() << " begin clear";
// Free the memory
FreeMemory(detectionResults, numDetectionResults, recognitionResults, numRecognitionResults);
// Unload the DLL
FreeLibrary(hModule);
qDebug() << " after clear";
}
// Window size changed
void QT4OCR::resizeEvent(QResizeEvent* ev)
{
int x = width() - ui.head_button->width();
int y = ui.head_button->y();
ui.head_button->move(x, y);
}
// Drag the window with the mouse
static bool mouse_press = false;
QPoint QT4OCR:: mouse_point;
void QT4OCR::mouseMoveEvent(QMouseEvent* ev)
{
if (!mouse_press)
{
QWidget::mouseMoveEvent(ev);
return;
}
this->move(ev->globalPos() - mouse_point);
}
void QT4OCR::mousePressEvent(QMouseEvent* ev)
{
if (ev->button() == Qt::LeftButton)
{
mouse_press = true;
mouse_point = ev->pos();
}
}
void QT4OCR::mouseReleaseEvent(QMouseEvent* ev)
{
mouse_press = false;
}
QT4OCR::~QT4OCR()
{
}
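The string building inside ProcessImage() can be tested more easily when it is separated from the UI code. The following sketch shows the same "[[x, y], ...]" formatting with standard C++ only; FormatBox is a hypothetical helper, and unlike the loop above it skips points that have fewer than two coordinates instead of indexing them.

```cpp
#include <string>
#include <vector>

// Formats one detection box as "[[x0, y0], [x1, y1], ...]" using the first
// two values of each point. Points with fewer than two values are skipped.
std::string FormatBox(const std::vector<std::vector<int>>& box) {
    std::string out = "[";
    bool first = true;
    for (const std::vector<int>& point : box) {
        if (point.size() < 2) continue;
        if (!first) out += ", ";
        out += "[" + std::to_string(point[0]) + ", " +
               std::to_string(point[1]) + "]";
        first = false;
    }
    out += "]";
    return out;
}
```

In the Qt code the result could be appended with QString::fromStdString(FormatBox(detectionResults[i]->boxes)).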
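4.6.2 qt4ocr.h defines the structs for the text detection and recognition results, declares the class member functions and slots, and defines the class member variables. The source is as follows: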
#pragma once
#include <QtWidgets/QWidget>
#include <QtWidgets/QLabel>
#include <QtGui/QPainter>
#include <QtGui/QMouseEvent>
#include <QtWidgets/QVBoxLayout>
#include <QtWidgets/QHBoxLayout>
#include <QtWidgets/QMessageBox>
#include <QtGui/QResizeEvent>
#include <QtCore/QDebug>
#include <QtWidgets/QFileDialog>
#include <QtGui/QPixmap>
#include "ui_qt4ocr.h"
#include <Windows.h>
#include <string>
#include <vector>
struct TextDetectionResult {
std::vector<std::vector<int>> boxes;
};
struct TextRecognitionResult {
std::string text;
double score;
};
typedef void (*ImageProcessFunc)(const char* image_dir, TextDetectionResult*** detection_results, int* num_detection_results,
TextRecognitionResult*** recognition_results, int* num_recognition_results);
void FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results);
class QT4OCR : public QWidget
{
Q_OBJECT
public:
QT4OCR(QWidget *parent = nullptr);
void FreeMemory(TextDetectionResult** detection_results, int num_detection_results,
TextRecognitionResult** recognition_results, int num_recognition_results);
void mouseMoveEvent(QMouseEvent* ev) override;
void mousePressEvent(QMouseEvent* ev) override;
void mouseReleaseEvent(QMouseEvent* ev) override;
// Window size changed
void resizeEvent(QResizeEvent* ev) override;
//void ShowDet(TextDetectionResult** detection_results, int num_detection_results, TextRecognitionResult** recognition_results, int num_recognition_results);
~QT4OCR();
public slots:
void MaxWindow();
void NormalWindow();
void UploadImage();
void ProcessImage();
private:
Ui::QT4OCRClass ui;
QString imagePath;
QString detimagePath;
QString detectionResult;
QString recognitionResult;
QString imageName;
bool mouse_press = false;
static QPoint mouse_point;
};
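The three mouse handlers declared above implement dragging for the frameless window: on press the code stores where inside the window the cursor was (ev->pos()), and on every move it places the window at the cursor's screen position minus that stored offset (ev->globalPos() - mouse_point). The arithmetic can be sketched without Qt; Point2 and DraggedTopLeft are hypothetical names used only for illustration.

```cpp
// Minimal 2D point, standing in for QPoint in this sketch.
struct Point2 {
    int x;
    int y;
};

// New top-left corner of the window while dragging: the cursor position on
// screen minus the offset of the cursor inside the window at press time.
Point2 DraggedTopLeft(Point2 cursor_on_screen, Point2 press_offset) {
    return Point2{cursor_on_screen.x - press_offset.x,
                  cursor_on_screen.y - press_offset.y};
}
```

For instance, if the button was pressed 20 px right and 10 px below the window's corner and the cursor is now at (500, 300) on screen, the window is moved to (480, 290), so the cursor stays over the same spot of the window.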
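4.7 Running the Qt demo
4.7.1 Start the program
4.7.2 Clicking the "上传图片" (Upload Image) button displays the original image
4.7.3 Clicking the "检测 and 识别" (Detect and Recognize) button displays the detection result image, the detection box coordinates, and the recognized text
5. Packaging with Qt's bundled windeployqt
5.1 In cmd, change to E:\QT\5.12.10\msvc2017_64\bin, where windeployqt.exe is located.
Run the following command: **** All DLLs needed to run qt4ocr.exe are then copied into that directory. Compress and package this folder, and qt4ocr.exe also runs on another machine that has none of the environment installed. For details, see the article "VS2019下打包QT项目的方法(包含第三方库)" (how to package a Qt project under VS2019, including third-party libraries).
6. Building an installer with Inno Setup
6.1 Download Inno Setup and the Chinese language plugin, then follow the article "exe制作成安装包" (turning an exe into an installer) to generate a setup.exe. Double-clicking it installs the packaged application.
7. Summary
My expertise is limited, so this write-up cannot cover everything and some statements may be inaccurate; corrections are welcome. I also drew on the work of many others in writing it; if anything here infringes on your rights, please contact me and I will remove it.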