Building, Testing, and Trimming PaddleOCR with VS2019

Contents

1. Build process

2. Modifying the project code

3. Trimming the code


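1. Build process

Download the source code: GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

For the steps, the Markdown guide that ships with the source is enough.

I run OCR on the GPU, so the build needs CUDA 11.4 (the version the build process asked for), TensorRT (I used 8.4.1.5 GA), and OpenCV 4.8.0.

For the build itself, follow the official procedure or look at other blog posts; I won't repeat it here.

Once you have gone through the procedure, open the project.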

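2. Modifying the project code

First switch the build configuration to Release.

Then locate the flag definitions (the DEFINE_* lines in args.cpp).

To avoid passing the relevant parameters on the command line every time, change the DEFINE defaults directly.

Pay attention to this one: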

DEFINE_string(rec_char_dict_path, "../../../../ppocr/utils/ppocr_keys_v1.txt",
              "Path of dictionary.");

The path in the official code is ../../ppocr/utils/ppocr_keys_v1.txt

That path is wrong for this setup: the program reports that ppocr_keys_v1.txt cannot be found.
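The number of `../` segments matters because a relative path is interpreted against the directory the program is started from. The sketch below uses only std::filesystem (C++17) to show where each candidate path lands. The directory `PaddleOCR/deploy/cpp_infer/build/Release` is an assumed example location for ppocr.exe, not something taken from this post; substitute your own output directory. `resolve_from` and `kAssumedRunDir` are hypothetical names used for illustration.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Lexically resolve `relative` against `working_dir` (no disk access),
// mimicking how a relative dictionary path is interpreted at run time.
std::string resolve_from(const fs::path& working_dir, const fs::path& relative) {
    return (working_dir / relative).lexically_normal().generic_string();
}

// Hypothetical directory from which ppocr.exe is launched (an assumption).
const fs::path kAssumedRunDir = "PaddleOCR/deploy/cpp_infer/build/Release";
```

Under that assumption, `resolve_from(kAssumedRunDir, "../../../../ppocr/utils/ppocr_keys_v1.txt")` lands on `PaddleOCR/ppocr/utils/ppocr_keys_v1.txt`, while the `../../` variant lands on `PaddleOCR/deploy/cpp_infer/ppocr/utils/ppocr_keys_v1.txt`, which would explain the "file not found" message.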

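Next, download the rec and det model files; the lightweight models are enough.

Extract each archive and rename the extracted folders to det and rec.

In args.cpp, find the corresponding DEFINE lines: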

DEFINE_string(rec_model_dir, "rec", "Path of rec inference model.");

DEFINE_string(det_model_dir, "det", "Path of det inference model.");

Then create an img folder and put the images to be processed in it.

Then change this parameter:

DEFINE_string(image_dir, "./img", "Dir of input image.");

Click Rebuild.

Then open a shell in the directory that contains ppocr.exe and run .\ppocr.exe
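Because the defaults now live in the DEFINE lines, .\ppocr.exe can be run without arguments. The idea of "compiled-in default, optionally overridden on the command line" can be sketched with the standard library alone. This is a simplified stand-in and not gflags itself: `FlagTable` and `apply_overrides` are hypothetical names, only the `--name=value` form is handled, and real gflags accepts more forms (for example `--name value` and `--noname` for booleans).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a flag table: name -> current value.
using FlagTable = std::map<std::string, std::string>;

// Start from compiled-in defaults, then let "--name=value" arguments override them.
// Arguments that do not match this shape, or that name an unknown flag, are ignored.
FlagTable apply_overrides(FlagTable defaults, const std::vector<std::string>& args) {
    for (const std::string& arg : args) {
        if (arg.rfind("--", 0) != 0) continue;       // does not start with "--"
        const std::size_t eq = arg.find('=');
        if (eq == std::string::npos) continue;       // no "=value" part
        const std::string name = arg.substr(2, eq - 2);
        auto it = defaults.find(name);
        if (it != defaults.end()) it->second = arg.substr(eq + 1);
    }
    return defaults;
}
```

With a table holding `image_dir = "./img"`, `det_model_dir = "det"` and `rec_model_dir = "rec"`, calling `apply_overrides` with no arguments returns the defaults unchanged, which mirrors what editing the DEFINE lines achieves.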

I put 100 images in the folder, each 640*137, and the GPU is a 1050 Ti.

Total time: 4888 ms.

The full modified args.cpp:

// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <gflags/gflags.h>

// common args
DEFINE_bool(use_gpu, true, "Infering with GPU or CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_int32(gpu_id, 0, "Device id of GPU to execute.");
DEFINE_int32(gpu_mem, 4000, "GPU id when infering with GPU.");
DEFINE_int32(cpu_threads, 10, "Num of threads with CPU.");
DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(output, "./output/", "Save benchmark log path.");
DEFINE_string(image_dir, "./img", "Dir of input image.");
DEFINE_string(
    type, "ocr",
    "Perform ocr or structure, the value is selected in ['ocr','structure'].");
// detection related
DEFINE_string(det_model_dir, "det", "Path of det inference model.");
DEFINE_string(limit_type, "max", "limit_type of input image.");
DEFINE_int32(limit_side_len, 960, "limit_side_len of input image.");
DEFINE_double(det_db_thresh, 0.3, "Threshold of det_db_thresh.");
DEFINE_double(det_db_box_thresh, 0.6, "Threshold of det_db_box_thresh.");
DEFINE_double(det_db_unclip_ratio, 1.5, "Threshold of det_db_unclip_ratio.");
DEFINE_bool(use_dilation, false, "Whether use the dilation on output map.");
DEFINE_string(det_db_score_mode, "slow", "Whether use polygon score.");
DEFINE_bool(visualize, true, "Whether show the detection results.");
// classification related
DEFINE_bool(use_angle_cls, false, "Whether use use_angle_cls.");
DEFINE_string(cls_model_dir, "", "Path of cls inference model.");
DEFINE_double(cls_thresh, 0.9, "Threshold of cls_thresh.");
DEFINE_int32(cls_batch_num, 1, "cls_batch_num.");
// recognition related
DEFINE_string(rec_model_dir, "rec", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
DEFINE_string(rec_char_dict_path, "../../../../ppocr/utils/ppocr_keys_v1.txt",
              "Path of dictionary.");
DEFINE_int32(rec_img_h, 48, "rec image height");
DEFINE_int32(rec_img_w, 320, "rec image width");

// layout model related
DEFINE_string(layout_model_dir, "", "Path of table layout inference model.");
DEFINE_string(layout_dict_path,
              "../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt",
              "Path of dictionary.");
DEFINE_double(layout_score_threshold, 0.5, "Threshold of score.");
DEFINE_double(layout_nms_threshold, 0.5, "Threshold of nms.");
// structure model related
DEFINE_string(table_model_dir, "", "Path of table struture inference model.");
DEFINE_int32(table_max_len, 488, "max len size of input image.");
DEFINE_int32(table_batch_num, 1, "table_batch_num.");
DEFINE_bool(merge_no_span_structure, true,
            "Whether merge <td> and </td> to <td></td>");
DEFINE_string(table_char_dict_path,
              "../../ppocr/utils/dict/table_structure_dict_ch.txt",
              "Path of dictionary.");

// ocr forward related
DEFINE_bool(det, true, "Whether use det in forward.");
DEFINE_bool(rec, true, "Whether use rec in forward.");
DEFINE_bool(cls, false, "Whether use cls in forward.");
DEFINE_bool(table, false, "Whether use table structure in forward.");
DEFINE_bool(layout, false, "Whether use layout analysis in forward.");

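3. Trimming the code

The project's overall flow is to traverse the images in a folder and collect their names into std::vector<cv::String> cv_all_img_names;

Then void ocr(std::vector<cv::String> &cv_all_img_names) iterates over those names and loads each file into a list of OpenCV images, std::vector<cv::Mat> img_list;

The actual OCR core is the single call ocr.ocr(img_list, FLAGS_det, FLAGS_rec, FLAGS_cls);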

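With that in mind, the code can be trimmed.

Remove the unneeded output and the saving of result images.

Start by searching for imwrite to find which function saves the result images.

Here is the trimmed main.cpp: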

// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <iostream>
#include <vector>

#include <include/args.h>
#include <include/paddleocr.h>
#include <include/paddlestructure.h>

using namespace PaddleOCR;

void ocr(std::vector<cv::String>& cv_all_img_names) {
    PPOCR ocr = PPOCR();

    std::vector<cv::Mat> img_list;
    std::vector<cv::String> img_names;
    for (size_t i = 0; i < cv_all_img_names.size(); ++i) {
        cv::Mat img = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
        if (!img.data) {
            std::cerr << "[ERROR] image read failed! image path: "
                << cv_all_img_names[i] << std::endl;
            continue;
        }
        img_list.push_back(img);
        img_names.push_back(cv_all_img_names[i]);
    }

    std::vector<std::vector<OCRPredictResult>> ocr_results =
        ocr.ocr(img_list, FLAGS_det, FLAGS_rec, FLAGS_cls);

    for (size_t i = 0; i < img_names.size(); ++i) {
        Utility::print_result(ocr_results[i]);
    }
}


int main(int argc, char** argv) {

    std::vector<cv::String> cv_all_img_names;
    cv::glob(FLAGS_image_dir, cv_all_img_names);

    ocr(cv_all_img_names);
   
}

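To make this easier to use in practice, the cv::glob traversal is removed as well, and the images (cv::Mat) are pushed into the list directly with push_back: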

// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <iostream>
#include <vector>

#include <include/args.h>
#include <include/paddleocr.h>
#include <include/paddlestructure.h>
#include <chrono>  
using namespace std::chrono;
using namespace PaddleOCR;

void ocr(std::vector<cv::Mat> img_list) {
    PPOCR ocr = PPOCR();

    std::vector<std::vector<OCRPredictResult>> ocr_results =
        ocr.ocr(img_list, FLAGS_det, FLAGS_rec, FLAGS_cls);

    for (size_t i = 0; i < img_list.size(); ++i) {
        Utility::print_result(ocr_results[i]);
    }
}


int main(int argc, char** argv) {

    high_resolution_clock::time_point  startTime, endTime;
    cv::Mat image1 = cv::imread("./img/test1.png");
    cv::Mat image2 = cv::imread("./img/test2.png");
    cv::Mat image3 = cv::imread("./img/test3.png");
    cv::Mat image4 = cv::imread("./img/test4.png");
    cv::Mat image5 = cv::imread("./img/test5.png");

    std::vector<cv::Mat> img_list;

    img_list.push_back(image1);
    img_list.push_back(image2);
    img_list.push_back(image3);
    img_list.push_back(image4);
    img_list.push_back(image5);


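    // The timed region covers the whole ocr() call, including the
    // construction of the PPOCR object inside it.
    startTime = high_resolution_clock::now();
    ocr(img_list);
    endTime = high_resolution_clock::now();
    // count() does not return an int, so printf("%d") is a format mismatch;
    // stream the value through std::cout instead.
    const auto interval_time = duration_cast<milliseconds>(endTime - startTime).count();
    std::cout << "\ntime = " << interval_time << " ms" << std::endl;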

   
}

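Five images in 2078 ms is indeed slow, which is probably down to how the OCR pipeline works, but with batch processing it becomes fast: a 1050 Ti can handle one 640*480 RGB image in 50-60 ms.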

To speed up OCR further, consider converting images to grayscale, binarizing them, or cropping to the key region.
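The three preprocessing ideas can be sketched on a raw pixel buffer using only the standard library. This is an illustration of the operations, not code from the project: `GrayImage`, `to_gray`, `binarize` and `crop` are hypothetical names, the fixed threshold is an assumption, and in a real OpenCV program one would more likely use cv::cvtColor, cv::threshold and a cv::Mat region of interest. The listings in this post read images with cv::IMREAD_COLOR, so a single-channel result would presumably need to be expanded back to three channels before inference. Whether any of this actually helps speed or accuracy is not measured here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal 8-bit single-channel image, row-major (a stand-in for cv::Mat).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height values
};

// Convert interleaved 8-bit BGR data (OpenCV's default channel order) to
// grayscale with the common 0.299 R + 0.587 G + 0.114 B weighting.
GrayImage to_gray(const std::vector<std::uint8_t>& bgr, int width, int height) {
    GrayImage out;
    out.width = width;
    out.height = height;
    out.pixels.resize(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < out.pixels.size(); ++i) {
        const double b = bgr[3 * i], g = bgr[3 * i + 1], r = bgr[3 * i + 2];
        out.pixels[i] = static_cast<std::uint8_t>(0.299 * r + 0.587 * g + 0.114 * b + 0.5);
    }
    return out;
}

// Fixed-threshold binarization: values above `threshold` become 255, others 0.
void binarize(GrayImage& img, std::uint8_t threshold) {
    for (std::uint8_t& p : img.pixels) p = (p > threshold) ? 255 : 0;
}

// Copy the rectangle [x, x+w) x [y, y+h). The caller must keep it inside the image.
GrayImage crop(const GrayImage& img, int x, int y, int w, int h) {
    GrayImage out;
    out.width = w;
    out.height = h;
    out.pixels.reserve(static_cast<std::size_t>(w) * h);
    for (int row = y; row < y + h; ++row) {
        const auto begin = img.pixels.begin() + static_cast<std::ptrdiff_t>(row) * img.width + x;
        out.pixels.insert(out.pixels.end(), begin, begin + w);
    }
    return out;
}
```

Cropping first and converting afterwards touches fewer pixels, so in practice the crop would usually come before the other two steps.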
