Using PaddleOCR

Kessity

Last modified: 2022-02-16 11:58:17

Category: Artificial Intelligence. Tags: paddlepaddle, artificial intelligence, deep learning

First published: 2019-04-08 10:13:22

Article link: https://blog.csdn.net/essity/article/details/89082526

I. Overview

Project source code: https://github.com/PaddlePaddle/PaddleOCR
Project description: https://mp.weixin.qq.com/s/i1Dm18qp93jzWnMoqRU2gA

II. Quick installation

The content of this section comes from here.

1 Python installation on Windows 10

1. Install PaddlePaddle Fluid v2.0

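pip3 install --upgrade pip
If your machine has CUDA 9 or CUDA 10 installed, run the following command:
python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
If your machine only has a CPU, run the following command:
python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
For other version requirements, follow the instructions in the installation documentation.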

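2. Clone the PaddleOCR repo

[Recommended] git clone https://github.com/PaddlePaddle/PaddleOCR
If the clone fails because of network problems, you can use the copy hosted on Gitee instead:
git clone https://gitee.com/paddlepaddle/PaddleOCR
Note: the Gitee copy may not be synchronized with the GitHub project in real time and can lag by 3 to 5 days, so prefer the recommended method.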

3. Install third-party libraries

cd PaddleOCR
pip3 install -r requirments.txt

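Note: on Windows, it is recommended to download the shapely installation package from here and install it from that file. A shapely library installed directly through pip may fail with "[WinError 126] The specified module could not be found".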
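
After the three installation steps, a quick import check can show whether anything is still missing. The sketch below is only an illustration: the helper name missing_modules is hypothetical, and the import names (paddle for paddlepaddle, cv2 for OpenCV) are assumptions about how the installed packages are exposed.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: paddlepaddle -> paddle, opencv -> cv2.
    missing = missing_modules(["paddle", "shapely", "cv2"])
    print("missing:", missing or "none")
```

An empty list only means the modules can be located. It does not prove that shapely loads its native library correctly on Windows, so an actual `import shapely` is still worth trying.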

2 C++ build

References:
https://blog.csdn.net/u010477528/article/details/109078267
Environment:
1. C++ inference library fluid_inference: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/windows_cpp_inference.html
2. opencv
3. cmake 3.17.5 options:
[Figure: screenshot of the CMake options]
4. VS2019, Release build.

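3 Building a DLL and calling it from C#

Article series: How to build a complete project with PaddleDetection (Part 3)
Deploying PaddleDetection on Windows (Part 2)
Settings:
[Figure: screenshot of the project settings]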

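III. OCR model list

The content of this section comes from here.

1 Differences between model types

The downloadable models provided by PaddleOCR include inference models, trained models, pre-trained models, and slim models. They differ as follows: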

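Model type	Model format	Description
Inference model	model, params	Used for inference with the Python prediction engine
Trained model, pre-trained model	*.pdmodel, *.pdopt, *.pdparams	Checkpoints saved during training. They store the model parameters and are mostly used for evaluating model metrics and resuming training
Slim model	*.nb	Used for deployment with Paddle Lite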
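
The table can be turned into a small lookup that guesses the model type from the files in a downloaded directory. This is a minimal sketch based only on the file formats listed above; the function name guess_model_kind and the returned labels are hypothetical.

```python
def guess_model_kind(filenames):
    """Guess the model type from file names, following the table above."""
    names = set(filenames)
    if any(n.endswith(".nb") for n in names):
        return "slim"
    if any(n.endswith((".pdparams", ".pdopt", ".pdmodel")) for n in names):
        return "training"
    if {"model", "params"} <= names:
        return "inference"
    return "unknown"


print(guess_model_kind(["model", "params"]))
```

In practice the file list would come from something like os.listdir on the extracted model directory.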

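2 Model categories

Text detection models, text recognition models, and text direction classification models.

IV. Quick start with the Chinese OCR models

The content of this section comes from here.

1 Using the ultra-lightweight model as an example: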

mkdir inference && cd inference
# Download and extract the detection model of the ultra-lightweight Chinese OCR model
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar
# Download and extract the recognition model of the ultra-lightweight Chinese OCR model
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar
# Download and extract the text direction classifier model of the ultra-lightweight Chinese OCR model
wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar
cd ..

After extraction, the file structure should look like this:

|-inference
    |-ch_ppocr_mobile_v1.1_det_infer
        |- model
        |- params
    |-ch_ppocr_mobile_v1.1_rec_infer
        |- model
        |- params
    |-ch_ppocr_mobile_v1.1_cls_infer
        |- model
        |- params
    ...
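
To confirm that the downloads were extracted as expected, the layout can be checked from Python. This is a minimal sketch: the helper missing_model_files is hypothetical, and the directory names are taken from the download commands above.

```python
import os

# Directory names taken from the download commands above.
EXPECTED_DIRS = (
    "ch_ppocr_mobile_v1.1_det_infer",
    "ch_ppocr_mobile_v1.1_rec_infer",
    "ch_ppocr_mobile_v1.1_cls_infer",
)


def missing_model_files(root="inference"):
    """Return the expected model/params paths that do not exist under root."""
    missing = []
    for model_dir in EXPECTED_DIRS:
        for filename in ("model", "params"):
            path = os.path.join(root, model_dir, filename)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_model_files():
        print("missing:", path)
```

Run it from the PaddleOCR directory. If nothing is printed, all six files are in place.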

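2 Predicting a single image or a set of images

The following commands run text detection and recognition in series. When running prediction, use the parameter image_dir to specify the path of a single image or a set of images, det_model_dir for the path of the detection inference model, rec_model_dir for the path of the recognition inference model, use_angle_cls for whether to use the direction classifier, cls_model_dir for the path of the direction classifier inference model, and use_space_char for whether to predict the space character. The visualized recognition results are saved to the ./inference_results folder by default.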

# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/"  --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True

# Predict the set of images specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/"  --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True

# To predict on the CPU, set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/"  --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False

General Chinese OCR model

Download the corresponding models by following the steps above and update the relevant parameters, for example:

# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/"  --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True

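Notes:

If you want to use a recognition model that does not support spaces, update the code to the latest version and add the parameter --use_space_char=False when predicting.
If you do not want to use the direction classifier, update the code to the latest version and add the parameter --use_angle_cls=False when predicting.
For more ways to run text detection and recognition in series, see "Inference with the Python prediction engine" in the documentation tutorials.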
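
The commands above are long and differ only in a few flags, so it can help to assemble them programmatically. The sketch below uses only the flags that appear in this section. The helper name build_predict_cmd is hypothetical, and treating a missing cls_model_dir as "no direction classifier" is a design choice made for this illustration.

```python
import sys


def build_predict_cmd(image_dir, det_model_dir, rec_model_dir,
                      cls_model_dir=None, use_space_char=True, use_gpu=True):
    """Build the argument list for tools/infer/predict_system.py."""
    cmd = [
        sys.executable, "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
        # The direction classifier is enabled only when a model path is given.
        f"--use_angle_cls={cls_model_dir is not None}",
        f"--use_space_char={use_space_char}",
        f"--use_gpu={use_gpu}",
    ]
    if cls_model_dir is not None:
        cmd.append(f"--cls_model_dir={cls_model_dir}")
    return cmd


cmd = build_predict_cmd(
    "./doc/imgs/11.jpg",
    "./inference/ch_ppocr_mobile_v1.1_det_infer/",
    "./inference/ch_ppocr_mobile_v1.1_rec_infer/",
    use_gpu=False,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) with the PaddleOCR directory as the working directory.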

V. Using the paddleocr package

The content of this section comes from here.

1 Installing the whl package

Install with pip:

pip install paddleocr

Build and install locally:

python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the paddleocr version number

2 Using it in code

Full pipeline: detection + classification + recognition

from paddleocr import PaddleOCR, draw_ocr
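# PaddleOCR currently supports Chinese and English, English, French, German, Korean, and Japanese; switch between them with the lang parameter
# The corresponding values are `ch`, `en`, `french`, `german`, `korean`, `japan`.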
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Visualize the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

The result is a list. Each item contains the text box, the text, and the recognition confidence.

[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['（45元/每公斤，100公斤起订）', 0.9676722]]
......
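
Because every item is a nested list, it is often convenient to convert the result into records and drop low-confidence lines. The following is a minimal sketch that assumes the flat result format printed above (other paddleocr versions may nest the result differently). The helper name to_records and the 0.97 cut-off are hypothetical.

```python
def to_records(result, min_score=0.0):
    """Convert [[box, [text, score]], ...] into a list of dicts."""
    records = []
    for box, (text, score) in result:
        if score >= min_score:
            records.append({"box": box, "text": text, "score": score})
    return records


# Two items copied from the sample output above.
sample = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]],
]

for record in to_records(sample, min_score=0.97):
    print(record["text"], record["score"])
```

With the cut-off at 0.97, only the second sample line is kept.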

Detection + recognition

from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# Visualize the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

The result is a list. Each item contains the text box, the text, and the recognition confidence.

[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['（45元/每公斤，100公斤起订）', 0.9676722]]
......

Classification + recognition

from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)

The result is a list. Each item contains only the recognition result and the recognition confidence.

['韩国小馆', 0.9907421]

Detection only

from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, rec=False)
for line in result:
    print(line)

# Visualize the result
from PIL import Image

image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

The result is a list. Each item contains only the text box.

[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
......
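
Each text box is a list of four corner points. A common follow-up step is to reduce a box to an axis-aligned rectangle (for cropping) and to sort boxes from top to bottom. The sketch below does this with plain Python. Both helper names are hypothetical, and the sample boxes are copied from the output above.

```python
def box_to_rect(box):
    """Return (x_min, y_min, x_max, y_max) for a four-point text box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return (min(xs), min(ys), max(xs), max(ys))


def sort_top_to_bottom(boxes):
    """Sort boxes by their top edge, then by their left edge."""
    return sorted(boxes, key=lambda b: (box_to_rect(b)[1], box_to_rect(b)[0]))


boxes = [
    [[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]],
    [[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]],
    [[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]],
]

for box in sort_top_to_bottom(boxes):
    print(box_to_rect(box))
```

The rectangle can be passed to PIL's Image.crop to cut out each detected region.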

Recognition only

from paddleocr import PaddleOCR
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False)
for line in result:
    print(line)

The result is a list. Each item contains only the recognition result and the recognition confidence.

['韩国小馆', 0.9907421]

Classification only

from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)

The result is a list. Each item contains only the classification result and the classification confidence.

['0', 0.9999924]
['0', 0.9999924]
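
The classification result can be used to decide whether an image should be rotated before recognition. The sketch below rests on two assumptions: the output above only shows the label '0', so the label '180' for upside-down text and the 0.9 confidence threshold are not taken from this article. The helper name needs_flip is hypothetical.

```python
def needs_flip(cls_item, min_score=0.9):
    """Return True if the item says the text is upside down with enough confidence."""
    label, score = cls_item
    # Assumption: '180' marks upside-down text; only '0' appears in the output above.
    return label == "180" and score >= min_score


print(needs_flip(['0', 0.9999924]))
```

If it returns True, the image could be rotated by 180 degrees, for example with PIL's Image.rotate(180), before it is passed to recognition.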

VI. DB scene text detection

The content of this section comes from here.
The text detection model DB, proposed in the paper "Real-time Scene Text Detection with Differentiable Binarization", takes its name from Differentiable Binarization.
Segmentation-based methods usually need a fixed threshold to decide whether a pixel belongs to a text region.
$B_{i,j} = \begin{cases} 1 & \text{if } P_{i,j} \geq t, \\ 0 & \text{otherwise,} \end{cases}$ where $t$ is the predefined threshold and $(i, j)$ indicates the coordinate point in the map.
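The authors observed that the probability map produced by a per-pixel classification model tends to show relatively high probabilities near the boundaries. To make the segmentation more stable, they introduce additional supervision.
It is introduced by adding an auxiliary branch to the ordinary pixel-level classification model. The branch dynamically predicts a segmentation threshold $T_{i,j}$ for every point, and the following formula then decides whether a pixel belongs to a text region.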
$\hat{B}_{i,j}=\frac{1}{1+e^{-k(P_{i,j}-T_{i,j})}}$
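
The difference between the two formulas is easier to see numerically. The sketch below evaluates both for a single pixel in plain Python. The function names are hypothetical, and the amplification factor k = 50 is the value recalled from the DB paper, so treat it as an assumption.

```python
import math


def hard_binarize(p, t):
    """Standard binarization: 1 if p >= t, otherwise 0."""
    return 1 if p >= t else 0


def soft_binarize(p, t, k=50.0):
    """Differentiable binarization: a steep sigmoid centered at the threshold t."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


for p in (0.1, 0.45, 0.5, 0.55, 0.9):
    print(p, hard_binarize(p, 0.5), round(soft_binarize(p, 0.5), 4))
```

Far from the threshold the two agree, while near the threshold the soft version changes smoothly, which is what allows gradients to flow through the binarization step during training.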
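Training requires a ground truth for thresh_map. It is generated by dilating the text region G to G_d and computing, for each point inside G_d, the normalized minimum distance dis to the edges of G. The thresh_map value is then 1 - dis. The threshold is high on both sides of the boundary of G and decreases toward the middle of G. The paper sets the maximum threshold to 0.7 and the minimum to 0.3.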
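
The thresh_map value of a single point can be sketched as follows. This is a simplified illustration and not the reference implementation: the helper names are hypothetical, the dilation distance d is passed in directly instead of being derived from the polygon, and every point farther than d from the border simply receives the minimum value.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def threshold_value(p, polygon, d, t_min=0.3, t_max=0.7):
    """thresh_map value at p: 1 - dis, rescaled into [t_min, t_max]."""
    edges = zip(polygon, polygon[1:] + polygon[:1])
    dis = min(point_segment_distance(p, a, b) for a, b in edges)
    dis = min(dis / d, 1.0)  # normalize by the dilation distance and clip
    return t_min + (1.0 - dis) * (t_max - t_min)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
for point in [(5, 0), (5, 1), (5, 5)]:
    print(point, round(threshold_value(point, square, d=2), 3))
```

A point on the border of G gets the maximum threshold, and the value falls off linearly to the minimum as the point moves a distance d away from the border.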
Substituting this thresh_map and the predicted output P into the second formula recovers which points belong to text regions.
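Because positive and negative samples are usually imbalanced, the non-text pixels are sampled. The sampling strategy is similar to OHEM: the non-text pixels that the model predicts as text with high probability are selected as negative samples.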
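
The sampling idea can be sketched on a flat list of pixels. This is an illustration of OHEM-style selection and not the code of any particular implementation. The helper name select_hard_negatives and the 3:1 negative-to-positive ratio are assumptions.

```python
def select_hard_negatives(probs, labels, neg_ratio=3):
    """Return indices of the negatives with the highest predicted text probability.

    probs:  predicted text probability per pixel
    labels: 1 for text pixels, 0 for non-text pixels
    """
    num_pos = sum(1 for y in labels if y == 1)
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(neg_idx), neg_ratio * num_pos)
    # The hardest negatives are the non-text pixels scored most like text.
    neg_idx.sort(key=lambda i: probs[i], reverse=True)
    return neg_idx[:keep]


probs = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6]
labels = [1, 0, 0, 0, 0, 0]
print(select_hard_negatives(probs, labels, neg_ratio=2))
```

Only the selected negatives, together with all positives, would then contribute to the loss.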
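Finally, the threshold branch is not needed at prediction time. The effect is somewhat like increasing the weight of the boundary pixels. The main benefit is that the complex post-processing step is eliminated. However, it seems relatively hard to train. By comparison, PSENet predicts several shrunken versions of each text region and merges them into one text region with a breadth-first search at the end. Its training is simpler, but it needs a post-processing step. Both methods work fairly well on text regions that are very close to each other.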
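
The breadth-first merging used by PSENet can be illustrated on small binary masks. The sketch below is a simplified version of the idea and not PSENet's actual code: the smallest kernel is labeled first, and the labels are then grown into a larger kernel with a multi-source BFS, where the first label to reach a pixel keeps it. Both function names are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label 4-connected components of a binary mask with 1, 2, 3, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def expand_labels(labels, larger_mask):
    """Grow existing labels into larger_mask; the first label to arrive wins."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((r, c) for r in range(h) for c in range(w) if out[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and larger_mask[ny][nx] and out[ny][nx] == 0):
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out


small_kernel = [[1, 0, 0, 0, 1]]
large_kernel = [[1, 1, 1, 1, 1]]
print(expand_labels(label_components(small_kernel), large_kernel))
```

In the example, the two seeds are separate in the small kernel, so they stay separate instances even though the large kernel connects them. This is why the approach can split text regions that lie very close together.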