Paddle-OCR Local Deployment

云博士的AI课堂 (Dr. Yun's AI Classroom)

Last modified: 2024-11-05 17:12:20


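First published: 2024-10-14 21:05:07

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/l35633/article/details/142928873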


Paddle-OCR Local Deployment (GPU Version)

Project repository: GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Environment and tools: Anaconda3, CUDA 12.4 (on this machine), PyCharm

PyTorch: pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

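Local deployment:

Open the Anaconda terminal.

Create a virtual environment: conda create -n paddle python=3.10 -y

Activate the virtual environment: conda activate paddle

Install the PaddlePaddle GPU package: pip install paddlepaddle-gpu

Install the PaddleOCR whl package: pip install paddleocr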
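After the installation steps above, it can help to confirm that both packages are visible from the activated `paddle` environment before running any OCR. The following is a minimal sketch, not part of the original post: the helper names `check_environment` and `describe` are made up for illustration, and the check only looks for the modules without importing them.

```python
import importlib.util


def check_environment(packages=("paddle", "paddleocr")):
    """Return {module name: True/False} depending on whether the module can be found.

    Hypothetical helper: it only locates the top-level modules, it does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def describe(status):
    """Turn the status mapping into one readable line per module."""
    return [f"{name}: {'found' if ok else 'MISSING'}" for name, ok in status.items()]


# Report on the two packages installed in the steps above.
for line in describe(check_environment()):
    print(line)
```

If both modules are reported as found, running `paddle.utils.run_check()` in a Python session performs PaddlePaddle's own installation self-check, which also reports whether the GPU can be used.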
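Usage

Command-line usage

(For images): paddleocr --image_dir <image path> --use_angle_cls true --use_gpu true

Before running the command, you can use cd to switch to the directory where the images are stored.

(For PDFs): paddleocr --image_dir <PDF path> --use_angle_cls true --use_gpu true --page_num 0

The page_num parameter limits recognition to the first N pages. The default is 0, which means all pages are processed.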
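When the same command has to be run over many files, it can be convenient to assemble it from Python. The sketch below only uses the four flags shown in the commands above (`--image_dir`, `--use_angle_cls`, `--use_gpu`, `--page_num`); the function names `build_paddleocr_command` and `run_paddleocr` are hypothetical, and `run_paddleocr` assumes the `paddleocr` executable is on the PATH of the active environment.

```python
import shlex
import subprocess


def build_paddleocr_command(input_path, use_angle_cls=True, use_gpu=True, page_num=None):
    """Build the argument list for the paddleocr CLI (hypothetical helper)."""
    cmd = [
        "paddleocr",
        "--image_dir", str(input_path),
        "--use_angle_cls", str(use_angle_cls).lower(),
        "--use_gpu", str(use_gpu).lower(),
    ]
    # --page_num only matters for PDF input, so it is added on request.
    if page_num is not None:
        cmd += ["--page_num", str(page_num)]
    return cmd


def run_paddleocr(input_path, **options):
    """Run the CLI and return its exit code; raises CalledProcessError on failure."""
    return subprocess.run(build_paddleocr_command(input_path, **options), check=True).returncode


# Show the command that would be executed, without running it.
print(shlex.join(build_paddleocr_command("default.pdf", page_num=2)))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.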


Python script usage:

Images (reference code from the official documentation):

from PIL import Image
from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports several languages; switch between them with the lang parameter,
# for example `ch`, `en`, `fr`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
img_path = './imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)

# Each entry of result holds the lines recognized in one image.
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

# Draw the boxes, texts and scores on the image and save it.
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
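The script above indexes each recognized line as `line[0]` (the four corner points of the box) and `line[1]` (the text and its score). Based on that layout, here is a minimal sketch of post-processing one image's result: dropping low-confidence lines and putting the rest in a rough top-to-bottom, left-to-right order. The helper names, the 0.8 threshold, and the sample data are all assumptions made for illustration; real output comes from `ocr.ocr(...)`.

```python
def box_top_left(box):
    """Return (min y, min x) of a four-point box, used as a rough reading-order key."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(ys), min(xs)


def filter_and_sort(lines, min_score=0.8):
    """Keep lines whose score is at least min_score, ordered top-to-bottom then left-to-right."""
    kept = [line for line in lines if line[1][1] >= min_score]
    return sorted(kept, key=lambda line: box_top_left(line[0]))


# Made-up sample in the [box, (text, score)] layout used by the script above.
sample_page = [
    [[[10, 50], [120, 50], [120, 70], [10, 70]], ("second line", 0.95)],
    [[[10, 10], [90, 10], [90, 30], [10, 30]], ("first line", 0.99)],
    [[[200, 12], [260, 12], [260, 28], [200, 28]], ("noise", 0.41)],
]

for box, (text, score) in filter_and_sort(sample_page):
    print(f"{score:.2f}  {text}")
```

Sorting by the top-left corner is only a heuristic; multi-column layouts or rotated text would need a more careful ordering.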

PDF (reference code):

import cv2
import fitz
import numpy as np
from PIL import Image
from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports several languages; switch between them with the lang parameter,
# for example `ch`, `en`, `fr`, `german`, `korean`, `japan`.
PAGE_NUM = 10  # Set the page count once, so that recognition and the PDF rendering below use the same value
pdf_path = 'default.pdf'
ocr = PaddleOCR(use_angle_cls=True, lang="ch", page_num=PAGE_NUM)  # need to run only once to download and load model into memory
# To request the GPU explicitly, uncomment the next line and comment out the one above.
# ocr = PaddleOCR(use_angle_cls=True, lang="ch", page_num=PAGE_NUM, use_gpu=True)
result = ocr.ocr(pdf_path, cls=True)

for idx in range(len(result)):
    res = result[idx]
    if res is None:  # Skip empty pages to avoid a TypeError on NoneType
        print(f"[DEBUG] Empty page {idx+1} detected, skip it.")
        continue
    for line in res:
        print(line)

# Render each recognized PDF page to an image so the results can be drawn on it.
imgs = []
with fitz.open(pdf_path) as pdf:
    # Do not read past the last page when the PDF has fewer than PAGE_NUM pages.
    for pg in range(min(PAGE_NUM, pdf.page_count)):
        page = pdf[pg]
        mat = fitz.Matrix(2, 2)
        pm = page.get_pixmap(matrix=mat, alpha=False)
        # if width or height > 2000 pixels, don't enlarge the image
        if pm.width > 2000 or pm.height > 2000:
            pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)
        img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
        img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
        imgs.append(img)

# Draw the boxes, texts and scores on each page and save one image per page.
for idx in range(len(result)):
    res = result[idx]
    if res is None:
        continue
    image = imgs[idx]
    boxes = [line[0] for line in res]
    txts = [line[1][0] for line in res]
    scores = [line[1][1] for line in res]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result_page_{}.jpg'.format(idx))
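Besides drawing the boxes, the per-page result is often easier to reuse as structured data. The sketch below flattens a multi-page result into a list of records and serializes it as JSON, skipping empty (`None`) pages in the same way the script above does. The function name `pages_to_records`, the record fields, and the sample data are assumptions for illustration, not part of PaddleOCR.

```python
import json


def pages_to_records(result):
    """Flatten a per-page OCR result into dicts, skipping pages that are None."""
    records = []
    for page_index, page in enumerate(result):
        if page is None:  # empty page, nothing recognized
            continue
        for box, (text, score) in page:
            records.append({
                "page": page_index + 1,  # 1-based page number
                "text": text,
                "score": float(score),
                # Convert coordinates to plain floats so they serialize cleanly.
                "box": [[float(x), float(y)] for x, y in box],
            })
    return records


# Made-up three-page sample in the same layout as the script above; page 2 is empty.
sample_result = [
    [[[[0, 0], [10, 0], [10, 5], [0, 5]], ("你好", 0.98)]],
    None,
    [[[[0, 0], [20, 0], [20, 5], [0, 5]], ("world", 0.9)]],
]

records = pages_to_records(sample_result)
# ensure_ascii=False keeps Chinese text readable in the JSON output.
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Writing the JSON string to a file with `encoding="utf-8"` keeps the recognized Chinese text intact.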

More machine learning courses:

https://www.bilibili.com/cheese/play/ss27274
