pyocr: A Very Cool Python Library!

Latest recommended article published on 2025-03-16 08:25:53

黑马聊AI

Article link: https://blog.csdn.net/2401_83617404/article/details/140939562

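pyocr is a Python library for optical character recognition (OCR). It offers a simple interface for extracting the text contained in an image. The library is a wrapper around external OCR engines, chiefly Tesseract-OCR (Cuneiform is also supported), which makes OCR convenient to use from Python.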

How to install pyocr

To use pyocr you first need to install it. You can do that with the pip package manager:

pip install pyocr

Once it is installed, import the library in your Python script:

import pyocr

To run OCR you also need an OCR engine such as Tesseract. The installation steps depend on your operating system. In most cases one of the following works:

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows: download the installer
# See https://github.com/UB-Mannheim/tesseract/wiki and follow the instructions there

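After installing the engine, make sure pyocr can find it. pyocr has no function for setting the engine path; it looks for the tesseract executable on the system PATH, so the folder that contains it must be listed there. You can check what pyocr detected like this:

tools = pyocr.get_available_tools()  # empty list if no engine was found
print([tool.get_name() for tool in tools])

With that in place, you can use pyocr for OCR in your Python programs.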
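
If pyocr reports no tools, a quick check of PATH usually explains why. The following is a minimal sketch using only the standard library. It assumes the executable is named tesseract, and the helper name find_engine is made up for illustration.

```python
import shutil


def find_engine(executable="tesseract"):
    """Return the full path of the OCR engine binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_engine()
    if path is None:
        print("tesseract not found on PATH; install it or add its folder to PATH")
    else:
        print("tesseract found at", path)
```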

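Features of pyocr

Multi-language support: pyocr can recognize any language for which the underlying engine has language data installed, including English and Chinese.
Cross-platform: it runs on Windows, Linux and macOS.
Modular design: OCR engines are exposed as interchangeable tool modules and output formats as builder classes, which makes the library easy to extend and customize.
Easy to integrate: it works directly with Pillow images, so it fits into existing Python projects with little effort.
Recognition accuracy: accuracy comes from the underlying engine and depends heavily on image quality; with clean, high-resolution input the results are good enough for many scenarios.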

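Basic features of pyocr

Text recognition

pyocr's core feature is text recognition: it converts the text in an image into a string.

from PIL import Image
import pyocr

# Pick the first OCR tool that pyocr found (the list is empty if no engine is installed)
tool = pyocr.get_available_tools()[0]

# Load the image
image = Image.open('path_to_image.jpg')

# Run text recognition; without a builder argument the result is a plain string
text = tool.image_to_string(image, lang='eng')
print(text)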
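
Indexing the tool list with [0] raises a bare IndexError when no engine is installed. The sketch below wraps that step so the failure is easier to read. The helper name pick_tool and the error message are illustrative choices, not part of pyocr.

```python
def pick_tool(tools):
    """Return the first available OCR tool, or raise a readable error if there is none."""
    if not tools:
        raise RuntimeError(
            "No OCR engine found. Install Tesseract and make sure it is on PATH."
        )
    return tools[0]


# Typical use (requires pyocr and an installed engine):
#   import pyocr
#   tool = pick_tool(pyocr.get_available_tools())
```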

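Language support

pyocr can recognize many languages; pass the code of the one you need as lang. The matching language data must be installed for the engine, and tool.get_available_languages() lists what is available.

# Recognize French text
text_fr = tool.image_to_string(image, lang='fra')
print(text_fr)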
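
Requesting a language whose data isn't installed makes the engine fail, so it can be worth checking first. A minimal sketch, assuming the installed codes come from tool.get_available_languages() and that languages are combined Tesseract-style with "+". The helper name missing_languages is hypothetical.

```python
def missing_languages(requested, available):
    """Return the codes in a language string such as 'eng+fra' that are not installed.

    `available` is the list of installed language codes, for example the
    result of tool.get_available_languages() in pyocr.
    """
    installed = set(available)
    return [code for code in requested.split("+") if code not in installed]


# Example with a made-up list of installed languages
print(missing_languages("eng+fra", ["eng", "osd"]))
```

An empty result means every requested language is installed.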

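Output formats

pyocr supports several output formats, selected through builder classes in pyocr.builders. For example, WordBoxBuilder returns each word together with its position instead of one plain string.

from pyocr import builders

# Return word boxes (text plus position) instead of a plain string
word_boxes = tool.image_to_string(image, lang='eng', builder=builders.WordBoxBuilder())
for box in word_boxes:
    print(box.content, box.position)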
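
pyocr's box builders (for example pyocr.builders.WordBoxBuilder) return objects that carry the recognized text in .content and the corners in .position as ((x1, y1), (x2, y2)). The sketch below converts such boxes into (text, x, y, width, height) tuples. FakeBox is a stand-in defined here so the example runs without an OCR engine, and boxes_to_rows is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for pyocr's box objects, which expose .content and .position
FakeBox = namedtuple("FakeBox", ["content", "position"])


def boxes_to_rows(boxes):
    """Turn word boxes into (text, x, y, width, height) tuples."""
    rows = []
    for box in boxes:
        (x1, y1), (x2, y2) = box.position
        rows.append((box.content, x1, y1, x2 - x1, y2 - y1))
    return rows


sample = [
    FakeBox("Hello", ((10, 20), (60, 40))),
    FakeBox("world", ((70, 20), (125, 40))),
]
for row in boxes_to_rows(sample):
    print(row)
```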

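Image processing

Preprocessing the image (resizing, rotating, filtering and so on) can improve recognition accuracy. pyocr doesn't do this itself; it accepts Pillow images, so the preprocessing is done with Pillow before the image is handed to the OCR tool. A light blur, as below, can suppress speckle noise in a poor scan, but it also softens character edges, so compare the result with that of the unfiltered image.

from PIL import ImageFilter

# Blur the image
image_filtered = image.filter(ImageFilter.BLUR)

# Recognize the processed image
text_filtered = tool.image_to_string(image_filtered, lang='eng')
print(text_filtered)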
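
Another common preprocessing step is binarization: converting the image to grayscale and then to pure black and white. The sketch below builds the lookup table that Pillow's Image.point accepts for grayscale images. The threshold of 128 is an arbitrary starting value, and the helper names are made up for illustration.

```python
def threshold_lut(threshold=128):
    """Build a 256-entry lookup table: values above `threshold` become white, the rest black."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize(image, threshold=128):
    """Convert a Pillow image to grayscale, then to pure black and white."""
    return image.convert("L").point(threshold_lut(threshold))


# Typical use (requires Pillow):
#   from PIL import Image
#   clean = binarize(Image.open('path_to_image.jpg'), threshold=150)
```

Whether this helps depends on the image; on evenly lit scans it often does, while on photos with shadows a single global threshold can erase text.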

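Adjusting the recognition result

Because pyocr returns ordinary Python strings, the result can be cleaned up with standard tools, for example to remove special characters or correct common errors. Note that the pattern below keeps only ASCII letters, digits and whitespace, so it also removes accented and non-Latin characters.

import re

# Remove special characters
clean_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
print(clean_text)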
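
For text that is not purely English, a Unicode-aware variant may be more useful. This is a minimal sketch, not a general-purpose cleaner; the function name clean_ocr_text is made up for illustration.

```python
import re


def clean_ocr_text(text):
    """Drop punctuation and symbols, collapse whitespace, and remove empty lines."""
    # In Python 3 the \w class is Unicode-aware for str patterns, so letters and
    # digits from any script (including Chinese) are kept. Underscores are kept too.
    no_symbols = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace inside each line
    lines = (" ".join(line.split()) for line in no_symbols.splitlines())
    # Drop lines that ended up empty
    return "\n".join(line for line in lines if line)


print(clean_ocr_text("Hello,  wor|d!\n\n  你好。 "))
```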

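Advanced features of pyocr

Recognizing multiple languages

Tesseract can recognize many languages besides English, including Chinese. The examples in this section call Tesseract through pytesseract, a separate wrapper library (pip install pytesseract), rather than through pyocr.

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (Windows example path; adjust to your installation)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Engine options: default engine mode, and treat the image as a single uniform block of text
custom_oem_psm_config = r'--oem 3 --psm 6'
# lang='chi_sim' selects Simplified Chinese; the chi_sim language data must be installed
text = pytesseract.image_to_string(Image.open('chinese_text.jpg'), lang='chi_sim', config=custom_oem_psm_config)
print(text)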

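Recognizing tables in images

Tesseract can read the text inside a table, but it has no dedicated table mode. With page segmentation mode 6 (--psm 6) it treats the image as a single uniform block of text, so the cell contents come out line by line as plain text and the column structure is not preserved. This example uses pytesseract.

from PIL import Image
import pytesseract

# Recognize the table contents
table_image = Image.open('table_image.jpg')
table_text = pytesseract.image_to_string(table_image, config='--psm 6')
print(table_text)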
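
To get closer to rows and cells, one option is to work from per-word data instead of the plain string. The sketch below groups words into lines and sorts each line left to right. It assumes a dict of parallel lists with the keys text, block_num, par_num, line_num and left, as returned by pytesseract.image_to_data(..., output_type=Output.DICT). The sample data is invented, and words_to_rows is a hypothetical helper name.

```python
from collections import defaultdict


def words_to_rows(data):
    """Group recognized words into text lines, each sorted left to right."""
    rows = defaultdict(list)
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # skip empty entries that describe blocks, paragraphs and lines
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        rows[key].append((data["left"][i], word))
    # Lines in reading order; within a line, words ordered by x coordinate
    return [[word for _, word in sorted(rows[key])] for key in sorted(rows)]


sample = {
    "text": ["", "Qty", "Name", "3", "Apple"],
    "block_num": [1, 1, 1, 1, 1],
    "par_num": [0, 1, 1, 1, 1],
    "line_num": [0, 1, 1, 2, 2],
    "left": [0, 120, 10, 120, 10],
}
print(words_to_rows(sample))
```

This recovers rows only. Splitting a row into columns would additionally need the gaps between the left coordinates, which this sketch doesn't attempt.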

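Recognizing handwriting in images

Tesseract can be run on handwritten text, but it is trained mainly on printed text, so accuracy is usually much lower and depends heavily on the handwriting style. This example uses pytesseract.

from PIL import Image
import pytesseract

# Recognize handwritten text
handwriting_image = Image.open('handwriting.jpg')
handwriting_text = pytesseract.image_to_string(handwriting_image, config='--psm 6')
print(handwriting_text)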

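Recognizing complex layouts

Tesseract can process images with complex layouts, such as text in several fonts, colors and sizes. For such pages the default fully automatic page segmentation (--psm 3) is usually a better starting point than --psm 6, which assumes a single uniform block of text. This example uses pytesseract.

from PIL import Image
import pytesseract

# Recognize an image with a complex layout using automatic page segmentation
complex_image = Image.open('complex_layout.jpg')
complex_text = pytesseract.image_to_string(complex_image, config='--psm 3')
print(complex_text)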

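Custom recognition parameters

Recognition parameters can be customized to improve the result. The example below, which uses pytesseract, restricts the output to digits with a character whitelist.

from PIL import Image
import pytesseract

# Custom parameters: single text block, digits only
custom_config = r'--psm 6 -c tessedit_char_whitelist=0123456789'
image_with_numbers = Image.open('numbers.jpg')
numbers_text = pytesseract.image_to_string(image_with_numbers, config=custom_config)
print(numbers_text)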
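
Option strings like the one above are easy to mistype. A small builder can validate the values before they reach Tesseract. This sketch assumes the value ranges of Tesseract 4 and later (page segmentation modes 0 to 13, engine modes 0 to 3); build_config is a hypothetical helper, not a pytesseract function.

```python
def build_config(psm=6, oem=None, whitelist=None):
    """Assemble a Tesseract command-line option string."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    parts = []
    if oem is not None:
        if not 0 <= oem <= 3:
            raise ValueError("oem must be between 0 and 3")
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    if whitelist:
        # The whitelist must not contain spaces, or the option string would be split
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(build_config(psm=6, whitelist="0123456789"))
```

The returned string can be passed as the config argument of pytesseract.image_to_string.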

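Reporting recognition confidence

Tesseract reports a confidence value for each recognized word, which helps in judging how reliable the result is. This example uses pytesseract's image_to_data.

from PIL import Image
import pytesseract
from pytesseract import Output

# Get per-word data, including confidence
image = Image.open('text_image.jpg')
data = pytesseract.image_to_data(image, output_type=Output.DICT)
n_boxes = len(data['text'])
for i in range(n_boxes):
    # conf may be a string or a number depending on the pytesseract version; -1 marks non-word entries
    if float(data['conf'][i]) > 60:  # only print words with confidence above 60
        (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
        text = data['text'][i]
        print(f'({x}, {y}, {w}, {h}) {text} - {data["conf"][i]}')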
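
A single summary number is often handier than a per-word listing. The sketch below averages the word confidences from a dict shaped like the image_to_data output, skipping the non-word entries that carry a confidence of -1 and empty text. The sample values are invented, and mean_confidence is a hypothetical helper name.

```python
def mean_confidence(data, min_conf=0.0):
    """Average the confidence of recognized words, ignoring non-word entries."""
    scores = []
    for text, conf in zip(data["text"], data["conf"]):
        score = float(conf)  # handles both string and numeric values
        if text.strip() and score >= min_conf:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


sample = {"text": ["", "Hello", "wor1d"], "conf": ["-1", "96", 42.0]}
print(mean_confidence(sample))
```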

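Practical applications

ID card recognition

When handling users' ID card information, OCR can be used to read the text printed on the card. The sample below uses OpenCV to load the image and pytesseract to recognize it. OpenCV loads images in BGR channel order, so the array is converted to RGB before it is turned into a Pillow image.

import cv2
from PIL import Image
import pytesseract

# Read the ID card image (cv2.imread returns None if the file cannot be read)
image_path = 'id_card.jpg'
image = cv2.imread(image_path)

# Convert from BGR to RGB, then to a PIL image
image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# Recognize the text with pytesseract
text = pytesseract.image_to_string(image, lang='chi_sim')

print("Recognition result:", text)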
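
The raw OCR text usually needs a second pass to pull out specific fields. As an illustration, the sketch below looks for an 18-character mainland China ID number and verifies its final check character, which helps catch misread digits. The weights and check codes follow the national standard GB 11643-1999. The helper names are made up, and the sample number is the well-known example value rather than real data.

```python
import re

# Position weights and check codes defined by GB 11643-1999
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"


def is_valid_id_number(number):
    """Check the final character of an 18-character ID number against its checksum."""
    if not re.fullmatch(r"[0-9]{17}[0-9X]", number):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(number[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == number[17]


def find_id_numbers(ocr_text):
    """Return checksum-valid 18-character ID numbers found in OCR output."""
    # OCR often inserts spaces inside long digit runs, so remove whitespace first
    compact = re.sub(r"\s+", "", ocr_text).upper()
    candidates = re.findall(r"(?<![0-9])[0-9]{17}[0-9X](?![0-9])", compact)
    return [c for c in candidates if is_valid_id_number(c)]


print(find_id_numbers("公民身份号码 110105 19491231 002x"))
```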

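CAPTCHA recognition

CAPTCHAs are a common way for websites to block automated logins and sign-ups. Tesseract can read simple, undistorted CAPTCHAs; ones with noise, warping or overlapping characters usually need preprocessing or a dedicated model. This example uses pytesseract.

from PIL import Image
import pytesseract

# Read the CAPTCHA image
image_path = 'captcha.jpg'
image = Image.open(image_path)

# Recognize the text with pytesseract
text = pytesseract.image_to_string(image, lang='eng')

print("CAPTCHA result:", text)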

Document scanning and text extraction

When paper documents are converted to electronic form, OCR can extract the text they contain. This example uses pytesseract.

from PIL import Image
import pytesseract

# Read the document image
image_path = 'document.jpg'
image = Image.open(image_path)

# Recognize the text with pytesseract
text = pytesseract.image_to_string(image, lang='chi_sim')

print("Extracted document text:", text)

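QR code recognition

In scenarios such as mobile payment, reading the information in a QR code is essential. QR codes are decoded rather than recognized by OCR, so this example uses the pyzbar library instead of an OCR engine:

import cv2
from pyzbar.pyzbar import decode

# Read the QR code image
image_path = 'qrcode.jpg'
image = cv2.imread(image_path)

# Decode with pyzbar; the result list is empty if no code was found
results = decode(image)
if results:
    data = results[0].data.decode('utf-8')
    print("QR code result:", data)
else:
    print("No QR code found")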

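Recognizing and translating text in images

Text in an image can be recognized and then translated into another language. This example uses pytesseract for recognition and googletrans, an unofficial client for Google Translate, for translation. The call below matches the synchronous googletrans API; in some newer releases translate() is asynchronous and has to be awaited.

from PIL import Image
import pytesseract
from googletrans import Translator

# Read the image
image_path = 'image_with_text.jpg'
image = Image.open(image_path)

# Recognize the text with pytesseract
text = pytesseract.image_to_string(image, lang='chi_sim')

# Translate with googletrans
translator = Translator()
translated_text = translator.translate(text, src='zh-cn', dest='en').text

print("Recognized and translated result:", translated_text)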

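Traffic accident handling

In traffic accident handling, OCR can read license plate numbers from scene photos to help identify the vehicles involved quickly. Running Tesseract on a whole scene photo rarely isolates the plate, so in practice the plate region is located and cropped first; the code below, which uses pytesseract, shows only the recognition call.

from PIL import Image
import pytesseract

# Read the scene photo
image_path = 'accident_scene.jpg'
image = Image.open(image_path)

# Recognize the license plate number with pytesseract
text = pytesseract.image_to_string(image, lang='chi_sim')

print("License plate result:", text)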
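
Since the OCR output for a photo typically contains stray text, a pattern match can pick out plate-like strings. The sketch below uses a simplified pattern for mainland China plates: a province character, a letter, then five or six letters or digits. It is an assumption-laden illustration that does not cover special plates such as police, embassy or trailer plates, and find_plates is a hypothetical helper name.

```python
import re

# One-character abbreviations of the 31 provincial-level regions
PROVINCES = "京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼"
PLATE_PATTERN = re.compile(rf"[{PROVINCES}][A-Z][A-Z0-9]{{5,6}}")


def find_plates(ocr_text):
    """Return plate-like strings found in OCR output."""
    # Remove whitespace and the separator dot that OCR may pick up from the plate
    compact = re.sub(r"[\s·.\-]", "", ocr_text).upper()
    return PLATE_PATTERN.findall(compact)


print(find_plates("京A·12345"))
```

Because whitespace is removed before matching, text that directly follows a plate can be pulled into the match, so results are best treated as candidates for review.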

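Summary

By studying and practicing with pyocr, we have covered the basics of OCR text recognition in Python and looked at more advanced uses as well. From simple text recognition to processing complex documents, pyocr and the Tesseract engine behind it prove capable and flexible. They help us build OCR applications quickly and are useful in a wide range of practical scenarios. With continued exploration and practice, we can go further on the road of software development.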
