Python Text Recognition: Python + Tesseract OCR

Latest recommended article published on 2024-08-12 10:27:56

weixin_39671621

Tags: Python text recognition, tesseract, Chinese language pack

Article link: https://blog.csdn.net/weixin_39671621/article/details/111294676

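This article explains how to download, install, and configure Tesseract on Windows, including installing the Python pytesseract library, adding the language pack environment variable, and adjusting the Tesseract.exe path used by pytesseract. Sample code shows how Tesseract performs on English and Chinese text, and the article stresses the importance of image quality and preprocessing.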

Summary generated automatically by CSDN.

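Sunday, February 16, 2020, by Andy

Preface: Text recognition underlies every kind of document recognition, such as automatically reading ID cards, train tickets, and certificates. That makes it especially important, so today we look at how to recognize text with Python and Tesseract.

Aside: Tesseract's main features are that it is open source, free, works without an internet connection, and can be trained on your own character sets.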

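1. Downloading, Installing, and Configuring Tesseract on Windows

1-1 Download

1-1-1 Installer: https://digi.bib.uni-mannheim.de/tesseract/

1-1-2 Language packs: https://github.com/tesseract-ocr/tessdata (English is supported by default; Chinese recognition requires downloading the language packs chi_tra.traineddata for Traditional Chinese and chi_sim.traineddata for Simplified Chinese)

1-2 Installation and Configuration

1-2-1 Install Tesseract: double-click the installer, choose an install location, and click Next through the remaining steps.

1-2-2 Install the Python pytesseract library: pip install pytesseract

1-2-3 Configuration: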

Configuration 1 (add the language pack environment variable)
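
A minimal sketch of Configuration 1, assuming the variable in question is TESSDATA_PREFIX (the variable Tesseract reads to locate its language data). The install path below is a guess at a default location, not something stated in this article, and the helper name set_tessdata_prefix is made up for illustration. Setting the variable from Python only affects the current process; a permanent setting still goes through the Windows environment variable dialog. Depending on the Tesseract version, the variable may need to point at the tessdata folder itself or at its parent folder.

```python
import os
from pathlib import Path

# Assumption: a typical default install location. Adjust to your own setup.
ASSUMED_TESSDATA_DIR = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def set_tessdata_prefix(tessdata_dir=ASSUMED_TESSDATA_DIR):
    """Set TESSDATA_PREFIX for the current Python process only.

    Returns the value that was stored in the environment.
    """
    os.environ["TESSDATA_PREFIX"] = str(tessdata_dir)
    return os.environ["TESSDATA_PREFIX"]


if __name__ == "__main__":
    print("TESSDATA_PREFIX =", set_tessdata_prefix())
```

Call set_tessdata_prefix() before the first OCR call so that the Tesseract process started by pytesseract inherits the value.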

Configuration 2 (change the Tesseract.exe path that the pytesseract library calls)
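
One way to do Configuration 2 without editing the library's source file is to set pytesseract.pytesseract.tesseract_cmd at runtime. This is not necessarily the approach the author used. In the sketch below the install path is an assumed default, and the helper names find_tesseract and configure_pytesseract are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: a typical default install location. Adjust to your own setup.
ASSUMED_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(ASSUMED_EXE,)):
    """Return the first existing candidate path, else whatever is on PATH.

    Returns None when no executable can be located.
    """
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    # Fall back to a PATH lookup (returns None if nothing is found).
    return shutil.which("tesseract")


def configure_pytesseract(exe_path):
    """Tell pytesseract which executable to run."""
    # Imported here so find_tesseract() can be used without pytesseract.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = exe_path


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("Tesseract executable not found; check the install path.")
    else:
        configure_pytesseract(exe)
        print("pytesseract will call:", exe)
```

Setting the path in your own script keeps the change out of site-packages, so it survives a reinstall or upgrade of pytesseract.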

Configuration 3 (add the Chinese language pack)
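
Adding a language pack comes down to placing the downloaded .traineddata files in Tesseract's tessdata folder. The following is a small check for that, assuming the usual layout of one <lang>.traineddata file per language. The function name missing_languages and the default path are illustrative, not part of Tesseract or pytesseract.

```python
from pathlib import Path

# Assumption: a typical default install location. Adjust to your own setup.
ASSUMED_TESSDATA_DIR = r"C:\Program Files\Tesseract-OCR\tessdata"


def missing_languages(tessdata_dir=ASSUMED_TESSDATA_DIR,
                      langs=("eng", "chi_sim", "chi_tra")):
    """Return the language codes whose .traineddata file is not present."""
    tessdata = Path(tessdata_dir)
    return [
        lang for lang in langs
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    missing = missing_languages()
    if missing:
        print("Missing language packs:", ", ".join(missing))
    else:
        print("All requested language packs are installed.")
```

Running tesseract --list-langs in a command prompt is another way to see which languages the installation can find.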

2. Recognition Test Code

from PIL import Image
import pytesseract

# English recognition test (uses the default language, eng)
img_en = Image.open('OCR_test_en.png')
ocr_result_en = pytesseract.image_to_string(img_en)
print(ocr_result_en)

# Chinese recognition test (requires chi_sim.traineddata to be installed)
img_zh = Image.open('OCR_test_zh.png')
ocr_result_zh = pytesseract.image_to_string(img_zh, lang='chi_sim')
print(ocr_result_zh)
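
Since recognition quality depends heavily on image quality and preprocessing, here is a rough sketch of one common preprocessing chain using Pillow: grayscale conversion, contrast stretching, upscaling, and binarization. The function name preprocess_for_ocr is hypothetical, and the scale and threshold values are arbitrary assumptions that need tuning per image. This is not the author's procedure.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img, scale=2, threshold=150):
    """Return a black-and-white, upscaled copy of img for OCR.

    scale and threshold are illustrative defaults, not recommended values.
    """
    gray = ImageOps.grayscale(img)       # drop color information
    gray = ImageOps.autocontrast(gray)   # stretch the contrast range
    # Enlarge small text; LANCZOS gives smooth edges when upscaling.
    resized = gray.resize(
        (gray.width * scale, gray.height * scale), Image.LANCZOS
    )
    # Binarize: pixels brighter than the threshold become white, others black.
    return resized.point(lambda p: 255 if p > threshold else 0)


if __name__ == "__main__":
    # Demo on a synthetic image so the script runs without any input file.
    demo = Image.new("RGB", (40, 20), (200, 200, 200))
    cleaned = preprocess_for_ocr(demo)
    print(cleaned.mode, cleaned.size)
```

The returned image can be passed to pytesseract.image_to_string in place of img_en or img_zh above. Whether it helps depends on the input: clean screenshots often need no preprocessing, while photos and low-resolution scans usually benefit.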