Installing, Configuring, and Using the pytesseract Package in Python

GT_lonely_wolf

Category: pytesseract package. Tags: image recognition, python

Article link: https://blog.csdn.net/gt_lonely_wolf/article/details/107760943

Using pytesseract

1. Downloading the pytesseract package

Install it with the command: pip install pytesseract
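To confirm that the install succeeded, the package version can be read back from Python. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for illustration and is not part of pytesseract.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pytesseract")
    if version is None:
        print("pytesseract is not installed; run: pip install pytesseract")
    else:
        print("pytesseract", version, "is installed")
```

Note that this only checks the Python package. It says nothing about whether the Tesseract-OCR engine itself is present.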

2. Code for recognizing the text in an image

from PIL import Image
import pytesseract

# Path to the image to recognize
file_path = "test.jpg"
image = Image.open(file_path)
# Run OCR on the image and print the recognized text
print(pytesseract.image_to_string(image))

Input image: (image not included)

Output: (screenshot of the output not included)
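The snippet above assumes that test.jpg exists and that the Tesseract engine can be found. A slightly more defensive version might look like the sketch below. The function name `ocr_image` and the printed messages are illustrative only. `lang` is an existing parameter of `image_to_string` (the `eng` language data is assumed to be installed), and `TesseractNotFoundError` is the exception pytesseract raises when it cannot locate the executable.

```python
import os


def ocr_image(file_path, lang="eng"):
    """Return the recognized text, or None if OCR could not be run."""
    if not os.path.isfile(file_path):
        print("Image not found:", file_path)
        return None

    # Imported lazily so the file check above works without the OCR stack.
    from PIL import Image
    import pytesseract

    try:
        with Image.open(file_path) as image:
            return pytesseract.image_to_string(image, lang=lang)
    except pytesseract.TesseractNotFoundError:
        print("Tesseract-OCR is not installed or is not on PATH")
        return None


if __name__ == "__main__":
    text = ocr_image("test.jpg")
    if text is not None:
        print(text)
```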

3. Troubleshooting

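1. The pytesseract package cannot run on its own. It is only a wrapper around the Tesseract-OCR engine, which has to be downloaded and installed separately.
2. Download link: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe
3. Installation: the default options are fine. I changed the install path to D:\Programe Files\Tesseract-OCR; the default is C:\Program Files\Tesseract-OCR.
4. Configure the environment variables (system variables are used here). (screenshot not included)
5. Locate the pytesseract.py file of the installed pytesseract package and change its configuration: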

# Add a variable holding the path to the tessdata directory
tessdata_dir_config = '--tessdata-dir "D:\\Programe Files\\Tesseract-OCR\\tessdata"'

# Change the tesseract_cmd variable
# Before:
# tesseract_cmd = 'tesseract'
# After:
tesseract_cmd = 'D:/Programe Files/Tesseract-OCR/tesseract.exe'

# Use tessdata_dir_config as the default config argument of image_to_string
def image_to_string(
    image, lang=None, config=tessdata_dir_config, nice=0, output_type=Output.STRING, timeout=0,
):
    """
    Returns the result of a Tesseract OCR run on the provided image to string
    """
    args = [image, 'txt', lang, config, nice, timeout]
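Editing the installed pytesseract.py works, but the change is lost whenever the package is upgraded or reinstalled. As an alternative sketch, the same two settings can be applied from the calling script instead: `pytesseract.pytesseract.tesseract_cmd` is the module-level variable shown above, and `config` is the existing parameter of `image_to_string`. The install directory is the one used in this article and should be adjusted for other machines. The helper names `build_tessdata_config` and `ocr_with_explicit_paths` are made up for illustration.

```python
import os

# Install directory used in this article; adjust it to your own machine.
TESSERACT_DIR = r"D:\Programe Files\Tesseract-OCR"


def build_tessdata_config(install_dir):
    """Build the --tessdata-dir option for the given install directory."""
    tessdata_dir = os.path.join(install_dir, "tessdata")
    return '--tessdata-dir "{}"'.format(tessdata_dir)


def ocr_with_explicit_paths(file_path, install_dir=TESSERACT_DIR):
    """Run OCR with the executable and tessdata locations set at runtime."""
    from PIL import Image
    import pytesseract

    # Same effect as editing tesseract_cmd inside pytesseract.py.
    pytesseract.pytesseract.tesseract_cmd = os.path.join(install_dir, "tesseract.exe")
    config = build_tessdata_config(install_dir)
    with Image.open(file_path) as image:
        return pytesseract.image_to_string(image, config=config)


if __name__ == "__main__":
    print(ocr_with_explicit_paths("test.jpg"))
```

Keeping the paths in the script leaves the library untouched, so the configuration survives a `pip install --upgrade pytesseract`.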