Error: pytesseract.pytesseract.TesseractError

Setting up the Tesseract-OCR environment, and how to handle the error pytesseract.pytesseract.TesseractError:

(1, 'Error opening data file C:\Program Files\Tesseract-OCR/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'chi_sim' Tesseract couldn't load any languages! Could not initialize tesseract.')

import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

def Tesseract_to_str(image):
    """Extract the text in an image and return it as a string."""
    # Extract text with pytesseract; recognizing Chinese requires lang='chi_sim'
    text_from_image = pytesseract.image_to_string(image, lang='chi_sim')
    print(text_from_image)
    return text_from_image

Running the code above produces the following error:

File "D:\ProgramFiles\miniconda3\envs\env_myenv\Lib\site-packages\pytesseract\pytesseract.py", line 489, in
Output.STRING: lambda: run_and_get_output(*args),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ProgramFiles\miniconda3\envs\env_myenv\Lib\site-packages\pytesseract\pytesseract.py", line 352, in run_and_get_output
run_tesseract(**kwargs)
File "D:\ProgramFiles\miniconda3\envs\env_myenv\Lib\site-packages\pytesseract\pytesseract.py", line 284, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\Program Files\Tesseract-OCR/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'chi_sim' Tesseract couldn't load any languages! Could not initialize tesseract.')
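
The message says Tesseract looked for chi_sim.traineddata inside the tessdata directory and could not open it. A minimal sketch for checking this before calling pytesseract follows. The helper name missing_languages is made up for illustration and is not part of pytesseract, and the path in the example call is only an assumption about where Tesseract is installed:

```python
from pathlib import Path


def missing_languages(tessdata_dir, langs):
    """Return the language codes whose .traineddata file is absent from tessdata_dir."""
    tessdata = Path(tessdata_dir)
    # Tesseract loads each language from <tessdata_dir>/<lang>.traineddata
    return [lang for lang in langs
            if not (tessdata / f"{lang}.traineddata").is_file()]


# Example call (hypothetical install location; adjust to your machine):
# missing_languages(r"C:\Program Files\Tesseract-OCR\tessdata", ["chi_sim", "eng"])
```

If the returned list contains 'chi_sim', the language data was not installed into that directory, which matches the error shown above.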

For the cause, see the following notes on setting up the environment:

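1 Set up the Tesseract-OCR environment.

1.1 Tesseract-OCR itself must be installed manually first. Download: https://digi.bib.uni-mannheim.de/tesseract/?C=M;O=D

Note: select the Chinese language data during installation (the simplest approach is to tick every option in the installer).

Install it on the same drive as the code you will run.

After installing Tesseract-OCR, add its installation directory to the system PATH environment variable.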
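
To confirm that the PATH change took effect, one option is to look the executable up from Python. This is only a sketch: find_tesseract is a hypothetical helper, and the fallback paths passed to it are whatever locations you want to try, not values defined by Tesseract or pytesseract.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if none is found."""
    # shutil.which searches the directories listed in PATH
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try explicit locations supplied by the caller
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage (requires pytesseract; the candidate path is an example only):
# import pytesseract
# exe = find_tesseract([r"D:\Program_Files\Tesseract-OCR\tesseract.exe"])
# if exe:
#     pytesseract.pytesseract.tesseract_cmd = exe
```

A terminal or IDE that was opened before PATH was edited usually needs to be restarted before the new value becomes visible.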

Once installation finishes, check the version and the supported languages from a command prompt:
cd /d "D:\Program Files\Tesseract-OCR"

tesseract -v
tesseract --list-langs
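
The language check can also be scripted. The sketch below only parses text in the shape that tesseract --list-langs prints (a header line followed by one language code per line). The function names and the sample string are illustrative, not captured from a real run.

```python
def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...")
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def has_language(output, lang):
    """Return True if lang appears in the parsed language list."""
    return lang in parse_list_langs(output)


# Illustrative input only; real output depends on your installation.
sample = "List of available languages (3):\nchi_sim\neng\nosd\n"
print(has_language(sample, "chi_sim"))

# To feed it real output (some Tesseract versions write the list to stderr):
# import subprocess
# proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
# print(parse_list_langs(proc.stdout + proc.stderr))
```

Recent pytesseract releases also provide pytesseract.get_languages(), which returns the same information without manual parsing.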

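If a language-related error appears, download the Chinese data file chi_sim.traineddata into the tessdata folder under the installation directory, i.e. C:\Program Files\Tesseract-OCR\tessdata, which is where the error message above looks for it.

Language data download: https://tesseract-ocr.github.io/tessdoc/Data-Files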
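
The error message also mentions the TESSDATA_PREFIX environment variable. If the data files live somewhere other than the default location, that variable can be set from Python before pytesseract is called, since the Tesseract process is started as a child of the Python process. A minimal sketch, assuming the variable should point at the tessdata directory itself, as the error message states (older Tesseract versions may expect its parent directory instead). The helper name use_tessdata_dir is hypothetical:

```python
import os


def use_tessdata_dir(tessdata_dir, env=None):
    """Set TESSDATA_PREFIX in env (default: os.environ) and return the value used."""
    env = os.environ if env is None else env
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return env["TESSDATA_PREFIX"]


# Example call (path is an assumption about your installation):
# use_tessdata_dir(r"C:\Program Files\Tesseract-OCR\tessdata")
```

Call it once at program start, before the first image_to_string call.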

1.2 Then install the Python library pytesseract

pip install pytesseract
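
To verify that the package landed in the Python environment you are actually running (easy to get wrong with several conda environments), a quick check along these lines may help. installed_version is an illustrative helper built on the standard library's importlib.metadata:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pytesseract"))
```

A result of None means pip installed the package into a different interpreter than the one running the script.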

1.3 Running the code below should now no longer raise the error

import pytesseract

def Tesseract_to_str(image):
    """Extract the text in an image and return it as a string."""
    # If the Tesseract install directory is not on the system PATH, set the executable path explicitly
    pytesseract.pytesseract.tesseract_cmd = r"D:\Program_Files\Tesseract-OCR\tesseract.exe"
    tessdata_dir_config = '--tessdata-dir D:/Program_Files/Tesseract-OCR/tessdata'
    # Extract text with pytesseract; recognizing Chinese requires lang='chi_sim'
    print('-' * 20, 'Extracting text from image', '-' * 20)
    try:
        text_from_image = pytesseract.image_to_string(image, config=tessdata_dir_config, lang='chi_sim')
    except Exception as e:
        print('Failed to extract text!', e)
        return ''
    print(text_from_image)
    return text_from_image
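
The --tessdata-dir value in the function contains no spaces, so it works unquoted. For a path such as C:/Program Files/..., the pytesseract README suggests wrapping the path in double quotes. A small sketch of a helper that does this (tessdata_config is a hypothetical name), with shlex used only to illustrate how shell-style splitting treats the quoted path:

```python
import shlex


def tessdata_config(tessdata_dir):
    """Build a --tessdata-dir option string, quoting the path so spaces survive."""
    return f'--tessdata-dir "{tessdata_dir}"'


config = tessdata_config("C:/Program Files/Tesseract-OCR/tessdata")
# With the quotes, shell-style splitting keeps the path as a single argument
print(shlex.split(config))
```

The resulting string can be passed as the config argument of image_to_string in place of a hand-written one.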

pytesseract.pytesseract.TesseractNotFoundError is a related error. It means that tesseract is not installed or is not on the system path. It typically appears when using pytesseract, because pytesseract depends on the Tesseract OCR engine to perform recognition. As the references note, tesseract may run fine from cmd and still fail when the script is run from PyCharm. The fix is to set the Tesseract executable path explicitly in code: pytesseract.pytesseract.tesseract_cmd = 'E:\\software\\Tesseract-OCR\\tesseract.exe'. pytesseract can then locate the executable, which resolves the TesseractNotFoundError. [1][2][3]

References:
[1] Error: pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path. https://download.csdn.net/download/weixin_38551938/13749171
[2] pytesseract error pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your ... https://blog.csdn.net/weixin_45941288/article/details/131297776
[3] Fixing the TesseractNotFoundError raised when using pytesseract (tesseract OCR) in Python. https://blog.csdn.net/qianya9/article/details/124094727