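Tesseract is an open-source OCR (Optical Character Recognition) engine originally developed at HP Labs and now maintained by Google. It is free, supports many languages, and runs on multiple platforms.
Download: https://github.com/tesseract-ocr/tesseract/wiki
The Windows version is downloaded here.
Run the installer (.exe file).
At this step, select any additional language data files (only English is installed by default; tick Chinese if you need it).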
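To confirm that the language data chosen in the installer is actually available, the installed languages can be listed from Python. The following is a minimal sketch, assuming that tesseract is reachable on PATH and that `chi_sim` is the language code wanted for Simplified Chinese; the helper names `parse_langs` and `installed_langs` are made up for this example.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which contains spaces.
        if line and " " not in line:
            langs.append(line)
    return langs


def installed_langs(exe="tesseract"):
    """Run `tesseract --list-langs` and return the language codes it reports."""
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the list to stderr
        universal_newlines=True,
    )
    return parse_langs(result.stdout)
```

Calling `"chi_sim" in installed_langs()` then tells you whether the Chinese data file was installed.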
Running the script
Error 1
Traceback (most recent call last):
File "C:/Users/User-name/PycharmProjects/orc/testforpackets.py", line 9,
version = pt.get_tesseract_version()
File "C:\Users\User-name\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 69, in wrapper
wrapper._result = func(*args, **kwargs)
File "C:\Users\User-name\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 264, in get_tesseract_version
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract.exe is not installed or it's not in your path
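Fix: change the path that tesseract_cmd points to in pytesseract.py so that it is the full path to tesseract.exe (or add the Tesseract installation directory to PATH, as the message suggests).
Error 2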
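Instead of editing pytesseract.py inside site-packages, the same setting can be assigned from your own script at runtime. The sketch below is one possible way to do it: the candidate paths are assumptions about where the installer might have put Tesseract, and `find_tesseract` and `point_pytesseract_at` are hypothetical helper names, not part of pytesseract.

```python
import os
import shutil

# Assumed install locations; adjust to wherever the installer put Tesseract.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def point_pytesseract_at(exe_path):
    """Tell pytesseract which executable to run and return its version."""
    import pytesseract  # imported here so the search helper works without it

    pytesseract.pytesseract.tesseract_cmd = exe_path
    return pytesseract.get_tesseract_version()
```

A typical use would be `exe = find_tesseract()` followed by `point_pytesseract_at(exe)` when `exe` is not None. Because the setting lives in your own script, it is not lost when pytesseract is upgraded.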
Error opening data file \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
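Fix: add a TESSDATA_PREFIX environment variable and set it to the tessdata directory under the installation directory,
for example: C:\Tesseract-OCR\tessdata
Note that the error message itself asks for the parent directory of tessdata. Which of the two is expected depends on the Tesseract version, so if one does not work, try the other.
If it still does not work, restart the computer so that the new environment variable is picked up.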
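The variable can also be set for the current Python process only, before pytesseract is called; the tesseract process launched from the script then inherits it, with no system-wide change. This is a minimal sketch: `use_tessdata` and its `use_parent` switch are hypothetical names introduced here, and the switch exists only because, as far as I know, different Tesseract versions expect either the tessdata directory or its parent.

```python
import os


def use_tessdata(tessdata_dir, lang="eng", use_parent=False):
    """Set TESSDATA_PREFIX for this process after checking the data file exists.

    Returns the path of the language data file that was found.
    """
    data_file = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(data_file):
        raise FileNotFoundError(data_file)
    prefix = tessdata_dir
    if use_parent:
        # Some versions want the directory that contains "tessdata".
        prefix = os.path.dirname(tessdata_dir.rstrip("\\/"))
    os.environ["TESSDATA_PREFIX"] = prefix
    return data_file
```

For the path used above, the call would be `use_tessdata(r"C:\Tesseract-OCR\tessdata")`. Checking for the .traineddata file first turns a wrong path into an immediate Python error rather than the "Failed loading language" message from Tesseract.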