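The first time I tried using pytesser3 to recognize a simple captcha, I ran into an error saying the 'gbk' codec could not decode the output. After some trial and error I solved it, so I'm writing it down here to share with everyone!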
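For context, this kind of 'gbk' error can be reproduced with the standard library alone. The sketch below makes two assumptions. First, the OCR result is a UTF-8 text file. Second, on Simplified Chinese Windows, Python's open() falls back to the locale encoding (cp936, which Python reports as 'gbk') when no encoding is given. The helper names are hypothetical and are not part of pytesser3.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 explicitly instead of relying on the
    platform default encoding (cp936/gbk on Simplified Chinese Windows)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def reproduce_gbk_error():
    """Write UTF-8 text to a temp file, then show that decoding the same
    bytes as gbk fails while an explicit UTF-8 read succeeds.
    Returns (gbk_failed, text_read_as_utf8)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Hypothetical OCR output containing non-ASCII characters.
        with open(path, "w", encoding="utf-8") as f:
            f.write("验证码")
        with open(path, "rb") as f:
            raw = f.read()
        try:
            raw.decode("gbk")  # what an implicit gbk default would do
            gbk_failed = False
        except UnicodeDecodeError:
            gbk_failed = True
        return gbk_failed, read_text_utf8(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(reproduce_gbk_error())
```

The UTF-8 encoding of "验证码" is 9 bytes long. The gbk decoder reads non-ASCII bytes in pairs, so it is always left with one dangling byte and raises UnicodeDecodeError. The explicit `encoding="utf-8"` read returns the original text.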
The captcha recognition code is:
from PIL import Image
import pytesser3
im = Image.open("captcha.gif")
# OCR straight from the image file on disk
print(pytesser3.image_file_to_string("captcha.gif"))
# OCR from an already-opened PIL Image object
print(pytesser3.image_to_string(im))
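Before this, I had already installed PIL and Tesseract-OCR and updated the environment variables. I had also changed tesseract_exe_name in the pytesser3 package's __init__.py to point to the Tesseract-OCR installation path.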
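One quick way to confirm that the PATH change took effect is to look up the executable from Python and ask it for its version. This is a minimal sketch. It assumes the executable is named `tesseract` and that the installed build supports `--version`. It only checks PATH and does not read pytesser3's own tesseract_exe_name setting. The function names are hypothetical.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return the full path of the named executable if it is on PATH, else None."""
    return shutil.which(name)


def tesseract_version(exe):
    """Run `<exe> --version` and return its first output line.
    stderr is merged into stdout because some older builds print the version
    there. Bytes are decoded with errors="replace" so this check cannot itself
    trip over a gbk/UTF-8 mismatch."""
    result = subprocess.run([exe, "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = result.stdout.decode("utf-8", errors="replace").splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract not found on PATH")
    else:
        print(exe)
        print(tesseract_version(exe))
```

The sketch uses `subprocess.run` with explicit `stdout`/`stderr` arguments rather than `capture_output`, so it also works on Python 3.6, the version shown in the traceback.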
Running the code above produced the following error:
Traceback (most recent call last):
File ".../pytesser3_try.py", line 6, in <module>
print(pytesser3.image_file_to_string("captcha.gif"))
File "C:\Users\1\AppData\Roaming\Python\Python36\site-packages\pytesser3\__init__.py", line 44, in image_file_to_string
text = util.retrieve_text(scratch_text_name_root)
File "C:\Users\1\AppData\Roaming\Python\Python36\site-packages\pytesser3\util.py", lin