Fixing WindowsError: [Error 2] when recognizing CAPTCHAs with pytesseract on Windows

Installing PIL + pytesseract
Installation is straightforward; see http://www.waitalone.cn/python-php-ocr.html

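Download Pillow from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and pick the build that matches your Python version (mine is 2.7). I hit one snag here: my machine is 64-bit, but when I downloaded the 64-bit whl and installed it with pip, it failed with an error saying the format was not supported. I then downloaded the 32-bit whl and, to my surprise, it installed without complaint. Oh well...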
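A likely explanation for the wheel mismatch, though the post does not confirm it, is that the installed Python was a 32-bit build: pip matches a wheel's platform tag against the running interpreter, not against the operating system. A minimal sketch for checking which build you have (the function name is made up for illustration):

```python
import platform
import struct

def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of the running Python."""
    # struct.calcsize('P') is the size of a C pointer in bytes.
    return struct.calcsize('P') * 8

bits = interpreter_bits()
print('Python %s is a %d-bit build' % (platform.python_version(), bits))
# Rule of thumb: 32-bit builds take "win32" wheels, 64-bit builds take "win_amd64" wheels.
```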


Then run:

pip install pytesseract

Once it installs successfully, run this script:


from PIL import Image
from pytesseract import image_to_string
image = Image.open(r'7364.png')  # Open image object using PIL
print image_to_string(image)     # Run tesseract.exe on image

It fails with the following error:

Traceback (most recent call last):
  File "F:/spider/test.py", line 4, in <module>
    print image_to_string(image)     # Run tesseract.exe on image  
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\Lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Python27\Lib\subprocess.py", line 959, in _execute_child
    startupinfo)
WindowsError: [Error 2] 
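Error 2 is the operating system's "file not found" code. Judging by the traceback, pytesseract starts the tesseract executable through subprocess, and the launch itself fails before Tesseract ever runs. The sketch below reproduces the same kind of failure with a deliberately nonexistent command (the command name and helper are made up for illustration) and reports the error code instead of crashing:

```python
import errno
import subprocess

def try_launch(command):
    """Try to start `command --version`; return None on success, else the errno."""
    try:
        proc = subprocess.Popen([command, '--version'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        proc.communicate()
        return None
    except OSError as exc:  # WindowsError is a subclass of OSError
        return exc.errno

# Hypothetical command name that should not exist on any machine.
code = try_launch('no-such-tesseract-binary')
print(code == errno.ENOENT)
```

If try_launch('tesseract') also returns errno.ENOENT, Python cannot find the executable under that name, which is the situation described here.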

Searching online turned up a suggested fix: in pytesseract.py, change

tesseract_cmd = 'tesseract'  to  tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

Fine, I changed it.
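Before hard-coding a path, it may be worth checking whether the executable is reachable through PATH at all. A small sketch, assuming Python 3.3+ for shutil.which (on Python 2.7, distutils.spawn.find_executable plays a similar role); the diagnose helper is hypothetical:

```python
import shutil

def diagnose(command='tesseract'):
    """Describe whether `command` can be resolved to an executable on PATH."""
    path = shutil.which(command)
    if path is None:
        return '%r was not found on PATH; use the full path to the executable' % command
    return '%r resolves to %s' % (command, path)

print(diagnose())
```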

I ran it again and, sure enough, got the same error:

Traceback (most recent call last):
  File "F:/spider/test.py", line 4, in <module>
    print image_to_string(image)     # Run tesseract.exe on image  
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\Lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Python27\Lib\subprocess.py", line 959, in _execute_child
    startupinfo)
WindowsError: [Error 2] 

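Ha. Looking closely at the command, I noticed that the \t in the Windows path (the start of \tesseract.exe) was being treated as an escape sequence, so the string no longer pointed at the executable. The fix is to put an r in front of tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe', making it a raw string: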

tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
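To see what the missing r prefix does, the sketch below compares a normal and a raw string literal. It uses only the tail of the path to keep the example short:

```python
# Normal literal: the "\t" in front of "esseract.exe" becomes a single TAB character.
broken = 'Tesseract-OCR\tesseract.exe'
# Raw literal: the backslash is kept exactly as typed.
fixed = r'Tesseract-OCR\tesseract.exe'

print(repr(broken))
print(repr(fixed))
print('contains a tab:', '\t' in broken)
print('contains a backslash:', '\\' in broken)
```

Doubling the backslashes ('C:\\Program Files (x86)\\...') or using forward slashes are other common ways to avoid the problem.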

Run it again and it works; the CAPTCHA is recognized:

C:\Users\tandazhao\spider_venv\Scripts\python.exe F:/spider/test.py
7364

Process finished with exit code 0
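Editing pytesseract.py inside site-packages works, but the change is lost whenever the package is reinstalled or upgraded. Because tesseract_cmd is a module-level variable, it can also be assigned from your own script. A sketch under that assumption; the helper names are hypothetical, and the second candidate path is a guess at another common install location, not something from this post:

```python
import os

# Candidate locations; adjust to wherever Tesseract-OCR is installed.
CANDIDATES = [
    r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe',  # path used in this post
    r'C:\Program Files\Tesseract-OCR\tesseract.exe',        # assumed alternative
]

def first_existing(paths):
    """Return the first path that points at an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None

def configure_pytesseract(paths=CANDIDATES):
    """Point pytesseract at the first executable found; return that path or None."""
    import pytesseract  # imported lazily so first_existing works without it
    exe = first_existing(paths)
    if exe is not None:
        # Same variable the post edits by hand, set at runtime instead.
        pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once before image_to_string(image) should have the same effect as the manual edit, and a None return value signals that none of the candidate paths exist.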


Ha ha ha.



Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/bigzhao_25/article/details/52350781