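Reference: https://zhuanlan.zhihu.com/p/138622393
1. pytesseract
pytesseract is a Python wrapper for Google's Tesseract-OCR engine and can recognize text in images. Web scrapers commonly use it to read the captcha on login pages. Setting up a working pytesseract environment involves several pitfalls; if you need to install it, follow the procedure below to avoid them.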
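As a rough illustration of the captcha use case, the sketch below OCRs a single image file. It assumes pytesseract and Pillow are installed and the tesseract binary is available (see the setup steps in this section). The helper names `recognize_captcha` and `clean_captcha_text`, the grayscale step and the `--psm 7` (treat the image as one line of text) setting are illustrative choices, not part of the original notes.

```python
import re


def clean_captcha_text(raw: str) -> str:
    """Keep only ASCII letters and digits from raw OCR output.

    Hypothetical helper: Tesseract often returns stray whitespace, newlines
    or punctuation around short captcha strings, so strip everything else.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def recognize_captcha(path: str) -> str:
    """OCR a captcha image file with pytesseract and return the cleaned text."""
    # Imported here so clean_captcha_text() can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        gray = img.convert("L")  # grayscale often helps Tesseract on noisy images
        # --psm 7: page segmentation mode "single text line"
        raw = pytesseract.image_to_string(gray, lang="eng", config="--psm 7")
    return clean_captcha_text(raw)
```

Usage would look like `recognize_captcha("captcha.png")`, where `captcha.png` is a hypothetical file. Real captchas usually need more preprocessing (denoising, thresholding) than the grayscale conversion shown here, so treat the result as a starting point.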
1.1 Installing pytesseract
pip install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com pytesseract
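However, this only installs the Python wrapper. pytesseract works by calling the separate tesseract command-line program, which pip does not install, so processing an image fails with:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.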
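The error means pytesseract could not launch the tesseract executable. Once Tesseract itself is installed, you can either add its install directory to PATH or set the executable path explicitly through `pytesseract.pytesseract.tesseract_cmd`, a setting documented in the pytesseract README. A minimal sketch follows. The Windows path `C:\Program Files\Tesseract-OCR\tesseract.exe` is an assumed default install location, and `find_tesseract` / `configure_pytesseract` are hypothetical helper names.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default location used by the Windows installer; adjust for your machine.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates: Iterable[str] = (DEFAULT_WINDOWS_PATH,)) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # searches the directories listed in PATH
    if on_path:
        return on_path
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract() -> str:
    """Point pytesseract at the tesseract executable, failing early with a clear message."""
    import pytesseract  # imported lazily so find_tesseract() works without the package

    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("tesseract executable not found: install Tesseract or add it to PATH")
    # Tell pytesseract exactly which executable to run.
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at startup replaces the somewhat cryptic TesseractNotFoundError with an explicit message before any image is processed.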
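You also need to install the tesseract program itself on your machine and make sure its directory is on your PATH. The installation steps are as follows:
- pytesseract official documentation: https://pypi.org/project/pytesseract/
- tesseract installation documentation: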
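After installing Tesseract, it is worth confirming that the binary actually runs before retrying pytesseract. The standard-library sketch below runs `tesseract --version` and extracts the version number. The function names are hypothetical; the regex allows for a `v` prefix that some builds print (e.g. `v5.0.0-alpha...`), and both stdout and stderr are checked because some older releases write the version banner to stderr.

```python
import re
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract a version like '5.3.0' from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)+)", output, re.IGNORECASE)
    return match.group(1) if match else None


def installed_tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Run `<cmd> --version`; return the version, or None if the binary cannot run."""
    try:
        result = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing or non-executable binary: the situation behind TesseractNotFoundError.
        return None
    # Check both streams, since the banner may appear on either one.
    return parse_tesseract_version(result.stdout + result.stderr)
```

If `installed_tesseract_version()` returns None, pytesseract will fail the same way, so fix the installation or PATH first.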