Python OCR installation: running into various problems installing tesserocr for a Python crawler on Windows 10

======================== Update 2018-06-06 =================

The problem is more or less solved. I'll write up the steps when I have some free time.

=========================================

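For some reason Zhihu says the question description is too long, so I can only put the details in an answer.

The details are as follows: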

【System environment】

Laptop: Xiaomi Air

OS: preinstalled Windows 10 Home

Installed software: PyCharm 2018.1 (Professional) | VS 2017 Community

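【Steps taken】

My installation attempts fell into two stages: I. installing through PyCharm; II. downloading installers / building from source. Details below:

I. Installing through PyCharm

Download the tesseract installer and install it

Current versions of PyCharm ship with a per-project virtual environment, so I installed the Python packages directly through its Project Interpreter, including:

Upgrade pip to 10.0.1 (including modifying PyCharm\helpers\packaging_tool.py)

Install the tesserocr package

But the installation ended with an error.

The detailed error output was:

Installing collected packages: tesserocr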

Running setup.py install for tesserocr: started

Running setup.py install for tesserocr: finished with status 'error'

Complete output from command C:\Users\12558\PycharmProjects\Spider2\venv\Scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\12558\\AppData\\Local\\Temp\\pycharm-packaging\\tesserocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\12558\AppData\Local\Temp\pip-record-iltdzc16\install-record.txt --single-version-externally-managed --compile --install-headers C:\Users\12558\PycharmProjects\Spider2\venv\include\site\python3.6\tesserocr:

Failed to extract tesseract version from executable: [WinError 2] The system cannot find the file specified.

Supporting tesseract v3.04.00

Building with configs: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 197632}}

running install

running build

running build_ext

building 'tesserocr' extension

creating build

creating build\temp.win32-3.6

creating build\temp.win32-3.6\Release

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x86\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\12558\PycharmProjects\Spider2\venv\include -IC:\Users\12558\AppData\Local\Programs\Python\Python36-32\include -IC:\Users\12558\AppData\Local\Programs\Python\Python36-32\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /EHsc /Tptesserocr.cpp /Fobuild\temp.win32-3.6\Release\tesserocr.obj

tesserocr.cpp

tesserocr.cpp(597): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory

error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.14.26428\\bin\\HostX86\\x86\\cl.exe' failed with exit status 2

----------------------------------------

Failed building wheel for tesserocr

Command "C:\Users\12558\PycharmProjects\Spider2\venv\Scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\12558\\AppData\\Local\\Temp\\pycharm-packaging\\tesserocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\12558\AppData\Local\Temp\pip-record-iltdzc16\install-record.txt --single-version-externally-managed --compile --install-headers C:\Users\12558\PycharmProjects\Spider2\venv\include\site\python3.6\tesserocr" failed with error code 1 in C:\Users\12558\AppData\Local\Temp\pycharm-packaging\tesserocr\
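The log above contains two separate failures: setup.py could not run the tesseract executable ([WinError 2]), and cl.exe could not find leptonica/allheaders.h in any of the -I directories it was given. Below is a minimal sketch for checking both conditions before retrying the install. The function names are hypothetical, and the version decoding is only an assumption inferred from the log pairing "v3.04.00" with TESSERACT_VERSION 197632 (hex 030400).

```python
import shutil
import sysconfig
from pathlib import Path


def find_executable(name="tesseract"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def find_header(include_dirs, relative_header="leptonica/allheaders.h"):
    """Return the first directory that contains the header, else None."""
    for directory in include_dirs:
        if (Path(directory) / relative_header).is_file():
            return str(directory)
    return None


def decode_version(packed):
    """Read the hex digits of `packed` in pairs, e.g. 197632 -> (3, 4, 0).

    Assumption: this packing is inferred from the log, not from the
    tesserocr documentation.
    """
    digits = format(packed, "06x")
    return tuple(int(digits[i:i + 2]) for i in (0, 2, 4))


if __name__ == "__main__":
    # Only the interpreter's own include directory is checked here; the
    # log shows the compiler also searches the venv and Windows SDK dirs.
    python_include = sysconfig.get_paths()["include"]
    print("tesseract on PATH:", find_executable())
    print("leptonica header found in:", find_header([python_include]))
    print("TESSERACT_VERSION 197632 ->", decode_version(197632))
```

If both checks come back as None, that matches the log: neither the tesseract binary nor the leptonica headers are visible to the build.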

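I couldn't find any more information online, so I switched to another approach.

II. Downloading installers / building from source

Since I don't really understand compiling code on Windows, I mainly followed a Jianshu article, "Win10+VS2017: building opencv3.2.0 and opencv_contrib3.2.0 to call the text module" (www.jianshu.com),

later supplemented by "Tesseract4.0+VS2017+win10 source build guide - LiveZingy" (livezingy.com).

Uninstall the tesseract that was installed from the installer in the previous part

Download cmake-3.11.3 | the latest cppan | tesseract-3.05.01

After extracting tesseract, create the folders build | install | sources (all of the extracted files go into the sources folder)

Install cppan and run the cppan command

Install cmake and build with cmake-gui. This is the step where things went wrong: after clicking "Configure", selecting "Visual Studio 15 2017 Win64", and clicking "Finish", it ran for a while and then reported an error.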
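One thing worth checking at this step: the pip log from the first stage shows a 32-bit interpreter (Python36-32, build\temp.win32-3.6, HostX86\x86), while "Visual Studio 15 2017 Win64" produces 64-bit libraries, and a 32-bit extension module cannot link against those. Whether this is related to the cmake error here is unknown. The sketch below only reports the interpreter's bitness and the matching generator name; the helper names are hypothetical.

```python
import struct
import sysconfig


def interpreter_bits():
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    return struct.calcsize("P") * 8


def matching_generator(bits):
    """Return the VS 2017 CMake generator name for the given bitness."""
    if bits == 64:
        return "Visual Studio 15 2017 Win64"
    return "Visual Studio 15 2017"  # the name without a suffix targets 32-bit


if __name__ == "__main__":
    bits = interpreter_bits()
    print("platform tag:", sysconfig.get_platform())
    print("interpreter is %d-bit" % bits)
    print("matching CMake generator:", matching_generator(bits))
```

Run it with the same interpreter that the PyCharm project uses, since that is the one the tesserocr extension gets built for.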

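【Summary】

So far, the problems with both approaches involve the leptonica library. But since I really don't understand compiling on Windows and couldn't find more specific steps, I don't know what to do next and can only ask the experts on Zhihu for help. Thanks!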
