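tesserocr is an OCR library for Python. It is a Python API wrapper around tesseract.
<1> Before installing tesserocr, install tesseract.
The Windows download link is
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe
Other versions are available as well; this is the one I picked. Builds with dev in the name are development versions, and builds without dev are stable releases. During installation, tick
additional language data. The installer takes a while, so just wait patiently.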
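After installation, add the environment variables under System in the Control Panel. Add the tesseract installation path to the PATH
environment variable, and create a new environment variable TESSDATA_PREFIX that also points to the tesseract installation
directory. The result on my system is shown in the figure.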
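The environment setup described above can also be checked from Python. The following is a minimal sketch, not part of the original walkthrough: the function name check_tesseract_env and the keys of the returned dictionary are made up for illustration. It only looks at PATH and TESSDATA_PREFIX and does not run tesseract itself.

```python
import os
import shutil


def check_tesseract_env(env=None):
    """Report whether PATH and TESSDATA_PREFIX look set up for tesseract.

    Hypothetical helper for illustration; it inspects the given mapping
    (os.environ by default) and does not start tesseract.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    return {
        # Full path of the tesseract executable found via PATH, or None.
        "executable": shutil.which("tesseract", path=env.get("PATH", "")),
        # Raw value of TESSDATA_PREFIX, or None if it is not set.
        "tessdata_prefix": prefix,
        # True only if TESSDATA_PREFIX is set and points to a directory.
        "prefix_is_directory": bool(prefix) and os.path.isdir(prefix),
    }


if __name__ == "__main__":
    for key, value in check_tesseract_env().items():
        print("{}: {}".format(key, value))
```

If executable comes back as None or prefix_is_directory is False, the environment variables are probably not visible yet. Opening a new command prompt after changing them is usually needed.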
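To verify the installation, type tesseract -v at the command line. If it prints the version information as shown below, the installation succeeded.
<2> Install tesserocr
This step is a bit of a trap. Running pip3 install tesserocr pillow as usual fails with the following error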
    c:\Python37\Scripts>pip install tesserocr pillow
    Collecting tesserocr
    Downloading https://files.pythonhosted.org/packages/92/2d/05a7f8387e93c192919b508e4f4936f232bd3d2ca388b9130ae538a9f9ad/tesserocr-2.4.0.tar.gz (56kB)
    100% |████████████████████████████████| 61kB 208kB/s
    Collecting pillow
    Downloading https://files.pythonhosted.org/packages/55/ea/305f61258278790706e69f01c53e107b0830ea5a4a69aa1f2c11fe605ed3/Pillow-5.3.0-cp37-cp37m-win_amd64.whl (1.6MB)
    100% |████████████████████████████████| 1.6MB 1.9MB/s
    Building wheels for collected packages: tesserocr
    Running setup.py bdist_wheel for tesserocr ... error
    Complete output from command c:\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\ZHANGM~1\\AppData\\Local\\Temp\\pip-install-aixq4ve2\\tesserocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\ZHANGM~1\AppData\Local\Temp\pip-wheel-7w7aj1_3 --python-tag cp37:
    C:\Users\ZHANGM~1\AppData\Local\Temp\pip-install-aixq4ve2\tesserocr\setup.py:134: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    _LOGGER.warn('Failed to extract tesseract version from executable: {}'.format(e))
    Failed to extract tesseract version from executable: [WinError 2] 系统找不到指定的文件。
    Supporting tesseract v3.04.00
    Building with configs: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 50593792}}
    c:\python37\lib\distutils\dist.py:274: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
    running bdist_wheel
    running build
    running build_ext
    building 'tesserocr' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
    ----------------------------------------
    Failed building wheel for tesserocr
    Running setup.py clean for tesserocr
    Failed to build tesserocr
    Installing collected packages: tesserocr, pillow
    Running setup.py install for tesserocr ... error
    Complete output from command c:\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\ZHANGM~1\\AppData\\Local\\Temp\\pip-install-aixq4ve2\\tesserocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\ZHANGM~1\AppData\Local\Temp\pip-record-tobwu8mn\install-record.txt --single-version-externally-managed --compile:
    C:\Users\ZHANGM~1\AppData\Local\Temp\pip-install-aixq4ve2\tesserocr\setup.py:134: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    _LOGGER.warn('Failed to extract tesseract version from executable: {}'.format(e))
    Failed to extract tesseract version from executable: [WinError 2] 系统找不到指定的文件。
    Supporting tesseract v3.04.00
    Building with configs: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 50593792}}
    c:\python37\lib\distutils\dist.py:274: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
    running install
    running build
    running build_ext
    building 'tesserocr' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
    ----------------------------------------
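At first I tried to download Visual C++ 14.0 from the link in the error message, but unfortunately it returned a 404 error. I then found that the problem can be solved by downloading
a prebuilt installation package in whl format instead. The download page is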
https://github.com/simonflueckiger/tesserocr-windows_build/releases
The installation result is shown in the figure.
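Once the wheel is installed, a quick import check from Python can confirm that the package loads. This is only a sketch: the helper name tesserocr_status is made up for illustration, and it assumes that tesserocr exposes tesseract_version(), as described in the project's README. If the package cannot be imported, the helper returns None instead of raising.

```python
def tesserocr_status():
    """Return the version string reported by tesserocr, or None if it cannot be imported.

    Hypothetical helper for illustration only.
    """
    try:
        import tesserocr  # fails with ImportError if the wheel is not installed
    except ImportError:
        return None
    # tesseract_version() reports the tesseract build the wrapper was compiled against.
    return tesserocr.tesseract_version()


if __name__ == "__main__":
    status = tesserocr_status()
    print("tesserocr not importable" if status is None else status)
```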
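One problem I ran into here: at first I downloaded the file tesserocr-2.2.2-cp36-cp36m-win_amd64.whl. Being new to Python, I had not yet worked out
what cp36 stands for, so I kept getting the following error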
    c:\Python37\Scripts>pip3 install tesserocr-2.2.2-cp36-cp36m-win_amd64.whl
    tesserocr-2.2.2-cp36-cp36m-win_amd64.whl is not a supported wheel on this platform.
    You are using pip version 10.0.1, however version 18.1 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.
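As you can see, I am using Python 3.7, which is why this error kept appearing. I eventually realized that the wheel I had downloaded was not built for 3.7 (cp36 marks a build for CPython 3.6), so
I went back to that link, picked tesserocr-2.3.1-cp37-cp37m-win_amd64.whl, and the installation succeeded.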
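The mismatch above can be spotted before running pip by comparing the Python tag in the wheel file name with the running interpreter. The sketch below is a simplified illustration, not how pip itself decides compatibility: the function names are made up, it assumes CPython and a single Python tag per file name, and it ignores the ABI and platform tags (cp36m, win_amd64).

```python
import sys


def wheel_python_tag(filename):
    """Extract the Python tag (for example 'cp37') from a wheel file name."""
    # Wheel names end with pythontag-abitag-platformtag.whl,
    # so the Python tag is the third field from the right.
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-3]


def current_python_tag(version_info=None):
    """Build the CPython tag for an interpreter version, e.g. (3, 7) gives 'cp37'."""
    version_info = sys.version_info if version_info is None else version_info
    return "cp{}{}".format(version_info[0], version_info[1])


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's Python tag equals the interpreter's tag."""
    return wheel_python_tag(filename) == current_python_tag(version_info)


if __name__ == "__main__":
    for name in ("tesserocr-2.2.2-cp36-cp36m-win_amd64.whl",
                 "tesserocr-2.3.1-cp37-cp37m-win_amd64.whl"):
        print(name, matches_interpreter(name))
```

Run under Python 3.7, the first file name does not match and the second one does, which agrees with what pip reported.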