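I recently needed to simulate logging in to a website that occasionally shows an image CAPTCHA, so I decided to install tesserocr and try it out. I ran into quite a few problems along the way. Fortunately I solved them all in the end, so I am writing this post as a note to myself.
Step 1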
brew install imagemagick
No problems here; it installs cleanly.
Step 2
brew install tesseract
This one also installs without trouble.
Step 3
pip3.6 install tesserocr pillow
Installing it this way fails. The error output is:
~ pip3.6 install tesserocr pillow
Collecting tesserocr
Downloading https://files.pythonhosted.org/packages/92/2d/05a7f8387e93c192919b508e4f4936f232bd3d2ca388b9130ae538a9f9ad/tesserocr-2.4.0.tar.gz (56kB)
100% |████████████████████████████████| 61kB 158kB/s
Collecting pillow
Downloading https://files.pythonhosted.org/packages/68/3a/61531c34cc18f77b9f979f2cf1a670ae3e98316521e78e3f070c4cc5029b/Pillow-6.0.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (3.7MB)
100% |████████████████████████████████| 3.7MB 543kB/s
Building wheels for collected packages: tesserocr
Building wheel for tesserocr (setup.py) ... error
Complete output from command /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-install-2ogfax5r/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-wheel-3m_lcxu8 --python-tag cp36:
Supporting tesseract v4.0.0
Configs from pkg-config: {'include_dirs': ['/usr/local/Cellar/tesseract/4.0.0_1/include', '/usr/local/Cellar/leptonica/1.78.0/include'], 'library_dirs': ['/usr/local/Cellar/tesseract/4.0.0_1/lib', '/usr/local/Cellar/leptonica/1.78.0/lib'], 'libraries': ['lept', 'tesseract'], 'cython_compile_time_env': {'TESSERACT_VERSION': 67108864}}
running bdist_wheel
running build
running build_ext
building 'tesserocr' extension
creating build
creating build/temp.macosx-10.6-intel-3.6
/usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -I/usr/local/Cellar/tesseract/4.0.0_1/include -I/usr/local/Cellar/leptonica/1.78.0/include -I/Library/Frameworks/Python.framework/Versions/3.6/include/python3.6m -c tesserocr.cpp -o build/temp.macosx-10.6-intel-3.6/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
In file included from tesserocr.cpp:657:
In file included from /usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/genericvector.h:28:
In file included from /usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/tesscallback.h:22:
/usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/host.h:30:10: fatal error: 'cinttypes' file not found
#include <cinttypes> // PRId32, ...
^~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit status 1
----------------------------------------
Failed building wheel for tesserocr
Running setup.py clean for tesserocr
Failed to build tesserocr
Installing collected packages: tesserocr, pillow
Running setup.py install for tesserocr ... error
Complete output from command /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-install-2ogfax5r/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-record-jvx3ujjz/install-record.txt --single-version-externally-managed --compile:
Supporting tesseract v4.0.0
Configs from pkg-config: {'include_dirs': ['/usr/local/Cellar/leptonica/1.78.0/include', '/usr/local/Cellar/tesseract/4.0.0_1/include'], 'library_dirs': ['/usr/local/Cellar/leptonica/1.78.0/lib', '/usr/local/Cellar/tesseract/4.0.0_1/lib'], 'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 67108864}}
running install
running build
running build_ext
building 'tesserocr' extension
creating build
creating build/temp.macosx-10.6-intel-3.6
/usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -I/usr/local/Cellar/leptonica/1.78.0/include -I/usr/local/Cellar/tesseract/4.0.0_1/include -I/Library/Frameworks/Python.framework/Versions/3.6/include/python3.6m -c tesserocr.cpp -o build/temp.macosx-10.6-intel-3.6/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
In file included from tesserocr.cpp:657:
In file included from /usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/genericvector.h:28:
In file included from /usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/tesscallback.h:22:
/usr/local/Cellar/tesseract/4.0.0_1/include/tesseract/host.h:30:10: fatal error: 'cinttypes' file not found
#include <cinttypes> // PRId32, ...
^~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit status 1
----------------------------------------
Command "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-install-2ogfax5r/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-record-jvx3ujjz/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/t8/_sy2x7xd3714ct9p_vwh19m80000gn/T/pip-install-2ogfax5r/tesserocr/
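The key line is "fatal error: 'cinttypes' file not found". The build directory name (build/temp.macosx-10.6-intel-3.6) shows that the extension is being compiled for a macOS 10.6 target, and for a target that old clang most likely falls back to a C++ standard library without the C++11 headers. Errors like this one need the following two sub-steps:
Sub-step 1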
~ xcode-select --install
xcode-select: error: command line tools are already installed, use "Software Update" to install updates
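On my machine it reports that the command line tools are already installed. That is fine; move on to the next sub-step.
Sub-step 2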
~ export MACOSX_DEPLOYMENT_TARGET=10.13
Here 10.13 is the macOS version on my own machine. Replace it with the version your machine is running.
With both sub-steps done, the install succeeds.
Step 4
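To find the value to put in MACOSX_DEPLOYMENT_TARGET, the following is a minimal sketch in Python. The helper names major_minor and suggested_export are made up for this illustration; platform.mac_ver and sysconfig.get_config_var are standard-library calls. It assumes a major.minor value such as 10.13 is what you want to export.

```python
import platform
import sysconfig


def major_minor(version):
    """Reduce a version string such as '10.13.6' to '10.13'."""
    return ".".join(version.split(".")[:2])


def suggested_export(version):
    """Build the export line for the given macOS version (hypothetical helper)."""
    return "export MACOSX_DEPLOYMENT_TARGET=" + major_minor(version)


# Deployment target this Python was built for (None when not on macOS).
print("Python built for:", sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET"))

# Version of the macOS currently running (empty string when not on macOS).
running = platform.mac_ver()[0]
if running:
    print(suggested_export(running))
```

If the first line prints an old target such as 10.6, that would match the macosx-10.6-intel directory seen in the failing build log.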
~ pip3.6 install tesserocr pillow
Collecting tesserocr
Using cached https://files.pythonhosted.org/packages/92/2d/05a7f8387e93c192919b508e4f4936f232bd3d2ca388b9130ae538a9f9ad/tesserocr-2.4.0.tar.gz
Collecting pillow
Using cached https://files.pythonhosted.org/packages/68/3a/61531c34cc18f77b9f979f2cf1a670ae3e98316521e78e3f070c4cc5029b/Pillow-6.0.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Building wheels for collected packages: tesserocr
Building wheel for tesserocr (setup.py) ... done
Stored in directory: /Users/liuqiuying/Library/Caches/pip/wheels/f4/f8/13/9e30c62e12a7ec922c4cd3e8d936b63679f845543c2f66d172
Successfully built tesserocr
Installing collected packages: tesserocr, pillow
Successfully installed pillow-6.0.0 tesserocr-2.4.0
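That completes the installation!
However, after starting python3 in the terminal, importing tesserocr fails with an assertion error and kills the interactive session: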
~ python3
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tesserocr
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
[1] 57957 illegal hardware instruction python3
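If that happens, do the following:
Run export LC_ALL=C on the command line, or add that line to ~/.bash_profile or ~/.zshrc.
If you added it to a file, run the matching source command to load the environment variable.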
~ export LC_ALL=C
~ source ~/.zshrc
~ python3
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tesserocr
>>>
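The variable does not have to be exported for the whole shell. As a rough sketch, a launcher script can set LC_ALL=C for a single child Python process only. The function name run_python_with_c_locale is hypothetical; subprocess.run and os.environ are standard library, and the arguments used here work on Python 3.6.

```python
import os
import subprocess
import sys


def run_python_with_c_locale(code):
    """Run `code` in a child Python process that sees LC_ALL=C.

    The parent process and the shell keep their own locale settings.
    """
    env = dict(os.environ, LC_ALL="C")  # copy of the environment, one override
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
        check=True,               # raise if the child exits with an error
    )
    return result.stdout


# The child sees the override.
print(run_python_with_c_locale("import os; print(os.environ['LC_ALL'])"))
```

With tesserocr installed, the same helper could be called with "import tesserocr; print(tesserocr.tesseract_version())" to check that the import no longer aborts.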
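Follow-up
The day I did this I also wrote export LC_ALL=C into my ~/.bashrc file, and by the next day all Chinese text on my Mac was showing up garbled. The export LC_ALL=C line was the cause. So what to do instead?
First, remove the export LC_ALL=C line from the shell startup files; it is not needed there. Set the locale in the code instead, before importing tesserocr: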
import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr
from PIL import Image
If the code does not set locale.LC_ALL and the shell startup files do not contain export LC_ALL=C either, you get the following error:
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
Process finished with exit code 132 (interrupted by signal 4: SIGILL)
and the Python process crashes.
Setting the locale in code as shown above resolves the problem.
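Putting it together, here is a minimal sketch of how the in-code fix could be wrapped up for actual use. The function names use_c_locale and ocr_image and the file name captcha.png are made up for this example; tesserocr.image_to_text and PIL's Image.open are existing calls. It assumes tesserocr and pillow are installed as described above.

```python
import locale


def use_c_locale():
    """Switch this process to the "C" locale and return the new setting."""
    return locale.setlocale(locale.LC_ALL, "C")


def ocr_image(path):
    """Return the text tesserocr recognises in the image at `path`."""
    # The locale must be "C" before tesserocr is imported, otherwise the
    # assertion in baseapi.cpp aborts the process.
    use_c_locale()
    import tesserocr
    from PIL import Image

    with Image.open(path) as image:
        return tesserocr.image_to_text(image)


# Example call (captcha.png is a placeholder file name):
# print(ocr_image("captcha.png"))
```

Because the locale is changed only inside the Python process, the terminal and other programs keep their normal locale, so Chinese text is displayed correctly.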