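Online tutorials range from reliable to unreliable, but even the most reliable one is no substitute for working through the build yourself and writing down what you learned.
Prerequisite: switch the yum repositories to the Aliyun mirror, then run yum update. My first three install attempts all failed at the very end and nothing would run; after the update it worked.
I am not sure whether stale packages were actually the cause, but updating does no harm, so do it anyway.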
(1) First, install the leptonica library that tesseract depends on:
Install the base packages
yum install -y zlib-devel libjpeg* freetype-devel libtiff* libicu-devel automake libtool pango*
wget http://www.leptonica.com/source/leptonica-1.72.tar.gz
tar -zxvf leptonica-1.72.tar.gz
cd leptonica-1.72
./configure && make && make install
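After make install it can be useful to confirm that the system linker can actually see leptonica before moving on to tesseract. The following is a minimal sketch, not part of the original procedure: it assumes the library was installed under its usual name (liblept), and the helper name leptonica_visible is made up for illustration. If it returns None, a likely cause is that the install directory is not on the linker path or the linker cache has not been refreshed.

```python
from ctypes.util import find_library


def leptonica_visible():
    """Return the library name the linker reports for leptonica, or None.

    Hypothetical helper: "lept" is the assumed short name of the
    library installed by the leptonica build above.
    """
    return find_library("lept")
```

Calling leptonica_visible() from a Python prompt gives either a file name or None, which is a quick sanity check before starting the longer tesseract build.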
(2) Next, compile tesseract. The version used is 3.04.
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
tar -zxvf 3.04.00.tar.gz
cd tesseract-3.04.00/
./autogen.sh (if a prerequisite is missing, the build fails with an error that names the missing package; install it with yum as prompted)
./configure
make && make install
cd /usr/local/share/tessdata/
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra.traineddata
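A failed or interrupted download leaves a missing or empty language file, and tesseract then fails only when that language is requested. The sketch below checks the tessdata directory for the three files downloaded above. The function name missing_traineddata is hypothetical, and the check (file exists and is non-empty) is an assumption about what counts as a usable download, not a validation of the file contents.

```python
import os


def missing_traineddata(tessdata_dir, languages=("eng", "chi_sim", "chi_tra")):
    """Return the language codes whose .traineddata file is absent or empty."""
    missing = []
    for lang in languages:
        path = os.path.join(tessdata_dir, lang + ".traineddata")
        # Treat a zero-byte file as missing, since wget can leave one behind.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(lang)
    return missing
```

For the layout used here, the call would be missing_traineddata("/usr/local/share/tessdata/"); an empty list means all three files are in place.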
At this point you can test with an image to check whether text is recognized
tesseract 11.jpg bbb -psm 3 -l chi_sim+eng
cat bbb.txt
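The same test can be driven from Python, which is roughly what a wrapper such as pytesser does behind the scenes. This is a minimal sketch under stated assumptions: tesseract is on PATH, the output file is UTF-8 text, and the function names build_tesseract_command and run_ocr are invented for illustration. The flags mirror the command line above (-psm is the single-dash form used by tesseract 3.04).

```python
import io
import subprocess


def build_tesseract_command(image_path, output_base,
                            languages=("chi_sim", "eng"), psm=3):
    """Build the argument list equivalent to the shell command above."""
    return ["tesseract", image_path, output_base,
            "-psm", str(psm), "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages=("chi_sim", "eng"), psm=3):
    """Run tesseract, then return the text it wrote to <output_base>.txt."""
    command = build_tesseract_command(image_path, output_base, languages, psm)
    # check_call raises CalledProcessError if tesseract exits with an error.
    subprocess.check_call(command)
    with io.open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

With the files from the test above, run_ocr("11.jpg", "bbb") would run the same command and return the contents of bbb.txt as a string.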
wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz
tar -zxvf setuptools-0.6c11.tar.gz
cd setuptools-0.6c11
python setup.py build
python setup.py install
yum install python-imaging
(3) Install PIL, a third-party image-processing package for Python
wget http://effbot.org/media/downloads/Imaging-1.1.7.tar.gz
tar -zxvf Imaging-1.1.7.tar.gz
cd Imaging-1.1.7
Install dependencies
yum install python-devel libffi-devel
yum install tkinter*
yum install python-pillow*
yum install littlecms*
vim setup.py (set the following library paths)
TCL_ROOT = "/usr/lib64/"
JPEG_ROOT = "/usr/lib64/"
ZLIB_ROOT = "/usr/lib64/"
TIFF_ROOT = "/usr/lib64/"
FREETYPE_ROOT = "/usr/lib64/"
LCMS_ROOT = "/usr/lib64/"
python setup.py install
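The manual edit of setup.py can also be scripted, which helps when repeating the install on several machines. The sketch below rewrites the six *_ROOT assignments in the text of setup.py. It assumes each setting appears as a top-level "NAME = ..." line, as in the edited lines shown above; the function name set_library_roots is hypothetical, and /usr/lib64/ is simply the path used in this walkthrough.

```python
import re

LIB_DIR = "/usr/lib64/"
ROOT_NAMES = ("TCL_ROOT", "JPEG_ROOT", "ZLIB_ROOT",
              "TIFF_ROOT", "FREETYPE_ROOT", "LCMS_ROOT")


def set_library_roots(setup_source, lib_dir=LIB_DIR, names=ROOT_NAMES):
    """Return setup.py source with each NAME = ... line pointing at lib_dir."""
    for name in names:
        # Match a whole top-level assignment line for this setting.
        pattern = re.compile(r"^%s\s*=.*$" % re.escape(name), re.MULTILINE)
        new_line = '%s = "%s"' % (name, lib_dir)
        # A function replacement avoids any escape handling in new_line.
        setup_source = pattern.sub(lambda match, line=new_line: line,
                                   setup_source)
    return setup_source
```

To apply it, read setup.py into a string, pass it through set_library_roots, and write the result back before running python setup.py install. Lines that do not match one of the six names are left untouched.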
(4) pytesser needs no installation: download it, extract it, and run your tests from inside the extracted folder
http://download.csdn.net/download/pyliang_2008/5564135
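In summary, quite a lot of pieces are involved, and some of them may well be unnecessary, but I did not have the energy to investigate them one by one.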