(1) If they are not already installed, install the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install autoconf-archive
sudo apt-get install pkg-config
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libleptonica-dev
(2) If you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
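Before moving on to the builds, it can help to confirm that the build tools from step (1) are actually on PATH. The following is a minimal sketch, not part of the original notes: `missing_tools` is a hypothetical helper name, and it only checks for commands, not for the `-dev` library packages.

```shell
# Print the name of every command in the argument list that is not on PATH.
# The body runs in a subshell so the loop variable does not leak.
missing_tools() (
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
)

# Example call for the build tools installed in step (1):
# missing_tools g++ autoconf automake libtoolize pkg-config
```

An empty result means every listed command was found; any name that is printed still needs to be installed.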
(3) Build Leptonica (there are two ways: from the GitHub source or from a release tarball; I use the GitHub source)
sudo apt install git
git clone https://github.com/DanBloomberg/leptonica
cd leptonica
autoreconf -vi
./autobuild
./configure
make
sudo make install
(4) Install tesseract
cd
git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig
(5) Check that tesseract was installed successfully
tesseract -v
(6) Install the training tools (optional; requires the libraries from step 2)
make
make training
sudo make training-install
(7) Install the language data
1> Download from: https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
2> Move the file to the data directory, which defaults to /usr/local/share/tessdata
sudo mv eng.traineddata /usr/local/share/tessdata
export TESSDATA_PREFIX=/usr/local/share/
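To confirm that a language file ended up where tesseract will look for it, a small check like the following can be used. This is a sketch under assumptions: `has_language` is a hypothetical helper, and the default directory is the /usr/local/share/tessdata path from step (7).

```shell
# Succeed if <lang>.traineddata exists in the tessdata directory.
# $1 = language code (e.g. eng), $2 = directory (optional).
has_language() {
    [ -f "${2:-/usr/local/share/tessdata}/$1.traineddata" ]
}

# Example:
# has_language eng && echo "eng is installed"
```

Once tesseract itself is installed, `tesseract --list-langs` reports the languages it can find.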
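(8) Install an image format conversion tool. These notes feed tesseract TIFF images; builds linked against libpng and libjpeg (step 1) can typically read PNG and JPEG as well.
sudo apt-get install imagemagick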
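With ImageMagick installed, converting an image to TIFF is a single `convert` call. The sketch below wraps it in two hypothetical helpers, `tif_name` and `to_tif`; it assumes the input filename has an extension.

```shell
# Derive the .tif filename for an input image, e.g. scan.png -> scan.tif.
tif_name() {
    printf '%s.tif\n' "${1%.*}"
}

# Convert one image to TIFF with ImageMagick's convert command.
to_tif() {
    convert "$1" "$(tif_name "$1")"
}

# Example:
# to_tif a.png    # writes a.tif next to a.png
```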
(9) Test
tesseract <image> <outputbasename> [-l lang] [configs]
The default language is English:
tesseract a.tif a
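To recognize another language, specify it with the -l option (the matching .traineddata file must be in the tessdata directory), for example: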
tesseract a.tif a -l chi_sim
View the recognition result:
cat a.txt
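The single-file commands above extend naturally to a whole directory. The following is a minimal sketch, with `ocr_all` as a hypothetical helper name; it uses the same `tesseract <image> <outputbasename> -l lang` form shown in step (9).

```shell
# Run tesseract on every .tif file in a directory.
# $1 = directory (default: current directory), $2 = language (default: eng).
# Each NAME.tif produces NAME.txt in the same directory.
ocr_all() (
    dir=${1:-.}
    lang=${2:-eng}
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue    # skip when no .tif files match
        tesseract "$img" "${img%.tif}" -l "$lang"
    done
)

# Example:
# ocr_all ./scans chi_sim
```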