Compiling Tesseract 4.1.1 on Ubuntu 16.04

I have lost the addresses of the articles I learned from, so I cannot list them here, but many thanks to the people who shared them online. Let’s make this world as good as we can.

Download the tesseract 4.1.1 source code from GitHub:

https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz

Then extract it into a folder; by default the folder is named tesseract-4.1.1.
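
For readers who prefer the command line, the download and extraction steps might be sketched as below. The function names download_source and extract_source and the local file name tesseract-4.1.1.tar.gz are placeholders chosen for this example; only the URL comes from the text above, and wget and tar are assumed to be installed.

```shell
# URL from the article; the local file name is an arbitrary choice.
TESS_URL="https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz"
TESS_TARBALL="tesseract-4.1.1.tar.gz"

# Download the source tarball into the current directory (needs network access).
download_source() {
    wget -O "$TESS_TARBALL" "$TESS_URL"
}

# Unpack a .tar.gz into a directory (default: current directory) and
# print the name of the top-level folder it contains.
extract_source() {
    tarball="$1"
    dest="${2:-.}"
    tar -xzf "$tarball" -C "$dest" || return 1
    tar -tzf "$tarball" | head -n 1 | cut -d/ -f1
}
```

Calling download_source and then extract_source "$TESS_TARBALL" should leave a tesseract-4.1.1 folder in the current directory and print its name.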

Compile and install

Press Ctrl+Alt+T, or right-click inside the tesseract-4.1.1 folder and select "Open terminal at current folder", to open a terminal window. If you used Ctrl+Alt+T, cd into the tesseract-4.1.1 folder first. Then enter the command:

bash autogen.sh

If autogen finishes successfully, the following message is shown:

 ...
All done.
...

Then enter the command:

 ./configure

If the following message is shown:

 configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.

Then we have to install libleptonica-dev:

 sudo apt install libleptonica-dev

If no errors are shown, the package has been installed, so we run configure again:

 ./configure
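
If configure keeps printing the Leptonica error after the package is installed, the installed version may simply be older than 1.74. A small sketch for checking this, assuming pkg-config is installed and Leptonica registers itself under the module name lept; version_ge and check_leptonica are helper names invented here:

```shell
# Return success (0) when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the Leptonica that pkg-config finds meets the 1.74 minimum.
check_leptonica() {
    installed="$(pkg-config --modversion lept 2>/dev/null)"
    if [ -z "$installed" ]; then
        echo "Leptonica not found by pkg-config"
        return 1
    fi
    if version_ge "$installed" 1.74; then
        echo "Leptonica $installed is new enough"
    else
        echo "Leptonica $installed is older than 1.74"
        return 1
    fi
}
```

If check_leptonica reports a version below 1.74, the packaged Leptonica is too old for this configure script, and a newer one has to come from somewhere else, for example another package repository or a source build.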

If the following messages are shown, configure has succeeded:

 Configuration is done.
 You can now build and install tesseract by running:

$ make
$ sudo make install
$ sudo ldconfig

Documentation will not be built because asciidoc or xsltproc is missing.

Training tools can be built and installed with:

$ make training
$ sudo make training-install

Then we can build and install tesseract:

make -j8
sudo make install
sudo ldconfig

If no errors are shown, tesseract should now be installed, so we can test it:

 tesseract --version

If the following message is shown:

tesseract: symbol lookup error: tesseract: undefined symbol: _ZN10SIMDDetect14avx_available_E

it means some libraries still need to be installed:

sudo apt install libtesseract-dev libleptonica-dev liblept5
sudo apt install tesseract-ocr -y
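
An undefined-symbol error of this kind usually suggests that the tesseract binary is loading a libtesseract shared library other than the one it was built against, for example an older copy installed somewhere else. One way to look into it, as a sketch: ldd lists the shared libraries a binary resolves to. The helper names libtesseract_path and show_tesseract_libs are made up for this example.

```shell
# Read ldd output on stdin and print the resolved path of libtesseract.
# ldd lines look like: "libtesseract.so.4 => /usr/local/lib/libtesseract.so.4 (0x...)"
libtesseract_path() {
    awk '/libtesseract/ && $2 == "=>" { print $3 }'
}

# Show which tesseract binary is first on PATH and which library it loads.
show_tesseract_libs() {
    bin="$(command -v tesseract)" || { echo "tesseract not found on PATH"; return 1; }
    echo "binary:  $bin"
    echo "library: $(ldd "$bin" | libtesseract_path)"
}
```

If show_tesseract_libs prints a library outside /usr/local/lib while the binary is /usr/local/bin/tesseract, the two probably come from different installs. Running sudo ldconfig again, or checking that /usr/local/lib is on the linker's search path, are things worth trying in that case.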

But the problem was still there. We then looked at the python3 version and found it was 3.6.3. Maybe that was not new enough, so we installed Python 3.7.6:

1) Download the Python 3.7.6 source code from:
    https://www.python.org/ftp/python/3.7.6/Python-3.7.6.tgz
2) Extract it into a folder: Python-3.7.6
    Open a terminal window in that folder and execute the commands:
    ./configure
    make -j8
    sudo make install
    sudo ldconfig
    (If you want to keep the system python3 untouched, sudo make altinstall is the safer alternative to sudo make install.)

 If no errors are shown, Python 3.7.6 has been installed. To test it, enter this command in the terminal:
 python3.7
 The interpreter should start and print its version information, which confirms that Python 3.7 is installed.

Then we check tesseract again. Enter this command in the terminal window:

tesseract --version

If the tesseract 4.1.1 version information is shown, tesseract 4.1.1 has been installed successfully.

Download the latest "best" tesseract LSTM models from GitHub and install them:

https://github.com/tesseract-ocr/tessdata_best

This gives us many *.traineddata files. Copy them into the folder /usr/local/share/tessdata:

   sudo cp tessdata_best/*.traineddata /usr/local/share/tessdata

Then check that the files are in the folder /usr/local/share/tessdata:

ls /usr/local/share/tessdata

If many *.traineddata files are listed, the models have been installed.
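
To go a little further than eyeballing the listing, the check can be scripted. This is only a sketch: count_models and has_model are helper names invented here, and the default path is the /usr/local/share/tessdata folder used above.

```shell
# Count the *.traineddata files directly inside a tessdata folder.
count_models() {
    dir="${1:-/usr/local/share/tessdata}"
    find "$dir" -maxdepth 1 -name '*.traineddata' | wc -l | tr -d ' '
}

# Succeed if the model for a given language code (e.g. eng) is present.
has_model() {
    lang="$1"
    dir="${2:-/usr/local/share/tessdata}"
    [ -f "$dir/$lang.traineddata" ]
}
```

count_models prints the number of model files, and has_model eng succeeds only if eng.traineddata is there. Running tesseract --list-langs afterwards shows which languages tesseract itself can see, which also confirms that it is looking in the same folder.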

Install pytesseract and Pillow for Python:

To use tesseract from Python, we have to install pytesseract and Pillow. Use pip3 for both so that they are installed for Python 3:

sudo pip3 install Pillow
sudo pip3 install pytesseract

Install the opencv module for Python 3:

sudo pip3 install opencv-python
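
One detail that is easy to trip over: the pip package names differ from the names used in import statements (Pillow is imported as PIL, opencv-python as cv2). A quick sketch for checking that all three modules can be imported; python3 is assumed to be the interpreter the packages were installed for, and check_py_module and check_ocr_modules are helper names invented here:

```shell
# Succeed if the given module can be imported by the chosen interpreter.
check_py_module() {
    module="$1"
    python="${2:-python3}"
    "$python" -c "import $module" 2>/dev/null
}

# Report the import status of the three modules installed above.
check_ocr_modules() {
    for name in PIL pytesseract cv2; do
        if check_py_module "$name"; then
            echo "$name: ok"
        else
            echo "$name: missing"
        fi
    done
}
```

If a module is reported as missing even though pip installed it, the package most likely went to a different Python than the one python3 points to. Passing the interpreter explicitly, as in check_py_module cv2 python3.7, helps to narrow that down.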

Tesseract 4.1.1 and the best LSTM language models are now installed. Enjoy your new journey!
