I have lost the web addresses of the articles I learned from, so I cannot list them here, but many thanks to the friends who shared them online. Let's make this world as good as we can. You are all good and lovely people, and sharing and enjoying a good world together is inspiring.
Download the Tesseract 4.1.1 source code from GitHub:
https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz
Then extract it into a folder; by default this is tesseract-4.1.1.
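The download and extraction can also be done from the terminal. This is a minimal sketch, assuming wget and tar are installed; the variable and function names are only illustrative, while the URL and folder name are the ones given above.

```shell
# Illustrative names; only the URL and the folder name come from the steps above.
TESS_VERSION="4.1.1"
TESS_URL="https://github.com/tesseract-ocr/tesseract/archive/${TESS_VERSION}.tar.gz"
TESS_DIR="tesseract-${TESS_VERSION}"

# Download the tarball and unpack it into the current directory.
fetch_tesseract() {
    wget -O "${TESS_DIR}.tar.gz" "$TESS_URL" &&
    tar -xzf "${TESS_DIR}.tar.gz"
}

# Call it by hand when ready (needs network access):
# fetch_tesseract && cd "$TESS_DIR"
```

Nothing is downloaded until fetch_tesseract is called, so the snippet is safe to paste into a script first.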
Compile and install
Press Ctrl+Alt+T, or right-click inside the tesseract-4.1.1 folder and select "open terminal at current folder", to open a terminal window. Make sure the working directory is the tesseract-4.1.1 folder, then enter the command:
bash autogen.sh
If autogen.sh finishes successfully, the output includes the following message:
...
All done.
...
Then run:
./configure
If the following error is shown:
configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
then we have to install libleptonica-dev:
sudo apt install libleptonica-dev
If no errors are shown, the package has been installed, so we run configure again:
./configure
If the following messages are shown, configure succeeded:
Configuration is done.
You can now build and install tesseract by running:
$ make
$ sudo make install
$ sudo ldconfig
Documentation will not be built because asciidoc or xsltproc is missing.
Training tools can be built and installed with:
$ make training
$ sudo make training-install
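The Leptonica requirement that configure checks can also be verified by hand, which helps if the error above reappears. This is a small sketch, assuming pkg-config is installed, that Leptonica registers itself under the pkg-config name "lept", and that sort supports the -V version-sort option (GNU coreutils); the function names are made up for illustration.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the installed Leptonica meets the 1.74 minimum from the error message.
# Assumes Leptonica is known to pkg-config under the name "lept".
check_leptonica() {
    found="$(pkg-config --modversion lept 2>/dev/null)" || {
        echo "Leptonica not found by pkg-config"
        return 1
    }
    if version_ge "$found" "1.74"; then
        echo "Leptonica $found is new enough"
    else
        echo "Leptonica $found is too old"
        return 1
    fi
}
```

Running check_leptonica before ./configure shows at a glance whether the development package is visible to the build.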
Then we can build and install Tesseract:
make -j8
sudo make install
sudo ldconfig
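The -j8 above assumes that eight parallel jobs suit the machine. Here is a small sketch that takes the job count from the CPU count instead, assuming nproc (GNU coreutils) is available and falling back to a single job otherwise; the function names are made up for illustration.

```shell
# Print the number of parallel make jobs to use.
build_jobs() {
    nproc 2>/dev/null || echo 1
}

# Build and install from inside the tesseract-4.1.1 folder,
# stopping at the first step that fails.
build_and_install() {
    make -j"$(build_jobs)" &&
    sudo make install &&
    sudo ldconfig
}

# Run it by hand from the source folder:
# build_and_install
```

Chaining the steps with && means a failed make never reaches the install step.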
If no errors are shown, it should be installed, so we can test it:
tesseract --version
If the following error is shown:
tesseract: symbol lookup error: tesseract: undefined symbol: _ZN10SIMDDetect14avx_available_E
it means some libraries need to be installed:
sudo apt install libtesseract-dev libleptonica-dev liblept5
sudo apt install tesseract-ocr -y
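An undefined-symbol error at start-up generally means that the tesseract executable loaded a shared library that does not match it, for example a different libtesseract found ahead of the freshly built one in /usr/local/lib. That is a likely cause rather than a confirmed diagnosis for this machine. The sketch below inspects which libraries the binary resolves, using the standard ldd and ldconfig tools; the helper names are made up for illustration.

```shell
# Keep only the Tesseract and Leptonica lines from ldd-style output on stdin.
filter_ocr_libs() {
    grep -E 'libtesseract|liblept'
}

# Show which shared libraries the tesseract on PATH would load.
show_tesseract_libs() {
    ldd "$(command -v tesseract)" | filter_ocr_libs
}

# List every libtesseract known to the dynamic linker cache.
show_cached_tesseract_libs() {
    ldconfig -p | grep libtesseract
}
```

If show_cached_tesseract_libs lists more than one libtesseract, two installations are probably present, and running sudo ldconfig again after the build may be worth trying.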
But the problem was still there. We then looked at the Python 3 version and found it was 3.6.3, which may not be new enough, so we installed Python 3.7.6:
1) Download the Python 3.7.6 source code from:
https://www.python.org/ftp/python/3.7.6/Python-3.7.6.tgz
2) Extract it into a folder: Python-3.7.6
Open a terminal window in that folder and run the commands:
./configure
make -j8
sudo make install
sudo ldconfig
If no errors are shown, Python 3.7.6 has been installed. We can test it by entering this command in the terminal:
python3.7
We should then see the Python 3.7 version information, which confirms that Python 3.7 is installed.
Then we check Tesseract again by entering this command in the terminal window:
tesseract --version
If the Tesseract 4.1.1 version information is shown, Tesseract 4.1.1 has been installed successfully.
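The version check can be scripted so that it fails loudly when a different Tesseract is picked up. This is a rough sketch, assuming the first line of the output has the form "tesseract 4.1.1"; the function names are made up for illustration.

```shell
# Print the version number from the first line of `tesseract --version` output on stdin.
parse_tesseract_version() {
    head -n 1 | awk '{ print $2 }'
}

# Succeed only if the tesseract on PATH reports the expected version.
# Both stdout and stderr are captured in case the version is printed on stderr.
tesseract_is_version() {
    [ "$(tesseract --version 2>&1 | parse_tesseract_version)" = "$1" ]
}

# Example call:
# tesseract_is_version 4.1.1 && echo "Tesseract 4.1.1 is active"
```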
Download the newest and best Tesseract LSTM models from GitHub and install them:
https://github.com/tesseract-ocr/tessdata_best
We will get many *.traineddata files, which we copy into the folder /usr/local/share/tessdata:
sudo cp tessdata_best/*.traineddata /usr/local/share/tessdata
Let's check that the files are now in the folder /usr/local/share/tessdata:
ls /usr/local/share/tessdata
If many *.traineddata files are listed, the models have been installed.
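Besides listing the folder, the models can be counted, and Tesseract itself can be asked which languages it sees through its --list-langs option. This is a minimal sketch, assuming the models live in /usr/local/share/tessdata as above; the function names are made up for illustration.

```shell
# Count the *.traineddata files in a directory (default: the folder used above).
count_models() {
    dir="${1:-/usr/local/share/tessdata}"
    find "$dir" -maxdepth 1 -name '*.traineddata' | wc -l | tr -d ' '
}

# Ask Tesseract itself which languages it can load.
list_langs() {
    tesseract --list-langs
}

# Example calls:
# count_models
# list_langs
```

If count_models reports files but list_langs does not show them, Tesseract is probably reading a different tessdata folder than the one the files were copied into.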
Install pytesseract and Pillow for Python:
To use Tesseract from Python, we have to install pytesseract and Pillow with pip:
sudo pip3 install Pillow
sudo pip3 install pytesseract
Install the OpenCV module for Python:
sudo pip3 install opencv-python
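A quick way to confirm that the three packages ended up in the interpreter actually used is to try importing them. This is a small sketch, assuming python3 is the interpreter they were installed for; the function names are made up for illustration. Note that Pillow is imported as PIL and opencv-python as cv2.

```shell
# Succeed if the given Python module can be imported by python3.
py_module_available() {
    python3 -c "import $1" 2>/dev/null
}

# Report the three modules installed above, using their import names.
check_ocr_modules() {
    for mod in pytesseract PIL cv2; do
        if py_module_available "$mod"; then
            echo "$mod: ok"
        else
            echo "$mod: missing"
        fi
    done
}

# Example call:
# check_ocr_modules
```

A module reported as missing usually means that pip installed it for a different Python than the one python3 points to.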
So Tesseract 4.1.1 and the best LSTM language models have been installed. Please enjoy your new journey!