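Installation
On a Mac, installing tesseract directly with Homebrew does not include the training tools.
If you have already installed tesseract without the training tools, uninstall it first: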
brew uninstall tesseract
First install the packages it depends on:
# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
Download tesseract-4.1.1.tar.gz from the link below and extract it:
https://github.com/tesseract-ocr/tesseract/releases
Build and install:
cd tesseract-4.1.1
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install
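A quick way to confirm that the build steps above put both tesseract and the training tools on your PATH is sketched below. The function name check_tools is hypothetical, and the list of tools is an assumption: it names a few of the programs the training build normally installs, not all of them.

```shell
# Report which of the given binaries are on PATH.
# Returns 0 if all were found, 1 if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool list; adjust it to the tools you actually need.
check_tools tesseract text2image lstmtraining combine_tessdata \
  || echo "some tools are missing; re-check the build steps"
```

If anything is reported missing, the most likely cause is that make training or sudo make training-install was skipped or failed.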
The installation does not include any language data. Download the languages you need from the link below:
https://github.com/tesseract-ocr/tessdata
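As a rough sketch, a language file can be fetched from that repository into the tessdata directory as shown below. Several things here are assumptions rather than facts from this guide: the raw-file URL pattern and the main branch name, the default directory /usr/local/share/tessdata (typical for a /usr/local install prefix), and the helper names traineddata_url and fetch_lang.

```shell
# Directory where tesseract looks for *.traineddata files.
# Assumed default for a /usr/local prefix; override by exporting TESSDATA_DIR.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Build the download URL for one language code, e.g. eng or chi_sim.
# The URL pattern is an assumption about the repository layout.
traineddata_url() {
  printf 'https://github.com/tesseract-ocr/tessdata/raw/main/%s.traineddata\n' "$1"
}

# Download one language file into TESSDATA_DIR.
# Writing there may require running this with sudo.
fetch_lang() {
  mkdir -p "$TESSDATA_DIR" &&
    curl -fL -o "$TESSDATA_DIR/$1.traineddata" "$(traineddata_url "$1")"
}

# Example usage (not run automatically):
#   fetch_lang eng && tesseract --list-langs
```

After a successful download, tesseract --list-langs should list the new language. If it does not, check that the directory matches where your build expects its data.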
Training
First, collect data samples (a number of images of the text you want to train on).
The images must be converted to tif format.
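One possible way to do the conversion on a Mac is the built-in sips tool, sketched below. The helper names tif_name and convert_to_tif and the samples/ directory are hypothetical. Any other image converter would work just as well.

```shell
# Map an image path to a .tif path next to it (samples/a.png -> samples/a.tif).
tif_name() {
  printf '%s.tif\n' "${1%.*}"
}

# Convert each given image to TIFF with the macOS sips tool.
convert_to_tif() {
  for img in "$@"; do
    sips -s format tiff "$img" --out "$(tif_name "$img")"
  done
}

# Example usage (not run automatically):
#   convert_to_tif samples/*.png
```

The original files are left in place, so you can re-run the conversion if a sample needs to be replaced.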
Download and open jTessBoxEditor (note: this software requires a Java 8 environment, which you need to set up yourself