Searching for available OCR tools
[xy@xyarch src]$ pacman -Ss ocr
community/cuneiform 1.1.0-19
Linux port of an OCR system developed in Russia. Supports more than 20 languages
community/gimagereader-gtk 3.3.0-1
Gtk front-end to tesseract-ocr
community/gimagereader-qt 3.3.0-1
Qt front-end to tesseract-ocr
community/gocr 0.52-1
OCR (Optical Character Recognition) program, which converts scanned images of text back to text files
community/gocryptfs 1.7-2
Encrypted overlay filesystem written in Go.
community/gscan2pdf 2.5.4-1
A GUI with OCR capability to produce PDFs or DjVus from scanned documents
community/kaudiocreator 3.92.0+306+gbd233f8-1
A program for ripping and encoding Audio-CDs, encoding files from disk
community/ocrad 0.27-1
OCR (Optical Character Recognition) program based on a feature extraction method
community/ocrfeeder 0.8.1-4
GTK+ document layout analysis and optical character recognition application
community/python-pyocr 0.7-1
Python wrapper for Tesseract and Cuneiform
community/tesseract 4.1.0-1
An OCR program
community/tesseract-data-afr 1:4.0.0-1 (tesseract-data)
Tesseract OCR data (afr)
...
A brief overview of the tools
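gocr is a command-line tool and fairly cumbersome to use.
tesseract is a command-line tool; list the installed languages with tesseract --list-langs
ocrfeeder is a GUI tool based on Python 2.
gscan2pdf is a GUI tool based on Perl and GTK3.
gimagereader-qt is a GUI tool based on Qt that uses tesseract as its OCR backend.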
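Because tesseract is a plain command-line tool, a small wrapper is enough for one-off recognition. This is a minimal sketch, assuming tesseract 4.x with the needed language packs installed; the function name ocr_image is hypothetical. Passing stdout as the output base makes tesseract print the text instead of writing a .txt file.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): OCR one image, print the text.
# Usage: ocr_image IMAGE [LANGS]   LANGS defaults to eng, e.g. chi_sim+eng
ocr_image() {
  local img="$1"
  local langs="${2:-eng}"
  if [ ! -f "$img" ]; then
    echo "ocr_image: no such file: $img" >&2
    return 1
  fi
  # "stdout" as the output base prints the result instead of writing a .txt file
  tesseract "$img" stdout -l "$langs"
}
```

For example, ocr_image scan.png chi_sim+eng > scan.txt (scan.png is a placeholder file name).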
Installing and using gimagereader
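gimagereader relies on the tesseract OCR engine, which needs a language data package for each language you want to recognize: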
[xy@xyarch ~]$ sudo pacman -S tesseract-data-chi_sim
[xy@xyarch ~]$ sudo pacman -S tesseract-data-eng
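After installing the packs, tesseract --list-langs should list chi_sim and eng. The sketch below turns that into a quick check. The function name missing_langs is hypothetical, and it only assumes that --list-langs prints one language code per line after a header line; the header wording differs between tesseract versions, so the code matches whole lines instead of skipping a fixed header.

```shell
#!/bin/sh
# Hypothetical helper: read `tesseract --list-langs` output on stdin and
# print every requested language code that is missing from it.
missing_langs() {
  local installed lang
  installed=$(cat)
  for lang in "$@"; do
    # -x: match the whole line, -F: treat the code as a fixed string
    printf '%s\n' "$installed" | grep -qxF -- "$lang" || echo "$lang"
  done
}
```

Typical use: tesseract --list-langs 2>&1 | missing_langs chi_sim eng prints nothing when both packs are installed. The 2>&1 is there because some tesseract versions write the list to stderr.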
A warning appears at startup, so I installed a spell-check dictionary package. There did not appear to be a Chinese one.
[xy@xyarch ~]$ sudo pacman -Ss hunspell-
[xy@xyarch ~]$ sudo pacman -S hunspell-en_US
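It can recognize text from image and PDF files, or directly from a screenshot, and recognition can be limited to a selected region. Scanners are also supported, but I have not tested that.
Chinese-only and English-only text is recognized reasonably well, but in mixed Chinese-English text the English recognition is noticeably worse.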
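On the command line, tesseract accepts several language codes joined with +, so one way to look into the mixed-text problem is to run the same image with each setting and compare the outputs. This is a sketch under that assumption: the function name compare_langs and the output file names are hypothetical, and whether chi_sim+eng actually improves the English part will depend on the image.

```shell
#!/bin/sh
# Hypothetical helper: OCR one image with several language settings so the
# results can be compared. Writes <prefix>_<langs>.txt for each setting.
compare_langs() {
  local img="$1" prefix="$2" langs
  shift 2
  for langs in "$@"; do
    # tesseract appends .txt to the output base itself
    tesseract "$img" "${prefix}_${langs}" -l "$langs" || return 1
  done
}
```

For example, compare_langs mixed.png out chi_sim eng chi_sim+eng, then diff out_chi_sim.txt out_chi_sim+eng.txt (mixed.png is a placeholder).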
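I found no keyboard shortcut for taking a screenshot; by default it captures the entire screen currently displayed.
For more help, see file:///usr/share/doc/gimagereader/manual.html#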