tesseract: Recognizing Text in Images

miaow~miaow

Article link: https://blog.csdn.net/fengbohello/article/details/119385287

1. Install tesseract

Installation steps: https://blog.csdn.net/fengbohello/article/details/119272478

2. Install the trained language data files

Download the English data: https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

Download the Simplified Chinese data: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

Copy the downloaded data files into the system directory /usr/share/tessdata/. For details, see: https://blog.csdn.net/fengbohello/article/details/119255898
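The copy step can be sketched as a small shell function. This is only an illustration: the function name install_traineddata and the ~/Downloads path in the example are assumptions, not part of the original article, and it assumes the files were saved as the actual .traineddata binaries (the GitHub links above open a web page with a download button).

```shell
#!/bin/sh
# Sketch: copy every *.traineddata file from a download directory into a
# tessdata directory. The function name and default paths are assumptions.
install_traineddata() {
    src_dir="${1:-.}"
    dest_dir="${2:-/usr/share/tessdata}"
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/*.traineddata; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        cp "$f" "$dest_dir"/ || return 1
        echo "installed $(basename "$f")"
    done
}

# Example (writing to a system directory needs write permission, usually root):
#   install_traineddata ~/Downloads /usr/share/tessdata
#   tesseract --list-langs
```

Running tesseract --list-langs afterwards is a quick way to check that the new languages are visible to the engine.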

3. Recognize text in an image

3.0) How to use the tesseract command

$ tesseract --help
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

So, to recognize an image that contains English text, you can use the following command:

tesseract english-word.png out-file -l eng

english-word.png: the name of the image file
out-file: where the recognized text is stored; tesseract automatically completes it to out-file.txt
-l eng: the language to use
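As a rough sketch of how the three arguments fit together, the wrapper below derives the outputbase from the image name and then calls tesseract. The function names outputbase_for and ocr_image are made up for this example, and the call is skipped with an error if tesseract is not on the PATH.

```shell
#!/bin/sh
# Derive the outputbase from an image name by dropping its last extension.
# (Hypothetical helper; assumes the file name itself contains the last dot.)
outputbase_for() {
    image="$1"
    printf '%s\n' "${image%.*}"
}

# Run OCR on one image with the given language (default: eng).
ocr_image() {
    image="$1"
    lang="${2:-eng}"
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 127
    fi
    # tesseract adds the .txt extension to the outputbase by itself.
    tesseract "$image" "$(outputbase_for "$image")" -l "$lang"
}

# Example: ocr_image english-word.png eng   # writes english-word.txt
```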

3.1) Recognizing English

The image containing the text is as follows:

Recognize the text in the image:

$ tesseract eng.png - -l eng
Traineddata Files for Version 4.00 +

We have three sets of official .traineddata files trained at Google, for tesseract versions 4.00 and
above. These are made available in three separate repositories.

Note: giving a hyphen (-) as the output file means the result is written directly to stdout.
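Because the result goes to stdout, it can be piped into other tools. The sketch below counts the non-blank lines of recognized text; count_text_lines is a hypothetical helper name, and any other filter could take its place.

```shell
#!/bin/sh
# Count the non-blank lines read from stdin.
# (Hypothetical helper for post-processing OCR output.)
count_text_lines() {
    grep -c '[^[:space:]]'
}

# Example:
#   tesseract eng.png - -l eng | count_text_lines
```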

3.2) Recognizing Simplified Chinese

The image containing the text is as follows:

Recognize the text in the image:

$ tesseract chi_sim.png - -l chi_sim
目 录

一 、 安装 tesseract
二 、 安 装 训 练 后 的 语 言 文 件
三 、 识 别 图 片 中 的 文 字 信 息
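The help text in 3.0 shows that -l accepts several languages joined with a plus sign (LANG[+LANG]). For an image like this one that mixes Chinese with the Latin word tesseract, passing both languages may be worth trying; whether it changes the result here has not been tested. The helper join_langs below is a made-up name used only to build the argument.

```shell
#!/bin/sh
# Join language codes with '+' to build the value for the -l option.
# (Hypothetical helper; the subshell keeps the IFS change local.)
join_langs() {
    ( IFS='+'; printf '%s\n' "$*" )
}

# Example (assumes both chi_sim and eng data files are installed):
#   tesseract chi_sim.png - -l "$(join_langs chi_sim eng)"
```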
