Tesseract-OCR: Installation, Training Your Own Data, and Reading Text from Python (Step-by-Step Tutorial)

航空界的小爬虫

Last modified 2022-04-25 19:13:08

Category: OCR. Tags: Tesseract, java, OCR

First published 2022-03-25 11:07:28

Article link: https://blog.csdn.net/weixin_42872122/article/details/123730558

1. Installing Tesseract

1. Download

tesseract: https://digi.bib.uni-mannheim.de/tesseract/

Download a release version, not a dev or alpha build.

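2. Configure the system environment: add the Tesseract installation directory to the system PATH environment variable so that the tesseract command can be found.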

3. Open the Command Prompt (CMD) as administrator

Enter: tesseract -v

If the version number is displayed, the installation succeeded.
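The same check can be scripted. The following is a minimal sketch, assuming Python 3 is already installed; the helper name `tesseract_version` is hypothetical and not part of any library. It looks the executable up on PATH and, if it is found, runs `tesseract -v` and returns the first line of the output.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of the version banner, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # PATH is not configured, or Tesseract is not installed.
        return None
    result = subprocess.run(
        [path, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the banner to stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "tesseract was not found on PATH")
```

A `None` result usually means the PATH entry from step 2 is missing or the terminal was opened before PATH was changed.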

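4. Additional language packs

You can tick the language packs you need in the installer and have them installed automatically,

or download them manually from the repository below and place the downloaded file directly in the tessdata folder of the installation directory.

GitHub - tesseract-ocr/tessdata: Trained models with support for legacy and LSTM OCR engine: https://github.com/tesseract-ocr/tessdata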
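To confirm that a manually downloaded pack landed in the right place, you can list the `.traineddata` files in the tessdata folder. This is a minimal sketch using only the standard library; the function name `list_language_packs` and the Windows path in the usage example are assumptions for illustration, so substitute your own installation directory.

```python
from pathlib import Path
from typing import List


def list_language_packs(tessdata_dir: str) -> List[str]:
    """Return the language codes of the .traineddata files in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    # Assumed install location; replace with your own tessdata folder.
    print(list_language_packs(r"C:\Program Files\Tesseract-OCR\tessdata"))
```

If `chi_sim` appears in the result, the Simplified Chinese model is in place. Running `tesseract --list-langs` from the command line reports the languages Tesseract itself can see.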

2. Calling Tesseract from Python

1. Install the wrapper package: pytesseract

Install the pytesseract module from cmd:

pip install pytesseract

Alternatively, search for pytesseract in the IDE's Python project interpreter settings and install it from there.
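pytesseract is only a wrapper, so it still needs to find the Tesseract executable. If PATH is not set up, the executable's location can be assigned to `pytesseract.pytesseract.tesseract_cmd`. The sketch below assumes typical Windows install locations, which may not match your machine, and the helper `first_existing` is a hypothetical name introduced here.

```python
import os
from typing import Iterable, Optional

# Assumed install locations; adjust to wherever Tesseract was installed.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths: Iterable[str]) -> Optional[str]:
    """Return the first path that is an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    exe = first_existing(CANDIDATES)
    if exe is None:
        print("Tesseract executable not found in the assumed locations")
    else:
        import pytesseract  # installed with: pip install pytesseract

        # Tell pytesseract which executable to run instead of relying on PATH.
        pytesseract.pytesseract.tesseract_cmd = exe
        print(pytesseract.get_tesseract_version())
```

When `tesseract -v` already works in a fresh terminal, this step is normally unnecessary.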

2. Test code

import pytesseract
from PIL import Image

def demo():
    # Open the image to be recognized
    image = Image.open('E:/2.png')
    # Call pytesseract's image_to_string on the image; lang='chi_sim' selects Simplified Chinese recognition
    text = pytesseract.im