OCR Study Notes (3): Learning Tesseract

wyl2077

Category: Text Detection. Tags: python, machine learning

Article link: https://blog.csdn.net/dbdxwyl/article/details/108330700

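Introduction to Tesseract

Tesseract is an open-source text recognition (OCR) project that was originally released by Hewlett-Packard and is now maintained by Google. Starting with Tesseract v4, it supports text recognition with an LSTM deep neural network.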

Installing Tesseract on Windows 10

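(0) My Python version is 3.6.5.
(1) Download page: https://digi.bib.uni-mannheim.de/tesseract/
The version I chose: [image]
The version chosen here has to match the tesserocr or pytesseract package installed later.
During installation, leave the download options unchecked, because without a VPN the downloads are very slow or fail.
(2) Language packs can be downloaded from GitHub: https://github.com/tesseract-ocr/tessdata
I chose the Chinese language pack:
[image]
Then copy the downloaded file into the tessdata folder under the Tesseract-OCR directory, and copy the whole tessdata folder into the Python installation directory as well.
(3) Add the environment variables.
For this step I followed a reference blog post on environment variables; its author explains it very clearly.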
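
After steps (1) to (3), a quick check can confirm that the executable and the language file are actually visible from Python. The following is only a minimal sketch: `check_tesseract_setup` is a hypothetical helper name, `chi_sim` is assumed to be the language file that was downloaded, and `TESSDATA_PREFIX` is assumed to point at the tessdata folder itself (older Tesseract versions may expect its parent folder instead).

```python
import os
import shutil
from pathlib import Path


def check_tesseract_setup(tessdata_dir=None, lang="chi_sim"):
    """Report whether tesseract and a language file can be found.

    Hypothetical helper. If tessdata_dir is not given, it falls back to the
    TESSDATA_PREFIX environment variable (assumed to be the tessdata folder).
    """
    # shutil.which returns None when tesseract is not on PATH
    executable = shutil.which("tesseract")

    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")

    traineddata = None
    if tessdata_dir:
        candidate = Path(tessdata_dir) / f"{lang}.traineddata"
        if candidate.is_file():
            traineddata = str(candidate)

    return {"executable": executable, "traineddata": traineddata}


if __name__ == "__main__":
    print(check_tesseract_setup())
```

If either value comes back as `None`, the PATH entry or the copied tessdata folder is probably the place to look first.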

Installing pytesseract or tesserocr

(1) The tesserocr package. Installation steps:
Download tesserocr-2.2.2-cp36-cp36m-win_amd64.whl from GitHub, then install it from cmd.
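
The wheel file name encodes which interpreter it was built for, which is why the cp36 wheel goes with my Python 3.6.5. As a rough sketch of that check, the hypothetical helpers below (`parse_wheel_tags` and `matches_interpreter` are names I made up) split the file name into its fields and compare only the CPython tag against the running interpreter; the platform tag is not checked here.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel file name into (name, version, python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three fields are the python tag, ABI tag and platform tag
    python_tag, abi_tag, platform_tag = parts[-3:]
    return parts[0], parts[1], python_tag, abi_tag, platform_tag


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's CPython tag matches the interpreter version."""
    if version_info is None:
        version_info = sys.version_info
    python_tag = parse_wheel_tags(filename)[2]
    return python_tag == f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    wheel = "tesserocr-2.2.2-cp36-cp36m-win_amd64.whl"
    print(parse_wheel_tags(wheel))
    print(matches_interpreter(wheel))
```

If the check returns False, a wheel built for the installed Python version is needed instead.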
Code:

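import tesserocr
from PIL import Image

# Open the test image (the path is specific to my machine)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Run OCR on the PIL image and get the recognized text as a string
text = tesserocr.image_to_text(image)
print(text)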

Input: [image]
Output:

(2) Installing pytesseract
I installed it directly from within PyCharm.

Code:

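import pytesseract
from PIL import Image

# Open the test image (the path is specific to my machine)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Run OCR on the PIL image and get the recognized text as a string
text = pytesseract.image_to_string(image)
print(text)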
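
The call above uses pytesseract's default language. Since I downloaded the Chinese language pack, a variant that selects it explicitly might look like the sketch below. The assumptions are mine: `ocr_image` and `build_config` are hypothetical helper names, `chi_sim` is assumed to be the installed language file, and `--oem 1` is used to request the LSTM engine mentioned in the introduction. `pytesseract.pytesseract.tesseract_cmd` only needs to be set when tesseract.exe is not on PATH.

```python
from pathlib import Path


def build_config(psm=3, oem=1):
    """Build the option string that pytesseract passes through to tesseract.

    --oem 1 selects the LSTM engine, --psm 3 is fully automatic page segmentation.
    """
    return f"--oem {oem} --psm {psm}"


def ocr_image(image_path, lang="chi_sim", tesseract_cmd=None, psm=3, oem=1):
    """Run OCR on one image file and return the recognized text (hypothetical helper)."""
    # Imported lazily so build_config can be used without the OCR stack installed
    import pytesseract
    from PIL import Image

    if tesseract_cmd is not None:
        # Point pytesseract at tesseract.exe when it is not on PATH
        pytesseract.pytesseract.tesseract_cmd = str(Path(tesseract_cmd))

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_config(psm, oem)
        )
```

Calling `ocr_image` with the path of the demo image should then return the recognized text as a string, provided the chosen language file is present in the tessdata folder.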