Tesseract OCR Text Recognition

工具人来了

Tags: python nlp artificial intelligence

Article link: https://blog.csdn.net/qq_43219250/article/details/121512863

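The Tesseract OCR engine was originally developed at HP Labs starting in 1985, and by 1995 it ranked among the three most accurate OCR engines in the UNLV accuracy test. In 2005, HP released Tesseract as open source in collaboration with the University of Nevada, Las Vegas (UNLV), and since 2006 its development has been sponsored by Google, which has improved it, fixed bugs, and optimized it.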

Environment:

1. Install VS Code

Standard installation; nothing special is needed here.

2. Install pytesseract

pip install pytesseract

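3. Install Tesseract OCR

https://github.com/UB-Mannheim/tesseract/wiki
Version 5 is now available. Download the .exe, run the installer, and note the installation path (important). Because step 5 uses lang='chi_sim', also select the Chinese (Simplified) language data among the installer's additional language components.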
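After installing, a quick check can confirm that Tesseract runs and that the chi_sim data used in step 5 is present. This is a minimal sketch assuming a recent pytesseract release that provides get_languages, and assuming pytesseract can already locate the tesseract binary (it is on PATH, or tesseract_cmd has been set as in step 4). The names missing_languages and check_tesseract are hypothetical helpers, not part of pytesseract.

```python
from typing import Iterable, List


def missing_languages(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return the language codes in `required` that are absent from `installed`."""
    installed_set = set(installed)
    return [code for code in required if code not in installed_set]


def check_tesseract(required: Iterable[str] = ("chi_sim",)) -> List[str]:
    """Print the Tesseract version and report any missing language data."""
    # Imported lazily; this call only works once pytesseract can find the tesseract binary.
    import pytesseract

    print("Tesseract version:", pytesseract.get_tesseract_version())
    # get_languages lists the traineddata codes Tesseract can load (e.g. 'eng', 'chi_sim').
    missing = missing_languages(required, pytesseract.get_languages(config=""))
    if missing:
        print("Missing language data:", ", ".join(missing))
    return missing
```

For example, check_tesseract() returns an empty list when chi_sim is available, and lists it otherwise.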

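4. Configure the tesseract executable path

Go to your Python site-packages directory, for example c:\users\31331\appdata\local\programs\python\python38\lib\site-packages (find your own).
In the pytesseract folder, open pytesseract.py and find the line tesseract_cmd = 'tesseract'.
Replace the text inside the quotes with the installation path you noted, pointing to tesseract.exe itself rather than its folder. Write it as a raw string (r'...') or with forward slashes, because in a normal Python string a sequence such as \t in \tesseract.exe is read as a tab character.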
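Editing a file inside site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. pytesseract also lets a script assign pytesseract.pytesseract.tesseract_cmd directly, so the sketch below sets it at runtime instead. It assumes the UB-Mannheim installer's default location C:\Program Files\Tesseract-OCR\tesseract.exe (replace it with the path you noted) and falls back to a tesseract found on PATH; find_tesseract and configure_tesseract are hypothetical helper names.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default install location of the UB-Mannheim Windows build; adjust to your own path.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates: Iterable[str] = (DEFAULT_WINDOWS_PATH,)) -> Optional[str]:
    """Return the first candidate that exists as a file, else 'tesseract' on PATH, else None."""
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # shutil.which returns None when no 'tesseract' executable is on PATH.
    return shutil.which("tesseract")


def configure_tesseract(candidates: Iterable[str] = (DEFAULT_WINDOWS_PATH,)) -> str:
    """Point pytesseract at the tesseract executable without editing site-packages."""
    # Imported here so find_tesseract() can be used without pytesseract installed.
    import pytesseract

    cmd = find_tesseract(candidates)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found; check the install path")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Call configure_tesseract() once near the top of the recognition script, before the first pytesseract.image_to_string call.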

5. Text recognition

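from PIL import Image
import pytesseract

# Image to recognize, relative to the current working directory
path = "img\\text-img.png"

# lang='chi_sim' selects Simplified Chinese; its language data must be installed
text = pytesseract.image_to_string(Image.open(path), lang='chi_sim')
print(text)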
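Screenshots often mix Chinese and English. Tesseract accepts several language codes joined with +, so lang='chi_sim+eng' loads both models. The sketch below wraps the call above in a small function that also closes the image file through a with block. lang_arg and ocr_image are hypothetical names, and the sketch assumes both the eng data (normally installed by default) and the chi_sim data are present.

```python
from typing import Sequence


def lang_arg(langs: Sequence[str]) -> str:
    """Join Tesseract language codes with '+', e.g. ('chi_sim', 'eng') -> 'chi_sim+eng'."""
    if not langs:
        raise ValueError("at least one language code is required")
    return "+".join(langs)


def ocr_image(path: str, langs: Sequence[str] = ("chi_sim", "eng")) -> str:
    """Recognize text in one image file using the given Tesseract languages."""
    # Imported here so lang_arg() can be used without Pillow/pytesseract installed.
    from PIL import Image
    import pytesseract

    # The with block closes the image file once recognition is done.
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang_arg(langs))
```

For example, print(ocr_image(r"img\text-img.png")) behaves like the script above with English recognition enabled as well.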