Tutorial: recognizing Chinese text in images with tess4j OCR (personally tested, runs without errors)

空了虾摸索

Article link: https://blog.csdn.net/qq_18730505/article/details/81705319

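tess4j is a Java (JNA) wrapper for Tesseract, an open-source OCR engine that was developed at HP from the mid-1980s to the mid-1990s, open-sourced in 2005, and later developed with Google's backing.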

Chinese language data has been supported since version 3.0.2.

The simplest way to get the JARs

Create a Maven project in IntelliJ IDEA and add tess4j to pom.xml:

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>

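Wait for Maven to finish downloading the JARs. If you need them outside of Maven, copy the downloaded JARs out of your local repository and remove any duplicates.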
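To find the downloaded JAR in the local repository, it helps to know how Maven lays out artifacts on disk. The sketch below builds the expected path from the coordinates in the pom.xml snippet. It assumes the default repository location (~/.m2/repository, which settings.xml can override); the class and method names are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j or Maven.
class LocalRepoPath {

    /** Builds the standard-layout path of an artifact's JAR under a local Maven repository. */
    static Path jarPath(Path repoRoot, String groupId, String artifactId, String version) {
        Path dir = repoRoot;
        // Each dot-separated part of the groupId becomes one directory level.
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Assumption: the default local repository location.
        Path repoRoot = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(jarPath(repoRoot, "net.sourceforge.tess4j", "tess4j", "3.2.1"));
    }
}
```

The transitive dependencies that Maven pulled in alongside tess4j sit in the same tree, each under its own groupId directories.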

1. Create a tessdata folder at the same level as src, unzip tess4j.jar, and copy everything from the extracted tessdata folder into it.

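2. We are using tess4j, and the native C++ libraries (DLL files) that it wraps through JNA are already inside the JAR. Unlike what many online tutorials say, you do not need to install Tesseract separately.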

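3. Download the language data you need from GitHub. I need to recognize Chinese, so I downloaded chi_sim (the file chi_sim.traineddata) and put it in the tessdata folder. Use the data set that matches the Tesseract version bundled with your tess4j release.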

https://github.com/tesseract-ocr/tessdata

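4. If you get an error, 99.99% of the time the cause is incorrect configuration, most often a tessdata folder or language file that cannot be found.

5. JDK 1.7 or later is required.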
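Since most failures come down to the tessdata folder or a language file not being where it is expected, a quick pre-flight check can save some debugging. The following is a minimal sketch using only the JDK (compatible with 1.7). It assumes a tessdata folder relative to the working directory, as in step 1; the class and method names are hypothetical and not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of tess4j.
class TessdataCheck {

    /** Returns the names of the .traineddata files that are missing from tessdataDir. */
    static List<String> missingLanguageFiles(Path tessdataDir, String... languages) {
        List<String> missing = new ArrayList<String>();
        for (String lang : languages) {
            String fileName = lang + ".traineddata";
            if (!Files.isRegularFile(tessdataDir.resolve(fileName))) {
                missing.add(fileName);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: tessdata sits next to src, and the program runs from the project root.
        Path tessdataDir = Paths.get("tessdata");
        List<String> missing = missingLanguageFiles(tessdataDir, "chi_sim");
        if (missing.isEmpty()) {
            System.out.println("Language data found in " + tessdataDir.toAbsolutePath());
        } else {
            System.out.println("Missing from " + tessdataDir.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Printing the absolute path makes it easy to spot the common case where the IDE runs the program from a different working directory than expected.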

Test

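import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// The class name is arbitrary; the original post showed only the main method.
public class OcrTest {
    public static void main(String[] args) {
        // Image to recognize; adjust the path for your machine.
        File imageFile = new File("C:\\Users\\yzx\\Desktop\\test\\t2.png");
        ITesseract instance = new Tesseract();  // JNA Interface Mapping
        // ITesseract instance = new Tesseract1(); // JNA Direct Mapping

        // Requires chi_sim.traineddata in the tessdata folder.
        instance.setLanguage("chi_sim");
        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}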

Image to recognize