Basic usage of tess4j: doOCR

 

Warning: Parameter not found: enable_new_segsearch
Warning: Parameter not found: save_raw_choices
D:\devtools\Tesseract-OCR\tessdata
result:--->650 3428

I. Test image: 777.png

II. Test code:

package com.gazgeek.helloworld.tess4jTest;

import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Testtess {

    public static void main(String[] args) {
        // Directory that contains eng.traineddata
        String datapath = "D:\\devtools\\Tesseract-OCR\\tessdata";
        File imageFile = new File("F:\\imgall\\777.png");

        Tesseract tessInst = new Tesseract();
        tessInst.setDatapath(datapath);
        tessInst.setLanguage("eng"); // eng.traineddata must be in the tessdata directory

        try {
            // Run OCR on the image and print the recognized text
            String result = tessInst.doOCR(imageFile);
            System.out.println(datapath);
            System.out.println("result:--->" + result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

 

III. Test result:

 

IV. FAQ

1. ERROR net.sourceforge.tess4j.Tesseract - Not a JPEG file: starts with 0x89 0x50

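Solution: the file is not actually a JPEG. The bytes 0x89 0x50 are the start of the PNG signature, so this is a PNG file with a JPEG extension. Use a real JPEG file, or give the file the extension that matches its content.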

2. WARN Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Solution: option A, set the path in code: tessInst.setDatapath(System.getProperty("user.dir") + "/tessdata");

               option B, set the TESSDATA_PREFIX environment variable, which Tesseract uses as the default location of tessdata. If it is not set, Tesseract tries to open ./*.traineddata.

3. "Warning: Parameter not found: enable_new_segsearch" 

Solution: Works with this eng.traineddata: https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata
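For FAQ item 1, it can help to check what a file really contains before handing it to doOCR. The following is a minimal sketch that uses only the JDK; the class name ImageFormatSniffer and its methods are hypothetical and not part of tess4j. It only distinguishes the PNG signature (0x89 0x50 0x4E 0x47) from the JPEG signature (0xFF 0xD8 0xFF).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of tess4j: classifies a file by its leading bytes.
final class ImageFormatSniffer {

    /** Returns "png", "jpeg", or "unknown" based on the signature bytes. */
    static String detect(byte[] header) {
        // PNG files start with 0x89 'P' 'N' 'G'.
        if (header.length >= 4
                && (header[0] & 0xFF) == 0x89 && header[1] == 0x50
                && header[2] == 0x4E && header[3] == 0x47) {
            return "png";
        }
        // JPEG files start with 0xFF 0xD8 0xFF.
        if (header.length >= 3
                && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8
                && (header[2] & 0xFF) == 0xFF) {
            return "jpeg";
        }
        return "unknown";
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // file is shorter than 8 bytes
                }
                n += r;
            }
            return detect(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(detect(Paths.get(args[0])));
        }
    }
}
```

If the detected format does not match the file extension, renaming or converting the file should avoid the "Not a JPEG file" error.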

 

Note: for language data, the files from tessdata_best are the preferred choice. To recognize Chinese, download chi_sim.traineddata and place it in your tessdata directory.
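FAQ item 2 and the note above both come down to knowing which tessdata directory is in use. Here is a small sketch of one way to choose it: prefer TESSDATA_PREFIX when it is set, otherwise fall back to a tessdata folder under the working directory (option A). The class TessdataLocator is hypothetical, and the sketch assumes TESSDATA_PREFIX points directly at the tessdata directory, as the warning message suggests.

```java
import java.io.File;

// Hypothetical helper, not part of tess4j: decides which tessdata directory to use.
final class TessdataLocator {

    /**
     * Returns tessdataPrefix if it is set and non-blank,
     * otherwise <workingDir>/tessdata.
     */
    static String resolve(String tessdataPrefix, String workingDir) {
        if (tessdataPrefix != null && !tessdataPrefix.trim().isEmpty()) {
            return tessdataPrefix;
        }
        return workingDir + File.separator + "tessdata";
    }

    public static void main(String[] args) {
        String dir = resolve(System.getenv("TESSDATA_PREFIX"),
                             System.getProperty("user.dir"));
        System.out.println("tessdata directory: " + dir
                + " (exists: " + new File(dir).isDirectory() + ")");
    }
}
```

The returned string can then be passed to tessInst.setDatapath(...).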

Java's print API basically works on the assumption that everything is done at 72 dpi. This means you can use 72 units per inch as the basis for converting to and from other measurements.
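As a rough illustration of the 72 dpi assumption, the sketch below converts between inches, millimetres, print units (1/72 inch), and pixels at a chosen resolution. The class name PrintUnits and its methods are made up for this example.

```java
// Hypothetical helper: unit conversions based on 72 print units per inch.
final class PrintUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Inches to print units (1/72 inch). */
    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    /** Millimetres to print units (1/72 inch). */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Print units to pixels at the given resolution, rounded to the nearest pixel. */
    static long pointsToPixels(double points, int dpi) {
        return Math.round(points / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        // A length of 612 print units is 8.5 inches; at 300 dpi that is 2550 pixels.
        System.out.println(pointsToPixels(612, 300));
    }
}
```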

references:

1. http://www.jbrandsma.com/news/2015/12/07/ocr-with-java-and-tesseract/

2. https://sourceforge.net/projects/tess4j/

3. https://github.com/tesseract-ocr/tessdata_best

4. https://www.b4x.com/android/forum/threads/solved-tesseract-api-a-120-opotunity.101482/

5. https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

6. https://stackoverflow.com/questions/18975595/how-to-design-an-image-in-java-to-be-printed-on-a-300-dpi-printer

 

V. "The complete process of installing Tesseract on Mac, with solutions to the errors and exceptions encountered. Java open-source OCR"

macOS

Install tesseract-ocr with:

sudo port install tesseract

After installation, running the following command:

sudo tesseract 1563327696899809.jpg 1563327696899809.txt -l chi_sim+eng

prints:

Warning: Parameter not found: enable_new_segsearch

Tesseract Open Source OCR Engine v4.0.0 with Leptonica

Warning: Invalid resolution 0 dpi. Using 70 instead.

Estimating resolution as 194

 

This happens because none of the language packs named in the -l option are installed. Install them with the following commands:

sudo port install tesseract-chi-sim

sudo port install tesseract-eng
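Before running OCR, one way to confirm that every language named in a spec such as chi_sim+eng has a matching <lang>.traineddata file is sketched below. LanguagePackCheck is a hypothetical helper that uses only the JDK; it does not call Tesseract, and it assumes the language files sit directly in the tessdata directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports languages that have no <lang>.traineddata file.
final class LanguagePackCheck {

    /** Splits a spec like "chi_sim+eng" and returns the languages missing from fileNames. */
    static List<String> missing(String languageSpec, Set<String> fileNames) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            if (!lang.isEmpty() && !fileNames.contains(lang + ".traineddata")) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed usage: java LanguagePackCheck <tessdata directory> <language spec>
        if (args.length != 2) {
            return;
        }
        String[] names = new File(args[0]).list(); // null if the path is not a directory
        Set<String> files = new HashSet<>();
        if (names != null) {
            files.addAll(Arrays.asList(names));
        }
        System.out.println("missing: " + missing(args[1], files));
    }
}
```

An empty result means every requested language file is present in that directory.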

 

 

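Summary of errors:

① Warning: Parameter not found: enable_new_segsearch

When this appears on Mac, copy the language data files into the directory set in your Java code. The cause is that this directory does not contain the Simplified Chinese language pack.

  ITesseract iTesseract = new Tesseract();
  iTesseract.setDatapath("absolute path to your language data directory");

② Warning: Invalid resolution 0 dpi. Using 70 instead.

  ITesseract iTesseract = new Tesseract();
  iTesseract.setDatapath("absolute path to your language data directory");
  iTesseract.setTessVariable("user_defined_dpi", "300");

Setting the DPI is enough to fix this; a default of 300 works best.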
