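I followed the official documentation and various Chinese-language articles, and none of my attempts worked. Once I finally got it running, I decided to share the steps so nobody else falls into the same traps!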
1. Add the Maven dependency
<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>
2. Test code
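import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.imageio.ImageIO;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class TesseractExample {
    public static void main(String[] args) throws Exception {
        // File imageFile = new File("C:\\wangl\\eurotext.tif");
        BufferedImage bi;
        // try-with-resources closes the stream even if ImageIO.read throws
        try (InputStream is = downLoadFromUrl("https://passport.baidu.com//cgi-bin//genimage?njG8b06f54d9425e2bf023b14d95b01078933b5de06810514de")) {
            bi = ImageIO.read(is);
        }
        if (bi == null) {
            // ImageIO.read returns null when no registered reader understands the data
            System.err.println("The downloaded data is not a readable image");
            return;
        }
        ITesseract instance = new Tesseract(); // JNA Interface Mapping
        // This line is very important: point Tess4J at the tessdata folder
        instance.setDatapath("C:\\wangl\\soft\\tess4j\\tessdata");
        try {
            String result = instance.doOCR(bi);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }

    public static InputStream downLoadFromUrl(String urlStr) throws IOException {
        URL url = new URL(urlStr);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Set the connect timeout to 3 seconds
        conn.setConnectTimeout(3 * 1000);
        // Send a browser User-Agent so the server does not block the program with a 403
        conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        // Return the response body as an input stream
        return conn.getInputStream();
    }
}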
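The download helper in the test code returns a raw stream, never checks the HTTP status, sets no read timeout, and never disconnects. Below is a minimal stdlib-only sketch of a stricter variant, assuming Java 9+ (for `InputStream.readAllBytes`). The class `ImageFetch` and its methods `fetchBytes` and `decode` are hypothetical names chosen for illustration; they are not part of tess4j.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of tess4j.
public class ImageFetch {

    // Download the whole response body, then release the connection.
    public static byte[] fetchBytes(String urlStr, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs); // also bound the time spent waiting for data
        conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + urlStr);
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes(); // requires Java 9+
            }
        } finally {
            conn.disconnect();
        }
    }

    // Decode bytes into an image, failing loudly instead of returning null.
    public static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(data));
        if (img == null) {
            throw new IOException("no ImageIO reader recognised the data");
        }
        return img;
    }
}
```

Usage would look like `instance.doOCR(ImageFetch.decode(ImageFetch.fetchBytes(url, 3000)))`. Splitting download from decoding keeps the decode step testable without a network connection.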
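3. Summary: there is no need to set up tessdata, no need to copy any DLLs, and no need to set environment variables (tessdata and the DLLs are already bundled in the jar).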
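To see for yourself what ships inside a jar, you can list its entries. Below is a minimal stdlib-only sketch. The class name `JarContents` and the filter (entries ending in `.dll` or `.traineddata`) are assumptions made for illustration. Also, depending on the tess4j version, some native libraries may come from a transitive dependency jar rather than the tess4j jar itself, so if nothing shows up, run it against those jars too.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical inspection tool, not part of tess4j.
public class JarContents {

    // Return entry names that look like native libraries or Tesseract language data.
    public static List<String> bundledResources(String jarPath) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".dll") || name.endsWith(".traineddata")) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a jar, e.g. from the local Maven repository
        // (~/.m2/repository/net/sourceforge/tess4j/tess4j/3.2.1/tess4j-3.2.1.jar)
        for (String name : bundledResources(args[0])) {
            System.out.println(name);
        }
    }
}
```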