Dependency:
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>2.0.1</version>
<exclusions>
<exclusion>
<groupId>com.sun.jna</groupId>
<artifactId>jna</artifactId>
</exclusion>
</exclusions>
</dependency>
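Download the language data files:
https://github.com/tesseract-ocr/tessdata (language data used for captcha recognition)
If you only need to recognize ordinary captchas made of English letters and digits, download eng.traineddata and put it in a folder named tesseract under the project.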
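A missing or misplaced eng.traineddata is an easy mistake, so it can help to check for the file before running OCR. The sketch below is a hypothetical helper (the class and method names are illustrative, not part of Tess4J). It assumes the layout described above, with the file directly inside the tesseract folder. Depending on the Tesseract version, the library may instead look inside a tessdata subfolder of the data path, so adjust the path if the check passes but OCR still cannot load the language.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Tess4J API): checks that
// <dataPath>/<lang>.traineddata exists, assuming the flat folder layout above.
class TessdataCheck {
    // Builds the expected path of the language data file
    static Path trainedDataFile(String dataPath, String lang) {
        return Paths.get(dataPath, lang + ".traineddata");
    }

    // True only if the language data file exists as a regular file
    static boolean hasLanguage(String dataPath, String lang) {
        return Files.isRegularFile(trainedDataFile(dataPath, lang));
    }

    public static void main(String[] args) {
        Path p = trainedDataFile("tesseract", "eng");
        System.out.println(p + " present: " + hasLanguage("tesseract", "eng"));
    }
}
```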
Sample code:
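import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.junit.Assert;
import org.junit.Test;

@Test
public void testIdentify() throws TesseractException {
    // Captcha image to recognize
    File imageFile = new File("image/image.png");
    Tesseract tesseract = new Tesseract();
    // Folder holding the language data (eng.traineddata)
    tesseract.setDatapath("tesseract");
    // Run OCR; a failure throws TesseractException and fails the test
    // instead of leaving result null and causing a NullPointerException
    String result = tesseract.doOCR(imageFile);
    System.out.println(result);
    // OCR output carries trailing whitespace, so trim before comparing
    Assert.assertEquals("hmxo", result.trim());
}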
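The test above compares the OCR output only after `trim()`, because the raw result carries trailing whitespace. The following is a small hypothetical helper (names are illustrative) that goes slightly further. It assumes that captcha answers never contain spaces and that the site checks them case-insensitively; both are assumptions, so drop either step if your captchas differ.

```java
// Hypothetical helper (not a Tess4J API) for comparing OCR output with
// a captcha answer. Assumes answers contain no whitespace and are
// checked case-insensitively.
class CaptchaText {
    // Removes all whitespace (spaces, tabs, line breaks) from the OCR result
    static String normalize(String ocrResult) {
        if (ocrResult == null) {
            return "";
        }
        return ocrResult.replaceAll("\\s+", "");
    }

    // Case-insensitive comparison of the normalized result with the answer
    static boolean matches(String ocrResult, String expected) {
        return normalize(ocrResult).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) {
        System.out.println(normalize("hm xo\n\n")); // hmxo
        System.out.println(matches("HMXO\n", "hmxo")); // true
    }
}
```

In the test, `Assert.assertTrue(CaptchaText.matches(result, "hmxo"))` could then replace the `trim()` comparison.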