Java text regions and images: how to separate text regions from an image in Java

I am working on OCR to recognize passport details, using the Tesseract Java API. To achieve better accuracy, I need to split the whole image (which may be .png, .jpeg, or .tiff) into its text regions only. Is there an open-source Java library that separates text regions from an image? Any suggestions would be appreciated.

Solution

The Marvin image processing framework provides a method for exactly this purpose:

public static java.util.List<MarvinSegment> findTextRegions(MarvinImage imageIn,
        int maxWhiteSpace,
        int maxFontLineWidth,
        int minTextWidth,
        int grayScaleThreshold)

Input image:

hZM2o.png

Output image:

YCEdn.png

Source code:

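import static marvin.MarvinPluginCollection.*;

import java.awt.Color;
import java.util.List;

import marvin.image.MarvinImage;
import marvin.image.MarvinSegment;
import marvin.io.MarvinImageIO;

public class TextRegions {

    public static void main(String[] args) {
        // Load the image and clone it; the regions are drawn on the clone.
        MarvinImage image = MarvinImageIO.loadImage("./res/passport.png");
        MarvinImage originalImage = image.clone();

        // Detect candidate text regions.
        List<MarvinSegment> segments = findTextRegions(image, 15, 8, 30, 150);

        for (MarvinSegment s : segments) {
            // Skip regions less than 5 pixels high.
            if (s.height >= 5) {
                originalImage.drawRect(s.x1, s.y1, s.x2 - s.x1, s.y2 - s.y1, Color.red);
            }
        }
        MarvinImageIO.saveImage(originalImage, "./res/passport_2.png");
    }
}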
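
The code above only draws the detected regions. For OCR, each region would normally be cropped out and recognized on its own. The following is a minimal sketch of that cropping step using only the standard `BufferedImage` API. The class `RegionCropper`, the method `crop`, and the `padding` parameter are hypothetical names introduced for illustration; they are not part of Marvin or Tesseract. The coordinates are assumed to correspond to the `x1`, `y1`, `x2`, `y2` fields of a `MarvinSegment`.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Marvin or Tesseract.
class RegionCropper {

    // Crops the box (x1, y1)-(x2, y2) from src, widened by `padding` pixels
    // on every side and clamped to the image bounds.
    static BufferedImage crop(BufferedImage src, int x1, int y1, int x2, int y2, int padding) {
        int left   = Math.max(0, x1 - padding);
        int top    = Math.max(0, y1 - padding);
        int right  = Math.min(src.getWidth(),  x2 + padding);
        int bottom = Math.min(src.getHeight(), y2 + padding);
        if (right <= left || bottom <= top) {
            throw new IllegalArgumentException("Empty region");
        }
        // getSubimage returns a view that shares pixel data with src.
        return src.getSubimage(left, top, right - left, bottom - top);
    }

    public static void main(String[] args) {
        // A blank 200x100 image stands in for the passport scan.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage region = crop(page, 20, 30, 120, 45, 5);
        System.out.println(region.getWidth() + "x" + region.getHeight()); // prints 110x25
    }
}
```

In the Marvin example, the source image could be obtained with `originalImage.getBufferedImage()` before any rectangles are drawn, and each cropped region could then be handed to the OCR engine, for instance through `doOCR(BufferedImage)` if Tess4J is the Tesseract wrapper in use. A few pixels of padding is a judgment call here; the right amount depends on the scan.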
