OCR Java source code_JAVA OCR

【Example overview】

【Example screenshots】

【Core code】

package com.ocr;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.jdesktop.swingx.util.OS;

public class OCR {

    private final String LANG_OPTION = "-l";
    private final String EOL = System.getProperty("line.separator");

    // Tesseract install directory, resolved relative to the working directory.
    private String tessPath = new File("Tesseract-OCR").getAbsolutePath();
    //private String tessPath="C:\\Program Files (x86)\\Tesseract-OCR\\";

    public String recognizeText(File imageFile, String imageFormat) throws Exception {
        System.out.println("in OCR.java recognizeText 47 row tessPath=" + tessPath);

        // ImageIOHelper is a project class that is not included in this listing.
        File tempImage = ImageIOHelper.createImage(imageFile, imageFormat);
        File outputFile = new File(imageFile.getParentFile(), "output");
        File resultFile = new File(outputFile.getAbsolutePath() + ".txt");
        StringBuilder strB = new StringBuilder();

        List<String> cmd = new ArrayList<String>();
        if (OS.isWindowsXP()) {
            cmd.add(tessPath + "\\tesseract");
            //cmd.add(tessPath + "\\Tesseract-OCR");
        } else if (OS.isLinux()) {
            cmd.add("tesseract");
        } else {
            //cmd.add(tessPath + "\\Tesseract-OCR");
            cmd.add(tessPath + "\\tesseract");
        }
        cmd.add(tempImage.getName());   // input image
        cmd.add(outputFile.getName());  // output base name; Tesseract appends ".txt"
        cmd.add(LANG_OPTION);
        // Multiple languages are passed to -l as one argument joined by "+".
        // A separate "eng" argument would be read as a config file name.
        cmd.add("chi_sim+eng");

        ProcessBuilder pb = new ProcessBuilder();
        pb.directory(imageFile.getParentFile());
        pb.command(cmd);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        //tesseract.exe 1.jpg 1 -l chi_sim
        // Note: the process output is not read here; very verbose output could block waitFor().
        int w = process.waitFor();

        // delete temp working files
        tempImage.delete();

        if (w == 0) {
            // Read the recognized text that Tesseract wrote to output.txt.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(resultFile), "UTF-8"))) {
                String str;
                while ((str = in.readLine()) != null) {
                    strB.append(str).append(EOL);
                }
            }
        } else {
            String msg;
            switch (w) {
            case 1:
                msg = "Errors accessing files. There may be spaces in your image's filename.";
                break;
            case 29:
                msg = "Cannot recognize the image or its selected region.";
                break;
            case 31:
                msg = "Unsupported image format.";
                break;
            default:
                msg = "Errors occurred.";
            }
            throw new RuntimeException(msg);
        }

        resultFile.delete();
        return strB.toString();
    }
}
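The command that recognizeText assembles follows the form shown in the listing's own comment (tesseract.exe 1.jpg 1 -l chi_sim): executable, input image, output base name, then -l and the language. As a minimal sketch, that assembly can be pulled into a small helper so it can be checked without running Tesseract. The class name TesseractCommand and the method build are hypothetical and not part of the original project. The sketch assumes that several languages are passed to -l as a single argument joined with "+".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TesseractCommand {

    /**
     * Builds the argument list: executable, image, output base, then "-l lang1+lang2".
     * The language option is omitted when no languages are given.
     */
    public static List<String> build(String executable, String imageName,
                                     String outputBase, List<String> languages) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(executable);
        cmd.add(imageName);
        cmd.add(outputBase);
        if (languages != null && !languages.isEmpty()) {
            cmd.add("-l");
            // Assumption: multiple languages go into one argument joined by "+".
            cmd.add(String.join("+", languages));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("tesseract", "1.jpg", "output",
                Arrays.asList("chi_sim", "eng"));
        System.out.println(cmd); // prints [tesseract, 1.jpg, output, -l, chi_sim+eng]
    }
}
```

The returned list can be handed directly to ProcessBuilder.command(List), which is what the listing does with its cmd variable.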

Java OCR Framework

An Optical Character Recognition framework written purely in Java.

Installation

Build the project and add the project's jar, along with all the jars in the jar directory, to your compile-time libraries.

Usage

There are 4 main parts to OCR:

- Normalization
- Segmentation
- Feature Extraction
- Classification

Feature Extraction and Classification are the only required parts.

For Feature Extraction there are 5 algorithms at your disposal:

- Horizontal Celled Projection
- Vertical Celled Projection
- Horizontal Projection Histogram
- Vertical Projection Histogram
- Local Line Fitting

This framework loosely uses a Fluent Interface Builder syntax. Example:

    OCR ocr = OCRBuilder
        .create()
        .normalization(new Normalization())
        .segmentation(new Segmentation())
        .featureExtraction(
            FeatureExtractionBuilder
                .create()
                .children(
                    new HorizontalCelledProjection(5),
                    new VerticalCelledProjection(5),
                    new HorizontalProjectionHistogram(),
                    new VerticalProjectionHistogram(),
                    new LocalLineFitting(49))
                .build())
        .neuralNetwork(
            NeuralNetworkBuilder
                .create()
                .fromFile("neural_network.eg")
                .build())
        .build();

Contributing

Want to help out? Feel free to share your ideas.

1. Fork it.
2. Create a branch (git checkout -b my_fancy_feature)
3. Commit your changes (git commit -am "Added amazing feature")
4. Push to the branch (git push origin my_fancy_feature)
5. Open a Pull Request

Thanks

Thanks to Heaton Research for providing an amazing Neural Network framework. Also thanks to Apache Commons Math for doing all the math without the mess.
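To make the projection histogram features named under Feature Extraction more concrete, here is a minimal sketch of the general technique. It is not the framework's own implementation. The class ProjectionHistogram and its methods are hypothetical, the image is assumed to be a binarized boolean[row][column] grid where true marks a foreground pixel, and "horizontal" is assumed to mean one count per row and "vertical" one count per column.

```java
import java.util.Arrays;

public class ProjectionHistogram {

    /** Horizontal projection: number of foreground pixels in each row. */
    public static int[] horizontal(boolean[][] image) {
        int[] counts = new int[image.length];
        for (int y = 0; y < image.length; y++) {
            for (int x = 0; x < image[y].length; x++) {
                if (image[y][x]) {
                    counts[y]++;
                }
            }
        }
        return counts;
    }

    /** Vertical projection: number of foreground pixels in each column. */
    public static int[] vertical(boolean[][] image) {
        // Assumption: the image is rectangular, so row 0 gives the width.
        int width = image.length == 0 ? 0 : image[0].length;
        int[] counts = new int[width];
        for (int y = 0; y < image.length; y++) {
            for (int x = 0; x < width; x++) {
                if (image[y][x]) {
                    counts[x]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        boolean[][] image = {
            { true,  false, false, true },
            { true,  true,  true,  true },
            { false, false, false, true },
        };
        System.out.println(Arrays.toString(horizontal(image))); // prints [2, 4, 1]
        System.out.println(Arrays.toString(vertical(image)));   // prints [2, 1, 1, 3]
    }
}
```

The two count arrays can be concatenated into one feature vector for a classifier; how the framework itself normalizes or combines them is not stated in the README.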