Implementing Baidu OCR Text Recognition in Java

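With some free time on my hands, I came across Baidu's OCR text recognition API. It looked interesting, so I decided to try it out.

About the Baidu service: text recognition is Baidu's natural-scene OCR service. Built on what Baidu describes as its industry-leading OCR algorithms, it offers whole-image text detection and recognition, whole-image text recognition, whole-image text line localization, and single-character image recognition.

Enough talk, let's go straight to the demo!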

package com.oa.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.util.Base64;

public class OCRTest {

    /**
     * Sends a form-encoded POST request and returns the response body.
     *
     * @param httpUrl the endpoint URL
     * @param httpArg the URL-encoded request body
     * @return the response body, or null if the request failed
     */
    public static String request(String httpUrl, String httpArg) {
        String result = null;
        StringBuilder sbf = new StringBuilder();
        try {
            URL url = new URL(httpUrl);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded");
            // Put your apikey in the HTTP header
            connection.setRequestProperty("apikey", "your own apikey");
            connection.setDoOutput(true);
            // Writing the body opens the connection and sends the request
            try (OutputStream out = connection.getOutputStream()) {
                out.write(httpArg.getBytes("UTF-8"));
            }
            // Read the response line by line
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                String strRead;
                while ((strRead = reader.readLine()) != null) {
                    sbf.append(strRead);
                    sbf.append("\r\n");
                }
            }
            result = sbf.toString();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    /**
     * @param args unused
     */
    public static void main(String[] args) throws IOException {
        File file = new File("d:\\che4.jpg");
        String imageBase = encodeImgageToBase64(file);
        // Strip line breaks, which some Base64 encoders insert every 76 characters
        imageBase = imageBase.replaceAll("\r\n", "");
        // A literal '+' is read as a space in a form body, so escape it
        imageBase = imageBase.replace("+", "%2B");
        String httpUrl = "http://apis.baidu.com/apistore/idlocr/ocr";
        String httpArg = "fromdevice=pc&clientip=10.10.10.0&detecttype=LocateRecognize&languagetype=CHN_ENG&imagetype=1&image=" + imageBase;
        String jsonResult = request(httpUrl, httpArg);
        System.out.println("Result --------->" + jsonResult);
    }

    /**
     * Base64-encodes a local image file.
     *
     * @param imageFile the image file, e.g. d:\中文.jpg
     * @return the Base64 text of the file's bytes
     * @throws IOException if the file cannot be read
     */
    public static String encodeImgageToBase64(File imageFile) throws IOException {
        // Read the whole image into a byte array
        byte[] data = Files.readAllBytes(imageFile.toPath());
        // Base64-encode the byte array and return it as a string
        return Base64.getEncoder().encodeToString(data);
    }
}

Attachment:

[image: che4.jpg]

Result after running:

{"errNum":"0","errMsg":"success","querySign":"2289891521,4081625058","retData":[{"rect":{"left":"32","top":"15","width":"418","height":"118"},"word":"\u8c6bC88888"},{"rect":{"left":"45","top":"137","width":"373","height":"18"},"word":"\u4e1c\u98ce\u672c\u7530\u6d1b\u9633\u952e\u901a\u5e97\u7535\u8bdd\uff1a03796358222"}]}
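
The `word` values in the raw response are JSON Unicode escapes (for example `\u8c6b` stands for 豫). As a rough sketch of how they could be decoded in plain Java, assuming only four-digit escapes of this form appear and no other JSON escapes need handling; `UnicodeUnescape` and `unescape` are hypothetical names, not part of the Baidu API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a backslash, the letter u, and exactly four hex digits
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces every four-digit Unicode escape with the character it encodes. */
    public static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text literal
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The first "word" value from the response above
        System.out.println(unescape("\\u8c6bC88888"));
    }
}
```

A real JSON parser performs this decoding automatically, so a helper like this is only useful when handling the response as a plain string.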

Note: paste this result into an online JSON validator and formatter (http://www.bejson.com/) to see it in decoded, readable form:

{
    "errNum": "0",
    "errMsg": "success",
    "querySign": "2289891521,4081625058",
    "retData": [
        {
            "rect": {
                "left": "32",
                "top": "15",
                "width": "418",
                "height": "118"
            },
            "word": "豫C88888"
        },
        {
            "rect": {
                "left": "45",
                "top": "137",
                "width": "373",
                "height": "18"
            },
            "word": "东风本田洛阳键通店电话:03796358222"
        }
    ]
}
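
To use the result in code rather than read it in a formatter, the `rect` and `word` fields have to be extracted from the response. The following is a dependency-free sketch using a regular expression, assuming the fields always appear in the order shown above and that `word` never contains an escaped double quote. `OcrResultSketch`, `Line`, and `parse` are hypothetical names; a proper JSON library would be the more robust choice in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OcrResultSketch {
    /** One recognized text line: its bounding box and its text. */
    public static class Line {
        public final int left, top, width, height;
        public final String word;

        Line(int left, int top, int width, int height, String word) {
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
            this.word = word;
        }
    }

    // A quoted number after a key, with optional whitespace around it
    private static final String NUM = "\\s*:\\s*\"(\\d+)\"\\s*";
    // One retData entry: the four rect numbers followed by the word
    private static final Pattern ENTRY = Pattern.compile(
            "\"left\"" + NUM + ",\\s*\"top\"" + NUM + ",\\s*\"width\"" + NUM
            + ",\\s*\"height\"" + NUM + "\\}\\s*,\\s*\"word\"\\s*:\\s*\"([^\"]*)\"");

    /** Collects every rect/word pair found in the response text. */
    public static List<Line> parse(String json) {
        List<Line> lines = new ArrayList<>();
        Matcher m = ENTRY.matcher(json);
        while (m.find()) {
            lines.add(new Line(
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                    m.group(5)));
        }
        return lines;
    }
}
```

The pattern tolerates whitespace between tokens, so it accepts both the compact response and the formatted version.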

Pretty neat, isn't it? Give it a try if you're interested!

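Finally, here is what each parameter means:

apikey: your own API key, sent as an HTTP header

fromdevice: the source device, e.g. android or iPhone; defaults to PC

clientip: the client's outbound IP address

detecttype: the OCR interface type (the demo uses LocateRecognize)

languagetype: the type of text to detect (the demo uses CHN_ENG)

imagetype: the image resource type (the demo uses 1)

image: the image data; currently only JPG is supported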
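
The demo builds the request body by string concatenation and escapes only `+` by hand. A more general approach is to percent-encode every value with `java.net.URLEncoder`, which also covers `/` and `=` in the Base64 text. This is a minimal sketch; `FormBodySketch` and `build` are hypothetical names, and the `image` value is a short placeholder rather than real image data:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {
    /** Percent-encodes one key or value as UTF-8 form data. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Joins the parameters as key=value pairs separated by '&'. */
    public static String build(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fromdevice", "pc");
        params.put("clientip", "10.10.10.0");
        params.put("detecttype", "LocateRecognize");
        params.put("languagetype", "CHN_ENG");
        params.put("imagetype", "1");
        params.put("image", "AB+/CD=="); // placeholder for the Base64 image text
        System.out.println(build(params));
    }
}
```

The resulting string can be passed as `httpArg` to the `request` method in place of the hand-built one.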
