Baidu-style OCR in Java: implementing Baidu OCR text recognition in Java

Latest recommended article published 2024-07-24 11:00:44

weixin_39589923

Tags: Baidu OCR, text recognition, API interface, parameter settings, text localization

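With some free time on my hands, I found that Baidu offers an OCR text recognition API. It looked interesting, so I decided to try it out.

About the Baidu service: text recognition is Baidu's natural-scene OCR service. Built on Baidu's industry-leading OCR algorithms, it provides whole-image text detection and recognition, whole-image text line localization, and single-character image recognition.

Enough talk, let's go straight to the demo!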

package com.oa.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import com.oa.commons.util.BASE64;

public class OCRTest {

    public static String request(String httpUrl, String httpArg) {
        BufferedReader reader = null;
        String result = null;
        StringBuffer sbf = new StringBuffer();
        try {
            URL url = new URL(httpUrl);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            // Put the apikey in the HTTP header
            connection.setRequestProperty("apikey", "your own apikey");
            connection.setDoOutput(true);
            // Send the form-encoded arguments as the request body
            connection.getOutputStream().write(httpArg.getBytes("UTF-8"));
            connection.connect();
            InputStream is = connection.getInputStream();
            reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            String strRead = null;
            // Read the response line by line
            while ((strRead = reader.readLine()) != null) {
                sbf.append(strRead);
                sbf.append("\r\n");
            }
            reader.close();
            result = sbf.toString();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        File file = new File("d:\\che4.jpg");
        String imageBase = BASE64.encodeImgageToBase64(file);
        // Strip the line breaks that the encoder inserts into its output
        imageBase = imageBase.replaceAll("\r\n", "");
        // Escape '+' so that it is not read as a space in the form body
        imageBase = imageBase.replaceAll("\\+", "%2B");
        String httpUrl = "http://apis.baidu.com/apistore/idlocr/ocr";
        String httpArg = "fromdevice=pc&clientip=10.10.10.0&detecttype=LocateRecognize&languagetype=CHN_ENG&imagetype=1&image=" + imageBase;
        String jsonResult = request(httpUrl, httpArg);
        System.out.println("Returned result--------->" + jsonResult);
    }
}

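// Helper class referenced by the import above (com.oa.commons.util.BASE64)
package com.oa.commons.util;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Assumed import: the post does not show where BASE64Encoder comes from.
// sun.misc.BASE64Encoder is an internal JDK class and is unavailable on recent JDKs.
import sun.misc.BASE64Encoder;

public class BASE64 {

    /**
     * Base64-encodes a local image.
     *
     * @param imageFile the image file, e.g. d:\中文.jpg
     * @return the Base64-encoded string
     */
    public static String encodeImgageToBase64(File imageFile) {
        // Convert the image file to a byte array, then Base64-encode it
        byte[] data = null;
        // Read the image bytes
        try {
            InputStream in = new FileInputStream(imageFile);
            data = new byte[in.available()];
            in.read(data);
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Base64-encode the byte array
        BASE64Encoder encoder = new BASE64Encoder();
        return encoder.encode(data); // return the Base64-encoded string
    }
}

Attachment: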

(che4.jpg)
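A side note on the encoding helper: if the `BASE64Encoder` it uses is the JDK-internal `sun.misc.BASE64Encoder` (an assumption, since the post shows no import for it), the code will not compile on recent JDKs. The sketch below shows one possible substitute built on the standard `java.util.Base64`. The class name `ImageBase64` and its method names are hypothetical and not part of the original demo.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;

// Hypothetical stand-in for the BASE64 helper; names are illustrative only.
final class ImageBase64 {
    private ImageBase64() {}

    /** Encodes raw bytes as one line of Base64 text (the basic encoder adds no line breaks). */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Reads the whole file and Base64-encodes it; I/O errors propagate to the caller. */
    static String encodeFile(File imageFile) throws IOException {
        return encode(Files.readAllBytes(imageFile.toPath()));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ImageBase64 <path-to-image>
        System.out.println(encodeFile(new File(args[0])));
    }
}
```

Because the basic encoder emits a single line, stripping line breaks afterwards becomes unnecessary with this variant. The `+` characters still have to be escaped before the string goes into the form body.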

Result after running:

{"errNum":"0","errMsg":"success","querySign":"2289891521,4081625058","retData":[{"rect":{"left":"32","top":"15","width":"418","height":"118"},"word":"\u8c6bC88888"},{"rect":{"left":"45","top":"137","width":"373","height":"18"},"word":"\u4e1c\u98ce\u672c\u7530\u6d1b\u9633\u952e\u901a\u5e97\u7535\u8bdd\uff1a03796358222"}]}

Note: paste this result into an online JSON validation and formatting tool (http://www.bejson.com/) to get a readable version:

{
    "errNum": "0",
    "errMsg": "success",
    "querySign": "2289891521,4081625058",
    "retData": [
        {
            "rect": {
                "left": "32",
                "top": "15",
                "width": "418",
                "height": "118"
            },
            "word": "豫C88888"
        },
        {
            "rect": {
                "left": "45",
                "top": "137",
                "width": "373",
                "height": "18"
            },
            "word": "东风本田洛阳键通店电话：03796358222"
        }
    ]
}
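The raw response carries the recognized text as JSON Unicode escapes (for example `\u8c6b` for 豫), so it can also be decoded in code instead of being pasted into an online tool. The following is a minimal sketch using only the standard library. The class `OcrWords` and its methods are hypothetical. The regular expression is a shortcut that assumes `word` values contain no escaped quotes; a real JSON parser would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original demo.
final class OcrWords {
    // Matches "word":"..." pairs; assumes the value holds no escaped quotes.
    private static final Pattern WORD = Pattern.compile("\"word\"\\s*:\\s*\"([^\"]*)\"");
    // Matches a backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    private OcrWords() {}

    /** Replaces each backslash-u escape with the character it encodes. */
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Collects the decoded value of every "word" field in the response. */
    static List<String> words(String json) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(json);
        while (m.find()) {
            result.add(unescape(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed sample taken from the response shown in the post.
        String raw = "{\"retData\":[{\"word\":\"\\u8c6bC88888\"}]}";
        for (String word : words(raw)) {
            System.out.println(word);
        }
    }
}
```

Passing the `jsonResult` string from the demo to `OcrWords.words` would return one decoded string per entry in `retData`.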

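Pretty neat, isn't it? If you're interested, give it a try!

Finally, here is what each parameter means:

apikey: the API key, that is, your own apikey

fromdevice: the source device, e.g. android or iPhone; the default is PC

clientip: the client's outbound IP

detecttype: the OCR interface type (LocateRecognize in the demo)

languagetype: the type of text to detect (CHN_ENG in the demo)

imagetype: the image resource type

image: the image resource; currently only the jpg format is supported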
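The demo builds the request body by string concatenation and escapes only the `+` character by hand. As an alternative sketch, the body can be assembled from a parameter map and every value passed through the standard `java.net.URLEncoder`, which also escapes the `/` and `=` characters that Base64 output may contain. The class `FormBody` is hypothetical, and the `image` value below is a short placeholder rather than real image data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original demo.
final class FormBody {
    private FormBody() {}

    // Percent-encodes one key or value for application/x-www-form-urlencoded.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** Joins the parameters as key=value pairs separated by '&', in map order. */
    static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(entry.getKey())).append('=').append(enc(entry.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Same parameters as the demo; LinkedHashMap keeps insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fromdevice", "pc");
        params.put("clientip", "10.10.10.0");
        params.put("detecttype", "LocateRecognize");
        params.put("languagetype", "CHN_ENG");
        params.put("imagetype", "1");
        params.put("image", "+/8="); // placeholder, not a real image
        System.out.println(encode(params));
    }
}
```

The resulting string could be passed as `httpArg` to the `request` method from the demo, in which case the manual `replaceAll` step for `+` would no longer be needed.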
