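Laziness is the motivation for learning. Most of the material in the courses I've been taking lately is in English, and it comes as images on top of that. For someone with my fear of English, that's a nightmare. Unfriendly!!!! So I went looking and found Baidu's OCR API.
For the details, see the API documentation: http://ai.baidu.com/docs#/OCR-API/top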
Step 1: Get an access_token
First you need to complete Baidu developer verification, then create a text recognition application.
import requests

# Get the token.
# Fill in your own API Key after client_id= and your Secret Key after client_secret=
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=&client_secret='
headers = {
    'Content-Type': 'application/json;charset=UTF-8'
}
# The response is JSON; the token is in its access_token field
res = requests.get(url=host, headers=headers).json()
print(res['access_token'])
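Pasting the keys straight into the URL string is easy to get wrong, and when the keys are bad the response has no access_token, so the print above fails with a KeyError. Here is a minimal sketch of two small helpers for that. The names `build_token_url` and `extract_token` are my own, not part of Baidu's API, and the sketch only assumes that a failed request comes back as JSON without an `access_token` field.

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Build the token URL with properly escaped query parameters."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # the application's API Key
        "client_secret": secret_key,  # the application's Secret Key
    })
    return f"{TOKEN_ENDPOINT}?{query}"


def extract_token(payload):
    """Return access_token from the decoded JSON, or raise if it is missing."""
    if "access_token" not in payload:
        # Assumption: a failed request simply lacks the access_token field
        raise RuntimeError(f"token request failed: {payload}")
    return payload["access_token"]
```

With requests, the two would be combined roughly as `extract_token(requests.get(build_token_url(api_key, secret_key)).json())`.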
Step 2: Call the API
import base64

import requests

url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic'
# The access_token from step 1, passed as a URL parameter
params = {'access_token': 'the token you just obtained'}

# Read the image and base64-encode it
with open('1.png', 'rb') as file:
    image = file.read()
data = {'image': base64.b64encode(image)}

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "apikey": "your own API key"
}
res = requests.post(url=url, params=params, headers=headers, data=data)
result = res.json()

# Print each recognized line and append it to 1.txt
with open("1.txt", "a", encoding="utf-8") as f:
    for line in result["words_result"]:
        print(line["words"], end="")
        f.write(line["words"] + "\n")
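The loop above assumes the call succeeded. When it doesn't (an expired token, an unsupported image), there is no words_result in the response and the script dies with a KeyError. Below is a small sketch of a more forgiving way to read the result. `words_from_result` is a name I made up, the sample response is invented for illustration, and I'm assuming failed calls carry `error_code` and `error_msg` fields.

```python
def words_from_result(result):
    """Return the recognized lines from a decoded OCR response."""
    if "error_code" in result:
        # Assumption: failed calls report error_code / error_msg
        # instead of words_result
        code = result["error_code"]
        msg = result.get("error_msg", "")
        raise RuntimeError(f"OCR failed: {code} {msg}")
    return [item["words"] for item in result.get("words_result", [])]


# Made-up response, shaped like the one the script above reads
sample = {
    "words_result_num": 2,
    "words_result": [{"words": "Hello"}, {"words": "world"}],
}
text = "\n".join(words_from_result(sample))
print(text)
```

In the script, `words_from_result(res.json())` would take the place of reading `result["words_result"]` directly, and the returned list can be written to 1.txt the same way as before.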