Haha, I Built a Question-Search Tool in Python

Latest recommended article published 2024-07-13 03:08:21

TrueDei

Category column: Youdao Zhiyun (Youdao AI Cloud)

Article link: https://blog.csdn.net/qq_17623363/article/details/111830479

Professional question search, a good helper for parents

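A while back I worked on an image-recognition project. One of its features was whole-question recognition: given a screenshot of a math problem, it used OCR to extract the content of the image. At the time it only recognized the text and did no further processing, which in hindsight was not very practical. What people really need is an approach to solving the problem, not an AI that reads the question aloud (I can read every word of the question; I just don't know where to start once they are put together = =). I recently had some free time, so I tried building a question-search tool for friends who have kids.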

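Given my good experience with Youdao Zhiyun (Youdao AI Cloud) when developing whole-question recognition, I opened its official documentation again and, sure enough, found an open API for the photo question-search service. Being on familiar ground, I put together a simple batch question-search demo. Below I share the development process.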

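Preparing to call the API

First, on your personal Youdao Zhiyun page, create an instance, create an application, and bind the application to the instance to obtain the application ID and secret key. For the details of the registration and application-creation process, see the article.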

Development process in detail

The code is developed as follows.

The API accepts a fairly simple set of parameters:

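Field	Type	Meaning	Required	Notes
q	text	Image to recognize, Base64-encoded	True	Must be Base64-encoded (do not prefix the Base64 string with data:image/png;base64)
appKey	text	Application ID	True	Available under application management
salt	text	UUID	True	uuid
curtime	text	Current UTC timestamp (seconds)	True	TimeStamp
sign	text	Signature: sha256(application ID + input + salt + curtime + application secret); see the note below the table for how input is built	True	sha256(application ID + input + salt + curtime + application secret)
signType	text	Signature type	True	v2
type	text	Upload type; only Base64 upload is supported, so use the fixed value 1	True	1
searchType	text	Search type: img for image search, text for text search	False	img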

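The signature sign is generated as follows:
signType=v2;
sign=sha256(application ID + input + salt + curtime + application secret).
Here input is computed as: input = first 10 characters of q + length of q + last 10 characters of q (when q is longer than 20 characters), or input = q itself (when the length of q is 20 or less).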
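
The signing rule above can be sketched in a few lines of Python. This is only a minimal sketch: the function names truncate and make_sign are my own choice here, and sending the digest as a lowercase hex string is an assumption, since the rule only says sha256.

```python
import hashlib


def truncate(q: str) -> str:
    """Build `input`: q itself if len(q) <= 20, else first 10 + length + last 10."""
    if len(q) <= 20:
        return q
    return q[:10] + str(len(q)) + q[-10:]


def make_sign(app_key: str, q: str, salt: str, curtime: str, app_secret: str) -> str:
    """sha256(app_key + input + salt + curtime + app_secret).

    Assumption: the digest is sent as a lowercase hex string.
    """
    sign_str = app_key + truncate(q) + salt + curtime + app_secret
    return hashlib.sha256(sign_str.encode("utf-8")).hexdigest()
```

For a 25-character q such as "0123456789xxxxxabcdefghij", truncate() yields "012345678925abcdefghij": the first ten characters, the length 25, then the last ten.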

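Note that the API has the following requirements for requests and question images:

Rule	Description
Transport	HTTPS
Request method	POST
Character encoding	UTF-8 throughout
Request format	Form
Response format	JSON
Image format	jpg/png/bmp
Image size	Under 1 MB
Text length	Under 50 characters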
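
Since images that break these rules would only fail after upload, it may be worth checking them locally first. The sketch below is not part of the demo; check_image is a hypothetical helper, and it assumes that the 1 MB limit applies to the file on disk (read as 1 MiB) and that .jpeg counts as jpg.

```python
import os

# Formats listed in the table; ".jpeg" is assumed to be equivalent to ".jpg".
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
# "Under 1 MB", interpreted here as 1 MiB of file size (an assumption).
MAX_IMAGE_BYTES = 1024 * 1024


def check_image(path: str) -> list:
    """Return a list of problems; an empty list means the file passes the checks."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: " + (ext or "(none)"))
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) >= MAX_IMAGE_BYTES:
        problems.append("file is 1 MB or larger")
    return problems
```

Calling check_image() on each selected path before searching lets the UI report bad files without spending an API call on them.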

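Developing the demo:

The demo is written in Python 3 and consists of three files, maindow.py, QuestionClass.py and OcrQuestion.py, which hold the demo's UI, the UI logic, and a wrapper around the OCR question-search method, respectively.

UI:

The UI is fairly simple. Its main functions are choosing the question images to search and choosing where to save the search results. The layout code is as follows: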

import tkinter as tk

# get_files, set_result_path and search_question_files are the button
# callbacks; they must be defined before this layout code runs.
root = tk.Tk()
root.title("youdao ocr question test")
frm = tk.Frame(root)
frm.grid(padx=50, pady=50)

# Buttons and text boxes for the question images and the result path
btn_get_file = tk.Button(frm, text='Choose question images', command=get_files)
btn_get_file.grid(row=0, column=0, ipadx=3, ipady=3, padx=10, pady=20)
text1 = tk.Text(frm, width=40, height=10)
text1.grid(row=0, column=1)
btn_get_result_path = tk.Button(frm, text='Choose result path', command=set_result_path)
btn_get_result_path.grid(row=1, column=0)
text2 = tk.Text(frm, width=40, height=2)
text2.grid(row=1, column=1)


# Search button
btn_sure = tk.Button(frm, text="Search", command=search_question_files)
btn_sure.grid(row=4, column=1)

root.mainloop()

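The search button btn_sure is bound to search_question_files(), which searches for each question image and, when finished, opens the folder where the results are stored:

import os

def search_question_files():
    question.start_ocr()
    # 'start' is a Windows shell command, so this line works only on Windows
    os.system('start ' + question.result_path)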
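
Because start exists only in the Windows shell, opening the result folder could be made portable roughly like this. It is a sketch rather than what the demo does; open_folder_command and open_folder are hypothetical names, and xdg-open is assumed to be available on Linux desktops.

```python
import subprocess
import sys


def open_folder_command(path: str, platform: str = sys.platform) -> list:
    """Return the argument list that opens `path` in the system file manager."""
    if platform.startswith("win"):
        return ["explorer", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumed to exist on most Linux desktops


def open_folder(path: str) -> None:
    # Passing a list without a shell keeps paths that contain spaces intact.
    subprocess.run(open_folder_command(path), check=False)
```

With this in place, search_question_files() could call open_folder(question.result_path) in place of the os.system line.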

QuestionClass.py

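This file mainly backs the UI logic and calls the question-search method.

First, define a class Question:

class Question():
    def __init__(self, file_paths, result_path):
        self.file_paths = file_paths    # paths of the question images
        self.result_path = result_path  # folder where the results are saved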

The start_ocr() method calls connect() for each image in turn and saves the result.

    def start_ocr(self):
        for file_path in self.file_paths:
            result = connect(file_path)
            print(file_path)
            self.save_result_format(file_path, result)
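
In the loop above, one unreadable image or failed request stops the whole batch. A possible variation, sketched here with a hypothetical helper search_all that takes the search and save steps as arguments, records the failures and carries on:

```python
def search_all(file_paths, search, save):
    """Run `search` on every path and `save` each result; return the paths that failed."""
    failed = []
    for file_path in file_paths:
        try:
            save(file_path, search(file_path))
        except Exception as exc:  # keep going when a single image fails
            print("failed:", file_path, exc)
            failed.append(file_path)
    return failed
```

Inside the class this would be called as search_all(self.file_paths, connect, self.save_result_format), and the returned list could be shown to the user afterwards.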

The result returned by connect() in OcrQuestion.py is JSON. The save_result_format() method parses the result obtained from the API, formats it, and saves it as HTML:

    def save_result_format(self, file_path, result):
        # Requires `import os` and `import json` at the top of QuestionClass.py
        result_file_name = os.path.basename(file_path).split('.')[0] + '_result.html'
        result_json = json.loads(result)
        # The with block closes the file even if a field is missing below
        with open(os.path.join(self.result_path, result_file_name), 'w', encoding='utf-8') as f:
            if result_json['errorCode'] == '0':
                data = result_json['data']
                questions = data["questions"]
                text = data["text"]
                f.write("Recognized question:<br/>" + text)
                for i, answers in enumerate(questions, start=1):
                    subject = "Subject: " + answers["subject"] + "<br>"
                    answer = "Answer: " + answers["answer"] + "<br>"
                    analysis = "Analysis: " + answers["analysis"] + "<br>"
                    knowledge = "Knowledge points: " + answers["knowledge"] + "<br>"
                    print(subject + answer + analysis + knowledge)
                    result_each = "<h3>Search result " + str(i) + "<br></h3>"
                    result_each = result_each + subject + answer + analysis + knowledge + "<br>=================separator============<br>"
                    f.write(result_each)
            else:
                f.write("result error code:" + result_json['errorCode'])
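
The method above indexes every field directly, so a candidate that lacks, say, analysis would raise KeyError. As a hedged alternative, the formatting can be split into a pure function that tolerates missing fields; render_result is a hypothetical name, and treating absent fields as empty strings is my assumption, not documented API behavior.

```python
import json

# (label shown in the HTML, key in each question object)
FIELDS = (("Subject", "subject"), ("Answer", "answer"),
          ("Analysis", "analysis"), ("Knowledge points", "knowledge"))


def render_result(result: str) -> str:
    """Turn the API's JSON text into an HTML fragment, tolerating missing fields."""
    result_json = json.loads(result)
    if result_json.get("errorCode") != "0":
        return "result error code: " + str(result_json.get("errorCode"))
    data = result_json.get("data", {})
    parts = ["Recognized question:<br/>" + data.get("text", "")]
    for i, item in enumerate(data.get("questions", []), start=1):
        parts.append("<h3>Search result " + str(i) + "</h3>")
        for label, key in FIELDS:
            # Missing fields are rendered as empty strings (assumption).
            parts.append(label + ": " + str(item.get(key, "")) + "<br>")
    return "\n".join(parts)
```

Keeping the formatting separate from the file writing also makes it easy to try out on a saved response without calling the API.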

OcrQuestion.py

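OcrQuestion.py wraps the calls to the OCR question-search API. Its most important method is connect():

import base64
import time
import uuid

# APP_KEY, APP_SECRET, truncate(), encrypt() and do_request() are defined
# elsewhere in this file.
def connect(pic_path):
    with open(pic_path, 'rb') as f:  # open the image file in binary mode
        q = base64.b64encode(f.read()).decode('utf-8')  # Base64-encode the file content

    # Note: the parameter table lists `type` (fixed value 1) as required,
    # but this demo does not send it.
    data = {}
    data['q'] = q
    data['signType'] = 'v2'
    curtime = str(int(time.time()))
    data['curtime'] = curtime
    salt = str(uuid.uuid1())
    signStr = APP_KEY + truncate(q) + salt + curtime + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign

    response = do_request(data)
    result = response.content.decode('utf-8')
    print(result)
    return result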
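
connect() relies on a do_request() helper that is not shown. Given the requirements listed earlier (HTTPS, POST, form-encoded, UTF-8), one way to write it with only the standard library is sketched below. API_URL is a placeholder and not the real endpoint, which should be taken from the official documentation; build_request is a hypothetical helper.

```python
import urllib.parse
import urllib.request

# Placeholder only: replace with the endpoint given in the official API documentation.
API_URL = "https://example.invalid/ocr_question"


def build_request(data: dict, url: str = API_URL) -> urllib.request.Request:
    """Encode `data` as an application/x-www-form-urlencoded POST request."""
    # urlencode percent-escapes the '+', '/' and '=' characters found in Base64.
    body = urllib.parse.urlencode(data).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def do_request(data: dict, timeout: float = 10.0) -> bytes:
    """Send the form and return the raw response body."""
    with urllib.request.urlopen(build_request(data), timeout=timeout) as resp:
        return resp.read()
```

The connect() shown above reads response.content, which suggests the demo's own do_request() returns a requests-style response object. With this sketch, connect() would instead decode the returned bytes directly.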

Example API response

{
    "data":{
        "questions":[
            {
                "score":0.9875,
                "answer":"D",
                "subject":"历史",
                "id":"a9db8f1252778836c99204e5cf9d7738",
                "analysis":"",
                "type":"",
                "content":"xxx",
                "knowledge":""
            }
        ],
        "text":"xxx"
    },
    "errorCode":"0"
}

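The response is returned as JSON and contains the following fields:

Field	Meaning
errorCode	Error code of the recognition result; always present. See the error code list for details
data	Data
-text	OCR result for the question in the image
-questions	Related questions
–id	Question ID
–content	Question content
–answer	Answer
–analysis	Analysis
–knowledge	Knowledge points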
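
When several related questions come back, it can be handy to pick just one. The sketch below selects the candidate with the largest score. Note that score appears in the example response but not in the field table, so reading it as "higher means a closer match" is an assumption, and best_question is a hypothetical helper.

```python
import json


def best_question(result: str):
    """Return the candidate with the highest `score`, or None if there is none."""
    result_json = json.loads(result)
    if result_json.get("errorCode") != "0":
        return None
    questions = result_json.get("data", {}).get("questions", [])
    # Assumption: a larger score means a closer match; a missing score counts as 0.
    return max(questions, key=lambda item: item.get("score", 0.0), default=None)
```

Applied to the example response above, this returns its single candidate, whose answer is "D".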