Batch-importing Word exam questions, with their options and answers, into a MySQL database

Latest recommended article published 2022-09-30 13:56:39

weixin_30376163

Tags: database, web scraping, php

Original link: http://www.cnblogs.com/chaihy/p/11052631.html

For work, I needed to batch-import exam questions from a Word document into MySQL.

The questions look like this (a simple PHP example):

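1. What is the difference between mysql_connect( ) and @mysql_connect( )?

A @mysql_connect( ) does not ignore errors and displays them to the client

B mysql_connect( ) does not ignore errors and displays them to the client

C There is no difference

D They are two functions with different purposes

Correct answer: B

Explanation: @ suppresses warning output. Some functions emit a warning when they receive incorrect arguments, even though the program can still run normally. In those cases it is enough to silence the warning, which is what the @ symbol is for.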

...

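The Word document contains many questions like this one, both single-choice and multiple-choice. The approach below imports all of the single-choice questions.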
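One way to tell single-choice from multiple-choice questions is to look at the answer line. The following is only a sketch, assuming the answer line ends with the option letters written as uppercase A to D with no separators, as in the sample question; the helper names `answer_letters` and `is_single_choice` are hypothetical and not part of the original code.

```python
import re


def answer_letters(answer_line):
    """Return the trailing option letters of an answer line, or '' if none."""
    # Assumption: the letters sit at the very end of the line, e.g. "... B"
    # for single choice or "... ABD" for multiple choice.
    match = re.search(r"([A-D]+)\s*$", answer_line)
    return match.group(1) if match else ""


def is_single_choice(answer_line):
    """A question counts as single-choice when exactly one letter is given."""
    return len(answer_letters(answer_line)) == 1
```

Answer lines that use separators such as "A, B" would need a looser pattern.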

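After looking into it, I found that handling this in PHP was cumbersome and offered no good way to batch-import the questions. Since I already had some web-scraping experience, it occurred to me that I could parse HTML and import from there, which in turn meant converting the Word document to HTML.

For the scraping part I used Python with requests.

So I did the Word-to-HTML conversion in Python as well. The code is as follows (test.docx is converted to test.html):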

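# encoding=utf-8
from pydocx import PyDocX

# Convert the Word document into an HTML string
html = PyDocX.to_html("test.docx")

# Write the HTML out as UTF-8; the with-block closes the file automatically
with open("test.html", "w", encoding="utf-8") as f:
    f.write(html)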

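Now put test.html in the project directory served by the local web server, so that http://127.0.0.1/test.html can be opened directly (I won't go into the details of that here...).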
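If no local web server is at hand, the converted file could also be read straight from disk. The sketch below is an alternative to the HTTP route, not the method used in this post: it relies only on the standard library's `html.parser` instead of requests and lxml, and it collects every `<p>` element rather than only the direct children of `<body>`. The names `ParagraphCollector`, `paragraphs_from_html` and `paragraphs_from_file` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._parts).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <span> is still part of the paragraph
        if self._in_p:
            self._parts.append(data)


def paragraphs_from_html(html_text):
    """Return the text of each paragraph found in an HTML string."""
    collector = ParagraphCollector()
    collector.feed(html_text)
    collector.close()
    return collector.paragraphs


def paragraphs_from_file(path):
    """Read a UTF-8 HTML file from disk and return its paragraph texts."""
    with open(path, encoding="utf-8") as f:
        return paragraphs_from_html(f.read())
```

Calling `paragraphs_from_file("test.html")` would then give a flat list of paragraph strings to group into questions.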

Next comes the scraper. The code below parses out the questions, options, answers, and explanations.

# -*- coding: utf-8 -*-
import requests
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
}

session = requests.session()

url = 'http://127.0.0.1/test.html'


def get_questions():
    '''
    Fetch the converted HTML page.
    :return: the requests response object
    '''
    response = session.get(url, headers=headers)
    return response


def get_index():
    '''
    Parse the exam questions.
    :return:
    '''
    html = get_questions()
    # Pass the raw bytes to lxml so that it handles the encoding itself
    res = etree.HTML(html.content)
    node_list = res.xpath('/html/body/p')
    question = []
    question_option = []
    question_answer = []
    question_answer_analysis = []
    for x, node in enumerate(node_list):
        other_node = node.xpath('.//span')
        # Paragraphs repeat in groups of four:
        # 0 = question, 1 = options, 2 = answer, 3 = answer explanation
        if x % 4 == 0:
            if other_node:
                # A question paragraph containing a <span> is stored as ''
                # and is skipped when printing below
                question.append('')
            else:
                question.append(node.xpath('.//text()'))
        elif x % 4 == 1:
            question_option.append(node.xpath('.//text()'))
        elif x % 4 == 2:
            question_answer.append(node.xpath('.//text()'))
        else:
            question_answer_analysis.append(node.xpath('.//text()'))
    # Only complete groups of four paragraphs can be printed safely
    q_len = min(len(question), len(question_option),
                len(question_answer), len(question_answer_analysis))
    for x in range(q_len):
        if x > 100:
            # Stop after the first 101 entries
            break
        if question[x]:
            print(question[x])
            print(question_option[x])
            print(question_answer[x])
            print(question_answer_analysis[x])
            print('___________________________________________' + "<br>")


if __name__ == '__main__':
    get_index()
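The code above only prints the parsed fields. The final step of writing them to the database could be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the table `exam_question` and its four columns are hypothetical, and the cursor is assumed to follow the Python DB-API 2.0 interface. MySQL drivers such as pymysql use `%s` as the parameter placeholder, while the standard library's `sqlite3` (handy for a dry run without a server) uses `?`.

```python
import sqlite3


def build_rows(questions, options, answers, analyses):
    """Zip the four parallel lists into row tuples, skipping blank questions."""
    rows = []
    for q, o, a, n in zip(questions, options, answers, analyses):
        if q:  # the parser stores '' for question entries it chose to skip
            # Each field is a list of text fragments; join them into one string
            rows.append(("".join(q), "".join(o), "".join(a), "".join(n)))
    return rows


def insert_rows(cursor, rows, placeholder="%s"):
    """Insert rows with a parameterized statement; return how many were sent."""
    marks = ", ".join([placeholder] * 4)
    # Hypothetical table and column names; adapt them to the real schema
    sql = ("INSERT INTO exam_question (question, options, answer, analysis) "
           "VALUES (" + marks + ")")
    cursor.executemany(sql, rows)
    return len(rows)


def dry_run(rows):
    """Try the insert against an in-memory SQLite database instead of MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exam_question "
                 "(question TEXT, options TEXT, answer TEXT, analysis TEXT)")
    count = insert_rows(conn.cursor(), rows, placeholder="?")
    conn.commit()
    return conn, count
```

With a real MySQL connection, the same `insert_rows` call would be made with the default `%s` placeholder, followed by a commit on the connection. Passing values as parameters rather than formatting them into the SQL string keeps quotes inside question text from breaking the statement.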