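Requirement
The request came from my wife: extract the exam papers from a web page to local files so they can be printed, which is more convenient than reading them on a phone or computer.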
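Background: the drawback
The web page also has a drawback: once you submit, the correct answers and explanations stay displayed under each question. You cannot retake the test unless you can truly keep your eyes from drifting to the answers, and I certainly can't.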
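Approach
The code is fairly simple and relies mainly on BeautifulSoup (the "tasty soup" library). One problem: the target content lives in nested HTML (an iframe), and webdriver.Chrome().page_source does not contain it. Even pointing the driver directly at the iframe's own URL did not return the content. So I gave up and copied the HTML by hand; with only a dozen or so pages, that is not much work.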
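For reference, the usual way to read an iframe's content with Selenium is to switch the driver into the frame before reading page_source. The following is only a sketch under assumptions: the helper name iframe_page_source and the default selector "iframe" are made up for illustration, and I have not checked it against this site, which may also need a login session or an explicit wait before the frame is populated.

```python
def iframe_page_source(driver, frame_css="iframe"):
    """Return the HTML of the first iframe matching frame_css.

    `driver` is expected to be a Selenium WebDriver that has already
    loaded the outer page with driver.get(...).
    """
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    frame = driver.find_element("css selector", frame_css)
    # After switching, page_source refers to the frame's document.
    driver.switch_to.frame(frame)
    try:
        return driver.page_source
    finally:
        # Always return to the top-level document, even on errors.
        driver.switch_to.default_content()
```

With a real browser this would be called as iframe_page_source(driver) after driver.get(url), and the returned string could be passed to BeautifulSoup in the same way as the hand-copied files.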
Code
from bs4 import BeautifulSoup
import os


def main():
    path = './htmls/'
    # Create the output folders up front so the open() calls below do not fail.
    os.makedirs('./txtfile/', exist_ok=True)
    os.makedirs('./analysis/', exist_ok=True)
    for file in os.listdir(path):
        htmlfile = os.path.join(path, file)
        # Use a context manager so the file handle is closed after parsing.
        with open(htmlfile, 'r', encoding='utf-8') as html:
            soup = BeautifulSoup(html, 'lxml')

        # Extract the question stems.
        question = soup.find_all('div', class_='ibs-stem ibs-editor-text')
        ques = [i.text for i in question]

        # Extract the answer options.
        answer = soup.find_all('div', class_='ibs-table')
        ans = [i.text for i in answer]

        name = os.path.splitext(os.path.split(htmlfile)[1])[0]
        test = [name]
        # Combine each question with its options.
        # This assumes every question has exactly four options.
        for i in range(len(ques)):
            idx = i + 1
            test.append(str(idx) + ques[i])
            test.append(' ' + str(ans[i*4:i*4+4]))
        # 'w' instead of 'a' so that re-running the script does not
        # append a second copy of the same paper.
        with open(os.path.join('./txtfile/', name) + '.txt', 'w', encoding='utf-8') as f:
            f.write('\n'.join(test))

        # Extract the explanations.
        parser = soup.find_all('div', class_='ibs-explain ibs-mt16')
        par = [i.text for i in parser]
        par_list = [name]
        for i in range(len(par)):
            idx = i + 1
            par_list.append(str(idx) + par[i])
        with open(os.path.join('./analysis/', name) + '_analysis.txt', 'w', encoding='utf-8') as f:
            f.write('\n'.join(par_list))
        print(f'{name}, done')


if __name__ == '__main__':
    main()
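One thing to keep in mind about the find_all calls in the script above: when class_ is given a string containing several class names, BeautifulSoup compares it against the exact value of the class attribute, so a different order or an extra class on the div means no match. A CSS selector via select() is more tolerant. The small example below uses a made-up HTML snippet (not the real exam page) and html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup: same two classes, in different orders and with an extra class.
html = """
<div class="ibs-stem ibs-editor-text">Q1</div>
<div class="ibs-editor-text ibs-stem">Q2</div>
<div class="ibs-stem ibs-editor-text extra">Q3</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class attribute string.
exact = [d.get_text() for d in soup.find_all("div", class_="ibs-stem ibs-editor-text")]

# CSS selector: matches any div that carries both classes.
by_css = [d.get_text() for d in soup.select("div.ibs-stem.ibs-editor-text")]

print(exact)   # ['Q1']
print(by_css)  # ['Q1', 'Q2', 'Q3']
```

If the site ever reorders or adds classes on these divs, switching the script to select() would probably keep it working, though I have only needed the exact-match form so far.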