Python: read the required fields from a question-bank Excel file to generate JSON, then scrape Wenjuanxing (问卷星) and match the answers

Latest recommended article published on 2023-06-17 17:42:03

shiyu_mj

Category: Web scraping. Tags: python, json, programming languages

Article link: https://blog.csdn.net/qq_42972591/article/details/122330255

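This article shows how to use Python to read data from an Excel file and generate JSON, then scrape the Wenjuanxing site with a Selenium crawler, filter out the questions and options by analyzing the page source, and finally match them against the answers.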

Summary generated by CSDN using AI technology.
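The final matching step mentioned in the summary can be pictured roughly as follows. This is only a minimal sketch, not the author's code: the names `normalize`, `find_answer` and `bank` are hypothetical, and it assumes the question bank has already been loaded as a dict that maps question text to answer text.

```python
import re


def normalize(text):
    """Remove all whitespace so minor formatting differences do not block a match."""
    return re.sub(r"\s+", "", text)


def find_answer(bank, scraped_question):
    """Return the stored answer for a scraped question, or None if it is not in the bank."""
    lookup = {normalize(question): answer for question, answer in bank.items()}
    return lookup.get(normalize(scraped_question))


# Hypothetical question bank, standing in for the JSON built from the Excel file
bank = {"Which city is the capital of France?": "Paris;"}
print(find_answer(bank, "Which city is the capital\n of France?"))
```

Stripping whitespace before the lookup is one simple way to tolerate the stray line breaks that a scraped page can contain; a real page may need stricter or looser matching.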

1. Excel file
https://download.csdn.net/download/qq_42972591/74125316
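The script in this section relies on a fixed column layout in the sheet. Judging from its indexing, column 0 holds the question type, column 1 the question text, column 2 onward the options starting with A, and column 8 the answer. The sketch here illustrates that letter-to-column mapping on plain Python lists so it can be tried without the Excel file. The constants `ANSWER_COL` and `FIRST_OPTION_COL`, the function `resolve_answer` and the sample rows are hypothetical, and the sketch assumes that true/false answers contain no uppercase Latin letters.

```python
import re

ANSWER_COL = 8        # assumed index of the answer column
FIRST_OPTION_COL = 2  # assumed index of option A


def resolve_answer(row):
    """Turn an answer cell such as 'AC' into the matching option texts."""
    answer = row[ANSWER_COL]
    if re.search('[A-Z]', answer):
        # Choice question: each letter selects one option column
        return ''.join(
            row[ord(letter) - ord('A') + FIRST_OPTION_COL] + ';'
            for letter in answer
        )
    # True/false question: the cell already holds the answer text
    return answer


# Hypothetical rows: type, question, options A-F, answer
multi = ['多选', 'Which are primary colours?', 'red', 'green', 'blue', 'pink', '', '', 'AC']
judge = ['判断', 'The sky is blue.', '', '', '', '', '', '', '对']
print(resolve_answer(multi))  # red;blue;
print(resolve_answer(judge))  # 对
```

Writing the offset as `ord(letter) - ord('A') + FIRST_OPTION_COL` gives the same result as subtracting 63 from the character code, but makes the origin of the number visible.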

import json
import re

import pandas as pd

df = pd.read_excel('文化题库.xlsx', sheet_name='Sheet1')
k = '[A-Z]'  # an answer made of option letters marks a choice question
dic = {}
# Clear base.txt
with open('base.txt', 'w') as f:
    pass
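# The first row of the sheet was read as the column headers, so start from 1
for i in range(1, 161):
    line = df.iloc[i]
    # Column 8 (the answer) can be NaN, which would make list(...) fail.
    # NaN is the only value not equal to itself, so x != x detects it.
    # '题型' ("question type") marks a repeated header row.
    if line.iloc[0] == '题型' or line.iloc[8] != line.iloc[8]:
        continue
    answer = list(line.iloc[8])  # split a multiple-choice answer into letters
    answers = ''
    # Choice questions: the answer consists of option letters
    if re.search(k, line.iloc[8]):
        for it in answer:
            # ord('A') is 65; subtracting 63 gives the column of that option
            pos = ord(it) - 63
            answers += line.iloc[pos] + ';'
    else:
        answers = line.iloc[8]  # true/false questions: use the answer as-is
    line.iloc[1] = line.iloc[1].replace('\n', '')  # strip newlines from the question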
    key=str(i