Bulk-deleting a question bank uploaded to a website

A_manda

Tags: python, curl

Article link: https://blog.csdn.net/A_manda/article/details/107788699

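A certain quiz website supports bulk upload but not bulk deletion. Questions can only be removed one at a time, which would mean over a thousand manual clicks. WTF!

So I had to bring out Postman, curl, and my rusty Python.

Basic requirement: query the question list, then delete the questions one by one.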

I. Capturing the HTTP requests for query and delete in the browser

1. Query request

Parameters:

publicStatus=&guid=yh2d8f76c1-f732-41a2-8541-880cffbfd36f&page=1

*page is the page number

Response body:

<script>
...
</script>
<div>
    <ul class="person-dati-small-tabs clearfix">
    </ul>
    <div class="person-dati-title clearfix">
        <div class="tihao">1.</div>
        <div class="tigan">
         <input type="hidden" name="qGuid" value="c4f96a0b-88d7-42ab-ab34-dff35a4ba445" />
         <span class="tixing">（单选）</span>
         <a target="_blank" href="https://saishi.cnki.net/exam/Questions/Answer/c4f96a0b-88d7-42ab-ab34-dff35a4ba445"><span class="timu">57.以下哪个选项正确：</span></a>
        </div>
    </div>
</div>

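Data needed: the value attribute of the hidden qGuid input, for example c4f96a0b-88d7-42ab-ab34-dff35a4ba445

*This is the question ID, which the delete request requires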
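As a minimal sketch of pulling that ID out of the response body, the standard-library html.parser can collect every qGuid input. The class name QGuidParser and the trimmed sample markup are illustrative, not part of the site.

```python
from html.parser import HTMLParser


class QGuidParser(HTMLParser):
    """Collects the value of every <input name="qGuid"> in a page."""

    def __init__(self):
        super().__init__()
        self.guids = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == "qGuid":
            self.guids.append(attr_map.get("value"))


# Trimmed copy of the response body shown above.
sample = '''
<div class="tigan">
 <input type="hidden" name="qGuid" value="c4f96a0b-88d7-42ab-ab34-dff35a4ba445" />
 <span class="tixing">...</span>
</div>
'''

parser = QGuidParser()
parser.feed(sample)
print(parser.guids)  # ['c4f96a0b-88d7-42ab-ab34-dff35a4ba445']
```

A parser tolerates changes in attribute order or spacing that would break a plain string match.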

2. Delete request

Parameters:

questionGuid=c4f96a0b-88d7-42ab-ab34-dff35a4ba445
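Both parameter strings are ordinary form-encoded key/value pairs. A small sketch, using only the standard library, of building them from dictionaries rather than concatenating strings by hand (the variable names are illustrative):

```python
from urllib.parse import urlencode

# Query parameters for the list request; publicStatus is intentionally empty.
query_params = {
    "publicStatus": "",
    "guid": "yh2d8f76c1-f732-41a2-8541-880cffbfd36f",
    "page": 1,
}
# Form body for the delete request.
delete_params = {"questionGuid": "c4f96a0b-88d7-42ab-ab34-dff35a4ba445"}

query_string = urlencode(query_params)
delete_body = urlencode(delete_params)

print(query_string)
print(delete_body)
```

The two printed strings match the parameter lines captured from the browser, so only the page number and the question ID need to vary between requests.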

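II. Using Postman

Once querying and deleting a single question worked in Postman, I wanted to automate the process.

1. Batch query to collect the IDs

Here I used Postman's Runner, which can execute HTTP requests in batches.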

https://saishi.cnki.net/pcenter/ExamineeCenter/QuestionListPagePartial?publicStatus=&guid=yh2d8f76c1-f732-41a2-8541-880cffbfd36f&page={{page}}

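{{page}} is a user-defined variable whose values can come from a list in a data file. Make sure the Iterations setting in Postman matches the number of entries in the file.

Example file contents: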

page
1
2
3
4
5
6
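For more than a handful of pages, the data file can be generated instead of typed. A sketch, assuming a single-column CSV with a page header as in the example above; the helper name write_page_file is hypothetical.

```python
import csv
import io


def write_page_file(fileobj, last_page):
    """Write a one-column CSV: a 'page' header, then 1..last_page."""
    writer = csv.writer(fileobj)
    writer.writerow(["page"])
    for page in range(1, last_page + 1):
        writer.writerow([page])


# Demonstrate with an in-memory buffer instead of a real file.
buf = io.StringIO()
write_page_file(buf, 6)
print(buf.getvalue())
```

To produce a real file, pass open("pages.csv", "w", newline="") instead of the buffer; the file name is just an example.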

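Postman usage, part 5: using the Runner https://blog.csdn.net/csdnhxs/article/details/98476130

The batch query responses could not be saved to a file and had to be copied one at a time. Defeated, and back to square one.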

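III. Using curl

Older Windows 10 builds do not include the curl command (it has shipped with the system since version 1803), so it may need to be installed separately. curl also needs every header set by hand, which is less convenient than Postman.

After getting the response with curl, filter out the IDs with find: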

D:\doc>find "input" QuestionList.html

---------- QUESTIONLIST.HTML
          <input type="hidden" name="qGuid" value="c9325b27-da52-45fe-b335-dcab6b8a879a" />
          <input type="hidden" name="qGuid" value="c4f96a0b-88d7-42ab-ab34-dff35a4ba445" />
          <input type="hidden" name="qGuid" value="cb54e4f4-8bf6-48e0-bd8e-f5da274e882f" />
          <input type="hidden" name="qGuid" value="5ee50713-f9a2-44d6-8105-ec48a00ead4d" />
          <input type="hidden" name="qGuid" value="fff2e60a-55fd-4e7c-b3a0-95ccfa4a378e" />
          <input type="hidden" name="qGuid" value="e7236375-92af-48a7-9c0d-8054f56e19d8" />
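The find output still carries the surrounding markup. A rough Python equivalent that applies the same "input" filter and then keeps only the ID might look like this; filter_guids and the sample lines are illustrative.

```python
import re

# Captures the 36-character ID inside a hidden qGuid input.
QGUID_RE = re.compile(r'name="qGuid"\s+value="([0-9a-fA-F-]{36})"')


def filter_guids(lines):
    """Keep lines containing "input" (as find does), then pull out the IDs."""
    guids = []
    for line in lines:
        if "input" not in line:
            continue
        match = QGUID_RE.search(line)
        if match:
            guids.append(match.group(1))
    return guids


# Stand-in for the lines of a saved response page.
saved_page = [
    '<div class="tigan">',
    '  <input type="hidden" name="qGuid" value="c9325b27-da52-45fe-b335-dcab6b8a879a" />',
    '  <input type="hidden" name="qGuid" value="c4f96a0b-88d7-42ab-ab34-dff35a4ba445" />',
    '</div>',
]
for guid in filter_guids(saved_page):
    print(guid)
```

For a saved file, an open file object can be passed directly, since iterating over it yields lines; the encoding to use depends on how the response was saved.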

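IV. Using Python

Python has three commonly used HTTP packages: http.client, urllib, and requests. This time I used requests (it must be installed before first use).

Related topics:

1. HTTPS requests with requests

Summary of using the requests library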
https://blog.csdn.net/jojoy_tester/article/details/70545589

2. Setting up a Python development environment in IDEA

https://blog.csdn.net/qq_38188725/article/details/80623710

3. Regular expressions in Python

4. Slicing substrings from a Python string

https://blog.csdn.net/qingzhuyuxian/article/details/79882088

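s = '0123456789'
print(s[0:3])     # first through third characters: '012'
print(s[:])       # the whole string
print(s[6:])      # seventh character to the end: '6789'
print(s[:-3])     # from the start up to, but not including, the third-from-last character: '0123456'
print(s[2])       # third character: '2'
print(s[-1])      # last character: '9'
print(s[::-1])    # a reversed copy of the string: '9876543210'
print(s[-3:-1])   # third-from-last up to, but not including, the last character: '78'
print(s[-3:])     # third-from-last character to the end: '789'
print(s[:-5:-3])  # step -3 from the last character, stopping before index -5: '96'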
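The reverse slice with a step is easier to follow once the omitted and negative bounds are resolved. The built-in slice.indices does exactly that; the snippet below is only an illustration of how the indices are chosen.

```python
s = "0123456789"

# slice.indices(len) resolves omitted and negative bounds to concrete values.
start, stop, step = slice(None, -5, -3).indices(len(s))
print(start, stop, step)  # 9 5 -3

# The slice visits the same indices as range(start, stop, step).
picked = list(range(start, stop, step))
print(picked, s[:-5:-3])  # [9, 6] 96
```

With a negative step the start defaults to the last index, and -5 resolves to index 5, so only indices 9 and 6 are visited.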

import re

import requests

if __name__ == '__main__':
    count = 0
    # Only page 1 is processed here; widen the range to cover more pages.
    for i in range(1, 2):
        url = 'https://xxx/xxx?publicStatus=&guid=yh2d8f76c1-f732-41a2-8541-880cffbfd36f&page=' + str(i)
        # verify=False skips TLS certificate verification
        response = requests.post(url, verify=False)
        if response.status_code == 200:
            text = response.text
            # Match each hidden qGuid input up to the end of its 36-character ID
            keys = re.findall(r'<input type="hidden" name="qGuid" value="\S{36}', text)

            for k in keys:
                count += 1
                questionGuid = k[-36:]  # the last 36 characters are the question ID
                print("page=", i, ",count=", count, ",questionGuid=", questionGuid, ",url=", url)
                data = {"questionGuid": questionGuid}
                urlDel = "https://xxx/RemoveQuestion"
                respDel = requests.post(urlDel, data, verify=False)
                # Report the status of the delete request, not of the query
                print("delete status=", respDel.status_code, "url=", urlDel)
        else:
            print("status=", response.status_code, "url=", url)
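One thing to watch when extending the loop to several pages: if deleted questions disappear from the list right away (an assumption, not verified against the site), the remaining questions shift forward and walking pages 1, 2, 3 in order would skip some. A sketch that instead keeps re-reading page 1 until it is empty follows. fetch_page and delete_one are hypothetical callables that would wrap the two requests.post calls, and the in-memory store merely stands in for the site.

```python
import re

GUID_RE = re.compile(r'name="qGuid" value="([0-9a-f-]{36})"')


def delete_all(fetch_page, delete_one, max_rounds=1000):
    """Repeatedly read page 1 and delete every question found on it.

    fetch_page(page) -> HTML text and delete_one(guid) -> bool are
    hypothetical callables supplied by the caller.
    """
    deleted = []
    for _ in range(max_rounds):
        guids = GUID_RE.findall(fetch_page(1))
        if not guids:
            break  # the list is empty, nothing left to delete
        progressed = False
        for guid in guids:
            if delete_one(guid):
                deleted.append(guid)
                progressed = True
        if not progressed:
            break  # nothing could be deleted; avoid looping forever
    return deleted


# In-memory stand-in for the site: five questions, two per page.
store = [f"{n:08d}-0000-0000-0000-000000000000" for n in range(5)]


def fake_fetch(page):
    items = store[(page - 1) * 2 : page * 2]
    return "\n".join(
        f'<input type="hidden" name="qGuid" value="{g}" />' for g in items
    )


def fake_delete(guid):
    store.remove(guid)
    return True


result = delete_all(fake_fetch, fake_delete)
print(len(result), store)  # 5 []
```

Passing the network calls in as functions keeps the loop testable without touching the real site; max_rounds is a safety cap chosen arbitrarily.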