Scraping a school problem bank with Python: a hands-on Python crawler for fetching a website's problem bank

weixin_39710003

Published 2020-11-24 11:32:43


Tags: scraping a school problem bank with Python

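This article presents a Python crawler that scrapes problem-bank data from a given website. The program first sets the request headers, then opens a file to store the scraped data. The `get_info` function visits each problem page in turn, extracts the title, description, input, output, sample input, and sample output, and writes them to the file. To avoid being banned, the program waits 3 seconds between requests.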
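The flow described above (fetch each page, skip failed requests, pause between requests) can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: `crawl`, `fetch`, `handle`, and `sleep` are hypothetical names, and `fetch` is passed in as a parameter so the loop can be tried without touching a real site.

```python
import time
from typing import Callable, Iterable


def crawl(urls: Iterable[str],
          fetch: Callable[[str], object],
          handle: Callable[[str, str], None],
          delay: float = 3.0,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    `fetch(url)` must return an object with `status_code` and `text`
    attributes (as `requests.Response` does). Returns the number of
    pages that were fetched successfully.
    """
    ok = 0
    for n, url in enumerate(urls):
        if n > 0:
            sleep(delay)  # wait before every request except the first
        res = fetch(url)
        if res.status_code != 200:
            continue  # skip pages that are missing or blocked
        handle(url, res.text)  # hypothetical callback: parse and store the page
        ok += 1
    return ok
```

With `requests`, `fetch` could be `lambda u: requests.get(u, headers=headers)`. Keeping the delay inside the loop means every page is subject to it, whether or not the previous request succeeded.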


Scraping the problem bank of a * website

import requests
import re
import time
import html

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36'
                  ' (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'  # add a request header
}

# Open Text.txt at this path in append mode, creating it if it does not exist
f = open('/Volumes/SHARE/Python/GetAcmText/Text.txt', 'a+')

def get_info(url):
    global i
    i = i + 1
    print(i)  # progress counter, for monitoring
    res = requests.get(url, headers=headers)
    if res.status_code == 200:  # check that the page is reachable
        # The HTML tags around (.*?) in the two patterns below are missing from the source
        title = re.findall('(.*?)', res.content.decode('utf-8'), re.S)[1].strip()  # regex for the problem title
        describes = re.findall('(.*?)', res.content.decode('utf-8'), re.S)
        describe = describes[0].strip()
        tinput = describes[1].strip()
        toutput = describes[2].strip()
        einput = re.fi
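The extraction step in `get_info` relies on `re.findall` with non-greedy groups and the `re.S` flag. The sketch below shows that technique on a made-up page. The tag and class names (`<h2>`, `<div class="content">`, `<pre>`), the `SAMPLE` page, and the helper `parse_problem` are assumptions for illustration only, not the markup of the site the author crawled.

```python
import html
import re

# Hypothetical page layout, invented for this example
SAMPLE = """
<h2>Site name</h2>
<h2>1000: A+B Problem</h2>
<div class="content">Compute a &lt;plus&gt; b.</div>
<div class="content">Two integers a and b.</div>
<div class="content">Their sum.</div>
<pre>1 2</pre>
<pre>3</pre>
"""


def parse_problem(page: str) -> dict:
    """Pull the problem fields out of one page with non-greedy regexes."""
    # re.S lets '.' match newlines, so a field may span several lines.
    titles = re.findall(r'<h2>(.*?)</h2>', page, re.S)
    blocks = re.findall(r'<div class="content">(.*?)</div>', page, re.S)
    samples = re.findall(r'<pre>(.*?)</pre>', page, re.S)
    if len(titles) < 2 or len(blocks) < 3 or len(samples) < 2:
        raise ValueError('page does not have the expected layout')
    fields = {
        'title': titles[1],  # second match, like the [1] index in get_info
        'description': blocks[0],
        'input': blocks[1],
        'output': blocks[2],
        'sample_input': samples[0],
        'sample_output': samples[1],
    }
    # html.unescape turns entities such as &lt; back into plain characters.
    return {k: html.unescape(v).strip() for k, v in fields.items()}
```

Checking the match counts before indexing turns a page with an unexpected layout into a clear `ValueError` instead of an `IndexError` in the middle of a crawl.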
