https://blog.csdn.net/qq_16964363/article/details/79224776
This post is mainly based on the article linked above; I will take it down on request if it infringes.
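Codeforces (CF) recently launched a difficulty rating feature that assigns a numeric difficulty to every problem. Inspired by that author, I wrote a crawler to analyze the difficulty of each problem category. Here is the crawler code first: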
# -*- coding: utf-8 -*-
import json
import urllib.request
from bs4 import BeautifulSoup

# Accumulators for the per-category difficulty statistics
sum_difficulty = {}
avg_difficulty = {}
min_difficulty = {}
max_difficulty = {}
problems_count = {}

max_page = 48
# Note: range() excludes its upper bound, so this visits pages 1 .. max_page - 1
for i in range(1, max_page):
    print('parsing page %d' % i)
    url = 'http://codeforces.com/problemset/page/%s' % str(i)
    data = urllib.request.urlopen(url).read()  # send the request and read the response
    data = data.decode('UTF-8')
    soup = BeautifulSoup(data, 'html.parser')
    # Walk every row of the problems table
    for p in soup.find(class_='problems').find_all('tr'):
        tds = p.find_all('td')
        # Skip rows that do not have exactly 5 <td> cells
        if len(tds) != 5:
            continue
diffic