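The code is rough and my skills are limited.
I only recently discovered that the EMB reward for each problem on our school's OJ changes over time.
Sometimes I would get a problem Accepted and, snap, its reward had just dropped by 1 EMB.
I wanted to earn EMB but stick to the easier problems, so my lazy streak kicked in.
I spent a little while writing a crawler that goes through our school's OJ problems and saves each one's EMB, Solves, Submissions, Accepts and similar statistics.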
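A problem's difficulty depends roughly on two factors:
1. The per-user acceptance rate (users who solved it / users who tried it)
2. The number of users who solved it
(As for why I ignore the per-submission acceptance rate (accepted submissions / total submissions): I felt it carries relatively little weight {really, I was just lazy}.)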
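To make the two rates concrete, here is a small sketch that computes the per-user acceptance rate (the factor used above) next to the per-submission rate that was left out. The function names and the sample figures are hypothetical, chosen only for illustration.

```python
def per_user_rate(user_solves: int, user_tries: int) -> float:
    """Fraction of the users who tried a problem that eventually solved it."""
    return user_solves / user_tries if user_tries else 0.0


def per_submission_rate(sub_accepted: int, sub_tries: int) -> float:
    """Fraction of all submissions that were accepted."""
    return sub_accepted / sub_tries if sub_tries else 0.0


# Hypothetical problem: 80 of 100 users solved it,
# and 120 of its 400 submissions were accepted.
print(per_user_rate(80, 100))         # 0.8
print(per_submission_rate(120, 400))  # 0.3
```

The two rates can differ a lot, because a user who eventually passes may have submitted several wrong answers first.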
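What the program does:
For every problem number in a given range, if the problem exists, it saves the problem's EMB, Solves, Submissions and Accepts to the file ProblemSet.csv in the current directory.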
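Saving to CSV needs only the standard csv module. A minimal sketch, assuming the statistics have already been scraped into tuples (the function name save_rows and its arguments are hypothetical): it writes the header once and flushes after every row, so rows already written are handed to the operating system even if the crawl stops halfway.

```python
import csv
from typing import Iterable, Sequence


def save_rows(path: str, header: Sequence[str], rows: Iterable[Sequence]) -> int:
    """Write header + rows to path and return the number of data rows written."""
    count = 0
    # newline="" lets the csv module manage line endings itself;
    # utf-8-sig adds a BOM so spreadsheet programs recognize UTF-8 (e.g. Chinese titles).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            f.flush()  # hand each row to the OS right away
            count += 1
    return count
```

For example, save_rows("ProblemSet.csv", ["No.", "Title", "EMB"], [(1001, "A+B", 1.0)]) would write one header line and one data row and return 1.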
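It provides a pair of filters:
1. myfilter_Less_than: a two-element list of upper bounds
2. myfilter_Greater_than: a two-element list of lower bounds
A problem is saved to the CSV file only if it satisfies both conditions:
a. myfilter_Greater_than[0] ≤ EMB ≤ myfilter_Less_than[0]
b. myfilter_Greater_than[1] ≤ hardness ≤ myfilter_Less_than[1]
The program is rough, especially the calculation of hardness and efficiency, which is basically made up: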
    self.hardness = 100 - 100*float(user_solves+1)/(user_tries+1) - float(user_solves)/25
    self.efficiency = reward * 100 - self.hardness*5
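The filter conditions and the two formulas can be tried out on their own. The sketch below restates them as plain functions; the names hardness, efficiency and passes_filter are hypothetical, the formulas are copied from above, and the bounds are inclusive, matching the comparisons in GetInfo.

```python
INF = 10_000_000


def hardness(user_solves: int, user_tries: int) -> float:
    # A lower per-user pass rate and fewer solvers both raise the score.
    return 100 - 100 * float(user_solves + 1) / (user_tries + 1) - float(user_solves) / 25


def efficiency(reward: float, hard: float) -> float:
    # Reward scaled by 100, minus 5 points per point of hardness.
    return reward * 100 - hard * 5


def passes_filter(reward: float, hard: float,
                  less_than=(INF, INF), greater_than=(-INF, -INF)) -> bool:
    # Index 0 bounds the reward (EMB), index 1 bounds the hardness; both inclusive.
    return (greater_than[0] <= reward <= less_than[0]
            and greater_than[1] <= hard <= less_than[1])


# Hypothetical problem: 75 of 151 users solved it, reward 3.0 EMB.
h = hardness(75, 151)
print(h)                  # 47.0
print(efficiency(3.0, h))  # 65.0
print(passes_filter(3.0, h, less_than=(INF, 50), greater_than=(2, -INF)))  # True
```

Here 100 * 76 / 152 = 50 and 75 / 25 = 3, so the hardness is 100 - 50 - 3 = 47. Lowering the hardness upper bound to 40 would filter this problem out.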
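I would welcome a better way to model difficulty. The code follows; again, my skills are limited.
Notes:
1. Run it only while your network connection is stable.
2. It needs Python with the third-party packages requests and BeautifulSoup (pip install requests beautifulsoup4); csv is part of the standard library.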
```python
import csv
import re

import requests
from bs4 import BeautifulSoup


class Problem:
    def __init__(self, reward, user_solves, user_tries, sub_accepted, sub_tries, info):
        self.reward = reward
        self.userSolves = user_solves
        self.userTries = user_tries
        self.subAccepted = sub_accepted
        self.subTries = sub_tries
        # The title is assumed to start with a 4-digit problem number plus one separator character
        self.Info = info[5:]
        self.num = int(info[:4])
        # Room for improvement: a rather crude calculation
        self.hardness = 100 - 100 * float(user_solves + 1) / (user_tries + 1) - float(user_solves) / 25
        self.efficiency = reward * 100 - self.hardness * 5


def GetNum(text):
    """Return the first two integers in text, ignoring thousands-separator commas."""
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)]
    return numbers[0], numbers[1]


def GetInfo(url, Less_than, Greater_than):
    """Return a Problem if the page exists and passes both filters, otherwise None."""
    try:
        data = requests.get(url, timeout=10)
    except requests.RequestException:
        print("request failed:", url)
        return None
    soup = BeautifulSoup(data.text, "html.parser")
    struct = soup.find("div", class_="description")
    title = soup.find("h1", class_="ui header")
    if struct is None or title is None:
        return None  # no such problem
    EMB = float(struct.contents[5].b.text)
    if EMB < Greater_than[0] or EMB > Less_than[0]:
        return None
    solves, tries = GetNum(struct.contents[1].text)
    accepted, submissions = GetNum(struct.contents[3].text)
    tmpProblem = Problem(EMB, solves, tries, accepted, submissions, title.text)
    if tmpProblem.hardness < Greater_than[1] or tmpProblem.hardness > Less_than[1]:
        return None
    return tmpProblem


inf = 10000000
baseLink = "https://acm.ecnu.edu.cn/problem/"
# Index 0 limits the reward, index 1 limits the hardness:
# Greater_than[0] <= reward <= Less_than[0], Greater_than[1] <= hardness <= Less_than[1]
myfilter_Less_than = [inf, inf]
myfilter_Greater_than = [-inf, -inf]

# utf-8-sig lets spreadsheet programs recognize the UTF-8 problem titles
with open("ProblemSet.csv", "w", newline="", encoding="utf-8-sig") as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(["No.", "Title", "Reward (EMB)", "Hardness", "Efficiency",
                     "Users solved", "Users tried", "Accepted", "Submissions"])
    for i in range(1000, 3500):
        if i == 3269:  # problem 3269 triggers an unidentified bug, so skip it
            continue
        key = GetInfo(baseLink + str(i), myfilter_Less_than, myfilter_Greater_than)
        if key is not None:
            print(i)
            writer.writerow([key.num, key.Info, key.reward, round(key.hardness, 2),
                             round(key.efficiency, 2), key.userSolves, key.userTries,
                             key.subAccepted, key.subTries])
        if i % 100 == 0:
            csvFile.flush()  # save progress periodically instead of reopening the file
```
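Since the goal is to find cheap EMB, a natural next step is to rank the rows of ProblemSet.csv by the efficiency column. A minimal sketch, assuming the column order written by the script above (efficiency is the fifth column, index 4) and a single header row; the function name top_by_efficiency is hypothetical.

```python
import csv
from typing import List


def top_by_efficiency(path: str, n: int = 10, encoding: str = "utf-8-sig") -> List[List[str]]:
    """Return the n data rows with the highest efficiency (column index 4)."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        rows = [row for row in reader if len(row) > 4]
    # Efficiency is stored as text in the CSV, so compare it as a float.
    rows.sort(key=lambda row: float(row[4]), reverse=True)
    return rows[:n]
```

For example, `for row in top_by_efficiency("ProblemSet.csv", 5): print(row[0], row[1], row[4])` lists the five most "profitable" problems. If the file was written with the system default encoding instead (e.g. GBK on Chinese-language Windows), pass that encoding explicitly.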