Building an automatic HDU AC machine with a Python crawler
Every hard-up CS undergrad's programming journey starts with hdu, and once the practice is mandatory it stops being fun, so you have to find your own amusement. A few days after I started learning web scraping I had an idea: could I write something that submits solutions automatically, just like the Zhihuishu auto-answer scripts, so I would never again have to worry about being scolded for solving too few hdu problems?
The first step, of course, is learning to crawl the solution code from CSDN, using regular expressions and BeautifulSoup to pull the code out of a CSDN page.
import re
import time
import random
import urllib.request
import urllib.error
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

def search_code(url):  # given the URL of a solution page, return the code text
    headers = {
        xxxxxxxx  # your request headers (User-Agent and so on) go here
    }
    request = urllib.request.Request(url, headers=headers)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode('utf-8')
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    item = ''
    soup = BeautifulSoup(html, 'html.parser')
    # newer CSDN posts wrap code in <span class="cpp">
    for item in soup.find_all('span', class_="cpp"):
        item = str(item)
        item = re.sub("<.*?>", "", item)   # strip the HTML tags
        item = re.sub("&lt;", "<", item)   # decode escaped angle brackets back into real code
        item = re.sub("&gt;", ">", item)
        item = translate_code(item)
        if item != '':
            return item
    # fall back to plain <code> blocks
    for item in soup.find_all('code'):
        item = str(item)
        item = re.sub("<.*?>", "", item)
        item = re.sub("&lt;", "<", item)
        item = re.sub("&gt;", ">", item)
        item = translate_code(item)
        return item
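The translate_code helper isn't shown above; a minimal sketch of one way it might work, assuming all it does is unescape the leftover HTML entities and throw away text that doesn't look like a full program (the checks here are only an illustration, not the real implementation):

import html

def translate_code(text):
    text = html.unescape(text)  # turn &amp;, &quot; and friends back into characters
    # keep only snippets that look like complete C/C++ or Java programs
    if '#include' in text or 'import java' in text or 'int main' in text:
        return text
    return ''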
This only fetches the code from a single page; we still need to search across many result pages, the goal being to collect more URLs of solution write-ups.
https://so.csdn.net/so/search/s.do?q=hdu1100&t=blog&u=
You can see that changing only the part after q= is all it takes to search for a different problem's solutions.
def search_answer(tihao):  # given a problem id, collect solution links and try them one by one
    url = 'https://so.csdn.net/so/search/all?q=hdu'+str(tihao)+'&t=all&p=1&s=0&tm=0&lv=-1&ft=0&l=&u='
It seems CSDN changed its page layout recently, so my earlier code stopped working and I had to switch to webdriver to drive a real browser instead.
    driver = webdriver.Chrome(r'C:\Program Files\Google\Chrome\Application\chromedriver.exe')
    driver.get(url)
    time.sleep(3)  # give the search results time to render
    html = driver.page_source
    link = []
    soup = BeautifulSoup(html, 'html.parser')
    for item in soup.find_all('div', class_="list-item"):
        item = str(item)
        link1 = re.findall(findpic, item)  # extract the CSDN article URL
        if len(link1) > 0:
            link.append(link1[0])
The earlier version of this code:
    url = 'https://so.csdn.net/so/search/s.do?t=all&s=&tm=&v=&l=&lv=&u=&q=hdu' + str(tihao)
    request = urllib.request.Request(url, headers=headers)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode('utf-8')
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    link = []
    soup = BeautifulSoup(html, 'html.parser')
    for item in soup.find_all('div', class_="container-list container-other-list active"):
        item = str(item)
        link = re.findall(findpic, item)  # extract the CSDN article URLs
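Both versions rely on a regex called findpic that isn't defined in the snippets above; a minimal sketch, assuming it only has to capture the href of each blog link (the exact pattern is a guess and needs to match the real markup):

findpic = re.compile(r'href="(https://blog\.csdn\.net/.*?)"')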
Finally, a for loop (still inside search_answer) walks through the collected links, pulls the code from each one, and submits it:
    for i in range(0, len(link)):  # try the solution URLs one by one
        it = str(link[i])
        code = search_code(it)  # fetch the code from this page
        submit(tihao, code, i + 1)
        if query_result(tihao):
            break
        else:
            print(str(tihao) + ' failed')
        time.sleep(random.randint(1, 3))
        if i > 5:  # give up after a handful of attempts
            break
Of course there is still the submit part. The way I understand it, you first simulate logging in:
session = requests.Session()
session.post(url, data=data, headers=headers)  # url and data here are the login endpoint and the login form fields
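For reference, a minimal sketch of that login request, assuming the endpoint and field names match what a DevTools capture of the HDU login form shows (fill in your own account and verify against your own capture):

login_url = 'http://acm.hdu.edu.cn/userloginex.php?action=login'
data = {
    'username': 'yourID',
    'userpass': 'yourPassword',
    'login': 'Sign In'
}
session = requests.Session()
session.post(login_url, data=data, headers=headers)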
After logging in, the code is POSTed to hdu. The submit URL can be obtained by capturing the request in Chrome's DevTools, and the form fields being sent can be read off from the same capture.
if daima.find('import') != -1:  # looks like Java code
    data = {
        'check': '0',
        'problemid': str(tihao),
        'language': str(5),  # 5 is Java in HDU's submit form
        'usercode': daima
    }
else:
    data = {
        'check': '0',
        'problemid': str(tihao),
        'language': str(0),  # 0 is G++
        'usercode': daima
    }
r = session.post(url, data=data, headers=headers)  # url here is the submit endpoint from the capture
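Putting the pieces together, the submit(tihao, code, i) used in the loop above can be written roughly as below; the submit endpoint is an assumption based on the captured request and may need adjusting:

def submit(tihao, daima, attempt):
    submit_url = 'http://acm.hdu.edu.cn/submit.php?action=submit'  # assumed endpoint, check your own capture
    lang = 5 if daima.find('import') != -1 else 0  # Java if the code imports packages, otherwise G++
    data = {
        'check': '0',
        'problemid': str(tihao),
        'language': str(lang),
        'usercode': daima
    }
    r = session.post(submit_url, data=data, headers=headers)  # reuses the logged-in session
    print('attempt ' + str(attempt) + ' for ' + str(tihao) + ': HTTP ' + str(r.status_code))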
Once it's submitted, we obviously need to check whether it got AC.
def query_result(pid):  # check whether "yourID" already has an Accepted run for this problem
    url = 'http://acm.hdu.edu.cn/status.php?first=&pid=' + str(pid) + '&user=' + "yourID" + '&lang=0&status=0'
    headers = {
        xxxxxxxxxxxxxxxxxxx  # your request headers go here
    }
    html = requests.get(url, headers=headers)
    # HDU renders an Accepted verdict in red, so any red cell in this filtered status list means AC
    pattern_query = r'<td><font color=red>(.*?)</font>'
    query_result = re.findall(pattern_query, html.text)
    if len(query_result) > 0:
        return True
    else:
        return False
Finally, a little driver function and the program more or less runs:
def start():
    for i in range(1200, 1500):  # sweep a range of problem ids
        if query_result(i):
            print(str(i) + ' already AC')
        else:
            search_answer(i)
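To run the whole thing end to end, a standard entry point does the trick:

if __name__ == '__main__':
    start()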
The code is pretty ugly and there are plenty of problems with it, but it just about runs; I'll come back and improve it when I have time.