Scraping USTC EPC Courses

眕眕

License: CC 4.0 BY-SA

Tags: Python web scraping

Article link: https://blog.csdn.net/qq_28491207/article/details/84261732

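At USTC, the Daily Communication English and Academic Communication English courses each require, on top of 20 class hours of classroom study, another 20 class hours of practice sessions at the English Practice Center (EPC) before the credits are granted. EPC sessions are rather hard to book, so here I use Python to scrape the EPC course list and pick up seats that free up. The script refreshes the EPC website once a minute and sends me an email whenever someone drops a session.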

Analyzing the EPC website

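On the EPC home page, logging in requires a CAPTCHA, so my first idea was to simulate a logged-in session with cookies. The procedure was as follows:
After logging in to EPC, press F12 to open the developer tools, switch to the Network tab, tick "Preserve log", and click "Situational Dialogue". Locate the right URL in the request list and copy its request headers to use as the headers in Python. This does work for scraping from Python, but it has a drawback: every run requires logging in through the browser first and copying fresh cookies. Later experiments showed that logging in to EPC from Python does not require the CAPTCHA at all, so I use a different approach here.
[Figure: finding the URL]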
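For reference, the cookie-copying approach described above might be sketched as follows. This is a minimal sketch, not the code I ended up using: the helper name cookie_header_to_dict and the cookie names in the sample string are made up for illustration, and the real EPC cookies will look different.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(raw_cookie):
    """Parse the value of a browser 'Cookie' request header into a dict."""
    jar = SimpleCookie()
    jar.load(raw_cookie)
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical cookie string copied from the Network panel (names are invented).
raw = "ASPSESSIONIDABCDEFGH=KLMNOPQRSTUVWXYZ; querytype=all"
cookies = cookie_header_to_dict(raw)
print(cookies)

# With requests installed, the dict can be sent along with a request:
#   requests.get('http://epc.ustc.edu.cn/m_practice.asp?second_id=2001', cookies=cookies)
```

The cookies expire with the browser session, which is why this route needs a fresh copy on every run.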
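Using the same method, find the request whose method is "POST" in the Network tab; that is the login URL. I use requests.Session() here so that the cookies are kept across requests.
[Figure: the login URL]
The full code is as follows: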

import smtplib
import time

import requests
from bs4 import BeautifulSoup as bs  # BeautifulSoup parses the pages
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

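MAX = 13   # week limit: only courses in weeks below MAX trigger an email (the largest week is 21)
INI = 125  # sentinel meaning "request failed"; the value is arbitrary
session = requests.Session()  # keeps the login cookies for later requests

# Log in to EPC
url_login = 'http://epc.ustc.edu.cn/n_left.asp'
data = {
    'submit_type': 'user_login',
    'name': 'xxxxxx',  # your login name
    'pass': 'xxxxxx',  # your password
    'user_type': '2',
    'Submit': 'LOG IN'
}
resp = session.post(url=url_login, data=data)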

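# Parse one course page and return a list describing the first course listed:
# [week number, weekday, then each text fragment of the time cell]
def getInfo(url):
    resp = session.get(url)
    resp.encoding = resp.apparent_encoding
    # print(resp.text)
    soup = bs(resp.text, 'html.parser')
    tds = soup.select('td[align="center"]')
    # Layout of tds as observed on the page:
    #   tds[0]  "show bookable only"
    #   tds[1]  booking unit
    #   tds[2]  week number (header)
    #   tds[3]  weekday (header)
    #   tds[4]  teacher (header)
    #   tds[5]  class hours (header)
    #   tds[6]  class time (header)
    #   tds[7]  classroom (header) ...
    #   tds[14] week of the first course; characters 1-2 are read as the number
    #   tds[15] weekday
    #   tds[16] teacher
    #   tds[18] time (several text fragments)
    return [int(tds[14].string[1:3]), tds[15].string] + list(tds[18].strings)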
def getEPC():
    # Returns a dict:
    #   key:   course name, or INI on failure
    #   value: the list returned by getInfo, or [INI] on failure
    try:
        url1 = 'http://epc.ustc.edu.cn/m_practice.asp?second_id=2001'  # Situational Dialogue
        url2 = 'http://epc.ustc.edu.cn/m_practice.asp?second_id=2002'  # Topical Discussion
        url3 = 'http://epc.ustc.edu.cn/m_practice.asp?second_id=2003'  # Debate (defined but not monitored below)
        url4 = 'http://epc.ustc.edu.cn/m_practice.asp?second_id=2004'  # Drama
        url7 = 'http://epc.ustc.edu.cn/m_practice.asp?second_id=2007'  # Pronunciation Practice

        info = {}
        info['Situational Dialogue'] = getInfo(url1)
        info['Topical Discussion'] = getInfo(url2)
        info['Drama'] = getInfo(url4)
        info['Pronunciation Practice'] = getInfo(url7)
        return info
    except Exception:
        # Any network or parsing error is reported through the sentinel
        return {INI: [INI]}
# Send the notification email
def SendMail(text):
    subject = 'EPC Crawling'
    sender = 'xxxxxx'
    receiver = ['xxxxxx', 'xxxxxx']

    msg = MIMEMultipart('mixed')
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(receiver)  # address lists in headers are comma-separated
    text = MIMEText(text, 'plain', 'utf-8')
    msg.attach(text)

    smtp = smtplib.SMTP()
    smtp.connect('xxxxxx')  # SMTP server of the mail provider
    username = 'xxxxxx'
    password = 'xxxxxx'
    smtp.login(username, password)
    smtp.sendmail(sender, receiver, msg.as_string())
    smtp.quit()
# Main program
if __name__ == "__main__":
    while True:
        status = True
        info = getEPC()
        print(time.ctime(), ':')
        for key, value in info.items():
            print('{}:{}'.format(key, value))
        print('\n')
        for key, value in info.items():
            if value[0] == INI:
                # The request failed: stop so that the wrapper script can restart us
                text = "EPC crawling stopped."
                print(text)
                # SendMail(text)
                status = False
                break
            if value[0] < MAX:  # a more elaborate filtering rule could go here
                text = 'There is a course of {} in week{},{},{},{}.'.format(key, value[0], value[1], value[2], value[3])
                print(text)
                SendMail(text)
        if not status:
            break
        time.sleep(60)  # change the refresh interval here

The email looks like this:

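Because my mailbox is linked to WeChat, I am notified almost as soon as a course is scraped and can then decide whether to book it.
Because the USTC campus network is not very stable, I use a separate script that runs the code above and starts it again whenever it exits: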

import os
cmd = 'py epc.py'
while True:
    os.system(cmd)  # a 5-second pause between attempts could be added here
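The 5-second pause mentioned in the comment could be added along the lines of the sketch below. The function name run_with_restart and its parameters are my own illustration and not part of the original script; runner and sleep are injectable only so that the loop can be tried out without actually launching anything.

```python
import subprocess
import time


def run_with_restart(cmd, delay=5, max_restarts=None,
                     runner=subprocess.call, sleep=time.sleep):
    """Run cmd again and again, waiting `delay` seconds after each run.

    max_restarts=None loops forever; a number limits the total runs.
    Returns how many times cmd was run.
    """
    runs = 0
    while max_restarts is None or runs < max_restarts:
        runner(cmd)   # blocks until the scraper exits (e.g. after a network error)
        runs += 1
        sleep(delay)  # avoid hammering the site while the network is down
    return runs


# Usage with the same command as the script above (loops forever):
#   run_with_restart(['py', 'epc.py'])
```

Passing the command as a list to subprocess.call avoids going through the shell, which is the main difference from os.system.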

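Before using Python, I managed to book only 8 EPC sessions in two months; with Python, I completed 12 EPC sessions in two weeks. Technology changes life.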

Conclusion

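Now that I have completed my EPC hours, this code can be made public. Two things remain unsatisfying. First, the filtering rule is simplistic, which produces a pile of spam emails; a better rule could cut the spam and also allow targeted queries for whether a given type of course has an opening in a given time slot. Second, the script cannot book courses automatically: after receiving a notification I still have to log in to the website and book the course myself, which is a hassle. I leave these improvements to others.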
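As a starting point for the first improvement, a stricter rule plus a "notify only once" check might look like the sketch below. It assumes the info dict has the shape returned by getEPC, with the week number at index 0 and the weekday at index 1. The names wanted and new_matches, and the sample weekday and time strings, are invented for illustration; the real page text may differ.

```python
def wanted(course, value, max_week=13, weekdays=None, course_types=None):
    """Return True if a scraped course passes every configured rule."""
    week, weekday = value[0], value[1]
    if week >= max_week:
        return False
    if weekdays is not None and weekday not in weekdays:
        return False
    if course_types is not None and course not in course_types:
        return False
    return True


def new_matches(info, seen, **rules):
    """Return the courses that pass the rules and have not been reported yet.

    `seen` is a set that persists between polling rounds; every course
    returned here is added to it, so the same opening is mailed only once.
    """
    fresh = {}
    for course, value in info.items():
        key = (course, tuple(value))
        if wanted(course, value, **rules) and key not in seen:
            seen.add(key)
            fresh[course] = value
    return fresh


# Invented sample data in the shape produced by getEPC()
seen = set()
info = {
    'Drama': [12, 'Wed', '2018/11/21', '19:00-20:40'],
    'Topical Discussion': [15, 'Fri', '2018/12/14', '14:30-16:10'],
}
print(new_matches(info, seen, weekdays={'Wed'}))  # first round: Drama only
print(new_matches(info, seen, weekdays={'Wed'}))  # second round: nothing new
```

In the main loop, SendMail would then be called only for the entries returned by new_matches instead of for every course below MAX.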