Python Timetable Query System: Scraping the Course Timetables of Guangxi University of Science and Technology with Python

This article describes a Python crawler that scrapes the course timetables from the academic affairs system of Guangxi University of Science and Technology and stores them in a local MySQL database. The system avoids re-scraping duplicates and fetches the class codes and semester data, so students can conveniently query timetables through a web front end.

Because the school's academic affairs system makes timetable queries cumbersome, its server goes down frequently, and it is poorly suited to mobile use, I wrote a Python crawler to scrape all course timetables and host them on my own server. This article is for learning purposes only.

Features

If the program is interrupted and run again, it will not scrape duplicate timetables.

Data is stored in a MySQL database, making the timetables easy to manage.

Note

Please do not copy and paste the code from this article, because Python is very strict about indentation.

Please download the source code from my GitHub (the class timetable database file is included):

Web-based timetable query

Environment:

Windows 10 Pro

Python 3.5.1

WampServer 2.5

Installing WampServer

Storing the timetables requires a MySQL database, so download WampServer.

Go to the website, scroll down, and find the section shown in the figure.

Choose the 32-bit or 64-bit installer to match your system; if you are not sure which your system is, the 32-bit version will work.

After installation, run it.

Importing the database tables

After starting WampServer, left-click the WampServer icon in the system tray at the bottom-right of the desktop (as shown below), then click phpMyAdmin to open the web-based database management page.

Once in the web interface, click New at the top left and create a database named python.

Configure it as shown in the figure.

Select the SQL file and import it into the database.
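If you do not have the SQL file at hand, the scripts in this article expect four tables. The following is only a minimal sketch of their layout, inferred from the INSERT statements used later; the column types are my assumption, and the authoritative schema is the database file in the GitHub repository. (It needs PyMySQL, installed in the Python setup below.)

Python

# Minimal schema sketch inferred from the INSERT statements in this article.
# Column types are assumptions; prefer the SQL file shipped with the repository.
import pymysql

connection = pymysql.connect(host='localhost', user='root', password='', db='python')
with connection.cursor() as cursor:
    cursor.execute("CREATE TABLE IF NOT EXISTS `semester` (`semester` VARCHAR(16))")
    cursor.execute("CREATE TABLE IF NOT EXISTS `class` ("
                   "`className` VARCHAR(64), `classCode` VARCHAR(16), `grade` VARCHAR(8))")
    cursor.execute("CREATE TABLE IF NOT EXISTS `timetable` ("
                   "`classCode` VARCHAR(16), `semester` VARCHAR(16), `section` TEXT, "
                   "`one` TEXT, `two` TEXT, `three` TEXT, `four` TEXT, "
                   "`five` TEXT, `six` TEXT, `seven` TEXT)")
    cursor.execute("CREATE TABLE IF NOT EXISTS `timetablenote` ("
                   "`classCode` VARCHAR(16), `semester` VARCHAR(16), `note` TEXT)")
connection.commit()
connection.close()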

Installing Python

Click Download Python 3.5.1 to start the download.

During installation, remember to check Add python.exe to Path.

Installing Python packages

After installing Python, install the following packages.

Beautiful Soup HTML parser

Open a CMD command prompt as administrator and run:

pip install beautifulsoup4

PyMySQL database connector

Open a CMD command prompt as administrator and run:

pip install PyMySQL
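To confirm both packages installed correctly, you can run a quick check like the sketch below. It assumes the empty-password root account that WampServer creates by default and the python database created above.

Python

# Sanity check: import both packages and open a connection to the local database.
import bs4
import pymysql

connection = pymysql.connect(host='localhost', user='root', password='', db='python')
print("bs4", bs4.__version__, "- MySQL connection OK")
connection.close()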

How it works

Analyzing the timetable URL of the academic affairs system

The question mark in the tail of the URL *** .aspx?xsbh=14040801&xq=15-16-2 shows that the page requests its data via GET. The first parameter is xsbh = 14040801 and the second is xq = 15-16-2. xq is clearly the pinyin initialism for semester (xueqi), so xsbh must be the class code. We therefore first need to collect all class codes and semesters, concatenate them into timetable URLs, and iterate over those URLs to scrape every timetable: http://202.193.177.7:8081/(S(oaml0r55uwfphs55ssjh43mh))/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=<class code>&xq=<semester>
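As a minimal sketch of this concatenation (note that the (S(...)) session segment in the path changes between sessions, so the base URL here is illustrative only):

Python

# Build a timetable query URL from a class code and a semester string.
# The (S(...)) session segment varies per visit; this base URL is illustrative.
BASE = 'http://202.193.177.7:8081/(S(oaml0r55uwfphs55ssjh43mh))/web_jxrw/cx_kb_bjkb_bj.aspx'

def timetable_url(class_code, semester):
    return BASE + '?xsbh=' + class_code + '&xq=' + semester  # the two GET parameters

print(timetable_url('14040801', '15-16-2'))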

Scraping all semesters

Open the link and view the page source: every semester is held in an option tag under the select tag whose ID is DDxq (as shown in the figure),

which produces the markup shown below.

Since an id attribute is unique in front-end development, this node can be selected with Beautiful Soup as soup.find(id="DDxq").find_all("option"),

which is all the beautifulsoup4 package needs to extract every semester.

Python code to scrape all semesters

spiderForSemester.py

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(semester):
    # Insert one semester string into the `semester` table.
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `semester` (`semester`) VALUES (%s)"
            cursor.execute(sql, (semester,))  # trailing comma: args must be a tuple
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # Every semester lives in an <option> under <select id="DDxq">.
    arr = soup.find(id="DDxq").find_all("option")
    for x in arr:
        ch = x.get_text()
        print(ch)
        sqlLink(str(ch))
else:
    print("failed")


Scraping all classes and class codes

Likewise, view the source of the academic affairs system's timetable query page: all classes sit under the select tag whose ID is Cxbj_all1_bjlist1 (as shown in the figure), and can be scraped directly with Beautiful Soup.

In the figure, the value attribute of each option tag is the class code and its text content is the class name. Again using Beautiful Soup, the selector is: soup.find(id="Cxbj_all1_bjlist1").find_all("option")

Python code to scrape all classes

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(clssName, classCode, grade):
    # Insert one class record (name, code, enrollment year) into the `class` table.
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `class` (`className`,`classCode`,`grade`) VALUES (%s,%s,%s)"
            cursor.execute(sql, (clssName, classCode, grade))
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # Each <option> under <select id="Cxbj_all1_bjlist1"> holds one class:
    # the value attribute is the class code, the text is the class name.
    arr = soup.find(id="Cxbj_all1_bjlist1").find_all("option")
    for x in arr:
        ch = x.get_text()
        classCode = x['value']
        # The first two digits of the class code give the enrollment year.
        last = x['value'][0] + x['value'][1]
        if last == "98":
            grade = "19" + last  # 1998 intake
        else:
            grade = "20" + last  # 2000 onwards
        print(classCode)
        sqlLink(str(ch), classCode, grade)
else:
    print("failed")


Scraping all timetables

With the class list and semester data in hand, we can iterate over both lists, concatenate each pair into a timetable query URL, and extract the courses from every page.

By the same reasoning, inside the table tag whose id is GVkb, each day's courses are placed in td tags.

The Python selector is soup.find(id="GVkb").find_all("td")

Python code to scrape all timetables

This step is split into two Python files. Name them exactly as noted: the first is spider.py, the second is spider_classTable.py, and they must be run from the same directory.

First file: spider.py

Python

import pymysql.cursors
import spider_classTable

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

try:
    ### get all semesters
    with connection.cursor() as cursor:
        sql = "SELECT * FROM `semester`"
        cursor.execute(sql)
        resultSemester = cursor.fetchall()
    ### get all class codes
    with connection.cursor() as cursor:
        sql = "SELECT `classCode` FROM `class`"
        cursor.execute(sql)
        resultClassCode = cursor.fetchall()
finally:
    # connection.close()
    i = 0

for x in resultClassCode:
    classCode = x["classCode"]
    for y in resultSemester:
        semester = y["semester"]
        # First two digits of the class code = enrollment year;
        # first two digits of the semester = the year that semester starts.
        gradeClassCode = int(classCode[0] + classCode[1])
        gradeSemester = int(semester[0] + semester[1])
        try:
            with connection.cursor() as cursor:
                # Check whether this timetable was already scraped
                # (parameterized instead of string concatenation).
                sql = "SELECT `classCode`,`semester` FROM `timetable` WHERE classCode=%s AND semester=%s"
                cursor.execute(sql, (classCode, semester))
                resultRepeat = cursor.fetchone()
        finally:
            # connection.close()
            pass
        # Only request semesters inside the class's four years of study.
        if (gradeSemester >= gradeClassCode) and (gradeSemester <= gradeClassCode + 3):
            if resultRepeat is None:
                arr = [classCode, semester]
                print(arr)
                spider_classTable.insertTameTable(classCode, semester)
            else:
                print("The timetable already exists")
        i += 1
        print(i, "Processing...")


Second file: spider_classTable.py

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

def openLink(response):
    # Unused helper: returns 0 if the URL opens, 1 otherwise.
    # (Fixed from the Python 2 urllib.urlopen call and invalid "except e:" syntax.)
    try:
        urllib.request.urlopen(response)
        return 0
    except Exception:
        return 1

def insertTameTable(classCode, semester):
    connection = pymysql.connect(
        host='localhost',
        user='root',
        password='',
        db='python',
        charset="utf8",
        cursorclass=pymysql.cursors.DictCursor)
    response = urllib.request.Request(
        'http://172.16.129.117/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=' + classCode + '&xq=' + semester)
    # The campus server drops connections often, so retry until the page opens.
    flag = 1
    while flag == 1:
        try:
            print("opening link...")
            response = urllib.request.urlopen(response)
            flag = 0
        except Exception:
            print("failed to open the link, retrying...")
            flag = 1
    if response.code == 200:
        print("succeeded in opening the url, code=", response.code)
        html = response.read()
        soup = BeautifulSoup(
            html,
            'html.parser',
            from_encoding='gbk')
        # Each day's courses sit in the <td> cells of <table id="GVkb">;
        # the timetable remarks live in the element with id "TextBox1".
        arr = soup.find(id="GVkb").find_all("td")
        note = soup.find(id="TextBox1")
        timeTable = []
        for x in arr:
            ch = x.get_text()
            # Empty cells contain non-breaking spaces; replace them with "none".
            strings = str(ch).replace("\xa0", "none") + "\n"
            timeTable.append(strings)
            # Every 8 cells form one table row: the section plus the seven days.
            if len(timeTable) == 8:
                try:
                    with connection.cursor() as cursor:
                        sql = "INSERT INTO `timetable` (`classCode`,`semester`,`section`,`one`,`two`,`three`,`four`,`five`,`six`,`seven`) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                        cursor.execute(sql, (classCode, semester, timeTable[0], timeTable[1], timeTable[2], timeTable[3], timeTable[4], timeTable[5], timeTable[6], timeTable[7]))
                    connection.commit()
                finally:
                    timeTable = []
                    print("insert succeeded!")
        try:
            with connection.cursor() as cursor:
                sql = "INSERT INTO `timetablenote` (`classCode`,`semester`,`note`) VALUES (%s,%s,%s)"
                cursor.execute(sql, (classCode, semester, note.get_text()))
            connection.commit()
        finally:
            print("note insert succeeded!")
    else:
        print("failed to open the url, code=", response.code)


Running the timetable scraper

Run spider.py and wait. It took me roughly 6.5 hours to scrape every timetable, so I recommend starting the run before going to bed and letting it finish by morning.
