Python Timetable Query System: Scraping the Course Timetables of Guangxi University of Science and Technology with Python

This article describes a Python crawler that scrapes the course timetables from the academic affairs system of Guangxi University of Science and Technology and stores them in a local MySQL database. The system avoids re-scraping duplicates and fetches the class codes and semester data, so students can conveniently query timetables through a web front end.

Because the school's academic affairs system makes timetable queries cumbersome, its server goes down frequently, and it is poorly suited to mobile use, I wrote a Python crawler to scrape all course timetables and host them on my own server. This article is for learning purposes only.

Features

If the program is interrupted and run again, it will not scrape duplicate timetables.

Data is stored in a MySQL database, making the timetables easy to manage.

Note

Please do not copy and paste the code from this article, because Python is very strict about indentation.

Please download the source code from my GitHub (the class timetable database file is included):

Web-based timetable query

Environment:

Windows 10 Pro

Python 3.5.1

WampServer 2.5

Installing WampServer

Storing the timetables requires a MySQL database, so download WampServer.

Go to the website, scroll down, and find the section shown in the figure.

Choose the 32-bit or 64-bit installer to match your system; if you are not sure which your system is, the 32-bit version will work.

After installation, run it.

Importing the database tables

After starting WampServer, left-click the WampServer icon in the system tray at the bottom-right of the desktop (as shown below), then click phpMyAdmin to open the web-based database management page.

Once in the web interface, click New at the top left and create a database named python.

Configure it as shown in the figure.

Select the SQL file and import it into the database.
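If you do not have the SQL file at hand, the scripts in this article expect four tables. The following is only a minimal sketch of their layout, inferred from the INSERT statements used later; the column types are my assumption, and the authoritative schema is the database file in the GitHub repository. (It needs PyMySQL, installed in the Python setup below.)

Python

# Minimal schema sketch inferred from the INSERT statements in this article.
# Column types are assumptions; prefer the SQL file shipped with the repository.
import pymysql

connection = pymysql.connect(host='localhost', user='root', password='', db='python')
with connection.cursor() as cursor:
    cursor.execute("CREATE TABLE IF NOT EXISTS `semester` (`semester` VARCHAR(16))")
    cursor.execute("CREATE TABLE IF NOT EXISTS `class` ("
                   "`className` VARCHAR(64), `classCode` VARCHAR(16), `grade` VARCHAR(8))")
    cursor.execute("CREATE TABLE IF NOT EXISTS `timetable` ("
                   "`classCode` VARCHAR(16), `semester` VARCHAR(16), `section` TEXT, "
                   "`one` TEXT, `two` TEXT, `three` TEXT, `four` TEXT, "
                   "`five` TEXT, `six` TEXT, `seven` TEXT)")
    cursor.execute("CREATE TABLE IF NOT EXISTS `timetablenote` ("
                   "`classCode` VARCHAR(16), `semester` VARCHAR(16), `note` TEXT)")
connection.commit()
connection.close()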

Installing Python

Click Download Python 3.5.1 to start the download.

During installation, remember to check Add python.exe to Path.

Installing Python packages

After installing Python, install the following packages.

Beautiful Soup HTML parser

Open a CMD command prompt as administrator and run:

pip install beautifulsoup4

PyMySQL database connector

Open a CMD command prompt as administrator and run:

pip install PyMySQL
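To confirm both packages installed correctly, you can run a quick check like the sketch below. It assumes the empty-password root account that WampServer creates by default and the python database created above.

Python

# Sanity check: import both packages and open a connection to the local database.
import bs4
import pymysql

connection = pymysql.connect(host='localhost', user='root', password='', db='python')
print("bs4", bs4.__version__, "- MySQL connection OK")
connection.close()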

How it works

Analyzing the timetable URL of the academic affairs system

The question mark in the tail of the URL *** .aspx?xsbh=14040801&xq=15-16-2 shows that the page requests its data via GET. The first parameter is xsbh = 14040801 and the second is xq = 15-16-2. xq is clearly the pinyin initialism for semester (xueqi), so xsbh must be the class code. We therefore first need to collect all class codes and semesters, concatenate them into timetable URLs, and iterate over those URLs to scrape every timetable: http://202.193.177.7:8081/(S(oaml0r55uwfphs55ssjh43mh))/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=<class code>&xq=<semester>
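As a minimal sketch of this concatenation (note that the (S(...)) session segment in the path changes between sessions, so the base URL here is illustrative only):

Python

# Build a timetable query URL from a class code and a semester string.
# The (S(...)) session segment varies per visit; this base URL is illustrative.
BASE = 'http://202.193.177.7:8081/(S(oaml0r55uwfphs55ssjh43mh))/web_jxrw/cx_kb_bjkb_bj.aspx'

def timetable_url(class_code, semester):
    return BASE + '?xsbh=' + class_code + '&xq=' + semester  # the two GET parameters

print(timetable_url('14040801', '15-16-2'))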

Scraping all semesters

Open the link and view the page source: every semester is held in an option tag under the select tag whose ID is DDxq (as shown in the figure),

which produces the markup shown below.

Since an id attribute is unique in front-end development, this node can be selected with Beautiful Soup as soup.find(id="DDxq").find_all("option"),

which is all the beautifulsoup4 package needs to extract every semester.

Python code to scrape all semesters

spiderForSemester.py

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(semester):
    # Insert one semester string into the `semester` table.
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `semester` (`semester`) VALUES (%s)"
            cursor.execute(sql, (semester,))  # trailing comma: args must be a tuple
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # Every semester lives in an <option> under <select id="DDxq">.
    arr = soup.find(id="DDxq").find_all("option")
    for x in arr:
        ch = x.get_text()
        print(ch)
        sqlLink(str(ch))
else:
    print("failed")


Scraping all classes and class codes

Likewise, view the source of the academic affairs system's timetable query page: all classes sit under the select tag whose ID is Cxbj_all1_bjlist1 (as shown in the figure), and can be scraped directly with Beautiful Soup.

In the figure, the value attribute of each option tag is the class code and its text content is the class name. Again using Beautiful Soup, the selector is: soup.find(id="Cxbj_all1_bjlist1").find_all("option")

Python code to scrape all classes

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(clssName, classCode, grade):
    # Insert one class record (name, code, enrollment year) into the `class` table.
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `class` (`className`,`classCode`,`grade`) VALUES (%s,%s,%s)"
            cursor.execute(sql, (clssName, classCode, grade))
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # Each <option> under <select id="Cxbj_all1_bjlist1"> holds one class:
    # the value attribute is the class code, the text is the class name.
    arr = soup.find(id="Cxbj_all1_bjlist1").find_all("option")
    for x in arr:
        ch = x.get_text()
        classCode = x['value']
        # The first two digits of the class code give the enrollment year.
        last = x['value'][0] + x['value'][1]
        if last == "98":
            grade = "19" + last  # 1998 intake
        else:
            grade = "20" + last  # 2000 onwards
        print(classCode)
        sqlLink(str(ch), classCode, grade)
else:
    print("failed")


Scraping all timetables

With the class list and semester data in hand, we can iterate over both lists, concatenate each pair into a timetable query URL, and extract the courses from every page.

By the same reasoning, inside the table tag whose id is GVkb, each day's courses are placed in td tags.

The Python selector is soup.find(id="GVkb").find_all("td")

Python code to scrape all timetables

This step is split into two Python files. Name them exactly as noted: the first is spider.py, the second is spider_classTable.py, and they must be run from the same directory.

First file: spider.py

Python

import pymysql.cursors
import spider_classTable

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

try:
    ### get all semesters
    with connection.cursor() as cursor:
        sql = "SELECT * FROM `semester`"
        cursor.execute(sql)
        resultSemester = cursor.fetchall()
    ### get all class codes
    with connection.cursor() as cursor:
        sql = "SELECT `classCode` FROM `class`"
        cursor.execute(sql)
        resultClassCode = cursor.fetchall()
finally:
    # connection.close()
    i = 0

for x in resultClassCode:
    classCode = x["classCode"]
    for y in resultSemester:
        semester = y["semester"]
        # First two digits of the class code = enrollment year;
        # first two digits of the semester = the year that semester starts.
        gradeClassCode = int(classCode[0] + classCode[1])
        gradeSemester = int(semester[0] + semester[1])
        try:
            with connection.cursor() as cursor:
                # Check whether this timetable was already scraped
                # (parameterized instead of string concatenation).
                sql = "SELECT `classCode`,`semester` FROM `timetable` WHERE classCode=%s AND semester=%s"
                cursor.execute(sql, (classCode, semester))
                resultRepeat = cursor.fetchone()
        finally:
            # connection.close()
            pass
        # Only request semesters inside the class's four years of study.
        if (gradeSemester >= gradeClassCode) and (gradeSemester <= gradeClassCode + 3):
            if resultRepeat is None:
                arr = [classCode, semester]
                print(arr)
                spider_classTable.insertTameTable(classCode, semester)
            else:
                print("The timetable already exists")
        i += 1
        print(i, "Processing...")


Second file: spider_classTable.py

Python

import urllib
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

def openLink(response):
    # Unused helper: returns 0 if the URL opens, 1 otherwise.
    # (Fixed from the Python 2 urllib.urlopen call and invalid "except e:" syntax.)
    try:
        urllib.request.urlopen(response)
        return 0
    except Exception:
        return 1

def insertTameTable(classCode, semester):
    connection = pymysql.connect(
        host='localhost',
        user='root',
        password='',
        db='python',
        charset="utf8",
        cursorclass=pymysql.cursors.DictCursor)
    response = urllib.request.Request(
        'http://172.16.129.117/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=' + classCode + '&xq=' + semester)
    # The campus server drops connections often, so retry until the page opens.
    flag = 1
    while flag == 1:
        try:
            print("opening link...")
            response = urllib.request.urlopen(response)
            flag = 0
        except Exception:
            print("failed to open the link, retrying...")
            flag = 1
    if response.code == 200:
        print("succeeded in opening the url, code=", response.code)
        html = response.read()
        soup = BeautifulSoup(
            html,
            'html.parser',
            from_encoding='gbk')
        # Each day's courses sit in the <td> cells of <table id="GVkb">;
        # the timetable remarks live in the element with id "TextBox1".
        arr = soup.find(id="GVkb").find_all("td")
        note = soup.find(id="TextBox1")
        timeTable = []
        for x in arr:
            ch = x.get_text()
            # Empty cells contain non-breaking spaces; replace them with "none".
            strings = str(ch).replace("\xa0", "none") + "\n"
            timeTable.append(strings)
            # Every 8 cells form one table row: the section plus the seven days.
            if len(timeTable) == 8:
                try:
                    with connection.cursor() as cursor:
                        sql = "INSERT INTO `timetable` (`classCode`,`semester`,`section`,`one`,`two`,`three`,`four`,`five`,`six`,`seven`) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                        cursor.execute(sql, (classCode, semester, timeTable[0], timeTable[1], timeTable[2], timeTable[3], timeTable[4], timeTable[5], timeTable[6], timeTable[7]))
                    connection.commit()
                finally:
                    timeTable = []
                    print("insert succeeded!")
        try:
            with connection.cursor() as cursor:
                sql = "INSERT INTO `timetablenote` (`classCode`,`semester`,`note`) VALUES (%s,%s,%s)"
                cursor.execute(sql, (classCode, semester, note.get_text()))
            connection.commit()
        finally:
            print("note insert succeeded!")
    else:
        print("failed to open the url, code=", response.code)


Running the timetable scraper

Run spider.py and wait. It took me roughly 6.5 hours to scrape every timetable, so I recommend starting the run before going to bed and letting it finish by morning.
