1. Prerequisites
1.1 PyCharm
1.2 The Best Chinese Universities Ranking site (最好大学网): http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html
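2. Overall approach
2.1 Use requests to fetch the page's HTML
2.2 Use the BeautifulSoup library to extract the rank, university name, province/city, and total score from each table row, and store them in that order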
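Before touching the live site, the extraction step can be tried offline. The snippet below is only a minimal sketch: SAMPLE_HTML is made-up markup and extract_rows is a hypothetical helper name, both assumed for illustration. It also assumes the four wanted fields are the first four td cells of each row, which may not match the real page exactly.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup for illustration; the real page may differ.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>1</td><td>University A</td><td>Province X</td><td>95.0</td></tr>
    <tr><td>2</td><td>University B</td><td>Province Y</td><td>90.5</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one [rank, name, region, score] list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        # No table body found: return an empty result instead of raising.
        return []
    rows = []
    for tr in tbody.children:
        # .children also yields whitespace text nodes; keep only real tags.
        if isinstance(tr, Tag):
            tds = tr.find_all("td")
            if len(tds) >= 4:
                rows.append([td.get_text(strip=True) for td in tds[:4]])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Guarding against a missing tbody and against short rows keeps the sketch from raising on pages whose layout differs from what is assumed here.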
3. Code
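import requests
from bs4 import BeautifulSoup
import bs4


def getHtmlText(url):
    """Fetch the page at url and return its text, or "" on failure."""
    try:
        r = requests.get(url)
        r.raise_for_status()  # raise an exception for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # use the encoding detected from the content
        return r.text  # the attribute r.text, not the string "r.text"
    except requests.RequestException:
        # covers connection errors, timeouts, and HTTP errors from raise_for_status()
        return ""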
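def fillUlist(Ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    for tr in soup.find('tbody').children:
        # .children also yields text nodes (e.g. newlines); keep only real tags
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')  # calling a tag is shorthand for tr.find_all('td')
            Ulist.append([tds[0].string, tds[1].string, tds[2].string, tds[3]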