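Task:
Use the requests and BeautifulSoup libraries to scrape information on the top 20 universities from https://www.shanghairanking.cn/rankings/bcur/2023 and print it in the format shown below. Submit the lab report and the program source file.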
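1. Install the requests and BeautifulSoup libraries from an administrator command prompt (pip install requests beautifulsoup4):
2. Scrape the top 20 universities from https://www.shanghairanking.cn/rankings/bcur/2023:
3. Write the crawler code in Python:
4. Run the code: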
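import requests
from bs4 import BeautifulSoup

url = "https://www.shanghairanking.cn/rankings/bcur/2023"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error

# Pass the raw bytes so BeautifulSoup detects the page encoding itself.
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find("table", class_="rk-table")
if table is None:
    raise SystemExit("ranking table not found; the page layout may have changed")

# Header: rank, university name, total score.
print("{:<8} {:<30} {:<10}".format("排名", "学校名称", "总分"))
for row in table.find_all("tr")[1:]:  # skip the header row
    cols = row.find_all("td")
    rank = cols[0].text.strip()
    university = cols[1].find("a").text.strip()
    total_score = cols[4].text.strip()
    print("{:<8} {:<30} {:<10}".format(rank, university, total_score))
    if int(rank) >= 20:  # stop once the top 20 have been printed
        break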
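One thing to watch with the output format: the specifiers `{:<8}`, `{:<30}` and `{:<10}` pad by character count, while most terminals draw a Chinese character two columns wide, so the columns can drift out of line. Below is a minimal sketch of one possible fix that pads by display width instead. The helper names `display_width`, `pad_to_width` and `format_row` are made up for this example and are not part of the original script, and the sample row is placeholder text, not scraped data.

```python
import unicodedata


def display_width(text):
    """Return the number of terminal columns `text` is expected to occupy."""
    # East Asian Wide ("W") and Fullwidth ("F") characters take two columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def pad_to_width(text, width):
    """Left-align `text` by appending spaces until it spans `width` columns."""
    return text + " " * max(width - display_width(text), 0)


def format_row(rank, university, total_score):
    """Build one row with the same column widths as the script (8, 30, 10)."""
    return " ".join([
        pad_to_width(rank, 8),
        pad_to_width(university, 30),
        pad_to_width(total_score, 10),
    ])


if __name__ == "__main__":
    # Placeholder values for illustration only, not real ranking data.
    print(format_row("排名", "学校名称", "总分"))
    print(format_row("1", "示例大学", "100.0"))
```

To use it in the script, the two `print("{:<8} {:<30} {:<10}".format(...))` calls could be swapped for `print(format_row(...))`. The alignment still depends on the terminal font, so treat it as an approximation.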
The output looks like this:
That's roughly it, thanks!