Scraping the ShanghaiRanking (软科) Best Chinese Universities Ranking


有点emo的Neo


Category: #python coursework. Tags: python

Article link: https://blog.csdn.net/qq_43284192/article/details/109467344



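The task: scrape the ranking table and print it as plain text, one university per line, with the rank, university name, province/city, type, total score, and one indicator score.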

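import requests

# Request the 2020 ranking page
r = requests.get('http://www.shanghairanking.cn/rankings/bcur/2020')

# 200 means the request succeeded
r.status_code

# Use the encoding detected from the page content so the Chinese text decodes correctly
r.encoding = r.apparent_encoding

r.text
## shows the page source as text
school = r.text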
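The interactive steps above check `r.status_code` by eye. In a script it may be more convenient to wrap the request in a small function that fails loudly on an HTTP error. This is only a sketch: the function name `fetch_html` and the 10-second `timeout` are my own assumptions, not part of the original assignment.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its decoded text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # Same encoding fix as in the post: use the encoding guessed from the body
    response.encoding = response.apparent_encoding
    return response.text
```

With this helper, `school = fetch_html('http://www.shanghairanking.cn/rankings/bcur/2020')` would replace the separate request, status check, and encoding lines.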

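from bs4 import BeautifulSoup

# Parse the page source with Python's built-in HTML parser
soup = BeautifulSoup(school, 'html.parser')

# The <tbody> element holds the ranking rows
soup.tbody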

# First row of the table: pull out each cell and strip newlines and spaces
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[0].string.replace('\n','').replace(' ','')
## outputs '1' (rank)
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[1].a.string
## '清华大学' (university name, inside an <a> tag)
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[2].string.replace('\n','').replace(' ','')
## '北京' (province/city)
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[3].string.replace('\n','').replace(' ','')
## '综合' (type)
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[4].string.replace('\n','').replace(' ','')
## '852.5' (total score)
soup.find_all('tbody')[0].find_all('tr')[0].find_all('td')[5].string.replace('\n','').replace(' ','')
## '38.2'

Putting it together: use a for loop over the rows to print the whole ranking.

for t in soup.find_all('tbody')[0].find_all('tr'):
    print(t.find_all('td')[0].string.replace('\n','').replace(' ',''),
         t.find_all('td')[1].a.string,
         t.find_all('td')[2].string.replace('\n','').replace(' ',''),
         t.find_all('td')[3].string.replace('\n','').replace(' ',''),
         t.find_all('td')[4].string.replace('\n','').replace(' ',''),
         t.find_all('td')[5].string.replace('\n','').replace(' ',''))
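The loop above relies on `.string`, which is `None` whenever a cell contains more than one child node, so a single unusual cell would raise an `AttributeError`. A slightly more defensive variant, sketched below, uses `get_text(strip=True)` instead and collects the rows in a list. The function name `parse_rows` and the hand-written `sample` markup are assumptions for illustration and are not the site's real HTML. Unlike the `replace(' ','')` calls, `get_text(strip=True)` only trims whitespace at the ends of each text fragment and keeps spaces inside it.

```python
from bs4 import BeautifulSoup


def parse_rows(html):
    """Return one list of cell texts per <tr> in the first <tbody>."""
    soup = BeautifulSoup(html, 'html.parser')
    tbody = soup.find('tbody')
    if tbody is None:  # the page has no table body
        return []
    rows = []
    for tr in tbody.find_all('tr'):
        # get_text(strip=True) also reads text nested in tags such as <a>
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:
            rows.append(cells)
    return rows


# Hand-written sample that mimics the table layout (not real site markup)
sample = """
<table><tbody>
<tr><td> 1 </td><td><a href="#">清华大学</a></td><td> 北京 </td>
<td> 综合 </td><td> 852.5 </td><td> 38.2 </td></tr>
</tbody></table>
"""

for row in parse_rows(sample):
    print(' '.join(row))
```

On the real page, `parse_rows(school)` would take the place of the loop, and each row could be sliced (for example `row[:6]`) if only the first six columns are wanted.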

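That completes the output. To export the results to a txt or similar file, see the export method in my Meituan or matchmaking-site scraping posts.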
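The export method from those other posts is not reproduced here. As a stand-in, here is one possible way to write the rows to a text file with the standard-library `csv` module. The function name `save_rows`, the tab delimiter, and the file name used in the note below are all my own assumptions.

```python
import csv


def save_rows(rows, path):
    """Write rows (lists of strings) to a tab-separated UTF-8 text file."""
    # newline='' lets the csv module control the line endings itself
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerows(rows)
```

For example, `save_rows([['1', '清华大学', '北京', '综合', '852.5', '38.2']], 'ranking.txt')` would write one tab-separated line. Opening the file with `encoding='utf-8'` matters here because the university names are Chinese.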
