Fetching the Chinese University Rankings in 19 Lines of Python


Answering a question from a fellow Q&A user:
how to fetch the Chinese university ranking data (ShanghaiRanking data).
Link: https://www.shanghairanking.cn/rankings/bcur/2020

  • Tools:
    Python, requests, XPath, lxml

Step 1: Import the packages

import requests
from lxml import etree

Step 2: Set the request headers

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'Accept-Encoding':'gzip, deflate, br',
    'Accept-Language':'zh-CN,zh;q=0.9',
}

Step 3: Request the page

html = requests.get('https://www.shanghairanking.cn/rankings/bcur/2020',headers=headers)
print(html)  # print the response status, e.g. <Response [200]>
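
Printing the response only shows the status; it does not stop the script when the request fails. A minimal sketch of a stricter fetch, assuming a hypothetical helper named `fetch_page` (not part of the original code) and assuming a 10-second timeout is acceptable:

```python
import requests


def fetch_page(url, headers=None, timeout=10):
    """Return the page text, or raise if the request failed.

    fetch_page is a hypothetical helper used for illustration only.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Assumption: the page is served as UTF-8, so decode .text that way.
    response.encoding = 'utf-8'
    return response.text
```

It could be called as `fetch_page('https://www.shanghairanking.cn/rankings/bcur/2020', headers=headers)` and the returned text passed to `etree.HTML`.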

Step 4: Parse the data

html.encoding = 'utf-8'  # the page is UTF-8; set it so html.text decodes the Chinese text correctly
etree_html = etree.HTML(html.text)
table_list = etree_html.xpath('//*[@id="content-box"]/div[2]/table/tbody/tr')
for row in table_list:
    # ''.join(s.split()) removes every tab, newline, and space from the cell text
    rank = ''.join(row.xpath('td[1]/text()')[0].split())
    name = row.xpath('td[2]/a/text()')[0]
    area = ''.join(row.xpath('td[3]/text()')[0].split())
    category = ''.join(row.xpath('td[4]/text()')[0].split())  # "category" avoids shadowing the built-in type()
    score = ''.join(row.xpath('td[5]/text()')[0].split())
    print(rank, name, area, category, score)
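
To see what the XPath expressions select without hitting the site, the same parsing can be tried on a small inline table. This is only a sketch: the HTML in `SAMPLE` and the row values are made up to mirror the structure the XPath expects, and `clean` and `parse_rows` are hypothetical helper names.

```python
from lxml import etree

# Made-up markup that mimics the layout the XPath above assumes.
SAMPLE = """
<div id="content-box">
  <div>header</div>
  <div>
    <table>
      <tbody>
        <tr>
          <td>
            1
          </td>
          <td><a href="#">Example University</a></td>
          <td>
            Beijing
          </td>
          <td>
            Comprehensive
          </td>
          <td>
            99.9
          </td>
        </tr>
      </tbody>
    </table>
  </div>
</div>
"""


def clean(text):
    """Remove all whitespace (tabs, newlines, spaces) from a cell."""
    return ''.join(text.split())


def parse_rows(markup):
    """Return (rank, name, area, category, score) tuples for each table row."""
    tree = etree.HTML(markup)
    rows = tree.xpath('//*[@id="content-box"]/div[2]/table/tbody/tr')
    result = []
    for row in rows:
        rank = clean(row.xpath('td[1]/text()')[0])
        # The name sits inside a link; only trim the ends so inner spaces survive.
        name = row.xpath('td[2]/a/text()')[0].strip()
        area = clean(row.xpath('td[3]/text()')[0])
        category = clean(row.xpath('td[4]/text()')[0])
        score = clean(row.xpath('td[5]/text()')[0])
        result.append((rank, name, area, category, score))
    return result


print(parse_rows(SAMPLE))
```

Each XPath inside the loop is relative to the current `tr`, which is why it starts with `td[...]` rather than `//`.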

Output

[Screenshot: the printed rank, name, region, type, and score for each university]

Complete code

import requests
from lxml import etree
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'Accept-Encoding':'gzip, deflate, br',
    'Accept-Language':'zh-CN,zh;q=0.9',
}
html = requests.get('https://www.shanghairanking.cn/rankings/bcur/2020',headers=headers)
print(html)  # print the response status, e.g. <Response [200]>
html.encoding = 'utf-8'  # the page is UTF-8; set it so html.text decodes the Chinese text correctly
etree_html = etree.HTML(html.text)
table_list = etree_html.xpath('//*[@id="content-box"]/div[2]/table/tbody/tr')
for row in table_list:
    rank = ''.join(row.xpath('td[1]/text()')[0].split())  # drop all tabs, newlines, and spaces
    name = row.xpath('td[2]/a/text()')[0]
    area = ''.join(row.xpath('td[3]/text()')[0].split())
    category = ''.join(row.xpath('td[4]/text()')[0].split())  # "category" avoids shadowing the built-in type()
    score = ''.join(row.xpath('td[5]/text()')[0].split())
    print(rank, name, area, category, score)
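  • Note: the Chinese text has to be decoded as UTF-8; with the encoding that requests guesses for this page, the names come out garbled.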
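
Why the encoding needs handling: when the response headers do not declare a charset, requests falls back to ISO-8859-1 for `.text`, which is presumably what happens with this page. The sketch below reproduces the effect with plain strings, no network involved, and shows that a `raw_unicode_escape` round trip, an ISO-8859-1 round trip, and decoding as UTF-8 in the first place all give the same result. The sample name is only illustrative.

```python
original = '清华大学'                 # illustrative sample text
raw = original.encode('utf-8')        # the bytes a UTF-8 page would send

# Decoding UTF-8 bytes as ISO-8859-1 produces mojibake.
garbled = raw.decode('iso-8859-1')

# Repair 1: raw_unicode_escape writes code points below 256 back as single bytes.
repaired_a = garbled.encode('raw_unicode_escape').decode('utf-8')

# Repair 2: the same round trip, spelled with the codec that caused the damage.
repaired_b = garbled.encode('iso-8859-1').decode('utf-8')

# Simplest: decode the bytes as UTF-8 from the start.
direct = raw.decode('utf-8')

print(garbled != original, repaired_a == original,
      repaired_b == original, direct == original)
# prints: True True True True
```

With requests, setting `response.encoding = 'utf-8'` before reading `.text` corresponds to the last variant and makes any per-field repair unnecessary.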

Hope this helps. I'm off now, bye!
