Python Web Scraping Case Study (2): University Rankings

snow小米

Column: 爬虫案例 (Scraping Case Studies) | Tags: python

Article link: https://blog.csdn.net/snow654321/article/details/106156419

I'm a beginner practicing web scraping one case study at a time, and it has been a bumpy road (sob).

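In case study (1) I used urllib. This case uses the requests library instead, which is more convenient than urllib. For example, requests.get() returns a response object whose status code (.status_code) and decoded text (.text) you can read directly. requests is one of the simplest and easiest-to-use HTTP libraries for Python, and I recommend it for scraping.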
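For comparison with case study (1), here is a rough sketch of the same fetch using only the standard library's urllib. It is not code from this case. The function name get_html_text_urllib and the User-Agent header are illustrative assumptions. The sketch shows the manual steps that requests handles for you: building a Request, decoding raw bytes, and mapping errors.

```python
from typing import Optional
from urllib.request import Request, urlopen


def get_html_text_urllib(url: str, timeout: float = 30) -> Optional[str]:
    """Fetch url with urllib only; return the decoded HTML, or None on failure."""
    try:
        # Assumption: a browser-like User-Agent, since some sites may reject
        # urllib's default one. A malformed URL raises ValueError right here.
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            # Non-2xx statuses raise HTTPError (a URLError, hence an OSError)
            raw = resp.read()  # urllib hands back bytes...
            charset = resp.headers.get_content_charset() or "utf-8"
            return raw.decode(charset, errors="replace")  # ...so decoding is manual
    except (ValueError, OSError, LookupError):
        # ValueError: bad URL; OSError: URLError/HTTPError, connection errors,
        # timeouts; LookupError: unknown charset name in the response headers
        return None
```

With requests, the body of the same function is essentially requests.get(url, timeout=30) followed by reading response.text.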
1. Install the requests library.
If requests is not installed yet, install it from a cmd window with pip install requests. This library fetches the web page content.
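Similarly, install the third-party library beautifulsoup4, which parses the fetched page content: pip install beautifulsoup4. Note that this package is imported in code as bs4.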
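To confirm that both installs worked, the check below is a small sketch. The helper name check_scraping_libs is hypothetical. It uses importlib.util.find_spec, which looks a module up without importing it. It checks the import names (requests, bs4), not the pip package names.

```python
import importlib.util


def check_scraping_libs(names=("requests", "bs4")):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_scraping_libs().items():
        print(f"{name}: {'installed' if ok else 'missing, install it with pip'}")
```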
2. The code for this case is as follows:
(The source code for this case comes from https://python123.io/index/notebooks/python_programming_basic_v2)

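import requests
from bs4 import BeautifulSoup
from requests.exceptions import RequestException  # was used but never imported

allUniv = []  # one list of cell strings per university


def getHTMLText(url):
    """Fetch the page and return its HTML text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=30)
        response.encoding = 'utf-8'
        if response.status_code == 200:  # 200 means the request succeeded
            return response.text
        return None
    except RequestException:  # connection errors, timeouts, etc.
        print("Failed to request the index page")
        return None


def fillUnivList(soup):
    # In the page's HTML source the data is a <table>: each university is one
    # row (<tr>), and each field is a cell (<td>) in that row.
    data = soup.find_all('tr')
    for tr in data:
        ltd = tr.find_all('td')
        if len(ltd) == 0:  # skip rows without <td> cells (e.g. a header row)
            continue
        singleUniv = []
        for td in ltd:
            # get_text() still works when a cell has several child tags,
            # where td.string would return None and break the printing below
            singleUniv.append(td.get_text(strip=True))
        allUniv.append(singleUniv)


def printUnivList(num):
    print("{:<4}{:<15}{:<8}{:<8}{:<10}".format("排名", "学校名称", "省市", "学校类型", "总分"))
    for u in allUniv[:num]:  # slicing avoids IndexError if fewer rows were found
        print("{:<4}{:<15}{:<8}{:<8}{:<10}".format(u[0], u[1], u[2], u[3], u[4]))


def main():
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2020.html'
    html = getHTMLText(url)
    if html is None:  # nothing was fetched, so there is nothing to parse
        return
    soup = BeautifulSoup(html, "html.parser")  # build a parse tree with bs4's BeautifulSoup class
    fillUnivList(soup)
    printUnivList(10)  # print the top 10 universities


if __name__ == "__main__":
    main()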
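The ranking site's markup may have changed since this post was written, so it helps to test the tr/td logic of fillUnivList offline. This sketch is not the original code. To avoid third-party dependencies, it uses the standard library's html.parser.HTMLParser in place of BeautifulSoup. SAMPLE is a hand-written table (hypothetical markup, with values copied from the results shown below), not the real page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> in every <tr>, mirroring fillUnivList."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr> that has <td> cells
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text pieces of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # text inside nested tags like <div> counts too
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # like `if len(ltd) == 0: continue` in fillUnivList
                self.rows.append(self._row)
            self._row = None
            self._cell = None


SAMPLE = """
<table>
  <tr><th>排名</th><th>学校名称</th><th>省市</th><th>学校类型</th><th>总分</th></tr>
  <tr><td>1</td><td><div>清华大学</div></td><td>北京</td><td>综合</td><td>852.5</td></tr>
  <tr><td>2</td><td><div>北京大学</div></td><td>北京</td><td>综合</td><td>746.7</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)
```

If bs4 is installed, you can run the same check on fillUnivList itself: call fillUnivList(BeautifulSoup(SAMPLE, "html.parser")) and inspect allUniv.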

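Running the scraper prints the following (columns: rank, university name, province/city, university type, total score):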

排名  学校名称           省市      学校类型    总分        
1   清华大学           北京      综合      852.5     
2   北京大学           北京      综合      746.7     
3   浙江大学           浙江      综合      649.2     
4   上海交通大学         上海      综合      625.9     
5   南京大学           江苏      综合      566.1     
6   复旦大学           上海      综合      556.7     
7   中国科学技术大学       安徽      理工      526.4     
8   华中科技大学         湖北      综合      497.7     
9   武汉大学           湖北      综合      488.0     
10  中山大学           广东      综合      457.2
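In a terminal, these columns do not line up exactly. The reason is that str.format pads by character count, while each Chinese character occupies two display columns. One common workaround is to pad with the full-width space chr(12288). The sketch below takes another route and computes display width with unicodedata.east_asian_width. The helper names (display_width, pad, format_row) and the column widths are assumptions for illustration.

```python
import unicodedata


def display_width(text: str) -> int:
    """Terminal columns used by text: wide/fullwidth (e.g. CJK) characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def pad(text: str, width: int) -> str:
    """Left-align text in a field of `width` display columns (like '{:<width}')."""
    return text + " " * max(width - display_width(text), 0)


def format_row(cells, widths=(6, 20, 8, 10, 8)):
    """Join the cells of one table row, each padded to its column's display width."""
    return "".join(pad(str(c), w) for c, w in zip(cells, widths))


print(format_row(["排名", "学校名称", "省市", "学校类型", "总分"]))
print(format_row(["1", "清华大学", "北京", "综合", "852.5"]))
print(format_row(["4", "上海交通大学", "上海", "综合", "625.9"]))
```

Characters whose East Asian width is "A" (ambiguous) are counted as one column here. Some terminals render them two columns wide.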
