Python Web Scraping 06: Scraping City Temperatures Across China with bs4

Latest recommended article published on 2024-03-28 21:33:54

对流层的酱猪肘



Article link: https://blog.csdn.net/weixin_47133012/article/details/107687950


1. Import the modules

import requests
from bs4 import BeautifulSoup

2. Define a function that fetches and parses the page

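def Geturl(url):
    # Send a browser-like User-Agent so the request looks like a normal page visit.
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
    # The timeout keeps a stalled request from hanging forever.
    response=requests.get(url,headers=headers,timeout=10)
    # response.content is raw bytes; decode it into text as UTF-8.
    res=response.content.decode('utf-8')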
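
The last line of this step turns raw bytes into text. As a minimal sketch of what that decoding does, without touching the network, the example below builds its own byte string as a stand-in for response.content. The sample text is made up for illustration.

```python
# Stand-in for response.content: the raw bytes a server might send (made-up sample).
raw = '北京 -3℃'.encode('utf-8')
print(type(raw).__name__)

# Decoding with the codec the bytes were written in gives the original text back.
text = raw.decode('utf-8')
print(text)

# Decoding with a codec that cannot represent these bytes raises an error.
try:
    raw.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
print(decoded_as_ascii)
```

If a page is not actually UTF-8, the decode call in Geturl would fail in the same way, so the codec has to match the page.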

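First, get the div tag whose class is conMidtab. The html5lib parser used here is a separate package and has to be installed (pip install html5lib).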

    soup=BeautifulSoup(res,'html5lib')
    conMidtab=soup.find('div',class_='conMidtab')

Get all the table tags inside it.

    tables=conMidtab.find_all('table')
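
To see what find and find_all return, here is a minimal sketch run against a tiny hand-written HTML fragment instead of the live page. The markup is made up for illustration; only the class name conMidtab comes from the code above. It uses Python's built-in html.parser so that it runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page is much larger.
html = """
<div class="other"><table><tr><td>ignored</td></tr></table></div>
<div class="conMidtab">
  <table><tr><td>A</td></tr></table>
  <table><tr><td>B</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag, or None if nothing matches.
con_mid_tab = soup.find('div', class_='conMidtab')

# find_all() returns every matching descendant of that tag.
tables = con_mid_tab.find_all('table')
first_cells = [table.find('td').get_text() for table in tables]
print(len(tables), first_cells)
```

Because the search starts from the conMidtab div, the table in the other div is never visited. Note that find returns only the first match, which matters if a page contains more than one div with that class.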

Iterate over tables, get all the tr tags in each one, and skip the first two (the table's header rows).

    for table in tables:
        trs=table.find_all('tr')[2:]

Iterate over trs and get all the td tags in each row.
enumerate(trs) yields two values per step: the index, and the item at that index.

        for index,tr in enumerate(trs):
            tds=tr.find_all('td')
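
The slicing and enumerate steps can be tried on an ordinary list. In this small sketch, plain strings stand in for the tr tags of one table; the data is made up for illustration.

```python
# Stand-in strings for the <tr> tags of one table (hypothetical data).
rows = ['header 1', 'header 2', 'row A', 'row B', 'row C']

# [2:] drops the first two items and keeps the rest.
data_rows = rows[2:]

# enumerate() numbers the remaining items starting from 0,
# so index 0 is the first row after the skipped headers.
pairs = list(enumerate(data_rows))
for index, row in pairs:
    print(index, row)
```

The point to notice is that the index restarts at 0 after slicing, so index 0 refers to the first data row, not to the first tr in the table.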

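Handle the municipality/province problem by checking the index. In the first row of each table, the first td holds the province (or municipality) name, so the city is in the second td. In every other row, the city is in the first td.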

            if index==0:
                city_tag=tds[1]
            else:
                city_tag=tds[0]

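            temp_tag=tds[-2]  # the temperature is in the second-to-last td
            # stripped_strings yields the tag's text pieces with whitespace removed; take the first one.
            city=list(city_tag.stripped_strings)[0]
            temp=list(temp_tag.stripped_strings)[0]
            print('City:',city,'Temperature:',temp)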
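
The row logic above can be checked offline. The following is a rough sketch, not the author's code: it wraps the same steps in a hypothetical helper, extract_rows, and runs it on a hand-written table whose layout (two header rows, a province cell only in the first data row, temperature in the second-to-last cell) is an assumption modelled on the description above. It uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical table: two header rows, then one province with two cities.
# Only the first data row carries the province cell before the city cell.
html = """
<table>
  <tr><td>header</td></tr>
  <tr><td>header</td></tr>
  <tr><td rowspan="2">Province X</td><td><a>City A</a></td><td>Sunny</td><td>-3</td><td>detail</td></tr>
  <tr><td><a>City B</a></td><td>Cloudy</td><td>5</td><td>detail</td></tr>
</table>
"""

def extract_rows(table):
    """Return (city, temperature) pairs from one table tag (illustrative helper)."""
    results = []
    for index, tr in enumerate(table.find_all('tr')[2:]):
        tds = tr.find_all('td')
        # First data row: the city is the second cell; otherwise it is the first.
        city_tag = tds[1] if index == 0 else tds[0]
        temp_tag = tds[-2]
        city = list(city_tag.stripped_strings)[0]
        temp = list(temp_tag.stripped_strings)[0]
        results.append((city, temp))
    return results

table = BeautifulSoup(html, 'html.parser').find('table')
rows = extract_rows(table)
print(rows)
```

Returning a list instead of printing inside the loop is a design choice for the sketch: it makes the result easy to sort, save, or test. The temperatures come back as strings and would need int() before any numeric comparison.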

3. Define a main function and call it

def main():
    urls=['http://www.weather.com.cn/textFC/hb.shtml','http://www.weather.com.cn/textFC/hd.shtml','http://www.weather.com.cn/textFC/gat.shtml']
    for url in urls:
        Geturl(url)

if __name__ == '__main__':
    main()
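
The three URLs in main differ only in their region code, so the list could also be built from a pattern. This is a small sketch under that assumption; BASE_URL, REGION_CODES, and build_urls are names introduced here for illustration, and only the codes that appear in main are included.

```python
# Pattern taken from the three URLs listed in main().
BASE_URL = 'http://www.weather.com.cn/textFC/{code}.shtml'

# Only the region codes that appear in the article; others are left out rather than guessed.
REGION_CODES = ['hb', 'hd', 'gat']

def build_urls(codes):
    """Fill each region code into the URL pattern (illustrative helper)."""
    return [BASE_URL.format(code=code) for code in codes]

urls = build_urls(REGION_CODES)
for url in urls:
    print(url)
```

Covering more regions would then only require appending their codes to REGION_CODES, once those codes have been confirmed on the site.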