Python Web Scraping in Practice (Dynamic + Static): Scraping Weather Data for Chinese Cities

Latest recommended article published on 2024-07-07 08:00:00

CBeat

Category: Web Scraping. Tags: python, web scraping, request, xpath

Article link: https://blog.csdn.net/qq_43954124/article/details/118915856

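Table of Contents

Preface
I. Libraries and Websites Used
II. Scraping Weather Information (Dynamic)
- 1. Analyzing the API Request
- 2. Writing the Main Scraper Program
III. Obtaining the cityid (Static)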

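Preface

Some of my recent projects needed the weather for a given location in China, so I wrote a program to scrape it.

I. Libraries and Websites Used

Libraries used: requests and lxml
Weather site scraped: https://www.weatherol.cn/
JSON viewer: https://www.json.cn/
Source of the cityid list: https://blog.csdn.net/li_and_li/article/details/79602686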

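II. Scraping Weather Information (Dynamic)

1. Analyzing the API Request

Open https://www.weatherol.cn/, open the browser's developer tools, refresh the page, and set the network filter to XHR. The following requests appear.
[Screenshot: XHR requests listed in the developer tools]
From these, the API that requests the weather information should be: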

https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h?cityid=101180301

The cityid parameter is the standardized code that identifies a city.
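To make the role of cityid concrete, here is a minimal sketch that assembles the request URL with the standard library only. The helper name build_weather_url is hypothetical; the endpoint and the cityid parameter are the ones observed in the request above.

```python
from urllib.parse import urlencode

# Endpoint observed in the browser's XHR list
API_URL = 'https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h'


def build_weather_url(city_id: str) -> str:
    """Return the full request URL for one city (hypothetical helper)."""
    query = urlencode({'cityid': city_id})
    return API_URL + '?' + query


print(build_weather_url('101180301'))
```

When a dict is passed to requests through the params argument, requests performs this encoding itself, so building the URL by hand is only needed for inspection or logging.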
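Next, we paste the returned JSON into a JSON viewer to see what it contains.
[Screenshot: the JSON response expanded in a JSON viewer]
The response holds the forecast for the next 15 days, so we can extract whichever fields we need.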
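Because the useful values sit several levels deep in the response, a small lookup helper can make exploration less error-prone. The sketch below is an illustration only: the helper dig and the sample payload are hypothetical, the sample values are made up, and only the key names (data, current, weather, temperature, forecast15d, temperature_am, temperature_pm) are taken from the fields this article reads in its parsing code.

```python
# Hypothetical sample shaped like a small part of the response;
# the real payload contains many more keys.
sample = {
    'data': {
        'current': {
            'current': {'weather': 'Sunny', 'temperature': '30'},
        },
        'forecast15d': [
            {'temperature_am': '31', 'temperature_pm': '22'},
        ],
    }
}


def dig(data, *keys, default=None):
    """Follow keys or list indices into nested data, or return default."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a non-container value
            return default
    return current


print(dig(sample, 'data', 'current', 'current', 'weather'))
print(dig(sample, 'data', 'forecast15d', 5, 'temperature_am', default='n/a'))
```

Returning a default instead of raising keeps a scraper running when the site omits a field, at the cost of hiding structural changes, so logging the missing path may be worth adding.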

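2. Writing the Main Scraper Program

The program sends a GET request with requests, passing cityid as a query parameter, and then parses the response with the parse_data() function.

The code is as follows: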

import json
from typing import Union

import requests

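# API endpoint (same host and path as the request analyzed above)
city_weather_url = 'https://www.weatherol.cn/api/home/getCurrAnd15dAnd24h'
# User-Agent string sent with every request
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'


def get_headers():
    """
    Build the request headers.
    return: headers: dict
    """
    headers = {
        # The HTTP header name is 'User-Agent'; 'user_agent' would be sent as an unrelated header
        'User-Agent': user_agent
    }
    return headers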


def parse_data(weather_data):
    """
    Parse the returned data and extract the useful fields.
    return: ret_dict: dict
    """
    current = weather_data['data']['current']['current']
    ret_dict = {}
    ret_dict['current weather'] = current['weather']
    ret_dict['current temperature'] = current['temperature']

    # Entry 1 of the 15-day forecast is treated as today's entry here
    high = weather_data['data']['forecast15d'][1]['temperature_am']
    low = weather_data['data']['forecast15d'][1]['temperature_pm']
    ret_dict["today's temperature"] = f'{low} - {high}'

    ret_dict['wind direction'] = current['winddir']
    ret_dict['wind speed'] = current['windpower']
    # f-strings also work if the API returns numbers instead of strings
    ret_dict['air pressure'] = f"{current['airpressure']}hpa"
    ret_dict['humidity'] = f"{current['humidity']}%"

    aqi = weather_data['data']['current']['air']['AQI']
    level = weather_data['data']['current']['air']['levelIndex']
    ret_dict['air quality'] = f'{aqi}/{level}'

    ret_dict['tip'] = weather_data['data']['current']['tips']

    return ret_dict
    

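def get_weather(city_id) -> Union[None, dict]:
    """
    Fetch the weather information for a city ID.
    return: weather_dict: dict, or None if the request fails
    """
    params = {
        'cityid': city_id
    }
    # Send a GET request; the timeout keeps the call from hanging indefinitely
    try:
        response = requests.get(url=city_weather_url, headers=get_headers(), params=params, timeout=10)
        response.raise_for_status()
    except requests.RequestException as err:
        print('request failed:', err)
        return None

    # Convert the returned JSON string into Python objects and parse it
    weather_data = json.loads(response.text)
    weather_dict = parse_data(weather_data)
    return weather_dict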


def test():
    weather_dict = get_weather('101180301')
    print(weather_dict)

if __name__ == '__main__':
    test()

Run it:
[Screenshot: program output]

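III. Obtaining the cityid (Static)

We can now fetch all the weather information for the next 15 days from a cityid, but how do we get the cityid itself?
Many people online have already compiled the IDs of all cities in China. We only need to parse one of these lists, store it locally, and look the ID up when we need it.
Searching on Baidu, I found a blog post that is fairly easy to scrape.

https://blog.csdn.net/li_and_li/article/details/79602686

Press F12 to inspect the page structure, and use the XPath Helper tool to work out the XPath.
[Screenshot: page structure and XPath Helper result]
The code is as follows: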

import requests
import json
from lxml import etree
url = 'https://blog.csdn.net/li_and_li/article/details/79602686'

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36'
}


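def parse_html(html):
    """
    Parse the page and return the names and IDs of all cities on it.
    """
    et = etree.HTML(html)
    # Each <p> text node is expected to look like "<city_id>,<city_name>"
    cities = et.xpath('//div[@id="content_views"]/p/text()')
    ret_dict = {}
    for city in cities:
        try:
            city_info = city.split(',')
            city_id = city_info[0].strip()
            city_name = city_info[1].strip()
            ret_dict[city_name] = city_id
        except IndexError:
            # No comma on this line, so it is not an "id,name" pair
            print('err str: ' + city)
            continue

    return ret_dict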


response = requests.get(url, headers=headers, timeout=10)
html = response.text
city_info = parse_html(html)
# Save the {city_name: city_id} mapping; ensure_ascii=False keeps Chinese names readable
with open('city_id.json', 'w', encoding='utf8') as fp:
    json.dump(city_info, fp, ensure_ascii=False)

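Result of the scrape:
[Screenshot: contents of city_id.json]
After that, we can look up a cityid in this file and pass it to the weather scraper written above!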
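The lookup step could be sketched as follows. This is a minimal illustration, assuming city_id.json holds the {city_name: city_id} mapping written by the script above; the function names load_city_ids and find_city_id are hypothetical, and the fallback to a substring match is my own choice rather than something the original program does.

```python
import json


def load_city_ids(path='city_id.json'):
    """Load the {city_name: city_id} mapping saved by the scraper."""
    with open(path, encoding='utf8') as fp:
        return json.load(fp)


def find_city_id(city_ids, name):
    """Return the ID for an exact name match, else for the first
    city name containing `name`, else None."""
    if name in city_ids:
        return city_ids[name]
    for city_name, city_id in city_ids.items():
        if name in city_name:
            return city_id
    return None
```

A typical call would be find_city_id(load_city_ids(), '北京'), and the returned ID can then be handed to get_weather(). Checking for None first avoids sending a request with an empty cityid.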
