Scraping weather data with Requests and lxml and visualizing it as a timeline carousel chart

Latest recommended article published on 2024-06-05 11:29:15

米饭呐

Tags: web scraping

Article link: https://blog.csdn.net/weixin_50271491/article/details/138035643


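This article walks through scraping Shenzhen's weather data for each month of 2023 with Python's Requests and lxml libraries, parsing the pages with XPath, then tidying the data with Pandas and charting it with Pyecharts as a timeline carousel that shows how the weather changes over the year.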


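A beginner's Python scraping log, exercise 2: use Requests and lxml to scrape Shenzhen's weather for each month of 2023, then visualize the data as a carousel chart.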

The practice site is below. Click "天气" (Weather) and pick the city you want to scrape:

https://www.tianqi.com/

Import the libraries (requests and lxml are third-party; csv ships with the standard library)

import requests
from lxml import etree
import csv

Work out how the target URLs vary

https://lishi.tianqi.com/shenzhen/202301.html
The prefix 'https://lishi.tianqi.com/shenzhen/' is clearly fixed.
Only the trailing number changes. It is the year followed by the month, and single-digit months need a leading 0.

Code:
for month in range(1, 13):
    # Zero-pad the month to two digits, e.g. 1 -> '01'
    weather_time = f'2023{month:02d}'
    url = f'https://lishi.tianqi.com/shenzhen/{weather_time}.html'
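The URL rule can also be wrapped in a small helper so it is easy to check before sending any requests. This is only a minimal sketch: the function name `build_month_urls` and its parameters are illustrative assumptions, not part of the original script.

```python
def build_month_urls(city: str, year: int) -> list[str]:
    """Return the 12 monthly history-page URLs for one city and year."""
    base = 'https://lishi.tianqi.com'
    # {month:02d} zero-pads single-digit months: 1 -> '01'
    return [f'{base}/{city}/{year}{month:02d}.html' for month in range(1, 13)]


urls = build_month_urls('shenzhen', 2023)
print(urls[0])   # https://lishi.tianqi.com/shenzhen/202301.html
print(urls[-1])  # https://lishi.tianqi.com/shenzhen/202312.html
```

Printing the first and last URL is a quick way to confirm the padding before running the scraper.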

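Send the request with Requests and parse the response with lxml. The XPath expressions can be copied from the browser's developer tools. Each day's data goes into a dictionary, day_weather_info, which is appended to the list weather_info, so weather_info holds one month of scraped data. Each month's list is then appended to the yearly list weathers. The code is as follows: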

def getWeather(url):
    weather_info = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
    }
    # Send the request (the timeout keeps the script from hanging forever)
    response = requests.get(url=url, headers=headers, timeout=10)
    # Stop early on an HTTP error instead of parsing an error page
    response.raise_for_status()
    # Parse the HTML
    response_html = etree.HTML(response.text)
    # Use XPath to select every daily row on the page (one page per month)
    response_list = response_html.xpath('/html/body/div[7]/div[1]/div[4]/ul/li')

    # Loop over the daily rows
    for li in response_list:
        # One dictionary per day
        day_weather_info = {}
        # Date: the page shows "2023-01-01 星期日"; store "2023-01-01"
        day_weather_info['date_time'] = li.xpath('./div[1]/text()')[0].split(' ')[0]
        # High temperature: the page shows "16℃"; store "16"
        high = li.xpath('./div[2]/text()')[0]
        day_weather_info['high'] = high.split('℃')[0].strip()
        # Low temperature: the page shows "16℃"; store "16"
        low = li.xpath('./div[3]/text()')[0]
        day_weather_info['low'] = low.split('℃')[0].strip()
        # Weather description
        day_weather_info['weather'] = li.xpath('./div[4]/text()')[0]
        weather_info.append(day_weather_info)

    # print(weather_info)
    return weather_info
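The text clean-up inside the loop can be tried offline on sample strings, without touching the network. The sketch below is an illustration only: the helper names `clean_date`, `clean_temp` and `parse_day` are hypothetical, the sample cells are made up to match the formats described in the comments, and unlike the original function it converts temperatures to int.

```python
def clean_date(raw: str) -> str:
    """'2023-01-01 星期日 ' -> '2023-01-01' (keep only the date part)."""
    return raw.strip().split(' ')[0]


def clean_temp(raw: str) -> int:
    """'16℃' -> 16 (drop the unit and convert to int)."""
    return int(raw.strip().split('℃')[0])


def parse_day(cells: list[str]) -> dict:
    """Turn the four text cells of one day into a record."""
    date_text, high, low, weather = cells
    return {
        'date_time': clean_date(date_text),
        'high': clean_temp(high),
        'low': clean_temp(low),
        'weather': weather.strip(),
    }


# Made-up sample row in the format the page is described as using
sample = ['2023-01-01 星期日 ', '16℃', '11℃', '多云']
record = parse_day(sample)
print(record)
```

Keeping the parsing in small functions makes it easy to spot when a page layout change breaks one field.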

# Weather data for the whole year
weathers = []

for month in range(1, 13):
    weather_time = f'2023{month:02d}'
    url = f'https://lishi.tianqi.com/shenzhen/{weather_time}.html'
    # Scrape one month of weather data
    weather = getWeather(url)
    # Append the month to the yearly data
    weathers.append(weather)
    # Progress check: how many days were scraped for this month
    print(weather_time, len(weather))

Save the data to a CSV file, writing all rows in one call

with open('weather.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    # Header row: date, high temperature, low temperature, weather
    writer.writerow(['日期', '最高气温', '最低气温', '天气'])
    # writerows writes many rows at once (a list of lists, one inner list per row)
    list_year = []
    for month_weather in weathers:
        for day_weather_dict in month_weather:
            list_year.append(list(day_weather_dict.values()))
    writer.writerows(list_year)
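As an alternative to flattening the dictionaries by hand, the standard library's `csv.DictWriter` maps dictionary keys to columns explicitly, so the output does not depend on the order in which the keys were inserted. The sketch below is not the author's code: it writes to an in-memory buffer, and `sample_weathers` is made-up data in the same shape as `weathers`.

```python
import csv
import io

# Made-up sample in the same shape as `weathers`: a list of months,
# each month a list of per-day dicts.
sample_weathers = [
    [{'date_time': '2023-01-01', 'high': '16', 'low': '11', 'weather': '多云'}],
    [{'date_time': '2023-02-01', 'high': '20', 'low': '14', 'weather': '晴'}],
]

# Map dict keys to the CSV column headers used in this article.
columns = {'date_time': '日期', 'high': '最高气温', 'low': '最低气温', 'weather': '天气'}

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=list(columns))
writer.writerow(columns)  # header row: each key is written as its display name
for month_weather in sample_weathers:
    writer.writerows(month_weather)

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer can be swapped for `open('weather.csv', 'w', newline='', encoding='utf-8')`.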

Open weather.csv to check the result: a header row followed by one row per day.

Create a new .py file to visualize the data as a carousel chart with pandas and pyecharts. Start by importing the third-party libraries:

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Bar, Timeline

Read and prepare the data. The code is as follows:

# Read the data
df = pd.read_csv('weather.csv')
# print(df['日期'])

# Convert the date column ('日期') to datetime
df['日期'] = pd.to_datetime(df['日期'])
# print(df['日期'])

# Extract the month
df['month'] = df['日期'].dt.month
# print(df['month'])

# Count the days of each weather type ('天气') per month
df_agg = df.groupby(['month', '天气']).size().reset_index()
# Rename the columns of df_agg
df_agg.columns = ['month', 'weather', 'count']
# print(df_agg)
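To see what the group-and-count step computes, the same tally can be reproduced in plain Python with `collections.Counter`. This is a stand-in for the pandas `groupby(...).size()` call, not a replacement for it. The rows are made-up sample data, and `month_data` is a hypothetical helper that returns the `[weather, count]` pairs sorted by ascending count.

```python
from collections import Counter

# Made-up (date, weather) rows standing in for the CSV contents.
rows = [
    ('2023-01-01', '多云'),
    ('2023-01-02', '多云'),
    ('2023-01-03', '晴'),
    ('2023-02-01', '小雨'),
    ('2023-02-02', '晴'),
    ('2023-02-03', '晴'),
]

# Count how many days each (month, weather) pair occurs.
# date[5:7] is the two-digit month in a 'YYYY-MM-DD' string.
counts = Counter((int(date[5:7]), weather) for date, weather in rows)


def month_data(month: int) -> list[list]:
    """[[weather, count], ...] for one month, sorted by ascending count."""
    pairs = [[w, n] for (m, w), n in counts.items() if m == month]
    return sorted(pairs, key=lambda pair: pair[1])


print(month_data(1))  # [['晴', 1], ['多云', 2]]
print(month_data(2))  # [['小雨', 1], ['晴', 2]]
```

Each key of `counts` corresponds to one row of `df_agg`, and each value to its `count` column.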

With the data prepared, draw the chart. The code is as follows:

# Plotting
# Timeline container: one chart per month
timeline = Timeline()
# Auto-play interval in milliseconds
timeline.add_schema(play_interval=1000)

# Shape each month's data as [weather, count] pairs, for example:
# [['雾', 1], ['小雨', 3], ['晴', 3], ['阴', 4], ['多云', 20]]
for month in df_agg['month'].unique():
    data = (
        df_agg[df_agg['month'] == month][['weather', 'count']]
        .sort_values(by='count', ascending=True)
        .values.tolist()
    )

    # Draw the bar chart
    bar = Bar()
    # x-axis data: weather names
    bar.add_xaxis([x[0] for x in data])
    # y-axis data: number of days
    bar.add_yaxis('', [x[1] for x in data])
    # Flip the axes so the bars are horizontal
    bar.reversal_axis()
    # Put the count labels to the right of the bars
    bar.set_series_opts(label_opts=opts.LabelOpts(position='right'))
    # Chart title: "Monthly weather in Shenzhen, 2023"
    bar.set_global_opts(title_opts=opts.TitleOpts(title='深圳2023年每月天气变化'))
    # Add the bar chart to the timeline, labelled with the month ('月' means month)
    timeline.add(bar, f'{month}月')

# Save the finished chart as an HTML file
timeline.render('weather.html')

Finally, open weather.html in a browser to see the carousel chart of Shenzhen's monthly weather in 2023.
