Scraping Weather Information from 天气网 (weather.com.cn)

程序猪666

Tags: python, programming languages, pandas, web scraping, json

Article link: https://blog.csdn.net/weixin_47238423/article/details/134979363

https://www.bilibili.com/video/BV19j41157AM/?vd_source=efbd9f2a10893931c2f338f96867a1e9

https://v.douyin.com/i8d6yMFV

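Contact me if you need the complete program code!

This Python program is a web scraper that fetches weather data from 天气网 (weather.com.cn). It uses the `requests` library for HTTP requests, `BeautifulSoup` for HTML parsing, and `json` to load the mapping from city names to their pinyin representations.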

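The main components are described below:

1. **The fetch_weather_data function:**
- Accepts a URL, HTTP request headers, and cookies as parameters and sends a request to the weather site.
- Parses the returned HTML with BeautifulSoup and extracts the weather-related information.
- The extracted data includes the general forecast information, daytime and nighttime details, the hourly forecast data (the page's `hour3data` script variable), life indices, and the weather for surrounding areas and scenic spots.
2. **The get_weather_link function:**
- Builds the URL of the forecast page from the city name.
- Converts the city name to pinyin with the `get_pinyin` function, which reads from a JSON file containing city names and their pinyin representations.
3. **The get_pinyin function:**
- Reads the JSON file (`cities.json`) and returns the pinyin representation of the given city name.
4. **The main execution block:**
- Prompts the user for a city name.
- Builds the forecast URL with `get_weather_link`.
- If a URL is obtained, prints the link and calls `fetch_weather_data` to display the weather information.
- If the city name is not found in the JSON file, tells the user so.
5. **Exception handling:**
- The script includes exception handling that catches and prints any unexpected error raised during execution.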
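
The function described in item 1 prints each value as soon as it is extracted. As one possible way to keep the daytime and nighttime fields together before printing, here is a minimal sketch using a dataclass. `HalfDayForecast`, its field names, and the sample values are hypothetical and are not part of the original script.

```python
from dataclasses import dataclass


@dataclass
class HalfDayForecast:
    """Hypothetical container for one half-day (day or night) of scraped values."""
    temperature: str
    weather: str
    wind_info: str = ""
    wind_level: str = ""
    sun_time: str = ""  # sunrise for the day record, sunset for the night record

    def summary(self, label):
        # Join only the fields that were actually found on the page
        parts = [f"{self.temperature}℃", self.weather,
                 self.wind_info, self.wind_level, self.sun_time]
        return f"{label}: " + " ".join(p for p in parts if p)


# Made-up sample values, for illustration only
day = HalfDayForecast("12", "Sunny", "North wind", "<3", "07:21")
night = HalfDayForecast("3", "Cloudy")

print(day.summary("Day"))
print(night.summary("Night"))
```

Collecting the values first makes it easier to reuse them later, for example to write them to a file instead of printing them.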

Before running the script, make sure the required libraries are installed:
```bash
pip install requests
pip install beautifulsoup4
```

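Also make sure the `cities.json` file exists and is correctly formatted, containing city names and their corresponding pinyin representations. Note that web scraping may be subject to legal and ethical constraints; review and comply with the terms of service of the site you are scraping.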
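
The post does not show the contents of `cities.json`. The sketch below assumes it is a flat JSON object that maps each city name to the string used in the URL. The two sample entries and the names `load_city_map` and `build_weather_link` are hypothetical and only serve as illustration. The sketch returns `None` for an unknown city so that the caller can tell a missing entry apart from a valid link.

```python
import json
import os
import tempfile

BASE_URL = "http://www.weather.com.cn/weather1d/"


def load_city_map(path):
    """Load the city mapping and check that it has the assumed shape."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("cities.json is expected to contain a JSON object")
    return data


def build_weather_link(city_name, city_map):
    """Return the forecast URL, or None when the city is not in the mapping."""
    slug = city_map.get(city_name)
    if not slug:
        return None
    return f"{BASE_URL}{slug}.shtml"


# Hypothetical sample content, written to a temporary file for the demo
sample = {"北京": "beijing", "上海": "shanghai"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    city_map = load_city_map(path)

print(build_weather_link("北京", city_map))
print(build_weather_link("Atlantis", city_map))
```

Loading the file once and passing the mapping around also avoids re-reading `cities.json` on every lookup.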

```python
import json

import requests
from bs4 import BeautifulSoup


def fetch_weather_data(url, headers=None, cookies=None):
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, headers=headers, cookies=cookies, timeout=10)
        response.raise_for_status()  # Raise an exception for HTTP error statuses
    except requests.RequestException as e:
        print(f"Failed to request the page. Error: {e}")
        return

    response.encoding = 'utf-8'

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')

        try:
            hidden_title = soup.select_one('#hidden_title')['value']
            update_time = soup.select_one('#update_time')['value']
            fc_24h_internal_update_time = soup.select_one('#fc_24h_internal_update_time')['value']

            print('Forecast summary:', hidden_title)
            print('Update time:', update_time)
            print('Internal update time:', fc_24h_internal_update_time)

            # Daytime temperature and weather
            day_temperature = soup.select_one('p.tem span').get_text(strip=True)
            day_weather = soup.select_one('li p.wea').get('title')

            # Daytime wind direction and wind level
            day_wind = soup.select_one('p.win span')
            day_wind_info = day_wind.get('title', '') if day_wind is not None else ''
            day_wind_level = day_wind.get_text(strip=True) if day_wind is not None else ''

            # Sunrise time
            sunrise = soup.select_one('p.sunUp span')
            sunrise_time = sunrise.get_text(strip=True) if sunrise is not None else ''

            # Nighttime weather and temperature (second matching element)
            night_weather = soup.select('li p.wea')[1].get('title')
            night_temperature = soup.select('li p.tem span')[1].get_text(strip=True)

            # Nighttime wind direction and wind level; check the length first so a
            # missing second element yields '' instead of raising IndexError
            wind_spans = soup.select('li p.win span')
            night_wind = wind_spans[1] if len(wind_spans) > 1 else None
            night_wind_info = night_wind.get('title', '') if night_wind is not None else ''
            night_wind_level = night_wind.get_text(strip=True) if night_wind is not None else ''

            # Sunset time
            sunset = soup.select_one('p.sunDown span')
            sunset_time = sunset.get_text(strip=True) if sunset is not None else ''

            print('Day:', day_temperature + '℃', day_weather, day_wind_info, day_wind_level, 'Sunrise:', sunrise_time)
            print('Night:', night_temperature + '℃', night_weather, night_wind_info, night_wind_level, 'Sunset:', sunset_time)

            # Find the script that defines hour3data
            hourly_forecast_script = soup.find('script', string=lambda x: x and 'var hour3data' in x)

            if hourly_forecast_script:
                # If the script is found, take the text after the assignment
                hourly_forecast_data = hourly_forecast_script.string.split('var hour3data=')[-1]
                print(f"\nHourly Forecast Data: {hourly_forecast_data}")
            else:
                print("No script containing hour3data was found")

            # Life indices
            life_indices = soup.select('.livezs ul li')
            print("\nLife indices:")
            for index in life_indices:
                category = index.select_one('em').get_text(strip=True)
                level_tag = index.select_one('span')
                level = level_tag.get_text(strip=True) if level_tag else 'N/A'
                suggestion = index.select_one('p').get_text(strip=True)
                print(f"{category}: {level} {suggestion}\n")

            # Weather in surrounding areas
            print("\nWeather in surrounding areas:")
            around_city_items = soup.select('.city li')
            for item in around_city_items:
                city_name = item.select_one('span').get_text(strip=True)
                temperature = item.select_one('i').get_text(strip=True)
                print(f"{city_name}: {temperature}")

            # Weather at surrounding scenic spots
            print("\nWeather at surrounding scenic spots:")
            around_view_items = soup.select('.view li')
            for item in around_view_items:
                view_name = item.select_one('span').get_text(strip=True)
                temperature = item.select_one('i').get_text(strip=True)
                print(f"{view_name}: {temperature}")

        # TypeError covers subscripting None when select_one finds nothing
        except (AttributeError, KeyError, IndexError, TypeError) as e:
            print(f"Error while parsing the page: {e}")
            return

    else:
        print(f"Failed to request the page. Status code: {response.status_code}")


def get_weather_link(city_name):
    base_url = 'http://www.weather.com.cn/weather1d/'
    pinyin_city_name = get_pinyin(city_name)
    if not pinyin_city_name:
        # Unknown city: return None so the caller can report it
        return None
    return f'{base_url}{pinyin_city_name}.shtml'


def get_pinyin(city_name):
    with open('cities.json', 'r', encoding='utf-8') as json_file:
        city_dict = json.load(json_file)
    return city_dict.get(city_name, '')


if __name__ == "__main__":
    try:
        user_city = input("Enter a city name: ")
        weather_link = get_weather_link(user_city)

        if weather_link:
            print(f"Forecast link for {user_city}: {weather_link}")
            fetch_weather_data(weather_link)
        else:
            print("No pinyin found for this city")

    except Exception as e:
        print(f"Unhandled exception: {e}")
        # The exception could be logged or handled in some other way here
```
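
The script prints whatever follows `var hour3data=` as raw text. If that value is a JSON literal, which is an assumption here and not something the post confirms, it could be turned into Python objects with the standard `json` module. The sketch below uses `json.JSONDecoder.raw_decode`, which stops at the end of the first JSON value and so ignores any JavaScript that follows it. The function name `extract_js_json` and the sample script text are made up for illustration.

```python
import json


def extract_js_json(script_text, var_name="hour3data"):
    """Return the JSON value assigned to `var <var_name>`, or None if absent or invalid."""
    marker = f"var {var_name}"
    start = script_text.find(marker)
    if start == -1:
        return None
    eq = script_text.find("=", start + len(marker))
    if eq == -1:
        return None
    # raw_decode needs the text to begin with the JSON value, so strip leading spaces
    rest = script_text[eq + 1:].lstrip()
    try:
        value, _end = json.JSONDecoder().raw_decode(rest)
    except json.JSONDecodeError:
        return None
    return value


# Made-up script text; the real page's structure may differ
sample_script = 'var hour3data={"1d":["08日08时,d01,多云,3℃"],"7d":[]};var other=1;'
data = extract_js_json(sample_script)
print(data["1d"][0])
```

In the script above, the text passed in would be `hourly_forecast_script.string`.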
