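1. Introduction
This article follows on from "Python Crawler: Scraping Weather Information (1)" and shows how to scrape today's weather information for a given region.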
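2. Request details
(1) As shown in the figure below, we first locate the response that carries today's weather information.
(2) The corresponding request headers are as follows:
Note that the request URL closely resembles the one used in Chapter 3 to scrape live weather, so take care to tell the two apart.
Today's weather URL: http://d1.weather.com.cn/dingzhi/101190101.html?_=1687251340643
Live weather URL: http://d1.weather.com.cn/sk_2d/101190101.html?_=1687251340642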
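Because the two URLs differ only in one path segment, a small helper can make the difference explicit. The following is a minimal sketch rather than part of the original spider: the function name `build_weather_url` is hypothetical, and it assumes the `_` query parameter is a millisecond timestamp used as a cache-buster (the values above are consistent with that, but the site does not document it).

```python
import time

BASE = "http://d1.weather.com.cn"

# Path segments taken from the two URLs above
KINDS = ("dingzhi", "sk_2d")  # today's weather, live weather


def build_weather_url(kind, area_id, ts_ms=None):
    """Build a request URL for the given endpoint kind and area ID.

    ts_ms fills the `_` query parameter. Assumption: it is a millisecond
    timestamp; when omitted, the current time is used.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return f"{BASE}/{kind}/{area_id}.html?_={ts_ms}"


if __name__ == "__main__":
    print(build_weather_url("dingzhi", "101190101", 1687251340643))
    print(build_weather_url("sk_2d", "101190101", 1687251340642))
```

The spider in the next section requests the `dingzhi` URL without the `_` parameter, so the parameter appears to be optional.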
3. Writing the spider
(1) Write the spider that scrapes today's weather, dingzhi_weather_spider.py:
'''
Scrape today's weather
'''
import datetime
import json
import re

import requests

# Request headers: the Referer and User-Agent sent with every request
UA = {
    'Referer': 'http://www.weather.com.cn/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43'
}


class GetDingZhiWeather():
    @staticmethod
    def get_dingzhi_weather(area_id):
        # Request URL
        URL = f'http://d1.weather.com.cn/dingzhi/{area_id}.html'
        # Send the request; the timeout keeps a stalled connection from hanging forever
        req = requests.get(URL, headers=UA, timeout=10)
        if req.status_code == 200:
            # The response body is UTF-8 encoded
            req.encoding = 'utf-8'
            # Current date
            today = datetime.date.today()
            # Match today's weather information
            dingzhi_weather = re.search(r'(\{"city".*?\})', req.text)
            # Match the weather alert information
            alarm_weather = re.search(r'(\{"w1".*?\})', req.text)
            # Today's weather information
            weather_info = ''
            # Weather alert information
            alarm_info = ''
            if dingzhi_weather:
                # Convert the JSON string into the corresponding Python object
                weather_json = json.loads(dingzhi_weather.group())
                weather_info = (
                    f"\nDate: {today}\n"
                    f"Region: {weather_json['cityname']}\n"
                    f"Weather today: {weather_json['weather']}\n"
                    f"High temperature: {weather_json['temp']}\n"
                    f"Low temperature: {weather_json['tempn']}\n"
                    f"Wind direction: {weather_json['wd']}\n"
                    f"Wind force: {weather_json['ws']}\n"
                )
            if alarm_weather:
                # Convert the JSON string into the corresponding Python object
                alarm_json = json.loads(alarm_weather.group())
                alarm_info = (
                    f"\nAlert region: {alarm_json['w1']}\n"
                    f"Alert type: {alarm_json['w13']}\n"
                    f"Issued at: {alarm_json['w8']}\n"
                    f"Alert details: {alarm_json['w9']}\n"
                )
            return weather_info + alarm_info
        else:
            return "Data request failed"
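The spider's two regular expressions rely on non-greedy matching (`.*?`) to cut one flat JSON object out of a response that contains more than one. The sketch below illustrates that step offline. Everything in it is illustrative: `SAMPLE` is made-up text shaped like what the regexes expect, not real data from the site, and `extract_object` is a hypothetical helper, not part of the spider.

```python
import json
import re

# Made-up sample shaped like what the spider's regexes expect (NOT real data)
SAMPLE = (
    'var cityDZ101190101 ={"weatherinfo":{"city":"101190101","cityname":"Nanjing",'
    '"temp":"30","tempn":"22","weather":"Cloudy","wd":"East","ws":"<3"}};'
    'var alarmDZ101190101 ={"w":[{"w1":"Jiangsu","w8":"2023-06-20 10:00",'
    '"w9":"Sample alert text","w13":"Sample alert type"}]}'
)


def extract_object(text, first_key):
    """Return the first flat JSON object whose first key is first_key, or None.

    The non-greedy `.*?` stops at the first closing brace, so this only works
    for objects that contain no nested braces.
    """
    match = re.search(r'\{"' + re.escape(first_key) + r'".*?\}', text)
    return json.loads(match.group()) if match else None


if __name__ == "__main__":
    weather = extract_object(SAMPLE, "city")
    alarm = extract_object(SAMPLE, "w1")
    print(weather["cityname"], weather["weather"])
    print(alarm["w13"])

    # A greedy pattern runs on to the last closing brace and is no longer valid JSON
    greedy = re.search(r'\{"city".*\}', SAMPLE).group()
    try:
        json.loads(greedy)
    except json.JSONDecodeError as exc:
        print("greedy match is not valid JSON:", exc.msg)
```

One consequence of this design is that a field value containing `}` would cut the match short, so the approach depends on the payload staying flat.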
(2) Write the test code, dingzhi_weather_test.py:
from spider.dingzhi_weather_spider import GetDingZhiWeather

if __name__ == '__main__':
    # Call get_dingzhi_weather to fetch today's weather for area ID 101130501
    weather = GetDingZhiWeather.get_dingzhi_weather(101130501)
    print(weather)
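4. Testing and verification
(1) Run the test code dingzhi_weather_test.py and check the console output:
As you can see, today's weather for area ID 101130501 is printed as expected.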