Scraping the National Meteorological Center (中央气象台) Typhoon Network with Python

qq_1548357515

Article link: https://blog.csdn.net/qq_33239778/article/details/114029058

Contact for technical discussion: 1548357515

Fetching the typhoon names for a given year

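Request URL:
http://typhoon.nmc.cn/weatherservice/typhoon/jsons/list_2019?t=1614158017458&callback=typhoon_jsons_list_2019
t: a 13-digit (millisecond) timestamp; callback: typhoon_jsons_list_ + the year.
If the Chinese text in the response looks garbled when you view it directly, don't worry: it comes out fine in the code.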
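The URL can be assembled from the two rules above. The sketch below assumes nothing beyond them; `list_url` is a hypothetical helper name, not part of the original code. Passing the timestamp from the example reproduces the example URL exactly.

```python
import time


def list_url(year, t=None):
    """Build the typhoon-list URL for one year (hypothetical helper)."""
    if t is None:
        # 13-digit timestamp: milliseconds since the Unix epoch
        t = int(round(time.time() * 1000))
    # The callback name is typhoon_jsons_list_ followed by the same year as the path
    return ('http://typhoon.nmc.cn/weatherservice/typhoon/jsons/list_%s'
            '?t=%s&callback=typhoon_jsons_list_%s' % (year, t, year))


print(list_url(2019, t=1614158017458))
```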

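```python
import json
import re

import requests


# Fetch all typhoons for one year
def get_html(url):
    # The original called the author's helpers headers(url) and proxys(), which are not shown;
    # a plain User-Agent header stands in for them here (an assumption)
    html_obj = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10, verify=False).text
    # The body is JSONP, e.g. typhoon_jsons_list_2019({...}); keep only the {...} part
    data = json.loads(re.match(r".*?({.*}).*", html_obj, re.S).group(1))['typhoonList']
    item_list = []
    for v in data:
        item = {}
        item['id'] = v[0]
        item['name'] = '%s%s%s' % (v[4], v[2], v[1])
        item['dec'] = '%s' % v[6]
        item_list.append(item)
    return item_list
```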
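The response body is JSONP: the JSON object wrapped in a call to the function named in `callback`. The regex above keeps everything from the first `{` to the last `}`. As a possible alternative, here is a minimal sketch that strips the wrapper explicitly, assuming the body has the form `name(...)`, optionally followed by `;`. `unwrap_jsonp` is a hypothetical name and the sample string is made up for illustration, not real server output.

```python
import json


def unwrap_jsonp(text):
    """Parse a JSONP body such as 'callback({...})' into a Python object (hypothetical helper).

    Everything between the first '(' and the last ')' is treated as the JSON payload.
    """
    start = text.index('(') + 1
    end = text.rindex(')')
    return json.loads(text[start:end])


sample = 'typhoon_jsons_list_2019({"typhoonList": []})'  # made-up sample
print(unwrap_jsonp(sample))  # {'typhoonList': []}
```

Unlike the regex, this raises a `ValueError` when the parentheses are missing instead of an `AttributeError` on `None`.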

Fetching the details of a single typhoon

Request URL:
http://typhoon.nmc.cn/weatherservice/typhoon/jsons/view_2540445?t=1614158670558&callback=typhoon_jsons_view_2540445
t: a 13-digit (millisecond) timestamp; callback: typhoon_jsons_view_ + the typhoon's id.
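The detail URL follows the same scheme. A sketch under the same assumptions, with `view_url` as a hypothetical helper; deriving the callback from the id keeps it consistent with the path.

```python
import time


def view_url(typhoon_id, t=None):
    """Build the detail URL for one typhoon (hypothetical helper)."""
    if t is None:
        t = int(round(time.time() * 1000))  # 13-digit millisecond timestamp
    # Both the path and the callback name end with the typhoon's id
    return ('http://typhoon.nmc.cn/weatherservice/typhoon/jsons/view_%s'
            '?t=%s&callback=typhoon_jsons_view_%s' % (typhoon_id, t, typhoon_id))


print(view_url(2540445, t=1614158670558))
```

With the example timestamp this prints the example URL above.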

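```python
import json
import re
import time
from collections import defaultdict
from datetime import datetime

import pandas as pd
import requests


def millisecond_to_time(ms):
    # Not shown in the original post; assumed to turn a 13-digit millisecond timestamp into local time
    return datetime.fromtimestamp(int(ms) / 1000).strftime('%Y-%m-%d %H:%M:%S')


def get_xiang(item):
    print("Fetching typhoon %s, id: %s" % (item['name'], item['id']))
    t = int(round(time.time() * 1000))  # 13-digit millisecond timestamp
    # Build the callback from this typhoon's id, as described above (the original hard-coded 2297801)
    url = ('http://typhoon.nmc.cn/weatherservice/typhoon/jsons/view_%s?t=%s&callback=typhoon_jsons_view_%s'
           % (item['id'], t, item['id']))
    html_obj = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10, verify=False).text
    data = json.loads(re.match(r".*?({.*}).*", html_obj, re.S).group(1))['typhoon']
    # Column name -> list of values, one entry per track point in data[8]
    info_dicts = defaultdict(list)
    for v in data[8]:
        info_dicts['id'].append(item['id'])
        info_dicts['name'].append(item['name'])
        info_dicts['desc'].append(item['dec'])
        # Time (时间): millisecond timestamp converted to a date string
        info_dicts['时间'].append(millisecond_to_time(v[2]))
        # Wind speed (风速)
        info_dicts['风速'].append('%sm/s' % v[7])
        # Direction of movement (移向): N/E/S/W -> 北/东/南/西
        yi = '%s' % v[8]
        info_dicts['移向'].append(yi.replace('N', '北').replace('E', '东').replace('S', '南').replace('W', '西'))
        # Intensity (强度)
        info_dicts['强度'].append(get_type(v[3]))
        # Center position (中心位置): latitude N / longitude E
        info_dicts['中心位置'].append('%sN/%sE' % (v[5], v[4]))
        # Central pressure (中心气压) in hPa (百帕)
        info_dicts['中心气压'].append('%s百帕' % v[6])
    return pd.DataFrame(info_dicts)
```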

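```python
# Intensity category: code -> Chinese name (TC tropical cyclone, TD tropical depression,
# TS tropical storm, STS severe tropical storm, TY typhoon, STY severe typhoon, SuperTY super typhoon)
def get_type(date_type):
    item = {'TC': '热带气旋', 'TD': '热带低压', 'TS': '热带风暴', 'STS': '强热带风暴',
            'TY': '台风', 'STY': '强台风', 'SuperTY': '超强台风', '': ''}
    return item.get(date_type, '')
```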