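The pneumonia outbreak is now at the center of everyone's attention, and every change in the confirmed-case numbers weighs on people's minds.
This time, we'll use a Python crawler to collect the outbreak data. In the next post, I'll use this data to build a visualization.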
How to get the code:
Follow the WeChat official account “python趣味爱好者” and reply “爬取疫情” to get the complete source code.
Official account ID: aa27388473332.
We save the collected data as CSV so that it is ready for the visualization in the next post.
First, the URL we need to request is:
# Request URL
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&callback=&_=%d'
The response from this URL contains the data we need. Once we fetch it, we only have to pick out the relevant fields.
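To make the fetch step concrete, here is a minimal sketch using only the standard library. It rests on two assumptions that the article does not state: that the `%d` placeholder is meant to be filled with a timestamp, and that the response is a JSON object whose `data` field is itself a JSON string. The helper names `build_url`, `parse_payload`, and `fetch` are made up for illustration, and the sample payload is invented so the example runs offline.

```python
import json
import time
import urllib.request

URL = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&callback=&_=%d'


def build_url(now=None):
    """Fill the %d placeholder with a millisecond timestamp (assumed purpose)."""
    now = time.time() if now is None else now
    return URL % int(now * 1000)


def parse_payload(text):
    """Decode the response body and return the inner data object."""
    outer = json.loads(text)
    data = outer['data']
    # Assumption: 'data' is a JSON string inside JSON; accept a plain object too.
    return json.loads(data) if isinstance(data, str) else data


def fetch(timeout=10):
    """Download and decode the payload (needs network; the endpoint may no longer respond)."""
    with urllib.request.urlopen(build_url(), timeout=timeout) as resp:
        return parse_payload(resp.read().decode('utf-8'))


# Offline demonstration with an invented payload of the assumed shape.
inner = {'areaTree': [{'name': '中国', 'children': []}]}
sample = json.dumps({'data': json.dumps(inner)})
data = parse_payload(sample)
areaTree = data['areaTree']
print(areaTree[0]['name'])
```

With network access, `data = fetch()` would take the place of the offline sample. If the real response has a different shape, only `parse_payload` needs to change.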
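First, create the containers that will hold the scraped data. The column names are, in order: province, city, newly confirmed, total confirmed, deaths, recovered, death rate, recovery rate.
import pandas as pd
# Create empty DataFrames (city-level and province-level)
col_names = ['省', '市', '新增确诊', '累计确诊', '死亡', '治愈', '死亡率', '治愈率']
col_names_p = ['省', '新增确诊', '累计确诊', '死亡', '治愈', '死亡率', '治愈率']
my_df = pd.DataFrame(columns=col_names)
my_df_p = pd.DataFrame(columns=col_names_p)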
Next, we use a for loop to walk the region tree (`areaTree`) and pull out the fields we need.
for item in areaTree:
    if item['name'] == '中国':  # '中国' = China
        item_ps = item['children']
        # Iterate over province-level data
        for item_p in item_ps:
            province = item_p['name']
            # print(province)
            # print(item_p['total'])
            confirm = item_p['total']['confirm']
            death = item_p['total']['dead']
            heal = item_p['total']['heal']
            new_confirm = item_p['today']['confirm']
            deadRate = item_p['total']['deadRate']
            healRate = item_p['total']['healRate']
            # Append a row to the province-level DataFrame
            data_dict = {'省': province, '新增确诊': new_confirm, '累计确诊': confirm,
                         '死亡': death, '治愈': heal, '死亡率': deadRate, '治愈率': healRate}
            # print(data_dict)
            my_df_p.loc[len(my_df_p)] = data_dict
            # Iterate over prefecture-level data
            item_cs = item_p['children']
            for item_c in item_cs:
                prefecture = item_c['name']
                # print(' ' + prefecture)
                # print(' ' + str(item_c['total']))
                new_confirm = item_c['today']['confirm']
                confirm = item_c['total']['confirm']
                # suspect = item_c['total']['suspect']
                death = item_c['total']['dead']
                heal = item_c['total']['heal']
                deadRate = item_c['total']['deadRate']
                healRate = item_c['total']['healRate']
                # Append a row to the city-level DataFrame
                data_dict = {'省': province, '市': prefecture, '新增确诊': new_confirm, '累计确诊': confirm,
                             '死亡': death, '治愈': heal, '死亡率': deadRate, '治愈率': healRate}
                my_df.loc[len(my_df)] = data_dict
This loop walks through the data for every province and city.
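The article stores the data as CSV but does not show that step, so here is a minimal sketch of one way to do it. It uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies. The helper name `rows_to_csv`, the file name `city_data.csv`, and the sample row (whose numbers are invented placeholders) are all assumptions for illustration.

```python
import csv
import io

COL_NAMES = ['省', '市', '新增确诊', '累计确诊', '死亡', '治愈', '死亡率', '治愈率']


def rows_to_csv(rows, fieldnames, out):
    """Write a list of row dicts to an open text file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row with invented values, only to show the layout.
rows = [{'省': '湖北', '市': '武汉', '新增确诊': 0, '累计确诊': 100,
         '死亡': 1, '治愈': 50, '死亡率': '1.00', '治愈率': '50.00'}]

# Write into an in-memory buffer so the example leaves no file behind.
buffer = io.StringIO()
rows_to_csv(rows, COL_NAMES, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (hypothetical file name):
# with open('city_data.csv', 'w', newline='', encoding='utf-8-sig') as f:
#     rows_to_csv(rows, COL_NAMES, f)
```

With the DataFrames used in this post, the pandas equivalent would be along the lines of `my_df.to_csv('city_data.csv', index=False, encoding='utf-8-sig')`. The `utf-8-sig` encoding adds a byte-order mark, which helps Excel display the Chinese column names correctly.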
Then we save the results. We can also collect historical data; for that, the for loop needs to include the date. The code is as follows:
for day_item in china_day_list:
    # The source date carries no year, so append it
    date = day_item['date'] + '.2020'
    confirm = day_item['confirm']
    suspect = day_item['suspect']
    dead = day_item['dead']
    heal = day_item['heal']
    nowConfirm = day_item['nowConfirm']
    nowSevere = day_item['nowSevere']
    deadRate = day_item['deadRate']
    healRate = day_item['healRate']
It works on the same principle as the loop over provinces and cities.
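The loop above reads each field into a variable but stops short of collecting the rows. As a rough sketch of how the daily records might be gathered with a sortable date, the snippet below converts each date to ISO format. It assumes that `date` arrives as `MM.DD` without a year, which is what appending `'.2020'` suggests. The helper name `normalize_day` and the sample records (with invented numbers and only a few of the fields) are for illustration only.

```python
from datetime import datetime


def normalize_day(day_item, year=2020):
    """Return a copy of one daily record with 'date' converted to ISO format."""
    # Assumption: 'date' looks like 'MM.DD' and carries no year.
    parsed = datetime.strptime('%s.%d' % (day_item['date'], year), '%m.%d.%Y')
    row = dict(day_item)  # copy, so the original record is left untouched
    row['date'] = parsed.date().isoformat()
    return row


# Invented sample records, shaped like the fields read in the loop above.
china_day_list = [
    {'date': '01.20', 'confirm': 10, 'suspect': 5, 'dead': 1, 'heal': 2},
    {'date': '01.21', 'confirm': 20, 'suspect': 8, 'dead': 2, 'heal': 3},
]

history = [normalize_day(item) for item in china_day_list]
for row in history:
    print(row['date'], row['confirm'])
```

ISO dates sort correctly as plain strings, which is convenient when the history is later plotted as a time series.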
Reference: https://github.com/dakula009/China_CoronaVirus_Data_Miner