I. Data Collection
First locate the data source, then collect the data from it.
Data source: Sina real-time epidemic tracker
URL: https://news.sina.cn/zt_d/yiqing0121
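Collection steps:
1. Visit https://news.sina.cn/zt_d/yiqing0121
2. Open Chrome DevTools, switch to the Network tab, refresh the page, and click through the requests until you find the one that returns the JSON data.
For example (screenshot attached):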
https://interface.sina.cn/news/wap/fymap2020_data.d.json?1581410367084&&callback=sinajp_15814103671094932140955446096
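The returned data is shown below.
In the city entries, cureNum is the number of recovered cases and deathNum is the number of deaths. value is the number of confirmed cases.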
- data: {times: "截至2月11日16时31分", mtime: "2020-02-11 16:31:00", cachetime: "2020-02-11 16:39:18",…}
  - times: "截至2月11日16时31分"
  - mtime: "2020-02-11 16:31:00"
  - cachetime: "2020-02-11 16:39:18"
  - gntotal: "42708"
  - deathtotal: "1017"
  - sustotal: "21675"
  - curetotal: "3998"
  - list: [{name: "北京", ename: "beijing", value: "342", susNum: "0", deathNum: "3", cureNum: "48",…},…]
    - 0: {name: "北京", ename: "beijing", value: "342", susNum: "0", deathNum: "3", cureNum: "48",…}
    - 1: {name: "湖北", ename: "hubei", value: "31728", susNum: "0", deathNum: "974", cureNum: "2258",…}
    - 2: {name: "广东", ename: "guangdong", value: "1177", susNum: "148", deathNum: "1", cureNum: "209",…}
    - 3: {name: "浙江", ename: "zhejiang", value: "1117", susNum: "0", deathNum: "0", cureNum: "257",…}
    - 4: {name: "河南", ename: "henan", value: "1105", susNum: "0", deathNum: "7", cureNum: "209",…}
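Note that every count in the response is a string, not a number, so it has to be converted before doing any arithmetic. A minimal sketch, using a fragment hand-copied from the response above; the helper name to_totals and the output keys are made up for illustration and are not part of the Sina interface:

```python
# Fragment copied from the response shown above (only the fields used here).
sample = {
    "data": {
        "times": "截至2月11日16时31分",
        "gntotal": "42708",
        "deathtotal": "1017",
        "sustotal": "21675",
        "curetotal": "3998",
    }
}


def to_totals(payload):
    """Convert the nationwide string counters to integers (hypothetical helper)."""
    data = payload["data"]
    return {
        "confirmed": int(data["gntotal"]),
        "suspected": int(data["sustotal"]),
        "deaths": int(data["deathtotal"]),
        "cured": int(data["curetotal"]),
    }


totals = to_totals(sample)
print(totals)
```

With integers in hand, sums and ratios across provinces can be computed directly.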
II. Data Processing
The following modules are used:
1. requests, for fetching the page over HTTP
2. json, for parsing the returned data
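III. Data Processing Code
Note: the response is JSONP, meaning the JSON is wrapped in a JavaScript callback call (the name given in the callback parameter of the URL). Use split to cut out the text between the parentheses, then convert it to a dict with Python's built-in json.loads.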
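The unwrapping step can be tried offline on a sample string. The sketch below is one possible variant, not the method used in this post: instead of split it uses json.JSONDecoder.raw_decode from the standard library, which parses one JSON value and ignores whatever follows it, so parentheses inside string values do not break it. The function name strip_jsonp and the sample text (including the callback name sinajp_123 and the note field) are made up for illustration.

```python
import json


def strip_jsonp(text):
    """Parse the JSON value that follows the first "(" in a JSONP response."""
    # Everything after the first "(" starts with the JSON payload.
    body = text[text.index("(") + 1:].lstrip()
    # raw_decode returns (object, end_index) and tolerates trailing text such as ");".
    obj, _end = json.JSONDecoder().raw_decode(body)
    return obj


# Made-up response text in the general shape callback({...});
sample = 'sinajp_123({"data": {"gntotal": "42708", "note": "a (b)"}});'
payload = strip_jsonp(sample)
print(payload["data"]["gntotal"])
```

If the text contains no opening parenthesis, text.index raises ValueError, which makes a malformed response fail loudly rather than silently.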
Code:
import json

import requests


def get_Data():
    url = "https://interface.sina.cn/news/wap/fymap2020_data.d.json?1581410367084&&callback=sinajp_15814103671094932140955446096"
    # Fetch the response body as text
    result = requests.get(url).text
    # Keep the text between the first "(" and the next ")";
    # this assumes the JSON itself contains no parentheses
    json_str = result.split('(')[1].split(')')[0]
    # Decode the JSON string into a Python object
    json_data = json.loads(json_str)
    return json_data


def print_Data():
    json_data = get_Data()
    stime = json_data['data']['times']
    gntotal = json_data['data']['gntotal']
    sustotal = json_data['data']['sustotal']
    deathtotal = json_data['data']['deathtotal']
    curetotal = json_data['data']['curetotal']
    print("Time: %s, nationwide confirmed: %s, suspected: %s, deaths: %s, recovered: %s"
          % (stime, gntotal, sustotal, deathtotal, curetotal))

    # Loop over the provinces and print each one
    for province in json_data['data']['list']:
        pname = province['name']
        pvalue = province['value']
        pdeathNum = province['deathNum']
        pcureNum = province['cureNum']
        print("[%s] confirmed: %s, deaths: %s, recovered: %s"
              % (pname, pvalue, pdeathNum, pcureNum))

        # Loop over the cities of this province and print each one
        for city in province['city']:
            cname = city['mapName']
            conNum = city['conNum']
            # Read the city's own counts, not the province's
            cdeathNum = city['deathNum']
            ccureNum = city['cureNum']
            print("--%s %s, confirmed: %s, deaths: %s, recovered: %s"
                  % (pname, cname, conNum, cdeathNum, ccureNum))


if __name__ == "__main__":
    print_Data()
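If the data is needed for further processing rather than just printing, the same nested loops can collect rows instead. A rough sketch under stated assumptions: the function name flatten is hypothetical, the field names (name, city, mapName, conNum, deathNum, cureNum) are the ones the script above reads, and the sample payload, including the city names and every number in it, is invented placeholder data rather than real figures.

```python
def flatten(payload):
    """Turn the province/city tree into (province, city, confirmed, deaths, cured) rows."""
    rows = []
    for province in payload["data"]["list"]:
        # Treat a missing "city" key as a province without city-level data.
        for city in province.get("city", []):
            rows.append((
                province["name"],
                city["mapName"],
                int(city["conNum"]),
                int(city["deathNum"]),
                int(city["cureNum"]),
            ))
    return rows


# Placeholder payload: same shape as the script expects, invented values.
sample = {
    "data": {
        "list": [
            {"name": "ProvinceA", "city": [
                {"mapName": "CityA1", "conNum": "10", "deathNum": "1", "cureNum": "2"},
                {"mapName": "CityA2", "conNum": "30", "deathNum": "0", "cureNum": "5"},
            ]},
            {"name": "ProvinceB", "city": []},
        ]
    }
}

rows = flatten(sample)
# Sort by confirmed count, largest first.
rows.sort(key=lambda row: row[2], reverse=True)
for row in rows:
    print(row)
```

A flat list of tuples like this can be sorted, filtered, or written out with the csv module without touching the fetching code.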