Bonus 3: Scraping Large Amounts of Data with Python Regular Expressions


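Hi everyone, this is 天空之城. Today's Bonus 3 shows how to scrape a large amount of data with Python's regular expressions. It is quick and gets the job done!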

import re
import requests

# Browser-like headers so the request looks like a normal page visit
headers = {
    'Referer': 'http://www.voice.baidu.com/',
    'Origin': 'http://www.voice.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
}

url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner&city=%E7%BE%8E%E5%9B%BD-%E7%BE%8E%E5%9B%BD'
res = requests.get(url=url, headers=headers, timeout=10).text
# Non-greedy capture of everything between "city":" and ","cityCode"
result = re.findall(r'"city":"(.*?)","cityCode"', res)
# print(result)
for i in result:
    # Turn the \uXXXX escape sequences in the captured text back into characters
    am = bytes(i, 'utf-8')
    print(am.decode('unicode-escape'))

Screenshot of the scraped data:
[image: screenshot of the scraped data]
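To see what the regex and the `unicode-escape` step do without hitting the network, here is a minimal sketch. The `sample` string and the `extract_cities` helper are made up for illustration; the real page embeds a much larger payload whose exact layout may differ.

```python
import re

# Hypothetical sample, loosely shaped like the data the script searches through.
# Raw string: the \uXXXX sequences stay as literal backslash escapes.
sample = r'{"city":"\u7f8e\u56fd","cityCode":"0"},{"city":"\u65e5\u672c","cityCode":"1"}'


def extract_cities(text):
    """Capture the text between "city":" and ","cityCode", then decode \\uXXXX escapes."""
    raw = re.findall(r'"city":"(.*?)","cityCode"', text)
    # Note: this decode trick is only safe when the captured text is plain
    # ASCII plus \uXXXX escapes, which is what the sample assumes.
    return [s.encode('utf-8').decode('unicode-escape') for s in raw]


print(extract_cities(sample))  # -> ['美国', '日本']
```

The non-greedy `(.*?)` matters here: a greedy `(.*)` would run from the first `"city":"` all the way to the last `","cityCode"` and return one huge match.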
Processing the result a bit further:

import re
import requests

headers = {
    'Referer': 'http://www.voice.baidu.com/',
    'Origin': 'http://www.voice.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
}

url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner&city=%E7%BE%8E%E5%9B%BD-%E7%BE%8E%E5%9B%BD'
res = requests.get(url=url, headers=headers, timeout=10).text
result = re.findall(r'"city":"(.*?)","cityCode"', res)
# print(result)
for i in result:
    am = bytes(i, 'utf-8')
    # print(am.decode('unicode-escape'))
    amn = am.decode('unicode-escape')

    # Replace the English field names with Chinese labels.
    # Longer names come before their prefixes ("confirmedRelative" before
    # "confirmed", "asymptomaticRelative" before "asymptomatic").
    ams = (amn.replace("crued", "治愈")
              .replace("confirmedRelative", "确诊相关")
              .replace("died", "死亡")
              .replace("confirmed", "确诊")
              .replace("asymptomaticRelative", "无症状相关")
              .replace("nativeRelative", "本土相关")
              .replace("curConfirm", "确诊治愈")
              .replace("asymptomatic", "无症状"))

    print(ams)

Screenshot of the processed data:
[image: screenshot of the processed data]
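The chained `.replace()` calls depend on their order: if a short name such as `confirmed` is replaced before `confirmedRelative`, the longer name never matches. One way to sidestep that, sketched here as an alternative and not as what the script above does, is a single regex pass over a lookup table. The `LABELS` dict reuses the labels from the script; `relabel` is a hypothetical helper name.

```python
import re

# Field name -> Chinese label, taken from the replace chain in the script.
LABELS = {
    "crued": "治愈",
    "confirmedRelative": "确诊相关",
    "died": "死亡",
    "confirmed": "确诊",
    "asymptomaticRelative": "无症状相关",
    "nativeRelative": "本土相关",
    "curConfirm": "确诊治愈",
    "asymptomatic": "无症状",
}

# Alternation tries options left to right, so sort longest-first to make
# "confirmedRelative" win over its prefix "confirmed".
_PATTERN = re.compile(
    "|".join(sorted((re.escape(k) for k in LABELS), key=len, reverse=True))
)


def relabel(text):
    """Replace every known field name in one pass."""
    return _PATTERN.sub(lambda m: LABELS[m.group(0)], text)


print(relabel('"confirmed":"100","confirmedRelative":"5","died":"3"'))
# -> "确诊":"100","确诊相关":"5","死亡":"3"
```

Because each position in the text is replaced at most once, the order in which entries appear in `LABELS` no longer matters.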

Next, let's grab the epidemic data for other countries:

import re
import requests

headers = {
    'Referer': 'http://www.voice.baidu.com/',
    'Origin': 'http://www.voice.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
}

url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner&city=%E7%BE%8E%E5%9B%BD-%E7%BE%8E%E5%9B%BD'
res = requests.get(url=url, headers=headers, timeout=10).text
# This time the capture ends at "diedPercent" instead of "cityCode"
result = re.findall(r'"city":"(.*?)","diedPercent"', res)
# print(result)
for i in result:
    am = bytes(i, 'utf-8')
    # print(am.decode('unicode-escape'))
    amn = am.decode('unicode-escape')
    # print(amn)

    # "diedPercent" must be replaced before "died"; otherwise "died" is
    # rewritten first and "diedPercent" never matches.
    ams = (amn.replace("diedPercent", "死亡率")
              .replace("died", "死亡")
              .replace("crued", "治愈")
              .replace("confirmedRelative", "确诊相关")
              .replace("confirmed", "确诊")
              .replace("curedPercent", "治愈率")
              .replace("curConfirm", "确诊治愈"))

    print(ams)

Screenshot of the overseas epidemic data:
[image: screenshot of the overseas epidemic data]
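Since each captured fragment runs from a `"city"` value through several more `"key":"value"` pairs, it can sometimes be turned back into a real dictionary instead of a relabeled string. The sketch below assumes the fragment contains only flat, string-valued pairs, which may not hold for the real page; `fragment_to_dict` and the `sample` fragment are hypothetical and only for illustration.

```python
import json


def fragment_to_dict(fragment):
    """Wrap a captured fragment so it forms a JSON object, then parse it.

    json.loads also decodes the \\uXXXX escapes, so no separate
    unicode-escape step is needed. Raises json.JSONDecodeError if the
    fragment is not shaped as assumed (for example, if it is cut off
    inside a nested structure).
    """
    return json.loads('{"city":"' + fragment + '"}')


# Hypothetical fragment, shaped like what the regex capture might return.
sample = r'\u7f8e\u56fd","confirmed":"100","died":"5","crued":"20'

record = fragment_to_dict(sample)
print(record["city"], record["confirmed"], record["died"])  # -> 美国 100 5
```

With a dictionary in hand, individual fields can be picked out by name instead of searching the string again.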
