Since this is a government website there are quite a few restrictions: a User-Agent header has to be added first, and then the data is pulled out of the page with a regular expression. It had been a long time since I last used regex, so I asked my teacher for help with that part.
import requests
import pandas as pd
import re
import json

num_mag = []     # magnitude
orig_time = []   # origin time
latitudes = []
longitudes = []
depth = []
epicenter = []

for page in range(47, 49):
    # '¤tPage' in the page source is the HTML-entity mangling of '&currentPage'
    start_url = 'https://www.cea.gov.cn/eportal/ui?pageId=366509&currentPage={}'.format(page)
    print('Saving page {0}'.format(page))
    header = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
    }
    response = requests.get(start_url, headers=header).content.decode()
    pattern = re.compile(r'\[.*?\]', re.S)   # grab the embedded JSON array
    pic_list = re.findall(pattern, response)
    diss_dict = json.loads(pic_list[0])
    for quake in diss_dict:                  # renamed so it no longer shadows the page counter
        num_mag.append(quake['num_mag'])         # magnitude (震级)
        orig_time.append(quake['orig_time'])     # origin time (发震时刻)
        latitudes.append(quake['latitudes'])     # latitude (纬度)
        longitudes.append(quake['longitudes'])   # longitude (经度)
        # .get() appends None when depth is absent, so the lists stay the same length
        depth.append(quake.get('depth'))
        epicenter.append(quake['epicenter'])     # reference location (参考位置)

data = {
    '震级': num_mag,
    '发震时刻': orig_time,
    '纬度': latitudes,
    '经度': longitudes,
    '深度': depth,
    '位置': epicenter,
}
df = pd.DataFrame(data)
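The loop prints "Saving page …" but nothing is ever written to disk. One line finishes the job; a minimal sketch with a tiny stand-in DataFrame (the file name `earthquakes.csv` is my own choice):

```python
import pandas as pd

# stand-in for the scraped DataFrame
df = pd.DataFrame({'震级': ['4.2'], '发震时刻': ['2024-01-01 08:00:00']})

# utf-8-sig writes a BOM so Excel renders the Chinese headers correctly
df.to_csv('earthquakes.csv', index=False, encoding='utf-8-sig')
```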
The code still has problems: as written, not everything gets saved in the end. I'm still working on a fix — if any expert sees this article, I'd appreciate some help.
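The save failure comes from appending into six parallel lists: whenever one field is missing, that list ends up shorter and `pd.DataFrame(data)` raises a length-mismatch error. A more robust pattern is to build one dict per earthquake with `.get()` defaults and hand the list of dicts to pandas. A sketch, with the field names assumed from the site's JSON and a simulated payload in place of the live request:

```python
import json
import pandas as pd

# field names assumed from the site's JSON payload
FIELDS = ['num_mag', 'orig_time', 'latitudes', 'longitudes', 'depth', 'epicenter']

def records_from_json(raw):
    """Turn the scraped JSON array into one dict per earthquake.
    .get() returns None for any missing field, so every record has
    the same keys and the DataFrame columns always line up."""
    return [{field: item.get(field) for field in FIELDS}
            for item in json.loads(raw)]

# simulated payload: the second record is missing 'depth'
sample = json.dumps([
    {'num_mag': '4.2', 'orig_time': '2024-01-01 08:00:00',
     'latitudes': '30.10', 'longitudes': '101.20',
     'depth': '10', 'epicenter': 'Sichuan'},
    {'num_mag': '3.1', 'orig_time': '2024-01-02 09:30:00',
     'latitudes': '39.90', 'longitudes': '116.40',
     'epicenter': 'Beijing'},
])

df = pd.DataFrame(records_from_json(sample))
print(df)
```

With this shape a missing depth simply shows up as an empty cell instead of breaking the whole DataFrame, and appending each page's records to one list keeps every page aligned.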