Writing a crawler that tallies earthquake occurrence times and depths

Latest recommended article published on 2023-03-11 16:17:33

weixin_48701570


Tags: python, regular expressions, html, data mining

Article link: https://blog.csdn.net/weixin_48701570/article/details/106820811


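The last earthquake came to mind, so I wrote a small crawler that pulls data from the China Earthquake Networks Center (中国地震台网).
Without further ado, here is the code.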

import re
import numpy as np
import pandas as pd
import urllib.request as urq
import urllib.error
from matplotlib import pyplot as plt


def get_data(url):
    headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18363')
    opener = urq.build_opener()
    opener.addheaders = [headers]
    urq.install_opener(opener)
    data = urq.urlopen(url).read().decode('utf-8')
    # Regex building blocks, one per table column
    Mpat = r'<td align="center" style="padding-left: 20px">(.*?)</td>'
    # Capture only the hour of each occurrence time
    UTCpat = r'<td align="center" style="width: 155px;">2020-0\d-\d\d\s(\d\d):.+</td>'
    # Latitude, longitude and depth share the same markup and appear consecutively, so one pattern covers all three
    weidupat = r'<td align="center">(.*?)</td>'
    weizhipat = r'<a href="http://news.ceic.ac.cn/.+.html">(.*?)</a>'
    Mlist = re.findall(Mpat, data)
    # Convert the hours to int so they sort and bin numerically
    UTClist = sorted(int(hour) for hour in re.findall(UTCpat, data))
    weidulist = re.findall(weidupat, data)
    weizhilist = re.findall(weizhipat, data)
    # Every table row contributes three cells: latitude, longitude, depth
    i = len(weidulist) // 3
    c = np.array(weidulist[:i * 3]).reshape(i, 3)
    # The link pattern also matched a few trailing links outside the table (entries 100-106 in my run), so keep one name per row
    weizhiname = np.array(weizhilist[:i])
    # Build a table first so it is easy to extend: it can be saved with DataFrame.to_excel, or simply printed
    wjsdata = pd.DataFrame(c, index=weizhiname, columns=['latitude', 'longitude', 'depth'])
    # The regex returns strings, and sorting strings is lexicographic ('10' sorts before '5'), which is why
    # the depth sort looked wrong; converting to float first makes it numeric
    wjsdata = wjsdata.astype(float)
    wjsdata.sort_values(by=['depth'], inplace=True, ascending=True)
    print(wjsdata)
    plt.subplot2grid((2, 1), (0, 0))
    shengdupicture = plt.hist(wjsdata['depth'], bins=15, color='red')
    plt.xlabel('happen depth of earthquake')
    plt.subplot2grid((2, 1), (1, 0))
    # One bin per hour of the day
    shijianpicture = plt.hist(UTClist, bins=range(25), color='blue')
    plt.xlabel('happen time of earthquake')
    plt.show()

    return wjsdata


url = 'http://news.ceic.ac.cn/index.html?time=1592389127'  # URL of the China Earthquake Networks Center site
a = get_data(url)
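
On the depth sort raised in the code comments: `re.findall` returns strings, so a column built directly from its output is compared character by character rather than by value. Here is a minimal sketch of that effect, assuming a made-up HTML fragment (`SAMPLE_HTML`) shaped like the cells the pattern targets and a hypothetical helper `parse_rows`; neither comes from the original post, and the real page layout may differ. It groups the three consecutive cells into rows and converts them to floats before sorting.

```python
import re

# Made-up fragment for illustration only; the real page may be laid out differently.
SAMPLE_HTML = """
<tr><td align="center">31.20</td><td align="center">103.40</td><td align="center">10</td></tr>
<tr><td align="center">-6.15</td><td align="center">130.50</td><td align="center">5</td></tr>
<tr><td align="center">40.88</td><td align="center">83.47</td><td align="center">120</td></tr>
"""

CELL_PATTERN = re.compile(r'<td align="center">(.*?)</td>')


def parse_rows(html):
    """Group consecutive latitude/longitude/depth cells into float triples."""
    cells = CELL_PATTERN.findall(html)
    usable = len(cells) - len(cells) % 3  # ignore a trailing partial row
    return [
        tuple(float(value) for value in cells[start:start + 3])
        for start in range(0, usable, 3)
    ]


rows = parse_rows(SAMPLE_HTML)

# Every third cell, starting at index 2, is a depth, still as a string.
depth_strings = CELL_PATTERN.findall(SAMPLE_HTML)[2::3]

# Sorting the raw strings compares them character by character.
sorted_as_text = sorted(depth_strings)

# Sorting the converted rows by their depth field compares numbers.
sorted_by_depth = sorted(rows, key=lambda row: row[2])

print(sorted_as_text)                       # ['10', '120', '5']
print([row[2] for row in sorted_by_depth])  # [5.0, 10.0, 120.0]
```

In pandas, the same conversion can be done on the column before calling `sort_values`, for example with `pd.to_numeric` or `astype(float)`.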

That is all the code. Here are the results:

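It looks like the time of day has little to do with when earthquakes occur...
I forgot to space the two plots apart. The top one is depth and the bottom one is time; the labels are not in Chinese because I could not be bothered to track down a font file.
That last earthquake is still fresh in my memory. It was only a small tremor, but I live on a high floor, so the shaking was quite noticeable.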
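
For the cramped subplots mentioned above, one option is to let matplotlib work out the spacing itself. This is a minimal sketch, not the code that produced the original figure: the `depths` and `hours` lists are made-up stand-ins for the scraped values, and the Agg backend is assumed so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove this line to get a window
from matplotlib import pyplot as plt

# Made-up values standing in for the scraped depths and hours of the day.
depths = [5, 10, 10, 15, 20, 33, 120, 10, 8, 600]
hours = [0, 3, 3, 7, 12, 12, 12, 18, 21, 23]

# Two stacked axes in one figure: depth on top, time below.
fig, (depth_ax, hour_ax) = plt.subplots(2, 1, figsize=(6, 6))

depth_ax.hist(depths, bins=15, color="red")
depth_ax.set_xlabel("happen depth of earthquake")

# One bin per hour of the day: bin edges at 0, 1, ..., 24.
hour_ax.hist(hours, bins=range(25), color="blue")
hour_ax.set_xlabel("happen time of earthquake")

# Adjust the padding so the top plot's x-label does not collide with the bottom plot.
fig.tight_layout()
# In an interactive session, plt.show() would display the figure here.
```

Working with explicit axes objects instead of `plt.subplot2grid` is a style choice here; calling `plt.tight_layout()` before `plt.show()` in the original code should have a similar effect.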
