Web Scraping Key Economic Indicators

There is a plethora of financial data available nowadays, and seemingly even more places to source it from. There are countless methods for gathering data, many of which require third-party APIs that must be installed on your system in order to make the necessary API calls. In this quick notebook walkthrough, we will demonstrate how to perform a simple JSON web scrape to fetch the data and then organize it into a pandas DataFrame. We will then use the Python library Plotly to visualize the indicators.

Importing Libraries

import pandas as pd
import requests
import json
import plotly.graph_objects as go

Next, we must write our web scrape function. We will be using the third-party API from DB-nomics (db.nomics.world). The API returns our data as JSON over HTTP, so we fetch the URL with the requests library and parse the response body with `r.json()`. At this point, the parsed JSON is a nested dictionary, which means we need to index into it to grab the actual data. We pull this information out into three variables: one for the time-series index (`period`), one for our actual data values (`value`), and one for the over-arching dataset name (`dataset_name`), which will label the column of our final DataFrame. We then return the DataFrame `indicators`.

Define Web Scrape Function

def scrapeindicator(url):
    r = requests.get(url)
    r_json = r.json()
    periods = r_json['series']['docs'][0]['period']
    values = r_json['series']['docs'][0]['value']
    dataset = r_json['series']['docs'][0]['dataset_name']
    indicators = pd.DataFrame(values, index = periods)
    indicators.columns = [dataset]
    return indicators
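To see how the dictionary indexing above unpacks into a DataFrame, here is a minimal sketch against a mock of the response shape. The field names match those used in the function; the periods and values themselves are invented for illustration:

```python
import pandas as pd

# Illustrative mock of the DB-nomics JSON response structure
# (field names follow the function above; values are made up).
mock_json = {
    'series': {
        'docs': [{
            'period': ['2020-01', '2020-02', '2020-03'],
            'value': [1.1, 1.3, 0.9],
            'dataset_name': 'Example dataset',
        }]
    }
}

# Same indexing steps as scrapeindicator, applied to the mock.
doc = mock_json['series']['docs'][0]
indicators = pd.DataFrame(doc['value'], index = doc['period'])
indicators.columns = [doc['dataset_name']]
print(indicators)
```

The result is a one-column DataFrame labeled with the dataset name and indexed by period, which is exactly the shape the plotting code relies on later.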

Utilize Indicator Scrape Function

Now that we have the function defined, we can utilize the function to begin scraping our data. Utilizing the DB-nomics API we can search for any global indicator we’d like. Below, we scrape six different indicators for Europe: European 10 year yields, unemployment rates, interest rates, inflation rates, annual GDP growth rates, and monthly changes in retail growth rates.

euro_yields_10y = scrapeindicator('https://api.db.nomics.world/v22/series/Eurostat/irt_euryld_m/M.EA.INS_FWD.CGB_EA.Y10?observations=1')
unemployment = scrapeindicator('https://api.db.nomics.world/v22/series/Eurostat/une_rt_m/M.NSA.TOTAL.PC_ACT.T.EA19?observations=1')
interest = scrapeindicator('https://api.db.nomics.world/v22/series/Eurostat/ei_mfir_m/M.NSA.NAP.MF-LTGBY-RT.EU28?observations=1')
inflation = scrapeindicator('https://api.db.nomics.world/v22/series/WB/WDI/FP.CPI.TOTL.ZG-EU?observations=1')
GDPgrowth = scrapeindicator('https://api.db.nomics.world/v22/series/WB/WDI/NY.GDP.MKTP.KD.ZG-EU?observations=1')
monthly_change_retail_trade = scrapeindicator('https://api.db.nomics.world/v22/series/Eurostat/sts_trtu_m/M.TOVT.G47.CA.PCH_SM.EA19?observations=1')
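Each of these calls returns a one-column DataFrame indexed by period, but the series cover different date ranges. A quick way to inspect them side by side is an outer join with `pd.concat` (an illustrative step, not part of the original post); the two small frames below stand in for real scrapes so the sketch runs without a network connection:

```python
import pandas as pd

# Hypothetical stand-ins for two scraped indicator frames,
# deliberately covering different (overlapping) periods.
interest_demo = pd.DataFrame({'Interest rates - monthly data': [1.0, 1.2]},
                             index = ['2020-01', '2020-02'])
unemployment_demo = pd.DataFrame({'Unemployment - monthly data': [7.4, 7.6]},
                                 index = ['2020-02', '2020-03'])

# Outer join on the period index: every period is kept,
# and NaN marks the months a series does not cover.
combined = pd.concat([interest_demo, unemployment_demo], axis = 1)
print(combined)
```

With the real frames, the same `pd.concat` call lines up all six indicators in one table for a quick sanity check before plotting.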

Our resulting Monthly Retail Growth % DataFrame:

We now have all of our data scraped, so it’s time to use Plotly for visualizations:

# Instantiate a Plotly graph
fig = go.Figure()

# Add Interest Rates (EU) trace
fig.add_trace(go.Scatter(x = interest.index,
                         y = interest['Interest rates - monthly data'],
                         name = 'Interest',
                         line_color = 'deepskyblue',
                         opacity = 0.8))

# Add Unemployment Rates trace
fig.add_trace(go.Scatter(x = unemployment.index,
                         y = unemployment['Unemployment by sex and age – monthly data'],
                         name = 'Unemployment',
                         line_color = 'red',
                         opacity = 0.8))

# Add European Yields (10Y) trace
fig.add_trace(go.Scatter(x = euro_yields_10y.index,
                         y = euro_yields_10y['Euro yield curves - monthly data'],
                         name = 'Euro Yields - 10Y',
                         line_color = 'green',
                         opacity = 0.8))

# Add Inflation trace
fig.add_trace(go.Scatter(x = inflation.index,
                         y = inflation['World Development Indicators'],
                         name = 'Inflation',
                         line_color = 'purple',
                         opacity = 0.8))

# Add GDP Growth trace
fig.add_trace(go.Scatter(x = GDPgrowth.index,
                         y = GDPgrowth['World Development Indicators'],
                         name = 'GDP Growth',
                         line_color = 'pink',
                         opacity = 0.8))

# Add Monthly Retail Change in Volume trace
fig.add_trace(go.Scatter(x = monthly_change_retail_trade.index,
                         y = monthly_change_retail_trade['Turnover and volume of sales in wholesale and retail trade - monthly data'],
                         name = '% Monthly Change Volume Sales',
                         line_color = 'black',
                         opacity = 0.8))

# Edit attributes of the plot and display it
fig.update_layout(xaxis_range = ['2003-07-01', '2020-12-31'],
                  title_text = "Interest Rates, Unemployment, 10y yields, inflation UE, volume sales",
                  xaxis_rangeslider_visible = True)
fig.show()

We are left with a dynamic Plotly visualization that shows the progression of each indicator. Stay tuned for the next post, where we will automate the function and tie it into the DB-nomics third-party API over a live connection.

Translated from: https://medium.com/@andrewcole.817/web-scraping-key-economic-indicators-3260c6bf4d60
