JSON and APIs with Python

Introduction

In a previous tutorial, we discussed how to web scrape with Python. The goal of web scraping was to access data from a website or webpage. Sometimes, however, a website makes it easier for users to access its data directly through an API (Application Programming Interface). This means the company has created a set of dedicated URLs that provide the data in a pure form (that is, without any presentation formatting). This pure data is often in JSON (JavaScript Object Notation) format, which we can then parse with Python to extract what we need.

For this tutorial, we will use the free API found at covid19api.com that provides data on the coronavirus. We will find the total number of confirmed cases in each country and then we will create a pandas dataframe that contains that information. So let’s begin!

Inspecting the API

The documentation page of the API shows us the different URLs in the API, the information they provide, and example requests and responses for those URLs on the right-hand side of the page.

The information we are seeking is on the summary page. Clicking "view more" on the right of the documentation page shows what the response from that URL looks like.

The response is a JSON object! It is very similar to a Python dictionary and is made up of key-value pairs. In fact, in order to parse it and extract what we want, we will eventually turn it into a Python dictionary object. Upon inspection, it looks like a nested dictionary. The outer dictionary has the keys ‘Global’ (whose value is a dictionary) and ‘Countries’ (whose value is a list of dictionaries, with each dictionary corresponding to a specific country).
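
To make that shape concrete, here is a minimal sketch of a dictionary with the same nesting. The country names and numbers are invented placeholders, and the key shown inside ‘Global’ is an assumption for illustration; the real response contains more fields than this.

```python
# Hand-written sample that mimics the nesting described above.
# All values are invented placeholders, not real API data.
sample = {
    "Global": {"TotalConfirmed": 300},  # inner key assumed for illustration
    "Countries": [
        {"Country": "Country A", "TotalConfirmed": 100},
        {"Country": "Country B", "TotalConfirmed": 200},
    ],
}

# The outer object is a dict; 'Global' holds a dict, 'Countries' holds a list.
print(type(sample["Global"]).__name__)     # dict
print(type(sample["Countries"]).__name__)  # list

# Each element of the list is itself a dict describing one country.
first_country = sample["Countries"][0]
print(first_country["Country"])            # Country A
```

Reading a nested value is just a chain of lookups: a key for each dictionary level and an index for the list.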

Making an HTTP Request to the API

So let’s open a Jupyter notebook and request the information from that URL. We will use the requests library to make an HTTP request to that URL and save the response object’s text under the variable response:

import requests

response = requests.get('https://api.covid19api.com/summary').text

The response to our HTTP request is a long Python string in JSON format.
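
A request can fail or hang, so it may help to wrap the call in a small function. The sketch below is one possible way to do that, not part of the original walkthrough: the helper name fetch_text and the 10-second timeout are assumptions chosen for illustration, while requests.get, its timeout argument, and raise_for_status are standard parts of the requests library.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the response body as a string, or raise if the request failed.

    fetch_text is a hypothetical helper; the timeout is an arbitrary choice.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes instead of
    # silently returning the body of an error page.
    response.raise_for_status()
    return response.text


# Example call (performs a real network request, so it is left commented out):
# response = fetch_text('https://api.covid19api.com/summary')
```

Without a timeout, requests.get can wait indefinitely on an unresponsive server, which is why one is passed explicitly here.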

Creating a Python Dictionary

Since the response is in JSON format, we can convert this string into a Python dictionary. We first need to import the json library, and then we can pass our string to its loads function:

import json

response_info = json.loads(response)

Note that the type of our response_info variable is now a Python dictionary!

Now that our response is in the form of a Python dictionary, we can use what we know about Python dictionaries to parse it and extract the information we need!
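
The conversion can be tried without calling the API at all. The sketch below uses an invented JSON string in place of the real response, and parse_or_none is a hypothetical helper added only to show what happens when the text is not valid JSON.

```python
import json

# Invented sample standing in for the API's response text.
sample_text = '{"Countries": [{"Country": "Country A", "TotalConfirmed": 100}]}'

sample_info = json.loads(sample_text)
print(type(sample_text).__name__)  # str
print(type(sample_info).__name__)  # dict


def parse_or_none(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# An HTML error page, for example, is not JSON and cannot be parsed.
print(parse_or_none("<html>not JSON</html>"))  # None
```

Returning None is only one option; letting the exception propagate is often preferable when a bad response should stop the program.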

Also note: the requests library has a built-in JSON decoder that we could have used instead of the json module to convert our JSON object to a Python dictionary. However, I used the above method to introduce the json module in this tutorial. Here’s what the code would have looked like if we had instead used the JSON decoder within the requests module:

response_info = requests.get('https://api.covid19api.com/summary').json()

Parsing the Dictionary

As previously mentioned, we would like to make a pandas dataframe that has two columns: the country, and the number of total confirmed cases for that country. We can do so by looping through the value of the ‘Countries’ key of our outer dictionary.

The value of our ‘Countries’ key is just a list of dictionaries, with each dictionary containing key-value pairs corresponding to a specific country. So we need to loop through this list of dictionaries, extracting the values of the ‘Country’ and ‘TotalConfirmed’ keys from each dictionary and then appending them to a new list as follows:

country_list = []
for country_info in response_info['Countries']:
    country_list.append([country_info['Country'], country_info['TotalConfirmed']])

This will loop through the list of dictionaries, extracting the values of the ‘Country’ and ‘TotalConfirmed’ keys from each dictionary into a list, and then adding this resulting list to our country_list. We will end up with a list of lists, with each inner list containing the country name and the total confirmed cases for that specific country.
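
The same loop can also be written as a list comprehension. The sketch below runs on an invented sample rather than the real response, and it swaps square-bracket lookups for dict.get, which is an assumption about how you might want to handle an entry with a missing key, not something the API is known to require.

```python
# Invented sample with the same shape as response_info['Countries'].
sample_info = {
    "Countries": [
        {"Country": "Country A", "TotalConfirmed": 100},
        {"Country": "Country B", "TotalConfirmed": 200},
        {"Country": "Country C"},  # deliberately missing 'TotalConfirmed'
    ]
}

# One [name, total] pair per country. dict.get returns None instead of
# raising KeyError when a key is absent.
country_list = [
    [info.get("Country"), info.get("TotalConfirmed")]
    for info in sample_info["Countries"]
]

print(country_list)
# [['Country A', 100], ['Country B', 200], ['Country C', None]]
```

With square-bracket lookups, the third entry would raise a KeyError instead, which is the better behaviour if missing data should never pass unnoticed.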

Creating a Pandas DataFrame

We will now create a pandas dataframe using this country_list and the pandas DataFrame constructor:

import pandas as pd

country_df = pd.DataFrame(data=country_list, columns=['Country', 'Total_Confirmed'])

Success! We now have a dataframe that contains two columns: Country and Total_Confirmed!
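
As an alternative sketch, the DataFrame constructor also accepts a list of dictionaries directly, so the intermediate list of lists is optional. The sample below is invented, including the extra ‘Other’ key, which only stands in for the additional fields a real entry would carry.

```python
import pandas as pd

# Invented sample shaped like response_info['Countries'].
countries = [
    {"Country": "Country A", "TotalConfirmed": 100, "Other": "x"},
    {"Country": "Country B", "TotalConfirmed": 200, "Other": "y"},
]

# Each dictionary becomes a row and each key becomes a column.
full_df = pd.DataFrame(countries)

# Keep only the two columns of interest, then rename one to match
# the column name used in this tutorial.
country_df = full_df[["Country", "TotalConfirmed"]].rename(
    columns={"TotalConfirmed": "Total_Confirmed"}
)

print(list(country_df.columns))  # ['Country', 'Total_Confirmed']
```

Both routes give the same two-column result; the explicit loop makes the parsing step visible, while this version leaves it to pandas.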

Conclusion

In this tutorial, we briefly introduced APIs and JSON. We made an HTTP request to a coronavirus COVID-19 API to get the number of total confirmed coronavirus cases in each country, and converted the JSON response into a Python dictionary. We then parsed this dictionary to extract the information we were seeking, and created a pandas dataframe containing it.

Translated from: https://towardsdatascience.com/json-and-apis-with-python-fba329ef6ef0
