What Do COVID-19 and World Happiness Report Data Tell Us?

For many people, the idea of staying home actually sounded good at first, and the period was certainly good business for Netflix and Amazon. But sad truths awaited us. What was distressing was the number of dead and intubated patients, rising day after day. We all know the aftermath well.

In this article, we will examine COVID-19, which has affected countries all over the world, and its relationship with the country-level factors described in the World Happiness Report.

Before we start, let's get to know our datasets:

  • 'covid19_Confirmed_dataset.csv' (the data covers 96 days, starting from the first case)
  • 'worldwide_happiness_report.csv'

And of course, the libraries we will use:

import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

First of all, our data needs a little cleaning. We load the confirmed-cases file and remove the "Lat" and "Long" columns, which we will not use:

corona_dataset_csv = pd.read_csv("covid19_Confirmed_dataset.csv")
corona_dataset_csv.drop(["Lat", "Long"], axis=1, inplace=True)

Then we aggregate by country, so that each row holds one country's number of cases day by day:

# numeric_only=True keeps the daily counts and skips any text columns
corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum(numeric_only=True)

[Image: our first aggregated data frame]
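The aggregation step can be tried on a tiny table first. The sketch below uses made-up numbers and a hypothetical frame named `toy`; the "Province/State" column is an assumption about how such a file splits a country into several rows, not something confirmed by the dataset itself.

```python
import pandas as pd

# Toy stand-in for the confirmed-cases table (all values are made up).
toy = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Italy"],
    "1/22/20": [10, 2, 0],
    "1/23/20": [15, 3, 1],
})

# Summing within each country merges the provincial rows into one row.
toy_aggregated = toy.groupby("Country/Region").sum(numeric_only=True)
print(toy_aggregated)
```

After grouping, the country name becomes the index, which is why later steps can select a country with `.loc["China"]`.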

To show three countries in the same plot:

corona_dataset_aggregated.loc["China"].plot()
corona_dataset_aggregated.loc["Italy"].plot()
corona_dataset_aggregated.loc["Spain"].plot()
plt.legend()

To see more clearly which periods stand out and where the trend in infections changes, we take the first difference of the cumulative counts, a discrete version of the derivative. The function for this is diff(), which returns the number of new cases per day:

corona_dataset_aggregated.loc["China"].diff().plot()

[Image: the maximum for China, shown with diff()]
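What diff() does to a cumulative series is easiest to see on a handful of numbers. This is a minimal sketch with invented counts and hypothetical day labels (`d1` to `d5`), not values from the dataset.

```python
import pandas as pd

# Made-up cumulative case counts over five days.
cumulative = pd.Series([1, 3, 6, 14, 17], index=["d1", "d2", "d3", "d4", "d5"])

# diff() subtracts the previous day's total from each day's total.
# The first entry is NaN because it has no previous day.
daily_new = cumulative.diff()

# The largest daily increase and the day it happened.
peak_value = daily_new.max()
peak_day = daily_new.idxmax()
print(daily_new)
print(peak_day, peak_value)
```

The peak of the differenced series is the largest single-day jump, which is the quantity the article goes on to call the maximum infection rate.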

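We compute the maximum daily increase for every country, add it as a new column named "max_infection_rate", and refresh our data frame.

countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
for c in countries:
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates
# keep only the new column; this is the frame joined with the happiness data later
corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])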
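The per-country loop could also be written without an explicit loop. The following is a sketch on a made-up two-country table (the frame `toy` and its labels are hypothetical); it is an alternative formulation, not the code used in the original notebook.

```python
import pandas as pd

# Rows are countries, columns are days, values are cumulative counts (made up).
toy = pd.DataFrame(
    {"d1": [1, 0], "d2": [4, 2], "d3": [5, 9]},
    index=["A", "B"],
)

# Loop version, one country at a time.
loop_result = [toy.loc[c].diff().max() for c in toy.index]

# Vectorized version: difference along the columns, then take each row's maximum.
vectorized = toy.diff(axis=1).max(axis=1)
print(vectorized)
```

Both versions give the same numbers; the vectorized form returns a Series indexed by country, which can be assigned directly as a new column.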

Meanwhile, we start processing the data from the happiness report. To import it:

happiness_report_csv = pd.read_csv("worldwide_happiness_report.csv")

We remove the columns we will not use: "Overall rank", "Score", "Generosity", and "Perceptions of corruption".

useless_cols = ["Overall rank","Score","Generosity","Perceptions of corruption"]
happiness_report_csv.drop(useless_cols, axis=1, inplace=True)

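Now we bring "max_infection_rate" into this frame by joining the two data frames, so that each country is matched with itself. The join matches rows on the index, so both frames must be indexed by country name, and an inner join keeps only the countries present in both.

# assumption: the happiness CSV names its country column "Country or region"
happiness_report_csv.set_index("Country or region", inplace=True)
data = corona_data.join(happiness_report_csv, how="inner")
data.head()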
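How an inner join on the index matches countries can be sketched with two small frames. All values here are invented, the frame names are hypothetical, and the column name "Country or region" is an assumption about the happiness file.

```python
import pandas as pd

# Infection data indexed by country (made-up values, one fictional country).
corona_toy = pd.DataFrame(
    {"max_infection_rate": [100.0, 50.0, 7.0]},
    index=["Italy", "Spain", "Atlantis"],
)

# Happiness data with the country as an ordinary column at first.
happiness_toy = pd.DataFrame({
    "Country or region": ["Italy", "Spain", "Norway"],
    "GDP per capita": [1.3, 1.3, 1.5],
})
happiness_toy = happiness_toy.set_index("Country or region")

# join() aligns rows on the index; how="inner" keeps only shared countries.
joined = corona_toy.join(happiness_toy, how="inner")
print(joined)
```

Countries that appear in only one frame ("Atlantis" and "Norway" here) are dropped, so differently spelled country names in the two files would silently disappear from the result.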

We will use the corr() function to compute the correlation matrix:

data.corr()

As you can see, this matrix consists of the correlation coefficients of every pair of columns in our data set.

Take "max_infection_rate" and "GDP per capita": the cell where they meet holds the correlation coefficient between these two variables. The closer this value is to 1, the stronger the positive correlation between them.
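Reading a single cell, or a whole column, out of a correlation matrix can be sketched as follows. The data is made up, and the "noise" column is a hypothetical extra variable added only for contrast.

```python
import pandas as pd

# Made-up data: GDP is exactly twice the infection rate, noise is unrelated.
toy = pd.DataFrame({
    "max_infection_rate": [1, 2, 3, 4],
    "GDP per capita": [2, 4, 6, 8],
    "noise": [4, 1, 3, 2],
})

corr = toy.corr()

# One cell: the coefficient between two named variables.
r = corr.loc["max_infection_rate", "GDP per capita"]

# One column: every variable's correlation with the infection rate, strongest first.
ranked = corr["max_infection_rate"].drop("max_infection_rate").sort_values(ascending=False)
print(r)
print(ranked)
```

The matrix is symmetric and has 1 on its diagonal, so dropping the variable's own row before sorting keeps the ranking meaningful.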

If you look at the other life factors, such as social support, life expectancy, and freedom to make life choices, you can see positive correlations with all of them as well.

But our work is not done yet. An analysis is not finished until we visualize the results in figures and graphs, so that everyone can understand what we get out of it.

We found that there is a positive correlation between the maximum infection rate and all of the life factors in our data set.

For this task, I am going to use the seaborn module, which is a very handy tool for visualization. What we want to do is plot each of these columns against the maximum infection rate.

x = data["GDP per capita"]
y = data["max_infection_rate"]
# recent seaborn versions require x and y as keyword arguments
sns.scatterplot(x=x, y=y)

However, this graph cannot be examined in detail. The X and Y axes span very different ranges, so most of the points are squeezed together and the detail in our data is lost. To solve this problem, we can apply log scaling to the Y values:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))

This shows the trend much more clearly. As you can see, the points slope upward: there is a positive correlation.
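Why the logarithm helps can be seen on a few numbers. This is a small sketch with made-up values that each grow tenfold; it only illustrates the effect of np.log and is not taken from the dataset.

```python
import numpy as np

# Made-up values spanning several orders of magnitude.
y = np.array([10.0, 100.0, 1000.0, 10000.0])

# On the raw scale the gaps between neighbours grow tenfold each step.
raw_gaps = np.diff(y)

# After taking logs, every tenfold step becomes the same constant gap, ln(10).
log_gaps = np.diff(np.log(y))
print(raw_gaps)
print(log_gaps)
```

One caveat worth keeping in mind: np.log(0) evaluates to negative infinity, so any row whose value is zero would need to be handled before plotting on this scale.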

Adding a regression line makes the trend explicit:

sns.regplot(x=x, y=np.log(y))

Very clearly, there is a positive slope between these two variables ("max_infection_rate" and "GDP per capita").
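The slope that the regression line draws could also be checked numerically. The sketch below fits a straight line to log-transformed values with np.polyfit; the data is synthetic, generated from an assumed slope of 2 and intercept of 1, so that the fit has a known answer.

```python
import numpy as np

# Synthetic data: log(y) is exactly 2 * x + 1 (assumed values for illustration).
x = np.array([0.2, 0.6, 1.0, 1.4])
y = np.exp(2.0 * x + 1.0)

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, np.log(y), 1)
print(slope, intercept)
```

A positive slope corresponds to the upward-tilting line in the plot. On real data the points scatter around the line, so the fitted slope describes the overall trend rather than an exact relationship.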

Consequently

We found a very interesting result in this analysis. It suggests that people living in developed countries are more prone to Covid-19 infection than people in less developed countries. It could be argued that this result comes from the lack of corona test kits in less developed countries. To check whether that is the case, I recommend doing a similar analysis on the data for the cumulative number of deaths.

See here for more: https://github.com/fk-pixel/Coursera-Project-Network/blob/master/Covid19_DataAnalysis%20.ipynb

Translated from: https://medium.com/think-make/what-does-covid-19-and-world-happiness-report-data-tell-us-c76bdd44b7ac
