Gross National Product Pie Chart
This report visualizes life expectancy data for countries across the world. It also tries to establish a relationship between life expectancy and GDP per capita.
I compared the ten countries with the highest and the ten with the lowest life expectancy in 2017 against their own life expectancy in 1987. With a gap of 30 years, this comparison shows how much the expected years of life increased or decreased in each of these countries.
Life expectancy is obviously affected by many factors, such as happiness, pollution, terrorism and disease, but here I compare it only with the GDP per capita of each country. The result: countries with a higher GDP per capita offered their citizens a longer life expectancy than countries with a lower GDP per capita.
Data Sets:
This report uses three data sets and two links, taken from assorted sources. I downloaded the files in CSV format from each source and merged them in a Jupyter notebook from Anaconda. Variety, one of the important aspects of big data, is present here because the data comes from diverse sources and is integrated to find the required result.
1. Our World in Data: life expectancy
Our World in Data is an online publication that helps us understand how living conditions are changing across the world. Its life expectancy data set covers 1950 to 2017 and has 19206 rows and 4 columns (country name, country code, year and life expectancy).
Source: https://ourworldindata.org/life-expectancy
2. World Bank Data: GDP (per capita) by Country and Country-wise Population
The World Bank is headquartered in the United States and shares its data for analysis by assorted agencies and individuals. I collected GDP per capita from https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD and population from https://data.worldbank.org/indicator/SP.POP.TOTL . Each of the two data sets has 265 rows and 64 columns.
3. Additional GDPs
The GDP values of Andorra and Monaco were missing. Instead of filling them with the mean, median or mode of GDP, or removing the two countries, I collected their actual values from the following sources: https://www.macrotrends.net/countries/MCO/monaco/gdp-per-capita and https://www.macrotrends.net/countries/AND/andorra/gdp-per-capita
Data processing, cleaning and integration:
First, the data sets were downloaded and saved in CSV format, and the GDPs of Andorra and Monaco were updated manually in the CSV. Because the data sets come from different sources, some country names differed slightly (for example, South Korea appeared as "Korea, Rep." in a few places), so I removed such discrepancies. The CSVs were then loaded into a Jupyter notebook with the read_csv function of the pandas library in Python. I created three data frames for the year 2017 (life expectancy, population and GDP), then removed the unnecessary columns from the population and GDP data frames. These were merged with an inner join using the pandas merge function, with the country name column (entity) as the join key. Column names were updated with the rename function, and the row containing ‘World’ data was dropped with the drop function. The data frame was sorted with sort_values, and the top ten and bottom ten countries were selected by calling head after switching the ‘ascending’ parameter of sort_values. The same procedure produced a data frame for the year 1987. The columns I focused on were country name (entity), life expectancy, population and GDP.
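The processing steps described above could look roughly like the sketch below. It is not the code from my notebook: the column names ("Entity", "Country Name", "2017"), the helper year_slice and the name_fixes mapping are assumptions made for illustration, and the small data frames hold made-up placeholder numbers in place of the real CSVs.

```python
import pandas as pd

# Toy stand-ins for the three CSVs. In the real workflow each of these would
# come from pd.read_csv(...). All numbers are made-up placeholders.
life = pd.DataFrame({
    "Entity": ["South Korea", "Chad", "Japan", "World"],
    "Year": [2017, 2017, 2017, 2017],
    "Life expectancy": [82.0, 54.0, 84.0, 72.0],
})
gdp = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Chad", "Japan", "World"],
    "2017": [38000.0, 1900.0, 41000.0, 16000.0],
})
population = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Chad", "Japan", "World"],
    "2017": [51_000_000, 15_000_000, 127_000_000, 7_500_000_000],
})

# Hypothetical mapping used to harmonise country names across sources.
name_fixes = {"Korea, Rep.": "South Korea"}


def year_slice(wide, year, value_name):
    """Keep the country name and one year column of a wide (one column per year) table."""
    return (
        wide[["Country Name", year]]
        .rename(columns={"Country Name": "Entity", year: value_name})
        .assign(Entity=lambda d: d["Entity"].replace(name_fixes))
    )


# Life expectancy for 2017 only, then inner joins on the country name.
life_2017 = life.loc[life["Year"] == 2017, ["Entity", "Life expectancy"]]
merged = (
    life_2017
    .merge(year_slice(gdp, "2017", "GDP per capita"), on="Entity", how="inner")
    .merge(year_slice(population, "2017", "Population"), on="Entity", how="inner")
)

# Drop the aggregate 'World' row.
merged = merged.drop(merged.index[merged["Entity"] == "World"])

# Top ten and bottom ten by life expectancy.
top10 = merged.sort_values("Life expectancy", ascending=False).head(10)
bottom10 = merged.sort_values("Life expectancy", ascending=True).head(10)
print(top10)
```

An inner join keeps only the countries present in all three tables, which is why the names have to be harmonised before merging: an unmatched "Korea, Rep." would silently disappear from the result.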
Visualization:
First and Second Plots: As mentioned in the abstract, the first grouped bar chart compares the life expectancy of the top ten countries of 2017 with their life expectancy in 1987, and the second does the same for the bottom ten countries of 2017. Both were plotted with the bar function from matplotlib’s pyplot module, and the grid function was used to make the values easier to read. The colors, picked from matplotlib’s list of named colors, are ‘c’, ‘lightblue’, ‘darkorange’ and ‘moccasin’. Following a blog post from Goodly, I used the bright color for the main bars and the lighter one for the others. By convention, the y-axis starts at 0. The chart title has a font size of 15, and the x and y labels have a font size of 13.5.

The chart shows that in 1987 the top ten countries of 2017 had life expectancies in the 70s, with San Marino performing best, followed by Japan. By 2017 all of them had reached the 80s, with Monaco on top, followed by San Marino. Among the bottom ten countries of 2017, most had life expectancies in the 40s in 1987, except Nigeria in the 50s and Chad touching the 60s. Unsurprisingly, most of them improved by 2017. The exception is Chad, which is surprising because it was the best performer of the group in 1987.
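A grouped bar chart like the ones described above could be sketched as follows. This is a minimal illustration, not the notebook code: the function name grouped_bars, the bar width, the figure size and the axis labels are my assumptions here, the country names and values are placeholders, and I am assuming the bright color (‘c’) goes to the 2017 bars and the lighter one (‘lightblue’) to 1987.

```python
import matplotlib.pyplot as plt
import numpy as np


def grouped_bars(countries, values_1987, values_2017,
                 colors=("lightblue", "c"), title=""):
    """Draw two bars per country: 1987 (lighter) next to 2017 (brighter)."""
    x = np.arange(len(countries))  # one group position per country
    width = 0.4                    # assumed bar width

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(x - width / 2, values_1987, width, color=colors[0], label="1987")
    ax.bar(x + width / 2, values_2017, width, color=colors[1], label="2017")

    ax.set_xticks(x)
    ax.set_xticklabels(countries, rotation=45, ha="right")
    ax.set_ylim(bottom=0)          # y-axis starts at 0
    ax.set_title(title, fontsize=15)
    ax.set_xlabel("Country", fontsize=13.5)
    ax.set_ylabel("Life expectancy (years)", fontsize=13.5)
    ax.grid(axis="y")              # horizontal grid lines to read off values
    ax.legend()
    return ax


# Placeholder values, only to show the call.
ax = grouped_bars(["Country A", "Country B"], [70.0, 71.0], [80.0, 81.0],
                  title="Life expectancy: 1987 vs 2017")
# In a script, follow with plt.show() or ax.figure.savefig("chart.png").
```

For the bottom-ten chart the same function could be called with colors=("moccasin", "darkorange").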
Third Plot: The third plot is a bubble chart drawn with the scatter function of the pyplot module. It shows the relationship between each country’s life expectancy and its GDP per capita. The chart uses these variables: GDP per capita (x-axis), life expectancy (y-axis), one bubble per country, and the country’s population (bubble size). The title has a font size of 15, and the x and y labels have a font size of 13.5. Because there are so many countries, colors were assigned to them at random with the rand function from numpy. Population was converted into millions, and GDP per capita was already available in US dollars. The marker ‘o’ was passed to scatter to get round bubbles, and the grid function was used to make the values easier to read. We can’t deny the role of other factors, but with the available data, countries with a higher GDP per capita performed better in terms of life expectancy.
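The bubble chart described above could be sketched like this. Again, it is an illustration under assumptions rather than the notebook code: the function name bubble_chart, the seed, the transparency and the axis labels are mine, the input values are placeholders, and I use a seeded RandomState so the random colors are reproducible (np.random.rand would work the same way without the seed).

```python
import matplotlib.pyplot as plt
import numpy as np


def bubble_chart(gdp_per_capita, life_expectancy, population, seed=0):
    """Scatter GDP per capita against life expectancy, sized by population."""
    gdp = np.asarray(gdp_per_capita, dtype=float)
    life = np.asarray(life_expectancy, dtype=float)
    pop_millions = np.asarray(population, dtype=float) / 1e6  # people -> millions

    # One random number per country; scatter maps it to a color via the colormap.
    colors = np.random.RandomState(seed).rand(len(gdp))

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(gdp, life, s=pop_millions, c=colors, marker="o", alpha=0.6)

    ax.set_title("Life expectancy vs GDP per capita", fontsize=15)
    ax.set_xlabel("GDP per capita (US$)", fontsize=13.5)
    ax.set_ylabel("Life expectancy (years)", fontsize=13.5)
    ax.grid(True)
    return ax


# Placeholder values, only to show the call.
ax = bubble_chart([1000.0, 40000.0], [55.0, 83.0], [15e6, 127e6])
# In a script, follow with plt.show() or ax.figure.savefig("bubbles.png").
```

Note that the s argument of scatter is the marker area in points squared, so using population in millions directly makes the bubble area, not the diameter, proportional to population.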
Conclusion:
To the best of my understanding, the first and second grouped bar charts give a thorough view of how life expectancy increased in these countries over the span of 30 years, with Chad as the only exception. The third chart depicts a positive relationship between GDP per capita and life expectancy, with a few outliers.
To get my IPython notebook, please follow this link to Kaggle!
I have shared my own experience in this article. Please share your thoughts if you find anything incorrect here.
Twitter: @MrTomarOfficial
LinkedIn: https://ie.linkedin.com/in/raghvendra-pratap-singh-tomar
Translated from: https://medium.com/swlh/life-expectancy-and-gdp-6aa13f87fb23