EDA: Visualization in Geopandas, Matplotlib, and Bokeh

Today everyone is surrounded by data from news sources, cellphones, laptops, workplaces, and more. That data carries a great deal of information in variables of different types: dates, strings, numbers, and geographic coordinates. How can we distill a large dataset into its core message in a form that users can easily interpret? One answer is Exploratory Data Analysis (EDA), the practice of visualizing and analyzing data to extract insights. By summarizing a dataset's important characteristics, EDA gives viewers a better understanding of it.

In this article, you will learn:

(1) Dynamic geographical plot with Geopandas and Bokeh

(2) Analytics on worldwide dataset from 2016 to 2019

(3) Visualization in Matplotlib and Bokeh

Dynamic Choropleth Plot

A choropleth map shades or patterns geographic areas (for example, countries) according to a measured value, which makes it a good way to compare a measurement across regions. To create a global choropleth map, we'll use the World Happiness Report, a survey of the state of global happiness released at the United Nations that ranks 155 countries by their happiness levels. The data is available on Kaggle: World Happiness Report. The plot is built with the Python libraries Pandas, Geopandas, and Bokeh.

Download World Map File

Rendering a world map requires a shapefile with world coordinates. Natural Earth is a great source of geospatial data and offers a variety of public domain map datasets. For a dynamic geographical plot, the 1:110m small-scale data is a good choice.

Convert shp File into a Geopandas GeoDataFrame

Geopandas can read almost any vector-based spatial data format, including ESRI shapefiles, with the read_file function, which returns a GeoDataFrame object. You can select the columns you need when reading the dataset.

Code snippet for GeoDataframe

Static Choropleth Map for Year 2015

First, we create a data frame from the World Happiness Report and select the year 2015. The resulting data frame, df_2015, can then be merged with the GeoDataFrame gdf. Because we will later use Bokeh for the visualization, the plotting source must be in GeoJSON format, which describes points, lines, and polygons as a collection of features. We therefore convert the merged data frame to JSON and then to a string-like object.

The merged result is a GeoDataFrame object that could be rendered with the geopandas module itself. However, since we want interactive visualization, we use the Bokeh library. Bokeh consumes GeoJSON, which represents geographical features in JSON: points, lines, and polygons (called Patches in Bokeh) form a collection of features. That is why the merged data is converted to GeoJSON.

Code snippet for Json Data
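As a rough sketch of the merge and conversion described above: the two small frames below are stand-ins with made-up values, and the column names `country`, `Country`, and `Happiness Score` are assumptions rather than names confirmed by the article.

```python
import json

import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Tiny stand-ins for the real inputs (illustrative values, not report data).
gdf = gpd.GeoDataFrame(
    {"country": ["A", "B"], "country_code": ["AAA", "BBB"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
df_2015 = pd.DataFrame({"Country": ["A"], "Happiness Score": [7.5]})

# A left merge keeps every country shape, even those missing from the survey.
merged = gdf.merge(df_2015, left_on="country", right_on="Country", how="left")

# GeoDataFrame.to_json() returns a GeoJSON string. The round trip through
# json.loads and json.dumps mirrors the "to JSON, then to a string" step.
merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
```

Countries without a survey row end up with null properties in the GeoJSON, which is worth keeping in mind when choosing how missing values should be colored.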

Now we are ready to create a static choropleth map with Bokeh. We first read the GeoJSON data with the GeoJSONDataSource class. Next, we choose the 'YlGnBu' color palette and reverse its order so that the darkest color corresponds to the highest happiness score. Then we define custom tick labels for the color bar and pass the color mapper, orientation, and tick labels to the ColorBar class.

Code snippet for choropleth map
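The steps in this paragraph might look roughly like the sketch below. The one-feature GeoJSON string, the 0 to 8 score range, and the tick label text are assumptions made for illustration.

```python
import json

from bokeh.models import ColorBar, GeoJSONDataSource, LinearColorMapper
from bokeh.palettes import brewer

# A one-feature GeoJSON string standing in for the merged data (made up).
json_data = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"country": "A", "score": 7.5},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]},
    }],
})
geosource = GeoJSONDataSource(geojson=json_data)

# Bokeh lists the YlGnBu palette from dark to light, so reversing it puts
# the darkest shade at the high end of the scale.
palette = list(brewer["YlGnBu"][8])[::-1]
color_mapper = LinearColorMapper(palette=palette, low=0, high=8)

# Custom tick labels for the color bar (label text is an assumption).
tick_labels = {0: "0", 1: "1", 2: "2", 3: "3", 4: "4",
               5: "5", 6: "6", 7: "7", 8: ">8"}
color_bar = ColorBar(
    color_mapper=color_mapper,
    orientation="horizontal",
    location=(0, 0),
    width=500,
    height=20,
    label_standoff=8,
    major_label_overrides=tick_labels,
)
```

The color bar is attached to a figure later with `add_layout`, for example `p.add_layout(color_bar, "below")`.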

We create the figure object and set the plot height and width. Then we add patches to the figure using the x and y coordinates, and specify the field and transform in the fill_color parameter. To display the Bokeh plot in a Jupyter notebook, call output_notebook() and then pass the figure to show().

Code snippet for choropleth map
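A minimal sketch of the figure and patches step follows, assuming a recent Bokeh release where `figure` takes `height` and `width` (older releases used `plot_height` and `plot_width`). The data, title, and the helper name `show_in_notebook` are illustrative.

```python
import json

from bokeh.io import output_notebook, show
from bokeh.models import GeoJSONDataSource, LinearColorMapper
from bokeh.palettes import brewer
from bokeh.plotting import figure

# Made-up single-polygon source and the reversed palette from the text.
geosource = GeoJSONDataSource(geojson=json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"country": "A", "score": 7.5},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]},
    }],
}))
color_mapper = LinearColorMapper(
    palette=list(brewer["YlGnBu"][8])[::-1], low=0, high=8)

p = figure(title="Happiness score, 2015", height=600, width=950,
           toolbar_location=None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

# GeoJSONDataSource exposes polygon coordinates as the columns "xs" and "ys".
renderer = p.patches(
    "xs", "ys",
    source=geosource,
    fill_color={"field": "score", "transform": color_mapper},
    line_color="black",
    line_width=0.25,
    fill_alpha=1,
)


def show_in_notebook(plot):
    """Render the plot inline; call this from a Jupyter notebook cell."""
    output_notebook()
    show(plot)
```

In a notebook, `show_in_notebook(p)` would display the map; the helper is not called here so the script also runs outside Jupyter.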

Analytics:

From the plot below, we see that countries such as Canada, Mexico, and Australia have higher happiness scores. For South American and European countries, scores are mostly distributed between 5 and 6. In contrast, African countries such as Niger, Chad, Mali, and Benin show a much lower happiness index.

Plot 1: Static choropleth map for year 2015

Interactive Choropleth Map from Year 2015 to 2019

The interactive choropleth map adds two parts. The first is a hover tool, for which we specify the columns to display on the graph. The second is a callback function. For plot interaction, the year is selected with a slider: the slider value is passed to the callback, which updates the data. We then pass the slider object, wrapped in a widgetbox, to Bokeh's column layout. Finally, we add the layout to the current document with curdoc, which lets Bokeh create interactive web applications that connect front-end UI events to real, running Python code.

Code snippet for choropleth map
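The hover tool, slider, and callback could be wired together roughly as follows. This is a sketch with a made-up one-country dataset; the helper `json_for_year` and the field names are hypothetical. Recent Bokeh versions no longer provide widgetbox, so the slider goes into `column` directly here.

```python
import json

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import GeoJSONDataSource, HoverTool, LinearColorMapper, Slider
from bokeh.palettes import brewer
from bokeh.plotting import figure

# Illustrative scores per year for one made-up country (not report data).
SCORES = {2015: 5.0, 2016: 5.4, 2017: 5.1, 2018: 5.6, 2019: 5.8}
SQUARE = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]


def json_for_year(year):
    """Build the GeoJSON string for one year of data."""
    feature = {
        "type": "Feature",
        "properties": {"country": "Testland", "score": SCORES[year]},
        "geometry": {"type": "Polygon", "coordinates": SQUARE},
    }
    return json.dumps({"type": "FeatureCollection", "features": [feature]})


geosource = GeoJSONDataSource(geojson=json_for_year(2015))
color_mapper = LinearColorMapper(
    palette=list(brewer["YlGnBu"][8])[::-1], low=0, high=8)

# The hover tool shows the listed columns for the polygon under the cursor.
hover = HoverTool(tooltips=[("Country", "@country"),
                            ("Happiness score", "@score")])
p = figure(title="Happiness score, 2015", height=600, width=950, tools=[hover])
p.patches("xs", "ys", source=geosource,
          fill_color={"field": "score", "transform": color_mapper},
          line_color="black", line_width=0.25)


def update_plot(attr, old, new):
    """Slider callback: swap in the data for the newly selected year."""
    year = int(new)
    geosource.geojson = json_for_year(year)
    p.title.text = f"Happiness score, {year}"


slider = Slider(title="Year", start=2015, end=2019, step=1, value=2015)
slider.on_change("value", update_plot)

# Adding the layout to curdoc() makes it available to the Bokeh server.
layout = column(p, slider)
curdoc().add_root(layout)
```

Python callbacks registered with `on_change` only fire when the script runs under a Bokeh server, which is why the map does not update in a static notebook output.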

If the choropleth map fails to run in a Jupyter notebook, an alternative is to run the script from the terminal:

bokeh serve --show EDA_Plot.py
Video of interactive choropleth map

Analytics Plots on World Happiness Report from 2015 to 2019

Scatter Plot of GDP & Happiness_Score Index in 2016

Code snippet for Scatter Plot
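A region-colored scatter plot of this kind can be sketched in Matplotlib as below. The rows are made up and the column names `Region`, `GDP`, and `Happiness_Score` are assumptions about how the 2016 data was prepared.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows only (not report data).
df_2016 = pd.DataFrame({
    "Region": ["Western Europe", "Western Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "GDP": [1.4, 1.5, 0.3, 0.4],
    "Happiness_Score": [7.3, 7.0, 4.0, 3.8],
})

fig, ax = plt.subplots(figsize=(8, 5))
# One scatter call per region gives each region its own color and legend entry.
for region, group in df_2016.groupby("Region"):
    ax.scatter(group["GDP"], group["Happiness_Score"], label=region, alpha=0.8)

ax.set_xlabel("GDP")
ax.set_ylabel("Happiness_Score")
ax.set_title("GDP vs. Happiness_Score, 2016")
ax.legend(title="Region")

# Pearson correlation between the two numeric columns.
correlation = df_2016["GDP"].corr(df_2016["Happiness_Score"])
```

Plotting each region in its own `scatter` call keeps the legend simple; a single call with a color column would need a manual legend.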

Analytics:

We look at the correlation between GDP growth and happiness score in 2016. With countries color-coded by region, we can see that southeastern countries have lower GDP growth and correspondingly lower happiness scores. Most countries in Central and Eastern Europe have GDP growth between 0.8 and 1.4 with a happiness score between 5 and 6. Countries in Western Europe tend to show higher economic growth along with a higher happiness index.

Plot 2: Scatter Plot of GDP & Happiness_Score Index in 2016

Top and Bottom 10 Countries of Economy Index (GDP per capita)

Analytics:

Among the top 10 countries by economic trend, 'United Arab Emirates' shows an increasing trend, with growth of 0.68 in the economy index from 2015 to 2018. 'Myanmar' rose by 0.41 in GDP per capita and is the only Asian country in the group. Surprisingly, Sub-Saharan African countries such as 'Malawi', 'Guinea', and 'Tanzania' are among the top 5 countries with an upward economic trend.

Countries with declining economic trends are mostly in Africa. The bottom 5 countries, 'Libya', 'Yemen', 'Kuwait', 'Jordan', and 'Sierra Leone', have a lower economy index over 2015 to 2018. Four of them are located in the Middle East and Northern Africa.

Plot 3: Top and Bottom 10 Countries of Economy Index
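The article does not show how this plot was produced. One plausible way to build a diverging top and bottom chart is sketched below, with made-up index values and hypothetical column names `gdp_2015` and `gdp_2018`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up economy index values for 2015 and 2018 (not report data).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "gdp_2015": [1.0, 0.8, 0.5, 0.9, 1.2, 0.7],
    "gdp_2018": [1.5, 1.0, 0.55, 0.85, 0.9, 0.2],
})
df["change"] = df["gdp_2018"] - df["gdp_2015"]

n = 2  # the article uses 10; 2 keeps the toy example readable
top = df.nlargest(n, "change")
bottom = df.nsmallest(n, "change")
selected = pd.concat([bottom, top]).sort_values("change")

# Bars extend left for declines and right for gains, in contrasting colors.
colors = ["tab:red" if c < 0 else "tab:green" for c in selected["change"]]
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(selected["Country"], selected["change"], color=colors)
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("Change in economy index, 2015 to 2018")
ax.set_title(f"Top and bottom {n} countries by change")
```

Sorting by the change before plotting is what gives the chart its diverging shape, with the largest decline at the bottom and the largest gain at the top.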

UAE Yearly GDP Change

Analytics:

Having seen the top and bottom 10 countries by economy index (GDP per capita growth), we look more closely at the United Arab Emirates' economic trend. In 1980, the UAE recorded its highest GDP growth of the 40-year period. Growth then turned negative from 1982 to 1986. Over the next 10 years, the UAE showed fairly stable GDP growth of around 0.1 to 0.2. In 2009, GDP growth plunged following the impact of the financial crisis.

Plot 4: UAE Yearly GDP Change
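A yearly-change line plot like this one might be prepared as in the sketch below. The numbers are invented (not UAE data), the column names are hypothetical, and one blank row is included to show how a missing date can be handled before plotting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly GDP values with one blank row (not UAE data).
raw = pd.DataFrame({
    "year": ["2005", "2006", "2007", None, "2008"],
    "gdp": [100.0, 110.0, 121.0, None, 108.9],
})

# Parse years as dates; missing or unparseable entries become NaT.
raw["date"] = pd.to_datetime(raw["year"], format="%Y", errors="coerce")
# Drop rows without a valid date so the time axis has no gaps or NaT values.
clean = raw.dropna(subset=["date"]).sort_values("date").set_index("date")

# Year-over-year change as a fraction (0.1 means a 10% rise).
clean["gdp_change"] = clean["gdp"].pct_change()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(clean.index, clean["gdp_change"], marker="o")
ax.axhline(0, color="gray", linewidth=0.8)  # separates growth from decline
ax.set_xlabel("Year")
ax.set_ylabel("GDP change")
ax.set_title("Yearly GDP change")
```

If a dropped row sits between two valid years, the next change value spans more than one year, so it is worth checking where the gaps fall before reading the trend.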

In Conclusion:

  • To create a choropleth map, geopandas can convert shp files into a data frame object, and Bokeh works well with the geopandas package for the visualization. Keep in mind, however, that country names in the shapefile need to be matched with those in the external data when merging the two datasets.
  • Matplotlib and Bokeh are two great visualization packages in Python. A scatter plot is well suited to showing the correlation between two numeric variables. A diverging plot better shows downward and upward trends in a dataset. For DateTime variables, take care of dates with missing values before creating the plot. A line graph displays a distinct trend in time series data.
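The country-matching caveat in the first point can be checked with pandas before merging. The sketch below uses made-up names and scores; the `name_fixes` mapping is a hypothetical example of the kind of correction that may be needed.

```python
import pandas as pd

# Country names as they might appear in the shapefile and in the survey.
shapes = pd.DataFrame(
    {"country": ["United States of America", "Canada", "Chad"]})
survey = pd.DataFrame({
    "Country": ["United States", "Canada", "Chad"],
    "score": [7.1, 7.4, 3.7],  # illustrative values, not report data
})

# An outer merge with indicator=True labels rows found on only one side.
check = shapes.merge(survey, left_on="country", right_on="Country",
                     how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# Rename survey entries to the shapefile spelling, then merge again.
name_fixes = {"United States": "United States of America"}
survey["Country"] = survey["Country"].replace(name_fixes)
recheck = shapes.merge(survey, left_on="country", right_on="Country",
                       how="outer", indicator=True)
```

Inspecting `unmatched` first shows exactly which names need a fix, instead of discovering blank countries on the finished map.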

Translated from: https://towardsdatascience.com/eda-visualization-in-geopandas-matplotlib-bokeh-9bf93e6469ec
