Which mountain ranges in France are the most dangerous for hikers and alpinists? This was my main question, because I recently moved to Grenoble, which is basically a French hiking paradise.
Unfortunately, the only regional data on mountain accidents I could find were the yearly avalanche accident reports from ANENA (organization for the study of snow and avalanches in France), broken down by commune (a small administrative unit in France). This was the second-best thing to mountain accidents grouped by mountain range, which I was unable to find, so, as in poker or Tetris, I played the hand I was dealt and searched for a shapefile of the communes of France (luckily, the official sources of the French government did not let me down). These were the only two pieces of the visualization “puzzle” I needed to start coding in a Jupyter Notebook and create an interactive map of avalanche accidents in France over the last 10 years.
Don’t talk, just code
All source data files and the code, as a Jupyter notebook, can be found in my GitHub repo Avalanche danger in France.
1) Installation
Assuming you already have standard Python libraries such as Pandas and NumPy installed, you need to add GeoPandas and Bokeh to handle geospatial data and shapefiles.
In my case (Windows with Anaconda), the magic words for the command line were conda install geopandas and conda install bokeh, as advised in the documentation of the GeoPandas and Bokeh libraries.
![Image for post](https://miro.medium.com/max/9999/1*DrbumSjVdG9LSry5Arf5Rg.png)
2) Getting familiar with the GeoPandas dataframe
So far I have used the term shapefile twice without further explanation, which is kind of mean, and I don’t want to be a mean girl, so…
The shapefile format stores geospatial data in the form of points, lines or polygons and is typically used in GIS (Geographic Information System) software. The extension of a shapefile is .shp, but to display the data you actually need three files in the same folder: .shp, .shx and .dbf.
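Since a missing companion file is an easy mistake to make, here is a minimal sketch of a check that could be run before loading a shapefile. The helper name missing_shapefile_parts is my own invention, not part of GeoPandas, and it assumes the companion files use the same lowercase extensions as listed above.

```python
from pathlib import Path

# The three files that have to sit next to each other in one folder.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

An empty list means all three files are present; anything else names the files to look for before calling read_file.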
If polygon sounds more like the name of a spaceship Jean-Luc Picard had to fly while his Enterprise was being repaired, the following example will give you a better idea. This is the polygon for the French commune Asnières-sur-Oise (the polygon closest to a spaceship I could find after a few random trials).
![Image for post](https://miro.medium.com/max/9999/1*KbtC9pWcV3XYOTjCncxrxg.png)
Here we can see that polygons, or geospatial data from a GeoDataFrame in general, can be displayed in the same fashion as rows of a classic Pandas DataFrame, using the iloc method. To get an image of the polygon we need to look at the column with the active geometry, where the data are stored as coordinates. (The column with the active geometry will most likely be named geometry.)
The geometry column is what sets a GeoDataFrame in GeoPandas apart from an ordinary DataFrame in Pandas.
Reading and viewing a GeoDataFrame is very similar to working with a simple DataFrame (except for the already mentioned necessity of having the .shp, .shx and .dbf files in one folder in order to load the shapefile):
# importing standard python libraries + geopandas for dealing with geospatial data
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# reading shapefile of France divided into communes
gdf = gpd.read_file(r"CONTOURS-IRIS\1_DONNEES_LIVRAISON_2014\CONTOURS-IRIS_2-0_SHP_LAMB93_FE-2014\CONTOURS-IRIS_FE.shp")
# viewing geodataframe
gdf.head()
![Image for post](https://miro.medium.com/max/9999/1*B6YioTRHc1bvhvZtdqhArA.png)
Both pieces of my visualization puzzle (the shapefile with all communes and the avalanche accident report) need one column to join on. It is the commune name (NOM_COM in the gdf GeoDataFrame and commune in the avalanche DataFrame).
![Image for post](https://miro.medium.com/max/9999/1*mIXCOvjgI2cNCDuyfKMt8g.png)
Unfortunately, the commune names in the GeoDataFrame and the DataFrame followed different conventions for upper- and lowercase letters and special French characters (é, à, ç etc.), and sometimes even mixed hyphens with blank spaces.
3) Data cleaning and traps of the French language
Therefore I changed all commune names to lowercase with the str.lower() method and replaced the special French characters to avoid two versions of the same commune name.
# commune names use French special characters and abbreviations inconsistently,
# so both name columns are lowercased and rewritten to one unified spelling
replacements = [
    ('[èéêë]', 'e'),
    ('[àâ]', 'a'),
    ('î', 'i'),
    ('ô', 'o'),
    ('-', ' '),
    ("d'", 'd '),
    (r'\bst ', 'saint '),  # \b leaves names such as "crest voland" intact
    ('s/', 'sur'),
]
def unify_names(names):
    names = names.str.lower()
    for pattern, replacement in replacements:
        names = names.str.replace(pattern, replacement, regex=True)
    return names
# assigning the result back avoids relying on inplace=True on a selected column
avalanche['commune'] = unify_names(avalanche['commune'])
gdf['NOM_COM'] = unify_names(gdf['NOM_COM'])
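The replacements above list each accented character by hand, so characters that are not in the list, such as ç or û, pass through unchanged. As an alternative sketch (not what I did in the notebook), the standard library can strip every accent in one go. The function name normalize_commune_name is hypothetical, and the rule that turns every hyphen and apostrophe into a space is slightly broader than the replacements above.

```python
import re
import unicodedata


def normalize_commune_name(name):
    """Lowercase a commune name, strip accents and unify separators."""
    # NFKD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = re.sub(r"\bs/", "sur ", text)     # "s/" abbreviation for "sur"
    text = re.sub(r"[-']", " ", text)        # hyphens and apostrophes to spaces
    text = re.sub(r"\bst\b", "saint", text)  # "st" only as a whole word
    return re.sub(r"\s+", " ", text).strip()


for raw in ["Bourg-St-Maurice", "St Martin d'Hères", "Briançon", "Crest-Voland"]:
    print(raw, "->", normalize_commune_name(raw))
```

On a Pandas column it could be applied with avalanche['commune'].map(normalize_commune_name), and likewise for gdf['NOM_COM'], so that both sides of the join go through exactly the same rules.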
Ideally, all commune names from the avalanche DataFrame would also be in the gdf GeoDataFrame, but this was not the case, for several reasons:
- the commune names in the avalanche reports were entered by hand and therefore included some spelling mistakes, and some records gave the name of a ski station instead of the commune
- some communes merged into a different commune or changed their name over time, so some commune names are outdated today
- a few communes in different regions have exactly the same name
- the polygons in the shapefile were derived not from communes but from smaller units, which coincide with communes in the majority of cases, but not in all of them
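To see which names fall into the first two groups, it helps to list every commune from the reports that has no counterpart in the shapefile, together with the closest spellings. The following is only a sketch with made-up sample names; the function unmatched_with_suggestions and the cutoff of 0.8 are my assumptions, not something taken from the notebook.

```python
from difflib import get_close_matches


def unmatched_with_suggestions(report_names, shapefile_names, cutoff=0.8):
    """Map each report name missing from the shapefile to similar known names."""
    known = sorted(set(shapefile_names))
    missing = sorted(set(report_names) - set(known))
    return {name: get_close_matches(name, known, n=3, cutoff=cutoff)
            for name in missing}


# Hypothetical sample: one typo ("tigne") and one ski station ("les arcs").
report = ["chamonix mont blanc", "tigne", "les arcs"]
shapefile = ["chamonix mont blanc", "tignes", "bourg saint maurice"]
for name, candidates in unmatched_with_suggestions(report, shapefile).items():
    print(name, "->", candidates or "no close match, check by hand")
```

Typos usually come back with a suggestion, while ski stations and renamed communes come back empty and need a manual lookup. Communes sharing the same name cannot be detected this way at all.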
I solved the first three problems with simple name changes in either the gdf GeoDataFrame or the avalanche DataFrame. It was time-consuming, but because it is quite specific to my case and not particularly linked to geospatial data, I will not go into details here (more details can be found in my GitHub repo).
The last problem, polygons that are not derived from the shape you need, is something you might encounter when dealing with geospatial data too. Here we can see that the commune Chamonix Mont Blanc consists of 4 different polygons.
![Image for post](https://miro.medium.com/max/9999/1*6IroOU8grhXrj36YeFTkfQ.png)
This can be solved with the dissolve function, which creates new polygons based on a chosen column (example: dissolve(by="column_x")). The commune boundaries are defined as a GeoDataFrame containing only the columns we want to keep. A buffer is created to avoid intersections of the new polygons.
# creating polygons based on commune
gdf['geometry'] = gdf.buffer(0.01)
commune_boundary = gdf[['DEPCOM', 'NOM_COM','geometry']]
gdf = commune_boundary.dissolve(by='DEPCOM')
![Image for post](https://miro.medium.com/max/9999/1*xxm2t2TLb4ZXCiY_RW8tEg.jpeg)
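To make the effect of dissolve easier to follow, here is a toy sketch with three unit squares instead of real communes. The codes and names in it are invented for illustration, and it assumes GeoPandas and Shapely are installed as described in the installation step.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: two touching squares share one commune code.
toy = gpd.GeoDataFrame(
    {
        "DEPCOM": ["00001", "00001", "00002"],
        "NOM_COM": ["commune a", "commune a", "commune b"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 0, 6, 1)],
)

# dissolve unions all geometries that share a DEPCOM value; other columns
# keep the first value of each group and DEPCOM becomes the index.
merged = toy.dissolve(by="DEPCOM")
print(len(toy), "polygons before,", len(merged), "after")
print("area of merged commune:", merged.loc["00001"].geometry.area)
```

The two touching squares end up as one polygon of twice the area, which is what happens to the 4 pieces of Chamonix Mont Blanc.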
4) Creating the final GeoDataFrame for further use in Bokeh
Now I finally had perfect polygons derived from communes. But I also needed the number of avalanche accidents for each commune. A simple groupby() call and a new DataFrame do the trick:
![Image for post](https://miro.medium.com/max/9999/1*cZTo9UGsnuF1o8tEv3mvtg.png)
Of course, one little cosmetic change was needed for the ugly column name 0.
![Image for post](https://miro.medium.com/max/9999/1*e4fxLLZH09NxiLsYQO1jiA.png)
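Because the two steps above are only shown as screenshots, here is a text sketch of what counting and renaming can look like. The sample rows are invented, and the exact calls in my notebook may differ; only the names avalanche, aval_final and count_of_avalanche_accidents come from this article.

```python
import pandas as pd

# Hypothetical accident records: one row per reported accident.
avalanche = pd.DataFrame(
    {"commune": ["chamonix mont blanc", "tignes", "chamonix mont blanc"]}
)

# size() counts the rows per commune; reset_index() turns the unnamed
# result into a DataFrame whose count column is called 0.
aval_final = avalanche.groupby("commune").size().reset_index()

# Give the count column a readable name.
aval_final = aval_final.rename(columns={0: "count_of_avalanche_accidents"})
print(aval_final)
```

Passing name="count_of_avalanche_accidents" to reset_index() would give the same result in one step.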
Finally, I could merge the data in the gdf GeoDataFrame with the cumulative avalanche report in the aval_final DataFrame.
![Image for post](https://miro.medium.com/max/9999/1*CtWh1A276gYNzQ21XA3EBw.png)
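The merge is again only visible in the screenshot, so here is a hedged sketch of a left join that keeps every commune. A plain DataFrame stands in for the GeoDataFrame to keep the example small, the sample rows are invented, and the join columns are assumed to be NOM_COM and commune as described earlier.

```python
import pandas as pd

# Hypothetical stand-in for the dissolved GeoDataFrame (geometry omitted).
gdf = pd.DataFrame({"NOM_COM": ["chamonix mont blanc", "tignes", "grenoble"]})
aval_final = pd.DataFrame(
    {
        "commune": ["chamonix mont blanc", "tignes"],
        "count_of_avalanche_accidents": [2, 1],
    }
)

# how="left" keeps every commune from gdf; communes without a report get NaN.
aval_gdf_final = gdf.merge(
    aval_final, how="left", left_on="NOM_COM", right_on="commune"
)
no_report = aval_gdf_final["count_of_avalanche_accidents"].isna()
print(aval_gdf_final.loc[no_report, "NOM_COM"].tolist())
```

With a real GeoDataFrame on the left, calling its merge method in this way should return a GeoDataFrame again, so the geometry column stays usable for plotting.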
NaN values are communes without any avalanche reported in the last 10 years. Later, when using Bokeh, we will need to transform this GeoDataFrame into GeoJSON format. This would trigger an error, because NaN is not a valid JSON value. Therefore, I replace all NaN values with the string “No avalanche” using the fillna() function and verify that no NaN remains in the GeoDataFrame afterwards.
![Image for post](https://miro.medium.com/max/9999/1*OtVXRtcUbsGw-G0-xhoqKQ.png)
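The NaN problem can be reproduced with Python's own json module, without Bokeh or GeoPandas. This sketch uses one invented record and a hypothetical helper fill_missing that plays the role of fillna(); it illustrates the idea and is not the code from my notebook.

```python
import json
import math

record = {"NOM_COM": "grenoble", "count_of_avalanche_accidents": float("nan")}

# By default json.dumps writes the bare token NaN, which is not valid JSON.
print(json.dumps(record))

# In strict mode the same record is rejected outright.
try:
    json.dumps(record, allow_nan=False)
except ValueError as err:
    print("strict JSON refuses NaN:", err)


def fill_missing(value, placeholder="No avalanche"):
    """Replace a float NaN with a placeholder string, like fillna() does."""
    if isinstance(value, float) and math.isnan(value):
        return placeholder
    return value


cleaned = {key: fill_missing(value) for key, value in record.items()}
strict = json.dumps(cleaned, allow_nan=False)  # succeeds now
print(strict)
```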
If you have stuck with this article for so long, I think you deserve one more cool photo of the French Alps before attacking the final step: creating the visualization with the Bokeh library.
![Image for post](https://miro.medium.com/max/9999/1*utSsWEaisGjHAwFr8cd0pA.jpeg)
5) Interactive map of avalanche accidents in Bokeh
The Bokeh library uses the GeoJSON format for data visualization, so I need to convert the final GeoDataFrame to GeoJSON.
# GeoJSONDataSource has to be imported before it is used
from bokeh.models import GeoJSONDataSource
geosource = GeoJSONDataSource(geojson = aval_gdf_final.to_json())
The elements of the Bokeh library are imported separately, in groups, from several submodules. In the code below I import all the elements I will use, but there are many more options.
from bokeh.io import save
from bokeh.models import (ColorBar,
GeoJSONDataSource, HoverTool,
LinearColorMapper)
from bokeh.palettes import mpl
from bokeh.plotting import figure, output_file
Next, I will choose the color settings. There are plenty of color palettes in the Bokeh documentation. In color_mapper, the minimum and maximum of the mapped values are chosen, as well as a color outside the chosen palette (nan_color) to represent NaN values or, more precisely, the string “No avalanche” in our GeoJSON.
I also used major_label_overrides to highlight that only communes with at least 1 avalanche accident are included in the color palette. Without this feature, all communes with 0 to 4 avalanches would appear to end up in one group with the same color, which would be visually quite unclear and confusing.
# define color palettes
palette = mpl['Viridis'][6]
palette = palette[::-1] # reverse order of colors so higher values have darker colors
# instantiate LinearColorMapper that linearly maps numbers in a range into a sequence of colors
# and nan values will be colored in grey
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 30, nan_color = '#d9d9d9')
# define custom tick labels for color bar.
tick_labels = {'0': '1', '5': '5', '10':'10', '15':'15', '20':'20', '25':'25', '30':'30'}
# create color bar
color_bar = ColorBar(color_mapper = color_mapper,
label_standoff = 8,
width = 500, height = 20,
border_line_color = None,
location = (0,0),
orientation = 'horizontal',
major_label_overrides = tick_labels)
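To see why the first color covers the values 0 to 4, it helps to spell out how a linear mapping with 6 colors between 0 and 30 splits the range. The function below is my own simplified illustration of such a mapping, not Bokeh's actual implementation.

```python
def linear_bin(value, low, high, n_colors):
    """Index of the palette entry for value under an evenly spaced mapping."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    # Each color covers (high - low) / n_colors units of the value range.
    return int((value - low) * n_colors // (high - low))


# With low=0, high=30 and 6 colors, every color spans 5 accidents.
for accidents in (1, 4, 5, 12, 29, 30):
    print(accidents, "-> palette index", linear_bin(accidents, 0, 30, 6))
```

Communes without any accident never reach this mapping, because their value is the string “No avalanche” and they are drawn in nan_color. That is why the first tick of the color bar can be relabelled from 0 to 1.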
For the final visualization we need to create a figure object.
# create figure object
p = figure(title = 'Number of avalanche accidents in commune',
plot_height = 1400, # Bokeh 3.x renamed plot_height / plot_width to height / width
plot_width = 1200,
toolbar_location = 'below',
tools = 'pan, wheel_zoom, box_zoom, reset')
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
The patches() method is used to plot the shapes of the communes in the figure. Apart from defining the x and y coordinates and passing our GeoJSON as source, we also need to specify fill_color. The color has to change from polygon to polygon, so I use a dictionary with the keys field and transform. The field value is the variable we want to plot; the transform value is the color mapper that assigns a color group to this variable.
# add patch renderer to figure
communes = p.patches('xs','ys', source = geosource,
fill_color = {'field' :'count_of_avalanche_accidents',
'transform' : color_mapper},
line_color = 'gray',
line_width = 0.20,
fill_alpha = 1)
In the tooltips of the HoverTool we choose the data shown to the user when the cursor moves over the map (in my case the name of the commune and the total number of avalanche accidents, i.e. the columns NOM_COM and count_of_avalanche_accidents).
# create hover tool
p.add_tools(HoverTool(renderers = [communes],
tooltips = [('Commune','@NOM_COM'),
('Number of avalanche accidents','@count_of_avalanche_accidents')]))
p.add_layout(color_bar, 'below')
If you still find something unclear in the code above, I recommend checking the Bokeh library for a quite similar visualization.
Finally, I saved the final visualization as a separate HTML file, but if you want to show the graph directly in a Jupyter Notebook you can use the show() function instead of save(). Just don’t forget to import it from bokeh.io first.
# final visualization can be seen as html page
output_file("mountains_danger.html")
save(p)
Anyone would probably guess that most avalanche accidents happened in the French Alps, in the area around Mont Blanc, and in the Pyrenees near the border with Spain. For me, it was surprising to find out that avalanches also occurred in the Alsace region and the Massif Central, and one accident even happened on the island of Corsica.
![Image for post](https://miro.medium.com/max/9999/1*MGx4uWDoqdYsnUYhAa_6xA.png)
One downside of a map visualization with so many polygons is rather slow zooming and the need for extra storage on GitHub in case you would like to post it there. On the other hand, the map is very detailed and precise.
ANENA: Association Nationale pour l’Étude de la Neige et des Avalanches, Bilan des accidents (2020), Anena.org
Data.gouv.fr (2020), Data.gouv.org
S. PATEL, A Complete Guide to an Interactive Geographical Map using Python (2019), TowardsDataScience.com
Earth Lab, Lesson 6. How to Dissolve Polygons Using Geopandas: GIS in Python Spatial data open source python Workshop (2019), EarthDataScience.org
Geographic Information Systems, Dissolve causes ‘No Shapely geometry can be created from null value’ in geopandas (2018), gis.stackexchange.com
Stackoverflow, Deleting inner lines of polygons after dissolving in geopandas (2020), Stackoverflow.com
Translated from: https://towardsdatascience.com/avalanche-danger-in-france-247b81b85e4e