python gis库_使用开放的python库自动化GIS和遥感工作流

最新推荐文章于 2024-07-09 12:01:39 发布

weixin_26752765

最新推荐文章于 2024-07-09 12:01:39 发布

阅读量864

点赞数

文章标签： python

原文链接：https://towardsdatascience.com/automating-gis-and-remote-sensing-workflows-with-open-python-libraries-e71dd6b049ee

版权

本文介绍了如何利用开源的Python库来自动化地理信息系统（GIS）和遥感的工作流程，帮助数据科学家和GIS专业人员提升效率。

摘要由CSDN通过智能技术生成

python gis库

Over my career I’ve worked on many geospatial related projects using the ArcGIS platform, which I absolutely love. That means I get to consult in projects with cutting-edge geospatial technologies, like Multidimensional Rasters, Deep Learning, and spatial IoT automation. With that in mind, I always try to keep track of how to perform the same operations I’m working on without Esri’s beautifully crafted systems.

在我的职业生涯中，我曾使用我绝对喜欢的ArcGIS平台从事过许多与地理空间相关的项目。这意味着我可以在具有尖端地理空间技术的项目中进行咨询，例如多维栅格，深度学习和空间物联网自动化。考虑到这一点，我始终尝试在没有 Esri精心设计的系统的情况下跟踪如何执行与我相同的操作。

Over the last few weekends during an exceptionally tedious quarantine I've worked on this little script that would reproduce something I've been developing with ESRI's Living Atlas to achieve NDVI Zonal Statistics (Normalized Difference Vegetation Index, a classic in Remote Sensing) for rural land plots.

在过去的几个周末中，我在一个非常繁琐的隔离中工作，编写了这个小脚本，该脚本将重现我与ESRI的《生活地图集》一直在开发的内容，以实现针对农村的 NDVI分区统计 (标准差植被指数，这是遥感领域的经典) 地块。

The plan here was to perform an entire geoprocessing and remote sensing routine without having to resort to any GIS Desktop software. We're starting out with a layer that contains some parcels in which we're intersted and a layer with protected areas where special legislation applies. Nothing else is allowed outside python! This is our plan du jour:

这里的计划是执行整个地理处理和遥感例程，而不必借助任何GIS Desktop软件。我们首先从一个包含一些地块的图层开始，然后到一个包含受保护区域的图层，其中要应用特殊法规。 python之外没有其他任何内容！这是我们的计划：

Check the parcels geojson layer for invalid geometries and correcting those;
检查宗地geojson图层是否存在无效的几何并进行更正；
Calculate the area for each parcel in hectares;
计算每个包裹的面积(以公顷为单位)；
Verify for each parcel if it intersects any protected areas;
验证每个包裹是否与任何保护区相交；
Calculate the area of intersection for each parcel and insert the name of the protected area on the parcel's attributes;
计算每个宗地的相交面积，并在宗地的属性上插入保护区的名称；
Get the latest five Sentinel-2 satellite imagery with no clouds;
获取最新的五张Sentinel-2卫星图像，无云；
Calculate the NDVI on each image;
计算每个图像上的NDVI；
Calculate zonal statistics (mean, min, max and std. dev.) for the NDVI on each parcel;
计算每个包裹上NDVI的区域统计数据(平均值，最小值，最大值和标准偏差)；
Create a map showing the results clipped in each parcel.
创建一个地图，显示每个宗地中裁剪的结果。

让我们开始吧 (Let's get started)

So. First things first. We have to import the geojson file that has the land plots stored in it. For that, we'll be using the geopandas library. If you're not familiar with it, my advice is get acquainted with it. Geopandas basically emulates the functions we've been using in classic GIS Desktop softwares (GRASS, GDAL, ArcGIS, QGIS…) in python in a way consistent with pandas — a very popular tool among data scientists — to allow spatial operations on geometric types.

所以。首先是第一件事。我们必须导入存储有土地图的geojson文件。为此，我们将使用geopandas库。如果您不熟悉它，那么我的建议是熟悉的。 Geopandas基本上以python方式模拟了我们在经典GIS桌面软件(GRASS，GDAL，ArcGIS，QGIS…)中使用的功能，与pandas (数据科学家中非常流行的工具)相一致，从而可以对几何类型进行空间运算。

In order to import our geojson, the first thing we must note are the data types in geopandas. Basically, when we ran the .read_file method, we assigned a geodataframe type to the polygons variable. Inside every geodataframe there will always be a geoseries , which we can access via the .geometry method. Once we find the geoseries , we can make use of the .isvalid method, that produces a list of True/False values for each record in our series.

为了导入geojson，首先要注意的是geopandas中的数据类型。基本上，当我们运行.read_file方法，我们分配一个geodataframe类型的polygons变量。在每个geodataframe ，总会有一个geoseries ，我们可以通过.geometry方法进行访问。找到geoseries ，就可以使用.isvalid方法，该方法为序列中的每个记录生成一个True / False值列表。

And of course there are invalid geometries hanging in our dataset. It comes as no surprise, since those land plots came from the CAR Registry, where every rural land owner in Brazil have to self-declare the extent of their own properties.

当然，我们的数据集中还有无效的几何图形。毫不奇怪，因为这些地块来自中非共和国汽车登记处，巴西的每个农村土地所有者都必须自行申报自己财产的范围。

So, how do we fix that? Maybe you're used to running the excelent invalid geometries checking tools present in ArcGIS or QGIS, which generate even a report on what the problem was with each record in a table. But we don't have access to those in geopandas. Instead, we'll do a little trick to correct the geometries, by applying a 0 meter buffer to all geometries.

那么，我们该如何解决呢？也许您已经习惯运行ArcGIS或QGIS中提供的出色的无效几何检查工具，它们甚至生成有关表中每个记录存在问题的报告。但是我们无法访问那些在大熊猫中的人。取而代之的是，我们将通过对所有几何图形应用0米的缓冲区来纠正几何图形。

And now we’ll finally get to take a look at our polygons, by using the .plot method, which is actually inherited from thematplotlib components in geopandas .

现在，我们终于可以使用.plot方法来查看多边形，该方法实际上是从geopandas的matplotlib组件继承而来的。

This is a fast and useful way to get a quick notion of what our data looks like spatially, but it's not the same as a map.

这是一种快速而有用的方法，可以快速了解我们的数据在空间上看起来是什么样子，但与地图不同。

计算面积 (Calculating areas)

Since we want to know the area of each of our land plots in hectares, we need to make sure our geodataframe is in a Projected Coordinate System. If you're not familiar with coordinate reference systems (CRS), the bare minimum you need to know is that Geographic Coordinate Systems operate in degrees (like latitude and longitude) and Projected Coordinate Systems operate in distances, enabling us to calculate areas using the metric system. From the plotted graph we just created we can see that the polygons are probably plotted accordingly to latitude and longitude, somewhere around -15.7°, -47.7°. If we run print(polygons.crs) we'll get epsg:4326 as response. There are many coordinate systems available, so the EPSG system is a very good way to keep track of them; 4326 means our geodataframe is in the WGS84 Coordinate System — a Geographic Coordinate System indeed.

由于我们想知道每公顷土地的面积(以公顷为单位)，因此需要确保地理数据框位于“ 投影坐标系”中 。如果您不熟悉坐标参考系统(CRS)，则需要了解的最低限度是地理坐标系以度(例如纬度和经度)运行，而投影坐标系以距离进行操作，这使我们能够使用公制。从我们刚刚创建的绘图图中，我们可以看到多边形可能是根据纬度和经度绘制的，大约在-15.7°，-47.7°左右。如果运行print(polygons.crs) epsg:4326 print(polygons.crs)我们将得到epsg:4326作为响应。有许多坐标系可用，因此EPSG系统是跟踪它们的一种很好的方法。 4326表示我们的地理数据框位于WGS84坐标系中—确实是地理坐标系 。

So we do need to transform the coordinate system of our geodataframe. To do that, we must first decide which system we'll transform it into. At this point I'm very used to converting from one system to another, but if you're a bit lost, a good way to choose a projected coordinate system is just using the Universal Transverse of Mercator Projection of the WGS84 system. So all you have to do is find out in which UTM Zone your data is in, so you know that's the zone where your data will have the least area distortion. A good way to do that is this little web application.

因此，我们确实需要变换地理数据框的坐标系。为此，我们必须首先决定将其转换为哪个系统。在这一点上，我已经很习惯从一个系统转换为另一个系统，但是如果您有点迷茫，选择投影坐标系的一种好方法就是使用WGS84系统的墨卡托投影的通用横向 。因此，您要做的就是找出您的数据位于哪个UTM 区域中，因此您知道这是数据区域失真最小的区域。这样做的一个好方法是这个小的Web应用程序。

So all that's left is visiting the EPSG Authority website and checking the EPSG code for our chosen projection. Next, we need to define a new geodataframe variable using the .to_crs method.

因此，剩下的就是访问EPSG管理局网站并检查我们选择的投影的EPSG代码。接下来，我们需要使用.to_crs方法定义一个新的 geodataframe变量。

We can now finally create a new column in that new geodataframe, which I named area_ha and calculate the area (in meters, since we're using a UTM projection) that we'll have to divide by 10,000 for it to be in hectares.

现在，我们终于可以在新的地理数据框中创建一个新列，我将其命名为area_ha并计算必须除以10,000的面积(以米为单位，因为我们使用的是UTM投影)。

检查覆盖 (Checking for overlays)

We can now import the second layer we were provided with, the one with all the protected areas (UCs, or Unidades de Conservação) that might represent legal restrictions to agricultural use in the land plots we're analysing. We'll import it just like we did with the previous geojson. The main difference this time is that this piece of data actually has a lot more fields, with the name of the protected area as well as the date of creation and more legal stuff.

现在，我们可以导入第二层，即第二层，其中包含所有保护区( UCs或Unidades deConservação )，这可能代表了我们正在分析的土地上对农业使用的法律限制。我们将像导入先前的geojson一样导入它。这次的主要区别是该数据实际上具有更多字段，其中包括保护区的名称，创建日期和更多合法内容。

We're interested in getting all that rich information into the attribute table of the land plots intersecting a given protected area. Geopandas actually makes that very easy to do, with the overlay method. It does literally exactly what we're looking for. In our case, it will create a new geodataframe with one record for each intersecting part of each land plot, with the information about the protected area it intersected. With that we can calculate a new field for the area of intersection in hectares, just like we did before.

我们有兴趣将所有丰富的信息纳入与给定保护区相交的土地图的属性表中。实际上，使用overlay方法可以使Geopandas变得非常容易。它从字面上 正是我们要寻找的。在我们的案例中，它将创建一个新的地理数据框，其中每个土地图的每个相交部分都有一个记录，并包含与该保护区相交的信息。这样，我们可以像以前一样为以公顷为单位的交点区域计算一个新字段。

现在为遥感位 (Now for the remote sensing bit)

We're done dealing with the GIS bit of this article, and now things get interesting. We want to get a good sense of how the vegetation is doing in each land plot, so we'll try to get the latest (free) satellite imagery we can get and run some analyses with it. There are many options of free satellite imagery, like the Landsat family, but ESA's Sentinel2 mission provides free imagery in a decent spatial scale (20x20m pixel).

我们已经完成了本文的GIS处理，现在事情变得有趣了。我们希望对每个土地上的植被状况有一个很好的了解，因此我们将尝试获取最新的(免费)卫星图像，我们可以获取并进行一些分析。有许多免费卫星图像可供选择，例如Landsat家族，但ESA的Sentinel2任务可在体面的空间比例(20x20m像素)内提供免费图像。

To access that, there's more than one library we could use. The Sentinel Hub is a solid one, ArcGIS Living Atlas is another one. But my personal choice is Google Earth Engine for two simple reasons: with small changes we can access data from Landsat and Sentinel with the same code, plus it's (still) free to use — but let us not forget what happened with Google Maps API. The real downside here is that the GEE's python API is poorly documented.

要访问该库，我们可以使用多个库。 Sentinel Hub是坚固的，ArcGIS Living Atlas是另一个。但是我个人选择Google Earth Engine是出于两个简单的原因：只需进行少量更改，我们就可以使用相同的代码访问Landsat 和 Sentinel的数据，而且(仍然)可以免费使用它-但我们不要忘记Google Maps API的情况。真正的缺点是GEE的python API的文献资料很少。

What we want to do is fairly simple. We have to call the Sentinel 2 Image Collection from GEE, filter it for area of interest, date and cloud percentage, and get the latest five in the collection. In order to get the area of interest, we have to run one more command with geopandas, the .total_bounds method, which will give us the coordinates for a rectangle that contains the full extent of our land plots.

我们想要做的很简单。我们必须从GEE调用Sentinel 2 Image Collection ，对它的关注区域，日期和云百分比进行过滤，并获取该集合中的最新五个。为了获得感兴趣的区域，我们必须使用geopandas再运行一个命令，即.total_bounds方法，该命令将为我们提供包含整个土地图范围的矩形的坐标。

We have taken those five latest images and stored them in our collectionList variable. But let's have a look to make sure the satellite imagery actually corresponds to our expectations. To do that, let's use Earth Engine's .Image , set it to a RGB composition — in Sentinel 2, that means using the bands 4, 3 and 2 — and plot it using Ipython's .display .

我们已经拍摄了这五个最新图像，并将它们存储在我们的collectionList变量中。但是，让我们看一下以确保卫星图像确实符合我们的期望。为此，让我们使用Earth Engine的.Image ，将其设置为RGB合成-在Sentinel 2中，这意味着使用波段4、3和2-并使用Ipython的.display 。

计算NDVI (Calculating the NDVI)

This bit is actually pretty simple. GEE makes it very easy to calculate normalized differences, which is precisely what NDVI is:

这一点实际上很简单。 GEE使得计算归一化差异非常容易，这正是NDVI的含义：

The normalized difference vegetation index (NDVI) is a simple graphical indicator that assesses whether or not the target being observed contains live green vegetation. Healthy vegetation (chlorophyll) reflects more near-infrared (NIR) and green light compared to other wavelengths. But it absorbs more red and blue light.

标准化差异植被指数 ( NDVI )是一个简单的图形指标，用于评估被观察的目标是否包含活绿色植被 。与其他波长相比，健康的植被(叶绿素)反射更多的近红外(NIR)和绿光。但它吸收更多的红色和蓝色光。

NDVI quantifies vegetation by measuring the difference between near-infrared (which vegetation strongly reflects) and red light (which vegetation absorbs). Satellite sensors like Landsat and Sentinel-2 both have the necessary bands with NIR and red.

NDVI通过测量近红外 (植被强烈反射)和红光 (植被吸收)之间的差异来量化植被。诸如Landsat和Sentinel-2之类的卫星传感器都有必不可少的NIR和红色波段。

In GEE, we just need to use the .normalizedDifference method with the right bands. In our case, it's the bands 8 (NIR) and 4 (Red). Normally, that would be it, but I'm an old-fashioned GIS geek, so I need to have a look at my data in a map to make sure it went ok.

在GEE中，我们只需要在正确的频段上使用.normalizedDifference方法。在我们的例子中，是第8波段(NIR)和第4波段(红色)。通常，就是这样，但是我是一个老式的GIS怪胎，因此我需要查看地图中的数据以确保一切正常。

For that, I'll create a simple Folium map — a way to render Leaflet maps with python — and load both the NDVI raster and the land plots geojson to it.

为此，我将创建一个简单的Folium贴图-一种使用python渲染Leaflet贴图的方法，然后将NDVI栅格和地形图geojson加载到其中。

区域统计很有趣 (Zonal Statistics is fun)

We're almost through. What we need to do now is resume the NDVI data we just calculated for the five Sentinel 2 images in each land plot. Folks in the remote sensing have been doing this for many years, with something called Zonal Statistics. This tool is present in many GIS Desktop softwares, and it's possible even in GEE python API — again, poorly documented. For simplicity, ease of access and liability, my preferred tool for this is actually rasterstats.

我们快完成了。现在我们需要做的是恢复刚刚为每个土地图中的五个Sentinel 2图像计算的NDVI数据。遥感领域的人们已经做了很多年了，这就是所谓的Zonal Statistics 。该工具存在于许多GIS桌面软件中，并且即使在GEE python API中也有可能-同样，文献记录不多。为了简化，易于访问和承担责任，我为此首选的工具实际上是rasterstats 。

The problem is, so far we've been dealing with GEE own format for dealing with imagery. And rasterstats will be needing those oldschool .TIF files. So we gotta export our imagery to a local folder.

问题是，到目前为止，我们一直在处理GEE自己的格式以处理图像。 光栅统计仪将需要那些老式的.TIF文件。因此，我们必须将图像导出到本地文件夹。

Finally, this is where the magic happens. Let us have a little for loop that opens each local NDVI raster file, calculates the stats for each land plot, and then append the results for mean, minimum, maximum and standard deviation to a pandas' dataframe.

最后，这就是魔术发生的地方。让我们有一个for循环，用于打开每个本地NDVI栅格文件，计算每个土地图的统计信息，然后将平均值，最小，最大和标准偏差的结果附加到熊猫的数据框中。

We can now visualize the NDVI statistics in each of our land plots by plotting our data, just like before, except this time we can tell matplotlib which column and which color scheme to use. Since the results seem ok, we can export the geodataframe to a geojson file and share with our not-so-tech colleagues.

现在，我们可以像以前一样通过绘制数据来可视化每个地块中的NDVI统计数据，除了这次我们可以告诉matplotlib使用哪个列和哪个配色方案。由于结果看起来还可以，因此我们可以将地理数据框导出到geojson文件，并与我们的非高科技同事共享。

那是所有人 (That's all folks)

That's it for this piece! We have successfully implemented classic geoprocessing and remote sensing workflows without having to use ArcGIS or QGIS once! You can find this whole workflow in my Github page. If you have questions or suggestions, don't hesitate to drop me a line anytime!

这就是这个！我们已经成功实现了经典的地理处理和遥感工作流程，而无需一次使用ArcGIS或QGIS！您可以在我的Github页面上找到整个工作流程。如果您有任何疑问或建议，请随时联系我！

翻译自: https://towardsdatascience.com/automating-gis-and-remote-sensing-workflows-with-open-python-libraries-e71dd6b049ee

python gis库

weixin_26752765

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
python gis库_使用开放的python库自动化GIS和遥感工作流

python gis库Over my career I’ve worked on many geospatial related projects using the ArcGIS platform, which I absolutely love. That means I get to consult in projects with cutting-edge geospatial techn...
复制链接

扫一扫