[Translation] Why Should You Care About PostGIS? - Part 2

Bruce Jia (Shanghai)

Original article: https://medium.com/@tjukanov/why-should-you-care-about-postgis-a-gentle-introduction-to-spatial-databases-9eccd26bc42b

Because the original post is long, I am translating it in several parts. The previous part is here: https://blog.csdn.net/IDisposable/article/details/104004337

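Part 2: The magical world of spatial SQL

Spatial SQL (Structured Query Language) can speed up your processing (when used wisely). The chart below compares Shapefile + QGIS with PostGIS for processing 5 million random points.

(Database blog posts always include a bar chart comparing processing times. PostGIS = very fast. Bar charts don't lie.)

For this comparison I used Finnish postal code areas and the population of each area. I had the data both as a Shapefile and in a local database. Inside each polygon I created random points to represent the population. For the Shapefile I used QGIS (its vector processing tools); in PostGIS I used this simple SQL statement: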

SELECT ST_GeneratePoints(geom, he_vakiy) from paavo.paavo

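As the chart above shows, PostGIS did the same analysis in less than 10% of the time taken by QGIS with a Shapefile. If you are a GIS analyst who does this kind of work every day, PostGIS can save you a considerable amount of time over a year.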
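To make the task concrete, here is a minimal Python sketch of what "random points inside a polygon" means, using rejection sampling over the polygon's bounding box. It is only an illustration of the idea, not how PostGIS implements ST_GeneratePoints; the function names and the triangle used below are made up for the example, and the polygon is assumed to be a simple ring without holes.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def generate_points(ring, count, seed=None):
    """Return `count` random points inside `ring` via rejection sampling.

    Assumes the ring encloses a non-zero area; otherwise the loop never ends.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < count:
        # Draw a candidate in the bounding box, keep it only if it is inside.
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical postal code area (a triangle) with a population of 200.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
people = generate_points(triangle, 200, seed=1)
```

In the database, the single SQL statement above does this for every polygon in the table at once, which is a large part of why it is so much less work to write.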

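Besides fast processing, you get the large collection of spatial functions that PostGIS provides. Which ones are useful to you depends on your needs.

Beyond Voronoi analysis and more traditional GIS operations (buffer, overlay, intersect, clip, and so on), you can do more advanced analysis. For example:

Routing: with pgRouting and road data you can find optimal routes and run different kinds of network analysis;
Polygon skeletonization: this lets you easily construct the medial axis of a polygon; (Translator's note: I am not familiar with GIS analysis and am still learning. Corrections are welcome.)
Geometry subdivision: dividing geometries for further processing can speed up your GIS workflow;
Clustering: find clusters and patterns in your data. With AI being so hyped right now, the k-means algorithm may be more appealing to some people than it used to be...

What would you ever need polygon skeletonization for? Most people will probably ask that, but when your spatial analysis does need it, you will be very happy that someone has already done the hardest part for you (yes, the math). Combining the spatial functions with the functions of Postgres itself lets you carry out advanced spatial analysis.

Some complex and interesting problems (spatial joins, aggregation, and so on) can be expressed as a single SQL statement but require a lot of computation, and that computation is exactly what PostGIS does for you. If you accessed files (Shapefiles) programmatically instead, you might need to write hundreds of lines of special-purpose code to solve the same problem.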

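PostGIS and visualization

PostGIS has played a part in my past visualization projects. I often preprocess the data in PostGIS and then do the actual visualization in QGIS.

Let's look at an example.

Animated Voronoi lines of trains. Odd, but it met my needs.

This animation of trains and Voronoi lines is a fun example of what PostGIS can do. I had a few million train GPS points in my local database, and I had created animations from the movement of those points. But I wanted to test what animated Voronoi lines would look like.

First, each train has several GPS points per minute, so I needed to combine them into a single point representing the train's position during that minute. I started by manually creating a table to store the resulting points, and then wrote this SQL statement: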

INSERT INTO trains.voronoipoints
SELECT '2018-01-15 09:00:00' AS t,
       geom
FROM   (-- subquery b: one centroid per train
        SELECT ST_Centroid(ST_Collect(geom)) AS geom,
               trainno
        FROM   (-- subquery a: all GPS points inside the one-minute window
                SELECT geom,
                       trainno
                FROM   trains.week
                WHERE  time > '2018-01-15 09:00:00'
                       AND time < '2018-01-15 09:01:00') AS a
        GROUP  BY trainno) AS b;

If we break this statement down, we get:

The usual elements of a SQL query (INSERT INTO, SELECT, AS, FROM, WHERE, AND, GROUP BY).
geom, trainno and time are column names in the "week" table of my "trains" schema.
Subquery a returns all GPS points that were tracked within the requested timeframe.
Because I select all GPS points tracked within one minute, I might get several points for each train. I only wanted one, so that the Voronoi lines would look more sensible. That's why I use ST_Collect to group the points together and create a multipoint geometry from them. ST_Centroid then replaces the multipoint geometry with a single point located at its centroid (subquery b), and the data is grouped by train number.
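For readers who think more easily in code than in nested subqueries, the ST_Collect plus ST_Centroid step can be sketched in plain Python. For a multipoint, the centroid is the arithmetic mean of the point coordinates, so the sketch groups the points by train and averages them. The function name and the sample rows are invented for the illustration.

```python
from collections import defaultdict


def centroid_per_train(rows):
    """Collapse GPS points to one representative point per train.

    rows: iterable of (trainno, x, y) tuples, all from the same minute.
    Returns {trainno: (mean_x, mean_y)}.
    """
    grouped = defaultdict(list)
    for trainno, x, y in rows:
        # Equivalent of GROUP BY trainno followed by ST_Collect.
        grouped[trainno].append((x, y))

    centroids = {}
    for trainno, pts in grouped.items():
        # Equivalent of ST_Centroid on a multipoint: the mean of its points.
        n = len(pts)
        centroids[trainno] = (
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
        )
    return centroids


# Made-up sample: train 1 reported twice within the minute, train 2 once.
sample = [(1, 0.0, 0.0), (1, 2.0, 4.0), (2, 10.0, 10.0)]
result = centroid_per_train(sample)
```

Here `result` maps train 1 to the midpoint of its two positions and leaves train 2 where it was.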

To do the same thing many times, I used a simple Python script that ran the same query a few hundred times, with the start and end times as parameters. After successfully finding one representative point per train for each minute, I just ran the following command (it took 11.5 seconds):
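The original script is not shown, so the following is only a guess at its general shape: a generator that yields consecutive one-minute windows and a function that runs the parameterized INSERT once per window. It assumes an already open psycopg2 connection is passed in; `minute_windows` and `fill_voronoi_points` are hypothetical names.

```python
from datetime import datetime, timedelta

# Same statement as above, with the start and end of the window as parameters.
INSERT_SQL = """
INSERT INTO trains.voronoipoints
SELECT %(start)s AS t, geom
FROM (SELECT ST_Centroid(ST_Collect(geom)) AS geom, trainno
      FROM trains.week
      WHERE time > %(start)s AND time < %(end)s
      GROUP BY trainno) AS b
"""


def minute_windows(start, minutes):
    """Yield (window_start, window_end) pairs, one per minute."""
    for i in range(minutes):
        window_start = start + timedelta(minutes=i)
        yield window_start, window_start + timedelta(minutes=1)


def fill_voronoi_points(conn, start, minutes):
    """Run the INSERT once per window on an open psycopg2 connection (assumed)."""
    with conn.cursor() as cur:
        for window_start, window_end in minute_windows(start, minutes):
            cur.execute(INSERT_SQL, {"start": window_start, "end": window_end})
    conn.commit()


# The first three windows starting at 09:00 on 15 January 2018.
windows = list(minute_windows(datetime(2018, 1, 15, 9, 0), 3))
```

Passing the times as parameters instead of pasting them into the SQL string lets the driver handle quoting and avoids typos in hand-written timestamps.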

SELECT t, ST_VoronoiLines(geom) from trains.voronoipoints

Then I added the result to QGIS and visualized it with Time Manager. This may be a somewhat hacky way to get there, and a more experienced SQL user might have done it all in a single SQL command, but I'm still pretty happy with the result. Even if it might be pointless.

In the end it is pretty simple, but the result looks like higher-level math (and it is!), because all the hard work is done by PostGIS. And because I ran the Voronoi analysis on only one point per train, the processing took only seconds for hundreds of thousands of points.

Query processing time often grows much faster than linearly as the amount of data grows. This is why you have to be smart with your queries.

Hey look! I made a SQL meme!

As a rule of thumb, the more data a query has to fetch and the more operations the database has to perform (ordering, grouping, etc.), the slower and less efficient it becomes. An efficient SQL query fetches only the rows and columns it really needs. SQL can work like a logic puzzle, where you have to think carefully about what you want to achieve.
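A small illustration of "fetch only what you need", using Python's built-in sqlite3 module as a stand-in so that it runs without a PostGIS server. The table and its contents are invented; the same principle applies to a real PostGIS table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE week (trainno INTEGER, time TEXT, speed REAL, note TEXT)")
conn.executemany(
    "INSERT INTO week VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01-15 09:00:10", 80.0, "x" * 100),
        (1, "2018-01-15 09:00:40", 82.0, "x" * 100),
        (2, "2018-01-15 09:00:20", 60.0, "x" * 100),
        (2, "2018-01-15 10:30:00", 0.0, "x" * 100),
    ],
)

# Wasteful: pull every row and every column, then filter in Python.
all_rows = conn.execute("SELECT * FROM week").fetchall()
wasteful = [
    (r[0], r[1])
    for r in all_rows
    if "2018-01-15 09:00:00" < r[1] < "2018-01-15 09:01:00"
]

# Leaner: the database filters the rows and returns only two columns.
lean = conn.execute(
    "SELECT trainno, time FROM week "
    "WHERE time > ? AND time < ? ORDER BY trainno, time",
    ("2018-01-15 09:00:00", "2018-01-15 09:01:00"),
).fetchall()
```

Both approaches end up with the same three rows, but the second one never transfers the unused `speed` and `note` columns or the row outside the time window.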

I must also note that tuning query performance is a slippery slope, and you can get lost in a world of endless optimization. Finding the balance between an “optimal query” and an optimal query is really important. Especially if you are not building an application for a million users, a few milliseconds here or there probably won't rock your boat.

How to get started?

I would argue that for GIS users, learning SQL is more useful than learning JavaScript, Python, or R. SQL syntax has changed only slightly over the years, so SQL skills hold their value very well.

I have found that the learning curve for basic SQL isn't really steep, but it might take you some time to see the benefits it can bring to your spatial analysis. I encourage you to be patient, try more complicated analytics, and aim for faster processing. Eventually you will see the difference.

When you first learn the SQL basics, you will learn how to query data from a single table using basic techniques such as selecting columns, sorting the result set, and filtering rows. Then you will learn about more advanced queries, such as joining multiple tables, using set operations, and constructing subqueries. Finally, you will learn how to manage database tables, such as creating a new table or modifying an existing table's structure.

But there are also tools to help you out!

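QGIS has a great tool called DB Manager. It gives you a graphical interface to the database that lives inside QGIS and is easier to understand. With a few right-clicks you can modify and add tables, add indexes, and do other basic operations. (Translator's note: we have a great QQ group, "开源GIS技术交流群" (Open Source GIS Technology Exchange Group), group number 767137544. You are welcome to join.)

[Screenshot of QGIS DB Manager.]

You should also take a look at pgAdmin, the most popular administration and development platform for PostgreSQL. There are several ways to get your data into PostGIS (for example ogr2ogr and shp2pgsql). In general, I encourage you to try different tools and methods for working with your data.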

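I have done a lot of experiments combining Python and PostGIS. Getting Python (or R) and PostGIS to work together can take your data processing and automation to the next level. Connecting to PostGIS with psycopg2, combined with basic Python scripting, is a good way to get started.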
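As a starting point, a first psycopg2 script can be as small as the sketch below, which connects and asks the server which PostGIS version it runs. The database name, user, host, and port are placeholders you would replace with your own, and `build_dsn` and `postgis_version` are hypothetical helper names. psycopg2 has to be installed separately.

```python
def build_dsn(dbname, user, host="localhost", port=5432):
    """Build a libpq-style connection string from placeholder settings."""
    return f"dbname={dbname} user={user} host={host} port={port}"


def postgis_version(dsn):
    """Connect and return the version string reported by PostGIS_Version()."""
    import psycopg2  # third-party driver, imported here so the file loads without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT PostGIS_Version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


# Placeholder settings; adjust to your own database.
dsn = build_dsn("gis", "me")
# print(postgis_version(dsn))  # uncomment once a PostGIS database is reachable
```

Once this works, any of the SQL statements in this post can be sent through `cur.execute` in the same way.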

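Sounds good? Ready to get your hands dirty?

Download PostGIS and install it on your machine. Link: download the installers
Load data into PostGIS. You can start with a single Shapefile and import it into the database with QGIS DB Manager, or follow the example in this tutorial and import Natural Earth data into PostGIS.
Start writing SQL. Begin with basic selecting, filtering, and updating of data, and you will gradually see the benefits PostGIS can bring to your work.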