Note|Polygonal Clustering Analysis Using Multilevel Graph-Partition_the multilevel graph partition algorithm-CSDN Blog

Article link: https://blog.csdn.net/weixin_45439556/article/details/97624069


**1.** Six spatial clustering methods

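| Spatial clustering method | Characteristics |
|---|---|
| Partitioning clustering | The number of clusters must be specified in advance |
| Hierarchical clustering | The number of clusters need not be known in advance, but a termination condition must be specified |
| Density-based clustering | Groups objects in dense regions into one cluster; does not work well when the density is nearly uniform across the study area; the density of polygons is hard to define |
| Grid-based clustering | Divides the space into sub-regions forming a grid structure, then performs clustering on the grid |
| Graph-based clustering | Uses a graph to represent the relationships and configuration among objects |
| Model-based clustering | First assumes a model for each cluster, then finds the data that best fit each model |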

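**2.** The similarity of disjoint (non-overlapping) polygons depends mainly on their geometric configuration and their spatial relationships. First, most existing algorithms do not use spatial relationships; second, although multilevel graph partition cannot overcome the drawbacks of the six spatial clustering methods, it can better represent the relationships between polygons and find globally optimal results. The paper therefore proposes a multilevel graph-partition algorithm for clustering polygons that takes geometric similarity into account.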

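**3.** (1) The geometric and relational (topological and directional) similarities between polygons are measured in four aspects: distance, connectivity, size and shape; (2) the multilevel graph-partition algorithm is improved to find the optimal clustering pattern of non-overlapping polygons.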

Connectivity

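(1) A topological space $X$ is connected if it has no subsets that are both open and closed other than the empty set and $X$ itself. (2) A subset $E$ of a topological space $X$ is connected if $E$, regarded as a subspace of $X$, is connected in the induced topology.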

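The following three statements are equivalent characterizations of a connected topological space $X$: 1) $X$ cannot be written as the union of two non-empty, disjoint open sets; 2) whenever $X$ is split into two non-empty, disjoint subsets $A$ and $B$ with $A\cup B=X$, either $A\cap\overline{B}\neq\emptyset$ or $\overline{A}\cap B\neq\emptyset$; 3) the only subsets of $X$ that are both open and closed are $X$ and the empty set.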

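**4.** Polygons are represented as the nodes of a graph, and the similarity between two polygons as the weight of the edge linking them. The multilevel graph partition is improved so that the graph is coarsened at multiple levels to maximize the similarity within each cluster, and the resulting clusters are then refined to minimize the similarity between clusters.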


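**5.** The clustering process is controlled by the weights of the four similarity measures and by a threshold that the overall similarity must satisfy.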

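**6.** The authors summarize the advantages of the method: the improved similarity measures are more general, handling polygons of arbitrary shape rather than being limited to points and buildings; the improved algorithm is more general and better at discovering polygon clusters with little prior information.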

Background

1. Voronoi diagram

Properties of a Voronoi diagram

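1. Each polygon contains exactly one center point (generator);

2. Every point inside a polygon is closer to that polygon's center point than to any other center point;

3. A point on a polygon's edge is equidistant from the center points on the two sides of that edge.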

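Constructing a Voronoi diagram

1. Build a Delaunay triangulation of the center points;

2. Find the circumcenter of each triangle;

3. Connect the circumcenters of adjacent triangles (triangles that share an edge).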
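The three construction steps can be sketched in Python. This is a minimal illustration, not the paper's code: it assumes a Delaunay triangulation is already available as index triples (computing one is outside the sketch), it omits the unbounded Voronoi edges on the convex hull, and the helper names `circumcenter` and `voronoi_edges` are hypothetical.

```python
from itertools import combinations

Point = tuple[float, float]


def circumcenter(a: Point, b: Point, c: Point) -> Point:
    """Center of the circle through a, b and c (step 2)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("collinear points have no circumcircle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


def voronoi_edges(points, triangles):
    """Step 3: join circumcenters of triangles that share an edge.

    `triangles` is assumed to be a Delaunay triangulation of `points`
    given as index triples; unbounded hull edges are not produced.
    """
    centers = [circumcenter(*(points[i] for i in tri)) for tri in triangles]
    first_owner = {}  # triangle edge (i, j) -> first triangle that uses it
    edges = []
    for t, tri in enumerate(triangles):
        for i, j in combinations(sorted(tri), 2):
            if (i, j) in first_owner:
                edges.append((centers[first_owner[(i, j)]], centers[t]))
            else:
                first_owner[(i, j)] = t
    return edges
```

For example, with points (0, 0), (4, 0), (0, 4), (5, 5) and triangles (0, 1, 2) and (1, 2, 3), the two triangles share only the edge (1, 2), so a single Voronoi edge is produced, from the circumcenter (2, 2) to roughly (2.83, 2.83).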


2. The Voronoi diagram (the bold lines) divides the plane into a set of seamless, non-overlapping regions.

An adjacency graph can be constructed from the Voronoi diagram: each polygon is a node, and two neighboring polygons are connected by an edge.

The Delaunay triangulation between two neighboring polygons helps capture the distance, connectivity and shape similarities between them through the bridge region, which is defined as all triangles between the two polygons (Section 3.4).

(The adjacency graph by itself cannot capture the similarity between two polygons, so the relations between neighboring polygons are derived from triangles. A Delaunay triangulation maximizes the minimum angle over all of its triangles.)
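The max-min-angle property can be checked numerically on a single convex quadrilateral. This sketch (hypothetical helper names) uses the standard in-circle determinant test to pick the Delaunay diagonal and then compares the smallest angles of the two possible triangulations; it only illustrates the property and is not a full triangulation algorithm.

```python
import math


def min_angle(a, b, c):
    """Smallest interior angle of triangle abc, in degrees."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return min(angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))


def in_circle(a, b, c, d):
    """True if d is strictly inside the circumcircle of counter-clockwise triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def delaunay_diagonal(p0, p1, p2, p3):
    """For a convex counter-clockwise quadrilateral p0 p1 p2 p3, return the Delaunay diagonal."""
    # Diagonal p0-p2 is Delaunay unless p3 falls inside the circle through p0, p1, p2.
    return "p1-p3" if in_circle(p0, p1, p2, p3) else "p0-p2"
```

For the flat rhombus (0, 0), (2, -1), (4, 0), (2, 1), the test rejects the long diagonal because (2, 1) lies inside the circle through the other three points; the short diagonal raises the smallest angle from about 26.6° to about 53.1°.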
3.

To avoid generating narrow triangles and to make the similarity more precise, regularly sampled nodes are inserted along each edge of a polygon. The gap between two neighboring nodes is proportional to the minimum length of the polygons' edges (Figure 2).

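The edge densification can be sketched as follows. The proportionality factor `ratio` is an assumption (the text only says the gap is proportional to the shortest polygon edge), `densify` is a hypothetical name, and rounding the number of segments up means the actual spacing on each edge is at most the nominal gap.

```python
import math


def densify(polygons, ratio=0.5):
    """Insert regularly spaced nodes along every polygon edge.

    The nominal gap is `ratio` times the shortest edge among all polygons;
    `ratio` is an assumed proportionality factor, not a value from the paper.
    """
    edges = [(p[i], p[(i + 1) % len(p)]) for p in polygons for i in range(len(p))]
    gap = ratio * min(math.dist(a, b) for a, b in edges)
    result = []
    for poly in polygons:
        dense = []
        for i, a in enumerate(poly):
            b = poly[(i + 1) % len(poly)]
            # Round up so that the real spacing on this edge never exceeds the gap.
            steps = max(1, math.ceil(math.dist(a, b) / gap))
            for s in range(steps):  # b itself is emitted as the start of the next edge
                t = s / steps
                dense.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
        result.append(dense)
    return result
```

Because the gap is derived from the shortest edge over all polygons, adding a small polygon to the input also densifies the large ones.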


Proposed Algorithm

This study aims to partition a graph into sub-graphs such that the similarities among the nodes within each sub-graph are larger than a user-predefined threshold, while the similarities between nodes in different sub-graphs are smaller than the threshold. The final partition therefore depends on the similarities among nodes and on the specified threshold. Multilevel graph partitioning turns the original large graph into smaller graphs, which is termed coarsening the graph, by collapsing a group of nodes into a coarse node and reconstructing the edges among the coarse nodes. By iterating this process, the original graph is coarsened into multiple successively coarser graphs, and the links between them are recorded; the multilevel graphs are thus generated during the coarsening phase. The nodes of the coarser graphs represent clusters of polygons, so multilevel graph partitioning can cluster polygons at different levels. Thanks to the recorded links, the coarser (smaller) graphs can be uncoarsened back to the most detailed graph, i.e. the original graph (Karypis and Kumar 1998). During the uncoarsening phase, nodes can be exchanged between clusters to optimize the partition, that is, to minimize the similarities between clusters. Figure 3 shows the phases of multilevel graph partitioning. Figure 4 illustrates the flowchart of the proposed method, which consists of four steps summarized below and described in detail in Section 3.2.

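The recorded links between levels are what make uncoarsening possible. As a minimal sketch, assuming each level's links are stored as a dictionary from a fine node to the coarse node it was collapsed into (a hypothetical representation), cluster labels assigned on the coarsest graph can be projected back to the original polygons by composing these maps:

```python
def project_to_original(level_maps, coarse_labels):
    """Carry cluster labels from the coarsest graph G^c back to G^0.

    level_maps[k][v] is the node of G^{k+1} that node v of G^k was collapsed
    into; coarse_labels[u] is the cluster label of node u in G^c.
    """
    labels = dict(coarse_labels)
    for mapping in reversed(level_maps):  # walk from G^c down to G^0
        labels = {v: labels[u] for v, u in mapping.items()}
    return labels
```

Each intermediate graph $G^k$ already defines a clustering of the polygons through its nodes, so this composition is only needed to carry labels from the coarsest level back down, where the refinement then operates.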

Framework

In the first step, a neighboring graph of the original polygons is constructed, with nodes representing the polygons and edges linking two polygons whose Voronoi regions share a common boundary. To perform graph partitioning (i.e. polygonal clustering), the similarity between every two linked polygons must be computed.

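The neighboring graph can be sketched through the duality between Voronoi diagrams and Delaunay triangulations: away from degenerate configurations, two point sites have Voronoi cells sharing a boundary exactly when a Delaunay edge joins them. Assuming each polygon's Voronoi region is approximated by the cells of its sampled boundary nodes, two polygons are linked when some triangle edge joins nodes of different polygons. The input format and the name `neighboring_graph` are hypothetical.

```python
from itertools import combinations


def neighboring_graph(owner, triangles):
    """Polygon adjacency from a Delaunay triangulation of sampled boundary nodes.

    owner: list where owner[i] is the polygon that sample node i belongs to.
    triangles: index triples of the triangulation.
    Returns a dict mapping each polygon id to the set of its neighbors.
    """
    graph = {p: set() for p in set(owner)}
    for tri in triangles:
        for i, j in combinations(tri, 2):
            x, y = owner[i], owner[j]
            if x != y:  # a triangle edge bridging two polygons links them
                graph[x].add(y)
                graph[y].add(x)
    return graph
```

The triangles that bridge two polygons found this way are also the ones that form the bridge region used by the similarity criteria.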

In the second step, the neighboring graph constructed in the first step is split into sub-graphs by the multi-level graph-partition method.

Because the number of clusters does not need to be specified, the termination condition is based on the similarity between polygons: when the similarity between every pair of linked polygons is less than a given threshold, which suggests that no more polygons belonging to the same cluster can be found, the coarsening phase ends.

Thirdly, the multilevel algorithm repeatedly conducts a k-way partition of the coarse graph until only k vertices are left. These k coarse nodes serve as the initial k-way partitioning of the original graph.


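The coarsening phase considers only the similarity within clusters, whereas the fourth step (the refining phase) considers the separation between clusters.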

Clustering Steps


Constructing Phase
Coarsening Phase (can also be understood as a "simplification phase")

This phase mainly consists of two tasks: (1) finding the appropriate matched nodes in graph $G^k$; (2) collapsing the matched nodes into coarse nodes, coarsening $G^k$ into $G^{k+1}$, and recalculating the edge weights of $G^{k+1}$ from those of $G^k$.


Two adjacent nodes with a large similarity are considered matched nodes. A coarser graph $G^{k+1}$ is produced by collapsing each pair of matched nodes in the finer graph $G^k$ into a coarse node (i.e. a multinode) of $G^{k+1}$.

The strategy for finding matchings is to make the sum of the weights of the matched edges as large as possible, because a large weight indicates that the two nodes belong to the same cluster.

The revised method finds a matching in which every edge weight is no smaller than the user-predefined threshold, even if the matching is not maximal. This overcomes a drawback of Metis and can find clusters of varying size. This study uses heavy-edge matching (HEM) to find matchings.

The weight between the multinodes $v^{k+1}_t$ (collapsed from $v^k_i$ and $v^k_j$) and $v^{k+1}_{t'}$ (collapsed from $v^k_{i'}$ and $v^k_{j'}$) is defined as

$$w^{k+1}_{t,t'}=\frac{w^k_{i,i'}+w^k_{i,j'}+w^k_{j,j'}+w^k_{j,i'}}{m'}$$

where $m'$ is the number of non-zero values among the four edge weights $\{w^k_{i,i'},w^k_{i,j'},w^k_{j,j'},w^k_{j,i'}\}$.
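The multinode weight formula translates directly into code. In this sketch the weights of $G^k$ are stored in a dictionary keyed by `frozenset` pairs (an illustrative choice), a missing pair counts as zero, and `multinode_weight` is a hypothetical name. Passing `None` for `j` or `j2` treats a node that was copied over unmatched as a one-member multinode, which is an assumption beyond the four-term formula.

```python
def multinode_weight(w, i, j, i2, j2):
    """Weight between multinodes t = {i, j} and t' = {i2, j2} in G^{k+1}.

    w maps frozenset({u, v}) -> similarity in G^k; absent pairs count as 0.
    Computes (w[i,i2] + w[i,j2] + w[j,j2] + w[j,i2]) / m', where m' is the
    number of non-zero terms.
    """
    pairs = ((i, i2), (i, j2), (j, j2), (j, i2))
    terms = [w.get(frozenset(p), 0.0) for p in pairs]
    nonzero = [x for x in terms if x != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0
```

Dividing by $m'$ rather than by 4 keeps the weight an average of the links that actually exist, so two multinodes joined by a single strong edge are not penalized for the missing ones.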

(a) The edges $w^k_{i,j}$ are visited randomly, and the distance, connectivity, size and shape similarities between nodes $v^k_i$ and $v^k_j$, as well as their overall spatial similarity, are calculated.
(b) The edge weights $W^k$ are sorted in descending order, and all nodes are initially marked as unmatched.
(c) The edges are visited in descending order. Let $\phi$ be a user-defined similarity threshold between two polygons; if $w^k_{i,j}\geq\phi$ and both nodes are still unmatched, then $v^k_i$ is matched with $v^k_j$ and $M^k=M^k\cup\{w^k_{i,j}\}$.
(d) Step (c) is repeated until no more matched nodes can be found.
(e) For each $w^k_{i,j}\in M^k$, nodes $v^k_i$ and $v^k_j$ are collapsed into a multinode $v^{k+1}_t$, and unmatched nodes (typically because $w^k_{i,j}<\phi$) are copied directly to $G^{k+1}$. The coarser graph $G^{k+1}=(V^{k+1},W^{k+1})$ is then rebuilt.
(f) Steps (a) to (e) are repeated until, at some level $c$, no matching can be found in $G^c$, which produces a sequence of progressively smaller graphs $G^0, G^1, \dots, G^c$.

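Steps (a) to (f) can be condensed into a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: step (a) is assumed done, so similarities arrive precomputed per polygon pair; a multinode is represented as a `frozenset` of original polygon ids; and when a multinode has a single member (an unmatched node copied over), its edge weight is taken as the mean of the available non-zero fine weights, a straightforward generalization of the four-term formula. All function names are hypothetical.

```python
from itertools import product


def match_level(nodes, w, phi):
    """Steps (b)-(d): visit edges by descending weight; match unmatched pairs with weight >= phi."""
    matched, groups = set(), []
    for edge, weight in sorted(w.items(), key=lambda kv: kv[1], reverse=True):
        u, v = tuple(edge)
        if weight >= phi and u not in matched and v not in matched:
            matched.update((u, v))
            groups.append((u, v))
    # Unmatched nodes are copied over unchanged as one-member groups.
    return groups + [(n,) for n in nodes if n not in matched]


def collapse(groups, w):
    """Step (e): build G^{k+1}; a coarse edge weight is the mean of the non-zero fine weights."""
    coarse = [frozenset().union(*g) for g in groups]
    new_w = {}
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            terms = [w.get(frozenset((u, v)), 0.0) for u, v in product(groups[a], groups[b])]
            nonzero = [x for x in terms if x != 0]
            if nonzero:
                new_w[frozenset((coarse[a], coarse[b]))] = sum(nonzero) / len(nonzero)
    return coarse, new_w


def coarsen(polygons, similarity, phi):
    """Steps (a)-(f): coarsen until no pair reaches phi; return the node list of every level."""
    nodes = [frozenset([p]) for p in polygons]
    w = {frozenset((frozenset([a]), frozenset([b]))): s for (a, b), s in similarity.items()}
    levels = [nodes]
    while True:
        groups = match_level(nodes, w, phi)
        if all(len(g) == 1 for g in groups):  # step (f): no matching at level c
            return levels
        nodes, w = collapse(groups, w)
        levels.append(nodes)
```

With similarities (A, B) = 0.9, (B, C) = 0.3, (C, D) = 0.8 and `phi = 0.5`, the pairs (A, B) and (C, D) are matched at level 0; the two resulting multinodes are linked with weight 0.3, below the threshold, so coarsening stops with two clusters. Lowering `phi` to 0.2 merges them into a single cluster at the next level.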

Partitioning Phase
Refining Phase

The coarsening phase considers only the similarity within clusters, not the similarity between clusters, so the coarsened graphs it produces are not optimal. Moreover, as mentioned above, the clusters produced in the coarsening and partitioning phases are only locally optimal, and they can be improved in the refining phase.


The information gain $gain(v^0_i)_{bd}$ of moving node $v^0_i$ from cluster $P^0_d$ to cluster $P^0_b$ measures the similarity between a node and a cluster.

In Equation (1), $ED[v^0_i,b]$ is the similarity between node $v^0_i$ and cluster $P^0_b$, $MinDist(v^0_i,v^0_j)$ is the minimum path length between nodes $v^0_i\in P^0_d$ and $v^0_j\in P^0_b$, $|P^0_b|$ is the number of nodes in cluster $P^0_b$, and $ID[v^0_i]$ is the similarity between node $v^0_i$ and its own cluster $P^0_d$.
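Because Equation (1) itself is not reproduced in this note, the sketch below substitutes the simpler Metis-style gain $ED[v,b]-ID[v]$ and omits the $MinDist$ and $|P^0_b|$ terms. It shows only the mechanics of the refining phase, moving a node to the neighboring cluster it is most similar to whenever the gain is positive; it is not the paper's exact criterion, and the function and parameter names are hypothetical.

```python
def refine(labels, w, max_passes=10):
    """Greedy boundary refinement using the simplified gain ED[v,b] - ID[v].

    labels: node -> cluster id; w: frozenset({u, v}) -> similarity.
    """
    labels = dict(labels)
    neighbors = {}
    for edge, s in w.items():
        u, v = tuple(edge)
        neighbors.setdefault(u, {})[v] = s
        neighbors.setdefault(v, {})[u] = s
    for _ in range(max_passes):
        moved = False
        for v in sorted(neighbors):
            link = {}  # total similarity from v to each neighboring cluster
            for u, s in neighbors[v].items():
                link[labels[u]] = link.get(labels[u], 0.0) + s
            internal = link.get(labels[v], 0.0)  # ID[v]
            best, external = max(link.items(), key=lambda kv: kv[1])  # largest ED[v, b]
            if best != labels[v] and external - internal > 0:
                labels[v] = best
                moved = True
        if not moved:  # no positive-gain move left
            break
    return labels
```

In the test data, node `c` starts in cluster 1 but is far more similar to `b` in cluster 0, so it moves, while `d` stays because its link to `e` outweighs its link to `c`.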
When the sizes of polygons are much smaller than the distances between them at a given scale, distance should be the main controlling factor, and the other factors can be ignored in spatial cognition.

Similarity Criteria

1. The distance is related to the average length of the triangle edges between two disjoint polygons. The distance similarity is defined to be inversely proportional to this average length.

Here $l^{x,y}_i$ is the length of the $i$-th triangle edge connecting polygons $x$ and $y$ (Figure 7a), $n$ is the number of triangles connecting polygons $x$ and $y$, and $dist(x,y)$ measures the distance similarity between polygons $x$ and $y$.
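A sketch of the distance criterion, assuming the bridging triangle edges are given as coordinate pairs. Because the note states only inverse proportionality, taking the reciprocal of the mean length (constant factor 1) is an assumption, and averaging over the supplied edges is one reading of $l^{x,y}_i$ and $n$; the function name is hypothetical.

```python
import math


def distance_similarity(bridge_edges):
    """dist(x, y) as the reciprocal of the mean length of the bridging edges.

    bridge_edges: list of ((x1, y1), (x2, y2)) triangle edges joining polygon x to y.
    The factor 1 in 1 / mean is an assumption; only inverse proportionality is stated.
    """
    lengths = [math.dist(a, b) for a, b in bridge_edges]
    return 1.0 / (sum(lengths) / len(lengths))
```

Closer polygons have shorter bridging edges and therefore a larger similarity; the raw values are later rescaled by min-max normalization, so the constant factor does not affect the final ranking.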

2. The connectivity measures the boundary compatibility between two polygons, i.e. the length of the adjacent boundaries of two polygons.

The connectivity between two polygons is defined as the length of the skeleton line of their bridge region.

Here $Len(Skeleton(x,y))$ is the length of the skeleton line between polygons $x$ and $y$, and $Skeleton(x,y)$ is the line connecting the midpoints of all triangle edges between polygons $x$ and $y$ (Figure 9).
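The skeleton length can be sketched as the length of a polyline through edge midpoints. This assumes the bridging edges are already ordered along the bridge region (for example by walking from each triangle to its neighbor), which the sketch does not compute; `skeleton_length` is a hypothetical name.

```python
import math


def skeleton_length(bridge_edges):
    """Len(Skeleton(x, y)): length of the polyline through the bridging-edge midpoints.

    bridge_edges: ((x1, y1), (x2, y2)) edges assumed ordered along the bridge region.
    """
    mids = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in bridge_edges]
    return sum(math.dist(p, q) for p, q in zip(mids, mids[1:]))
```

For two parallel boundaries at y = 0 and y = 2 bridged by vertical edges at x = 0, 1 and 3, the midpoints lie on y = 1 and the skeleton length is 3, i.e. the length of the facing stretch of boundary.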

3. The size similarity between two polygons can be measured by the ratio of the area of the smaller polygon to that of the larger one.

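A sketch of the size criterion using the shoelace formula for polygon area; vertices are assumed to be listed in boundary order, and the function names are hypothetical.

```python
def area(polygon):
    """Shoelace formula; the absolute value makes vertex orientation irrelevant."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1] - polygon[(i + 1) % n][0] * polygon[i][1]
            for i in range(n))
    return abs(s) / 2


def size_similarity(x, y):
    """Ratio of the smaller polygon's area to the larger one's, in (0, 1]."""
    ax, ay = area(x), area(y)
    return min(ax, ay) / max(ax, ay)
```

Taking the smaller area over the larger one makes the measure symmetric in $x$ and $y$ and equal to 1 only for polygons of identical area.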

4. The shape similarity is defined as the ratio of the total area of the two polygons and their bridge region (the region with dotted outlines in Figure 8) to the area of their convex hull (the polygon with bold outlines in Figure 8):

$$shp(x,y)=\frac{Area(x)+Area(y)+\sum_{i=1}^{n}Area(t^{x,y}_i)}{Area(Convex(x,y))}$$

where $t^{x,y}_i$ is a triangle connecting polygons $x$ and $y$ ($\sum_{i=1}^{n}Area(t^{x,y}_i)$ is the area of the polygon with dotted lines in Figure 8c), $Area(x)$ is the area of polygon $x$, $Convex(x,y)$ is the convex hull of polygons $x$ and $y$ (the polygon with bold boundaries in Figure 8c), $n$ is the number of triangles connecting polygons $x$ and $y$, and $shp(x,y)$ is the shape similarity between polygons $x$ and $y$.
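A sketch of the shape criterion, assuming polygons and bridge triangles are given as vertex lists. The convex hull is computed with Andrew's monotone chain algorithm, a standard choice that the note does not specify; the function names are hypothetical.

```python
def area(polygon):
    """Shoelace formula for a simple polygon given in boundary order."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1] - polygon[(i + 1) % n][0] * polygon[i][1]
            for i in range(n))
    return abs(s) / 2


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_similarity(x, y, bridge_triangles):
    """shp(x, y) = (Area(x) + Area(y) + sum of Area(t_i)) / Area(Convex(x, y))."""
    filled = area(x) + area(y) + sum(area(t) for t in bridge_triangles)
    return filled / area(convex_hull(list(x) + list(y)))
```

Two unit squares one unit apart, with the gap filled by two bridge triangles, exactly fill their convex hull and give a shape similarity of 1; without the bridge triangles the ratio drops to 2/3.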

A min-max normalization is applied to map the four similarities into the range [0, 1] (Equation 6):

$$P'=\frac{P-min(P)}{max(P)-min(P)}$$

where $min(P)$ and $max(P)$ are the minimum and maximum values of a similarity $P$ over all data. Min-max normalization preserves the relationships among the original data values. The spatial similarity is defined as the weighted average of the four similarities (Equation 7), and the four criteria can be applied in any order.

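Equations (6) and (7) can be sketched together. Normalization is applied to each criterion separately across all polygon pairs. Returning zeros when all values are equal and using equal default weights are illustrative assumptions, since the note leaves the weights to the user; the function names are hypothetical.

```python
def min_max(values):
    """Equation (6): map one criterion's raw values over all pairs into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case; this choice is an assumption
    return [(v - lo) / (hi - lo) for v in values]


def spatial_similarity(dist, conn, size, shape, weights=(0.25, 0.25, 0.25, 0.25)):
    """Equation (7) as a weighted average of the four normalized similarities of one pair."""
    scores = (dist, conn, size, shape)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

In use, one would normalize, for example, the list of all `dist` values, do the same for the other three criteria, and then combine the four normalized values of each polygon pair; the result is the edge weight that the coarsening phase compares with $\phi$.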

Experiments and Analysis

Evaluation of Clustering Quality

For a polygon $v_i$ in a cluster, its silhouette is denoted $s(i)$.


Here $s(i)$ ranges from −1 to 1, $a(i)$ measures the compactness of the cluster containing $v_i$, and $b(i)$ captures the degree to which $v_i$ is separated from the other clusters.
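The note does not reproduce the silhouette formula. Below is a sketch of the standard definition (Rousseeuw's silhouette), which matches the description of $a(i)$ and $b(i)$ above. The choice of dissimilarity (for example 1 minus the spatial similarity) and the value 0 for singleton clusters are assumptions, and `silhouette` is a hypothetical name.

```python
def silhouette(i, labels, dissim):
    """Standard silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) for item i.

    labels[j] is the cluster of item j; dissim(i, j) is any dissimilarity,
    e.g. 1 - spatial similarity (an assumption; the note does not say which).
    """
    n = len(labels)

    def mean_to(cluster):
        members = [j for j in range(n) if labels[j] == cluster and j != i]
        return sum(dissim(i, j) for j in members) / len(members) if members else None

    a = mean_to(labels[i])  # compactness: mean dissimilarity within own cluster
    others = [mean_to(c) for c in set(labels) if c != labels[i]]
    if a is None or not others:
        return 0.0  # singleton cluster or only one cluster: s(i) taken as 0
    b = min(others)  # separation: mean dissimilarity to the nearest other cluster
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
```

Values near 1 mean $v_i$ sits well inside its cluster, values near 0 mean it lies on a boundary, and negative values suggest it would fit better in a neighboring cluster.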

Thinking

The work can be extended in many different directions. Some important challenges are listed below:

1. The distance and connectivity between polygons, or the size and shape of polygons, are used only in the constructing phase and are not recomputed in the coarsening and refining phases. The presented algorithm needs to be extended by considering the information mentioned above.
2. More factors impacting the visual judgment of polygonal clusters should be considered in some specific fields, such as the direction of a group of buildings, or the actual shape of polygons.
3. Non-spatial attributes can be incorporated into the algorithm to produce more reliable results.