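1. Review
Kruskal's algorithm relies on a disjoint-set (union-find) structure, so before reading this post it is worth revisiting DSAA: THE DISJOINT SET ADT (1) and DSAA: THE DISJOINT SET ADT (2). Both the concept and the implementation of disjoint sets are fairly simple.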
2. Kruskal’s Algorithm
- A second greedy strategy is continually to select the edges in order of smallest weight and accept an edge if it does not cause a cycle.
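Treat connectivity as an equivalence relation: if two vertices are not yet connected, an edge between them cannot close a cycle. Concretely, an edge is accepted only when its two endpoints are not yet connected; if they are already connected, adding the edge necessarily creates a cycle.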
- Formally, Kruskal’s algorithm maintains a forest – a collection of trees.
- Initially, there are |V| single-node trees. Adding an edge merges two trees into one. When the algorithm terminates, there is only one tree, and this is the minimum spanning tree.
- The algorithm terminates when enough edges are accepted. It turns out to be simple to decide whether edge (u,v) should be accepted or rejected.
- The edges could be sorted to facilitate the selection, but building a heap in linear time is a much better idea. Then delete_mins give the edges to be tested in order. Typically, only a small fraction of the edges needs to be tested before the algorithm can terminate, although it is always possible that all the edges must be tried.
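This takes a moment of thought: a minimum spanning tree has |V|-1 edges, while a connected undirected graph has |E| >= |V|-1 edges, so in the worst case the algorithm has to examine every edge before it terminates.
When I first studied disjoint sets, I really had no idea where they would be used; now they turn up in the minimum spanning tree problem from graph theory. The idea behind Kruskal's algorithm is not complicated, but without the notion of disjoint sets it would probably be hard to implement.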
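The accept-or-reject test described above can be sketched with a small disjoint set. This is only a minimal sketch under assumptions: the names `DisjSet`, `ds_init`, `ds_find`, `ds_union`, `ds_try_edge` and the capacity `MAX_VERTICES` are hypothetical and not taken from DSAA, vertices are assumed to be numbered from 0, and union by size is used without path compression.

```c
#include <assert.h>

#define MAX_VERTICES 64   /* hypothetical capacity for this sketch */

/* parent[i] < 0 means i is a root and -parent[i] is the size of its tree. */
typedef struct {
    int parent[MAX_VERTICES];
    int n;
} DisjSet;

/* Start with n single-node trees, as Kruskal's algorithm does. */
void ds_init(DisjSet *s, int n) {
    assert(n > 0 && n <= MAX_VERTICES);
    s->n = n;
    for (int i = 0; i < n; i++)
        s->parent[i] = -1;
}

/* Follow parent links up to the root (no path compression). */
int ds_find(const DisjSet *s, int x) {
    assert(x >= 0 && x < s->n);
    while (s->parent[x] >= 0)
        x = s->parent[x];
    return x;
}

/* Union by size; both arguments must be distinct roots. */
void ds_union(DisjSet *s, int root_a, int root_b) {
    assert(root_a != root_b);
    assert(s->parent[root_a] < 0 && s->parent[root_b] < 0);
    if (s->parent[root_a] > s->parent[root_b]) {   /* root_a's tree is smaller */
        int tmp = root_a;
        root_a = root_b;
        root_b = tmp;
    }
    s->parent[root_a] += s->parent[root_b];        /* sizes add up */
    s->parent[root_b] = root_a;                    /* smaller tree hangs under larger */
}

/* Returns 1 and merges the two trees if edge (u, v) joins different trees;
   returns 0 if u and v are already connected, i.e. the edge would close a cycle. */
int ds_try_edge(DisjSet *s, int u, int v) {
    int ru = ds_find(s, u);
    int rv = ds_find(s, v);
    if (ru == rv)
        return 0;
    ds_union(s, ru, rv);
    return 1;
}
```

With four vertices, accepting (0,1) and then (1,2) leaves 0 and 2 in the same tree, so a later call with (0,2) is rejected, while vertex 3 is still a tree of its own.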
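To summarize the above: at each step Kruskal takes the remaining edge of smallest weight. If its two endpoints already belong to the same tree (set), the edge is discarded. Otherwise the edge is accepted and the two sets are merged (two subtrees become one tree). After $|V|-1$ merges (which does not mean delete_min was called only that many times), S (the disjoint set) contains a single tree, and that tree is the minimum spanning tree.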
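The summary above can be tried out end to end with a short sketch. Note that it swaps in the simpler variant mentioned in the quoted text, sorting the edges with `qsort` rather than using a heap. The names `Edge`, `kruskal_sorted`, and the `examined` out-parameter are assumptions made for illustration, vertices are assumed to be numbered from 0, and the disjoint set is a plain parent array with union by size.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int u, v, weight; } Edge;

/* Order edges by ascending weight for qsort. */
static int cmp_edge(const void *a, const void *b) {
    int wa = ((const Edge *)a)->weight;
    int wb = ((const Edge *)b)->weight;
    return (wa > wb) - (wa < wb);
}

/* parent[x] < 0 marks a root; -parent[x] is the size of its tree. */
static int find_root(const int *parent, int x) {
    while (parent[x] >= 0)
        x = parent[x];
    return x;
}

/* Sorts edges in place and writes the accepted edges to mst, which must have
   room for num_vertices - 1 entries. *examined receives how many edges were
   looked at. Returns the number of accepted edges, or -1 if allocation fails. */
int kruskal_sorted(Edge *edges, int num_edges, int num_vertices,
                   Edge *mst, int *examined) {
    assert(num_vertices > 0 && num_edges >= 0);
    int *parent = malloc((size_t)num_vertices * sizeof *parent);
    if (parent == NULL)
        return -1;
    for (int i = 0; i < num_vertices; i++)
        parent[i] = -1;                 /* |V| single-node trees */

    qsort(edges, (size_t)num_edges, sizeof *edges, cmp_edge);

    int accepted = 0;
    int i = 0;
    for (; i < num_edges && accepted < num_vertices - 1; i++) {
        int ru = find_root(parent, edges[i].u);
        int rv = find_root(parent, edges[i].v);
        if (ru == rv)
            continue;                   /* same tree: the edge would form a cycle */
        if (parent[ru] > parent[rv]) {  /* make ru the root of the larger tree */
            int t = ru;
            ru = rv;
            rv = t;
        }
        parent[ru] += parent[rv];
        parent[rv] = ru;
        mst[accepted++] = edges[i];
    }
    *examined = i;
    free(parent);
    return accepted;
}
```

On a 4-vertex graph with edges (0,1) of weight 1, (1,2) of weight 2, (0,2) of weight 3, (2,3) of weight 4 and (1,3) of weight 5, the sketch accepts three edges with total weight 7 but examines four, because (0,2) is looked at and rejected. That is the point of the remark that |V|-1 merges can take more than |V|-1 edge extractions.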
3. Pseudocode
DSAA gives the following pseudocode:
void kruskal( graph G ){
    unsigned int edges_accepted;
    DISJ_SET S;
    PRIORITY_QUEUE H;
    vertex u, v;
    set_type u_set, v_set;
    edge e;
    // every vertex starts as its own subset (each subset is a tree)
    initialize( S );
    // an earlier post covered how a heap can be built in O(n)
    read_graph_into_heap_array( G, H );
    // this is the O(n) step, with n = |E|; see the earlier post for the detailed analysis
    build_heap( H );
    // number of edges accepted into the minimum spanning tree so far
    edges_accepted = 0;
    while( edges_accepted < NUM_VERTEX-1 ){
        // the edge of smallest weight
        // note that the number of elements in the heap keeps shrinking
        e = delete_min( H ); /* e = (u, v) */
        u_set = find( u, S );
        v_set = find( v, S );
        if( u_set != v_set ){
            /* accept the edge */
            edges_accepted++;
            set_union( S, u_set, v_set );
        }
    }
}
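The pseudocode leaves `build_heap` and `delete_min` abstract. As a rough illustration of what they might look like for edges, here is a sketch of a 0-indexed binary min-heap keyed on weight. The types `Edge` and `EdgeHeap`, the helper `percolate_down`, and the function signatures are assumptions for this example and do not match the book's interface exactly.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int u, v, weight; } Edge;

/* Binary min-heap stored in a caller-owned array, 0-indexed. */
typedef struct {
    Edge *items;
    size_t size;
} EdgeHeap;

/* Move the element at index hole down until both children are no lighter. */
static void percolate_down(EdgeHeap *h, size_t hole) {
    Edge moving = h->items[hole];
    for (;;) {
        size_t child = 2 * hole + 1;
        if (child >= h->size)
            break;                                  /* hole is a leaf */
        if (child + 1 < h->size &&
            h->items[child + 1].weight < h->items[child].weight)
            child++;                                /* pick the lighter child */
        if (h->items[child].weight >= moving.weight)
            break;
        h->items[hole] = h->items[child];
        hole = child;
    }
    h->items[hole] = moving;
}

/* Linear-time heap construction: percolate down every non-leaf node,
   from the last one back to the root. */
void build_heap(EdgeHeap *h, Edge *items, size_t n) {
    h->items = items;
    h->size = n;
    for (size_t i = n / 2; i-- > 0; )
        percolate_down(h, i);
}

/* Remove and return the lightest edge; O(log n) per call. */
Edge delete_min(EdgeHeap *h) {
    assert(h->size > 0);
    Edge min = h->items[0];
    h->items[0] = h->items[--h->size];
    if (h->size > 0)
        percolate_down(h, 0);
    return min;
}
```

Each `delete_min` shrinks the heap by one, which matches the comment in the pseudocode that the number of elements in the heap keeps decreasing. Unlike a full sort, edges that are never extracted are never fully ordered.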
4. Time complexity
The worst-case running time of this algorithm is $O(|E| \log |E|)$, which is dominated by the heap operations. Notice that since $|E| = O(|V|^2)$, this running time is actually $O(|E| \log |V|)$.
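The book does not spell out why, but we can work it out ourselves. Assume the disjoint set's union takes $O(1)$ and find (without path compression, but with union by size or height) takes $O(\log |V|)$. Initializing the disjoint set costs $O(|V|)$, build_heap costs $O(|E|)$, and each delete_min on a binary heap costs $O(\log |E|)$. In the worst case every edge is removed from the heap and tested, with two finds per edge, so the worst-case running time of Kruskal's algorithm is

$O(|V| + |E| + |E| \log |E| + 2|E| \log |V|) = O(|E| \log |E|) = O(|E| \log |V|)$

using $|E| \ge |V| - 1$ for a connected graph, and $\log |E| \le 2 \log |V|$ because $|E| \le |V|^2$.

So far I have only used pseudocode for graph algorithms, which is not very concrete. I will not rush into DFS next; **instead I will briefly discuss how to represent a graph in code and give a complete implementation of Kruskal's algorithm (since it involves both disjoint sets and graphs).** Once DFS is covered, I will pick a few LeetCode problems to show that some special graphs can be represented in a more concise way.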