Lazy Unions
The Union-Find Data Structure
FIND: Given $x \in X$, return the name of x's group.
UNION: Given x & y, merge groups containing them.
Previous solution (for Kruskal's MST algorithm):
- Each object points directly to the "leader" of its group.
- $O(1)$ FIND [just return x's leader].
- $O(n \log n)$ total work for n UNIONs [when two groups merge, the smaller group inherits the leader of the larger one].
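As a rough illustration of this earlier scheme, here is a minimal Python sketch. The class name `EagerUnionFind` and the `members` lists are assumptions made for the example; the notes only specify the leader pointers and the "smaller group inherits" rule.

```python
class EagerUnionFind:
    """Sketch: every object stores its group's leader directly."""

    def __init__(self, n):
        self.leader = list(range(n))             # leader[x] = name of x's group
        self.members = [[x] for x in range(n)]   # members[l] = objects led by l

    def find(self, x):
        return self.leader[x]                    # O(1): a single lookup

    def union(self, x, y):
        a, b = self.leader[x], self.leader[y]
        if a == b:
            return                               # already in the same group
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                          # make a the leader of the larger group
        for z in self.members[b]:                # relabel only the smaller group
            self.leader[z] = a
        self.members[a].extend(self.members[b])
        self.members[b] = []
```

An object is relabeled only when its group at least doubles in size, so each object changes leader at most $\log_2 n$ times. That is where the $O(n \log n)$ total comes from.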
Lazy Unions
New idea: Update only one pointer each merge.
In array representation (where $A[i] \leftrightarrow$ name of $i$'s parent):
How to Merge
In general: When two groups merge in a UNION, make one group’s leader
[root of the tree] a child of the other one.
Pro: UNION reduces to 2 FINDs [r1 = FIND(x), r2 = FIND(y)] and $O(1)$ extra work [link r1, r2 together].
Con: To recover the leader of an object, need to follow a path of parent pointers [not just one] $\Rightarrow$ not clear if FIND still takes $O(1)$ time.
Union-Find (Union by Rank)
The Lazy Union Implementation
New implementation:
Each object $x \in X$ has a parent field.
Invariant: Parent pointers induce a collection of directed trees on X.
(x is a root $\Leftrightarrow$ parent[x] = x)
Initially: For all x, parent[x] = x;
FIND(x): Traverse parent pointers from x until you hit the root.
UNION(x,y): $s_1$ = FIND(x); $s_2$ = FIND(y); reset the parent of one of $s_1, s_2$ to be the other.
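The lazy implementation above can be sketched in a few lines of Python. The function names `make_sets`, `find`, and `union` are illustrative. Always linking $s_1$ under $s_2$ is an arbitrary choice made here, since at this point the notes do not yet say which root should win.

```python
def make_sets(n):
    """Initially every object is its own root: parent[x] = x."""
    return list(range(n))

def find(parent, x):
    """Follow parent pointers from x until reaching a root."""
    while parent[x] != x:
        x = parent[x]
    return x

def union(parent, x, y):
    """Two FINDs plus O(1) extra work: link one root under the other."""
    s1, s2 = find(parent, x), find(parent, y)
    if s1 != s2:
        parent[s1] = s2   # arbitrary choice of which root becomes the child
```

With this arbitrary linking rule, a sequence such as `union(0, 1)`, `union(1, 2)`, `union(2, 3)`, ... builds a single long chain, so FIND can take linear time.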
Union by rank
Ranks: For each $x \in X$, maintain a field rank[x].
[In general rank[x] = 1+ (max rank of x’s children)]
Invariant (for now): For all $x \in X$, rank[x] = maximum number of hops from some leaf to x.
[Initially, rank[x] = 0 for all $x \in X$]
To avoid scraggly trees, given x & y:
- $s_1$ = FIND(x), $s_2$ = FIND(y)
- If rank[$s_1$] > rank[$s_2$] then set parent[$s_2$] to $s_1$, else set parent[$s_1$] to $s_2$ [if the two ranks were equal, also add 1 to rank[$s_2$], the new root].
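A minimal Python sketch of union by rank, following the rule just stated. The function names are illustrative, and the early return when both objects already share a root is an added convenience that the notes do not spell out.

```python
def find(parent, x):
    """Follow parent pointers up to the root (no path compression here)."""
    while parent[x] != x:
        x = parent[x]
    return x

def union_by_rank(parent, rank, x, y):
    s1, s2 = find(parent, x), find(parent, y)
    if s1 == s2:
        return                      # already in the same group
    if rank[s1] > rank[s2]:
        parent[s2] = s1             # the shallower tree hangs under the deeper one
    else:
        if rank[s1] == rank[s2]:
            rank[s2] += 1           # tie: the new root's rank grows by one
        parent[s1] = s2
```

For example, merging 8 singletons pairwise in three rounds (4 unions, then 2, then 1) yields a root of rank 3, which equals $\log_2 8$.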
Properties of Ranks
Recall: Lazy Unions.
Invariant (for now): rank[x] = max # of hops from a leaf to x.
[Note: $\max_x \text{rank}[x] \approx$ worst-case running time of FIND.]
Union by Rank: Make old root with smaller rank child of the root with larger rank.
[Choose new root arbitrarily in case of a tie, and add 1 to its rank.]
Immediate from Invariant/Rank Maintenance:
(1) For all objects x, rank[x] only goes up over time.
(2) Only ranks of roots can go up.
[Once x is a non-root, rank[x] is frozen forevermore.]
(3) Ranks strictly increase along a path to the root.
Rank Lemma
Rank Lemma: Consider an arbitrary sequence of UNION (+ FIND) operations. For every $r \in \{0, 1, 2, \ldots\}$, there are at most $n/2^r$ objects with rank $r$.
Corollary: Max rank is always $\le \log_2 n$.
Corollary: Worst-case running time of FIND, UNION is $O(\log n)$.
Proof of Rank Lemma:
Claim 1: If x, y have the same rank $r$, then their subtrees are disjoint.
Claim 2: The subtree of a rank-$r$ object has size $\ge 2^r$.
[Note Claim 1 + Claim 2 imply the Rank Lemma].
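To spell out why the two claims suffice, here is a short worked derivation. It takes Claim 2 to be the standard bound that a rank-$r$ subtree contains at least $2^r$ objects, and it introduces $k_r$ (not in the original notes) for the number of objects of rank $r$.

```latex
% k_r = number of objects with rank r (notation introduced for this sketch).
% Claim 1: the subtrees of rank-r objects are pairwise disjoint.
% Claim 2 (assumed form): each such subtree has at least 2^r objects.
\[
  n \;\ge\; \sum_{x :\, \mathrm{rank}[x] = r} \lvert \mathrm{subtree}(x) \rvert
    \;\ge\; k_r \cdot 2^r
  \quad\Longrightarrow\quad
  k_r \;\le\; \frac{n}{2^r}.
\]
% Corollary: if some object has rank r, then k_r >= 1, so
\[
  1 \;\le\; \frac{n}{2^r}
  \quad\Longrightarrow\quad
  r \;\le\; \log_2 n .
\]
```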
Path Compression
Idea: Why bother traversing a leaf-root path multiple times?
Path compression: After FIND(x), install shortcuts (i.e., revise parent pointers) to x's root all along the x $\to$ root path.
Con: Constant-factor overhead to FIND
Pro: Speeds up subsequent FINDs.
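One possible way to implement this is a two-pass FIND, sketched below in Python. The two-pass structure is a choice made for this example (a recursive version is equally common), and the function name is illustrative.

```python
def find(parent, x):
    # Pass 1: walk up to locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: walk the same path again, pointing each node directly at the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root
```

The second pass is the constant-factor overhead mentioned above. After it, every object on the traversed path is one hop from the root.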
On Ranks
Important: Maintain all rank fields EXACTLY as without path compression.
- Ranks initially all 0.
- In UNION, new root = old root with bigger rank.
- When merging two nodes of common rank $r$, reset the new root's rank to $r+1$.
Bad news: Now rank[x] is only an upper bound on the maximum number of hops on a path from a leaf to x.
Good news: Rank Lemma still holds ($\le n/2^r$ objects with rank $r$).
Also: Still always have rank[parent[x]] > rank[x] for all non-roots x.
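The sketch below combines both optimizations to show ranks drifting away from true heights. The class name `UnionFind` and its layout are assumptions for illustration. Note that `union` never consults tree heights, only the stored ranks, exactly as it would without path compression.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n            # maintained exactly as without compression

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression; ranks are not touched
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        s1, s2 = self.find(x), self.find(y)
        if s1 == s2:
            return
        if self.rank[s1] < self.rank[s2]:
            s1, s2 = s2, s1            # s1 is now the root with the bigger rank
        self.parent[s2] = s1
        if self.rank[s1] == self.rank[s2]:
            self.rank[s1] += 1         # common rank r: new root gets r + 1
```

After `union(0, 1)`, `union(2, 3)`, `union(0, 2)` on four objects, the root 0 has rank 2. A subsequent `find(3)` flattens the tree to height 1, while rank[0] stays 2: the rank is now only an upper bound on the height.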
Hopcroft-Ullman Theorem
Theorem: With union by rank and path compression, m UNION + FIND operations take $O(m \log^* n)$ time, where $\log^* n$ = the number of times you need to apply log to n before the result is $\le 1$.
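To get a feel for how slowly $\log^* n$ grows, here is a small Python sketch of the definition. It assumes the logarithm is base 2 (the notes just say "log"), and the function name `log_star` is illustrative.

```python
import math

def log_star(n):
    """Count how many times log2 must be applied to n before the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Under this base-2 assumption, `log_star(16)` is 3 (16, 4, 2, 1) and `log_star(65536)` is 4. For any input size that fits in a computer's memory, $\log^* n$ is at most 5.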