Union-Find Problem
Dynamic Connectivity Problem
Description
Given a set of N objects, we support two operations on any pair of objects:
- Union command: connect two objects.
- Find/connected query: is there a path connecting the two objects?
By combining these two operations we can determine whether a path exists between any two objects. Many real-world problems reduce to this one, such as deciding whether two people are linked through a chain of friends in a social network.
In real-world applications, the number of objects N can be large, as can the number of operations M, and find queries may be intermixed with union commands. The goal is to design an efficient data structure and efficient operations for union-find.
Properties of connection
We assume "is connected to" is an equivalence relation:
- Reflexive: p is connected to p.
- Symmetric: if p is connected to q, then q is connected to p.
- Transitive: if p is connected to q and q is connected to r, then p is connected to r.
A connected component is a maximal set of objects that are mutually connected.
Quick Find Solution
In this solution, the data structure is an integer array id[] of length N. Each index of the array represents one object, and p and q are connected if and only if they have the same id value.
Quick-find is an eager approach. For a find query, it checks whether p and q have the same id. For a union command, it merges the components containing p and q by changing all entries whose id equals id[p] to id[q].
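public class QuickFindUF {
    private int[] id;  // id[i] = component identifier of object i
    // Start with N components: each object is in its own component.
    public QuickFindUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++)
            id[i] = i;
    }
    // p and q are connected iff they share the same id.
    public boolean connected(int p, int q) {
        return id[p] == id[q];
    }
    // Relabel every entry in p's component with q's id.
    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }
}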
The problem with this method is that the union operation is too expensive: each union scans the whole array, so processing a sequence of N union commands on N objects takes on the order of N^2 array accesses.
algorithm | initialize | union | find |
---|---|---|---|
quick-find | N | N | 1 |
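To make the eager relabeling concrete, here is a minimal sketch that replays a few union commands on 10 objects and returns the resulting id[] array. The class name QuickFindTrace and the helpers snapshot() and demo() are hypothetical names introduced only for this illustration; the union logic is the same as in QuickFindUF.

```java
import java.util.Arrays;

// Illustrative sketch: same logic as QuickFindUF, plus helpers to inspect id[].
class QuickFindTrace {
    private final int[] id;

    QuickFindTrace(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        // Every union scans the whole array, even if only one entry changes.
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }

    // Hypothetical helper: returns a copy of id[] for inspection.
    int[] snapshot() {
        return Arrays.copyOf(id, id.length);
    }

    // Hypothetical demo: five unions on ten objects.
    static int[] demo() {
        QuickFindTrace uf = new QuickFindTrace(10);
        uf.union(4, 3);
        uf.union(3, 8);
        uf.union(6, 5);
        uf.union(9, 4);
        uf.union(2, 1);
        return uf.snapshot();  // [0, 1, 1, 8, 8, 5, 5, 7, 8, 8]
    }
}
```

After these commands, objects 3, 4, 8 and 9 all carry the id 8, so connected(9, 3) is a single comparison, while each union still paid for a full pass over the array.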
Quick Union Solution
To improve performance, the quick-union method interprets the data differently. The basic structure is still an integer array id[] of length N, but id[i] is no longer the group that i belongs to. Instead, it is the parent of i. In other words, the array represents a forest of trees, and the root of i is id[id[id[…id[i]…]]] (keep following parent links until the value stops changing).
- Find: check whether p and q have the same root.
- Union: to merge the components containing p and q, set the id of p’s root to the id of q’s root.
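public class QuickUnionUF {
    private int[] id;  // id[i] = parent of i
    // Start with N single-node trees: each object is its own root.
    public QuickUnionUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }
    // Follow parent links until reaching a node that is its own parent.
    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }
    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }
    // Make p's root a child of q's root.
    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}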
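Unfortunately, quick-union can still be expensive when the trees get tall: finding a root costs one array access per level, and union must first find two roots, so in the worst case both operations take time proportional to N.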
algorithm | initialize | union | find |
---|---|---|---|
quick-union | N | N | N |
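As an illustration of how tall a quick-union tree can get, the following sketch feeds the union commands (0,1), (1,2), (2,3), ... to an unweighted quick-union and measures the depth of object 0. The class name QuickUnionDepth and the helpers depth() and worstCaseDepth() are hypothetical and exist only for this example.

```java
// Illustrative sketch: unweighted quick-union with a depth probe.
class QuickUnionDepth {
    private final int[] id;  // id[i] = parent of i

    QuickUnionDepth(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    void union(int p, int q) {
        id[root(p)] = root(q);
    }

    // Hypothetical helper: number of links from i up to its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) {
            i = id[i];
            d++;
        }
        return d;
    }

    // Builds the chain 0 -> 1 -> ... -> n-1 and returns the depth of object 0.
    static int worstCaseDepth(int n) {
        QuickUnionDepth uf = new QuickUnionDepth(n);
        for (int i = 0; i + 1 < n; i++) uf.union(i, i + 1);
        return uf.depth(0);
    }
}
```

With this input every union hangs the current root under a fresh single node, so the tree degenerates into a linked list and worstCaseDepth(n) comes out as n - 1.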
Quick-find defect.
- Union too expensive (N array accesses).
- Trees are flat, but too expensive to keep them flat.
Quick-union defect.
- Trees can get tall.
- Find too expensive (could be N array accesses).
Solution Improvements
Improvement 1: weighting
Weighted quick-union:
- Modify quick-union to avoid tall trees.
- Keep track of size of each tree (number of objects).
- Balance by linking root of smaller tree to root of larger tree.
The data structure is the same as in quick-union, but it maintains an extra array sz[], where sz[i] counts the number of objects in the tree rooted at i.
Find: Identical to quick-union.
return root(p) == root(q);
Modify quick-union to:
- Link root of smaller tree to root of larger tree.
- Update the sz[] array.
int i = root(p);
int j = root(q);
if (i == j) return;
if (sz[i] < sz[j]) {
    id[i] = j; sz[j] += sz[i];
}
else {
    id[j] = i; sz[i] += sz[j];
}
Running time.
- Find: takes time proportional to depth of p and q.
- Union: takes constant time, given roots.
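Why is the depth of any node at most lg N? The depth of x increases by 1 only when the tree T1 containing x is merged into another tree T2. When that happens, the size of the tree containing x at least doubles, since |T2| ≥ |T1|. The size of the tree containing x can double at most lg N times, because no tree ever holds more than N objects, so the depth of x is at most lg N.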
algorithm | initialize | union | find |
---|---|---|---|
weighted-QU | N | lg N | lg N |
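Putting the fragments above together, a complete weighted quick-union class might look like the sketch below. The depth() and maxDepth() helpers are hypothetical additions used only to check the lg N bound; they are not part of the algorithm itself.

```java
// Illustrative sketch: weighted quick-union assembled from the fragments above.
class WeightedQuickUnionUF {
    private final int[] id;  // id[i] = parent of i
    private final int[] sz;  // sz[i] = number of objects in the tree rooted at i

    WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // Link the root of the smaller tree to the root of the larger tree.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Hypothetical helper: number of links from i up to its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) {
            i = id[i];
            d++;
        }
        return d;
    }

    // Hypothetical helper: largest depth over all objects.
    int maxDepth() {
        int max = 0;
        for (int i = 0; i < id.length; i++) max = Math.max(max, depth(i));
        return max;
    }
}
```

The chain of unions (0,1), (1,2), (2,3), ... that produces a tree of depth N - 1 in plain quick-union yields a tree of depth 1 here, because the single new node is always linked under the larger tree. The depth only reaches lg N when equal-sized trees are merged repeatedly, for example 16 objects merged pairwise in four rounds.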
Improvement 2: path compression
Quick-union with path compression works as follows: after computing the root of p, set the id of each examined node to point to that root.
There are two ways to achieve this:
- Two-pass implementation: add a second loop to root() to set the id[] of each examined node to the root.
- Simpler one-pass variant: make every other node in the path point to its grandparent (thereby halving the path length).
private int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];  // point i at its grandparent
        i = id[i];
    }
    return i;
}
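For comparison, the two-pass implementation mentioned in the list above could be sketched as follows. The class name TwoPassCompression, its constructor that takes an array of parent links, and the parent() accessor are hypothetical scaffolding so the example can run on its own.

```java
// Illustrative sketch: two-pass path compression on a given parent array.
class TwoPassCompression {
    private final int[] id;  // id[i] = parent of i

    // Hypothetical constructor: copies the given parent links.
    TwoPassCompression(int[] parents) {
        id = parents.clone();
    }

    int root(int i) {
        // First pass: walk up to find the root.
        int r = i;
        while (r != id[r]) r = id[r];
        // Second pass: walk the same path again, pointing each node at the root.
        while (i != r) {
            int next = id[i];
            id[i] = r;
            i = next;
        }
        return r;
    }

    // Hypothetical accessor for inspecting the tree.
    int parent(int i) {
        return id[i];
    }
}
```

Starting from the chain 0 → 1 → 2 → 3 → 4 (parent array {1, 2, 3, 4, 4}), one call to root(0) returns 4 and leaves every node on the path pointing directly at 4. The one-pass variant flattens the path less aggressively per call but needs no second loop.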
Weighted quick-union with path compression
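Starting from an empty data structure, any sequence of M union-find operations on N objects makes ≤ c (N + M lg* N) array accesses, where lg* N is the iterated logarithm: the number of times lg must be applied to N before the result is at most 1. The analysis can be improved to N + M α(M, N), where α is the inverse Ackermann function. This cost is within a constant factor of reading in the data. In theory, weighted quick-union with path compression (WQUPC) is not quite linear. In practice, it is linear.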
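To get a feel for how slowly lg* N grows, here is a small sketch that computes it for 64-bit integers. The class and method names are hypothetical, and the function uses floor(lg n) at each step, so it is an integer approximation that is exact on towers of two such as 2, 4, 16 and 65536.

```java
// Illustrative sketch: iterated logarithm using floor(lg n) at each step.
class IteratedLog {
    // Counts how many times floor(lg n) must be applied until the value is at most 1.
    static int logStar(long n) {
        int count = 0;
        while (n > 1) {
            n = 63 - Long.numberOfLeadingZeros(n);  // floor(lg n) for n >= 1
            count++;
        }
        return count;
    }
}
```

Here logStar(2) is 1, logStar(4) is 2, logStar(16) is 3 and logStar(65536) is 4. The next step up requires N = 2^65536, which is why lg* N behaves like a small constant for any input that fits in memory.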
Union-find applications
- Percolation.
- Games (Go, Hex).
- Dynamic connectivity.
- Least common ancestor.
- Equivalence of finite state automata.
- Hoshen-Kopelman algorithm in physics.
- Hindley-Milner polymorphic type inference.
- Kruskal’s minimum spanning tree algorithm.
- Compiling equivalence statements in Fortran.
- Morphological attribute openings and closings.
- Matlab’s bwlabel() function in image processing.
Percolation
A model for many physical systems:
- N-by-N grid of sites.
- Each site is open with probability p (or blocked with probability 1 – p).
- System percolates iff top and bottom are connected by open sites.
The likelihood of percolation depends on the site vacancy probability p. When N is large, theory guarantees a sharp threshold p*:
- p > p*: the system almost certainly percolates.
- p < p*: the system almost certainly does not percolate.
Monte Carlo simulation can be used to estimate the value of p*:
- Initialize all sites in the N-by-N grid to be blocked.
- Declare random sites open until the top is connected to the bottom.
- The vacancy percentage at that point estimates p*.
To check whether an N-by-N system percolates, we can turn it into a dynamic connectivity problem that can be solved with union-find.
- Create an object for each site and name them 0 to N^2 – 1.
- Sites are in the same component if they are connected by open sites.
- The system percolates iff any site on the bottom row is connected to a site on the top row.
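A brute-force check would call connected() for every pair of top-row and bottom-row sites, N^2 calls in total. To avoid this, we add a virtual top site (connected to every open site in the top row) and a virtual bottom site (connected to every open site in the bottom row). The system then percolates iff the virtual top site is connected to the virtual bottom site.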
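The following sketch combines the pieces of this section: a grid backed by weighted quick-union with path compression, two virtual sites, and a Monte Carlo loop that opens random sites until the system percolates. All names (PercolationSketch, open, percolates, trial, estimateThreshold) and the 0-based (row, col) indexing are assumptions made for this example, not a prescribed API.

```java
import java.util.Random;

// Illustrative sketch: percolation on an n-by-n grid using union-find with virtual sites.
class PercolationSketch {
    private final int n;
    private final boolean[] isOpen;  // isOpen[s] = true if site s is open
    private final int[] id;          // parent links (weighted quick-union)
    private final int[] sz;          // tree sizes
    private final int top;           // index of the virtual top site
    private final int bottom;        // index of the virtual bottom site
    private int openCount = 0;

    PercolationSketch(int n) {
        this.n = n;
        isOpen = new boolean[n * n];
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];  // one-pass path compression
            i = id[i];
        }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Opens site (row, col) and links it to its open neighbours and virtual sites.
    void open(int row, int col) {
        int s = row * n + col;
        if (isOpen[s]) return;
        isOpen[s] = true;
        openCount++;
        if (row == 0) union(s, top);
        if (row == n - 1) union(s, bottom);
        if (row > 0 && isOpen[s - n]) union(s, s - n);
        if (row < n - 1 && isOpen[s + n]) union(s, s + n);
        if (col > 0 && isOpen[s - 1]) union(s, s - 1);
        if (col < n - 1 && isOpen[s + 1]) union(s, s + 1);
    }

    // A single connectivity check replaces the N^2 pairwise checks.
    boolean percolates() {
        return root(top) == root(bottom);
    }

    // One Monte Carlo trial: fraction of open sites at the moment of percolation.
    static double trial(int n, Random rnd) {
        PercolationSketch p = new PercolationSketch(n);
        while (!p.percolates()) p.open(rnd.nextInt(n), rnd.nextInt(n));
        return (double) p.openCount / (n * n);
    }

    // Averages several trials to estimate p*.
    static double estimateThreshold(int n, int trials, long seed) {
        Random rnd = new Random(seed);
        double sum = 0;
        for (int t = 0; t < trials; t++) sum += trial(n, rnd);
        return sum / trials;
    }
}
```

A call such as PercolationSketch.estimateThreshold(200, 100, 42L) averages 100 trials on a 200-by-200 grid. Larger grids and more trials should give a tighter estimate of p*, at the cost of more union-find operations.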