the union find problem
/*
two classic algorithms: Quick find and Quick union
Steps to develop a usable algorithm
- model the problem
- find an algorithm to solve it
- fast enough? fits in memory?
- if not, find why.
- figure out a way to address the problem
- iterate until satisfied
The scientific method
Mathematical analysis
*/
- dynamic connectivity (the model of the problem for union find)
the problem: find whether there is a path between two objects. (we will do "find the path" later, not today)
- modeling the connections
- reflexive: p-p
- symmetric: if p-q, then q-p
- transitive: if p-q, q-r, then p-r
- connected component: a maximal set of objects that are mutually connected
- find query: check if two objects are in the same connected component
- union command: replace components containing two objects with their union
- quick find (eager approach)
- Data structure:
- Integer array id[] of size N.
- Interpretation: p and q are connected if they have the same id.
- Find: Check if p and q have the same id.
- Union: To merge components containing p and q, change all entries whose id equals id[p] to id[q].
- quick find is too slow.
- each union scans all N entries, so N union commands on N objects take on the order of N^2 array accesses. Quadratic algorithms are unacceptable for large problems because they don't scale.
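A minimal Java sketch of the quick-find description above, for illustration only. The class name `QuickFindUF` and the method names `connected` and `union` are assumptions; the notes only fix the `id[]` array and its interpretation.

```java
// Sketch of quick-find (eager approach). Names are illustrative, not from the notes.
class QuickFindUF {
    private final int[] id;   // id[i] = component identifier of object i

    // Start with n singleton components: every object is its own id.
    QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    // Find query: p and q are connected iff they have the same id (2 array accesses).
    boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    // Union command: change every entry equal to id[p] to id[q].
    // The loop always scans all n entries, which is why union costs N steps.
    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        if (pid == qid) return;   // already in the same component
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) id[i] = qid;
        }
    }
}
```

Saving `id[p]` in `pid` before the loop matters: once `id[p]` itself is overwritten, comparing against `id[p]` directly would stop matching the remaining entries of the old component.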
- quick union (lazy approach)
- Data structure:
- Integer array id[] of size N.
- Interpretation: id[i] is parent of i.
- Root of i is id[id[id[...id[i]...]]].
- Find. Check if p and q have the same root.
- Union. Set the id of q's root to the id of p's root.
- Quick union is also too slow
- Quick-find defect.
- Union too expensive (N steps).
- Trees are flat, but too expensive to keep them flat.
- Quick-union defect.
- Trees can get tall.
- Find too expensive (could be N steps)
- Need to do find to do union
- What is the maximum number of array accesses during a find operation when using the quick-union data structure on N elements? Linear.
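A minimal Java sketch of quick-union as described above, under the assumption that a root is marked by `id[i] == i`. The class name `QuickUnionUF` and the method names are illustrative; the link direction follows the notes (q's root is pointed at p's root).

```java
// Sketch of quick-union (lazy approach). Names are illustrative, not from the notes.
class QuickUnionUF {
    private final int[] id;   // id[i] = parent of i; a root satisfies id[i] == i

    // Start with n one-node trees.
    QuickUnionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    // Chase parent pointers until reaching a root.
    // Cost is proportional to the depth of i, which can be linear in n.
    int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    // Find: p and q are connected iff they have the same root.
    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // Union: set the id of q's root to p's root (one array write after two finds).
    void union(int p, int q) {
        int rootP = root(p);
        int rootQ = root(q);
        id[rootQ] = rootP;
    }
}
```

With this link direction, the sequence `union(1, 0)`, `union(2, 1)`, `union(3, 2)`, ... builds a single chain, so a later find on object 0 walks through every node. That is the linear worst case asked about above.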
- improvements
- weighting
- Weighted quick-union.
- Modify quick-union to avoid tall trees.
- Keep track of size of each component.
- Balance by linking small tree below large one.
- Java implementation.
- Almost identical to quick-union.
- Maintain extra array sz[] to count number of elements in the tree rooted at i.
- Find. Identical to quick-union.
- Union. Modify quick-union to link the root of the smaller tree to the root of the larger tree, and update the sz[] array.
- Analysis.
- Find: takes time proportional to depth of p and q.
- Union: takes constant time, given roots.
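- Fact: depth of any node x is at most lg N (lg = base-2 logarithm).
- Proof sketch: the depth of x increases by 1 only when the tree containing x is linked below a tree that is at least as large, so the size of the tree containing x at least doubles. Starting from size 1, it can double at most lg N times.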
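A minimal Java sketch of weighted quick-union along the lines described above. The class name `WeightedQuickUnionUF` and the `depth` helper are assumptions added for illustration (the helper exists only to make the lg N bound observable); `sz[]` follows the notes.

```java
// Sketch of weighted quick-union. Names other than id[] and sz[] are illustrative.
class WeightedQuickUnionUF {
    private final int[] id;   // id[i] = parent of i; a root satisfies id[i] == i
    private final int[] sz;   // sz[i] = number of elements in the tree rooted at i (meaningful for roots)

    WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    // Find: identical to quick-union.
    int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // Union: link the root of the smaller tree below the root of the larger one.
    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Illustration-only helper: number of links from i up to its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) { i = id[i]; d++; }
        return d;
    }
}
```

Merging equal-sized trees pairwise is the only way to gain depth here: with 8 objects, the unions (0,1), (2,3), (4,5), (6,7), (0,2), (4,6), (0,4) leave object 7 at depth 3 = lg 8, and no sequence of unions can push any node deeper.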
- Path compression
- Path compression. Just after computing the root of i, set the id of each examined node to root(i).
- Standard implementation: add second loop to root() to set the id of each examined node to the root.
- Simpler one-pass variant: make every other node in path point to its grandparent.
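Both variants above can be sketched in Java as two alternative root methods on a weighted quick-union structure. The class name `PathCompressionUF`, the method names `root` and `rootHalving`, and the `parent` accessor are assumptions made for illustration; in this sketch `union` uses the two-pass version.

```java
// Sketch of the two path-compression variants. Names are illustrative, not from the notes.
class PathCompressionUF {
    private final int[] id;   // id[i] = parent of i; a root satisfies id[i] == i
    private final int[] sz;   // sz[i] = size of the tree rooted at i (meaningful for roots)

    PathCompressionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    // Standard two-pass version: the first loop finds the root,
    // the second loop points every examined node directly at it.
    int root(int i) {
        int r = i;
        while (r != id[r]) r = id[r];
        while (i != r) {
            int next = id[i];
            id[i] = r;
            i = next;
        }
        return r;
    }

    // One-pass variant: make every other node on the path point to its grandparent.
    int rootHalving(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];
            i = id[i];
        }
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // Weighted union, as in weighted quick-union.
    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Illustration-only accessor so the effect of compression can be inspected.
    int parent(int i) {
        return id[i];
    }
}
```

The one-pass variant does not fully flatten a path in a single call, but it roughly halves the path length each time and needs only one extra line compared with plain quick-union.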
- WQUPC (weighted quick-union with path compression)
- Theorem. Starting from an empty data structure, any sequence of M union and find operations on N objects takes O(N + M lg* N) time.
- Proof is very difficult.
- But the algorithm is still simple!
- Linear algorithm?
- Cost within constant factor of reading in the data.
- In theory, WQUPC is not quite linear.
- In practice, WQUPC is linear.
- Amazing fact (Fredman-Saks): In theory, no linear-time algorithm for union-find exists (in the cell-probe model of computation).
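To see why the bound in the theorem behaves like a linear one in practice, it helps to compute lg* N, the iterated logarithm: the number of times lg must be applied to N before the result is at most 1. The Java sketch below is only an illustration; the class name `IteratedLog` and both method names are assumptions, and it uses an integer ceiling of lg at each step, which gives the same count as the real-valued definition because the thresholds 2, 4, 16, 65536 are all powers of two.

```java
// Sketch: iterated logarithm lg* for long values. Names are illustrative.
class IteratedLog {
    // ceil(lg n) for n >= 1, computed with integer bit operations.
    static int ceilLg(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // Number of times lg must be applied to n before the value drops to 1 or below.
    static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = ceilLg(n);
            count++;
        }
        return count;
    }
}
```

lg* grows so slowly that it is at most 5 for every N up to 2^65536, far beyond any input that fits in a `long`, so the M lg* N term is at most 5M for any realistic problem size.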
- Union-find applications
- Network connectivity.
- Percolation. A model for many physical systems
- N-by-N grid.
- Each square is vacant or occupied.
- Grid percolates if top and bottom are connected by vacant squares.
- Image processing.
- Least common ancestor.
- Equivalence of finite state automata.
- Hindley-Milner polymorphic type inference.
- Kruskal's minimum spanning tree algorithm.
- Games (Go, Hex)
- Compiling equivalence statements in Fortran.
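As one concrete instance of the applications listed above, the percolation check can be sketched with union-find in Java. Everything named here is an assumption for illustration: the class `PercolationSketch`, the methods `makeVacant` and `percolates`, and in particular the two extra "virtual" nodes standing for the top and bottom edges, which are a common modeling trick rather than something stated in the notes. Squares are 4-connected (up, down, left, right).

```java
// Sketch: N-by-N percolation via weighted quick-union with path halving.
// Names and the virtual top/bottom nodes are illustrative assumptions.
class PercolationSketch {
    private final int n;
    private final boolean[] vacant;   // vacant[row * n + col] = is that square vacant?
    private final int[] id;           // union-find parent links
    private final int[] sz;           // tree sizes for weighting
    private final int top;            // virtual node attached to every vacant top-row square
    private final int bottom;         // virtual node attached to every vacant bottom-row square

    PercolationSketch(int n) {
        this.n = n;
        vacant = new boolean[n * n];  // all squares start occupied
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];        // path halving
            i = id[i];
        }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Mark square (row, col) vacant and union it with its vacant neighbours.
    void makeVacant(int row, int col) {
        int s = row * n + col;
        if (vacant[s]) return;
        vacant[s] = true;
        if (row == 0) union(s, top);
        if (row == n - 1) union(s, bottom);
        if (row > 0 && vacant[s - n]) union(s, s - n);
        if (row < n - 1 && vacant[s + n]) union(s, s + n);
        if (col > 0 && vacant[s - 1]) union(s, s - 1);
        if (col < n - 1 && vacant[s + 1]) union(s, s + 1);
    }

    // The grid percolates iff top and bottom are connected through vacant squares.
    boolean percolates() {
        return root(top) == root(bottom);
    }
}
```

The virtual nodes turn "is any top square connected to any bottom square" into a single find query instead of up to N^2 pairwise checks.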