Coursera-Algorithm Day1 Note

Dynamic Connectivity Problem

Description

Given a set of N objects, we can execute two operations on any pair of objects:

  • Union command: connect two objects.
  • Find/connected query: is there a path connecting the two objects?

illustration of union and connected operation

By combining these two operations we can determine whether a path exists between any two objects. Many real-world problems reduce to this one, such as deciding whether two people are linked through friendships in a social network.

In real-world applications, the number of objects N can be large, as can the number of operations M needed to solve the problem, and queries and union commands may be intermixed. The goal is to design an efficient data structure, with efficient operations, for union-find.

Properties of connection

We assume "is connected to" is an equivalence relation:

  • Reflexive: p is connected to p.
  • Symmetric: if p is connected to q, then q is connected to p.
  • Transitive: if p is connected to q and q is connected to r, then p is connected to r.

A connected component is a maximal set of objects that are mutually connected.

connected components

Quick Find Solution

In this solution, the data structure is an integer array id[] of length N. Each index of the array represents an object, and p and q are connected if and only if they have the same id value.

data structure1
The quick-find solution uses an eager approach. For a find query, it checks whether p and q have the same id. For a union command, it merges the components containing p and q by changing all entries whose id equals id[p] to id[q].

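public class QuickFindUF {
	private int[] id;

	// Set the id of each object to itself.
	public QuickFindUF(int N) {
		id = new int[N];
		for (int i = 0; i < N; i++)
			id[i] = i;
	}

	// p and q are connected iff they have the same id.
	public boolean connected(int p, int q) {
		return id[p] == id[q];
	}

	// Change every entry equal to id[p] into id[q]; scans the whole array.
	public void union(int p, int q) {
		int pid = id[p];
		int qid = id[q];
		for (int i = 0; i < id.length; i++)
			if (id[i] == pid) id[i] = qid;
	}
}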

The problem with this method is that union is too expensive: it takes on the order of N^2 array accesses to process a sequence of N union commands on N objects.

algorithm      initialize   union   find
quick-find     N            N       1
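
To make the quadratic cost concrete, here is a minimal sketch that instruments quick-find with an array-access counter. The names CountingQuickFind, accesses and costOfChain are assumptions made for this illustration; they do not come from the course.

```java
// Quick-find instrumented with a counter of id[] reads and writes.
// CountingQuickFind, accesses and costOfChain are illustrative names.
class CountingQuickFind {
    private final int[] id;
    long accesses = 0;   // id[] reads and writes performed by union/connected

    CountingQuickFind(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // initialization is not counted
    }

    boolean connected(int p, int q) {
        accesses += 2;                 // reads of id[p] and id[q]
        return id[p] == id[q];
    }

    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        accesses += 2;                 // reads of id[p] and id[q]
        for (int i = 0; i < id.length; i++) {
            accesses++;                // read of id[i]
            if (id[i] == pid) {
                id[i] = qid;
                accesses++;            // write of id[i]
            }
        }
    }

    // Runs union(0,1), union(1,2), ..., union(n-2,n-1) and returns the total count.
    static long costOfChain(int n) {
        CountingQuickFind uf = new CountingQuickFind(n);
        for (int i = 0; i + 1 < n; i++) uf.union(i, i + 1);
        return uf.accesses;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2)
            System.out.println(n + " objects: " + costOfChain(n) + " array accesses");
    }
}
```

Each time n doubles, the reported count should grow by roughly a factor of four, which is the signature of quadratic growth.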

Quick Union Solution

To improve performance, the quick-union method interprets the same data structure differently. It is still an integer array id[] of length N, but id[i] no longer names the group that i belongs to. Instead, id[i] is the parent of i. In other words, the objects form a forest of trees, and the root of i is id[id[id[…id[i]…]]] (keep following parent links until the value stops changing).

  • Find: check if p and q have the same root.
  • Union: to merge the components containing p and q, set the id of p’s root to the id of q’s root.

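public class QuickUnionUF {
	private int[] id;

	// Every object starts as the root of its own tree.
	public QuickUnionUF(int N) {
		id = new int[N];
		for (int i = 0; i < N; i++) id[i] = i;
	}

	// Chase parent pointers until reaching a root (a node that is its own parent).
	private int root(int i) {
		while (i != id[i]) i = id[i];
		return i;
	}

	// p and q are connected iff they have the same root.
	public boolean connected(int p, int q) {
		return root(p) == root(q);
	}

	// Make the root of p point to the root of q.
	public void union(int p, int q) {
		int i = root(p);
		int j = root(q);
		id[i] = j;
	}
}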

Unfortunately, quick-union can still be expensive when the trees get tall.

algorithm      initialize   union   find
quick-union    N            N       N
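
A small sketch of how a tall tree can arise. It reuses the quick-union logic above and adds a depth() helper; TallTreeDemo, depth and worstCaseDepth are names assumed for this illustration only.

```java
// Quick-union plus a depth() helper, used to exhibit a worst-case input.
// TallTreeDemo, depth and worstCaseDepth are illustrative names.
class TallTreeDemo {
    private final int[] id;

    TallTreeDemo(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    void union(int p, int q) {
        id[root(p)] = root(q);
    }

    // Number of parent links followed from i up to its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) { i = id[i]; d++; }
        return d;
    }

    // union(0,1), union(0,2), ..., union(0,n-1): each call hangs the current
    // root under a brand-new node, so the tree degenerates into a chain.
    static int worstCaseDepth(int n) {
        TallTreeDemo uf = new TallTreeDemo(n);
        for (int q = 1; q < n; q++) uf.union(0, q);
        return uf.depth(0);
    }

    public static void main(String[] args) {
        System.out.println("depth of object 0 with 10 objects: " + worstCaseDepth(10));
    }
}
```

With this input, object 0 ends up N - 1 links away from its root, so a single find on it touches the whole chain.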

Quick-find defect.

  • Union too expensive (N array accesses).
  • Trees are flat, but too expensive to keep them flat.

Quick-union defect.

  • Trees can get tall.
  • Find too expensive (could be N array accesses).

Solution Improvements

Improvement 1: weighting

Weighted quick-union:

  • Modify quick-union to avoid tall trees.
  • Keep track of size of each tree (number of objects).
  • Balance by linking root of smaller tree to root of larger tree.

The data structure is the same as in quick-union, but it maintains an extra array sz[], where sz[i] counts the number of objects in the tree rooted at i.

Find: Identical to quick-union.

return root(p) == root(q);

Union: modify quick-union to:

  • Link root of smaller tree to root of larger tree.
  • Update the sz[] array.

int i = root(p);
int j = root(q);
if (i == j) return;
// Hang the smaller tree under the larger one and update the size.
if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
else               { id[j] = i; sz[i] += sz[j]; }
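
Putting the find and union fragments together, a complete class might look like the sketch below. The class name WeightedQuickUnionSketch and the depth() helper are assumptions added for illustration; the union logic is the fragment shown above.

```java
// Weighted quick-union assembled from the fragments above.
// WeightedQuickUnionSketch and depth() are illustrative names.
class WeightedQuickUnionSketch {
    private final int[] id;   // id[i] = parent of i
    private final int[] sz;   // sz[i] = number of objects in the tree rooted at i

    WeightedQuickUnionSketch(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;        // every object starts as a one-node tree
        }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // Link the root of the smaller tree to the root of the larger tree.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Number of parent links followed from i up to its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) { i = id[i]; d++; }
        return d;
    }

    public static void main(String[] args) {
        WeightedQuickUnionSketch uf = new WeightedQuickUnionSketch(10);
        for (int q = 1; q < 10; q++) uf.union(0, q);
        System.out.println("depth of 9: " + uf.depth(9));
    }
}
```

The sequence union(0,1), union(0,2), ..., union(0,9) builds a chain under plain quick-union, but here every new object is linked directly under the existing root, so the tree stays flat.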

Running time.

  • Find: takes time proportional to depth of p and q.
  • Union: takes constant time, given roots.

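Proposition: the depth of any node x is at most lg N. The depth of x increases by 1 only when the tree T1 containing x is merged into another tree T2. When that happens, the size of the tree containing x at least doubles, since |T2| ≥ |T1|. Because a tree never holds more than N objects, the size of the tree containing x can double at most lg N times.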
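
The doubling argument can be written as a short chain of inequalities. The symbols s_k (size of the tree containing x after the depth of x has increased k times) and d (final depth of x) are notation introduced for this sketch only.

```latex
% s_k: size of the tree containing x after the depth of x has increased k times
s_0 \ge 1, \qquad s_k \ge 2\,s_{k-1} \quad\Longrightarrow\quad s_k \ge 2^k
% a tree never holds more than N objects, so with d the final depth of x:
2^{d} \le s_d \le N \quad\Longrightarrow\quad d \le \lg N
```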

algorithm      initialize   union   find
weighted QU    N            lg N    lg N

Improvement 2: path compression

After computing the root of p, quick-union with path compression sets the id of each examined node to point to that root.

There are two ways to achieve this:

  • Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root.
  • Simpler one-pass variant: make every other node in the path point to its grandparent (thereby halving path length).

private int root(int i) {
	while (i != id[i]) {
		id[i] = id[id[i]];   // point i at its grandparent
		i = id[i];
	}
	return i;
}
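
For comparison, the two-pass implementation might look like the following sketch. The class name TwoPassCompressionSketch and the parent() accessor are assumptions added so the effect of compression can be inspected; union is unweighted here to keep the example short.

```java
// Quick-union with two-pass path compression in root().
// TwoPassCompressionSketch and parent() are illustrative names.
class TwoPassCompressionSketch {
    private final int[] id;

    TwoPassCompressionSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    int root(int i) {
        // Pass 1: walk up to find the root.
        int r = i;
        while (r != id[r]) r = id[r];
        // Pass 2: walk the same path again, pointing every node at the root.
        while (i != r) {
            int next = id[i];
            id[i] = r;
            i = next;
        }
        return r;
    }

    void union(int p, int q) {
        id[root(p)] = root(q);
    }

    // Exposes the parent link of i so the compression can be observed.
    int parent(int i) {
        return id[i];
    }

    public static void main(String[] args) {
        TwoPassCompressionSketch uf = new TwoPassCompressionSketch(5);
        for (int i = 0; i < 4; i++) uf.union(i, i + 1);   // builds the chain 0 -> 1 -> 2 -> 3 -> 4
        uf.root(0);                                        // compresses the whole path
        System.out.println("parent of 0 after root(0): " + uf.parent(0));
    }
}
```

After the single call root(0), objects 0, 1, 2 and 3 all point directly at the root 4, whereas the one-pass variant only halves the path on each call.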

Weighted quick-union with path compression

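Starting from an empty data structure, any sequence of M union-find operations on N objects makes ≤ c (N + M lg* N) array accesses, where lg* N is the iterated logarithm: the number of times lg must be applied to N to reach a value ≤ 1. The analysis can be improved to N + M α(M, N), where α is the inverse Ackermann function. This cost is within a constant factor of reading in the data. In theory, WQUPC is not quite linear. In practice, WQUPC is linear.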
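
To see why lg* N behaves like a constant in practice, here is a small sketch that computes it for integer arguments. The class IteratedLog and its method names are assumptions for this illustration. It uses the ceiling of lg at each step, which gives the same count as real-valued logarithms for integer inputs while avoiding floating-point rounding.

```java
// Iterated logarithm for positive integers.
// IteratedLog, ceilLg and lgStar are illustrative names.
class IteratedLog {
    // Ceiling of lg n for n >= 1, computed with integer bit operations.
    static int ceilLg(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // lg* n: how many times lg must be applied before the value drops to <= 1.
    static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = ceilLg(n);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 4, 16, 65536, Long.MAX_VALUE};
        for (long n : samples)
            System.out.println("lg*(" + n + ") = " + lgStar(n));
    }
}
```

Even for the largest value a Java long can hold, lgStar returns 5, so the M lg* N term is at most a small constant times M for any input that fits in memory.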

Union-find applications

  • Percolation.
  • Games (Go, Hex).
  • Dynamic connectivity.
  • Least common ancestor.
  • Equivalence of finite state automata.
  • Hoshen-Kopelman algorithm in physics.
  • Hindley-Milner polymorphic type inference.
  • Kruskal’s minimum spanning tree algorithm.
  • Compiling equivalence statements in Fortran.
  • Morphological attribute openings and closings.
  • Matlab’s bwlabel() function in image processing.

Percolation

A model for many physical systems:

  • N-by-N grid of sites.
  • Each site is open with probability p (or blocked with probability 1 – p).
  • System percolates iff top and bottom are connected by open sites.

The likelihood of percolation depends on the site vacancy probability p. When N is large, theory guarantees a sharp threshold p*:

  • p > p*: almost certainly percolates.
  • p < p*: almost certainly does not percolate.

Monte Carlo simulation can be used to estimate the value of p*:

  • Initialize the whole N-by-N grid to be blocked.
  • Declare random sites open until the top is connected to the bottom.
  • The vacancy percentage at that point estimates p*.

To check whether an N-by-N system percolates, we can turn it into a dynamic connectivity problem that the quick-union method can solve.

  • Create an object for each site and name them 0 to N^2 – 1.
  • Sites are in the same component if connected by open sites.
  • Percolates iff any site on the bottom row is connected to a site on the top row.

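Checking every pair of top-row and bottom-row sites would take N^2 calls to connected(). To avoid this, we add a virtual top site connected to the sites in the top row and a virtual bottom site connected to the sites in the bottom row. The system then percolates iff the virtual top site is connected to the virtual bottom site.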
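
The virtual-site trick and the Monte Carlo procedure can be combined in one sketch. Everything named here (PercolationSketch, open, percolates, trial, estimateThreshold, the row-major site numbering, the fixed seed) is an assumption made for illustration and not the course's reference solution. The union-find inside is weighted quick-union with the one-pass path compression shown earlier.

```java
// Percolation on an n-by-n grid using union-find with two virtual sites.
// All names and the row-major site numbering are illustrative assumptions.
class PercolationSketch {
    private final int n;
    private final boolean[] isOpen;   // isOpen[row * n + col]
    private final int[] id;           // parent links, n*n sites plus 2 virtual sites
    private final int[] sz;           // tree sizes for weighting
    private final int top;            // virtual top site
    private final int bottom;         // virtual bottom site
    private int openCount = 0;

    PercolationSketch(int n) {
        this.n = n;
        isOpen = new boolean[n * n];  // all sites start blocked
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) { id[i] = id[id[i]]; i = id[i]; }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Opens site (row, col), both 0-based, and links it to its open neighbours.
    void open(int row, int col) {
        int site = row * n + col;
        if (isOpen[site]) return;
        isOpen[site] = true;
        openCount++;
        if (row == 0)     union(site, top);
        if (row == n - 1) union(site, bottom);
        if (row > 0     && isOpen[site - n]) union(site, site - n);
        if (row < n - 1 && isOpen[site + n]) union(site, site + n);
        if (col > 0     && isOpen[site - 1]) union(site, site - 1);
        if (col < n - 1 && isOpen[site + 1]) union(site, site + 1);
    }

    // One connectivity check instead of N^2 top/bottom pairs.
    boolean percolates() {
        return root(top) == root(bottom);
    }

    // One Monte Carlo trial: open random sites until the grid percolates,
    // then return the fraction of sites that are open.
    static double trial(int n, java.util.Random rng) {
        PercolationSketch grid = new PercolationSketch(n);
        while (!grid.percolates())
            grid.open(rng.nextInt(n), rng.nextInt(n));
        return (double) grid.openCount / (n * n);
    }

    // Averages several trials to estimate the threshold p*.
    static double estimateThreshold(int n, int trials, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double sum = 0;
        for (int t = 0; t < trials; t++) sum += trial(n, rng);
        return sum / trials;
    }

    public static void main(String[] args) {
        System.out.println("estimated p* = " + estimateThreshold(50, 100, 42L));
    }
}
```

Larger grids and more trials should tighten the estimate. For large square grids the threshold is known to be about 0.593, so estimates in that neighbourhood are a reasonable sanity check.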
