Coursera-Algorithm Day1 Note

AndrewZ_98

Category: Coursera-Algorithm. Tags: algorithm

Article link: https://blog.csdn.net/AndrewZ_98/article/details/105620240

Union-Find Problem

Dynamic Connectivity Problem

Description

Given a set of N objects, we can execute two operations on any pair of objects:

Union command: connect two objects.
Find/connected query: is there a path connecting the two objects?

[Figure: illustration of the union and connected operations]

Combining these two operations lets us determine whether a path exists between any two objects. Many real-world problems reduce to this one, for example deciding whether two people are linked through a chain of friendships in a social network.

In real-world applications, both the number of objects N and the number of operations M can be large, and queries and union commands may be intermixed. The goal is to design an efficient data structure and efficient operations for union-find.

Properties of connection

We assume "is connected to" is an equivalence relation:

Reflexive: p is connected to p.
Symmetric: if p is connected to q, then q is connected to p.
Transitive: if p is connected to q and q is connected to r, then p is connected to r.

A connected component is a maximal set of objects that are mutually connected.
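
To make the definition concrete, here is a minimal sketch that labels the components of a small example by depth-first search. It is not one of the union-find algorithms in these notes; the class name `ComponentsSketch`, the method `label`, and the sample connections are assumptions chosen for illustration. With the connections below, the 8 objects fall into the components {0}, {1, 4, 5} and {2, 3, 6, 7}.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ComponentsSketch {
    // Returns comp[], where comp[i] is the component number of object i.
    static int[] label(int n, int[][] connections) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] c : connections) {          // connections are symmetric
            adj.get(c[0]).add(c[1]);
            adj.get(c[1]).add(c[0]);
        }
        int[] comp = new int[n];
        Arrays.fill(comp, -1);                 // -1 means "not visited yet"
        int count = 0;
        for (int s = 0; s < n; s++) {
            if (comp[s] != -1) continue;       // already part of a labeled component
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            comp[s] = count;
            while (!stack.isEmpty()) {         // depth-first search from s
                int v = stack.pop();
                for (int w : adj.get(v)) {
                    if (comp[w] == -1) {
                        comp[w] = count;
                        stack.push(w);
                    }
                }
            }
            count++;
        }
        return comp;
    }

    public static void main(String[] args) {
        int[][] connections = {{1, 4}, {4, 5}, {2, 3}, {2, 6}, {3, 6}, {3, 7}};
        System.out.println(Arrays.toString(label(8, connections)));
    }
}
```

Recomputing the labels from scratch after every new connection would be wasteful, which is why the solutions that follow maintain the components incrementally.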

[Figure: connected components]

Quick Find Solution

In this solution the data structure is an integer array id[] of length N. Each array index represents an object, and p and q are connected if and only if they have the same id value.

[Figure: data structure 1]

Quick-find uses an eager approach. For a find query, it checks whether p and q have the same id. For a union command, it merges the components containing p and q by changing every entry whose id equals id[p] to id[q].

public class QuickFindUF {
    private int[] id;

    // Initially every object is in its own component.
    public QuickFindUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++)
            id[i] = i;
    }

    // p and q are connected iff they have the same id.
    public boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    // Change every entry equal to id[p] to id[q].
    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }
}

The problem with this method is that union is too expensive: each union scans the whole array, so processing a sequence of N union commands on N objects takes on the order of N^2 array accesses.

algorithm	initialize	union	find
quick-find	N	N	1
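
The quadratic cost can be observed directly. The sketch below is an instrumented copy of the quick-find union that counts reads and writes of id[]; the class name `QuickFindCount`, the `accesses` counter and the helper `costOfChain` are assumptions added for this experiment, not part of the course code. Doubling the number of objects should roughly quadruple the count.

```java
class QuickFindCount {
    private final int[] id;
    long accesses = 0;                 // counts reads and writes of id[]

    QuickFindCount(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        accesses += 2;                 // the two reads above
        for (int i = 0; i < id.length; i++) {
            accesses++;                // read id[i] for the comparison
            if (id[i] == pid) {
                id[i] = qid;
                accesses++;            // write id[i]
            }
        }
    }

    // Total array accesses for the n - 1 unions (0,1), (1,2), ..., (n-2,n-1).
    static long costOfChain(int n) {
        QuickFindCount uf = new QuickFindCount(n);
        for (int i = 0; i + 1 < n; i++) uf.union(i, i + 1);
        return uf.accesses;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2)
            System.out.println(n + " objects: " + costOfChain(n) + " array accesses");
    }
}
```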

Quick Union Solution

To improve performance, quick-union interprets the same data structure differently. It still uses an integer array id[] of length N, but id[i] is no longer the component that i belongs to. Instead, id[i] is the parent of i, so the array encodes a forest of trees, and the root of i is id[id[id[…id[i]…]]] (keep following parents until the value stops changing).

Find: check if p and q have the same root.
Union: to merge the components containing p and q, set the id of p's root to the id of q's root.

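public class QuickUnionUF {
    private int[] id;   // id[i] is the parent of i

    // Initially every object is its own root.
    public QuickUnionUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }

    // Follow parent links until reaching a root (a node that is its own parent).
    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // Make p's root a child of q's root.
    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}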

Unfortunately, quick-union can still be expensive when trees get tall: in the worst case a find has to follow N links, and union pays the same cost because it has to find both roots first.

algorithm	initialize	union	find
quick-union	N	N	N
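
A tall tree is easy to produce. In the sketch below (the class name `QuickUnionDepth` and the helpers `depth` and `worstCaseDepth` are assumptions added for illustration), the unions (0,1), (1,2), ..., (n-2,n-1) turn the forest into a single chain, so object 0 ends up n - 1 links away from its root.

```java
class QuickUnionDepth {
    private final int[] id;            // id[i] is the parent of i

    QuickUnionDepth(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    // Plain quick-union: p's root becomes a child of q's root.
    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    // Number of links between i and its root.
    int depth(int i) {
        int d = 0;
        while (i != id[i]) {
            i = id[i];
            d++;
        }
        return d;
    }

    // Depth of object 0 after chaining all n objects together.
    static int worstCaseDepth(int n) {
        QuickUnionDepth uf = new QuickUnionDepth(n);
        for (int i = 0; i + 1 < n; i++) uf.union(i, i + 1);
        return uf.depth(0);
    }

    public static void main(String[] args) {
        System.out.println("depth of object 0 with 10 objects: " + worstCaseDepth(10));
    }
}
```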

Quick-find defect.

Union too expensive (N array accesses).
Trees are flat, but too expensive to keep them flat.

Quick-union defect.

Trees can get tall.
Find too expensive (could be N array accesses).

Solution Improvements

Improvement 1: weighting

Weighted quick-union:

Modify quick-union to avoid tall trees.
Keep track of size of each tree (number of objects).
Balance by linking root of smaller tree to root of larger tree.

The data structure is the same as in quick-union, plus an extra array sz[], where sz[i] counts the number of objects in the tree rooted at i.

Find: Identical to quick-union.

return root(p) == root(q);

Union: modify quick-union to:

Link root of smaller tree to root of larger tree.
Update the sz[] array.

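public void union(int p, int q) {
    int i = root(p);
    int j = root(q);
    if (i == j) return;                  // already in the same tree
    if (sz[i] < sz[j]) {                 // link the smaller tree below the larger one
        id[i] = j; sz[j] += sz[i];
    } else {
        id[j] = i; sz[i] += sz[j];
    }
}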

Running time.

Find: takes time proportional to the depth of p and q.
Union: takes constant time, given roots.

Why the depth of any node x is at most lg N: the depth of x increases by 1 only when the tree T1 containing x is merged into another tree T2. Because |T2| ≥ |T1|, the size of the tree containing x at least doubles each time this happens. A tree of at most N objects can double in size at most lg N times, so the depth of x never exceeds lg N.

algorithm	initialize	union	find
weighted-QU	N	lg N	lg N
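
Putting the pieces together, a complete weighted quick-union class might look like the sketch below. The constructor that sets every sz[i] to 1 is implied by the description but not shown in the notes, and the helpers `depth` and `maxDepthAfterPairwiseMerges` are assumptions added to check the lg N bound. Repeatedly merging trees of equal size is the worst case: with N = 8 objects the deepest node ends up at depth 3 = lg 8.

```java
class WeightedQuickUnionUF {
    private final int[] id;   // id[i] is the parent of i
    private final int[] sz;   // sz[i] is the number of objects in the tree rooted at i

    WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;        // every object starts as a one-node tree
        }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Number of links between i and its root (helper for this demo).
    int depth(int i) {
        int d = 0;
        while (i != id[i]) {
            i = id[i];
            d++;
        }
        return d;
    }

    // For n a power of two: merge equal-sized trees pairwise, then report the largest depth.
    static int maxDepthAfterPairwiseMerges(int n) {
        WeightedQuickUnionUF uf = new WeightedQuickUnionUF(n);
        for (int step = 1; step < n; step *= 2)
            for (int i = 0; i + step < n; i += 2 * step)
                uf.union(i, i + step);
        int max = 0;
        for (int i = 0; i < n; i++) max = Math.max(max, uf.depth(i));
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max depth for 8 objects: " + maxDepthAfterPairwiseMerges(8));
    }
}
```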

Improvement 2: path compression

After computing the root of p, quick-union with path compression sets the id of each examined node to point to that root.

There are two ways to achieve this:

Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root.
Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length).

// One-pass variant: while walking up, point each visited node to its grandparent.
private int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];
        i = id[i];
    }
    return i;
}
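
The code above is the one-pass variant. For comparison, a two-pass implementation might look like the following sketch: the first loop finds the root, the second loop revisits the same path and points every node on it directly at the root. The class name `TwoPassCompressionUF` and the accessor `parent` are assumptions added for illustration, and union is left unweighted to keep the example short.

```java
class TwoPassCompressionUF {
    private final int[] id;            // id[i] is the parent of i

    TwoPassCompressionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    int root(int i) {
        int r = i;
        while (r != id[r]) r = id[r];  // first pass: find the root
        while (i != r) {               // second pass: relink the path to the root
            int next = id[i];
            id[i] = r;
            i = next;
        }
        return r;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    // Exposes the parent link so the effect of compression can be inspected.
    int parent(int i) {
        return id[i];
    }

    public static void main(String[] args) {
        TwoPassCompressionUF uf = new TwoPassCompressionUF(5);
        for (int i = 0; i + 1 < 5; i++) uf.union(i, i + 1);   // builds the chain 0 -> 1 -> 2 -> 3 -> 4
        uf.root(0);                                           // compresses the whole path
        for (int i = 0; i < 5; i++)
            System.out.println("parent of " + i + " is " + uf.parent(i));
    }
}
```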

Weighted quick-union with path compression

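Starting from an empty data structure, any sequence of M union-find operations on N objects makes ≤ c (N + M lg* N) array accesses, where lg* N is the iterated logarithm: the number of times lg must be applied to N before the result is at most 1. The analysis can be improved to N + M α(M, N), where α is the inverse Ackermann function. This is within a constant factor of the cost of reading in the data. In theory, WQUPC (weighted quick-union with path compression) is not quite linear; in practice, it is linear.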
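
To see why lg* N behaves like a constant in practice, the sketch below computes it for a few values. The class name `IteratedLog` and its methods are assumptions for illustration; lg* N is taken here as the number of times lg must be applied to N before the result is at most 1. Even for the largest long value the answer is only 5.

```java
class IteratedLog {
    // Smallest k such that 2^k >= n, for n >= 1 (that is, lg n rounded up).
    static int ceilLog2(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // lg* n for n >= 1. Rounding each intermediate result up does not change
    // the count, because the thresholds 2, 4, 16, 65536 are themselves integers.
    static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = ceilLog2(n);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 4, 16, 65536, Long.MAX_VALUE};
        for (long n : samples)
            System.out.println("lg* " + n + " = " + lgStar(n));
    }
}
```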

Union-find applications

Percolation.
Games (Go, Hex).
Dynamic connectivity.
Least common ancestor.
Equivalence of finite state automata.
Hoshen-Kopelman algorithm in physics.
Hindley-Milner polymorphic type inference.
Kruskal’s minimum spanning tree algorithm.
Compiling equivalence statements in Fortran.
Morphological attribute openings and closings.
Matlab’s bwlabel() function in image processing.

Percolation

A model for many physical systems:

N-by-N grid of sites.
Each site is open with probability p (or blocked with probability 1 – p).
System percolates iff top and bottom are connected by open sites.

The likelihood of percolation depends on the site vacancy probability p. When N is large, theory guarantees a sharp threshold p*: for p > p* the system almost certainly percolates, and for p < p* it almost certainly does not.

Monte Carlo simulation can be used to estimate p*:

Initialize the whole N-by-N grid to be blocked.
Declare random sites open until the top is connected to the bottom.
The vacancy percentage estimates p*.
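
The three steps above can be sketched as follows. To keep the example independent of any union-find class, the percolation check here is a brute-force flood fill from the top row, which is a deliberate swap-in and far slower than union-find. The class name `PercolationMonteCarlo`, the method names, the grid size and the seed are all assumptions. For large square grids the threshold is known to be roughly 0.593, so the printed estimate should land in that neighborhood.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

class PercolationMonteCarlo {
    // Brute-force check: flood fill through open sites, starting from the top row.
    static boolean percolates(boolean[][] open) {
        int n = open.length;
        boolean[][] seen = new boolean[n][n];
        Deque<int[]> stack = new ArrayDeque<>();
        for (int c = 0; c < n; c++) {
            if (open[0][c]) {
                seen[0][c] = true;
                stack.push(new int[]{0, c});
            }
        }
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!stack.isEmpty()) {
            int[] site = stack.pop();
            if (site[0] == n - 1) return true;         // reached the bottom row
            for (int[] m : moves) {
                int r = site[0] + m[0];
                int c = site[1] + m[1];
                if (r >= 0 && r < n && c >= 0 && c < n && open[r][c] && !seen[r][c]) {
                    seen[r][c] = true;
                    stack.push(new int[]{r, c});
                }
            }
        }
        return false;
    }

    // One trial: open random blocked sites until the grid percolates,
    // then return the fraction of sites that are open.
    static double trial(int n, Random rng) {
        boolean[][] open = new boolean[n][n];          // all sites start blocked
        int opened = 0;
        while (!percolates(open)) {
            int r = rng.nextInt(n);
            int c = rng.nextInt(n);
            if (!open[r][c]) {
                open[r][c] = true;
                opened++;
            }
        }
        return (double) opened / (n * n);
    }

    // Average vacancy fraction over several independent trials.
    static double estimate(int n, int trials, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int t = 0; t < trials; t++) sum += trial(n, rng);
        return sum / trials;
    }

    public static void main(String[] args) {
        System.out.println("estimated p* = " + estimate(20, 30, 42L));
    }
}
```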

To check whether an N-by-N system percolates, we can turn it into a problem that union-find can solve.

Create an object for each site and name them 0 to N^2 - 1.
Sites are in the same component if connected by open sites.
The system percolates if any site on the bottom row is connected to a site on the top row.

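Checking every pair of top-row and bottom-row sites would take N^2 calls to connected(). To avoid this, we add a virtual top site linked to the top row and a virtual bottom site linked to the bottom row, and check only whether the virtual top site is connected to the virtual bottom site.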
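
A minimal sketch of the virtual-site trick, assuming 0-based (row, col) coordinates and an embedded weighted quick-union with path compression. The class name `VirtualSitePercolation` and its methods are illustrative assumptions, not the course's reference API. Sites are numbered row * N + col, and the two extra indices N^2 and N^2 + 1 play the roles of the virtual top and virtual bottom.

```java
class VirtualSitePercolation {
    private final int n;
    private final boolean[] isOpen;    // isOpen[s] is true once site s has been opened
    private final int[] id;            // parent links, including the two virtual sites
    private final int[] sz;            // tree sizes for weighting
    private final int top;             // index of the virtual top site
    private final int bottom;          // index of the virtual bottom site

    VirtualSitePercolation(int n) {
        this.n = n;
        isOpen = new boolean[n * n];
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) {
            id[i] = i;
            sz[i] = 1;
        }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];         // one-pass path compression
            i = id[i];
        }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Opens site (row, col) and links it to its open neighbours and virtual sites.
    void open(int row, int col) {
        int s = row * n + col;
        if (isOpen[s]) return;
        isOpen[s] = true;
        if (row == 0) union(s, top);
        if (row == n - 1) union(s, bottom);
        if (row > 0 && isOpen[s - n]) union(s, s - n);
        if (row < n - 1 && isOpen[s + n]) union(s, s + n);
        if (col > 0 && isOpen[s - 1]) union(s, s - 1);
        if (col < n - 1 && isOpen[s + 1]) union(s, s + 1);
    }

    // A single connectivity query replaces the N^2 pairwise checks.
    boolean percolates() {
        return root(top) == root(bottom);
    }

    public static void main(String[] args) {
        VirtualSitePercolation grid = new VirtualSitePercolation(3);
        grid.open(0, 1);
        grid.open(1, 1);
        System.out.println("after two sites: " + grid.percolates());
        grid.open(2, 1);
        System.out.println("after three sites: " + grid.percolates());
    }
}
```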
