【CS 61B Study Notes 4】Asymptotics & Disjoint Sets

Asymptotics

Intuitive Runtime Characterizations

Technique 1: Measure execution time in seconds using a client program.

Technique 2: Count possible operations.

  • 2-1: Count possible operations for an array of a fixed size, e.g. N = 10000.

    • Pro: Machine independent. Input dependence is captured in the model.
    • Con: The array size was arbitrary. Does not tell you the actual time.
  • 2-2: Count possible operations symbolically, in terms of the input array size N.

    • Pro: Machine independent. Input dependence is captured in the model. Tells us how the algorithm scales.
    • Con: Does not tell you the actual time.
// dup1: compare every pair of elements
for (int i = 0; i < A.length; i += 1) {
	for (int j = i + 1; j < A.length; j += 1) {
		if (A[i] == A[j]) {
			return true;
		}
	}
}
return false;
| operation (dup1) | 2-1 Count, N=10000 | 2-2 Symbolic count        |
| ---------------- | ------------------ | ------------------------- |
| i = 0            | 1                  | 1                         |
| j = i+1          | 1 to 10000         | 1 to N                    |
| less than (<)    | 2 to 50015001      | 2 to $\frac{N^2+3N+2}{2}$ |
| increment (+=1)  | 0 to 50005000      | 0 to $\frac{N^2+N}{2}$    |
| equals (==)      | 1 to 49995000      | 1 to $\frac{N^2-N}{2}$    |
| array accesses   | 2 to 99990000      | 2 to $N^2-N$              |
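As a quick check on these symbolic counts (my own derivation, not part of the original table): in the worst case, the == comparison runs once for every pair $(i, j)$ with $j > i$, so
$$\sum_{i=0}^{N-2} (N-1-i) = (N-1) + (N-2) + \dots + 1 = \frac{N(N-1)}{2} = \frac{N^2-N}{2},$$
which matches the upper end of the equals (==) row; each comparison touches two array elements, giving the $N^2-N$ array accesses.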
// dup2 : compare only neighbors
for (int i = 0; i < A.length -1 ; i += 1){
	if(A[i]==A[i+1]){
		return true;
	}
}
return false;
| operation (dup2) | 2-1 Count, N=10000 | 2-2 Symbolic count |
| ---------------- | ------------------ | ------------------ |
| i = 0            | 1                  | 1                  |
| less than (<)    | 0 to 10000         | 0 to N             |
| increment (+=1)  | 0 to 9999          | 0 to N-1           |
| equals (==)      | 1 to 9999          | 1 to N-1           |
| array accesses   | 2 to 19998         | 2 to 2N-2          |

If we want to choose the better algorithm, we need to consider:
- Fewer operations to do the same work
- Algorithm scales better in the worst case

Worst Case Order of Growth

  • Intuitive Simplification 1: Consider only the worst case
  • Intuitive Simplification 2: Restrict attention to one operation (pick some representative operation to act as a proxy for the overall runtime)
  • Intuitive Simplification 3: Eliminate lower-order terms
  • Intuitive Simplification 4: Eliminate multiplicative constants
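As a worked illustration (my own, using the dup1 counts above): pick array accesses as the representative operation; in the worst case it runs $N^2 - N$ times, and eliminating the lower-order term and the multiplicative constant leaves
$$N^2 - N \;\longrightarrow\; N^2,$$
so dup1's worst-case order of growth is $N^2$.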

Big Theta

Suppose we have a function R(N) with order of growth f(N).
In “Big Theta” notation we write this as $R(N) \in \Theta(f(N))$, e.g. $N^3 + N^4 \in \Theta(N^4)$.
$R(N) \in \Theta(f(N))$ means there exist positive constants $k_1$ and $k_2$ such that
$$k_1 \times f(N) \leq R(N) \leq k_2 \times f(N)$$
for all values of N greater than some $N_0$.
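For the example above, a concrete check of this definition (my own choice of constants): with $k_1 = 1$, $k_2 = 2$, and $N_0 = 1$,
$$N^4 \leq N^3 + N^4 \leq 2N^4 \quad \text{for all } N \geq 1,$$
since $0 \leq N^3 \leq N^4$ whenever $N \geq 1$.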

Using Big-Theta does not change anything about runtime analysis. The only difference is that we use the $\Theta$ symbol anywhere we would have said “order of growth”.

Big O Notation

Whereas Big Theta can informally be thought of as something like “equals”, Big O can be thought of as “less than or equal”.

$R(N) \in O(f(N))$ means there exists a positive constant $k_2$ such that
$$R(N) \leq k_2 \times f(N)$$
for all values of N greater than some $N_0$, i.e. for very large N.
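For example (my own, following the definition): $N^3 + N^4 \in O(N^4)$ (take $k_2 = 2$), but it is also true that $N^3 + N^4 \in O(N^6)$, since Big O only gives an upper bound and need not be tight.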

Big Omega

While Big Theta can be informally thought of as runtime “equality” and Big O represents “less than or equal”, Big Omega can be thought of as “greater than or equal”.

  • All of the following statements are true:

    • $N^3+N^4 \in \Theta(N^4)$
    • $N^3+N^4 \in \Omega(N^4)$
    • $N^3+N^4 \in \Omega(N^3)$
    • $N^3+N^4 \in \Omega(1)$
  • Common uses for Big Omega:

    • It is used to prove Big Theta runtimes.
      • If $R(N) \in O(f(N))$ and $R(N) \in \Omega(f(N))$, then $R(N) \in \Theta(f(N))$.
    • It is used to prove the difficulty of a problem, e.g. any duplicate-finding algorithm must be $\Omega(N)$, because the algorithm must at least look at each element.

Big Theta vs. Big O vs. Big Omega

| notation                 | Informal meaning                                 | Family        | Family members               |
| ------------------------ | ------------------------------------------------ | ------------- | ---------------------------- |
| Big Theta $\Theta(f(N))$ | Order of growth is f(N)                          | $\Theta(N^2)$ | $N^2/2$, $2N^2$, $N^2+38N+N$ |
| Big O $O(f(N))$          | Order of growth is less than or equal to f(N)    | $O(N^2)$      | $N^2/2$, $2N^2$, $\lg(N)$    |
| Big Omega $\Omega(f(N))$ | Order of growth is greater than or equal to f(N) | $\Omega(N^2)$ | $N^2/2$, $2N^2$, $5^N$       |

Amortized Analysis

  • A more rigorous examination of amortized analysis is done in three steps:
    1. Pick a cost model (like in regular runtime analysis)
    2. Compute the average cost of the i-th operation
    3. Show that this average (amortized) cost is bounded by a constant
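A quick illustration of these three steps (my own example, using the standard geometrically resizing array list, which is not part of the notes above): take the cost model to be array writes, and assume the backing array doubles in size whenever it is full. Inserting N items then costs N writes for the items themselves plus the copying work done during resizes:
$$\text{total cost of } N \text{ insertions} \;\leq\; N + (1 + 2 + 4 + \dots) \;<\; N + 2N \;=\; 3N,$$
so the average (amortized) cost per insertion is bounded by the constant 3, even though a single insertion that triggers a resize costs $\Theta(N)$.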

Disjoint Sets

Problem abstraction

  • Basic Problem : Deriving the Disjoint Sets data structure for solving the “Dynamic Connectivity Problem”

    • How a data structure design can evolve from basic to sophisticated
    • How our choice of underlying abstraction can affect asymptotic runtime (using the formal Big-Theta notation) and code complexity
  • Two operations of the Disjoint Sets data structure

    • connect(x, y) : Connects x and y
    • isConnected(x, y) : Returns true if x and y are connected. Connections can be transitive, i.e. they do not need to be direct.
  • Two assumptions to keep things simple

    • Force all items to be integers instead of arbitrary data (which means we discuss Disjoint Sets of integers), e.g. ListOfSetsDS uses a List<Set<Integer>> (a sketch of this approach appears right after the interface below).
      • For instance, if we have N = 7 elements and nothing has been connected yet, our list of sets looks like [{0},{1},{2},{3},{4},{5},{6}]. Then isConnected(5, 6) requires iterating through N-1 sets to find 5, then N sets to find 6. This is the worst case, and the overall runtime is $\Theta(N)$.
    • Declare the number of items in advance; everything is disconnected at the start.

Design an efficient DisjointSets implementation.

  • The number of elements N can be huge
  • The number of method calls M can be huge
  • Calls to the methods may be interspersed (we cannot assume that all connect operations come before all isConnected operations)

The Disjoint Set Interface

public interface DisjointSets {
	/** Connects two items p and q. */
	void connect(int p, int q);
	/** Checks to see if two items are connected. */
	boolean isConnected(int p, int q);
}
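For comparison, here is a minimal sketch of the ListOfSetsDS idea from the assumptions above: each connected component is stored as a Set<Integer> inside a List. The class and helper names are my own, and the linear scan that makes this approach $\Theta(N)$ in the worst case is visible in setOf.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListOfSetsDS implements DisjointSets {
	private List<Set<Integer>> sets;

	/** Start with N singleton sets: {0}, {1}, ..., {N-1}. */
	public ListOfSetsDS(int N) {
		sets = new ArrayList<>();
		for (int i = 0; i < N; i++) {
			Set<Integer> s = new HashSet<>();
			s.add(i);
			sets.add(s);
		}
	}

	/** Linear scan to find the set that contains item. */
	private Set<Integer> setOf(int item) {
		for (Set<Integer> s : sets) {
			if (s.contains(item)) {
				return s;
			}
		}
		throw new IllegalArgumentException("item not found: " + item);
	}

	@Override
	public void connect(int p, int q) {
		Set<Integer> sp = setOf(p);
		Set<Integer> sq = setOf(q);
		if (sp != sq) {                 // merge the two components into one set
			sp.addAll(sq);
			sets.remove(sq);
		}
	}

	@Override
	public boolean isConnected(int p, int q) {
		return setOf(p) == setOf(q);    // same Set object means same component
	}
}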

Naive Approach vs. Connected Components

  • Naive Approach

    • Connecting two things : Record every single connecting line in some data structure
    • Checking Connectedness : Do some sort of iteration over the lines to see if one thing can be reached from the other.
  • Connected Components

    • For each item, its connected component is the set of all items that are connected to that item. We only record which set each item belongs to.
    • Model connectedness in terms of sets
      • How things are connected is not something we need to know
      • Only need to keep track of which connected component each item belongs to
    • {0, 1, 2, 4}, {3, 5}, {6}

Quick Find

Challenge : Pick Data Structure to support tracking of sets.

  • Let’s consider another approach using a single array of integers.

    • The indices of the array represent the elements of our set.
    • The value at the index is the set number it belongs to.
    • e.g. we represent [{0, 1, 2, 4}, {3, 5}, {6}] as the int[] [4, 4, 4, 5, 4, 5, 6]. The array indices (0, …, 6) are the elements; the value id[i] is the set that element i belongs to.
    • The specific set number does not matter as long as all elements in the same set share the same id. So the int array could also be [2, 2, 2, 3, 2, 3, 6].
  • connect(x, y)

    • Recall that we represent [{0, 1, 2, 4}, {3, 5}, {6}] as the int[] [4, 4, 4, 5, 4, 5, 6], so id[2] = 4 and id[3] = 5. After calling connect(2, 3), all elements that had id 4 or 5 should share the same id. The sets become [{0, 1, 2, 3, 4, 5}, {6}] and id becomes [5, 5, 5, 5, 5, 5, 6].
    • We need to iterate through the whole array, so the overall runtime is $\Theta(N)$.
  • isConnected(x, y)

    • To check isConnected(x, y), we simply check whether id[x] == id[y]. Notice that this is a constant-time operation, so the overall runtime is $\Theta(1)$.
public class QuickFindDS implements DisjointSets{
	private int[] id;
	/** Constructor : Theta(N) */
	public QuickFindDS(int N){
		id = new int[N];
		for (int i = 0 ; i < N; i++){
			id[i] = i;
		}
	}

	/** connect : Theta(N) */
	public void connect(int p, int q){
		int pid = id[p];
		int qid = id[q];
		for(int i = 0; i < id.length ; i++){
			if (id[i]==pid){
				id[i] = qid;
			}
		}
	}
	
	/** isConnected : Theta(1) */
	public boolean isConnected(int p, int q){
		return (id[p]==id[q]);
	}
}
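A small usage sketch (my own; the class name and values are only illustrative) showing both operations of the QuickFindDS class above:

public class QuickFindDemo {
	public static void main(String[] args) {
		DisjointSets ds = new QuickFindDS(7);     // seven items 0..6, all disconnected
		ds.connect(0, 1);
		ds.connect(1, 2);
		System.out.println(ds.isConnected(0, 2)); // true: connected transitively through 1
		System.out.println(ds.isConnected(0, 3)); // false: different components
	}
}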

Quick Union

Basic Idea : This approach allows us to imagine each of our sets as a tree. Instead of an id, we assign each item the index of its parent. If an item has no parent, then it is a ‘root’ and we assign it a negative value.
So we represent [{0, 1, 2, 4}, {3, 5}, {6}] as the int[] parent = [-1, 0, 1, -1, 0, 3, -1]. Now we represent the sets using only a single array.
For this method, we can define a helper function find(int item), which returns the root of the tree containing item, e.g. find(5) == 3 and find(2) == 0.

  • connect(x, y)

    • To connect two items, we find the set that each item belongs to ( i.e. find the roots of their respective trees) , and make one the child of the other
    • eg : connect(5, 2)
      1. find(5) -> 3
      2. find(2) -> 0
      3. Set find(5)'s value to find(2) , that is parent[3] = 0
      • Now the element 3 points to the element 0 , combining the two trees/sets into one.
  • isConnected(x, y)

    • If two elements are part of the same set, then they will be in the same tree. So for isConnected(x, y) , we simply check if find(x) == find(y)
  • Performance / defect

    • There is a potential performance issue with QuickUnion : the tree can become very tall. In that case, finding the root of an item (find(item)) becomes very expensive.
    • In the worst case, we have to traverse all the items to get to the root, which is a $\Theta(N)$ runtime. Since both connect and isConnected call find(item), the runtime of both is upper bounded by $O(N)$ (a small demonstration of this worst case follows the code below).
public class QuickUnionDS implements DisjointSets{
	private int[] parent;
	
	/** constructor : Theta(N)*/
	public QuickUnionDS(int num){
		parent = new int[num];
		for(int i = 0; i < num ; i++){
			parent[i] = -1;
		}
	}
	
	/** Helper function -- find: O(N); the worst case is Theta(N). */
	private int find(int p){
		int r = p;
		while(parent[r] >= 0){
			r = parent[r];
		}
		return r;
	}

	/**connect : call the find method so the runtime is O(N)*/
	@Override
	public void connect(int p, int q){
		int i = find(p);
		int j = find(q);
		parent[i] = j;
	}
	
	
	/**isConnected : call the find method so the runtime is O(N) */
	@Override
	public boolean isConnected(int p , int q){
		return find(p) == find(q);
	}
}
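The worst case mentioned above can be produced directly with the QuickUnionDS class: each connect(i, i + 1) call attaches the root of i under the root of i + 1, building a spindly chain. A small demonstration sketch (my own):

public class QuickUnionWorstCase {
	public static void main(String[] args) {
		int N = 10;
		QuickUnionDS ds = new QuickUnionDS(N);
		// Builds the chain 0 -> 1 -> 2 -> ... -> 9.
		for (int i = 0; i < N - 1; i++) {
			ds.connect(i, i + 1);
		}
		// find(0), called inside isConnected, now walks N - 1 parent links: Theta(N).
		System.out.println(ds.isConnected(0, N - 1)); // true, but slow for large N
	}
}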

Weighted Quick Union

Improving on Quick Union relies on a key insight : whenever we call find(int item), we have to climb to the root of a tree. Thus, the shorter the tree, the faster find runs.

  • New rule

    • Whenever we call connect, we always link the root of the smaller tree to the root of the larger tree (a sketch of this rule in code appears after this list).
    • We need to reason about the maximum height of the tree; the worst-case height is $\Theta(\log N)$.
  • Maximum height : $\log N$

    • N is the number of elements in our Disjoint Sets
    • The runtimes of connect and isConnected are bounded by $O(\log N)$
    • Why $\log N$?
      • Imagine any element $x$ in tree $T1$. The depth of $x$ increases by 1 only when tree $T1$ is placed below another tree $T2$.
      • When that happens, the size of the resulting tree is at least double the size of $T1$, because $size(T2) \geq size(T1)$.
      • The tree containing $x$ can double at most $\log_2 N$ times until we have reached a total of $N$ items ($2^{\log_2 N} = N$).
      • So the tree can double at most $\log_2 N$ times, and each doubling adds at most one level to $x$'s depth $\rightarrow$ maximum depth $\log_2 N$
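These notes do not include code for the weighted version, so below is a minimal sketch of how the rule might be implemented, assuming a size[] array alongside parent[] (the class and field names are my own; another common variant stores the negated size in the root's parent entry instead of a separate array):

public class WeightedQuickUnionDS implements DisjointSets {
	private int[] parent;   // parent[i] is the parent of i; -1 means i is a root
	private int[] size;     // size[r] is the number of items in the tree rooted at r

	public WeightedQuickUnionDS(int N) {
		parent = new int[N];
		size = new int[N];
		for (int i = 0; i < N; i++) {
			parent[i] = -1;
			size[i] = 1;
		}
	}

	/** Climb parent links until we reach a root; O(log N) thanks to weighting. */
	private int find(int p) {
		int r = p;
		while (parent[r] >= 0) {
			r = parent[r];
		}
		return r;
	}

	/** Link the root of the smaller tree below the root of the larger tree. */
	@Override
	public void connect(int p, int q) {
		int i = find(p);
		int j = find(q);
		if (i == j) {
			return;                  // already in the same set
		}
		if (size[i] < size[j]) {     // the smaller tree goes underneath
			parent[i] = j;
			size[j] += size[i];
		} else {
			parent[j] = i;
			size[i] += size[j];
		}
	}

	@Override
	public boolean isConnected(int p, int q) {
		return find(p) == find(q);
	}
}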

Path Compression

  • Performing M operations on a DisjointSets object with N elements :
    • For the naive implementation, the runtime is $O(MN)$
    • For our best implementation so far (WeightedQuickUnion), the runtime is $O(N + M\log N)$
  • Path compression results in union/connected operations that are very close to amortized constant time (amortized constant means constant on average)
    • M operations on N nodes is $O(N + M \lg^* N)$
    • Clever idea : when we do isConnected(x, y), tie all nodes seen to the root (that is, make x, y, and their ancestors on the path point directly to the root); a sketch of such a find appears after this list
    • A tighter bound: $O(N + M\alpha(N))$, where $\alpha$ is the inverse Ackermann function
    • The inverse Ackermann function is less than 5 for all practical inputs
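A minimal sketch (my own) of how find could perform path compression, written as a drop-in replacement for the find method in the parent[]-based classes above; this is the two-pass variant, and a recursive version is equally common:

	/** Find the root of p, then point every node on the path directly at that root. */
	private int find(int p) {
		// First pass: locate the root.
		int root = p;
		while (parent[root] >= 0) {
			root = parent[root];
		}
		// Second pass: compress the path by re-pointing each visited node at the root.
		int cur = p;
		while (cur != root) {
			int next = parent[cur];
			parent[cur] = root;
			cur = next;
		}
		return root;
	}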

Summary

  • Method Summary
    N is the number of elements in Disjoint Sets
| Implementation            | Constructor | connect        | isConnected    |
| ------------------------- | ----------- | -------------- | -------------- |
| ListOfSets                | $\Theta(N)$ | $O(N)$         | $O(N)$         |
| QuickFind                 | $\Theta(N)$ | $\Theta(N)$    | $\Theta(1)$    |
| QuickUnion                | $\Theta(N)$ | $O(N)$         | $O(N)$         |
| Weighted QuickUnion       | $\Theta(N)$ | $O(\log N)$    | $O(\log N)$    |
| WQU with path compression | $\Theta(N)$ | $O(\alpha(N))$ | $O(\alpha(N))$ |
  • A summary of Our Iterative Design Process

    • Represent sets as connected components (do not track individual connections)
      • ListOfSetsDS : Store connected components as a List of Sets
      • QuickFindDS : Store connected components as set ids
      • QuickUnionDS : Store connected components as parent ids
        • WeightedQuickUnionDS : Also track the size of each set, and use the sizes to decide which root becomes the new tree root
          • WeightedQuickUnionWithPathCompressionDS : On calls to connect and isConnected, set the parent id to the root for all items seen
  • Performance Summary

    • Runtimes are given assuming:
      • we have a DisjointSets object of size N
      • we perform M operations, where an operation is a call to either connect or isConnected
| Implementation                          | Runtime              |
| --------------------------------------- | -------------------- |
| ListOfSetsDS                            | $O(NM)$              |
| QuickFindDS                             | $\Theta(NM)$         |
| QuickUnionDS                            | $O(NM)$              |
| WeightedQuickUnionDS                    | $O(N + M\log N)$     |
| WeightedQuickUnionWithPathCompressionDS | $O(N + M\alpha(N))$  |