Complexity analysis
analyzing an algorithm means
- establishing its correctness
- determining the amount of resources it needs
this lecture focuses on time complexity
- space complexity: the amount of memory the algorithm needs
- time complexity: the amount of time the algorithm needs
measure of performance
- when determining the performance of an algorithm, it is important to have a theory that is independent of implementation details, such as the computer used…
basic operations of an algorithm
- an operation that is fundamental for the performance of a particular algorithm
- the number of basic operations provides an appropriate measure of the work done by the algorithm
- the performance of the algorithm is in principle proportional to the number of basic operations
what is a basic operation?
- corresponds to some particular operation
  - comparing two elements - in sorting
  - traversing an edge - in graph traversal
- can be a whole loop
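As a minimal sketch (my example, not from the slides) of counting a basic operation, here is a maximum-finding scan where the element comparison is the basic operation:

```python
def find_max(items):
    """Return (maximum element, number of element comparisons).
    The element comparison is the basic operation here."""
    comparisons = 0
    best = items[0]
    for x in items[1:]:
        comparisons += 1          # one basic operation per remaining element
        if x > best:
            best = x
    return best, comparisons

# Any list of n elements costs exactly n - 1 comparisons.
print(find_max([3, 1, 4, 1, 5, 9, 2, 6]))   # -> (9, 7)
```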
complexity as a function of the input size
- the input size is also important
- the complexity of an algorithm is usually described as a function f(n) of the input size n
Examples
- Binary search: $f(n) = \log_2 n$ (see slides)
- Merge sort: $f(n) = n \log_2 n$ (see slides)
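As a small illustration (my sketch, not from the slides) of why $f(n) = \log_2 n$ for binary search, the counter below tracks how many times the search halves the interval:

```python
def binary_search(sorted_items, target):
    """Return (index of target or None, number of halving steps)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1                    # one basic operation per halving
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps

# For n = 1024, at most about log2(1024) = 10 halvings are needed.
print(binary_search(list(range(1024)), 777))   # steps stays <= 11
```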
Average and worst-case analysis
- the complexity is presented as a function of the input size because
  - the result is then valid for different input sizes
  - the input size has an influence on the amount of work required by the algorithm
- some algorithms perform a different amount of work for different inputs even if the input size is the same
  -> hence the need for average-case and worst-case complexity analysis
Definitions
worst-case complexity W(n): the maximum number of basic operations performed for any input of size n
average-case complexity A(n): the average number of basic operations performed by the algorithm, where each input I occurs with some probability p(I)
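Written out as a formula (the standard formulation; here t(I) denotes the number of basic operations the algorithm performs on input I, and I ranges over all inputs of size n):

```latex
A(n) = \sum_{I} p(I)\, t(I)
```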
Quicksort
- its worst-case $n^2$ and average-case $n \log n$ complexities differ significantly
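A small experiment (my sketch, not from the slides) that makes the gap visible: quicksort with a first-element pivot performs about $n^2/2$ pivot comparisons on already-sorted input, but only on the order of $n \log n$ on random input:

```python
import random

def quicksort_comparisons(a):
    """Sort a copy of `a` with first-element-pivot quicksort and
    return the number of pivot comparisons (the basic operation)."""
    count = 0
    def qs(lst):
        nonlocal count
        if len(lst) <= 1:
            return lst
        pivot, rest = lst[0], lst[1:]
        count += len(rest)            # each element is compared against the pivot
        left  = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return qs(left) + [pivot] + qs(right)
    qs(list(a))
    return count

n = 500
print(quicksort_comparisons(range(n)))                    # sorted input: n(n-1)/2 = 124750
print(quicksort_comparisons(random.sample(range(n), n)))  # random input: on the order of n*log2(n)
```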
Asymptotic growth rate
relative growth rate of functions
- for sufficiently large values of n, a growing function is well approximated by its leading term
imprecision in analysis
- the number of steps (machine operations) is roughly proportional to the number of basic operations
- algorithms whose complexities (leading terms) differ only by a constant factor are considered to have the same complexity
Algorithms of different complexity
- different constant factors may make one algorithm faster for small inputs
- however, for large values of n, $50n^2$ will perform better than $n^3$
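The crossover point is easy to compute (a one-line check of the claim above):

```latex
% Setting the two leading terms against each other:
50\, n^2 < n^3 \iff 50 < n
% so the n^3 algorithm can only be faster while n < 50;
% for all larger inputs the 50 n^2 algorithm wins.
```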
O(f), Ω(f), and Θ(f)
https://blog.csdn.net/so_geili/article/details/53353593
graphical illustration of them
Example: $O(n^3)$, $\Omega(n^3)$, and $\Theta(n^3)$
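For reference, the standard definitions that the graphical illustration depicts (here $c$ and $n_0$ denote positive constants):

```latex
f(n) \in O(g(n))      \iff \exists\, c, n_0 : 0 \le f(n) \le c \cdot g(n) \text{ for all } n \ge n_0
f(n) \in \Omega(g(n)) \iff \exists\, c, n_0 : f(n) \ge c \cdot g(n) \ge 0 \text{ for all } n \ge n_0
f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))
```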
Example - Θ notations for some algorithms
- merge sort: $\Theta(n \log n)$
  - also in $O(n \log n)$
- quicksort: worst-case $\Theta(n^2)$; average-case $\Theta(n \log n)$
- binary search: worst-case $\Theta(\log n)$
Complexity of problems
the worst-case complexity of the best possible algorithm that can be used to solve the problem
Example: the complexity of the sorting problem is $\Theta(n \log n)$
- there exist sorting algorithms whose worst-case performance is $\Theta(n \log n)$, and (for comparison-based sorting) no algorithm can do asymptotically better
How to show the complexity of problems?
- the complexity of a problem can often be established by theorem proving
- $W_A(n) = F(n)$: the worst-case complexity of the best algorithm A equals a theoretically proven lower bound F(n) on the number of steps any algorithm needs
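As an example of such a theorem, the classic decision-tree argument gives the $\Theta(n \log n)$ bound for sorting claimed above (a standard proof sketch):

```latex
% A comparison-based sort must distinguish all n! possible input orderings,
% so its decision tree has at least n! leaves and hence height at least
W(n) \ge \log_2(n!) = \Theta(n \log n)
% Merge sort attains this bound, so the sorting problem is Theta(n log n).
```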
Complexity classes
a complexity class is a set of problems that have related resource-based complexity
Example: all problems that can be solved by a polynomially bounded algorithm belong to the same complexity class (the class P)
P and NP
P: the class of problems that can be solved by polynomially bounded algorithms
NP
- contains problems that we believe are more difficult to solve than those in P
- they are not known to be solvable by any polynomially bounded algorithm; however, a given answer can be verified (true/false) in polynomial time
Decision problems
a decision problem is a question that has two possible answers: yes and no
- concerns some input
- formally: a mapping from all possible inputs to {yes, no}
- many are representations of complex optimization problems
- decision problems are used to define P and NP
Example: Knapsack problem
Optimization problem: find the most valuable subset of the items whose total weight does not exceed the weight capacity of the knapsack
Decision problem: given a value k, is there a subset of the items that fits in the knapsack and has a total value of at least k?
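A sketch (function and variable names are my own) of why this decision problem is easy to verify: given a candidate subset as a certificate, the answer can be checked in linear time, even though finding such a subset is the hard part:

```python
def verify_knapsack(weights, values, capacity, k, chosen):
    """Polynomial-time verifier: does the certificate `chosen`
    (a list of item indices) fit in the knapsack and reach value >= k?"""
    total_weight = sum(weights[i] for i in chosen)
    total_value  = sum(values[i]  for i in chosen)
    return total_weight <= capacity and total_value >= k

# Items as parallel (weight, value) lists; the certificate [0, 2]
# weighs 3 + 5 = 8 <= 10 and is worth 4 + 7 = 11 >= 10.
weights, values = [3, 6, 5], [4, 5, 7]
print(verify_knapsack(weights, values, capacity=10, k=10, chosen=[0, 2]))  # True
```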
Polynomially bounded
an algorithm is polynomially bounded if its worst-case complexity is bounded by a polynomial function p of its input size
- it terminates after at most p(n) steps
The complexity class P
- consists of all decision problems that are polynomially bounded
- the problems in P are considered "easy", even though some, e.g. those requiring $n^{1000}$ steps, are not solvable in practice
- one reason why problems in P are considered "easy" is that there are (commonly occurring) problems that are much more difficult than any problem in P, e.g. those requiring $2^n$ steps
The complexity class NP
NP: nondeterministic polynomial time
the class of decision problems for which a given solution for a given input can be checked quickly (in polynomial time) to see if it is valid
- there is not necessarily a polynomial-time way to find a solution
- it "only" takes polynomial time to verify that a proposed solution is correct
- one can "prove" in polynomial time that any 'yes' instance is correct
equivalently: the set of decision problems for which there exists a polynomially bounded non-deterministic algorithm (or machine)
a non-deterministic algorithm (or machine)
- always makes the right choices
- does not have to search the solution space the way a deterministic algorithm must in order to find solutions
P=NP?
NP complete
the subset of the problems in NP that contains the hardest problems to solve
- any other problem in NP can be polynomially reduced (transformed/converted) to it
- a routine that solves an NP-complete problem can therefore be used as a subroutine to solve any other problem in NP
- if there is a polynomially bounded algorithm for any NP-complete problem, then there is a polynomially bounded algorithm for every problem in NP
Reducing a problem to another problem
How to show that a problem is NP complete
- show that the problem is in NP (a candidate solution can be verified in polynomial time)
- polynomially reduce a known NP-complete problem to it
Amortized analysis
- a technique for analyzing the running time of repeated operations on a data structure
- used to calculate a worst-case average bound per operation for any sequence of operations
  - not about the cost of a specific operation in the sequence
  - typical pattern: a few slow operations and many fast operations
- gives an upper bound (average per operation) that is valid for any sequence of operations on the data structure
- in amortized analysis it is important to identify a function that describes the cost of the different operations and how they influence the data structure, so that the amortized bound can be shown
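The "function" referred to here is presumably a potential function $\Phi$ on the states of the data structure; the standard relation used to derive amortized bounds is:

```latex
% Amortized cost of the i-th operation = actual cost + change in potential:
\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})
% Summing over a sequence telescopes, so if \Phi(D_n) \ge \Phi(D_0),
% the total actual cost is bounded by the total amortized cost.
```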
Amortized or worst-case analysis?
- worst-case analysis may give a very pessimistic bound; a single operation might have a cost that is (much) worse than the amortized bound
- amortized analysis provides a worst-case bound on the average time per operation, not on the cost of individual operations
- if every individual operation must have a low cost, worst-case analysis is the appropriate tool
Amortized vs average-case analysis
- both concern estimating the average cost over sequences of operations
- differences
  - amortized analysis
    - gives an upper bound on the average cost per operation
    - holds for the running time of any sequence of operations
    - needs no information about the probability of inputs
  - average-case analysis
    - gives an expected cost per operation, taking the probability of different inputs into account
    - uses probabilistic assumptions about the input (the sequence of operations) to calculate the expected running time per operation
    - cannot be used without information about the probability of inputs
Three approaches to amortized analysis
Aggregate analysis
- the simplest approach
- compute an upper bound on the total cost of any sequence of operations, then divide by the number of operations in the sequence
- see the slides for an example; a small sketch follows below
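A small sketch of aggregate analysis (my example; the example in the slides may differ): the total cost of n appends to an array that doubles its capacity when full is at most 3n element writes, so each append costs O(1) amortized.

```python
def doubling_append_cost(n):
    """Total element-write cost of n appends to a dynamic array
    that doubles its capacity when full (aggregate analysis)."""
    total, size, capacity = 0, 0, 1
    for _ in range(n):
        if size == capacity:      # array full: copy all elements to a 2x array
            total += size
            capacity *= 2
        total += 1                # write the new element itself
        size += 1
    return total

n = 1_000_000
print(doubling_append_cost(n) / n)   # ~2.05: bounded by 3, i.e. O(1) amortized per append
```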