Quick sort and proof
- Worst case O(n^2), expected running time O(n lg n); the constant factors are small
- In-place algorithm
- With random sampling, no particular input elicits (evokes, draws out) the worst case
- It is a divide-and-conquer algorithm
- Divide: partition, O(n)
- Conquer: recursively sort the 2 subarrays
- Combine: N/A (in place, no extra step needed)
- As the procedure runs, the array is partitioned into 4 regions delimited by p, i, j, r (this is the loop invariant)
- A[p..i]: <= pivot
- A[i+1..j−1]: > pivot
- A[j..r−1]: not yet partitioned
- A[r]: the pivot itself
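- Proof of the loop invariant
- Initialization: before the first iteration i = p − 1 and j = p, so the "<= pivot" and "> pivot" regions are both empty (sizes 0, 0) and the invariant holds trivially
- Maintenance: each iteration preserves it
- if A[j] > pivot: only j advances, so the "> pivot" region grows by one element
- if A[j] <= pivot: i advances, A[i] and A[j] are exchanged, then j advances, so the "<= pivot" region grows by one element
- Termination: j = r, so every element is in one of the 3 sets (<= pivot, > pivot, the pivot itself)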
PARTITION(A, p, r)
x ← A[r]
i ← p − 1
for j ← p to r − 1
    do if A[j] ≤ x
        then i ← i + 1
            exchange A[i] ↔ A[j]
exchange A[i + 1] ↔ A[r]
return i + 1
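The PARTITION pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, assuming 0-based inclusive indices `p` and `r`; the function names `partition` and `quicksort` and the sample list are illustrative choices, not part of the notes.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around the pivot a[r]; return the pivot's final index."""
    x = a[r]                      # pivot
    i = p - 1                     # a[p..i] holds the elements <= pivot
    for j in range(p, r):         # j runs over p..r-1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two regions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place (assumed wrapper around partition)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)    # conquer: left subarray
        quicksort(a, q + 1, r)    # conquer: right subarray


data = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(data, 0, len(data) - 1)
print(q, data)
quicksort(data)
print(data)
```

No combine step appears because both recursive calls work on the same list in place.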
Q1: Modify PARTITION so that q = ⌊(p + r)/2⌋ when all elements in the array A[p…r] have the same value.
No elegant way; just add a condition: if every element of A[p…r] equals the pivot, return the midpoint ⌊(p + r)/2⌋.
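One possible way to realize that condition is to track, inside the partition loop, whether any element differs from the pivot. The sketch below is only an illustration under that assumption; the name `partition_mid_if_equal` is hypothetical.

```python
def partition_mid_if_equal(a, p, r):
    """Partition a[p..r] around a[r]; return the midpoint if all elements are equal."""
    x = a[r]
    i = p - 1
    all_equal = True              # assumption: detect the all-equal case during the scan
    for j in range(p, r):
        if a[j] != x:
            all_equal = False
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    if all_equal:
        return (p + r) // 2       # any index is a valid split point when all values match
    return i + 1


print(partition_mid_if_equal([5, 5, 5, 5, 5], 0, 4))
```

Returning the midpoint is safe in the all-equal case because every position already satisfies the partition property.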
Q2: How would you modify QUICKSORT to sort into nonincreasing order?
In PARTITION, change the test A[j] <= x to A[j] >= x.
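As a rough illustration of that one-character change, here is a sketch with the comparison flipped. The names `partition_desc` and `quicksort_desc` are hypothetical; everything else mirrors the PARTITION pseudocode.

```python
def partition_desc(a, p, r):
    """Partition a[p..r] so that elements >= pivot come first."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] >= x:             # the only change: >= instead of <=
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort_desc(a, p=0, r=None):
    """Sort a[p..r] in place into nonincreasing order."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition_desc(a, p, r)
        quicksort_desc(a, p, q - 1)
        quicksort_desc(a, q + 1, r)


values = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort_desc(values)
print(values)
```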
Performance analysis
1. Worst-case – unbalanced
$$
\begin{aligned}
T(n) &= T(n-1)+T(0)+\Theta(n) \\
&= T(n-1)+\Theta(n) \\
&= \Theta(n^2)
\end{aligned}
$$
- Same as the worst case of insertion sort
- If the input is already sorted, quicksort still takes $\Theta(n^2)$, while insertion sort takes $\Theta(n)$
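To make the sorted-input case concrete, the sketch below counts element comparisons for both algorithms on an already sorted list. It assumes the last-element pivot from PARTITION; the function names and the size `n = 200` are arbitrary choices for illustration.

```python
def quicksort_count(a, p, r):
    """Sort a[p..r] in place (last-element pivot); return the number of pivot comparisons."""
    if p >= r:
        return 0
    x = a[r]
    i = p - 1
    comparisons = 0
    for j in range(p, r):
        comparisons += 1          # one comparison of a[j] against the pivot
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    q = i + 1
    return comparisons + quicksort_count(a, p, q - 1) + quicksort_count(a, q + 1, r)


def insertion_sort_count(a):
    """Sort a in place with insertion sort; return the number of key comparisons."""
    comparisons = 0
    for k in range(1, len(a)):
        key = a[k]
        j = k - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return comparisons


n = 200
print(quicksort_count(list(range(n)), 0, n - 1), n * (n - 1) // 2)
print(insertion_sort_count(list(range(n))), n - 1)
```

On sorted input every partition puts all remaining elements on one side, so the comparison counts sum to n(n−1)/2, while insertion sort stops after one comparison per element.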
- Proof that this is the worst-case running time (use substitution, assume $T(n) \le cn^2$):
$$
\begin{aligned}
T(n) &= \max_{0\le q\le n-1}\bigl(T(q)+T(n-q-1)\bigr)+\Theta(n) \\
&\le \max_{0\le q\le n-1}\bigl(cq^2+c(n-q-1)^2\bigr)+\Theta(n) \\
&\le c(n-1)^2+\Theta(n) \\
&= cn^2-c(2n-1)+\Theta(n) \\
&\le cn^2
\end{aligned}
$$
The expression $q^2+(n-q-1)^2$ is maximized at an endpoint ($q=0$ or $q=n-1$), where it equals $(n-1)^2$. The last step holds once $c$ is large enough that $c(2n-1)$ dominates the $\Theta(n)$ term.
2. Best-case – balanced
$$
\begin{aligned}
T(n) &\le 2T(n/2)+\Theta(n) \\
&= O(n\lg n)
\end{aligned}
$$
(using master theorem)
3. Constant proportionality
Suppose every split is 9-to-1, which intuitively seems quite unbalanced:
$$
\begin{aligned}
T(n) &\le T(9n/10)+T(n/10)+cn \\
&= O(n\lg n)
\end{aligned}
$$
- Depth is $\log_{10/9} n = \Theta(\lg n)$
- Asymptotically the same as the best case
- Any split of constant proportionality yields a recursion tree of depth $\Theta(\lg n)$, where each level costs O(n)
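The depth claim can be checked numerically with a small sketch. It follows only the larger (9/10) side of each split until the subproblem size drops to 1; the helper name `split_depth` and the sample sizes are assumptions made for illustration.

```python
import math


def split_depth(n, fraction=0.9):
    """Number of levels until a subproblem of size n shrinks to <= 1 along the larger side."""
    depth = 0
    size = float(n)
    while size > 1:
        size *= fraction          # keep the 9/10 side of the split
        depth += 1
    return depth


for n in (10**3, 10**6):
    depth = split_depth(n)
    # depth tracks log base 10/9 of n, a constant multiple of lg n
    print(n, depth, math.log(n, 10 / 9), depth / math.log2(n))
```

The ratio of the depth to lg n stays close to a constant (about 1 / lg(10/9)), which is what the $\Theta(\lg n)$ depth means.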
Average-case Intuition
Assume the best-case and worst-case splits alternate between levels of the tree
- Worst-case split followed by a best-case split (2 levels): n → [0, n−1], then n−1 → [(n−1)/2 − 1, (n−1)/2]
- Cost: $\Theta(n)+\Theta(n-1)=\Theta(n)$
- Best-case split alone (1 level): n → [(n−1)/2, (n−1)/2]
- Cost: $\Theta(n)$
- So the running time when good and bad splits alternate is like the running time for good splits alone: still O(n lg n), but with a slightly larger constant hidden by the O-notation.
Random Sampling
- Because the pivot element is randomly chosen, we expect the split of the input array to be reasonably well balanced on average.
- The quicksort procedure itself is unchanged
- PARTITION first swaps a randomly chosen element into the pivot position:
- i ← RANDOM(p,r)
- exchange A[r] ↔ A[i]
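The two extra lines above fit in front of the usual partition loop. Below is a minimal sketch, assuming Python's `random.randint` (inclusive on both ends) plays the role of RANDOM(p, r); the function names and the optional `rng` parameter are hypothetical additions for testability.

```python
import random


def randomized_partition(a, p, r, rng=random):
    """Swap a random element of a[p..r] into a[r], then partition around it."""
    i = rng.randint(p, r)         # RANDOM(p, r): uniform over p..r inclusive
    a[r], a[i] = a[i], a[r]
    x = a[r]
    k = p - 1
    for j in range(p, r):
        if a[j] <= x:
            k += 1
            a[k], a[j] = a[j], a[k]
    a[k + 1], a[r] = a[r], a[k + 1]
    return k + 1


def randomized_quicksort(a, p=0, r=None, rng=random):
    """Sort a[p..r] in place using randomized_partition."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)


data = list(range(20))            # sorted input is no longer a special bad case
randomized_quicksort(data, rng=random.Random(0))
print(data)
```

Passing a seeded `random.Random` instance only makes runs reproducible; the algorithm is the same with the module-level generator.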
Rigorous proof
We derive an O(n lg n) bound on the expected running time. This upper bound combined with the $\Theta(n\lg n)$ best-case bound yields a $\Theta(n\lg n)$ expected running time.
Let $z_i$ denote the $i$-th smallest element of the array and let $X$ be the total number of comparisons:
$$
\begin{aligned}
X_{ij} &= I\{z_i \text{ is compared to } z_j\} \\
X &= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij} \\
E(X) &= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} E(X_{ij})
\end{aligned}
$$
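With $Z_{ij} = \{z_i, \dots, z_j\}$, and since the two events are mutually exclusive:
$$
\begin{aligned}
\Pr\{z_i \text{ is compared to } z_j\} &= \Pr\{z_i \text{ or } z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
&= \Pr\{z_i \text{ is the first pivot chosen from } Z_{ij}\} + \Pr\{z_j \text{ is the first pivot chosen from } Z_{ij}\}
\end{aligned}
$$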
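As a sanity check on this decomposition, the sketch below works under two assumptions that are not spelled out in the notes above: the elements are distinct, and each of the j − i + 1 elements of Z_ij is equally likely to be the first pivot chosen from it, so each of the two probabilities is 1/(j − i + 1). It compares the resulting double sum with an exact recurrence for the expected number of comparisons when the pivot rank is uniform. Both function names are hypothetical.

```python
from fractions import Fraction


def expected_comparisons_sum(n):
    """Sum of 2/(j - i + 1) over all pairs 1 <= i < j <= n (assumed per-pair probability)."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))


def expected_comparisons_recurrence(n):
    """Exact E[comparisons]: (m - 1) per partition plus the average over uniform pivot ranks."""
    c = [Fraction(0)] * (n + 1)
    for m in range(2, n + 1):
        c[m] = (m - 1) + Fraction(1, m) * sum(c[q] + c[m - 1 - q] for q in range(m))
    return c[n]


for n in (2, 3, 10):
    print(n, expected_comparisons_sum(n), expected_comparisons_recurrence(n))
```

The two quantities agree exactly for the sizes tried, which is consistent with the indicator-variable argument; `Fraction` is used so the comparison is not blurred by floating-point rounding.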