More Divide-and-Conquer
In General Divide-and-Conquer:
$$T(n) = qT(n/p) + O(n)$$
where $q$ is the number of recursive calls per problem and $p$ is the factor by which the problem size is reduced in each call.
From the recursion tree, the number of levels is $\log_p n$. Therefore the total time is

$$T(n) \leq \sum_{k=0}^{\log_p n} q^k \cdot c \cdot \frac{n}{p^k} = cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k$$

Here $q^k$ is the number of subproblems at level $k$, and $c\frac{n}{p^k}$ is the time spent on each subproblem at level $k$, so level $k$ costs $q^k \cdot c\frac{n}{p^k}$ in total.
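A minimal sketch of the recursion-tree sum, assuming integer $q \geq 1$, integer $p \geq 2$, and $n$ an exact power of $p$. The helper `recursion_tree_cost` and its parameter `c` are hypothetical names used only for illustration; the function walks the levels $k = 0, \dots, \log_p n$ and adds $q^k \cdot c \cdot n/p^k$ for each one.

```python
def recursion_tree_cost(n, q, p, c=1):
    """Sum over levels k = 0..log_p n of q^k * c * (n / p^k).

    Hypothetical helper; assumes integers q >= 1, p >= 2 and n an exact power of p.
    """
    m = n
    while m > 1 and m % p == 0:
        m //= p
    if n < 1 or m != 1:
        raise ValueError("n must be a power of p")
    total = 0
    size, subproblems = n, 1              # n / p^k and q^k at level k = 0
    while True:
        total += subproblems * c * size   # total work on level k
        if size == 1:                     # leaves reached: k = log_p n
            return total
        size //= p
        subproblems *= q


# q = p = 2: each of the 4 levels of an n = 8 tree costs 8, so the sum is 32.
print(recursion_tree_cost(8, 2, 2))  # 32
```

Integer division is exact here because $n$ is required to be a power of $p$.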
Case 1: $q = 1, p = 2$ (a single recursive call on half the input, e.g. selection when the median is used as the splitter, discussed below)

$$T(n) \leq cn \sum_{k=0}^{\log_2 n} \frac{1}{2^k} \leq cn \sum_{k=0}^{\infty} \frac{1}{2^k} = 2cn = O(n)$$

In general: if $q < p$, then $T(n) = O(n)$, because the geometric series with ratio $q/p < 1$ is bounded by the constant $\frac{1}{1 - q/p}$.
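To see Case 1 concretely, here is a small sketch that evaluates $T(n) = T(n/2) + cn$ exactly under the assumptions $c = 1$, $T(1) = 1$, and $n$ a power of 2 (the function name `t_case1` is illustrative). Unrolling gives $n + n/2 + \dots + 1 = 2n - 1$, which stays below the bound $2cn$.

```python
def t_case1(n):
    """T(n) = T(n/2) + n with T(1) = 1, for n a power of 2 (c = 1 assumed)."""
    if n == 1:
        return 1
    return t_case1(n // 2) + n


# Closed form 2n - 1, always within the bound 2cn.
for m in range(11):
    n = 2 ** m
    assert t_case1(n) == 2 * n - 1 <= 2 * n
```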
Case 2: $q = 2, p = 2$ (merge sort)

$$T(n) \leq cn \sum_{k=0}^{\log_2 n} 1^k = cn(\log_2 n + 1) = O(n \log n)$$

In general: if $p = q$, then $T(n) = O(n \log n)$.
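The same exact evaluation for Case 2, again assuming $c = 1$, $T(1) = 1$ and $n = 2^m$ (illustrative function name `t_case2`). Every level contributes $n$, and there are $\log_2 n + 1$ levels, so $T(n) = n\log_2 n + n$.

```python
def t_case2(n):
    """T(n) = 2T(n/2) + n with T(1) = 1, for n a power of 2 (c = 1 assumed)."""
    if n == 1:
        return 1
    return 2 * t_case2(n // 2) + n


# Closed form n * log2(n) + n, i.e. cn(log2 n + 1) with c = 1.
for m in range(11):
    n = 2 ** m
    assert t_case2(n) == n * m + n
```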
Case 3: $q > p$ (multiplication, e.g. $q = 3, p = 2$ for integer multiplication)

$$T(n) \leq cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k \leq \frac{q/p}{q/p-1}\cdot cn \cdot \Big( \frac{q}{p} \Big)^{\log_p n} = \frac{q/p}{q/p-1}\cdot c \cdot q^{\log_p n} = O(q^{\log_p n})=O(p^{\log_p q \cdot \log_p n})=O(n^{\log_p q})$$

The second equality uses $p^{\log_p n} = n$, so $cn \cdot (q/p)^{\log_p n} = c \cdot q^{\log_p n}$.
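A sketch of Case 3 with $q = 3$, $p = 2$, assuming $c = 1$, $T(1) = 1$ and $n = 2^m$ (illustrative function name `t_case3`). Because $n^{\log_2 3} = 3^m$ exactly when $n = 2^m$, everything stays in integers. The bound from the derivation is $\frac{q/p}{q/p-1} \cdot c \cdot q^{\log_p n} = 3 \cdot 3^m$, and the exact solution turns out to be $3 \cdot 3^m - 2n$.

```python
def t_case3(n):
    """T(n) = 3T(n/2) + n with T(1) = 1, for n a power of 2 (c = 1 assumed)."""
    if n == 1:
        return 1
    return 3 * t_case3(n // 2) + n


for m in range(11):
    n = 2 ** m
    # (q/p)/(q/p - 1) * c * q^(log_p n) with q = 3, p = 2, c = 1; note 3**m == n**log2(3)
    bound = 3 * 3 ** m
    assert t_case3(n) == bound - 2 * n <= bound
```

The ratio $T(n)/n^{\log_2 3}$ approaches the constant $3$, matching $\frac{q/p}{q/p-1} = 3$.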
The Problem - Finding the Median
Suppose we are given a set of $n$ numbers $S = \{a_1, a_2, ..., a_n\}$. The median is the number that would be in the middle position if we were to sort them. However, if $n$ is even, there is no middle position.
Definition: The median of $S = \{a_1, a_2, ..., a_n\}$ is equal to the $k^{th}$ largest element in $S$, where
- $n$ is odd: $k = (n+1)/2$
- $n$ is even: $k = n/2$
If we sort the numbers first, this takes $O(n \log n)$ time. Here we show how a randomized approach based on divide-and-conquer finds the median in expected $O(n)$ time.
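For comparison, the $O(n \log n)$ baseline simply sorts and reads off position $k$, with $k$ chosen as in the definition above. A minimal sketch (the name `median_by_sorting` is illustrative); like the definition, it returns the lower of the two middle elements when $n$ is even.

```python
def median_by_sorting(S):
    """Median as defined above: position k of the sorted order, in O(n log n)."""
    n = len(S)
    if n == 0:
        raise ValueError("median of an empty set is undefined")
    k = (n + 1) // 2 if n % 2 == 1 else n // 2
    return sorted(S)[k - 1]   # position k (1-indexed) is index k - 1
```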
Design the Algorithm
A Simple Algorithm Based on Splitters
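Instead of median-finding, we first consider the more general selection problem: given a set of $n$ numbers $S$ and a number $k$ between $1$ and $n$, consider the function $Select(S, k)$ that returns the $k^{th}$ largest element in $S$. (Here "the $k^{th}$ largest" means the element at position $k$ when $S$ is sorted in increasing order, which is how the algorithm below uses it.)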
Goal: $Select(S, k)$ runs in expected time $O(n)$.
Algorithm Structure:
- Choose an element $a_i \in S$ as the splitter.
- Form sets
  - $S^- = \{a_j : a_j < a_i\}$
  - $S^+ = \{a_j : a_j > a_i\}$
- We can then determine which of $S^-$ or $S^+$ contains the $k^{th}$ largest element, and iterate only on this one.
$Select(S, k)$
$\quad$ Choose a splitter $a_i \in S$
$\quad$ For each element $a_j$ of $S$
$\quad\quad$ Put $a_j$ in $S^-$ if $a_j < a_i$
$\quad\quad$ Put $a_j$ in $S^+$ if $a_j > a_i$
$\quad$ EndFor
$\quad$ If $|S^-| = k-1$ then
$\quad\quad$ The splitter $a_i$ was in fact the desired answer
$\quad$ Else If $|S^-| \geq k$ then
$\quad\quad$ The $k^{th}$ largest element lies in $S^-$
$\quad\quad$ Recursively call $Select(S^-, k)$
$\quad$ Else suppose $|S^-| = l < k-1$
$\quad\quad$ The $k^{th}$ largest element lies in $S^+$
$\quad\quad$ Recursively call $Select(S^+, k-1-l)$
$\quad$ EndIf
Also, observe that if $|S| = 1$, then we must have $k = 1$, and indeed the single element in $S$ will be returned by the algorithm.
Theorem (13.17): Regardless of how the splitter is chosen, the algorithm above returns the $k^{th}$ largest element of $S$.
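A direct Python transcription of $Select(S, k)$, assuming $S$ holds distinct numbers as in the text. The splitter rule is passed in through a hypothetical parameter `choose_splitter`, which makes Theorem 13.17 easy to check: a deterministic "first element" rule and a random rule give the same answers.

```python
import random


def select(S, k, choose_splitter):
    """Return the element at position k (1-indexed) of S in increasing order.

    Assumes distinct numbers. choose_splitter is a hypothetical parameter:
    any function that maps a non-empty list to one of its elements.
    """
    if not 1 <= k <= len(S):
        raise ValueError("k must be between 1 and |S|")
    a_i = choose_splitter(S)
    s_minus = [a_j for a_j in S if a_j < a_i]
    s_plus = [a_j for a_j in S if a_j > a_i]
    ell = len(s_minus)                                 # |S^-| = l in the text
    if ell == k - 1:
        return a_i                                     # the splitter is the answer
    if ell >= k:
        return select(s_minus, k, choose_splitter)     # answer lies in S^-
    return select(s_plus, k - 1 - ell, choose_splitter)  # skip S^- and a_i


data = [9, 2, 7, 4, 5, 1, 8]
rng = random.Random(0)
for k in range(1, len(data) + 1):
    # Same result whichever splitter rule is used (Theorem 13.17).
    assert select(data, k, lambda S: S[0]) == select(data, k, rng.choice) == sorted(data)[k - 1]
```

The correctness does not depend on the splitter, but the running time does: always picking the first element of already-sorted input gives the quadratic worst case, which motivates the next section.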
Choosing a Good Splitter
Essentially, it is important that the splitter significantly reduces the size of the set being considered, so that we don't keep making passes through large sets of numbers many times. So a good choice of splitter should produce sets $S^-$ and $S^+$ that are approximately equal in size.
If the median were used as the splitter, then $T(n) \leq T(n/2) + cn = O(n)$. But the median is exactly what we are trying to find. However, we can show that any well-centered element is a good splitter.
Well-Centered Splitter: Choose a splitter $a_i$ such that at least $\varepsilon n$ elements are larger than $a_i$ and at least $\varepsilon n$ elements are smaller than $a_i$, for any fixed constant $\varepsilon > 0$.
Then the size of the set in the recursive call shrinks by a factor of at least $(1-\varepsilon)$ each time, so

$$T(n)\leq T\big((1-\varepsilon)n\big) + cn$$

If we unroll the recurrence for any $\varepsilon > 0$, we get

$$T(n) \leq cn + (1-\varepsilon)cn + (1-\varepsilon)^2 cn + \dots = \Big[1+(1-\varepsilon)+(1-\varepsilon)^2+\dots\Big] \cdot cn \leq \frac{1}{\varepsilon} cn$$

so $T(n) = O(n)$.
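A quick numeric sketch of the unrolled recurrence, assuming the set shrinks by exactly $(1-\varepsilon)$ per round and that the recursion stops once the size drops below 1 (the helper name `unrolled_cost` is illustrative). The partial sums stay below $cn/\varepsilon$.

```python
def unrolled_cost(n, eps, c=1.0):
    """Add cn + (1-eps)cn + (1-eps)^2 cn + ... while the set size is at least 1."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    total, size = 0.0, float(n)
    while size >= 1:
        total += c * size   # one pass over the current set
        size *= 1 - eps     # well-centered splitter: shrink by (1 - eps)
    return total


for eps in (0.1, 0.25, 0.5):
    assert unrolled_cost(10_000, eps) <= 10_000 / eps
```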
Analyzing the Algorithm
Definition: the algorithm is in phase $j$ when the size of the set under consideration is at most $n\big(\frac{3}{4}\big)^j$ but greater than $n\big(\frac{3}{4}\big)^{j+1}$.
In a given iteration of the algorithm, we say that an element of the set under consideration is central if
- at least a quarter of the elements are smaller than it
- at least a quarter of the elements are larger than it.
Number of subproblems: $q = 1$. Maximum size of a subproblem: $\frac{3n}{4} = \frac{n}{p}$, thus $p = \frac{4}{3}$.

$$\therefore T(n) \leq T(3n/4) + O(n)$$

where $3n/4$ is the maximum size of the recursive call, and $O(n)$ is the expected time until the recursive call. Since $q = 1 < p = 4/3$, the general $q < p$ case gives $O(n)$.
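In a single iteration, the probability that our random choice of splitter produces a central element is $\frac{1}{2}$, since half of the elements are central (those ranked between the lowest quarter and the highest quarter).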
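A small sketch of the two definitions in this section, with hypothetical helper names `is_central` and `phase`. It assumes distinct numbers; `Fraction` keeps the phase boundaries $n(3/4)^j$ exact. For $n$ divisible by 4, exactly half the elements are central, which is where the probability $\frac{1}{2}$ comes from (for other $n$ the fraction is only approximately $\frac{1}{2}$).

```python
from fractions import Fraction


def is_central(x, S):
    """At least a quarter of S is smaller than x and at least a quarter is larger."""
    n = len(S)
    smaller = sum(1 for y in S if y < x)
    larger = sum(1 for y in S if y > x)
    return 4 * smaller >= n and 4 * larger >= n   # integer form of ">= n/4"


def phase(n, size):
    """Phase j such that n(3/4)^(j+1) < size <= n(3/4)^j."""
    if not 1 <= size <= n:
        raise ValueError("need 1 <= size <= n")
    j, upper = 0, Fraction(n)
    while size <= upper * Fraction(3, 4):
        upper *= Fraction(3, 4)
        j += 1
    return j


S = list(range(1, 101))   # 100 distinct numbers
central = [x for x in S if is_central(x, S)]
assert Fraction(len(central), len(S)) == Fraction(1, 2)
```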
Updated algorithm:

$QuickFind(S, k)$
$\quad$ If $|S| = 1$, return its single element
$\quad$ While $i \notin [n/4, 3n/4]$ (where $n = |S|$; the loop body runs at least once)
$\quad\quad$ Select $x \in S$ uniformly at random
$\quad\quad$ $S^-=\{y: y<x\}, S^+ =\{y:y>x\}$
$\quad\quad$ $i=|S^-| + 1$ $\quad$ [$x$'s position]
$\quad$ EndWhile
$\quad$ If $i=k$
$\quad\quad$ Return $x$
$\quad$ If $i>k$
$\quad\quad$ $QuickFind(S^-, k)$
$\quad$ If $i<k$
$\quad\quad$ $QuickFind(S^+, k-i)$
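A runnable sketch of $QuickFind$, assuming distinct numbers. The optional `rng` parameter is an addition for reproducible tests; it defaults to the `random` module. The check $i \in [n/4, 3n/4]$ is written as $n \leq 4i \leq 3n$ to stay in integer arithmetic, and the $|S| = 1$ base case is needed because for $n = 1$ no position lies in $[1/4, 3/4]$.

```python
import random


def quickfind(S, k, rng=random):
    """Element at position k (1-indexed) of the distinct numbers S.

    Draws random splitters until one lands at a position i in [n/4, 3n/4],
    then recurses on one side only.
    """
    n = len(S)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and |S|")
    if n == 1:                    # base case: the only element is the answer
        return S[0]
    while True:
        x = rng.choice(S)         # splitter chosen uniformly at random
        s_minus = [y for y in S if y < x]
        s_plus = [y for y in S if y > x]
        i = len(s_minus) + 1      # x's position in sorted order
        if n <= 4 * i <= 3 * n:   # i in [n/4, 3n/4]
            break
    if i == k:
        return x
    if i > k:
        return quickfind(s_minus, k, rng)
    return quickfind(s_plus, k - i, rng)


data = random.Random(42).sample(range(10_000), 501)
assert quickfind(data, (len(data) + 1) // 2, random.Random(7)) == sorted(data)[250]
```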
Running Time: each iteration of the while loop takes $O(n)$ time (one pass to build $S^-$ and $S^+$), so a call costs $O(n) \cdot$ (# iterations of the while loop).
Claim: The expected number of iterations before a central element is found is $2$; and so the expected number of iterations spent in phase $j$, for any $j$, is at most $2$.
Proof:
Let $X$ be a random variable equal to the number of repeats until $i \in [n/4, 3n/4]$, and let $p = \frac{1}{2}$ here denote the probability that a single random splitter is central (not the size-reduction factor $p$ from the recurrences above).
$\because X$ is a nonnegative integer-valued R.V.
$\therefore E[X]=\sum_{i=0}^\infty P(X>i)$
Since $X \geq 1$, and $X > i$ exactly when the first $i$ choices all fail, $P(X > i) = (1-p)^i$, so

$$E[X]=1+P(X>1)+P(X>2)+\dots = 1+(1-p)+(1-p)^2+\dots = \frac{1}{1-(1-p)} = \frac{1}{p} = 2$$
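A quick simulation of the claim, assuming each splitter choice is an independent trial that succeeds with probability $p = \frac{1}{2}$ (the helper `iterations_until_central` is illustrative). The empirical mean of the number of tries should be close to $1/p = 2$.

```python
import random


def iterations_until_central(rng, p=0.5):
    """Number of independent tries until the first success (probability p each)."""
    count = 1
    while rng.random() >= p:   # rng.random() < p happens with probability p
        count += 1
    return count


rng = random.Random(2024)
trials = 100_000
mean = sum(iterations_until_central(rng) for _ in range(trials)) / trials
assert abs(mean - 2) < 0.05    # E[X] = 1/p = 2
```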
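Theorem (13.18): The expected running time of $Select(n, k)$ is $O(n)$.

In phase $j$ each iteration costs at most $cn\big(\frac{3}{4}\big)^j$, and by the claim the expected number of iterations in phase $j$ is at most $2$, so by linearity of expectation the expected total running time is at most $\sum_j 2cn\big(\frac{3}{4}\big)^j = 2cn\sum_j\big(\frac{3}{4}\big)^j \leq 8cn$.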