Randomized Algorithms: Median Finding

More Divide-and-Conquer

In general, a divide-and-conquer recurrence has the form
$$T(n) = qT(n/p) + O(n),$$
where $q$ is the number of recursive calls each problem makes and $p$ is the factor by which the input size shrinks in each call.
From the recursion tree, the number of levels is $\log_p n$. Hence the total time is
$$T(n) \leq \sum_{k=0}^{\log_p n} q^k \cdot c \cdot \frac{n}{p^k} = cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k,$$
where $q^k$ is the number of subproblems at level $k$ and $c\frac{n}{p^k}$ is the time spent on each subproblem at level $k$.
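To check the recursion-tree sum against the recurrence itself, here is a small Python sketch (the names `t_recursive` and `t_tree_sum` are hypothetical). It assumes $n$ is an exact power of an integer $p \ge 2$, a constant $c$, and the base case $T(1) = c$, so that the leaf level $k = \log_p n$ contributes exactly the last term $q^k \cdot c \cdot n/p^k$ of the sum.

```python
def t_recursive(n: int, q: int, p: int, c: int = 1) -> int:
    """Evaluate T(n) = q*T(n/p) + c*n exactly, with T(1) = c (n must be a power of p)."""
    if n == 1:
        return c
    return q * t_recursive(n // p, q, p, c) + c * n


def t_tree_sum(n: int, q: int, p: int, c: int = 1) -> int:
    """Sum over levels k = 0..log_p n: q^k subproblems, each costing c * n / p^k."""
    total, k, size = 0, 0, n
    while True:
        total += q ** k * c * size  # level k: q^k subproblems of size n / p^k
        if size == 1:               # reached the leaves (k = log_p n)
            return total
        size //= p
        k += 1
```

For example, merge sort's shape ($q = p = 2$) on $n = 8$ gives $8 + 8 + 8 + 8 = 32$ both ways.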

Case 1:
$q = 1$, $p = 2$ (one recursive call on half the input, as in binary search, but with linear work per call)
$$T(n) \leq cn \sum_{k=0}^{\infty} \frac{1}{2^k} = 2cn = O(n)$$
In general: if $q < p$, then $T(n) = O(n)$.

Case 2:
$q = 2$, $p = 2$ (merge sort)
$$T(n) \leq cn \sum_{k=0}^{\log_2 n} 1^k = O(n \log n)$$
In general: if $p = q$, then $T(n) = O(n \log n)$.

Case 3:
$q > p$ (multiplication)
$$T(n) \leq cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k \leq \frac{q/p}{q/p-1}\cdot cn \cdot \Big(\frac{q}{p}\Big)^{\log_p n} = \frac{q/p}{q/p-1}\cdot c \cdot q^{\log_p n} = O(q^{\log_p n}) = O(p^{\log_p q \cdot \log_p n}) = O(n^{\log_p q})$$
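The three cases can be checked numerically with exact fractions. This sketch assumes $n = p^L$ is an exact power of $p$; `levels` and `tree_cost` are hypothetical helper names. For $q = 3$, $p = 2$ the Case 3 constant is $\frac{q/p}{q/p-1} = 3$, and $n^{\log_p q} = q^L$.

```python
from fractions import Fraction


def levels(n: int, p: int) -> int:
    """log_p n, computed with integers (n must be an exact power of p)."""
    L = 0
    while n > 1:
        n //= p
        L += 1
    return L


def tree_cost(n: int, q: int, p: int, c: int = 1) -> Fraction:
    """Recursion-tree total c*n * sum_{k=0}^{log_p n} (q/p)^k, kept exact."""
    r = Fraction(q, p)
    return c * n * sum(r ** k for k in range(levels(n, p) + 1))


for e in (4, 8, 12):
    n = 2 ** e
    print(n,
          tree_cost(n, 1, 2) <= 2 * n,          # Case 1: at most 2cn
          tree_cost(n, 2, 2) == n * (e + 1),    # Case 2: cn(log2 n + 1)
          tree_cost(n, 3, 2) <= 3 * 3 ** e)     # Case 3: at most 3 * n^(log2 3)
```

For $q = 3$, $p = 2$ the sum works out to exactly $3^{L+1} - 2^{L+1}$, just under the Case 3 bound $3 \cdot 3^L$.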

The Problem - Finding the Median

Suppose we are given a set of $n$ numbers $S = \{a_1, a_2, \dots, a_n\}$. The median is the number that would be in the middle position if we were to sort them. But if $n$ is even, there is no single middle position, so we define:
Definition: The median of $S = \{a_1, a_2, \dots, a_n\}$ is the $k$-th smallest element of $S$ (the element in position $k$ when $S$ is sorted in increasing order), where

  • $k = (n+1)/2$ if $n$ is odd
  • $k = n/2$ if $n$ is even

Sorting the numbers first takes $O(n \log n)$ time.
Here we show how a randomized approach based on divide-and-conquer finds the median in $O(n)$ expected time.
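For reference, the $O(n \log n)$ sorting baseline with the rank convention defined above can be sketched as follows (hypothetical helper names; ranks are 1-indexed).

```python
from typing import Sequence


def median_rank(n: int) -> int:
    """Rank k of the median: (n + 1) / 2 for odd n, n / 2 for even n."""
    return (n + 1) // 2 if n % 2 == 1 else n // 2


def median_by_sorting(S: Sequence[int]) -> int:
    """O(n log n) baseline: sort, then take the element at position k (1-indexed)."""
    return sorted(S)[median_rank(len(S)) - 1]
```

For example, `median_by_sorting([4, 1, 3, 2])` returns `2`, the lower of the two middle elements.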

Design the Algorithm

A Simple Algorithm Based on Splitters

Instead of median-finding directly, we consider the more general selection problem: given a set $S$ of $n$ numbers and a number $k$ between $1$ and $n$, let $Select(S, k)$ return the $k$-th smallest element of $S$. The median is then $Select(S, (n+1)/2)$ for odd $n$ or $Select(S, n/2)$ for even $n$.
Goal: $Select(S, k)$ runs in expected time $O(n)$.

Algorithm structure:

  • Choose an element $a_i \in S$ as the splitter.
  • Form the sets
    • $S^- = \{a_j : a_j < a_i\}$
    • $S^+ = \{a_j : a_j > a_i\}$
  • Determine which of $S^-$ or $S^+$ contains the $k$-th smallest element, and recurse only on that set.

Select(S, k)
    Choose a splitter a_i ∈ S
    For each element a_j of S
        Put a_j in S⁻ if a_j < a_i
        Put a_j in S⁺ if a_j > a_i
    EndFor
    If |S⁻| = k − 1 then
        The splitter a_i was in fact the desired answer; return a_i
    Else if |S⁻| ≥ k then
        The k-th smallest element lies in S⁻
        Return Select(S⁻, k)
    Else suppose |S⁻| = l < k − 1
        The k-th smallest element lies in S⁺
        Return Select(S⁺, k − 1 − l)
    EndIf
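A minimal Python sketch of $Select(S, k)$, assuming the elements of $S$ are distinct, as in the set definition above. The `choose` parameter is a hypothetical hook for the splitter rule (uniformly random by default), so different rules can be tried; the recursion is written as a loop, which is equivalent because the recursive call is the last step.

```python
import random
from typing import Callable, Sequence

# Hypothetical splitter hook: given the current set, return one of its elements.
Chooser = Callable[[Sequence[int]], int]


def select(S: Sequence[int], k: int, choose: Chooser = random.choice) -> int:
    """Return the k-th smallest element of S (1 <= k <= |S|), assuming distinct elements."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= |S|")
    S = list(S)
    while True:
        a_i = choose(S)                        # the splitter
        s_minus = [a for a in S if a < a_i]    # S^-
        s_plus = [a for a in S if a > a_i]     # S^+
        smaller = len(s_minus)                 # l in the pseudocode
        if smaller == k - 1:                   # exactly k - 1 elements below a_i
            return a_i
        if smaller >= k:                       # answer lies in S^-
            S = s_minus
        else:                                  # answer lies in S^+; shift the rank
            S, k = s_plus, k - 1 - smaller
```

Every pass removes at least the splitter itself, so the loop terminates whatever `choose` returns, as long as it returns an element of the current set; for instance `select(data, k, min)` is correct but slow.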


Also observe that if $|S| = 1$, then we must have $k = 1$, and indeed the single element in $S$ will be returned by the algorithm.

Theorem (13.17): Regardless of how the splitter is chosen, the algorithm above returns the $k$-th smallest element of $S$.

Choosing a Good Splitter

Essentially, the splitter must significantly reduce the size of the set under consideration, so that we do not repeatedly make passes over large sets of numbers. A good splitter therefore produces sets $S^-$ and $S^+$ of approximately equal size.
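To see why the splitter matters, the sketch below counts how many elements the splitter loop examines on $S = \{1, \dots, n\}$ when searching for rank $k$. The rule names and the sorting "oracle" used to pick the exact median are illustrative assumptions, not part of the algorithm.

```python
def elements_scanned(n: int, k: int, rule: str) -> int:
    """Total elements examined while selecting rank k from {1, ..., n} with a fixed splitter rule.

    rule == "min":    always split on the smallest element (a worst-case splitter)
    rule == "median": split on the current set's lower median, found by sorting
                      (an oracle used only to illustrate the ideal split)
    """
    S = list(range(1, n + 1))
    scanned = 0
    while True:
        scanned += len(S)                      # one pass to build S^- and S^+
        if rule == "min":
            a_i = min(S)
        elif rule == "median":
            a_i = sorted(S)[(len(S) - 1) // 2]
        else:
            raise ValueError(f"unknown rule: {rule}")
        s_minus = [a for a in S if a < a_i]
        s_plus = [a for a in S if a > a_i]
        smaller = len(s_minus)
        if smaller == k - 1:
            return scanned
        if smaller >= k:
            S = s_minus
        else:
            S, k = s_plus, k - 1 - smaller
```

Searching for the largest element ($k = n$) with the minimum as splitter removes one element per pass, for a total of $n + (n-1) + \cdots + 1 = n(n+1)/2$; with the exact median, each set is at most half the previous one, so the total stays below $2n$.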

If the median itself were the splitter, we would have $T(n) \leq T(n/2) + cn = O(n)$. But the median is exactly what we are trying to find. However, we can show that any well-centered element also makes a good splitter.

Well-centered splitter: choose a splitter $a_i$ such that at least $\varepsilon n$ elements are larger than $a_i$ and at least $\varepsilon n$ elements are smaller than $a_i$, for some fixed constant $\varepsilon > 0$.

Then the set passed to each recursive call has at most $(1-\varepsilon)$ times as many elements as the current one, i.e.,
$$T(n) \leq T\big((1-\varepsilon)n\big) + cn.$$
Unrolling this recurrence for any fixed $\varepsilon > 0$ gives
$$T(n) \leq cn + (1-\varepsilon)cn + (1-\varepsilon)^2 cn + \cdots = \Big[1 + (1-\varepsilon) + (1-\varepsilon)^2 + \cdots\Big] \cdot cn \leq \frac{1}{\varepsilon} cn.$$
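A quick numeric check of the unrolled sum, assuming the simple model where each level costs $c$ times the current size and sizes shrink by exactly a factor $(1-\varepsilon)$ until they drop below 1 (the name `unrolled_bound` is hypothetical):

```python
def unrolled_bound(n: float, eps: float, c: float = 1.0) -> float:
    """Sum c*n + c*(1-eps)*n + c*(1-eps)^2*n + ... while the subproblem size is at least 1."""
    total, size = 0.0, float(n)
    while size >= 1:
        total += c * size
        size *= 1 - eps
    return total


for eps in (0.5, 0.25, 0.1):
    print(eps, unrolled_bound(10**6, eps) <= 10**6 / eps)
```

With $\varepsilon = 1/4$ this is the central-splitter recurrence $T(n) \le T(3n/4) + cn$, and the total comes out just under $4cn$.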

Analyzing the Algorithm

Definition: the algorithm is in phase $j$ when the size of the set under consideration is at most $n\big(\frac{3}{4}\big)^j$ but greater than $n\big(\frac{3}{4}\big)^{j+1}$.

In a given iteration of the algorithm, we say that an element of the set under consideration is central if

  • at least a quarter of the elements are smaller than it, and
  • at least a quarter of the elements are larger than it.

With a central splitter, the number of subproblems is $q = 1$, and the largest possible subproblem has size $\frac{3n}{4} = \frac{n}{p}$, so $p = \frac{4}{3}$. Therefore
$$T(n) \leq T(3n/4) + O(n),$$
where $3n/4$ is the maximum size of the recursive call and $O(n)$ is the expected time until the recursive call. Since $q = 1 < p = 4/3$, Case 1 gives $T(n) = O(n)$.

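In each iteration, the probability that a random choice of splitter produces a central element is $\frac{1}{2}$ (up to rounding), since about half of the elements, those ranked between $n/4$ and $3n/4$, are central.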
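The $\frac{1}{2}$ figure can be checked by counting central ranks directly. For a set of $n$ distinct elements, the element of rank $r$ has $r - 1$ smaller and $n - r$ larger elements; this sketch (hypothetical `central_count`) counts the ranks meeting both quarter conditions, using integer arithmetic to avoid rounding issues.

```python
def central_count(n: int) -> int:
    """Number of ranks r in 1..n with at least n/4 elements below (r - 1) and above (n - r)."""
    return sum(1 for r in range(1, n + 1) if 4 * (r - 1) >= n and 4 * (n - r) >= n)


for n in (8, 10, 100, 1000):
    print(n, central_count(n), central_count(n) / n)
```

When $n$ is divisible by 4 exactly half the ranks are central (e.g., 50 of 100); otherwise slightly fewer (e.g., 4 of 10).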

Updated algorithm:


QuickFind(S, k)
    If |S| = 1, return the single element of S
    n = |S|, i = 0
    While i ∉ [n/4, 3n/4]
        Select x ∈ S uniformly at random
        S⁻ = {y ∈ S : y < x}, S⁺ = {y ∈ S : y > x}
        i = |S⁻| + 1        [x's position in sorted order]
    EndWhile
    If i = k    Return x
    If i > k    Return QuickFind(S⁻, k)
    If i < k    Return QuickFind(S⁺, k − i)
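A runnable Python sketch of QuickFind, assuming distinct elements and a 1-indexed $k$. It follows the pseudocode, turns the tail recursion into a loop, and additionally counts the elements scanned so the linear expected running time can be observed empirically. The name `quick_find`, the `rng` parameter, and the work counter are illustrative additions.

```python
import random
from typing import Optional, Sequence, Tuple


def quick_find(S: Sequence[int], k: int,
               rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Return (k-th smallest element of S, total elements scanned); elements assumed distinct."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= |S|")
    if rng is None:
        rng = random.Random()
    S = list(S)
    scanned = 0
    while True:
        n = len(S)
        if n == 1:              # base case: [n/4, 3n/4] contains no integer rank when n = 1
            return S[0], scanned
        while True:             # resample until x's rank i lies in [n/4, 3n/4]
            x = rng.choice(S)
            s_minus = [y for y in S if y < x]
            s_plus = [y for y in S if y > x]
            scanned += n
            i = len(s_minus) + 1        # x's position in sorted order
            if n / 4 <= i <= 3 * n / 4:
                break
        if i == k:
            return x, scanned
        if i > k:
            S = s_minus
        else:
            S, k = s_plus, k - i
```

For example, `quick_find([9, 2, 7, 4, 5], 3)[0]` returns `5`; the second component varies from run to run, but divided by $n$ it stays bounded on average.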


Running time of one call, not counting its recursive call: $O(n) \cdot$ (number of iterations of the while loop).

Claim: The expected number of iterations before a central element is found is $2$; hence the expected number of iterations spent in phase $j$, for any $j$, is at most $2$.

Proof:
Let $X$ be a random variable equal to the number of iterations until $i \in [n/4, 3n/4]$, and let $p = \frac{1}{2}$ be the probability that a single iteration succeeds.
Since $X$ is a nonnegative integer-valued random variable,
$$E[X] = \sum_{j=0}^{\infty} P(X > j).$$
The event $X > j$ means the first $j$ iterations all failed, so $P(X > j) = (1-p)^j$ and
$$E[X] = 1 + P(X > 1) + P(X > 2) + \cdots = 1 + (1-p) + (1-p)^2 + \cdots = \frac{1}{1-(1-p)} = \frac{1}{p} = 2.$$
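The tail-sum formula can be checked both exactly and by simulation. This sketch (hypothetical `tail_sum` and `simulate_mean`) uses $p = \frac{1}{2}$ and treats each iteration as an independent trial that succeeds with probability $p$.

```python
import random
from fractions import Fraction


def tail_sum(p: Fraction, terms: int) -> Fraction:
    """Truncated E[X] = sum_{j=0}^{terms-1} P(X > j), where P(X > j) = (1 - p)^j."""
    return sum((1 - p) ** j for j in range(terms))


def simulate_mean(p: float, samples: int, seed: int = 0) -> float:
    """Average number of independent trials up to and including the first success."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        trials = 1
        while rng.random() >= p:   # this trial failed (probability 1 - p)
            trials += 1
        total += trials
    return total / samples
```

With 10 terms the exact partial sum is $2 - \frac{1}{512}$, and the simulated mean lands close to $2$.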

Theorem (13.18): The expected running time of $Select(S, k)$ is $O(n)$.
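One way to connect the claim to the theorem, assuming each iteration costs at most $c$ times the current set size: in phase $j$ the set has at most $n(3/4)^j$ elements and, by the claim, the expected number of iterations spent there is at most $2$, so by linearity of expectation

$$E[T(n)] \leq \sum_{j \geq 0} 2 \cdot c\,n \Big(\frac{3}{4}\Big)^j = 2cn \cdot \frac{1}{1 - 3/4} = 8cn = O(n).$$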
