Randomized Algorithms: Median Finding

清幽小路

Category: Study Notes. Tags: algorithms

Article link: https://blog.csdn.net/weixin_43192983/article/details/108099580

More Divide-and-Conquer

In General Divide-and-Conquer:
$T (n) = q T (n / p) + O (n)$, where $q$ is the number of recursive calls made on each problem and $p$ is the factor by which the problem size shrinks at each call.
From the recursion tree, the number of levels is $\log_p n$.
Therefore the total time is
$T(n) \leq \sum_{k=0}^{\log_p n} q^k \cdot c \cdot \frac{n}{p^k} = cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k$. Here $q^k$ is the number of subproblems at level $k$, and $c\frac{n}{p^k}$ is the time spent on each subproblem at level $k$.

Case 1:
$\quad$ $q = 1, p = 2$ (one recursive call on half the input, the same recursion shape as binary search)
$T(n)\leq cn \sum_{k=0}^{\infty} \frac{1}{2^k} = 2cn = O(n)$ $\quad$ In general: if $q < p$, then $T (n) = O (n)$.

Case 2:
$\quad$ $q = 2, p = 2$ (merge sort)
$T(n) \leq cn \sum_{k=0}^{\log_2 n} 1^k = cn(\log_2 n + 1) = O(n \log n)$ $\quad$ In general: if $p = q$, then $T(n) = O(n \log n)$.

Case 3:
$\quad$ $q > p$ (multiplication)
$T(n) \leq cn \sum_{k=0}^{\log_p n} \Big(\frac{q}{p}\Big)^k \leq \frac{q/p}{q/p-1}\cdot cn \cdot \Big( \frac{q}{p} \Big)^{\log_p n} = \frac{q/p}{q/p-1}\cdot c \cdot q^{\log_p n} = O(q^{\log_p n})=O(p^{\log_p q \cdot \log_p n})=O(n^{\log_p q})$
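
The three cases can be checked numerically by summing the recursion tree level by level. The following is a minimal Python sketch; the helper name `recursion_tree_cost` and the default $c = 1$ are illustrative assumptions, not part of the original notes.

```python
def recursion_tree_cost(n, q, p, c=1.0):
    """Sum q^k * c * n / p^k over the levels k = 0, 1, ... of the recursion tree.

    Hypothetical helper: it stops once the subproblem size drops below 1,
    which gives log_p(n) + 1 levels when n is a power of p.
    """
    total = 0.0
    subproblems = 1      # q^k, the number of subproblems at the current level
    size = float(n)      # n / p^k, the size of each subproblem at this level
    while size >= 1:
        total += subproblems * c * size
        subproblems *= q
        size /= p
    return total


n = 1024
print(recursion_tree_cost(n, q=1, p=2))  # q < p: stays below 2 * c * n
print(recursion_tree_cost(n, q=2, p=2))  # q = p: c * n * (log2(n) + 1)
print(recursion_tree_cost(n, q=3, p=2))  # q > p: within a constant of n ** log2(3)
```

For $q = 3, p = 2$ the constant in front of $n^{\log_2 3}$ is bounded by $\frac{q/p}{q/p-1} = 3$, matching the bound in Case 3.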

The Problem - Finding the Median

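Suppose we are given a set of $n$ numbers $S=\{a_1, a_2, ..., a_n\}$. The median is the number that would be in the middle position if we were to sort them. When $n$ is even, however, there is no single middle position, so we fix a convention.
Definition: The median of $S=\{a_1,a_2,...,a_n\}$ is equal to the $k^{th}$ largest element in $S$, where $k$ is given below. Throughout, "$k^{th}$ largest" means the element at position $k$ when $S$ is sorted in increasing order, i.e. the element with exactly $k-1$ elements smaller than it.

$n$ odd: $k = (n + 1) / 2$
$n$ even: $k = n / 2$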

Sorting the numbers first would take $O(n \log n)$ time.
Here we show how a randomized approach based on divide-and-conquer finds the median in expected $O (n)$ time.
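
For reference, the sorting baseline can be sketched in a few lines of Python. The names `median_rank` and `median_by_sorting` are hypothetical, and the sketch assumes position $k$ is counted in increasing order.

```python
def median_rank(n):
    """Rank k of the median: (n + 1) / 2 for odd n, n / 2 for even n."""
    return (n + 1) // 2 if n % 2 == 1 else n // 2


def median_by_sorting(S):
    """O(n log n) baseline: sort, then take the element at 1-indexed position k."""
    ordered = sorted(S)
    return ordered[median_rank(len(ordered)) - 1]


print(median_by_sorting([5, 1, 4]))     # odd n
print(median_by_sorting([5, 1, 4, 2]))  # even n: the lower of the two middle elements
```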

Design the Algorithm

A Simple Algorithm Based on Splitters

We set median-finding aside for a moment and consider the more general selection problem: Given a set of $n$ numbers $S$ and a number $k$ between $1$ and $n$, consider the function $Select(S, k)$ that returns the $k^{th}$ largest element in $S$.
Goal: $Select(S, k)$ runs in expected time $O (n)$.

Algorithm Structure:

Choose an element $a_i \in S$ as the splitter.
Form sets
- $S^- = \{a_j : a_j<a_i\}$
- $S^+=\{a_j: a_j>a_i\}$
We can then determine which of $S^−$ or $S^+$ contains the $k^{th}$ largest element, and iterate only on this one.

$Select(S, k)$
$\quad$ Choose a splitter $a_i \in S$
$\quad$ For each element $a_j$ of $S$
$\quad$ $\quad$ Put $a_j$ in $S^-$ if $a_j<a_i$
$\quad$ $\quad$ Put $a_j$ in $S^+$ if $a_j>a_i$
$\quad$ EndFor
$\quad$ If $|S^-| = k-1$ then
$\quad$ $\quad$ The splitter $a_i$ was in fact the desired answer
$\quad$ Else If $|S^-| \geq k$ then
$\quad$ $\quad$ The $k^{th}$ largest element lies in $S^-$
$\quad$ $\quad$ Recursively call $Select(S^-, k)$
$\quad$ Else suppose $|S^-| = l < k-1$
$\quad$ $\quad$ The $k^{th}$ largest element lies in $S^+$
$\quad$ $\quad$ Recursively call $Select(S^+, k-1-l)$
$\quad$ EndIf

Also, observe that if $|S| = 1$, then we must have $k = 1$, and indeed the single element in $S$ will be returned by the algorithm.

Theorem (13.17): Regardless of how the splitter is chosen, the algorithm above returns the $k^{th}$ largest element of $S$.
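
The splitter-based procedure can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: it assumes the elements of $S$ are distinct, replaces the recursion with an equivalent loop, and takes a hypothetical `choose_splitter` parameter so that different splitter rules can be tried.

```python
import random


def select(S, k, choose_splitter=random.choice):
    """Return the element of S that has exactly k - 1 smaller elements.

    Assumes 1 <= k <= len(S) and distinct elements. `choose_splitter` is any
    function that picks one element from a non-empty list.
    """
    S = list(S)
    while True:
        a_i = choose_splitter(S)
        S_minus = [a for a in S if a < a_i]
        S_plus = [a for a in S if a > a_i]
        l = len(S_minus)
        if l == k - 1:
            return a_i                    # the splitter is the answer
        elif l >= k:
            S = S_minus                   # answer lies in S^-, same rank k
        else:
            S, k = S_plus, k - 1 - l      # answer lies in S^+, rank shifts


data = [7, 2, 9, 4, 1]
print(select(data, 3))                       # random splitter
print(select(data, 3, choose_splitter=min))  # deliberately bad splitter, same answer
```

Passing `min` or `max` as the splitter rule still returns the correct element, which illustrates the theorem; only the running time degrades.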

Choosing a Good Splitter

Essentially, it’s important that the splitter significantly reduce the size of the set being considered, so that we don’t keep making passes through large sets of numbers many times. So a good choice of splitter should produce sets $S^−$ and $S^+$ that are approximately equal in size.

If the median itself were the splitter, then $T(n)\leq T(n/2) +cn = O(n)$. But the median is exactly what we are trying to find. However, we can show that any well-centered element works as a good splitter.

Well-Centered Splitter: Choose a splitter $a_i$ such that at least $ε n$ elements are larger than $a_i$ and at least $ε n$ elements are smaller than $a_i$, for some fixed constant $ε > 0$.

With such a splitter, the set passed to the recursive call has size at most $(1 - ε)n$, so
$T(n)\leq T\big((1-ε)n\big) + cn$. If we unroll the recurrence for any $ε > 0$, we get
$T(n) \leq cn + (1-ε)cn + (1-ε)^2 cn + ... = \Big[1+(1-ε)+(1-ε)^2+...\Big] \cdot cn \leq \frac{1}{ε} cn$

Analyzing the Algorithm

Definition: The algorithm is in phase $j$ when the size of the set under consideration is at most $n\big(\frac{3}{4} \big)^j$ but greater than $n\big(\frac{3}{4} \big)^{j+1}$.

In a given iteration of the algorithm, we say that an element of the set under consideration is central if

at least a quarter of the elements are smaller than it
at least a quarter of the elements are larger than it.

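If the splitter is central, at least a quarter of the elements are discarded. In terms of the general recurrence:
Number of subproblems: $q = 1$
Maximum size of a subproblem: $\frac{3n}{4} = \frac{n}{p}$, thus $p=\frac{4}{3}$
$\therefore T(n) \leq T(3n/4)+O(n)$, where $3 n / 4$ is the maximum size of the recursive call and $O (n)$ is the expected time spent before the recursive call is made.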

In a single iteration, the probability that our random choice of splitter produces a central element is $\frac{1}{2}$, since half of the elements in the set are central.
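
The value $\frac{1}{2}$ can be checked by counting central elements directly. The sketch below is illustrative only: `central_fraction` is a hypothetical helper that assumes distinct elements. The fraction is exactly $\frac{1}{2}$ when $n$ is a multiple of $4$ and close to it otherwise.

```python
def central_fraction(S):
    """Fraction of elements of S that are central.

    An element is central if at least a quarter of the elements are smaller
    than it and at least a quarter are larger. Assumes distinct elements.
    """
    ordered = sorted(S)
    n = len(ordered)
    central = 0
    for idx in range(n):
        smaller, larger = idx, n - 1 - idx
        if 4 * smaller >= n and 4 * larger >= n:
            central += 1
    return central / n


print(central_fraction(range(100)))  # n divisible by 4
print(central_fraction(range(101)))  # n not divisible by 4
```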

Updated algorithm:

$QuickFind(S, k)$
$\quad$ Let $n = |S|$; if $n = 1$, return the single element of $S$
$\quad$ While $i \notin [n/4, 3n/4]$ $\quad$ [ $i$ is not yet set on entry, so the body runs at least once]
$\quad$ $\quad$ Select $x \in S$ uniformly at random
$\quad$ $\quad$ $S^-=\{y \in S: y<x\}, S^+ =\{y \in S: y>x\}$
$\quad$ $\quad$ $i=|S^-| + 1$ $\quad$ $\quad$ [ $x$ 's position]
$\quad$ EndWhile
$\quad$ If $i = k$ $\quad$ Return $x$
$\quad$ If $i > k$ $\quad$ Return $QuickFind(S^-, k)$
$\quad$ If $i < k$ $\quad$ Return $QuickFind(S^+, k-i)$
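
A runnable Python sketch of QuickFind is given below, under stated assumptions: the elements are distinct, `rng` is a hypothetical parameter for injecting a seeded generator, and a base case for $n = 1$ is added because the interval $[n/4, 3n/4]$ contains no integer position when $n = 1$.

```python
import random


def quick_find(S, k, rng=random):
    """Return the element of S at 1-indexed position k in increasing order.

    Assumes distinct elements and 1 <= k <= len(S). The splitter is redrawn
    until its position i falls in [n/4, 3n/4].
    """
    S = list(S)
    n = len(S)
    if n == 1:
        return S[0]                       # base case added in this sketch
    while True:
        x = rng.choice(S)
        S_minus = [y for y in S if y < x]
        S_plus = [y for y in S if y > x]
        i = len(S_minus) + 1              # position of x in sorted order
        if n / 4 <= i <= 3 * n / 4:
            break
    if i == k:
        return x
    if i > k:
        return quick_find(S_minus, k, rng)
    return quick_find(S_plus, k - i, rng)


data = random.sample(range(1000), 101)
print(quick_find(data, 51) == sorted(data)[50])  # median of 101 elements
```

Because every recursive call receives at most $3n/4$ elements, the recursion depth is logarithmic in $n$.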

Running time of one call, excluding the recursive call: $O(n) \cdot$ (number of iterations of the while loop)

Claim: The expected number of iterations before a central element is found is $2$ ; and so the expected number of iterations spent in phase $j$ , for any $j$ , is at most $2$ .

Proof:
Let $X$ be a random variable equal to the number of iterations until $i \in [n/4, 3n/4]$, and let $p = \frac{1}{2}$ be the probability that a single iteration succeeds.
$\because X$ is a nonnegative integer-valued random variable,
$\therefore E[X]=\sum_{m=0}^\infty P(X>m)$
$E [X] = 1 + P (X > 1) + P (X > 2) + ... = 1+(1-p)+(1-p)^2+... =\frac{1}{1-(1-p)} = \frac{1}{p} = 2$
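
The tail-sum formula can be evaluated exactly with rational arithmetic. This is a small illustrative sketch; `expected_iterations_partial` is a hypothetical helper that uses $P(X > m) = (1-p)^m$ for independent trials with success probability $p$.

```python
from fractions import Fraction


def expected_iterations_partial(p, terms):
    """Partial sum of E[X] = sum over m >= 0 of P(X > m), with P(X > m) = (1 - p)^m."""
    p = Fraction(p)
    return sum((1 - p) ** m for m in range(terms))


# With p = 1/2 the partial sums approach 1/p = 2 from below.
for terms in (1, 2, 5, 10, 50):
    print(terms, float(expected_iterations_partial(Fraction(1, 2), terms)))
```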

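Theorem (13.18): The expected running time of $Select (n, k)$ is $O (n)$. By the claim, the expected time spent in phase $j$ is at most $2 \cdot cn\big(\frac{3}{4}\big)^j$, and summing over all phases gives $\sum_{j} 2cn\big(\frac{3}{4}\big)^j \leq 8cn$.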
