Reservoir Sampling (picking a random element from a stream of indefinite length with equal probability)

Reservoir Sampling
Reservoir sampling is a family of randomized algorithms for choosing k samples at random from a list of n items, where n is either very large or unknown. Typically n is large enough that the list does not fit into main memory; an example is the list of search queries at Google or Facebook.

To keep things simple, suppose we are given a large array (or stream) of numbers and need to write an efficient function that randomly selects k of them, where 1 <= k <= n. Let the input array be stream[].

A simple solution is to create an array reservoir[] of maximum size k and repeatedly select a random item from stream[0..n-1]. If the selected item has not been selected before, put it in reservoir[]. Checking whether an item was already selected requires searching reservoir[], so the time complexity of this algorithm is O(k^2). This can be costly when k is large. It is also not efficient when the input arrives as a stream.
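A minimal Python sketch of this simple solution, for illustration only. The function name naive_sample and the rng parameter are assumptions, not names from the original text, and the sketch tracks selected positions rather than values so that duplicate values in stream[] do not confuse the membership check.

```python
import random

def naive_sample(stream, k, rng=random):
    """Pick k distinct positions of `stream` by repeated random draws."""
    n = len(stream)  # needs the full length up front, so it cannot handle a true stream
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    chosen = []  # indices picked so far; plays the role of reservoir[]
    while len(chosen) < k:
        j = rng.randrange(n)   # uniform index in [0, n-1]
        if j not in chosen:    # linear search through at most k entries
            chosen.append(j)
    return [stream[j] for j in chosen]
```

Because the function calls len(stream) and indexes into it, the whole input has to be available in memory, which is the limitation the streaming approach avoids.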

The problem can be solved in O(n) time, and the solution also works well when the input arrives as a stream. The steps are as follows.

1) Create an array reservoir[0..k-1] and copy the first k items of stream[] into it.
2) Consider the remaining items one by one, from the (k+1)th item to the nth item.
   a) Generate a random number from 0 to i, where i is the 0-based index of the current item in stream[]. Let the generated number be j.
   b) If j is in the range 0 to k-1, replace reservoir[j] with stream[i].

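Init: reservoir[1..k] = the first k items of the stream (1-based indices here)

for (int i = k + 1; i <= N; ++i)
{
    m = random(1, i);               // uniform integer in [1, i]
    if (m <= k)
        reservoir[m] = stream[i];   // replace the m-th reservoir item with the i-th stream item
}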
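The steps and pseudocode above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name reservoir_sample and the rng parameter are assumptions, and it uses 0-based indices as in the numbered steps.

```python
import random
from itertools import islice

def reservoir_sample(stream, k, rng=random):
    """Return k items chosen at random from an iterable of unknown length."""
    it = iter(stream)
    reservoir = list(islice(it, k))   # step 1: copy the first k items
    if len(reservoir) < k:
        raise ValueError("stream has fewer than k items")
    # step 2: i is the 0-based index of the current item, starting at k
    for i, item in enumerate(it, start=k):
        j = rng.randint(0, i)         # uniform integer in [0, i], both ends inclusive
        if j < k:                     # happens with probability k / (i + 1)
            reservoir[j] = item       # overwrite slot j with the current item
    return reservoir
```

The function only ever holds k items in memory and reads the input once, so a call such as reservoir_sample(range(1_000_000), 10) never materializes the full input.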

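Suppose the (i+1)-th element is now being considered. We prove that every element ends up selected with probability k/(i+1).

For the (i+1)-th element, the probability of being selected is clearly k/(i+1): a random number is generated from 1 to i+1, and the element is selected if that number is at most k. For an element among the first i that is already in the reservoir, the probability that it is still there after the (i+1)-th step is (the probability that the (i+1)-th element is selected but does not replace it) + (the probability that the (i+1)-th element is not selected). The former is k/(i+1) * (k-1)/k and the latter is (i+1-k)/(i+1); their sum is i/(i+1). By the induction hypothesis, each of the first i elements was in the reservoir with probability k/i, so after i+1 elements every element is selected with probability k/i * i/(i+1) = k/(i+1).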
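The induction step can be checked with exact rational arithmetic. The sketch below is an illustration only; the function name check_induction is an assumption. It recomputes the two terms of the argument for each step and asserts that they combine to k/(i+1).

```python
from fractions import Fraction

def check_induction(k, n):
    """Verify exactly that after n items each one is kept with probability k/n."""
    p = Fraction(1)  # after the first k items, each is in the reservoir with probability k/k
    for i in range(k, n):  # i items seen so far; item i + 1 arrives
        p_new = Fraction(k, i + 1)  # the new item is selected
        # an old item survives: new item selected but replaces another slot, or not selected
        p_stay = p_new * Fraction(k - 1, k) + (1 - p_new)
        assert p_stay == Fraction(i, i + 1)
        p = p * p_stay              # k/i * i/(i+1)
        assert p == p_new           # old and new items are now equally likely
    return p
```

For example, check_induction(3, 10) returns Fraction(3, 10), matching k/n for k = 3 and n = 10.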

The case k = 1 works the same way. Suppose every element seen so far has been chosen with probability 1/i. Element i+1 is chosen with probability 1/(i+1), and for every element before it, the probability of having been chosen and staying chosen is 1/i * i/(i+1) = 1/(i+1).
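For k = 1 the reservoir is a single variable. A minimal sketch, assuming a hypothetical helper named pick_one that returns None for an empty stream:

```python
import random

def pick_one(stream, rng=random):
    """Return one item chosen at random from an iterable of unknown length."""
    chosen = None
    for i, item in enumerate(stream, start=1):  # i = number of items seen so far
        if rng.randrange(i) == 0:               # true with probability 1/i
            chosen = item                       # the i-th item replaces the current choice
    return chosen
```

The first item is always taken (randrange(1) is always 0), and each later item replaces the current choice with probability 1/i, which is the argument above.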
