【Algorithm】 Reservoir Sampling & Shuffle an Array

Latest recommended article published on 2021-05-19 14:26:20

Firehotest

Category: Algorithm

Article link: https://blog.csdn.net/Firehotest/article/details/105447112

Reservoir Sampling Example Question:

Randomly choose k samples from a list of n items, where n is either very large or unknown. Typically n is large enough that the list doesn't fit into main memory.

Start with the simplest approach:

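// Roughly O(k^2) when k is much smaller than n: every draw scans the picks so far.
// Assumes input[] supports random access and n is known.
int r[k];        // the k samples
int picked[k];   // indices already chosen (tracked by index so duplicate values are fine)
int i = 0;
srand(time(NULL));                       // seed once, not on every iteration
while (i != k) {
  int j = rand() % n;                    // candidate index in [0, n)
  bool seen = false;
  for (int t = 0; t < i; ++t)
    if (picked[t] == j) seen = true;
  if (seen) continue;                    // already chosen: draw again
  picked[i] = j;
  r[i++] = input[j];                     // the original indexed input[n], which is out of range
}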

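How can this be optimized to O(n)?

Use the Reservoir (pronounced roughly "rehz-er-vwahr") Sampling principle:

The following video walks through the case of picking 1 item out of n: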

https://www.youtube.com/watch?v=A1iwzSew5QY

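In short: for the m-th item, the probability that it ends up as the final choice is (1/m) * (m/(m+1)) * ((m+1)/(m+2)) * ... * ((n-1)/n), where n is the total number of items. It is picked with probability 1/m when it arrives, and each later item i fails to replace it with probability (i-1)/i. The product telescopes, so the final probability is 1/n.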
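The 1-of-n rule above can be sketched as follows. This is a minimal illustration, not code from the video: `pickOne` is a hypothetical helper name, the stream is modeled as a `std::vector<int>` for simplicity, and `<random>` is used in place of `rand()`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns one element of `stream`, each with
// probability 1/n, while reading the elements one at a time.
template <typename Rng>
int pickOne(const std::vector<int>& stream, Rng& rng) {
    assert(!stream.empty());
    int chosen = stream[0];  // the 1st element is kept with probability 1
    for (std::size_t m = 2; m <= stream.size(); ++m) {
        // The m-th element replaces the current choice with probability 1/m.
        std::uniform_int_distribution<std::size_t> dist(0, m - 1);
        if (dist(rng) == 0) {
            chosen = stream[m - 1];
        }
    }
    return chosen;
}
```

Calling `pickOne(data, rng)` with a `std::mt19937 rng` returns a single element. Only the current choice and the running count m are kept, so n does not need to be known in advance.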

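For the original problem of choosing k items out of n, first copy the first k items into the reservoir. Then, for each later item at index i = k, k + 1, k + 2, ..., n - 1, generate a random number rand and compute j = rand % (i + 1). If j < k, replace reservoir[j] with the new item. How do we prove that every item ends up in the reservoir with probability k / n?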

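Start with the n-th item (index n - 1): it is selected with probability k / n (from the random number generated at that step). Now look at the (n - 1)-th item: it is selected with probability k / (n - 1) at its own step, and it survives the next step with probability (n - 1) / n (the next random number does not land on its slot), so its overall probability is (k / (n - 1)) * ((n - 1) / n) = k / n.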

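Now consider the k items that were copied in at the start. Each one stays in the final reservoir with probability (k / (k + 1)) * ((k + 1) / (k + 2)) * ... * ((n - 1) / n), which is the probability that none of the random numbers generated at steps k + 1, k + 2, ..., n lands on its slot. This is again k / n.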
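The k / n claim can also be checked exactly for small n by enumerating every equally likely sequence of random indices. The sketch below is only a verification aid under that assumption; `enumerate` and `inclusionCounts` are hypothetical names, and items are represented by their indices 0..n-1.

```cpp
#include <cassert>
#include <vector>

// Tries every possible j at step i and, at the end, records which items
// are in the reservoir. All paths at a given depth are equally likely.
void enumerate(int i, int n, int k, std::vector<int>& reservoir,
               std::vector<long long>& counts, long long& total) {
    if (i == n) {
        ++total;
        for (int item : reservoir) ++counts[item];
        return;
    }
    for (int j = 0; j <= i; ++j) {
        if (j < k) {
            int saved = reservoir[j];
            reservoir[j] = i;  // item i replaces slot j
            enumerate(i + 1, n, k, reservoir, counts, total);
            reservoir[j] = saved;  // undo before trying the next j
        } else {
            enumerate(i + 1, n, k, reservoir, counts, total);  // item i is dropped
        }
    }
}

// Returns, for each item, the number of outcomes in which it is selected.
std::vector<long long> inclusionCounts(int n, int k, long long& total) {
    assert(0 < k && k <= n);
    std::vector<int> reservoir(k);
    for (int i = 0; i < k; ++i) reservoir[i] = i;  // first k items copied in
    std::vector<long long> counts(n, 0);
    total = 0;
    enumerate(k, n, k, reservoir, counts, total);
    return counts;
}
```

For every item, `counts[item] * n` should equal `total * k`, which is the statement that the inclusion probability is k / n.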

void selectKItems(int stream[], int n, int k)  
{  
    int i; // index for elements in stream[]  
  
    // reservoir[] is the output array. Initialize  
    // it with first k elements from stream[]  
    int reservoir[k];  
    for (i = 0; i < k; i++)  
        reservoir[i] = stream[i];  
  
    // Use a different seed value so that we don't get  
    // same result each time we run this program  
    srand(time(NULL));  
  
    // Iterate from the (k+1)th element to nth element  
    for (; i < n; i++)  
    {  
        // Pick a random index from 0 to i.  
        int j = rand() % (i + 1);  
  
        // If the randomly picked index is smaller than k,  
        // then replace the element present at the index  
        // with new element from stream  
        if (j < k)  
            reservoir[j] = stream[i];
    }  
  
    cout << "Following are k randomly selected items \n";  
    printArray(reservoir, k);  
}
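One caveat with `rand() % (i + 1)`: `RAND_MAX` is only guaranteed to be at least 32767, so for a very large stream the index may never reach the upper part of the range, and the modulo also introduces a slight bias. A hedged alternative sketch using `<random>` is shown below; `reservoirSample` is a hypothetical name, and the stream is again modeled as a `std::vector<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical variant of selectKItems that returns the reservoir and
// draws indices with std::uniform_int_distribution instead of rand().
template <typename Rng>
std::vector<int> reservoirSample(const std::vector<int>& stream,
                                 std::size_t k, Rng& rng) {
    assert(k <= stream.size());
    // Start with the first k elements.
    std::vector<int> reservoir(stream.begin(),
                               stream.begin() + static_cast<std::ptrdiff_t>(k));
    for (std::size_t i = k; i < stream.size(); ++i) {
        // Uniform index in [0, i]; the element is kept only if it is below k.
        std::uniform_int_distribution<std::size_t> dist(0, i);
        std::size_t j = dist(rng);
        if (j < k) {
            reservoir[j] = stream[i];
        }
    }
    return reservoir;
}
```

A typical call would be `reservoirSample(stream, 3, rng)` with a `std::mt19937 rng` seeded from `std::random_device`.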

LeetCode example problems:

398. Random Pick Index

Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.

Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.

Example:

int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);

// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);

class Solution {
public:
    Solution(vector<int>& nums) {
        srand(time(NULL));
        num_size_ = nums.size();
        store_= nums;
    }
    
    int pick(int target) {
        int res = -1;
        int found_num = 0;
        for (int i = 0; i < num_size_; ++i) {
            if (store_.at(i) != target) {
                continue;
            }
            
            ++found_num;
            if (rand() % found_num == 0) {
                res = i;
            }
        }
        return res;
    }
                
private:
    size_t num_size_ = 0;
    vector<int> store_;
};

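Explanation:

The first matching index is selected with probability 1, and it then survives to the end with probability 1 * (1/2) * (2/3) * ... * ((m - 1)/m) = 1/m, where m is the number of occurrences of target in the array.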

382. Linked List Random Node

Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.

Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?

Example:

// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);

// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
solution.getRandom();

class Solution {
public:
    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    Solution(ListNode* head) {
        head_ = head;
        srand(time(NULL));
    }
    
    /** Returns a random node's value. */
    int getRandom() {
        int index = 1;
        ListNode* cur = head_;
        int res = cur->val;
        cur = cur->next;
        while (cur != nullptr) {
            ++index;
            if (rand() % index == 0) {
                res = cur->val;
            }
            cur = cur->next;
        }
        return res;
    }
    
private: 
    ListNode* head_ = nullptr;
};

The reasoning is much the same as above.

Shuffle a given array using Fisher-Yates shuffle algorithm

Given an array, write a program to generate a random permutation of the array elements. This question is also asked as "shuffle a deck of cards" or "randomize a given array". Here shuffle means that every permutation of the array elements should be equally likely.

The brute-force approach:

Let the given array be arr[]. A simple solution is to create an auxiliary array temp[] which is initially a copy of arr[]. Randomly select an element from temp[], copy the randomly selected element to arr[0] and remove the selected element from temp[]. Repeat the same process n times and keep copying elements to arr[1], arr[2], … . The time complexity of this solution will be O(n^2).
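The auxiliary-array approach described above might look like the sketch below. `naiveShuffle` is a hypothetical name, and `std::vector` with `<random>` is an assumption made for the illustration; the O(n) cost of each `erase` is what makes the whole routine O(n^2).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical O(n^2) shuffle: repeatedly move a random element of temp
// into the next position of arr.
template <typename Rng>
void naiveShuffle(std::vector<int>& arr, Rng& rng) {
    std::vector<int> temp = arr;  // auxiliary copy of the input
    for (std::size_t pos = 0; pos < arr.size(); ++pos) {
        assert(!temp.empty());    // temp holds exactly arr.size() - pos elements
        std::uniform_int_distribution<std::size_t> dist(0, temp.size() - 1);
        std::size_t j = dist(rng);
        arr[pos] = temp[j];
        // Removing from the middle of a vector shifts the tail: O(n) per step.
        temp.erase(temp.begin() + static_cast<std::ptrdiff_t>(j));
    }
}
```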

How can this be solved in O(n)?

To shuffle an array a of n elements (indices 0..n-1):
  for i from n - 1 downto 1 do
       j = random integer with 0 <= j <= i
       exchange a[j] and a[i]

A short explanation:

The probability that the ith element (including the last one) goes to the last position is 1/n, because the first iteration picks one of the n elements uniformly at random.

The probability that ith element goes to second last position can be proved to be 1/n by dividing it in two cases.
Case 1: i = n-1 (index of last element):
The probability of last element going to second last position is = (probability that last element doesn’t stay at its original position) x (probability that the index picked in previous step is picked again so that the last element is swapped)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
Case 2: 0 <= i < n-1 (index of a non-last element):
The probability of the ith element going to the second last position = (probability that the ith element is not picked in the previous iteration) x (probability that the ith element is picked in this iteration)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
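The same conclusion can be checked exhaustively for small n: if every possible j is tried at every step, each permutation should be reached by exactly one sequence of choices. The sketch below does that enumeration; `enumerateShuffles` and `shuffleOutcomes` are hypothetical names introduced only for this check.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Recursively tries every possible j at every step i of the shuffle loop.
void enumerateShuffles(int i, std::vector<int>& a,
                       std::map<std::vector<int>, int>& counts) {
    if (i == 0) {  // loop finished: record the resulting permutation
        ++counts[a];
        return;
    }
    for (int j = 0; j <= i; ++j) {
        std::swap(a[i], a[j]);
        enumerateShuffles(i - 1, a, counts);
        std::swap(a[i], a[j]);  // undo the swap before trying the next j
    }
}

// Maps each reachable permutation of 0..n-1 to the number of choice
// sequences that produce it.
std::map<std::vector<int>, int> shuffleOutcomes(int n) {
    assert(n >= 1);
    std::vector<int> a(n);
    for (int t = 0; t < n; ++t) a[t] = t;
    std::map<std::vector<int>, int> counts;
    enumerateShuffles(n - 1, a, counts);
    return counts;
}
```

For n = 4 there are 4 * 3 * 2 = 24 choice sequences and 24 permutations, so a uniform shuffle requires every count to be exactly 1.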

void randomize (int arr[], int n)  
{  
    // Use a different seed value so that  
    // we don't get same result each time 
    // we run this program  
    srand (time(NULL));  
  
    // Start from the last element and swap  
    // one by one. We don't need to run for  
    // the first element that's why i > 0  
    for (int i = n - 1; i > 0; i--)  
    {  
        // Pick a random index from 0 to i  
        int j = rand() % (i + 1);  
  
        // Swap arr[i] with the element  
        // at random index  
        std::swap(arr[i], arr[j]);  // std::swap from <utility>; no pointer-based helper needed
    }  
}
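As with reservoir sampling, `rand() % (i + 1)` can be swapped for `<random>`. The following is a minimal sketch of the same loop under that assumption; `fisherYates` is a hypothetical name. The standard library also provides `std::shuffle` in `<algorithm>`, which takes an iterator range and a random engine.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical Fisher-Yates variant using <random> instead of rand().
template <typename Rng>
void fisherYates(std::vector<int>& arr, Rng& rng) {
    // i is the size of the still-unshuffled prefix; its last index is i - 1.
    for (std::size_t i = arr.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> dist(0, i - 1);
        std::size_t j = dist(rng);
        assert(j < i);
        std::swap(arr[i - 1], arr[j]);
    }
}
```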