【Algorithm】 Reservoir Sampling & Shuffle an Array

Reservoir Sampling Example Question: 


Randomly choose k samples from a list of n items, where n is either very large or unknown. Typically n is large enough that the list doesn't fit into main memory.



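#include <algorithm>
#include <cstdlib>
#include <vector>

// Naive approach, roughly O(k^2) when k is much smaller than n:
// draw random indices and reject repeats. Needs random access to
// input[0..n-1], so n must be known and the list must fit in memory.
// Assumes k <= n.
std::vector<int> naiveSample(const int input[], int n, int k)
{
    std::vector<int> chosen;  // indices picked so far (indices, not values, so duplicates are fine)
    while ((int)chosen.size() != k) {
        int j = rand() % n;
        // Keep j only if it was not picked before (linear scan: O(k))
        if (std::find(chosen.begin(), chosen.end(), j) == chosen.end())
            chosen.push_back(j);
    }
    std::vector<int> r;
    for (int j : chosen) r.push_back(input[j]);
    return r;
}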




How Reservoir (rehzrvwaar) Sampling works:





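In short, take the case k = 1: keep the first item, and when the m-th item arrives, make it the current choice with probability $1/m$. For the m-th item to be the final choice, it must be picked at its own step and then survive every later step, so its probability is $\frac{1}{m} \cdot \frac{m}{m+1} \cdot \frac{m+1}{m+2} \cdots \frac{n-1}{n} = \frac{1}{n}$, where n is the total number of items.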
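A minimal C++ sketch of the k = 1 case, assuming the items arrive through any single-pass input iterator range (for example `std::istream_iterator<int>` over a file), so n is never needed. The name `sampleOne` is hypothetical, and it draws from `std::mt19937` via `std::uniform_int_distribution` rather than `rand()`:

```cpp
#include <iterator>
#include <random>

// k = 1 reservoir sampling in one pass with O(1) extra space.
// The length of [first, last) is never needed, so any single-pass input
// iterator works. Assumes the range is non-empty.
template <typename InputIt>
typename std::iterator_traits<InputIt>::value_type
sampleOne(InputIt first, InputIt last, std::mt19937& gen)
{
    auto chosen = *first;      // the 1st item is kept with probability 1
    ++first;
    unsigned long long m = 1;  // number of items seen so far
    for (; first != last; ++first) {
        ++m;
        // Make the m-th item the current choice with probability 1/m
        std::uniform_int_distribution<unsigned long long> dist(0, m - 1);
        if (dist(gen) == 0) chosen = *first;
    }
    return chosen;
}
```

For example, `sampleOne(std::istream_iterator<int>(std::cin), std::istream_iterator<int>(), gen)` would pick one integer from standard input, assuming the input is non-empty.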


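Now for the example above of choosing k items out of n: first copy the first k items into the reservoir. Then for each later item at index i = k, k + 1, ..., n − 1, generate a random number j = rand() % (i + 1); if j < k, replace reservoir[j] with the new item. How can we prove that every item ends up in the reservoir with probability $k/n$?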


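Start from the n-th item (index n − 1): it is selected with probability $k/n$, the chance that the random number drawn at that step is less than k. The (n − 1)-th item is selected at its own step with probability $\frac{k}{n-1}$, and it then survives the final step unless that step's random number hits its slot, which happens with probability $\frac{1}{n}$. So overall $\frac{k}{n-1} \cdot \frac{n-1}{n} = \frac{k}{n}$.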


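Finally, each of the k items copied in at the start survives to the end with probability $\frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdots \frac{n-1}{n} = \frac{k}{n}$: each factor is the probability that the random number drawn at step k + 1, k + 2, ..., n does not pick that item's slot.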
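The cases above generalize to any item that arrives after the reservoir is full. The item at index i (with k ≤ i ≤ n − 1) enters with probability k/(i + 1), and at each later index t it is evicted only if j equals its slot, which happens with probability 1/(t + 1). The product telescopes:

$$
P(\text{item } i \text{ in final reservoir}) = \frac{k}{i+1}\prod_{t=i+1}^{n-1}\frac{t}{t+1} = \frac{k}{i+1}\cdot\frac{i+1}{n} = \frac{k}{n}
$$

For i = n − 1 the product is empty and the result is k/n; for i = n − 2 it is the k/(n − 1) · (n − 1)/n case; for the first k items the entry factor is 1 and the product starts at t = k, giving the same k/n.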


#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Select k items uniformly at random from stream[0..n-1] (assumes k <= n).
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]
    // reservoir is the output array. Initialize
    // it with the first k elements from stream[]
    vector<int> reservoir(k);
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];
    // Use a different seed value so that we don't get
    // the same result each time we run this program
    srand(time(NULL));
    // Iterate from the (k+1)th element to the nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i
        int j = rand() % (i + 1);
        // If the randomly picked index is smaller than k,
        // replace the element at that index with the
        // new element from the stream
        if (j < k)
            reservoir[j] = stream[i];
    }
    cout << "Following are k randomly selected items\n";
    for (int x : reservoir)
        cout << x << " ";
    cout << "\n";
}
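The function above still takes n up front. A sketch of the same procedure for a stream whose length is unknown, assuming the input is any single-pass input iterator range. `reservoirSample` is a hypothetical name, and `std::uniform_int_distribution` replaces `rand() % (i + 1)`, which is slightly biased toward small values when RAND_MAX + 1 is not a multiple of i + 1:

```cpp
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Reservoir sampling of up to k items from a single-pass range whose
// length is not known in advance. If the range holds fewer than k items,
// all of them are returned.
template <typename InputIt>
std::vector<typename std::iterator_traits<InputIt>::value_type>
reservoirSample(InputIt first, InputIt last, std::size_t k, std::mt19937& gen)
{
    using T = typename std::iterator_traits<InputIt>::value_type;
    std::vector<T> reservoir;
    reservoir.reserve(k);
    std::size_t i = 0;  // index of the current item in the stream
    for (; first != last; ++first, ++i) {
        if (i < k) {
            reservoir.push_back(*first);  // fill with the first k items
        } else {
            // j is uniform in [0, i]
            std::uniform_int_distribution<std::size_t> dist(0, i);
            std::size_t j = dist(gen);
            if (j < k) reservoir[j] = *first;  // replace slot j
        }
    }
    return reservoir;
}
```

Because only one forward pass is made, the same call works on a `std::istream_iterator` over a file that is too large to load.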


LeetCode example problems:


398. Random Pick Index


Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.

The array size can be very large. Solution that uses too much extra space will not pass the judge.


int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.

// pick(1) should return 0. Since in the array only nums[0] is equal to 1.


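#include <cstdlib>
#include <vector>
using namespace std;

class Solution {
public:
    Solution(vector<int>& nums) {
        num_size_ = nums.size();
        store_ = nums;
    }
    int pick(int target) {
        int res = -1;
        int found_num = 0;  // copies of target seen so far
        for (size_t i = 0; i < num_size_; ++i) {
            if (store_.at(i) != target) continue;
            ++found_num;
            // Keep index i with probability 1 / found_num
            if (rand() % found_num == 0) res = static_cast<int>(i);
        }
        return res;
    }
private:
    size_t num_size_ = 0;
    vector<int> store_;
};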




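The first matching index is chosen with probability 1, and it survives to the end with probability $1 \cdot \frac{1}{2} \cdot \frac{2}{3} \cdots \frac{m-1}{m} = \frac{1}{m}$, where m is the number of occurrences of the target. The same argument gives $1/m$ for every other matching index.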
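The Solution above copies nums into store_, which costs O(n) extra space. A sketch that instead reads the caller's array through a const reference, assuming the caller keeps the array alive; `pickIndex` is a hypothetical free function, and the generator is passed in rather than using `rand()`:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Reservoir sampling over the indices whose value equals target.
// Returns -1 if target does not occur (the problem guarantees it does).
int pickIndex(const std::vector<int>& nums, int target, std::mt19937& gen)
{
    int res = -1;
    int count = 0;  // occurrences of target seen so far
    for (std::size_t i = 0; i < nums.size(); ++i) {
        if (nums[i] != target) continue;
        ++count;
        // Keep index i with probability 1 / count
        std::uniform_int_distribution<int> dist(0, count - 1);
        if (dist(gen) == 0) res = static_cast<int>(i);
    }
    return res;
}
```

With nums = {1, 2, 3, 3, 3}, `pickIndex(nums, 3, gen)` returns 2, 3 or 4, each with probability 1/3.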


382. Linked List Random Node


Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.

Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?


// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);

// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
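#include <cstdlib>

// LeetCode provides this node type; a minimal version is included
// here so the snippet compiles on its own.
struct ListNode {
    int val;
    ListNode* next;
    ListNode(int x) : val(x), next(nullptr) {}
};

class Solution {
public:
    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    Solution(ListNode* head) {
        head_ = head;
    }
    /** Returns a random node's value. */
    int getRandom() {
        int index = 1;       // nodes seen so far
        ListNode* cur = head_;
        int res = cur->val;  // the first node is kept with probability 1
        cur = cur->next;
        while (cur != nullptr) {
            ++index;
            // Replace the answer with probability 1 / index
            if (rand() % index == 0) {
                res = cur->val;
            }
            cur = cur->next;
        }
        return res;
    }
private:
    ListNode* head_ = nullptr;
};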
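To check empirically that each node is equally likely, here is a small sketch that runs the same one-pass loop with `std::mt19937` and tallies the results. ListNode is redefined so the block compiles on its own, and `randomNodeValue` and `tallyPicks` are hypothetical helper names:

```cpp
#include <map>
#include <random>

// Singly-linked list node with the same fields as LeetCode's ListNode
struct ListNode {
    int val;
    ListNode* next;
    explicit ListNode(int x) : val(x), next(nullptr) {}
};

// One-pass reservoir sampling over the list (k = 1), drawing from a
// caller-supplied generator instead of rand(). Assumes head != nullptr.
int randomNodeValue(const ListNode* head, std::mt19937& gen)
{
    int res = head->val;  // the first node is kept with probability 1
    int seen = 1;         // nodes visited so far
    for (const ListNode* cur = head->next; cur != nullptr; cur = cur->next) {
        ++seen;
        std::uniform_int_distribution<int> dist(0, seen - 1);
        if (dist(gen) == 0) res = cur->val;  // replace with probability 1/seen
    }
    return res;
}

// Call randomNodeValue `trials` times and count how often each value appears
std::map<int, int> tallyPicks(const ListNode* head, int trials, std::mt19937& gen)
{
    std::map<int, int> counts;
    for (int t = 0; t < trials; ++t) ++counts[randomNodeValue(head, gen)];
    return counts;
}
```

For the list [1,2,3] from the example, 30000 calls should give each value roughly 10000 hits.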




Shuffle a given array using Fisher-Yates shuffle algorithm


Given an array, write a program to generate a random permutation of array elements. This question is also asked as “shuffle a deck of cards” or “randomize a given array”. Here shuffle means that every permutation of the array elements should be equally likely.




Let the given array be arr[]. A simple solution is to create an auxiliary array temp[] which is initially a copy of arr[]. Randomly select an element from temp[], copy the randomly selected element to arr[0] and remove the selected element from temp[]. Repeat the same process n times and keep copying elements to arr[1], arr[2], … . The time complexity of this solution will be O(n^2).
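A sketch of this naive approach, assuming the array is a `std::vector<int>` and randomness comes from `std::mt19937`; `naiveShuffle` is a hypothetical name. Each erase from temp shifts the remaining elements, which is where the O(n^2) total comes from:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Naive O(n^2) shuffle: repeatedly pick a random element of temp,
// copy it to the next output position, and erase it from temp.
void naiveShuffle(std::vector<int>& arr, std::mt19937& gen)
{
    std::vector<int> temp = arr;  // auxiliary copy of arr
    for (std::size_t pos = 0; pos < arr.size(); ++pos) {
        std::uniform_int_distribution<std::size_t> dist(0, temp.size() - 1);
        std::size_t r = dist(gen);
        arr[pos] = temp[r];                                          // copy the chosen element
        temp.erase(temp.begin() + static_cast<std::ptrdiff_t>(r));  // O(n) removal
    }
}
```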




To shuffle an array a of n elements (indices 0..n-1):
  for i from n - 1 downto 1 do
       j = random integer with 0 <= j <= i
       exchange a[j] and a[i]




The probability that the i-th element (including the last one) goes to the last position is 1/n, because the first iteration picks the element for that position uniformly at random from all n elements.

The probability that the i-th element goes to the second last position can be shown to be 1/n by splitting into two cases.
Case 1: i = n-1 (index of the last element):
The probability of the last element going to the second last position = (probability that the last element is not picked in the first iteration, i.e. it is moved away from its original position) x (probability that the index it was moved to is picked in the next iteration, so it is swapped into the second last position)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
Case 2: 0 <= i < n-1 (index of a non-last element):
The probability of the i-th element going to the second last position = (probability that the i-th element is not picked in the previous iteration) x (probability that the i-th element is picked in this iteration)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
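Both cases are instances of one telescoping product. Label positions by index p (0 ≤ p ≤ n − 1). Before the iteration for index i, positions 0..i hold the i + 1 elements not yet placed, and j is uniform over them. A given element lands at position p if it is not picked at iterations i = n − 1, ..., p + 1 and is then picked at iteration p (for p = 0 it simply remains, and the last factor is 1):

$$
P(\text{element lands at } p) = \left(\prod_{i=p+1}^{n-1}\frac{i}{i+1}\right)\cdot\frac{1}{p+1} = \frac{p+1}{n}\cdot\frac{1}{p+1} = \frac{1}{n}
$$

Setting p = n − 1 gives the last-position argument, and p = n − 2 gives ((n-1)/n) x (1/(n-1)) = 1/n from the two cases.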


#include <cstdlib>
#include <ctime>
#include <utility>
using namespace std;

// Shuffle arr[0..n-1] in place (Fisher-Yates)
void randomize(int arr[], int n)
{
    // Use a different seed value so that
    // we don't get the same result each time
    // we run this program
    srand(time(NULL));
    // Start from the last element and swap
    // one by one. We don't need to run for
    // the first element, which is why i > 0
    for (int i = n - 1; i > 0; i--)
    {
        // Pick a random index from 0 to i
        int j = rand() % (i + 1);
        // Swap arr[i] with the element
        // at the random index
        swap(arr[i], arr[j]);
    }
}
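The same loop written with `<random>`, as a sketch assuming a `std::vector<int>` and a caller-supplied `std::mt19937` (`fisherYates` is a hypothetical name). Passing the generator in avoids reseeding with `srand(time(NULL))` on every call, which on typical platforms where `time()` has one-second resolution repeats the same shuffle if called twice within a second, and `uniform_int_distribution` avoids the modulo bias of `rand() % (i + 1)`:

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates: for i = n-1 down to 1, swap a[i] with a[j], j uniform in [0, i].
void fisherYates(std::vector<int>& a, std::mt19937& gen)
{
    if (a.size() < 2) return;  // nothing to shuffle
    for (std::size_t i = a.size() - 1; i > 0; --i) {
        // Pick j uniformly from 0..i
        std::uniform_int_distribution<std::size_t> dist(0, i);
        std::size_t j = dist(gen);
        std::swap(a[i], a[j]);
    }
}
```

In practice, `std::shuffle(a.begin(), a.end(), gen)` from `<algorithm>` does the same job: the standard requires each permutation to have equal probability.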








