Given an input stream of n elements, select k of them so that every element has the same probability, k/n, of being selected.
Algorithm:
1. Copy the first k elements (indices 0....k-1) into the result.
2. For each remaining index i in k....n-1, generate a random integer rand in 0....i (inclusive): if rand < k, set result[rand] = input[i].
Proof (see http://www.geeksforgeeks.org/reservoir-sampling/):
1. For elements k....n-1
a) The last element, n-1, has a k/n chance of being selected.
b) The second-to-last element, n-2, first has a k/(n-1) chance of being selected into the k results. When we then iterate to the last element, the second-to-last element is replaced only if the last element is selected (k/n) and lands on its slot (1/k), so it has a 1 - (k/n) * (1/k) = (n-1)/n chance of NOT being replaced. Total: k/(n-1) * (n-1)/n = k/n.
2. For elements 0...k-1
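Each element m (0 <= m <= k-1) starts in the result. At step i (k <= i <= n-1) it is not replaced with probability i/(i+1), so the chance that it survives every step is k/(k+1) * (k+1)/(k+2) * .... * (n-1)/n = k/n.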
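The two cases of the proof can be checked numerically. The sketch below is only an illustration and not part of the referenced proof: for the hypothetical values n = 10 and k = 3, it multiplies each index's chance of entering the result by its chance of surviving every later step, and prints the product next to k/n. The names InclusionProbability and inclusionProbability are made up for this example.

```java
public class InclusionProbability {
    // Probability that the element at index j is in the final result.
    public static double inclusionProbability(int j, int n, int k) {
        // Elements 0....k-1 start in the result; element j >= k enters with probability k/(j+1).
        double p = (j < k) ? 1.0 : (double) k / (j + 1);
        // At each later step i, the element survives with probability i/(i+1).
        for (int i = Math.max(j + 1, k); i < n; i++) {
            p *= (double) i / (i + 1);
        }
        return p;
    }

    public static void main(String[] args) {
        int n = 10, k = 3; // illustrative values only
        for (int j = 0; j < n; j++) {
            System.out.printf("index %d: %.4f (k/n = %.4f)%n",
                    j, inclusionProbability(j, n, k), (double) k / n);
        }
    }
}
```

Every index should come out equal to k/n up to floating-point rounding, whether it started in the result or arrived later.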
How to generate the random number:
1) Use Math.random(), which returns a double in [0, 1); to get an integer in 0....i, use (int)(Math.random() * (i+1)).
2) Use Random.nextInt(i+1), which returns an integer in 0....i. The bound is exclusive, so nextInt(i) would only return 0....i-1.
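The two options can be put side by side as follows. This is a minimal sketch; the class and method names (RandomIndex, viaMathRandom, viaNextInt) are hypothetical and exist only for this illustration.

```java
import java.util.Random;

public class RandomIndex {
    // Option 1: scale Math.random(), which is in [0, 1), up to [0, i+1) and truncate.
    public static int viaMathRandom(int i) {
        return (int) (Math.random() * (i + 1));
    }

    // Option 2: nextInt(bound) returns 0....bound-1, so pass i+1 as the bound.
    public static int viaNextInt(Random random, int i) {
        return random.nextInt(i + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int i = 5;
        System.out.println(viaMathRandom(i));      // some value in 0....5
        System.out.println(viaNextInt(random, i)); // some value in 0....5
    }
}
```

Both methods can return i itself, which the algorithm needs so that element i can be compared against all i+1 positions seen so far.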
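import java.util.Random;

public static int[] sample(int[] input, int k) {
    if (k < 0 || k > input.length) {
        throw new IllegalArgumentException("k must be between 0 and input.length");
    }
    int[] result = new int[k];
    int i = 0;
    // Step 1: fill the result with the first k elements.
    for (; i < k; i++) {
        result[i] = input[i];
    }
    Random random = new Random();
    // Step 2: element i replaces a random slot with probability k/(i+1).
    for (; i < input.length; i++) {
        // Alternative: int rand = (int) (Math.random() * (i + 1));
        int rand = random.nextInt(i + 1); // uniform in 0....i
        if (rand < k) {
            result[rand] = input[i];
        }
    }
    return result;
}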
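One way to sanity-check the method is to run it many times and count how often each element is selected. The sketch below assumes a variant of sample that takes the Random as a parameter so that runs can be seeded; that variant, the class name ReservoirCheck, the helper countSelections, and the values n = 10, k = 3, 100000 trials are all assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.Random;

public class ReservoirCheck {
    // Same algorithm as sample(...) above, but with the Random passed in so it can be seeded.
    public static int[] sample(int[] input, int k, Random random) {
        int[] result = new int[k];
        int i = 0;
        for (; i < k; i++) {
            result[i] = input[i];
        }
        for (; i < input.length; i++) {
            int rand = random.nextInt(i + 1); // uniform in 0....i
            if (rand < k) {
                result[rand] = input[i];
            }
        }
        return result;
    }

    // Counts how often each of the values 0....n-1 is selected over the given number of trials.
    public static int[] countSelections(int n, int k, int trials, long seed) {
        int[] input = new int[n];
        for (int i = 0; i < n; i++) {
            input[i] = i;
        }
        Random random = new Random(seed);
        int[] counts = new int[n];
        for (int t = 0; t < trials; t++) {
            for (int v : sample(input, k, random)) {
                counts[v]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 10, k = 3, trials = 100000; // illustrative values only
        int[] counts = countSelections(n, k, trials, 42L);
        // Each count should be close to trials * k / n = 30000.
        System.out.println(Arrays.toString(counts));
    }
}
```

With these values every count should land near 30000. A count that is consistently far off for the first k or the last few indices usually points to an off-by-one in the random range.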