Dec 16, 2008

Take s numbers from n numbers with equal probability, n is unlimited

Take s numbers from n numbers with equal probability, n is unlimited

we can use reservoir sampling to solve this problem
-Store first s elements into R.
-for each element in position k = s+1 to n ,
--accept it with probability s/k
--if accepted, choose a random element from R to replace.

Simple example:
Suppose we take 2 numbers from 3 numbers
First, put first 2 numbers in R
Second, the prob. for the 3rd number to be taken is 2/3
Third, if 3rd number is taken, choose a number from R replaced by 3rd number.
The prob. for 3rd number to be taken is: 2/3
The prob. for 1st and 2nd number to drop is: (1/2)*(2/3)=1/3
So the prob. for 1st and 2nd number to take is: (1-(1/2)*(2/3))*1=2/3

for choose the 4th num(2/4), the prob. for 1st, 2nd and 3rd is (1-(1/2)*(2/4))*2/3=2/4

distributed system can speed up this sampling
http://blogs.msdn.com/spt/archive/2008/02/05/reservoir-sampling.aspx

No comments:

Post a Comment