This post is an addendum to the recent post Reviewing a thousand things. We’re going to look again at the coupon collector problem, randomly sampling a set of N things with replacement until we’ve collected one of everything.
As noted before, for large N the expected number of draws before you’ve seen everything at least once is approximately
N( log(N) + γ )
where γ is Euler’s constant.
A stronger result using a Chernoff bound says that for any constant c, the probability that the number of draws exceeds
N( log(N) + c )
approaches
1 − exp( − exp(−c))
as N increases. See [1].
I said in the earlier post that if you were reviewing a thousand things by sampling flash cards with replacement, there’s a 96% chance that after drawing 10,000 cards you would have seen everything. We could arrive at this result using the theorem above.
If N = 1,000 and we want N( log(N) + c ) to equal 10,000, we set c = 3.092. Then we find
1 − exp( − exp(−c) ) = 0.0444
which is consistent with saying there’s a 96% chance of having seen everything, 95.56% if you want to be more precise.
[1] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.