Collecting a large number of coupons

This post is an addendum to the recent post Reviewing a thousand things. We’re going to look again at the coupon collector problem, randomly sampling a set of N things with replacement until we’ve collected one of everything.

As noted before, for large N the expected number of draws before you’ve seen everything at least once is approximately

N( log(N) + γ )

where γ is Euler’s constant.

A stronger result using a Chernoff bound says that for any constant c, the probability that the number of draws exceeds

N( log(N) + c )

approaches

1 − exp( − exp(−c))

as N increases. See [1].

I said in the earlier post that if you were reviewing a thousand things by sampling flash cards with replacement, there’s a 96% chance that after drawing 10,000 cards you would have seen everything. We could arrive at this result using the theorem above.

If N = 1,000 and we want N( log(N) + c ) to equal 10,000, we set c = 3.092. Then we find

1 − exp( − exp(−c) ) = 0.0444

which is consistent with saying there’s a 96% chance of having seen everything, 95.56% if you want to be more precise.

[1] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.