How to calculate binomial probabilities

Suppose you’re doing something that has probability of success p and probability of failure q = 1 − p. If you repeat what you’re doing m+n times, the probability of m successes and n failures is given by

$\frac{(m+n)!}{m!\, n!} p^m q^n$

Now suppose m and n are moderately large. The terms (m+n)! and m! n! will be huge, but the terms p^m and qⁿ will be tiny. The huge terms might overflow, and the tiny terms might underflow, even though the final result may be a moderate-sized number. The numbers m and n don’t have to be that large for this to happen since factorials grow very quickly. On a typical system, overflow happens if m+n > 170. How do you get to the end result without having to deal with impossibly large or small values along the way?

The trick is to work with logs. You want to first calculate

log( (m+n)! ) − log( m! ) − log( n! ) + m log( p ) + n log( q )

then exponentiate the result. This pattern comes up constantly in statistical computing.

Libraries don’t always include functions for evaluating factorial or log factorial. They’re more likely to include functions for Γ(x) and its log. For positive integers n, Γ(n+1) = n!. Now suppose you have a function lgamma that computes log Γ(x). You might write something like this.

    double probability(double p, double q, int m, int n)
    { 
        double temp = lgamma(m + n + 1.0);
        temp -=  lgamma(n + 1.0) + lgamma(m + 1.0);
        temp += m*log(p) + n*log(q);
        return exp(temp);
    }

The function lgamma is not part of the ANSI standard library for C or C++, but it is part of the POSIX standard. On Unix-like systems, lgamma is included in the standard library. However, Microsoft does not include lgamma as part of the Visual C++ standard library. On Windows you have to either implement your own lgamma function or grab an implementation from somewhere like Boost.

Here’s something to watch out for with POSIX math libraries. I believe the POSIX standard does not require a function called gamma for evaluating the gamma function Γ(x). Instead, the standard requires functions lgamma for the log of the gamma function and tgamma for the gamma function itself. (The mnemonic is “t” for “true,” meaning that you really want the gamma function.) I wrote a little code that called gamma and tested it on OS X and Red Hat Linux this evening. In both cases gcc compiled the code without warning, even with the -Wall and -pedantic warning flags. But on the Mac, gamma was interpreted as tgamma and on the Linux box it was interpreted as lgamma. This means that gamma(10.0) would equal 362880 on the former and 12.8018 on the latter.

If you don’t have access to an implementation of log gamma, see How to compute log factorial.

5 thoughts on “How to calculate binomial probabilities”

Thomas Guest

25 April 2008 at 01:46

Another interesting article John, especially the warning about gamma. I tried “man gamma” on a Redhat linux close to hand to get the details.

HISTORY
4.2BSD had a gamma() that computed ln(|Gamma(|x|)|), leaving the sign of
Gamma(|x|) in the external integer signgam. In 4.3BSD the name was
changed to lgamma(), and the man page promises

“At some time in the future the name gamma will be rehabilitated and
used for the Gamma function”

This did indeed happen in 4.4BSD, where gamma() computes the Gamma func-
tion (with no effect on signgam). However, this came too late, and we now
have tgamma(), the “true gamma” function.

CONFORMING TO
4.2BSD. Compatible with previous mistakes.

jimcp

25 April 2008 at 03:05

Choose the tools first before doing the job. The job you describe asks for software designed to handle math. The tool I count on is Mathematica. Imho, the dream of a mathematician. ( SAGE is an open source alternative and even speaks to Mathematica ) Mathematica can be called from Java, I am sure, probably C++ as well, but of this I am not sure.

Jan

14 October 2008 at 03:17

John, thanks a lot for that suggestion! It helped me to fit multiplicity distributions in high-energy physics proton-proton collisions (n goes up to 250).
Cheers, Jan

robin

15 October 2008 at 10:33

Nice article. I have used the gammln trick before but it is nice to have it written out explicitly for the binomial.

jimcp, I am not one to miss giving a sucker good jibe, but if you can find a set of programs that encompasses the full range of functionality I require (choose from parser X add to Parser Y and subtract the marketing lingo) to get my job done please enlighten me, otherwise you might want to study some numerical proofs.

Jason Merrill

28 February 2023 at 16:29

Note that the formula suggested here may still be inaccurate for large n because of cancellation between different terms in the exponent.

Catherine Loader discusses this (and a more accurate computation) in “Fast and Accurate Computation of Binomial Probabilities,” https://www.r-project.org/doc/reports/CLoader-dbinom-2002.pdf

Comments are closed.