Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong and how to fix it.
A posterior density is (proportional to) a likelihood function times a prior distribution. The likelihood function is a product. The number of data points is the number of terms in the product. If these numbers are less than 1, and you multiply enough of them together, the result will be too small to represent in a floating point number and your calculation will underflow to zero. Then subsequent operations with this number, such as dividing it by another number that has also underflowed to zero, may produce an infinite or NaN result.
The instinctive reaction regarding underflow or overflow is to use logs. And often that works. If you wanted to know where the maximum of the posterior density occurs, you could find the maximum of the logarithm of the posterior density. But in Bayesian computations you often need to integrate the posterior density times some functions. Now you cannot just work with logs because logs and integrals don’t commute.
One way around the problem is to multiply the integrand by a constant so large that there is no danger of underflow. Multiplying by constants does commute with integration.
So suppose your integrand is on the order of 10^−400, far below the smallest representable number. Do you need extended precision arithmetic? No, you just need to understand your problem.
If you multiply your integrand by 10^400 before integrating, then your integrand is roughly 1 in magnitude. Then do you integration, and remember that the result is 10^400 times the actual value.
You could take the following steps.
- Find m, the maximum of the log of the integrand.
- Let I be the integral of exp( log of the integrand − m ).
- Keep track that your actual integral is exp(m) I, or that its log is m + log I.
Note that you can be extremely sloppy in this process. You don’t need an accurate estimate of the maximum of the integrand per se. If you’re within a few dozen orders of magnitude, for example, that could be sufficient to carry out your integration without underflow.
One way to estimate the maximum is to use a frequentist estimator of your parameters as an approximate MLE, and assume that this is approximately where your posterior density takes on its maximum value. This might actually be very accurate, but it doesn’t need to be.
Note also that it’s OK if some of the evaluations of your integrand underflow to zero. You just don’t want the entire integral to underflow to zero. Places where the integrand is many orders of magnitude less than its maximum don’t contribute to the integration anyway. (It’s important that your integration pays attention to the region where the integrand is largest. A naive integration could entirely miss the most important region and completely underestimate the integral. But that’s a matter for another blog post.)
Can you give an example of this trick with some math and pseudo code. Say for example, I want the product of a large number of probabilities, and I cannot simply add the sum of logs.
Interesting post, thank you. I’m curious when you find yourself doing this in practice. The first thing that comes to mind is evaluation of model evidence, but for that there are specialized methods like nested sampling or Bayesian quadrature that seem better suited than general-purpose numerical integration.
I would use this for problems that are tractable without MCMC, especially when I need the code to run unattended and want to avoid some of the possible pitfalls with MCMC.
A look at the log-sum-exp trick, very useful when calculating e.g. sums of particle weights in sequential Monte Carlo (a.k.a. particles filters) and therefore relevant for Bayesian computations
http://jblevins.org/log/log-sum-exp