In Markov Chain Monte Carlo (MCMC), it’s common to throw out the first few states of a Markov chain, maybe the first 100 or the first 1000. People say they do this so the chain has had a chance to “burn in.” But this explanation by itself doesn’t make sense. It may be good to throw away a few samples, but it could betray a lack of understanding to say a chain has “burned in.”
A Markov chain has no memory. That's its defining characteristic: its future behavior depends solely on where it is, not how it got there. So if you "burn in" a thousand samples, your future calculations are no different than if you had started where the first thousand samples left off. Also, any point you start at is a point you might return to, or at least come arbitrarily close to again.
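To see this concretely, here's a minimal sketch, assuming a toy random-walk Metropolis chain targeting a standard normal: discarding a thousand "burned-in" states gives exactly the same continuation as starting a fresh chain at the state where those thousand steps ended.

```python
import numpy as np

# A minimal sketch, assuming a toy random-walk Metropolis chain targeting
# a standard normal. Each step depends only on the current state and the
# random number generator, never on earlier history.
def step(x, rng):
    proposal = x + rng.normal()
    # Metropolis accept/reject for target p(x) proportional to exp(-x^2/2):
    # accept with probability min(1, p(proposal)/p(x)).
    if np.log(rng.uniform()) < 0.5 * (x**2 - proposal**2):
        return proposal
    return x

def run(x0, n, rng):
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1], rng))
    return np.array(xs)

# One chain run for 1500 steps...
chain = run(x0=10.0, n=1500, rng=np.random.default_rng(42))

# ...versus 1000 "burn-in" steps, then a fresh chain started at the
# burn-in endpoint, drawing from the same stream of random numbers.
rng = np.random.default_rng(42)
burn = run(x0=10.0, n=1000, rng=rng)
rest = run(x0=burn[-1], n=500, rng=rng)

# Identical: the chain has no memory of how it reached burn[-1].
assert np.allclose(chain[1000:], rest)
```

With the same random seed, the continuation after step 1000 is identical whether or not you "remember" the first thousand states.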
So why burn in? To enter a high probability region, a place where the states of the Markov chain are more representative of the distribution you're sampling. When someone says a chain has "burned in," that's fine if they mean "has entered a high probability region." And why do you want to enter such a region? Because you're going to average some function of your samples:

$$ \mathrm{E}[f(X)] \approx \frac{1}{n} \sum_{i=1}^n f(x_i) $$
The result will be correct as n → ∞, but you're going to stop after some finite n. When n is small and your samples are in a low probability region, the average on the right might be a poor approximation to the expectation on the left.
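Here's a hedged sketch of that failure mode, assuming the same toy Metropolis sampler with target N(0, 1) and f(x) = x², whose true expectation is 1. A chain started far from the bulk of the distribution gives a badly biased average for small n:

```python
import numpy as np

# A hedged sketch of the estimate above: approximate E[f(X)] by averaging
# f over chain samples. The target is N(0, 1), f(x) = x^2, and the true
# expectation is 1. Starting way out at x0 = 20 makes the small-n average
# badly biased by samples from the low probability region.
rng = np.random.default_rng(0)

def metropolis(x0, n):
    xs, x = np.empty(n), x0
    for i in range(n):
        prop = x + rng.normal()
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
            x = prop
        xs[i] = x
    return xs

samples = metropolis(x0=20.0, n=100_000)
print(np.mean(samples[:500] ** 2))  # small n: inflated by the walk in from 20
print(np.mean(samples ** 2))        # large n: close to the true value, 1
```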
The idea of burn-in is that you can start your MCMC procedure at some point chosen for convenience, one which might be out in the weeds, but then after a few iterations you’ll be in a high probability region. However, you don’t know that this will happen. It probably will happen, eventually, by definition: a random process spends most of its time where it spends most of its time! It is possible, though unlikely, that you could be in a lower probability region at the end of your burn-in period than at the beginning. Or maybe your chain is slowly moving toward a higher probability region, but you’re still not close at the end of your burn-in.
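One way to check rather than hope: monitor the log target density along the chain. This is a sketch under the same toy setup; if the trace is still climbing when your burn-in window ends, you're not there yet.

```python
import numpy as np

# A hedged sketch of one way to check whether a burn-in period was enough:
# track the log target density along the chain. If the trace is still
# climbing when the burn-in window ends, the chain hasn't yet reached a
# high probability region. Same toy N(0, 1) target as above.
rng = np.random.default_rng(1)

def log_p(x):
    return -0.5 * x**2  # unnormalized log density of N(0, 1)

x, trace = 500.0, []
for _ in range(5000):
    prop = x + rng.normal()
    if np.log(rng.uniform()) < log_p(prop) - log_p(x):
        x = prop
    trace.append(log_p(x))

burn = 100  # a burn-in length chosen for convenience, not by diagnosis
# If the log density at the cutoff sits far below its later typical level,
# the "burned-in" chain is still out in the weeds.
print(trace[burn], np.median(trace[burn:]))
```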
If you know where a high probability region is, just start there. Then you’ve “burned in” immediately. However, with a very complicated problem you might not know where a high probability region is. So you hope that a few steps of your chain will land you in a high probability region. And maybe it will. But if you understand your problem so poorly that you have no idea where the probability is concentrated, you’re going to have a hard time evaluating your results.
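If the log density is available, one common way to find such a starting point is to maximize it numerically and start the chain at the optimum. A minimal sketch, assuming a stand-in log density and SciPy's general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# A hedged sketch of "just start there": locate a high probability region
# by maximizing the log density (minimizing its negative), then start the
# chain at the optimum. log_p here is a stand-in; in practice it would be
# your model's log posterior.
def log_p(x):
    return -0.5 * ((x[0] - 3.0)**2 / 0.25 + (x[1] + 1.0)**2)

result = minimize(lambda x: -log_p(x), x0=np.zeros(2))
start = result.x  # near the mode (3, -1): begin sampling here, nothing to burn off
print(start)
```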