Suppose you are designing an autonomous system that will gather data and adapt its behavior to that data.
At first you face the so-called cold-start problem. You don’t have any data when you first turn the system on, and yet the system needs to do something before it has accumulated data. So you prime the pump by having the system act at first on what you believe to be true prior to seeing any data.
Now you face a problem. You initially let the system operate on assumptions rather than data out of necessity, but you’d like to go by data rather than assumptions once you have enough data. Not just some data, but enough data. Once you have a single data point, you have some data, but you can hardly expect a system to act reasonably based on one datum.
Rather than abruptly moving from prior assumptions to empirical data, you’d like the system to gradually transition from reliance on the former to reliance on the latter, weaning the system off initial assumptions as it becomes more reliant on new data.
The delicate part is how to manage this transition. How often should you adjust the relative weight of prior assumptions and empirical data? And how should you determine what weights to use? Should you set the weight given to the prior assumptions to zero at some point, or should you let the weight asymptotically approach zero?
Fortunately, there is a general theory of how to design such systems. It’s called Bayesian statistics. The design issues raised above are all handled automatically by Bayes’ theorem. You start with a prior distribution summarizing what you believe to be true before collecting new data. Then as new data arrive, the magic of Bayes’ theorem adjusts this distribution, putting more emphasis on the empirical data and less on the prior. The effect of the prior gradually fades away as more data become available.
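For concreteness, here is a minimal sketch of that weaning in the simplest conjugate setting: a Beta prior on a success probability with Bernoulli data. The Beta(2, 2) prior and the true rate of 0.8 are illustrative choices, not anything specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

a, b = 2.0, 2.0        # Beta(2, 2) prior: what we believe before any data
true_p = 0.8           # the unknown quantity the system is trying to learn

for n in [0, 1, 10, 100, 1000]:
    data = rng.random(n) < true_p               # n Bernoulli observations
    post_a = a + data.sum()
    post_b = b + n - data.sum()
    # The posterior mean blends the prior mean (0.5) with the sample mean,
    # and the prior's share of the weight shrinks as n grows.
    print(f"n={n:5d}  posterior mean = {post_a / (post_a + post_b):.3f}")
```

The posterior mean (a + successes) / (a + b + n) is a weighted average of the prior mean and the observed proportion, with the prior carrying weight (a + b) / (a + b + n), so the transition away from the prior happens automatically and gradually.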
Reading the first lines of this post I became that annoying kid who always raises his hand and shouts out the answer while the question is still being asked. “Bayes!” I was shouting. “UKF is Bayes at the limit!” I said as my hand waved wildly.
Of course it is. It’s nice seeing a post I already know a bit about. Once in a while. Not too often, though.
More to the point, having good priors but wanting to get away from them ASAP is a problem I first encountered in “extreme sensing”, where delicate sensors may have nearly random startup transients depending on multiple factors in the startup environment.
The simple and powerful exponential filter is often a great way to start, but the desire to vary the smoothing constant as the data improves soon leads one toward “smarter” filters. Another factor favoring the exponential filter is its terrific space/time performance, a critical factor in low-power real-time embedded systems. Getting acceptable dynamic filter performance from limited compute and memory resources often forces a pragmatic choice over the ideal.
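A rough sketch of that kind of adaptation, in Python rather than embedded C; the schedule alpha = max(alpha_min, 1/(n + prior_weight)) and the function name are illustrative only, not a production filter:

```python
def weaned_exponential_filter(samples, prior_guess, prior_weight=5.0,
                              alpha_min=0.05):
    """Exponential filter seeded with a prior guess. The smoothing constant
    alpha = 1/(n + prior_weight) treats the prior as prior_weight
    pseudo-samples, so its influence fades as real samples arrive; once
    alpha reaches alpha_min the filter behaves like an ordinary EWMA."""
    estimate = prior_guess
    for n, x in enumerate(samples, start=1):
        alpha = max(alpha_min, 1.0 / (n + prior_weight))
        estimate += alpha * (x - estimate)
        yield estimate

# Example: a noisy startup transient settling toward a steady reading,
# starting from a prior guess of 0.0.
readings = [5.0, 3.0, 2.2, 2.1, 1.9, 2.0, 2.05, 1.95]
for est in weaned_exponential_filter(readings, prior_guess=0.0):
    print(f"{est:.3f}")
```

While alpha is above its floor, the estimate is exactly a running mean that includes the prior guess as a few pseudo-samples; after that it gracefully becomes a plain exponential filter that can track drift.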
Running a Bayesian analysis and a UKF on many lab data runs allowed the suitability of the final filter to be evaluated with confidence.
Of course, now that embedded processors are 32-bit, have FPUs and megabytes of memory, and run on milliwatts, there is far less need for the above exercise. Especially given the quality of filters available in standard embedded math libraries.
I left “bare metal” programming when it became too easy, following the problem domain into Systems Engineering.
I think this oversimplifies sequential decision processes. If you’re just trying to estimate something, then Bayes is the correct answer. But if future data is a function of previous decisions, then you’ll have feedback loops which could bias the relationships found in this data-generating process. I don’t think Bayes has much to say about this.
If you’re not just interested in estimation, you still need to make predictions about future outcomes, and posterior predictive probability is the natural way to do that. That’s how bandit problems work, for example. So it seems you’d still use Bayes’ theorem even if you’re not just interested in estimation.
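To make that concrete, here is a rough sketch of Thompson sampling on two Bernoulli arms, where posterior draws drive the decisions and those decisions determine which data come back; the arm rates and the Beta(1, 1) priors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rates = [0.3, 0.6]        # unknown success probabilities per arm
a = np.ones(2)                 # Beta posterior parameters for each arm
b = np.ones(2)

for _ in range(1000):
    theta = rng.beta(a, b)             # one posterior draw per arm
    arm = int(np.argmax(theta))        # decide based on the draws
    reward = rng.random() < true_rates[arm]
    a[arm] += reward                   # the decision determines which
    b[arm] += 1 - reward               # arm's data we get to observe

print("posterior means:", a / (a + b))
print("pulls per arm:  ", (a + b - 2).astype(int))
```

The feedback loop is real (the better-looking arm gets sampled more), but the posterior over each arm stays a valid summary of the data actually observed for that arm, which is what lets the posterior draws keep driving sensible decisions.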