Probability resources

Each Wednesday I post a list of notes on some topic. This week it’s probability.

See also posts tagged probability and statistics.

Last week: Python resources

Next week: Regular expression resources

After a coin comes up heads 10 times

Suppose you’ve seen a coin come up heads 10 times in a row. What do you believe is likely to happen next? Three common responses:

  1. Heads
  2. Tails
  3. Equal probability of heads or tails

Each is reasonable in its own context. The last answer is correct assuming the flips are independent and heads and tails are equally likely.

But as I argued here, if you see nothing but heads, you have reason to question the assumption that the coin is fair. So there’s some justification for the first answer.
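
As a rough illustration of that kind of updating (a sketch of my own with an assumed uniform prior, not the argument in the post linked above): put a Beta(1, 1) prior on the coin’s probability of heads. After ten heads and no tails the posterior is Beta(11, 1), and the probability that the next flip is heads rises to 11/12.

```python
# A minimal sketch, assuming a uniform Beta(1, 1) prior on the coin's
# probability of heads (the prior is my choice, for illustration only).
# After h heads and t tails the posterior is Beta(a + h, b + t), and the
# posterior predictive probability of heads is its mean.

def predictive_prob_heads(h, t, a=1.0, b=1.0):
    """P(next flip is heads) under a Beta(a, b) prior, after h heads and t tails."""
    return (a + h) / (a + b + h + t)

print(predictive_prob_heads(10, 0))  # 11/12 ≈ 0.917
```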

The reasoning behind the second answer is that tails are “due.” This isn’t true if you’re looking at independent flips of a fair coin, but it could be reasonable in other settings, such as sampling without replacement.

Say there are a number of coins on a table, covered by a cloth. A fixed number are heads up and a fixed number tails up. You reach under the cloth and slide a coin out. Every head you pull out increases the chances that the next coin will be tails. If there were an equal number of heads and tails under the cloth to begin with, then after pulling out 10 heads, tails are indeed more likely next time.
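
A quick calculation with made-up counts shows the effect. Suppose the cloth covers 50 heads-up and 50 tails-up coins; after sliding out 10 heads, 40 heads and 50 tails remain, so the next coin is more likely to be tails.

```python
# Made-up counts for illustration: 50 heads-up and 50 tails-up coins
# under the cloth. Sampling is without replacement, so removing heads
# shifts the odds toward tails.

heads_up, tails_up = 50, 50   # assumed starting counts
drawn_heads = 10              # heads already slid out

remaining_heads = heads_up - drawn_heads
p_tails_next = tails_up / (remaining_heads + tails_up)
print(f"P(next coin is tails) = {p_tails_next:.3f}")  # 50/90 ≈ 0.556
```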

Related post: Long runs

First two impressions of statistics

When I was a postdoc I asked a statistician a few questions and he gave me an overview of his subject. (My area was PDEs; I knew nothing about statistics.) I remember two things that he said.

  1. A big part of being a statistician is knowing what to do when your assumptions aren’t met, because they’re never exactly met.
  2. A lot of statisticians think time series analysis is voodoo, and he was inclined to agree with them.

Blue Bonnet Bayes

Blue Bonnet™ used to run commercials with the jingle “Everything’s better with Blue Bonnet on it.” Maybe they still do.

Perhaps in reaction to knee-jerk antipathy toward Bayesian methods, some statisticians have adopted knee-jerk enthusiasm for Bayesian methods. Everything’s better with Bayesian analysis on it. Bayes makes it better, like a little dab of margarine on a dry piece of bread.

There’s much that I prefer about the Bayesian approach to statistics. Sometimes it’s the only way to go. But Bayes-for-the-sake-of-Bayes can expend a great deal of effort, both human and computational, to arrive at a conclusion that could have been reached far more easily by other means.

Related: Bayes isn’t magic

Common sense and statistics

College courses often begin by trying to weaken your confidence in common sense. For example, a psychology course might start by presenting optical illusions to show that there are limits to your ability to perceive the world accurately. I’ve seen at least one physics textbook that also starts with optical illusions to emphasize the need for measurement. Optical illusions, however, take considerable skill to create. The fact that they are so contrived illustrates that your perception of the world is actually pretty good in ordinary circumstances.

For several years I’ve thought about the interplay of statistics and common sense. Probability is more abstract than physical properties like length or color, and so common sense is more often misguided in the context of probability than in visual perception. In probability and statistics, the analogs of optical illusions are usually called paradoxes: St. Petersburg paradox, Simpson’s paradox, Lindley’s paradox, etc. These paradoxes show that common sense can be seriously wrong, without having to consider contrived examples. Instances of Simpson’s paradox, for example, pop up regularly in application.
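
Here is one concrete sketch of how such an instance can arise (the counts below are invented for illustration, not taken from any study): a treatment can have the higher success rate within every subgroup and still have the lower rate in the pooled totals, when the treatments are applied to the subgroups unevenly.

```python
# Invented counts illustrating Simpson's paradox: treatment A has the
# higher success rate within each subgroup, yet the lower rate overall,
# because the two treatments are applied to the subgroups unevenly.

groups = {
    "mild cases":   {"A": (81, 87),   "B": (234, 270)},  # (successes, trials)
    "severe cases": {"A": (192, 263), "B": (55, 80)},
}

for name, counts in groups.items():
    for t in ("A", "B"):
        s, n = counts[t]
        print(f"{name}, treatment {t}: {s}/{n} = {s/n:.0%}")

for t in ("A", "B"):
    s = sum(counts[t][0] for counts in groups.values())
    n = sum(counts[t][1] for counts in groups.values())
    print(f"overall, treatment {t}: {s}/{n} = {s/n:.0%}")
# A wins in each subgroup (93% vs 87%, 73% vs 69%) but loses overall (78% vs 83%).
```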

Some physicists say that you should always have an order-of-magnitude idea of what a result will be before you calculate it. This implies a belief that such estimates are usually possible, and that they provide a sanity check for calculations. And that’s true in physics, at least in mechanics. In probability, however, it is quite common for even an expert’s intuition to be way off. Calculations are more likely to find errors in common sense than the other way around.

Nevertheless, common sense is vitally important in statistics. Attempts to minimize the need for common sense can lead to nonsense. You need common sense to formulate a statistical model and to interpret inferences from that model. Statistics is a layer of exact calculation sandwiched between necessarily subjective formulation and interpretation. Even though common sense can go badly wrong with probability, it can also do quite well in some contexts. Common sense is necessary to map probability theory to applications and to evaluate how well that map works.

Inverted sense of risk

Watching the news gives you an inverted sense of risk.

We fear bad things that we’ve seen on the news because they make a powerful emotional impression. But the things rare enough to be newsworthy are precisely the things we should not fear. Conversely, the risks we should be concerned about are the ones that happen too frequently to make the news.

Erasure coding white paper

Last year I worked with Hitachi Data Systems to evaluate the trade-offs between replication and erasure coding as ways to increase data storage reliability while minimizing costs. This led to a white paper that has just been published:

Compare Cost and Performance of Replication and Erasure Coding
Hitachi Review Vol. 63 (July 2014)
John D. Cook
Robert Primmer
Ab de Kwant
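
The paper works through the actual cost and performance analysis. As a rough back-of-the-envelope sketch of the basic storage trade-off (the parameters below are mine, chosen for illustration, not the paper’s): 3× replication stores three bytes per byte of data and survives two device losses, while a hypothetical (10, 4) erasure code stores 1.4 bytes per byte and survives four.

```python
# A rough back-of-the-envelope comparison, not taken from the paper:
# storage overhead and number of simultaneous device losses tolerated,
# for n-way replication vs. a (k, m) erasure code that stores k data
# fragments plus m parity fragments and can rebuild from any k of them.

def replication(copies):
    overhead = copies          # bytes stored per byte of data
    tolerated = copies - 1     # device losses survivable without data loss
    return overhead, tolerated

def erasure_code(k, m):
    overhead = (k + m) / k     # bytes stored per byte of data
    tolerated = m              # any m of the k + m fragments can be lost
    return overhead, tolerated

print("3x replication:       overhead %.2fx, tolerates %d losses" % replication(3))
print("(10, 4) erasure code:  overhead %.2fx, tolerates %d losses" % erasure_code(10, 4))
```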