Quantifying surprise with Kullback-Leibler divergence

Motivating example: planet spacing

My previous post showed that planets are roughly evenly distributed on a log scale, not just in our solar system but also in extrasolar planetary systems. I hadn’t seen this observation before; I stumbled on it while making some plots.

I didn’t think it was an original discovery—I assume someone did this exercise immediately when systems with several planets were discovered—but I didn’t know what this observation was called. I now know it’s known as the Titius-Bode law, a generalization of an observation about our solar system by Messrs. Titius and Bode a couple centuries ago. See, for example, [1].

Several people were skeptical of the claim that planets are distributed according to a power law and pointed out that uniformly distributed points can look fairly evenly distributed on a logarithmic scale. This is true, and it gets to the topic I want to discuss in this post. Planets are not spaced like uniform random samples (see [1]), and yet it is reasonable, at first glance, to ask whether they are.

Asymmetric surprise

If you’re expecting a power law, and you’re given uniformly distributed data, it doesn’t look too surprising. On the other hand, if you’re expecting uniformly distributed data and you see data distributed according to a power law, you are surprised. I’ll formalize this below.

If you’ve ever tried to make a scale model of our solar system, you were probably surprised that the planets are far from uniformly spaced. A scale model of our solar system, say at a museum, is likely to position a few of the inner planets to scale, and then use text to explain where the outer planets should be. For example, there may be a footnote saying “And if everything were to scale, Pluto would be behind the Exxon station at the end of the street.” This is an example of implicitly expecting a uniform distribution and receiving data distributed according to a power law.

Some people suspected that I was doing the opposite. By plotting distances on a log scale, I was implicitly expecting a power law distribution. Maybe the data were roughly uniform, but I fooled myself into seeing a power law.

Quantifying surprise

The Kullback-Leibler divergence from Y to X, written KL(X || Y), is the average surprise of seeing Y when you expected X. That’s one of the interpretations. See this post for more interpretations.

In general, Kullback-Leibler divergence is not symmetric. The divergence from X to Y typically does not equal the divergence from Y to X. The discussion above claims that the surprise from seeing power law data when expecting a uniform distribution is greater than the surprise from seeing uniform data when expecting a power law distribution. We show below that this is true.
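To make the lack of symmetry concrete, here is a minimal sketch using two made-up discrete distributions. The function name `kl` and the values of `p` and `q` are illustrative assumptions, not something from the planet data; the function just evaluates the sum of p_i log(p_i / q_i) with natural logarithms.

```python
import math

def kl(p, q):
    """Discrete KL(p || q) = sum of p_i * log(p_i / q_i), natural log.

    Terms with p_i == 0 are skipped (they contribute zero by convention).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical distributions on two outcomes (illustrative values only).
p = [0.5, 0.5]
q = [0.9, 0.1]

print("KL(p || q) =", kl(p, q))
print("KL(q || p) =", kl(q, p))
```

Swapping the arguments changes the answer, which is all the asymmetry amounts to. The divergence of a distribution from itself is zero.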

Let X be a random variable uniformly distributed on [0, 1] and let Y be a random variable with density proportional to x^α on the same interval, where α > 0. (The proportionality constant necessary to make the probability integrate to 1 is α + 1.) We will show that KL(X || Y) is greater than KL(Y || X).

First we calculate the two divergences.

$\begin{eqnarray*} \mathrm{KL}(X || Y) &=& - \int_0^1 f_X(x) \, \log\left(\frac{f_Y(x)}{f_X(x)} \right) \, dx \\ &=& -\int_0^1 1 \cdot \left( \log(\alpha+1) + \alpha \log x - \log 1 \right) \, dx \\ &=& \alpha - \log(\alpha+1) \end{eqnarray*}$

and

$\begin{eqnarray*} \mathrm{KL}(Y || X) &=& - \int_0^1 f_Y(x) \, \log\left(\frac{f_X(x)}{f_Y(x)} \right) \, dx \\ &=& -\int_0^1 (\alpha + 1)x^\alpha \left(\log 1 -\log(\alpha+1) - \alpha \log x \right) \, dx \\ &=& \log(\alpha+1) - \frac{\alpha}{1 + \alpha} \end{eqnarray*}$
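As a sanity check on the two integrals, here is a rough numerical sketch in standard-library Python. It uses a plain midpoint rule rather than a proper quadrature routine, so the agreement is only approximate, and the function names (`kl_numeric`, `power_pdf`, and so on) are my own labels for illustration.

```python
import math

def kl_numeric(p, q, n=100_000):
    """Midpoint-rule estimate of the integral of p(x) * log(p(x) / q(x)) over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoints avoid evaluating log at x = 0
        px = p(x)
        total += px * math.log(px / q(x))
    return total * h

def uniform_pdf(x):
    """Density of X, uniform on [0, 1]."""
    return 1.0

def power_pdf(alpha):
    """Density of Y: (alpha + 1) * x**alpha on [0, 1]."""
    return lambda x: (alpha + 1) * x**alpha

def kl_xy_closed(alpha):
    """Closed form for KL(X || Y) from the first derivation."""
    return alpha - math.log(alpha + 1)

def kl_yx_closed(alpha):
    """Closed form for KL(Y || X) from the second derivation."""
    return math.log(alpha + 1) - alpha / (1 + alpha)

for alpha in (0.5, 1.0, 2.0, 5.0):
    f_y = power_pdf(alpha)
    print(f"alpha = {alpha}")
    print("  KL(X || Y):", kl_numeric(uniform_pdf, f_y), "vs", kl_xy_closed(alpha))
    print("  KL(Y || X):", kl_numeric(f_y, uniform_pdf), "vs", kl_yx_closed(alpha))
```

The numerical and closed-form values should agree to several decimal places for these positive exponents. The midpoint rule converges slowly near the logarithmic singularity at 0, so a real quadrature routine would be the better choice for anything beyond a quick check.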

And here is a plot comparing the two results as a function of the exponent α.
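The comparison can also be made directly from the two closed forms. As a sketch (the name D is introduced here only for illustration), let D(α) be their difference:

$\begin{eqnarray*} D(\alpha) &=& \mathrm{KL}(X || Y) - \mathrm{KL}(Y || X) \\ &=& \alpha + \frac{\alpha}{1+\alpha} - 2\log(\alpha+1) \\ D'(\alpha) &=& 1 + \frac{1}{(1+\alpha)^2} - \frac{2}{1+\alpha} \\ &=& \left(\frac{\alpha}{1+\alpha}\right)^2 \end{eqnarray*}$

Since D(0) = 0 and D′(α) > 0 for α ≠ 0, the difference is positive for every α > 0, so KL(X || Y) exceeds KL(Y || X) for any positive exponent. For −1 < α < 0 the same argument shows the inequality reverses.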

***

[1] Timothy Bovaird, Charles H. Lineweaver; Exoplanet predictions based on the generalized Titius–Bode relation, Monthly Notices of the Royal Astronomical Society, Volume 435, Issue 2, 21 October 2013, Pages 1126–1138, https://doi.org/10.1093/mnras/stt1357

2 thoughts on “Asymmetric surprise”

Robert Matthews

29 July 2022 at 05:37

I think the labels of the two KL divergences on the graph need reversing.

Loved this application (was fascinated by TB’s Law as a kid; interesting how physical explanations of it are still lacking!). Also great way of understanding KL. Thank you!

Robert Matthews

29 July 2022 at 05:48

(Actually, the graph plots those KL divergences correctly; my bad!)
