When Benford’s law is exact

Eight years ago I wrote a blog post on the Leading digits of powers of 2. In that post I said that Benford’s law gives remarkably good predictions for the distribution of leading digits of 2n. In this post I’ll make that statement stronger.

A sequence obeys Benford’s law, in base 10, if the proportion of the first N elements of the sequence that begin with the digit d is asymptotically

N log10(1 + 1/d).

Benford’s law applies to powers of 2, but also to factorials, for example. In general it applies only in the limit as N goes to infinity. For a large but finite value of N it gives a good approximation, but there’s no guarantee on how good the approximation is.  Except sometimes there is.

You could use Benford’s law to estimate how many times n! begins with a 1, but it can tell you exactly how many times 2n begins with a 1. The exact number is the estimate given by Benford’s law with d = 1, rounded down to the nearest integers, i.e. the floor of the expression above.

N log10(2)⌋

See [1] for a proof. The following Python code demonstrates this.

    import decimal
    from math import log10, floor

    def exact(N):
        c = 0
        a = decimal.Decimal(1)
        for i in range(1, N+1):
            a = a*2
            if int(str(a)[0]) == 1:
                c += 1
        return c

    def estimated(N):
        return N*log10(2)

    for i in range(1, 10000):
        assert(exact(i) == int(floor(estimated(i))))

The code uses the decimal class for efficiency. See Andrew Dalke’s comment on the original post. [2]

Benford’s law predicts exactly how often the sequence 2n begins with a 1. It also predicts exactly how often it begins with 4, but it is not exact for other digits.

Related posts

[1] Zhaodong Cai, Matthew Faust, A. J. Hildebrand, Junxian Li, Yuan Zhang. The Surprising Accuracy of Benford’s Law in Mathematics. The American Mathematical Monthly, Vol. 127, No. 3 (MARCH 2020), pp. 217–237

[2] The function exact is efficient for individual values of N, but there is a lot of redundant calculation when you use it to test a range of N‘s as we do in the loop with the assert statement. It wouldn’t be hard to rewrite the code to have it more efficiently check a range.

Leave a Reply

Your email address will not be published. Required fields are marked *