Falling power analog of binomial theorem

Yesterday I wrote about how the right notation could make Newton’s interpolation theorem much easier to remember, revealing it as an analog of Taylor series. This post will do something similar for the binomial theorem.

Let’s start with the following identity.

(x + y)(x + y - 1)(x + y - 2) =

x(x - 1)(x - 2) + 3x(x - 1)y + 3xy(y - 1) + y(y - 1)(y - 2)

It’s not clear that this is true, or how one might generalize it. But if we rewrite the equation using falling power notation we have

(x + y)^{\underline{3}} = x^{\underline{3}} + 3x^{\underline{2}} y^{\underline{1}} + 3x^{\underline{1}}y^{\underline{2}} + y^{\underline{3}}

which looks a lot like the binomial theorem. In fact it is the case n = 3 of the Chu-Vandermonde theorem, which says

(x + y)^{\underline{n}} = \sum_{k=0}^n \binom{n}{k} x^{\underline{n-k}} y^{\underline{k}}

Viewed purely visually, this is the binomial theorem with little lines under each exponent.

Incidentally, the analogous theorem holds for rising powers. Just change all the lines under the exponents to lines on top of the exponents.
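
If you want to check the identity numerically, here’s a quick sketch in Python. The helper function falling is my own, written for illustration.

from math import comb

def falling(x, n):
    # falling power: x(x-1)(x-2)...(x-n+1)
    out = 1
    for j in range(n):
        out *= x - j
    return out

x, y, n = 7, 5, 3
lhs = falling(x + y, n)
rhs = sum(comb(n, k) * falling(x, n - k) * falling(y, k) for k in range(n + 1))
print(lhs == rhs)  # True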

Why eliminate trusted third parties?

Suppose one company would like to buy another company’s client list, but only if the lists don’t overlap too much. Neither company wants to hand over their list to the other before a sale takes place. What can they do?

A low-tech solution would be for both parties to provide their client lists to a trusted third party who will report back how much the lists overlap. That may be the best thing to do.

But it is possible to solve this problem without a trusted third party. With homomorphic encryption, the companies can exchange encrypted versions of their client lists that will allow both to calculate the amount of overlap without revealing any further information.
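
To make the idea concrete, here is a toy sketch of one way to compute the overlap without either party revealing its list. It uses commutative blinding (a Diffie-Hellman-style private set intersection) rather than the homomorphic encryption described above, and the modulus is far too small for real security; everything here is for illustration only.

import hashlib
import secrets

# Toy private set intersection cardinality via commutative blinding.
# Real protocols use elliptic curves and vetted libraries.
P = 2**127 - 1  # a Mersenne prime; fine for a toy, too small for security

def h(item):
    # Hash an item to a group element mod P.
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, secret):
    # Raise each hashed item to a secret exponent mod P.
    return {pow(h(x), secret, P) for x in items}

def reblind(values, secret):
    # Apply a second secret exponent to already-blinded values.
    return {pow(v, secret, P) for v in values}

a = secrets.randbelow(P - 2) + 1  # company A's secret
b = secrets.randbelow(P - 2) + 1  # company B's secret

list_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
list_b = {"bob@example.com", "carol@example.com", "dave@example.com"}

# Each company blinds its own list and sends it to the other, who blinds
# it again. Exponentiation commutes, so shared clients produce identical
# double-blinded values, and only the count of matches is learned.
double_a = reblind(blind(list_a, a), b)
double_b = reblind(blind(list_b, b), a)

print("overlap:", len(double_a & double_b))  # 2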

But why go to the effort? Many peer-to-peer technologies raise this question. So you’ve eliminated a third party; what’s so great about that? If you can send someone cryptocurrency, for example, without going through an intermediary like a bank or credit card company, what good is that if the transaction fees are no lower?

It’s often not worth using sophisticated technology to eliminate a trusted third party, but sometimes it is. Here are some reasons the technology might be worth using.

Transaction speed

The two companies hiring a third party to compare their lists have to wait on the third party, and the amount of time they need to wait is beyond their control. Maybe that’s acceptable for a one-time transaction, but not for repeated transactions. With homomorphic encryption, transactions could be automated and the only delay would be computation time.

Reproducibility

Sharing limited information via encryption reduces legal risk. If either party later accuses the other of providing incorrect information, the accused party can demonstrate that the software applied to the data gives the reported result.

Trust

To paraphrase Bob Dylan, you gotta trust somebody. Some technologies are labeled “zero-trust” or “trust no one,” but these terms need to be understood in context. When a company asks you to run a particular piece of software on your proprietary data and share the results, you have to trust that the software is not malicious or buggy.

Instead of trusting that a third party holding your data is honest and competent, you trust that a third party developing software is honest and competent. You have to decide that the software product is trustworthy. You might test the software on some sample data. Maybe you inspect the source code if it’s available. But at some point you have to trust the software and the context it runs in.


Discrete Taylor series

Newton’s interpolation formula looks awfully complicated until you introduce the right notation. With the right notation, it looks like a Taylor series. Not only is this notation simpler and more memorable, it also suggests extensions.

The notation we need comes in two parts. First, we need the forward difference operator Δ defined by

\Delta f(x) = f(x+1) - f(x)

and its extension Δ^k defined by applying Δ to a function k times.

The other piece of notation we need is falling powers, denoted with a little bar under the exponent:

 x^{\underline{k}} = x(x-1)(x-2) \cdots (x - k + 1)

The Δ operator is meant to resemble the differential operator D and falling powers are meant to resemble ordinary powers. With this notation, Newton’s interpolation formula based at a

f(x) = \sum_{k=0}^\infty \frac{\Delta^k f(a)}{k!} (x - a)^{\underline{k}}

looks analogous to Taylor’s formula for a power series based at a

f(x) = \sum_{k=0}^\infty \frac{D^k f(a)}{k!} (x - a)^k.

Newton’s formula applies to polynomials, and the infinite sum is actually a finite sum because Δ^k f(a) is 0 for all k beyond some point.

Newton’s formula is a discrete analog of Taylor’s formula because it only uses the values of f at discrete points, i.e. at the integers (shifted by a) and only involves finite operations: finite differences do not involve limits.
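
Here’s a minimal sketch in Python checking the formula on a cubic. The function names are mine; the code just translates the definitions above.

from math import factorial

def falling_power(x, k):
    # x(x-1)(x-2)...(x-k+1)
    out = 1
    for j in range(k):
        out *= x - j
    return out

def forward_diffs(f, a, n):
    # Return [f(a), Δf(a), ..., Δ^n f(a)] using Δf(x) = f(x+1) - f(x).
    vals = [f(a + i) for i in range(n + 1)]
    diffs = [vals[0]]
    for _ in range(n):
        vals = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
        diffs.append(vals[0])
    return diffs

f = lambda x: 2*x**3 - x + 5  # a cubic, so Δ^k f = 0 for k > 3
a = 0
d = forward_diffs(f, a, 3)

def newton(x):
    return sum(d[k] * falling_power(x - a, k) / factorial(k) for k in range(4))

print(all(newton(x) == f(x) for x in range(-5, 6)))  # True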

Convergence

As I mentioned on @AlgebraFact this morning,

It’s often useful to write a finite series as an infinite series with a finite number of non-zero coefficients.

This eliminates the need to explicitly track the number of terms, and may suggest what to do next.

Writing Newton’s formula as an infinite series keeps us from having to write down one version for linear interpolation, another version for quadratic interpolation, another for cubic interpolation, etc. (It’s a good exercise to write out these special cases when you’re new to the topic, but then remember the infinite series version going forward.)

As for suggesting what to do next, it’s natural to explore what happens if the infinite series really is infinite, i.e. if f is not a polynomial. Under what circumstances does the series converge? If it does converge to something, does it necessarily converge to f(x) at each x?

The example f(x) = sin(πx) shows that Newton’s theorem can’t always hold, because for this function, with a = 0, the series on the right-hand side of Newton’s formula is identically zero: sin(πx) vanishes at every integer, so all the Δ^k terms are zero. But Carlson’s theorem [1] essentially says that for an entire function that grows more slowly than sin(πx) along the imaginary axis, the series in Newton’s formula converges to f.

Saying that a function is “entire” means that it is analytic in the entire complex plane. This means that Taylor’s series above has to converge everywhere in order for Newton’s series to converge [2].


[1] Carlson with no e. Not to be confused with Carleson’s theorem on the convergence of Fourier series.

[2] Carlson’s original theorem requires f to be entire. Later refinements show that it’s sufficient for f to be analytic in the open right half plane and continuous on the closed right half plane.

Podcast feed

The previous post was an AI-generated podcast that a friend made by crawling my web site. I decided to create an actual podcast for posting occasional audio files. I expect to post very sporadically. I’ve posted two audio files, and I have one more in mind to post some day. Maybe that’ll be the end of it, or maybe I’ll post more.

The first file I posted was the one from the previous post. The second was an interview I did with the late Sir Michael Atiyah.

Here’s the RSS feed for the podcast. You can also find the podcast via my Substack newsletter.

RSA security in light of news

A recent article reported on the successful factoring of a 512-bit RSA key. The process took $8 worth of rented computing power. What does that say about the security of RSA encryption?

The following Python function estimates the computation steps required to factor a number b bits long using the best known algorithm. We’re going to take a ratio of two such estimates, so proportionality constants will cancel.

from math import exp, log

def f(b):
    # number field sieve heuristic; note (8/3)^(2/3) = (64/9)^(1/3)
    logn = b*log(2)
    return exp((8/3)**(2/3) * logn**(1/3) * (log(logn))**(2/3))

The minimum recommended RSA key size is 2048 bits. The cost ratio for factoring a 2048-bit key versus a 512-bit key is f(2048) / f(512), which is on the order of 10^16. So factoring a 2048-bit key would take on the order of 80 quadrillion dollars.
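
As a sanity check on that exponent, using the function f above:

from math import log10

print(log10(f(2048) / f(512)))  # about 16, i.e. a ratio on the order of 10^16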

This is sort of a back-of-the-envelope estimate. There are things it doesn’t take into account. If a sophisticated and highly determined entity wanted to factor a 2048-bit number, they could probably do so for less than 80 quadrillion dollars. But it’s safe to say that the people who factored the 512-bit key are unlikely to factor a 2048-bit key by the same approach.

Can AI models reason like a human?

We’re awaiting the release of OpenAI’s o3 model later this month. Its performance is impressive on very hard benchmarks like SWE-bench Verified, FrontierMath, and the ARC AGI benchmark (discussed previously in this blog).

And yet at the same time some behaviors of the frontier AI models are very concerning.

Their performance on assorted math exams is outstanding, but they make mistakes doing simple arithmetic, like wrongly multiplying numbers that are a few digits long. Performance of the o1 preview model on the difficult Putnam math exam is excellent but drops precipitously under simple changes like renaming constants and variables in the problem statement.

Similarly, when o1 is applied to a planning benchmark expressed in standardized language, it performs well, but accuracy falls apart when it is applied to a mathematically equivalent planning problem in a different domain. Also, a given AI model applied to the simple ROT13 cipher can have wildly different performance depending on the value of the cipher key, suggesting the models don’t really understand the algorithm.

It was the best of times, it was the worst of times, . . .

What is going on here?

For years now, some have made claims of “human-level performance” for various deep learning algorithms. And as soon as one party starts making claims like this, it’s hard for the others to resist doing the same.

The confusion is that, from a certain point of view, the claim of “human-level” is true—but the definition of “human-level” is fraught.

Here, “human-level” is taken to mean achievement of some high score on a benchmark set, ostensibly exceeding some human performance measure on the same benchmark. However, a single AI model can vary wildly in capability across behaviors—“smart” compared to humans in some ways, “dumb” in others.

For humans, test-taking is a proxy for measuring a range of skills and abilities. And even for humans it is not always an accurate proxy. A person can perform very well on academic tests and very poorly on the job, or vice versa.

And the capability ratios for AI models are very different still, in ways we don’t fully understand. So, outscoring humans on a software engineering benchmark doesn’t mean the AI has the whole panoply of coding skills, decision-making abilities, software architecture design savvy, etc., needed to be a competent software engineer.

It’s no surprise that recent articles (below) show a growing perception of the limitations of AI benchmarks as currently conceived.

Ways forward

Perhaps we should consider developing requirements like the following before claiming human-level reasoning performance of an AI model:

  • It should be able to “explain its work” at any level of detail to another human (just like a human can), in a way that that human can understand.
  • It should be able to give answers without “hallucinating” or “confabulating” (yes, humans can hallucinate too, but most occupations would not be well-served by an employee who hallucinates on the job).
  • It should be able to reliably and consistently (100% of the time) do things that we routinely expect a human or computer to do accurately (like add or multiply two numbers accurately, for things like filling out tax returns or doing engineering calculations to build an airplane).
  • It should be frank and honest in assessing its level of certainty about an answer it gives (no gaslighting).
  • It should be able to solve a trivial perturbation of a given problem with the same ease as the original problem (to the same extent that a human can).
  • As someone has said, it should be able to do, without specific training, what a 5-year-old can do without specific training.
  • This one sounds good, from Emmett Shear: “AGI [artificial general intelligence] is the ability to generalize [without special training by a human] to an adversarially chosen new benchmark.”

AI models are fantastic and amazing tools—and best used when one has eyes wide open about their limitations.

Have you had problems with AI model performance? If so, please share in the comments.

References

Rethinking AI benchmarks: A new paper challenges the status quo of evaluating artificial intelligence, https://venturebeat.com/ai/rethinking-ai-benchmarks-a-new-paper-challenges-the-status-quo-of-evaluating-artificial-intelligence/

Rethink reporting of evaluation results in AI, https://www.science.org/doi/10.1126/science.adf6369, https://eprints.whiterose.ac.uk/198211/

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, https://arxiv.org/abs/2402.09880

Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless, https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless

Why we must rethink AI benchmarks, https://bdtechtalks.com/2021/12/06/ai-benchmarks-limitations/

AI and the Everything in the Whole Wide World Benchmark, https://arxiv.org/abs/2111.15366

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices, https://arxiv.org/abs/2411.12990

Goodhart’s Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy.

Quick change directory

One difficulty when working at a command line is navigating between directories, particularly between locations with long paths. There are several ways to mitigate this. One of the simplest is using cd - to return to the previous directory. Another is to use pushd and popd. Still another is to set the CDPATH variable.

qcd function

This post presents another approach that could be used instead of or in addition to the ones mentioned above. The book Efficient Linux at the Command Line by Daniel Barrett contains a function called qcd (quick change directory) that will cd to any of a list of commonly used directories. The function is essentially a big case statement taking a key and going to a corresponding directory.

qcd () {
  case "$1" in
    work)
      cd "$HOME/Work/Projects/Web/src/include"
      ;;
    recipes)
      cd "$HOME/Family/Cooking/Recipes"
      ;;
    …
  esac
  pwd
}

So, for example, qcd work will take you to the directory ~/…/include.

Barrett adds one more line after the definition of the qcd function:

complete -W "work recipes …" qcd

This turns on tab completion for the function, using the bash builtin complete. You use this builtin implicitly when you use tab completion with shell utilities, and you can call it as above to add the same command completion to your own functions. So, for example, with the code above, a user could type

qcd w TAB

instead of typing out qcd work.

Improvements

The book says “Store the function in a shell configuration file such as $HOME/.bashrc … source it, and it’s ready to run.” I’d like to make two comments about this.

First, it’s important that qcd is a function and not a script. Scripts run in a subshell, so running a cd command in a script changes your working directory while the script is running. But when the script finishes, the subshell exits, and the working directory is just as it was before running the script.

Second, if you use this function you’ll edit it frequently as you think of new directories to add. I’d rather not put it in my .bashrc file for that reason. Also, maybe I’d like to use it from a bash shell on a Linux box and from zsh on a Mac. So instead of putting the definition of qcd in my .bashrc file, I put it in a file qcd.sh and source that file from my .bashrc file.

When you add a new key and directory to the qcd script, you need to also add the key to the call to complete, otherwise you’ll be in the awkward situation of tab completion working for some directories but not others. It would be possible to write a fancier shell script that would fix this problem.

Generating qcd

My knowledge of shell scripting is minimal, and I’d like to keep it that way. If I need to do something complicated, I don’t do it in a shell script. So I wrote a Python script to generate the qcd.sh file from a dictionary of keys and directories. Someone fluent in shell scripting would find this unnecessarily complicated. To each his own.
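
Here’s a minimal sketch of what such a generator might look like, assuming a dictionary of keys and directories like the one below; the entries and output file name are placeholders. Note that it writes the complete line from the same dictionary, which sidesteps the synchronization problem mentioned earlier.

# Sketch of a generator for qcd.sh; dictionary entries are placeholders.
dirs = {
    "work":    "$HOME/Work/Projects/Web/src/include",
    "recipes": "$HOME/Family/Cooking/Recipes",
}

with open("qcd.sh", "w") as f:
    f.write('qcd () {\n  case "$1" in\n')
    for key, path in dirs.items():
        f.write(f'    {key})\n      cd "{path}"\n      ;;\n')
    f.write("  esac\n  pwd\n}\n")
    # Generate the complete call from the same keys as the case statement.
    f.write(f'complete -W "{" ".join(dirs)}" qcd\n')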

By the way, if you’re going to write a Python script, why not just write a Python script and be done with it rather than write a Python script to generate a shell script? For the same reason qcd is a function: cd inside a Python script will only change the working directory while the script is running. There is probably some way around this, but I didn’t want to take the time to figure it out.


Converse of RSA

The security of RSA encryption depends on the difficulty of factoring the product of two large primes. If you can factor large numbers efficiently, you can break RSA. But if you can break RSA, can you factor large numbers?

Sorta. It’s conceivable that there is a way to break RSA encryption without having to recover the private key. But if you can recover the private key, then you can factor efficiently. That is, if you can start with a public key (N, e), where N is the modulus and e is the encryption key, and recover the private key d, then you can efficiently factor N. See “Fact 1” in [1].

Let n = log₂ N. Then the algorithm alluded to above can be run in O(n³) time. But the best known factoring algorithms take more than O(exp(n^(1/3))) time.
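
Here’s a sketch of that reduction with a toy key. The idea is standard: ed − 1 is a multiple of φ(N), so write ed − 1 = 2^s t with t odd; for a random a, the sequence a^t, a^{2t}, a^{4t}, … ends at 1, and with probability at least 1/2 it passes through a square root of 1 other than ±1, which splits N.

import math, random

def factor_with_key(N, e, d):
    # Given a valid RSA pair (e, d) for modulus N, recover N's factors.
    k = e*d - 1          # a multiple of φ(N), so a^k = 1 mod N when gcd(a, N) = 1
    s, t = 0, k
    while t % 2 == 0:
        s, t = s + 1, t // 2
    while True:
        a = random.randrange(2, N - 1)
        g = math.gcd(a, N)
        if g > 1:        # extremely lucky: a shares a factor with N
            return g, N // g
        x = pow(a, t, N)
        for _ in range(s):
            y = pow(x, 2, N)
            if y == 1 and x not in (1, N - 1):
                p = math.gcd(x - 1, N)  # nontrivial square root of 1 splits N
                return p, N // p
            x = y

# Toy key, far too small to be secure
p, q = 61, 53
N, e = p*q, 17
d = pow(e, -1, (p - 1)*(q - 1))
print(factor_with_key(N, e, d))  # (61, 53) or (53, 61)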


[1] Dan Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Available here.

Unicode Steganography

Steganography attempts to prevent messages from being read by unintended recipients by hiding the messages rather than (or in addition to) encrypting them. Steganography is used when you not only want to keep your communication private, you want to hide the fact that you’ve communicated at all.

Fun fact: The words steganography and stegosaurus are related [1].

Famous example

A famous example of steganography was a secret message sent by Jeremiah Denton during the Vietnam War. While a prisoner of war, Denton was forced to participate in a Vietnamese propaganda video. He sent the word torture by blinking the Morse code for the letters in the word. You can find the video here.

[Video clip: Jeremiah Denton blinking Morse code in the propaganda video]

Famous non-example

Some alleged examples of steganography have turned out to be apophenia, seeing patterns where none exist. The book The Woman Who Smashed Codes details Elizebeth Smith’s introduction to cryptography: she was tasked with finding messages hidden in minor variations in Shakespeare’s handwriting, messages that were not there. The book goes on to describe her cryptographic work during WWII, deciphering messages that most certainly did exist.

Incidentally, Elizebeth Smith [2] married fellow cryptographer William F. Friedman. I wrote about Friedman’s index of coincidence a while back.

Enter Unicode

Randall Munroe said “I am endlessly delighted by the hopeless task that the Unicode Consortium has created for themselves.” One of the things that makes their task delightful and hopeless is trying to distinguish semantics from appearance.

For example, the capital letters at the beginning of the Roman and Greek alphabets have different Unicode values even though they look alike. A (U+0041) is a Roman letter and Α (U+0391) is a Greek letter, and so they’re not the same. Also, the Roman letter M (U+004D) is semantically different from the Roman numeral Ⅿ (U+216F) that represents 1,000.

But it quickly becomes impossible to consistently make such distinctions, and so Unicode is full of compromises. Should the letter i and the imaginary unit i have different code points? What about the symbol i for current and the unit basis vector i? You can’t have a different code point for every use of a symbol.

Because Unicode has numerous pairs of characters with identical appearance, it’s possible to hide binary data in Unicode text by using one member of a pair to represent a 0 and the other to represent a 1. So maybe d (U+0064 Latin Small Letter D) represents a 0 and ԁ (U+0501 Cyrillic Small Letter Komi De) represents a 1.
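
Here’s a toy sketch using exactly that pair of characters. Only the d’s in the cover text carry data, so the cover needs at least as many d’s as there are bits to hide. The cover sentence and function names are mine.

LATIN_D, KOMI_DE = "\u0064", "\u0501"  # d and ԁ

def hide(cover, bits):
    # Replace the i-th Latin d with Cyrillic komi de when bit i is 1.
    out, i = [], 0
    for ch in cover:
        if ch == LATIN_D and i < len(bits):
            out.append(KOMI_DE if bits[i] == "1" else LATIN_D)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("cover text has too few d's")
    return "".join(out)

def reveal(stego):
    # Read one bit from each d-like character.
    return "".join("1" if ch == KOMI_DE else "0"
                   for ch in stego if ch in (LATIN_D, KOMI_DE))

stego = hide("dads add daffodils daily", "1011010")  # seven d's, seven bits
print(stego)          # looks like the original to the eye
print(reveal(stego))  # 1011010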

There is a potential problem with this scheme. Unicode does not dictate appearance, and it’s entirely possible a typographer might create a font that has distinct glyphs for characters that are not distinct in other fonts.

Security

Look-alike characters are often used to create malicious URLs. For instance, someone might take “Microsoft.com” and substitute the Roman numeral Ⅿ for the first letter, or substitute a Greek omicron for one of the o’s.

Text that is expected to be ASCII should be converted to ASCII to prevent mistakes or malice, or else the user should be warned: “Do you really want to visit this URL that contains nine Roman letters and one Cyrillic letter?”
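
A crude version of such a check might just flag every non-ASCII character by name; the function below is hypothetical, for illustration.

import unicodedata

def audit(s):
    # Flag every non-ASCII character in a string expected to be ASCII.
    for ch in s:
        if ord(ch) > 127:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')} in {s!r}")

audit("Ⅿicrosoft.com")  # U+216F ROMAN NUMERAL ONE THOUSAND in 'Ⅿicrosoft.com'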

When I’m reading, I want fonts with broad Unicode support. No missing symbols, no jarring change in font for foreign words. But when I’m debugging, it would be nice to have the opposite, a xenophobic font that displays non-ASCII characters in some ugly way that makes them jump out. I imagine someone has developed such a font, but it’s hard to find because most people are looking for better Unicode support, not worse.


[1] Both derive from the Greek word for ‘cover’. Steganographic writing is covered in the sense of being hidden. A stegosaurus has armored plates that look like roof tiles, i.e. like the covering of a house.

[2] That’s not a typo. She spelled her name with ‘e’ as the fifth letter rather than the more common ‘a’.