The ellipse hidden inside Pascal’s triangle

The nth row of Pascal’s triangle contains the binomial coefficients C(n, r) for r ranging from 0 to n. For large n, if you print the numbers in the nth row vertically in binary, you can see a circular arc.

Here’s an example with n = 50.

You don’t have to use binary. Here are the numbers in the row for n = 100 written in base 10. You may be able to read the numbers if you squint, but in any case you can see that the shape of the curve is something like a piece of a circle.

The length of the numerical representation of a number is roughly proportional to its logarithm. Changing the base only changes the proportionality constant. The examples above suggest that a plot of the logarithms of a row of Pascal’s triangle will be a portion of a circle, up to some scaling of one of the axes, so in general we have an ellipse.

Here’s a plot of log C(n, r) for n = 1000. The shape is the same for all large n, so the choice of n = 1000 doesn’t matter much at all.

The best fitting ellipse is

\left(\frac{r - n/2}{a}\right)^2 + \left(\frac{y-y_0}{b}\right)^2= 1

where a = 554.2, b = 47.12, and y0 = −19.87.

The plots of log C(1000, r) and the ellipse lie on top of each other; the error is less than the width of a plotting line. Here’s a plot of the relative error in approximating log C(1000, r) with the ellipse.
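
Here’s a minimal Python sketch of how one might reproduce such a fit (my own reconstruction, not the code behind the plots above). It computes log C(n, r) stably via the log-gamma function and fits the upper half of an ellipse of the form above with scipy’s curve_fit; the starting guess p0 is an arbitrary assumption.

import numpy as np
from scipy.special import gammaln
from scipy.optimize import curve_fit

n = 1000
r = np.arange(n + 1)
# log C(n, r), computed stably via the log-gamma function
logC = gammaln(n + 1) - gammaln(r + 1) - gammaln(n - r + 1)

# upper half of the ellipse above, centered at r = n/2
def ellipse(r, a, b, y0):
    return y0 + b*np.sqrt(np.clip(1 - ((r - n/2)/a)**2, 0, None))

params, _ = curve_fit(ellipse, r, logC, p0=(0.55*n, logC.max(), 0))
print(params)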

Parabolic fit

Wolfgang pointed out that the curve should be a parabola rather than an ellipse because the binomial distribution is asymptotically normal. Makes perfect sense.

So I redid my plots with the parabola that interpolates log C(n, r) at 0, n/2, and n. This also gives a very good fit, but not as good!

But that’s not a fair comparison because it’s comparing the best (least squares) elliptical fit to a convenient parabolic fit.

So I redid my plots again with the least squares parabolic fit. The fit was better, but still not as good as the elliptical fit.
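
In the same sketch, the least squares parabola is one line with np.polyfit, and the maximum errors of the two fits can be compared directly:

# continuing the sketch above: least squares parabola for comparison
parabola = np.polyval(np.polyfit(r, logC, 2), r)

print(np.abs(ellipse(r, *params) - logC).max())  # elliptical fit
print(np.abs(parabola - logC).max())             # parabolic fit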

Part of the reason the ellipse fits better than the parabola has to do with the limitations of the central limit theorem. First of all, it applies to CDFs, not PDFs. Second, it applies to absolute error, not relative error. In practice, the CLT gives a good approximation in the middle but not in the tails. With all the curves mentioned above, the maximum error is in the tails.

Another part of the reason the ellipse fits better is that ellipses are more flexible than parabolas, and the discussion above suggests such flexibility would be useful. You can approximate a parabola arbitrarily well with an ellipse, so in the limit, elliptical fits include parabolic fits, as demonstrated in the next post.

Stacking positive and negative cannonballs

Imagine you are a soldier in charge of stacking cannonballs. Your fort has a new commander, a man with OCD who wants the cannonballs stacked in a very particular way.

The new commander wants the balls stacked in tetrahedra. The balls on the bottom of the stack are arranged into a triangle. Then the next layer is also a triangle, each ball resting in the space between balls in the previous layer. Keep doing this until there’s one ball on top.

For example, you might have 10 balls on the first layer, 6 on the next layer, then 3, then 1. That’s how you’d stack 20 cannonballs.

The number of cannonballs in a tetrahedron n layers high is called the nth tetrahedral number. We just showed that the 4th tetrahedral number is 20.

Not every number is a tetrahedral number, so it’s not always possible to stack a given number of cannonballs into a tetrahedron. But it’s always possible, as far as we know, to stack any number of cannonballs into no more than five tetrahedra. Usually it is possible to use no more than four tetrahedra. Naturally, your new commander would like the cannonballs arranged into no more than four tetrahedra.

There are two ways of meeting the commander’s wishes. One is to make a list of the number of cannonballs that cannot be arranged into four tetrahedral stacks and have a standing order that whenever the cannon fire is over, you always shoot one more ball to avoid having to stack a forbidden number of balls.

The other is to allow the possibility of negative cannonballs.

We explain both possibilities below.

Tetrahedral numbers

Let T(n, 2) be the number of cannonballs on the bottom layer of a tetrahedron, arranged into a triangle with n balls on each side. This is the nth triangular number. The 2 in T(n, 2) stands for two dimensions.

T(n, 2) = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}

Now let T(n, 3) be the nth tetrahedral number.

T(n, 3) = T(1, 2) + T(2, 2) + T(3, 2) + \cdots + T(n, 2) = \frac{n(n+1)(n+2)}{6}

It’s possible to define and find a compact expression for T(n, d) for all positive integers d, which I wrote about here. But for this post we only need T(n, 3).

Pollock’s conjecture

Sir Frederick Pollock conjectured in 1850 that every positive integer can be written as the sum of at most 5 tetrahedral numbers, and all but a finite number of integers can be written as the sum of at most 4 tetrahedral numbers.

Both parts of the conjecture are still open. Nobody has found a number requiring more than 5 tetrahedral numbers in its sum, and it is believed that there are exactly 241 numbers that cannot be written as the sum of 4 tetrahedral numbers, the smallest being 17 and the largest being 343867. For a list of these numbers, see OEIS sequence A000797.

Negative cannonballs

It is possible to write any integer as the sum of four tetrahedral numbers if you take the formula for computing T(n, 3) as the definition of T(n, 3), which permits negative values of n [1]. Vadim Ponomarenko [2] showed how:

n = T(n, 3) + T(n-2, 3) + 2T(-n-1, 3)

Proof: expand the four terms on the right and simplify.
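
The expansion is routine but tedious to do by hand. Here’s a quick sympy check (my own verification sketch, not from Ponomarenko’s paper):

from sympy import symbols, simplify

n = symbols('n')
T = lambda m: m*(m + 1)*(m + 2)/6   # T(m, 3) via the formula, any integer m

print(simplify(T(n) + T(n - 2) + 2*T(-n - 1)))   # prints n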

Satisfying the commander

I imagine the commander might not be pleased with the negative cannonball solution. If you could imagine negative cannonballs as balls missing from the stack, that would be one thing, but that’s not where we are. To stack 17 cannonballs into four tetrahedra, you’d need a stack of 969 balls, a stack of 680 balls and two stacks of 816 negative balls.

Better go back to saying certain numbers of cannonballs simply aren’t allowed, or require a fifth stack. But what if you tell the commander 17 balls cannot be arranged into four tetrahedral stacks and he tells you to try harder?

You could write a Python program to show by brute force when an arrangement isn’t possible. Brute force isn’t the most elegant approach, but it’s often the most convincing. This code will check whether n cannonballs can be stacked in four tetrahedral piles.

def t(n):
    # nth tetrahedral number: T(n, 3) = n(n+1)(n+2)/6
    return n*(n + 1)*(n + 2) // 6

def check(n):
    # Brute force: collect every sum of four tetrahedral numbers t(i)
    # with 0 <= i <= n, then test whether n is among the sums.
    m = n + 1
    s = set()
    for i in range(m):
        for j in range(m):
            for k in range(m):
                for l in range(m):
                    s.add(t(i) + t(j) + t(k) + t(l))
    return n in s

You could be a little more clever by using itertools to eliminate the deeply nested for loops, though this goes against the spirit of brute force.

from itertools import product

def check2(n):
    m = n + 1
    s = set()
    # product(range(m), repeat=4) runs over all quadruples (i, j, k, l)
    for i, j, k, l in product(range(m), repeat=4):
        s.add(t(i) + t(j) + t(k) + t(l))
    return n in s
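
For example, check(17) returns False, consistent with 17 being the smallest number on the forbidden list above, while check(18) returns True since 18 = 10 + 4 + 4.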

Update: The code above could be made much more efficient. See the C code in the comment.


[1] We tend to forget definitions once we have convenient formulas. This can be good or bad. It’s bad for comprehension. For example, calculus students usually don’t appreciate that the fundamental theorem of calculus is a theorem. They learn to compute integrals by using antiderivatives and come to think that the antiderivative is the definition of an integral.

The upside of blurring definitions and formulas is that it is common to use a formula to generalize a definition, as we do here. This is often a very useful approach, but it has to be done consciously.

[2] Vadim Ponomarenko. Pollock’s Generalized Tetrahedral Numbers Conjecture. The American Mathematical Monthly Vol. 128, No. 10 (December 2021), p. 921.

How Mathematica Draws a Dragonfly

Mathematica includes code to draw various whimsical images. For example, if you enter the following command in Mathematica

    Entity["PopularCurve", "DragonflyCurve"][
        EntityProperty["PopularCurve", "Image"]]

you get an image of a dragonfly.

It draws such images with Fourier series. You can tell by asking for the parameterization of the curve. If you enter

    Entity["PopularCurve", "DragonflyCurve"][
        EntityProperty["PopularCurve", "ParametricEquations"]]

you’ll get the following, after some rearrangement.

    Function[t, {
        (7714/27) Sin[47/20 - t] + (1527/37) Sin[16/5 - 2t] + 
        (3202/39) Sin[108/41 - 3t] + … + 2/9 Sin[15/19 - 81 t], 
        (9406/37) Sin[29/7 - t] + (3591/53) Sin[28/13 - 2t] + 
        (1111/20) Sin[9/23 - 3t] + … -(3/29) Sin[8/23 + 81 t]
        }]

The function is a parameterized curve, taking t to (x(t), y(t)) where x(t) and y(t) are Fourier series including frequencies up to sin(81t). Each of the sine terms has a phase shift that could be eliminated by writing sin(φ + ωt) as a linear combination of sin(ωt) and cos(ωt), using the identity sin(φ + ωt) = sin φ cos ωt + cos φ sin ωt.

Presumably somebody drew the dragonfly, say in Adobe Illustrator or Inkscape, then did a Fourier transform of a sampling of the curve.

To make sure Mathematica wasn’t doing anything behind the scenes that I wasn’t aware of, I reproduced the dragonfly curve by porting the Mathematica code to Python.
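
Here’s a minimal sketch of what such a port looks like, keeping only the three leading terms of each series quoted above. The full series runs up to sin(81t), so this truncation traces only a rough approximation of the dragonfly.

import numpy as np
import matplotlib.pyplot as plt

# leading terms of the Fourier series from the Mathematica output above
def x(t):
    return ((7714/27)*np.sin(47/20 - t) + (1527/37)*np.sin(16/5 - 2*t)
            + (3202/39)*np.sin(108/41 - 3*t))

def y(t):
    return ((9406/37)*np.sin(29/7 - t) + (3591/53)*np.sin(28/13 - 2*t)
            + (1111/20)*np.sin(9/23 - 3*t))

t = np.linspace(0, 2*np.pi, 2000)
plt.plot(x(t), y(t))
plt.axis("equal")
plt.show()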

The number of Fourier components needed to draw an image depends on the smoothness and complexity of the image. For the curve for π, for example, the highest frequency component is sin(32t).

The triceratops curve is more complicated and Mathematica uses frequencies up to sin(188t).

How often do the hands of a clock line up?

How often do the hour and minute hand of an analog clock point in the same direction?

The hands start pointing the same direction at midnight. Then the hour hand moves clockwise (!) through 360° in 12 hours, i.e. at 30° per hour, while the minute hand moves clockwise through 360° in one hour. So when does the minute hand, moving at 360° per hour, lap the hour hand, moving at 30° per hour? When

360t ≡ 30t (mod 360)

or equivalently

12t ≡ t (mod 12)

which means t is a multiple of 12/11.

So the times are at 12n/11 hours for n = 0, 1, 2, …, 21, which means 22 times a day. The first time after midnight is 12/11 hours later, or at 5/11 of a minute past 1:05.
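
Here’s a quick Python check that prints all 22 times, using exact rational arithmetic:

from fractions import Fraction

for n in range(22):
    t = Fraction(12, 11)*n                  # hours after midnight
    hours = int(t) % 12 or 12               # 12-hour clock display
    minutes = (t - int(t))*60
    seconds = (minutes - int(minutes))*60
    print(f"{hours}:{int(minutes):02d}:{float(seconds):05.2f}")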

If you also consider the second hand, the three hands will only line up at midnight and noon.

Representing octonions as matrices, sorta

It’s possible to represent complex numbers as a pair of real numbers or 2 × 2 matrices with real entries.

z \leftrightarrow (a, b) \leftrightarrow \begin{bmatrix}a & -b \\ b & a \end{bmatrix}

And it’s possible to represent quaternions as pairs of complex numbers or 2 × 2 matrices with complex entries

q \leftrightarrow (z_0, z_1) \leftrightarrow \begin{bmatrix} z_0 & z_1 \\ -z_1^* & z_0^* \end{bmatrix}

where z* is the complex conjugate of z.

And it’s also possible to represent octonions as pairs of quaternions or 2 × 2 matrices with quaternion entries, with a twist.

o \leftrightarrow (q_0, q_1) \leftrightarrow \begin{bmatrix} q_0 & q_1 \\ -q_1^* & q_0^* \end{bmatrix}

where q* is the quaternion conjugate of q.

Matrix multiplication is associative, but octonion multiplication is not, so something has to give. We have to change the definition of matrix multiplication slightly.

\begin{bmatrix} \alpha_0 & \alpha_1 \\ \alpha_2 & \alpha_3 \end{bmatrix}\circ\begin{bmatrix} \beta_0 & \beta_1 \\ \beta_2 & \beta_3 \end{bmatrix}=\begin{bmatrix} \alpha_0\beta_0+\beta_2\alpha_1 & \beta_1\alpha_0+\alpha_1\beta_3\\ \beta_0\alpha_2+\alpha_3\beta_2 & \alpha_2\beta_1+\alpha_3\beta_3 \end{bmatrix}

In half the products, the beta term comes before the alpha term. This wouldn’t matter if the alpha and beta terms commuted, e.g. if they were complex numbers this would be ordinary matrix multiplication. But the alphas and betas are quaternions, and so order matters, and the matrix product defined above is not the standard matrix product.
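
Here’s a small Python sketch of this product (my own reconstruction, not code from the post). An octonion is stored as a pair of quaternions, the first column of the matrix above, and the modified matrix product works out to the rule (q0, q1)(p0, p1) = (q0p0 − p1*q1, p1q0 + q1p0*). Random tests confirm that the product is not associative but does satisfy || xy || = || x || · || y ||.

import numpy as np

# a quaternion is a 4-vector [w, x, y, z]; an octonion is a pair of quaternions

def qmul(a, b):
    # Hamilton product of two quaternions
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(a):
    return np.array([a[0], -a[1], -a[2], -a[3]])

def omul(o, p):
    # the modified matrix product above, read off from the first column
    q0, q1 = o
    p0, p1 = p
    return (qmul(q0, p0) - qmul(qconj(p1), q1),
            qmul(p1, q0) + qmul(q1, qconj(p0)))

def onorm(o):
    return np.sqrt(np.sum(o[0]**2) + np.sum(o[1]**2))

rng = np.random.default_rng(0)
x, y, z = [(rng.standard_normal(4), rng.standard_normal(4)) for _ in range(3)]

print(onorm(omul(x, y)), onorm(x)*onorm(y))   # equal: the norm is multiplicative
xy_z, x_yz = omul(omul(x, y), z), omul(x, omul(y, z))
print(np.allclose(xy_z[0], x_yz[0]))          # False: multiplication is not associative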

Going back to the idea of matrices of matrices that I wrote about a few days ago, we could represent the octonions as 2 × 2 matrices whose entries are 2 × 2 matrices of complex numbers, etc.

If you look closely at the matrix representations above, you’ll notice that the matrix representations of quaternions and octonions don’t quite match the pattern of the complex numbers. There should be a minus sign in the top right corner and not in the bottom left corner. You could do it that way, but there’s a sort of clash of conventions going on here.

Octonions sometimes associate

You can multiply pairs of real numbers using the rules of complex numbers. Complex numbers have all the algebraic structure of the real numbers, i.e. they form a field.

There is a general process, the Cayley-Dickson construction, that lets you bootstrap multiplication from 1 real number to 2, from 2 to 4, from 4 to 8, etc. You can repeat the process as many times as you like, defining multiplication on lists of 2^n numbers, but you lose structure as you go.

Quaternions

Multiplication for 4-tuples gives the quaternions. The quaternions retain most of the structure of the real and complex numbers. Multiplication is associative. Non-zero elements have a multiplicative inverse, i.e. you can divide. And multiplication plays well with the norm:

|| xy || = || x || · || y ||.

But multiplication is not commutative: in general, xy ≠ yx.

Octonions

Multiplication of 8-tuples produces the octonions. It’s still true that non-zero elements have a multiplicative inverse, and multiplication still plays well with the norm as above. But now, not only is multiplication not commutative, it’s not even associative: in general, (xy)z ≠ x(yz). It’s the “in general” part that this post wants to elaborate on.

The subalgebra generated by any two elements is associative. That means, for example, that (xy)x = x(yx). If you fix x and y, and look at all the octonions you can form by adding, multiplying, conjugating, and inverting these elements, as well as multiplying them by a real number, you get a set of octonions for which multiplication is associative.

In fact, the subalgebra generated by two octonions is isomorphic to either the real numbers, the complex numbers, or the quaternions, depending on the two octonions you start with.

This was brought to my attention by a comment on a post on octonions from a few years ago. Someone pointed out that an equation I had written

x* = − (x + (e1 x)e1 + … + (e7 x)e7) / 6

could be written more simply as

x* = − (x + e1 x e1 + … + e7 x e7) / 6.

because each term only involves two distinct octonions.

Sedenions

The next step, multiplying 16-tuples of real numbers, gives the sedenions [1]. Now we lose even more structure. Multiplication is not commutative, not associative, and it’s possible for two non-zero numbers to have a zero product. That means the norm property

|| xy || = || x || · || y ||

goes out the window since the left side can be zero when the right side is not.

Sedenions, and indeed all Cayley-Dickson algebras, are flexible, which means (xy)x = x(yx). But it’s not true more generally that the algebra generated by two sedenions is associative.

Trigintaduonions

The next rung in the Cayley-Dickson ladder is the family of 32-tuples known as the trigintaduonions [2]. The sedenions are a mess, and they’re a subset of the trigintaduonions, so the trigintaduonions are a mess. But at least they’re flexible.

Summary of properties
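
|                  | Commutative | Associative | Flexible | ‖xy‖ = ‖x‖ ‖y‖ |
|------------------|-------------|-------------|----------|----------------|
| Real, complex    | yes         | yes         | yes      | yes            |
| Quaternions      | no          | yes         | yes      | yes            |
| Octonions        | no          | no          | yes      | yes            |
| Sedenions        | no          | no          | yes      | no             |
| Trigintaduonions | no          | no          | yes      | no             |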


[1] From the Latin word sedecim for 16.

[2] From the Latin triginta for 30 and duo for 2.

Structured frameworks for complex systems

I wrote two unrelated blog posts this morning, one about the math paper H = W and one about a joke putting numbers into the D&D alignment matrix.

I used Grok to look up the reference to the H = W paper, and to confirm that the alignment matrix originated with Dungeons & Dragons [1]. I didn’t start a new chat when I asked a new question, but Grok felt obligated to connect the two queries. Near the end of the response to my D&D query, Grok added a section “Connection to Sobolev Spaces Query” with the following content:

While the alignment system and Sobolev spaces (from your previous query about the “H = W” paper) are unrelated mathematically, both reflect efforts to create structured frameworks for complex systems—alignments for character ethics, Sobolev spaces for function analysis. The “Neutral” alignment, like the regularity of distributions, represents a balanced or intermediate state, bridging extremes (Law/Chaos or regular/singular distributions).

Emphasis in the original.

On the one hand this is nonsense. On the other hand, Sobolev spaces and the D&D alignment matrix are structured frameworks for complex systems!

Grok’s response reminds me of when I would doze off while studying and my brain would find marvelous nonsensical connections between everything that I had been studying, such as a way to integrate French verbs.

Aligning functions

Classifying functions into the D&D alignment grid is silly, but what if we were to do it anyway? Here’s my quick take.

The good and lawful functions have classical definitions. C is the space of continuous functions, C² the space of functions with two continuous derivatives, and C∞ the space of infinitely differentiable functions.

The function W classified as chaotic good is Weierstrass’ example of a nowhere differentiable, everywhere continuous function.

The true neutral functions Wk,p (no relation to the Weierstrass W) are the Sobolev spaces from the earlier post. The larger the k, the more regular the functions. Lp  is Wk,p with k = 0. These functions are not necessarily defined pointwise but only modulo a set of measure zero.

The function 1ℚ is the indicator function of the rationals, i.e. the function that equals 1 if a number is rational and 0 if it is irrational.

Here δ is Dirac’s delta “function” and δ(n) is its nth derivative. This is a generalized function, not a function in any classical sense, and taking its derivative only makes it worse.

[1] The original 1974 version of D&D had one axis: lawful, neutral, and chaotic. In 1977 the game added the second axis: good, neutral, and evil.

Dungeons, Dragons, and Numbers

Dan Piponi posted a chart like the one below on Mastodon.

|         | Lawful | Neutral | Chaotic |
|---------|--------|---------|---------|
| Good    | 6      | φ       | π       |
| Neutral | 1      | 0       | δ       |
| Evil    | −1     | Ω       | Rayo    |

At the risk of making a joke not funny by explaining it, I’d like to explain Dan’s table.

The alignment matrix above comes from Dungeons & Dragons and has become a kind of meme.

The neutral good number φ is the golden ratio, (1 + √5)/2 = 1.6180339….

Presumably readers of this blog have heard of π. Perhaps it ended up in the chaotic column because it is believed to be (but hasn’t been proven to be) a “normal number,” i.e. a number whose digits are uniformly distributed when expressed in any base.

The number δ = 4.6692016… above is Feigenbaum’s constant, something I touched on three weeks ago. I suppose putting it in the chaos column is a kind of pun because the number comes up in chaos theory. It is believed that δ is transcendental, but there’s no proof that it’s even irrational.

The number Ω is Chaitin’s constant. In some loose sense, this number is the probability that a randomly generated program will halt. It is a normal number, but it is also uncomputable. The smallest program to compute the first n bits of Ω must have length O(n) and so no program of any fixed length can crank out the bits of Ω.

Rayo’s number is truly evil. It is said to be the largest named number. Here’s the definition:

The smallest number bigger than any finite number named by an expression in any language of first-order set theory in which the language uses only a googol symbols or less.

You could parameterize Rayo’s definition by replacing googol (= 10^100) with an integer n. The sequence Rayo(n) starts out small enough. According to OEIS, Rayo(30) = 2 because

the 30 symbol formula “(∃x_1(x_1∈x_0)∧(¬∃x_1(∃x_2((x_2∈x_1∧x_1∈x_0)))))” uniquely defines the number 1, while smaller formulae can only define the number 0, the smallest being the 10 symbol “(¬∃x_1(x_1∈x_0))”.

OEIS does not give the value of Rayo(n) for any n > 30, presumably because it would be very difficult to compute even Rayo(31).

My favorite paper: H = W

A paper came out in 1964 with the title “H = W.” The remarkably short title was not cryptic, however. The people for whom the paper was written knew exactly what it meant. There were two families of function spaces, one denoted with H and another denoted with W, that were speculated to be the same, and this paper proved that indeed they were.

This post will explain the significance of the paper. It will be more advanced and less self-contained than most of my posts. If you’re still interested, read on.

Definitions

The derivative of a function might exist in a generalized sense when it doesn’t exist in the classical sense. I give an introduction to this idea in the post How to differentiate a non-differentiable function. The generalized derivative is a linear functional on test functions [1] and may not correspond to a classical function. The delta “function” δ(x) is the classic example.

A regular distribution is a distribution whose action on a test function is equal to multiplying the test function by a locally integrable function and integrating.

Given Ω ⊂ ℝⁿ, the Sobolev space Wk,p(Ω) consists of functions whose partial derivatives of order up to k are regular distributions that lie in the space Lp(Ω).

For example, let I be the interval (−1, 1) and let f(x) = |x|. The function f is not differentiable at 0, but the generalized derivative of f is the sign function sgn(x), which is in Lp(I) for all p. The generalized derivative of sgn(x) is 2δ(x), which is not a regular distribution [2], and so f ∈ W1,p(I) but f ∉ W2,p(I).

The norm on Wk,p(Ω) is the sum of the Lp norms of the function and each of its partial derivatives up to order k.
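
In symbols, using multi-index notation, this is

\|f\|_{W^{k,p}(\Omega)} = \sum_{|\alpha| \leq k} \|D^\alpha f\|_{L^p(\Omega)}

(Some authors instead take the pth root of the sum of pth powers of these terms; the two norms are equivalent.)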

The Sobolev space Hk,p(Ω) is the closure of test functions in the norm of the space Wk,p(Ω).

Theorem

It’s not obvious a priori that these two ways of defining a Sobolev space are equivalent, but James Serrin and Norman George Meyers [3] proved in 1964 that for all domains Ω, all non-negative integers k, and all 1 ≤ p < ∞ we have

Hk,p(Ω) = Wk,p(Ω).

The proof is remarkably brief, less than a page.

Significance

Why does this theorem matter? Sobolev spaces are the machinery of the modern theory of differential equations. I spent a lot of time in my mid 20s working with Sobolev spaces.

The grand strategy of PDE research is to first search for generalized solutions to an equation, solutions belonging to a Sobolev space, then if possible prove that the generalized solution is in fact a classical solution.

This is analogous to first proving an algebraic equation has a complex solution, then proving that the complex solution is real, or proving that an equation has a real number solution, then proving that the real solution is in fact an integer. It’s easier to first find a solution in a larger space, then if possible show that the thing you found belongs to a smaller space.


[1] A test function in this context is an infinitely differentiable function of compact support. In other contexts a test function is not required to have compact support but is required to approach zero rapidly, faster than the reciprocal of any polynomial.

[2] The classical derivative of sgn(x) is equal to zero almost everywhere. But the derivative as a distribution is not zero. The pointwise derivative may not equal the generalized derivative.

[3] Meyers, Norman G.; Serrin, James (1964), “H = W”, Proceedings of the National Academy of Sciences, 51 (6): 1055–1056.

Wilkinson’s polynomial

If you change the coefficients of a polynomial a little bit, do you change the location of its zeros a little bit? In other words, do the roots of a polynomial depend continuously on its coefficients?

You would think so, and you’d be right. Sorta.

It’s easy to see that a small change to a coefficient could result in a qualitative change, such as a double root splitting into two distinct roots close together, or a real root acquiring a small imaginary part. But it’s also possible that a small change to a coefficient could cause roots to move quite a bit.

Wilkinson’s polynomial

Wilkinson’s polynomial is a famous example that makes this all clear.

Define

w(x) = (x − 1)(x − 2)(x − 3)…(x − 20) = x^20 − 210x^19 + …

and

W(x) = w(x) − 2^−23 x^19

so that the difference between the two polynomials is that the coefficient of x^19 in the former is −210 and in the latter the coefficient is −210.0000001192.

Surely the roots of w(x) and W(x) are close together. But they’re not!

The roots of w are clearly 1, 2, 3, …, 20. But the roots of W are substantially different. Or at least some are substantially different. About half the roots didn’t move much, but the other half moved a lot. If we order the roots by their real part, the 16th root splits into two roots, 16.7308 ± 2.81263 i.

Here’s the Mathematica code that produced the plot above.

    w[x_] := Product[(x - i), {i, 1, 20}]
    W[x_] := w[x] - 2^-23 x^19
    roots = NSolve[W[x] == 0, x]
    ComplexListPlot[x /. roots]

A tiny change to one coefficient creates a change in the roots that is seven orders of magnitude larger. Wilkinson described this discovery as “traumatic.”
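
You can reproduce this in a few lines of Python (a sketch using numpy, not from the original post). Note that double precision cannot represent the coefficients of w exactly in the first place, which only reinforces Wilkinson’s point.

import numpy as np

# coefficients of w(x) = (x - 1)(x - 2)...(x - 20), highest degree first
w = np.poly(np.arange(1, 21))

# perturb the coefficient of x^19 by -2^-23
W = w.copy()
W[1] -= 2.0**-23

print(np.roots(W))   # about half the roots acquire large imaginary parts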

Modulus of continuity

The zeros of a polynomial do depend continuously on the parameters, but the modulus of continuity might be enormous.
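
To be precise, the modulus of continuity of a function f is

\omega_f(\delta) = \sup\{\, |f(x) - f(y)| : |x - y| \leq \delta \,\}

and f is uniformly continuous exactly when ω_f(δ) → 0 as δ → 0.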

Given some desired tolerance ε on how much the zeros are allowed to move, you can specify a corresponding degree of accuracy δ on the coefficients to meet that tolerance, but you might have to have extreme accuracy on the coefficients to have moderate accuracy on the roots. The modulus of continuity is essentially the ratio ε/δ, and this ratio might be very large. In the example above it’s roughly 8,400,000.

In terms of the δ-ε definition of continuity, for every ε there is a δ. But that δ might have to be orders of magnitude smaller than ε.

Modulus of continuity is an important concept. In applications, it might not matter that a function is continuous if the modulus of continuity is enormous. In that case the function may be discontinuous for practical purposes.

Continuity is a discrete concept—a function is either continuous or it’s not—but modulus of continuity is a continuous concept, and this distinction can resolve some apparent paradoxes.

For example, consider the sequence of functions fn(x) = x^n on the interval [0, 1]. For every n, fn is continuous, but the limit is not a continuous function: the limit equals 0 for x < 1 and equals 1 at x = 1.

But this makes sense in terms of modulus of continuity. The modulus of continuity of fn(x) is n. So while each fn is continuous, the sequence is becoming “less continuous” in the sense that the modulus of continuity is increasing.