Gluons, quarks, letters, and envelopes

Yesterday I wrote a couple of posts about a combinatorics question that led to OEIS sequence A000255. That page has this intriguing line:

This comment derives from a family of recurrences found by Malin Sjodahl for a combinatorial problem for certain quark and gluon diagrams (Feb 27 2010)

I love how pulling on a thread can lead you to unexpected places. What in the world does my little permutation problem have to do with particle physics‽

I found Malin Sjödahl’s website, and apparently the citation above refers to this paper from 2009. The author’s site doesn’t list any papers from 2010. Maybe the paper was published electronically in 2009 and in print in 2010.

Counting tensors

The paper is mostly opaque to me since I don’t know particle physics, but at one point Sjödahl says

The problem of finding all such topologies is equivalent to the number of ways of mapping N elements to each other without mapping a single one to itself …

and says that the solution is

N! \sum_{i=0}^N \frac{(-1)^i}{i!} \to \frac{N!}{e}

Sjödahl is not counting physical permutations but the number of possible tensors associated with gluon and quark interaction diagrams.

The right-hand side above is essentially the same as the asymptotic estimate for the function Q(n) in the previous post.

I didn’t find the recurrence that the OEIS comment alluded to. Perhaps I’m not looking at the same paper. Perhaps I’m not looking hard enough because I’m skimming a paper whose contents I don’t understand.

The Montmort letter problem

In more mathematical terminology, Sjödahl is counting the number of permutations of N objects with no fixed point, known as the number of derangements of a set of N objects.

If you divide by N!, the number of possible permutations, you get the probability that a random permutation moves every object.

This was historically known as the Montmort letter problem, named after Pierre Rémond de Montmort, who asked the following question: If N letters are randomly assigned to N envelopes, what is the probability that no letter ends up in the correct envelope?

The probability converges to 1/e as N approaches infinity. It approaches 1/e quickly, and so this is a good approximation even for moderate N. More on the rate of convergence in the next post.
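To see how quickly the probability converges, here's a sketch in Python. It computes the number of derangements exactly via the recurrence D(n) = n D(n − 1) + (−1)^n, which is equivalent to the sum above.

from math import factorial, exp

def derangements(n):
    # D(n) = n! sum_{i=0}^n (-1)^i / i!, computed in exact integer
    # arithmetic via the recurrence D(n) = n D(n-1) + (-1)^n
    d = 1
    for k in range(1, n + 1):
        d = k * d + (-1)**k
    return d

for n in [5, 10, 20]:
    print(n, derangements(n) / factorial(n), 1 / exp(1))

Already at n = 5 the exact probability 44/120 ≈ 0.3667 is close to 1/e ≈ 0.3679.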


Example using a generating function to find asymptotic behavior

The previous post looked at how to compute Q(n), the number of permutations of 1, 2, 3, …, n + 1 that contain no consecutive integers. We found a way to numerically compute Q(n) but no analytic expression that would let us compute asymptotics.

The sequence Q(n) is sequence A000255 in OEIS, and OEIS gives the exponential generating function of Q:

\sum_{n=0}^\infty \frac{Q(n)}{n!} x^n = \frac{\exp(-x)}{(1-x)^2}

We can use this function and Theorem 5.2.1 from [1] to get the asymptotic form of Q(n). According to that theorem, the coefficient of x^n in our generating function is asymptotically the same as the coefficient of x^n in the principal part at the singularity at x = 1. This principal part is

\frac{1}{e(1-x)^2} + \frac{1}{e(1-x)} = \frac{1}{e}\left(2 + 3x + 4x^2 + 5x^3 + \cdots\right)

and so the coefficient of x^n is (n + 2)/e.

So

Q(n) \sim \frac{n + 2}{e} n!

and Q(n) / (n + 1)! approaches 1/e for large n, as conjectured in the previous post.
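As a numerical sanity check, here is a sketch in Python that computes Q(n) from the OEIS recurrence and compares it to the asymptotic form (n + 2) n!/e.

from math import factorial, exp

def Q(n):
    # Q(n) = n Q(n-1) + (n-1) Q(n-2), with Q(0) = Q(1) = 1
    a, b = 1, 1
    for k in range(2, n + 1):
        a, b = b, k*b + (k - 1)*a
    return b

for n in [5, 10, 20]:
    print(n, Q(n) / ((n + 2) * factorial(n) / exp(1)))

The ratio is already very close to 1 for small n.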

[1] Herbert S. Wilf. Generatingfunctionology. Second edition. Academic Press, 1994.

Permutations with no consecutive elements

I was working on something this week that made me wonder how many permutations break up all consecutive elements. For example, if we’re looking at permutations of abcde then acebd counts, but acdbe does not. I’d like to count the number of such permutations, and estimate for large N the number of permutations of N elements with no consecutive terms.

A brief search led to OEIS sequence A000255, which gives a recursive formula for the answer. Before looking at that, let's define some terms and do some brute force calculation to get a feel for the problem.

Brute force calculation

Given a positive integer n, define Q(n) to be the number of permutations p of 1, 2, 3, …, n + 1 such that for no k does p(k + 1) = p(k) + 1.

from itertools import permutations

def no_consec(perm):
    # Return True if no element is immediately followed by its successor.
    for k in range(len(perm) - 1):
        if perm[k+1] == perm[k] + 1:
            return False
    return True

def Q(n):
    # Count permutations of 0, 1, ..., n with no consecutive pair,
    # which is equivalent to counting permutations of 1, 2, ..., n + 1.
    count = 0
    for p in permutations(range(n+1)):
        if no_consec(p):
            count += 1
    return count

We could use this code to compute Q(n) for moderate-sized n, but the time required to compute Q(n) this way is proportional to (n + 1)! and so that won't scale well.

Recurrence relation

OEIS sequence A000255 gives the values of what we’re calling Q(n), and the first few values match the output of the code above. But OEIS does better, giving a recursive solution for Q:

Q(n) = n Q(n − 1) + (n − 1) Q(n − 2)

for n > 1 with initial conditions Q(0) = Q(1) = 1.

As with Fibonacci numbers, you could implement this easily as a recursive function

def Q2(n):
    if n in [0, 1]:
        return 1
    return n*Q2(n-1) + (n-1)*Q2(n-2)

but it is much faster to implement iteratively.

def Q3(n):
    # Build up Q(0), Q(1), ..., Q(n) in order to avoid deep recursion.
    q = [1]*(n+2)
    for k in range(2, n+1):
        q[k] = k*q[k-1] + (k-1)*q[k-2]
    return q[n]

You could make the recursive implementation faster with memoization, using lru_cache from functools. However, memoization does not prevent recursion; it just speeds up recursion using a cache, and you can still get an error saying maximum recursion depth exceeded.
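For example, here is a memoized version. One way to avoid hitting the recursion limit is to warm the cache from the bottom up so that no single call recurses deeply; a sketch:

from functools import lru_cache

@lru_cache(maxsize=None)
def Q4(n):
    if n in [0, 1]:
        return 1
    return n*Q4(n-1) + (n-1)*Q4(n-2)

# Calling Q4(5000) cold would blow past Python's default recursion
# limit. Warming the cache bottom-up keeps every call shallow.
for k in range(5001):
    Q4(k)
print(Q4(5000) % 10**9)  # last nine digits, just to keep output small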

Asymptotics

We can calculate Q(n) for large n, but we don't have an analytic expression [1] that will let us find the asymptotic behavior. I intend to work that out in my next post.

Empirically, it seems Q(n) approaches (n + 1)!/e as n goes to infinity, but I don’t have a proof of that at the time of writing. If this is correct, this means that in the limit the probability of a permutation having no fixed point equals the probability of it having no consecutive values.

Update: See the proof in the generating function post above.

***

[1] There is an analytic expression for Q(n) in terms of a hypergeometric function

Q(n) = (-1)^n \, {}_2F_0(2, -n;; 1)

but this expression isn’t helpful in finding the asymptotic behavior.

Separable functions in different contexts

I was skimming through the book Mathematical Reflections [1] recently. The authors discuss a set of generalizations [2] of the Star of David theorem from combinatorics.

\gcd\left(\binom{n - 1}{r - 1}, \binom{n}{r+1}, \binom{n+1}{r}\right) = \gcd\left(\binom{n-1}{r}, \binom{n+1}{r+1}, \binom{n}{r-1}\right)

The theorem is so named because if you draw a Star of David centered at an entry of Pascal's triangle, each gcd above is taken over the entries at the vertices of one of the two overlapping triangles.

diagram illustrating the Star of David Theorem

One such theorem was the following.

\binom{n - \ell}{r - \ell} \binom{n - k}{r} \binom{n - k - \ell}{r - k} = \binom{n-k}{r-k} \binom{n - \ell}{r} \binom{n - k - \ell}{r - \ell}

This theorem also has a geometric interpretation, connecting vertices within Pascal’s triangle.

The authors point out that the binomial coefficient is a separable function of three variables, and that their generalized Star of David theorem is true for any separable function of three variables.

The binomial coefficient C(n, k) is a function of two variables, but you can think of it as a function of three variables: n, k, and n − k. That is

{n \choose k} = f(n) \, g(k) \, g(n-k)

where f(n) = n! and g(k) = 1/k!.
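Here is a quick numeric check of the generalized theorem, a sketch using Python's math.comb. The parameter ranges are arbitrary; the guard simply keeps all six binomial coefficients well defined.

from math import comb
from itertools import product

for n, r, k, ell in product(range(4, 12), range(1, 8), range(1, 4), range(1, 4)):
    if r >= max(k, ell) and n - k - ell >= r:
        lhs = comb(n-ell, r-ell) * comb(n-k, r) * comb(n-k-ell, r-k)
        rhs = comb(n-k, r-k) * comb(n-ell, r) * comb(n-k-ell, r-ell)
        assert lhs == rhs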

I was surprised to see the term separable function outside of a PDE context. My graduate work was in partial differential equations, and so when I hear separable function my mind goes to separation of variables as a technique for solving PDEs.

Coincidentally, I was looking at separable coordinate systems recently. These are coordinate systems in which the Helmholtz equation can be solved by a separable function, i.e. coordinate systems in which the separation of variables technique works. The Laplacian can take on very different forms in different coordinate systems, and if possible you'd like to choose a coordinate system in which a PDE you care about is separable.


[1] Peter Hilton, Derek Holton, and Jean Pedersen. Mathematical Reflections. Springer, 1996.

[2] Hilton et al. refer to a set of theorems as generalizations of the Star of David theorem, but these theorems are not strictly generalizations in the sense that the original theorem is clearly a special case of the generalized theorems. The theorems are related, and I imagine with more effort I could see how to prove the older theorem from the newer ones, but it's not immediately obvious.

A variation on Rock, Paper, Scissors

Imagine in a game of Rock, Paper, Scissors one player is free to play as usual but the other is required to choose each option the same number of times. That is, in 3n rounds of the game, the disadvantaged player must choose Rock n times, Paper n times, and Scissors n times.

Obviously the unrestricted player would be expected to win more games, but how many more? At least one, because the unrestricted player knows what the restricted player will do on the last round.

If n is large, then in the early rounds of the game it makes little difference that one of the players is restricted. The restriction doesn't give the unrestricted player much exploitable information. But in the later rounds of the game, the limitations on the restricted player's moves increase the other player's chances of winning.

It turns out that if the unrestricted player uses an optimal strategy, he can expect to win O(√n) more rounds than he loses. More precisely, the expected advantage approaches 1.4658√n as n grows. You can find a thorough analysis of the problem in this paper.


Up-down permutations

An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example

1, 5, 3, 4, 2

is an up-down permutation of 1, 2, 3, 4, 5 because

1 < 5 > 3 < 4 > 2.

Up-down permutations are also called alternating permutations for obvious reasons. The number of up-down permutations of a set of size n is often denoted A(n) because these permutations were first studied by Désiré André. It is also sometimes denoted E_n, where the E alludes to Euler.

André found the generating function for A(n):

\sum_{n=0}^\infty A(n) \frac{x^n}{n!} = \sec x + \tan x.

You could read this as saying sec(x) + tan(x) is the exponential generating function for A(n) or as saying sec(x) + tan(x) is the ordinary generating function for

P(n) =\frac{A(n)}{n!}

where P(n) is the proportion of all permutations of the numbers 1 through n which are alternating permutations.

It’s amazing that up-down permutations have anything to do with trig functions, but they do.

Knowing the (exponential) generating function for A(n) lets us use generating function techniques to prove various things about these numbers. For example, we can use the generating function to discover their asymptotic behavior.

The secant and tangent functions are well behaved at 0, but both have a singularity at π/2. This means the generating function above is not just a formal power series but an analytic power series with radius of convergence π/2, the distance from the center of the power series to the closest singularity. It follows that

P(n) = {\cal O}\left( \left( \frac{2}{\pi} \right)^n \right)

and so the proportion of permutations which are alternating permutations decreases exponentially for large n.
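If you have SymPy at hand, you can recover A(n) directly from the generating function by expanding sec x + tan x and scaling the coefficients by n!, a sketch:

from sympy import symbols, sec, tan, series, factorial

x = symbols('x')
# Taylor expansion of sec x + tan x; coefficient of x^n is A(n)/n!
s = series(sec(x) + tan(x), x, 0, 11).removeO()
A = [int(s.coeff(x, n) * factorial(n)) for n in range(11)]
print(A)  # [1, 1, 1, 2, 5, 16, 61, 272, 1385, 7936, 50521]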


Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A − B is defined to be the set

A - B = \{a - b \mid a \in A, b \in B \}

As is customary with Abelian groups, we denote the group operation by + and a − b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A − B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.

Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

d(A,B) = \log \frac{|A-B|}{|A|^{1/2}\, |B|^{1/2}}

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails to be one in an unusual way. The four axioms of a metric are

  1. d(x, x) = 0
  2. d(x, y) > 0 unless x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, let's again let G be the integers mod 10 and let A = {1, 3}. Then A − A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A − A = A, and so d(A, A) = log 1 = 0.
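The definitions above are easy to experiment with. Here is a minimal sketch in Python for subsets of the integers mod 10 that reproduces the examples in this post.

from math import log, sqrt

def diff_set(A, B, m=10):
    # A - B = {a - b : a in A, b in B}, computed mod m
    return {(a - b) % m for a in A for b in B}

def ruzsa(A, B, m=10):
    return log(len(diff_set(A, B, m)) / sqrt(len(A) * len(B)))

print(diff_set({1, 5}, {3, 7}))  # {8, 2, 4}
print(ruzsa({1, 3}, {1, 3}))     # log(3/2), so d(A, A) > 0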

Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

d_{\text{ent}}(X, Y) = H(X' + Y') - \frac{1}{2}H(X) - \frac{1}{2}H(Y)

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/(|A|½ |B|½). The H(X′ + Y′) term isn't the log of |A − B| though, because for one thing it's a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so some sum values will have higher probability.
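Here is a sketch of the entropic version for uniform random variables on subsets of the integers mod 10. Entropy is computed in bits here; changing the base of the logarithm only rescales the distance.

from math import log2

def entropy(p):
    # Shannon entropy (in bits) of a probability mass function
    return -sum(q * log2(q) for q in p.values() if q > 0)

def entropic_ruzsa(A, B, m=10):
    # X' and Y' independent and uniform on A and B
    p = {}
    w = 1 / (len(A) * len(B))
    for a in A:
        for b in B:
            s = (a + b) % m
            p[s] = p.get(s, 0) + w
    # H(X') = log2 |A| and H(Y') = log2 |B| for uniform variables
    return entropy(p) - log2(len(A))/2 - log2(len(B))/2

print(entropic_ruzsa({1, 5}, {3, 7}))  # 0.5: the sum X' + Y' is not uniform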

 

The 10th Dedekind number

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be

M(9) = 286386577668298411128469151667598498812366.

The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, and M(3) = 20. But as n increases, M(n) increases rapidly, with M(9) being on the order of 10^41.
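For small n you can verify these values by brute force, enumerating all 2^(2^n) Boolean functions of n variables and keeping the monotone ones. A sketch:

from itertools import product

def M(n):
    points = list(product([0, 1], repeat=n))
    # Pairs (x, y) with x <= y coordinatewise; monotone means f(x) <= f(y)
    pairs = [(x, y) for x in points for y in points
             if all(a <= b for a, b in zip(x, y))]
    count = 0
    for bits in product([0, 1], repeat=2**n):
        f = dict(zip(points, bits))
        if all(f[x] <= f[y] for x, y in pairs):
            count += 1
    return count

print([M(n) for n in range(4)])  # [2, 3, 6, 20]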

Although computing the Dedekind numbers exactly is difficult—M(8) was computed in 1991 and M(9) now in 2023—there is an explicit formula for these numbers, and much is known about their asymptotic growth. This post speculates about what M(10) might be.

Write the number k in binary and let b_i^k be its ith bit:

b_i^k=\left\lfloor\frac{k}{2^i}\right\rfloor - 2\left\lfloor\frac{k}{2^{i+1}}\right\rfloor

Then the nth Dedekind number is given by

M(n)=\sum_{k=1}^{2^{2^n}} \prod_{j=1}^{2^n-1} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

and so

M(10)=\sum_{k=1}^{2^{1024}} \prod_{j=1}^{1023} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

In principle, all you have to do to compute M(10) is evaluate the sum above. However, since this sum has more than 10^308 terms, it would take a while.

What can we say about M(10) without computing it? The number of monotone Boolean functions of n variables is less than the total number of Boolean functions of n variables, which equals

2^{2^n}

That tells us M(10) < 1.8 × 10^308.

There are more useful bounds. It has been proven that

{n\choose \lfloor n/2\rfloor}\le \log_2 M(n)\le {n\choose \lfloor n/2\rfloor}\left(1+O\left(\frac{\log n}{n}\right)\right)

This gives us a definite lower bound but not a definite upper bound. We know M(10) ≥ 2^252, which is approximately 7.237 × 10^75, but we don't know what the big-O term is. All we know is that for sufficiently large n, this term is smaller than some multiple of log(n)/n. How large does n need to be and what is this constant? I don't know. Maybe researchers in this area have some partial results.

Let’s take a guess at the upper bound by seeing what the big-O term was for M(9). Find k such that

\log_2 M(9) = \binom{9}{4}\left(1 + k \frac{\log 9}{9}\right)

We get

k = \left(\frac{\log_2M(9)}{126} - 1 \right)\frac{9}{\log 9} \approx 0.3809

and we can use this to guess that

\log_2 M(10) \stackrel{?}{=} \binom{10}{5}\left(1 + 0.3809 \frac{\log 10}{10}\right) \approx 274.1

which would imply M(10) ≈ 3.253 × 10^82.
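The heuristic calculation above is easy to reproduce in a few lines of Python:

from math import comb, log, log2

M9 = 286386577668298411128469151667598498812366

# Solve log2 M(9) = C(9, 4) (1 + k log(9)/9) for k
k = (log2(M9)/comb(9, 4) - 1) * 9/log(9)   # about 0.3809

# Plug the same k into the n = 10 expression
guess = comb(10, 5) * (1 + k * log(10)/10)  # about 274.1
print(k, guess, 2.0**guess)                 # about 3.25e82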

So to recap, we know for certain that M(10) is between 7.237 × 10^75 and 1.8 × 10^308, and our guess based on the heuristic above is that M(10) ≈ 3.253 × 10^82.

Computing Stirling numbers with limited integers

A couple days ago I wrote a post about a probability problem that involved calculating Stirling numbers. There are two kinds of Stirling numbers, creatively called “Stirling numbers of the first kind” and “Stirling numbers of the second kind.” The second kind come up more often in application, and so when authors say “Stirling numbers” without qualification, they mean the second kind. That’s the convention I’ll use here.

The Stirling number S(n, k) can be computed via the following series, though we will take a different approach for reasons explained below.

S(n,k) = \frac{1}{k!}\sum_{i = 0}^k (-1)^i {k \choose i} (k-i)^n

Problem statement

This made me think of the following problem. Suppose you want to calculate Stirling numbers. You want to use integer arithmetic in order to get exact results.

And suppose you can use integers as large as you like, but you have to specify the maximum integer size before you start calculating. Imagine you have to pay $N to work with N-digit integers, so you don't want to use a larger value of N than necessary.

Intermediate results

There are a couple problems with using the equation above. A direct implementation would calculate k! S(n, k) first, then divide by k! to get S(n, k). This would be costly because it would require calculating a number k! times bigger than the desired final result. (You could refactor the equation so that there is no division by k! at the end, but this gives an expression that includes non-integer terms.)

In the earlier post, I actually needed to calculate k! S(n, k) rather than S(n, k) and so this wasn’t a problem. This brings up the second problem with the equation above. It’s an alternating series, and it’s not clear whether evaluating the series produces intermediate results that are larger than the final result.

Recursion

Stirling numbers can be computed recursively using

S(n+1, k) = k S(n,k) + S(n, k-1)

with the base cases S(n, n) = 1 for n ≥ 0 and S(n, 0) = S(0, n) = 0 for n > 0. This computes Stirling numbers via a sum of smaller Stirling numbers, and so the intermediate results are no larger than the final result.
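Here is a sketch of the recursion in Python. Python's integers are arbitrary precision, so the size constraint is moot here, but the same table-filling pattern carries over to a fixed-precision setting.

def stirling2(n, k):
    # Fill a table using S(n+1, k) = k S(n, k) + S(n, k-1).
    # Intermediate values never exceed the final result.
    S = [[0]*(k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j*S[i-1][j] + S[i-1][j-1]
    return S[n][k]

print(stirling2(10, 4))  # 34105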

So now we’re left with the question of how big Stirling numbers can be. Before computing S(n, k) you need an upper bound on how many digits it will have.

Bounding Stirling numbers

Stirling numbers can be bounded in terms of binomial coefficients.

S(n, k) \leq \frac{1}{2} \binom{n}{k}k^{n-k}

This reduces the problem to finding an upper bound on binomial coefficients. I wrote about a simple bound on binomial coefficients here.

Hypergeometric distribution symmetry

One of these days I’d like to read Feller’s probability book slowly. He often says clever things in passing that are easy to miss.

Here’s an example from Feller [1] that I overlooked until I saw it cited elsewhere.

Suppose an urn contains n marbles, n₁ red and n₂ black. When r marbles are drawn from the urn, the probability that precisely k of the marbles are red follows a hypergeometric distribution:

\frac{\binom{n_1}{k}\binom{n-n_1}{r-k}} {\binom{n}{r}}

Feller mentions that this expression can be rewritten as

 \frac{\binom{r}{k} \binom{n-r}{n_1 - k}} { \binom{n}{n_1} }

What just happened? The roles of the total number of red marbles n₁ and the sample size r have been swapped.

The proof is trivial—expand both expressions using factorials and observe that they’re equal—but that alone doesn’t give much insight. It just shows that two jumbles of symbols represent the same quantity.
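Here is that check carried out in code, cross-multiplied so the comparison stays in exact integer arithmetic:

from math import comb

def check(n, n1, r, k):
    # Cross-multiplied form of the two hypergeometric expressions
    lhs = comb(n1, k) * comb(n - n1, r - k) * comb(n, n1)
    rhs = comb(r, k) * comb(n - r, n1 - k) * comb(n, r)
    return lhs == rhs

assert all(check(20, n1, r, k)
           for n1 in range(1, 20)
           for r in range(1, 20)
           for k in range(min(n1, r) + 1))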

The combinatorial interpretation is more interesting and more useful. It says that you have two ways to evaluate the probability: focusing on the sample or focusing on the population of red marbles.

The former perspective says we could multiply the number of ways to choose k marbles from the supply of n₁ red marbles and r − k black marbles from the supply of n − n₁ black marbles, then divide by the number of ways to choose a sample of r marbles from an urn of n marbles.

The latter perspective says that rather than partitioning our sample of r marbles, we could think of partitioning the population of red marbles into two groups: those included in our sample and those not in our sample.

I played around with adding color to make it easier to see how the terms move between the two expressions. I colored the number of red marbles red and the sample size blue.

\[ \frac{\binom{{\color{red} n_1}}{k}\binom{n-{\color{red} n_1}}{{\color{blue} r}-k}} {\binom{n}{{\color{blue}r}}} = \frac{\binom{{\color{blue} r}}{k} \binom{n-{\color{blue} r}}{{\color{red} n_1} - k}} { \binom{n}{{\color{red} n_1}} } \]

Moving from left to right, the red variable goes down and the blue variable goes up.


[1] William Feller. An Introduction to Probability Theory and Its Applications. Volume 1. Third edition. Page 44.