Trinomial Coefficients and Kings

The term trinomial coefficient is used a couple different ways. The most natural use of the term is a generalization of bionomial coefficients to three variables, i.e. numbers of the form

\frac{n!}{i! \, j! \, k!}

where ijkn. These numbers are the coefficients of yj zk in the expansion of (x + y + z)n.

But there’s another use of the term trinomial coefficient, the one that we are interest in for this post, and that is the coefficients of xk in the expansion of (1 + x + 1/x)n. These numbers can be visualized in a triangle analogous to Pascal’s triangle.

In Pascal’s triangle, each entry is the sum of the two entries above it. In the trinomial triangle, each entry is the sum of the three entries above it.

If you’ve been reading this blog regularly you know I’ve been interested in chess puzzles lately. The thing that make me interested in trinomial coefficients is that they relate to moves of a king on a chessboard.

If you note the number paths a king can take to a square using the minimal number of moves, you get the trinomial triangle. The reason is simple: the number of ways to get to a square equals the number of ways to get there via the square up and to the left, plus the number of ways to get their via the square above, plus the number of ways to get their via the square up and to the right.

Related posts

Riff on an integration bee integral

I saw an integral posted online that came from this year’s MIT Integration Bee.

\int_0^1 x^{2024} (1 - x^{2025})^{2025}\, dx

My thoughts on seeing this were, in order:

  1. It looks like a beta function.
  2. The answer is a small number.
  3. You can evaluate the integral using the substitution u = 1 − x2025.

I imagine most students’ reactions would be roughly the opposite. They’d first see the substitution. Some might think of the value being small, and none would think of beta functions.

Size estimate

The value of the integral is small because you have numbers between 0 and 1 being raised to large powers. In real applications, being able to estimate the size of integral may be more valuable than being able to compute the integral.

But on a quiz, like the Integration Bee, knowing that the result is small would be worthless, except possibly to check a result. You know from the context of the problem being on a timed quiz that there must be a trick [1].

Incidentally, if you changed the problem slightly, say by replacing one of the instances of 2025 with 2025.03, the integration problem becomes much harder, but you could still estimate that the value is small.

Beta functions

The first time I saw someone evaluate an integral by recognizing a beta function it seemed like he was pulling a rabbit out of a hat. I was completely surprised that he thought something was common knowledge that I’d never seen.

Years later I went to work in biostatistics at MDACC and worked with the beta function and beta random variables all the time. If you look through my publications, you’ll see the word “beta” appears several times.

The beta function is defined as follows. You could take the first equality as the definition and the second as a theorem.

B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1} \,dx = \frac{\Gamma(x) \, \Gamma(y)}{\Gamma(x+y)}

The integrand in the definition of the beta function is the probability density function for a beta(xy) random variable, and so B(x, y) is the normalizing constant for a beta(xy) probability distribution.

The integral in the Integration Bee question is not a beta function. I thought maybe it could be turned into a beta function, but the u substitution is simpler.

If you change the integrand above to

\int_0^1 x^{2024} (1 - x)^{2025}\, dx

then you do have a beta function. The value of this integral is B(2025, 2026) which equals 2024! 2025! / 4050!, which is an extremely small number.

Numerical estimation

The integral in the previous section is roughly the same as that of x2024 (1 − x)2024. This function has its maximum at ½, and so the integrand is less than 2−4048 ≈ 10−1219.

If we want to evaluate 2024! 2025! / 4050! numerically we’ll need to be indirect about it because 2024! would overflow and the final result would underflow. We could calculate the log base 10 of the value using Python as follows.

>>> from scipy.special import gammaln
>>> from math import log
>>> (gammaln(2025) + gammaln(2026) - gammaln(4051))/log(10)
-1220.576093208249

So our upper bound got us to within an order of magnitude or two of the result.

 

[1] When you see the current year used in a math problem, the quiz writer is winking at you. The year is often meant to suggest that a number is large but somewhat arbitrary. Here you suspect that the key to the question would be the same if 2024 and 2025 were replaced with 1997 and 1998. Also, the year is often a hint that you don’t want to use brute force. In this case, it’s a warning not to expand the integrand using the binomial theorem.

Multiplying by quaternions on the left and right

The map that takes a quaternion x to the quaternion qx is linear, so it can be represented as multiplication by a matrix. The same is true of the map that takes x to xq, but the two matrices are not the same because quaternion multiplication does not commute.

Let qa + bi + cjdk and let qM be the matrix that represents multiplication on the left by q. Then

_qM = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \\ \end{bmatrix}

Now let Mq be the matrix that represents multiplication on the right by q. Then

M_q = \begin{bmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \\ \end{bmatrix}

Can prove both matrix representations are correct by showing that they do the right thing when q = 1, ij, and k. The rest follows by linearity.

You might speculate that the matrix representation for multiplying on the right by q might be the transpose of the matrix representation for multiplying on the left by q. You can look at the matrices above and see that’s not the case.

In this post I talk about how to represent rotations with quaterions, and in this post I give an equation for the equivalent rotation matrix for a rotation described by a quaternion. You can prove that the matrix representation is correct by multiplying out qM and Mq* . Keep in mind that q in that case is a unit quaterion, so the sum of the squares of its components add to 1.

Related posts

Embeddings, Projections, and Inverses

I just revised a post from a week ago about rotations. The revision makes explicit the process of embedding a 3D vector into the quaternions, then pulling it back out.

The 3D vector is embedded in the quaternions by making it the vector part of a quaternion with zero real part:

(p1p2p3)  → (0, p1p2p3)

and the quaternion is returned to 3D by cutting off the real part:

(p0, p1p2p3)  → (p1p2p3).

To give names to the the process of moving to and from quaterions, we have an embedding E : ℝ3 to ℝ4 and a projection P from ℝ4 to ℝ3.

We can represent E as a 4 × 3 matrix

E = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

and P by a 3 × 4 matrix

P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

We’d like to say E and P are inverses. Surely P undoes what E does, so they’re inverses in that sense. But E cannot undo P because you lose information projecting from ℝ4 to ℝ3 and E cannot recover information that was lost.

The rest of this post will look at three generalizations of inverses and how E and P relate to each.

Left and right inverse

Neither matrix is invertible, but PE equals the identity matrix on ℝ3 , and so P is a left inverse of E and E is a right inverse of P.

PE = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

On the other hand, EP is not an identity matrix, and so E is not a left inverse of E, and neither is P a right inverse of E.

EP = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Adjoint

P is the transpose of E, which means it is the adjoint of E. Adjoints are another way to generalize the idea of an inverse. More on that here.

Pseudo-inverse

The Moore-Penrose pseudo-inverse acts a lot like an inverse, which is somewhat uncanny because all matrices have a pseudo-inverse, even rectangular matrices.

Pseudo-inverses are symmetric, i.e. if A+ is the pseudo-inverse of A, then A is the pseudo-inverse of A+.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A+ is the unique n by m matrix satisfying four conditions:

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

To show that A+ = P we have to establish

  1. EPE = E
  2. PEP = A+
  3. (EP)* = EP
  4. (PE)* = PE

We calculated EP and PE above, and both are real and symmetric, so properties 3 and 4 hold.

We can also compute

 EPE = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E

and

PEP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} =P

showing that properties 1 and 2 hold as well.

Related posts

Alternative exp and log notation

The other day I stumbled on an article [1] that advocated writing ab as ab and loga(b) as ab.

\begin{align*} a &\uparrow b \equiv a^b  \\ a &\downarrow b \equiv \log_a b \end{align*}

This is a special case of Knuth’s up arrow and down arrow notation. Knuth introduces his arrows with the intention of repeating them to represent hyperexponentiation and iterated logarithms. But the emphasis in [1] is more on the pedagogical advantages of using a single up or down arrow.

Advantages

One advantage is that the notation is more symmetric. Exponents and logs are inverses of each other, and up and down arrows are visual inverses of each other.

Another advantage is that the down arrow notation makes the base of the logarithm more prominent, which is sometimes useful.

Finally, the up and down arrow notation is more typographically linear:  ab and ab stay within a line, whereas ab and loga(b) extend above and below the line. LaTeX handles subscripts and superscripts well, but HTML doesn’t. That’s one reason I usually write exp(x) rather than ex here.

Comparison

Here are the basic properties of logs and exponents using conventional notation.

\begin{align} a^b = c &\iff \log_a c = b \\ \log_b 1 &= 0 \\ \log_b b &= 1 \\ \log_b(b^x) &= x \\ b^{\log_b x} &= x \\ \log_b xy &= \log_b x + \log_b y \\ \log_b \frac{x}{y} &= \log_b x - \log_by \\ a^{\log_b c} &= c^{\log_b a} \\ \log_a b^c &= c (\log_a b) \\ (\log_b a) (\log_a x) &= \log_b x \end{align}

Here are the same properties using up and down arrow notation.

\begin{align} a \uparrow b = c &\iff a \downarrow c = b \\ b \downarrow 1 &= 0 \\ b \downarrow b &= 1 \\ b \downarrow (b \uparrow x) &= x \\ b \uparrow (b \downarrow x) &= x \\ b \downarrow xy &= b \downarrow x + b \downarrow y \\ b \downarrow \frac{x}{y} &= b \downarrow x - b \downarrow y \\ a \uparrow (b \downarrow c) &= c \uparrow (b \downarrow a ) \\ a \downarrow (b \uparrow c) &= c (a \downarrow b) \\ (b \downarrow a) (a \downarrow x) &= b \downarrow x \end{align}

Related posts

[1] Margaret Brown. Some Thoughts on the Use of Computer Symbols in Mathematics. The Mathematical Gazette, Vol. 58, No. 404 (Jun., 1974), pp. 78-79

Decimal Separator and Internationalization

This morning I ran across the following tip from Joost Helberg on Mastodon:

TIL don’t report numbers with three digits after the decimal point. People may interpret the decimal point as a thousands separator. Using 2 or 4 digits, although wrong, avoids off by a factor thousand errors.

I usually report four decimal places, but I hadn’t thought about that in relation to the decimal separator problem.

In software development, it’s best to let a library handle numeric input and output, using local conventions. Failure to do so can lead to problems, as I’ll never forget.

Embarrassed in Bordeaux

In 2006, Peter Thall and I gave a week-long course on Bayesian clinical trial design in Bordeaux, France. Part of the course was presenting software that my team at MD Anderson Cancer Center had developed.

A few minutes into my first presentation I realized the software wasn’t working for the course attendees. The problem had to do with the US and France using opposite conventions for decimal separator and thousands separator. I had tested our software on a French version of Windows, but I had entered integers in my testing and decimals in my presentation.

I apologized and asked my French audience to enter decimals in the American style, such as 3.14 rather than 3,14. But that didn’t work either!

We were using a Windows API for parsing input, which correctly handles input and output per local conventions. But we had written custom validation code. We checked that the input fields contained only valid numeric characters, i.e. digits and periods. Oops!

Users were between a rock and hard place. The input validation would not accept French notation, and the parsing code would not accept American notation.

The solution was for the attendees to set their operating system locale to the US. They were gracious about having to apply the hack and said that it was a common problem. It was a humiliating way to start the course, but the rest of the week went well.

A crowded little chess puzzle

Here’s a puzzle by Martin Garnder [1].

Can a queen, king, rook, bishop, and knight be placed on a 4² board so no piece attacks another?

There are two solutions, plus symmetries.

Note that in all non-attacking chess puzzles, the colors of the pieces are irrelevant. In the solutions I chose the piece colors to be the opposite of the square colors strictly for aesthetic reasons.

More chess posts

More Martin Gardner posts

[1] Martin Gardner. Some New Results on Nonattacking Chess Tasks. Math Horizons. February 2001, pp 10–12.

Formulating eight queens as a SAT problem

The Boolean satisfiability problem is to determine whether there is a way to assign values to variables in a set of Boolean formulas to make the formulas hold [1]. If there is a solution, the next task would be to enumerate the solutions.

You can solve the famous eight queens problem, or its generalization to n-queens, by formulating the problem as a Boolean formula then using a SAT solver to find the solutions.

It’s pretty obvious how to start. A chessboard is an 8 by 8 grid, so you have a variable for each square on the board, representing whether or not that square holds a queen. Call the variables bij where i and j run from 1 to 8.

The requirement that every row contains exactly one queen can be turned into two subrequirements:

  1. Each row contains at least one queen.
  2. Each row contains at most one queen.

The first requirement is easy. For each row i, we have a clause

bi1bi2bi3 ∨ … ∨ bi8

The second requirement is harder. How do you express in terms of our boolean variables that there is no more than one queen in each row? This is the key difficulty. If we can solve this problem, then we’re essentially done. We just need to do the analogous thing for columns and diagonals. (We don’t require a queen on every diagonal, but we require that there be at most one queen on every diagonal.)

First approach

There are two ways to encode the requirement that every row contain at most one queen. The first is to use implication. If there’s a queen in the first column, then there is not a queen in the remaining columns. If there’s a queen in the second column, then there is not a queen in all but the second column, etc. We have an implication for each row in each column. Let’s just look at the first row and first column.

b11 ⇒ ¬ (b12b13 ∨ … ∨ b18)

We can turn an implication of the form a ⇒ b into the clause ¬a ∨ b.

Second approach

The second way to encode the requirement that every row contain at most one queen is to say that for every pair of squares in a row (ab) either a has no queen or b has no queen. So for the first row we would have 8C2 = 28 clauses because there are 28 ways to choose pairs from a set of 8 things.

b11 ∨ ¬b12) ∧ (¬b11 ∨ ¬b13) ∧ … ∧ (¬b17 ∨ ¬b18)

An advantage of this approach is that it directly puts the problem into conjunctive normal form (CNF). That is, our formula is a conjunction of terms that contain only disjunctions, an AND of ORs.

Related posts

[1] You’ll see the SAT problem described as finding the solution to a Boolean formula. If you have multiple formulas, then the first holds, and the second, etc. So you can AND them all together to make one formula.

Special solutions to the eight queens problem

There are 92 ways to place eight queens on a chessboard so that no queen is attacking any other. These fall into 12 equivalence classes. The 92 solutions are all rotations and reflections of these 12 basic solutions.

If you think about the previous numbers a minute, you might wonder why the total number of solutions is not a multiple of 12. There is one particular solution that is more symmetric than the others. So the total number of solutions 92 breaks down into

8 × 11 + 4 × 1.

To illustrate this, let’s look at two fundamental solutions: one that is particularly ordered and one that is particularly disordered in a sense that we’ll get to later on.

Most basic solutions, like the one below, are part of an equivalence class of eight solutions.

You can rotate the board 90° or reflect it about the middle[1], or some combination of both [2]. This amounts to eight possibilities.

This solution, however, is more symmetric.

You can rotate the basic solution, but flipping it over does not create a new solution: a flip produces the same result as two 90° rotations.

It’s curious that there is only one highly symmetric solution to the eight queens problem. When I first saw the problem as a child I expected all the solutions to be highly symmetric. That may be why I wasn’t able to find a solution.

Among the 11 basic solutions that are less ordered, the one shown above is uniquely disordered in the following sense: no three queens lie on a straight line.

The eight queens problem is a problem about restricted straight lines. It says no two queens lie on the same rank, file, or diagonal. But if we look at all straight lines, then of course there is a line through any two queens. In the most orderly solution, every queen is on a straight line with two others. In the least orderly solution, no queen is on a straight line with two others.

In 1900 Henry Dudenay introduced the no-three-in-line problem, looking at ways to place points on a lattice such that no line goes through three points, with no restriction on the slope of the lines. So one family of solutions to the eight queens problem is also a solution to the no-three-in-line problem.

Related posts

[1] It doesn’t matter whether you flip about the horizontal or vertical axis.

[2] In fancy terminology, the action of the dihedral group D8 applied to a solution yields another solution.