Euler function

This morning I wrote a post about the probability that a random matrix over a finite field is invertible. If the field has q elements and the matrix has dimensions n × n then the probability is

p(q, n) = \prod_{i=1}^n \left(1 - \frac{1}{q^i}\right)

In that post I made the observation that p(q, n) converges very quickly as a function of n [1]. One way to see that the convergence is quick is to note that

\prod_{i=1}^\infty \left(1 - \frac{1}{q^i}\right) = \prod_{i=1}^n \left(1 - \frac{1}{q^i}\right) \, \prod_{i=n+1}^\infty \left(1 - \frac{1}{q^i}\right)

and

\prod_{i=n+1}^\infty \left(1 - \frac{1}{q^i}\right) = 1 - {\cal O}\left(\frac{1}{q^{n+1}}\right)

John Baez pointed out in the comments that p(q, ∞) = φ(1/q) where φ is the Euler function.

Euler was extremely prolific, and many things are named after him. Several functions are known as Euler’s function, the most common being his totient function in number theory. The Euler function we’re interested in here is

\phi(x) = \prod_{i=1}^\infty \left( 1 - x^i \right)

for −1 < x < 1. Usually the argument of φ is denoted “q” but that would be confusing in our context because our q, the number of elements in a field, is the reciprocal of Euler’s q, i.e. x = 1/q.

Euler’s identity [2] (in this context, not to be confused with other Euler identities!) says

\phi(x) = \sum_{n=-\infty}^\infty (-1)^n x^{(3n^2 - n)/2}

This function is easy to calculate because the series converges very quickly. From the alternating series theorem we have

\phi(x) = \sum_{n=-N}^N (-1)^n x^{(3n^2 - n)/2} + {\cal O}\left( x^{(3(N+1)^2 - (N+1))/2} \right)

When q = 2 and so x = 1/2, N = 6 is enough to compute φ(x) with an error less than 2^−70, beyond the precision of a floating point number. When q is larger, even fewer terms are needed.

To illustrate this, we have the following Python script.

def phi(x, N):
    # Truncated pentagonal number series for the Euler function.
    # The exponents (3n^2 - n)/2 are integers, so use integer division.
    s = 0
    for n in range(-N, N+1):
        s += (-1)**n * x**((3*n**2 - n)//2)
    return s

print(phi(0.5, 6))

Every digit in the output is correct.


[1] I didn’t say that explicitly, but I pointed out that p(2, 8) was close to p(2, ∞).

[2] This identity is also known as the pentagonal number theorem because of its connection to pentagonal numbers.

Inverse shift

What is the inverse of shifting a sequence to the right? Shifting it to the left, obviously.

But wait a minute. Suppose you have a sequence of eight bits

abcdefgh

and you shift it to the right. You get

0abcdefg

If you shift this sequence to the left you get

abcdefg0

You can’t recover the last element h because the right-shift destroyed information about h.

A left shift doesn’t fully undo a right shift, and yet surely left shift and right shift are in some sense inverses.

Yesterday I wrote a post about representing bit manipulations, including shifts, as matrix operators. The matrix corresponding to shifting right by k bits has 1s on the kth diagonal above the main diagonal and 0s everywhere else. For example, here is the matrix for shifting an 8-bit number right two bits. A black square represents a 1 and a white square represents a 0.

This matrix isn’t invertible. When you’d like to take the inverse of a non-invertible matrix, your kneejerk response should be to compute the pseudoinverse. (Technically the Moore-Penrose pseudoinverse. There are other pseudoinverses, but Moore-Penrose is the most common.)

As you might hope/expect, the pseudoinverse of a right-shift matrix is a left-shift matrix. In this case the pseudoinverse is simply the transpose, though of course that isn’t always the case.

If you’d like to prove that the pseudoinverse of a matrix that shifts right by k places is a matrix that shifts left by k places, you don’t have to compute the pseudoinverse per se: you can verify your guess. This post gives four requirements for a pseudoinverse. You can prove that left shift is the inverse of right shift by showing that it satisfies the four equations.
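To make this concrete, here is a short sketch (using NumPy, with the 8-bit, shift-by-2 case from above) that computes the Moore-Penrose pseudoinverse of the right-shift matrix and confirms it equals the left-shift matrix, i.e. the transpose.

```python
import numpy as np

n, k = 8, 2

# Right-shift-by-k matrix: 1s on the kth diagonal above the main diagonal
R = np.eye(n, k=k)

# Left-shift-by-k matrix: 1s on the kth diagonal below the main diagonal
L = np.eye(n, k=-k)

# The Moore-Penrose pseudoinverse of the right shift is the left shift,
# which in this case is simply the transpose
P = np.linalg.pinv(R)
print(np.allclose(P, L), np.allclose(L, R.T))  # True True
```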

Probability that a random binary matrix is invertible

The two latest posts have involved invertible matrices with 0 and 1 entries. If you fill an n × n matrix with 0s and 1s randomly, how likely is it to be invertible?

What kind of inverse?

There are a couple ways to find the probability that a binary matrix is invertible, depending on what you mean by the inverse.

Suppose you have a matrix M filled with 0s and 1s and you’re looking for a matrix N such that MN is the identity matrix. Do you want the entries of N to also be 0s and 1s? And when you multiply the matrices, are you doing ordinary integer arithmetic or are you working mod 2?

In the previous posts we were working over GF(2), the field with two elements, 0 and 1. All the elements of a matrix are either 0 or 1, and arithmetic is carried out mod 2. In that context there’s a nice expression for the probability a square matrix is invertible.

If you’re working over the real numbers, the probability of a binary matrix being invertible is higher. One way to see this is that the inverse of a binary matrix is allowed to be binary but it isn’t required to be.

Another way to see this is to look at determinants. If you think of a matrix M as a real matrix whose entries happen to only be 0 or 1, M is invertible if and only its determinant is non-zero. But if you think of M as a matrix over GF(2), the entries are either 0 or 1 out of necessity, and M is invertible if and only if its determinant, computed in GF(2), is non-zero. If the determinant of M as a real matrix is a non-zero even number, then M is invertible as a real matrix but not as a matrix over GF(2).
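As a concrete illustration (my example, not from the original post), here is a binary matrix whose determinant is −2: non-zero, so invertible over the reals, but even, so singular over GF(2).

```python
import numpy as np

M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])

# Determinant as a real (integer) matrix: -2, so M is invertible over the reals
print(round(np.linalg.det(M)))  # -2

# But -2 is 0 mod 2, so M is singular over GF(2).
# Indeed the three rows sum to the zero vector mod 2:
print((M.sum(axis=0) % 2).tolist())  # [0, 0, 0]
```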

Probability of invertibility in GF(2)

Working over GF(2), what is the probability that a random matrix is invertible? Turns out it’s just as easy to answer a more general question: what is the probability that a random n × n matrix over GF(q), a finite field with q elements, is invertible? This is

\prod_{i=1}^n\left( 1 - \frac{1}{q^i} \right)

When q = 2 and n = 8 this probability is 0.289919. The probability is roughly the same for all larger values of n, converging to approximately 0.288788 as n → ∞.
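Computing this product is a one-liner. A minimal sketch:

```python
from math import prod

def p(q, n):
    # Probability that a random n x n matrix over GF(q) is invertible
    return prod(1 - q**-i for i in range(1, n+1))

print(p(2, 8))    # approximately 0.289919
print(p(2, 100))  # approximately 0.288788, essentially the limiting value
```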

Probability of invertibility in ℝ

What is the probability that an 8 × 8 matrix with random 0 and 1 entries is invertible as a real matrix? We can estimate this by simulation.

import numpy as np

def simulate_prob_invertible_real(n, numreps=1000):
    s = 0
    for _ in range(numreps):
        # Random n-by-n matrix with 0 and 1 entries
        M = np.random.randint(0, 2, size=(n, n))
        det = np.linalg.det(M)
        if abs(det) > 1e-9:
            s += 1
    return s/numreps

print(simulate_prob_invertible_real(8, numreps=10000))

When n = 8, I got 0.5477 when running the code with 10,000 reps.

When n = 32, I got a probability of 1. Obviously it is possible for a 32 × 32 binary matrix to be singular, but it’s very unlikely: it didn’t happen in 10,000 random draws.

The linear algebra of bit twiddling

The previous post looked at the tempering step of the Mersenne Twister, formulating a sequence of bit operations as multiplication by a matrix mod 2. This post will look at the components more closely.

The theorems of linear algebra generally hold independent of the field of scalars. Typically the field is ℝ or ℂ, but most of basic linear algebra works the same over every field [1]. In particular, we can do linear algebra over a finite field, and we’re interested in the most finite of finite fields, GF(2), the field with just two elements, 0 and 1.

In GF(2), addition corresponds to XOR. We will denote this by ⊕ to remind us that although it’s addition, it’s not the usual addition, i.e. 1 ⊕ 1 = 0. Similarly, multiplication corresponds to AND. We’ll work with 8-bit numbers to make the visuals easier to see.

Shifting a number left one bit corresponds to multiplication by a matrix with 1s just below the main diagonal. Shifting left by k bits is the same as shifting left by 1 bit k times, so the matrix representation of x << k is the kth power of the matrix representation of shifting left once. This matrix has 1s on the kth diagonal below the main diagonal. Below is the matrix for shifting left two bits, x << 2.

Right shifts are the mirror image of left shifts. Here’s the matrix for shifting right two bits, x >> 2.

Shifts are not fully invertible because bits either fall off the left or the right end. The steps in the Mersenne Twister are invertible because shifts are always XOR’d with the original argument. For example, although the function that takes x to x >> 2 is not invertible, the function that takes x to x ⊕ (x >> 2) is invertible. This operation corresponds to the matrix below.

This is an upper triangular matrix, so its determinant is the product of the diagonal elements. These are all 1s, so the determinant is 1, and the matrix is invertible.
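Here is a small sketch verifying this. It builds the matrix for x ⊕ (x >> 2) on 8-bit numbers (indexing bit i of x as component i of the vector) and checks both that its determinant is 1, hence non-zero in GF(2), and that multiplying by it mod 2 agrees with the bitwise computation.

```python
import numpy as np

n = 8

# Matrix for x >> 2: bit i of x >> 2 is bit i+2 of x
S = np.eye(n, k=2, dtype=int)

# Matrix for x ^ (x >> 2): XORing with the original argument adds the identity
T = np.eye(n, dtype=int) + S

# T is triangular with 1s on the diagonal, so its determinant is 1,
# which is non-zero in GF(2), so T is invertible over GF(2)
print(round(np.linalg.det(T)))  # 1

# Check the matrix against the bitwise computation for a sample value
x = 0b10110101
v = np.array([(x >> i) & 1 for i in range(n)])
y = x ^ (x >> 2)
print(all((T @ v % 2)[i] == (y >> i) & 1 for i in range(n)))  # True
```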

Bitwise AND multiplies each bit of the input by the corresponding bit in another number known as the mask. The bits aligned with a 1 are kept and the bits aligned with a 0 are cleared. This corresponds to multiplying by a diagonal matrix whose diagonal elements correspond to the bits in the mask. For example, here is the matrix that corresponds to taking the bitwise AND with 10100100.

Each of the steps in the Mersenne Twister tempering process is invertible because they all correspond to triangular matrices with all 1s on the diagonal. For example, the line

y ^= (y <<  7) & 0x9d2c5680 

says to shift the bits of y left 7 places, then zero out the elements corresponding to 0s in the mask, then XOR the result with y. In matrix terms, we multiply by a lower triangular matrix with zeros on the main diagonal, then multiply by a diagonal matrix that zeros out some of the terms, then add the identity matrix. So the matrix corresponding to the line of code above is lower triangular, with all 1s on the diagonal, so it is invertible.

[1] Until you get to eigenvalues. Then it matters whether the field is algebraically closed, which no finite field is.

Reverse engineering Mersenne Twister with Linear Algebra

The Mersenne Twister (MT) is a random number generator with good statistical properties but bad cryptographic properties. In buzzwords, it’s a PRNG but not a CSPRNG.

This post will show how the internal state of a MT generator can be recovered from its output. We’ll do this using linear algebra. The bit twiddling approach is more common and more efficient, but the linear algebra approach is conceptually simpler.

How MT works

There are numerous variations on the Mersenne Twister. We’ll focus on the original version, whose internal state consists of a vector x of length 624 filled with 32-bit numbers. The ideas in this post apply equally to all MT versions.

The first call to MT returns a “tempered” version of x[0]. The next call returns a tempered version of x[1], and so on. After every 624 calls, the state is scrambled. This scrambling is where the “twist” in the name Mersenne Twister comes from. (The Mersenne part comes from the fact that the period of an MT generator is a Mersenne prime.)

Tempering

Here is Python code for performing the tempering step.

def temper(y):
    y ^= (y >> 11) 
    y ^= (y <<  7) & 0x9d2c5680 
    y ^= (y << 15) & 0xefc60000 
    y ^= (y >> 18)  
    return y

Each step is reversible, and so the temper function is reversible.
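To make this concrete, here is a sketch of an untemper function (the helper names are mine). Each XOR-with-shift step is inverted by a short fixed-point iteration: the bits on the side the shift vacates are already correct, and each pass extends the correct bits by the shift amount.

```python
def temper(y):  # repeated from above so this snippet is self-contained
    y ^= (y >> 11)
    y ^= (y <<  7) & 0x9d2c5680
    y ^= (y << 15) & 0xefc60000
    y ^= (y >> 18)
    return y

def invert_right(y, shift):
    # Invert y ^= y >> shift. The top `shift` bits are unchanged by the
    # forward step; each pass recovers `shift` more bits, top down.
    result = y
    for _ in range(32 // shift):
        result = y ^ (result >> shift)
    return result

def invert_left(y, shift, mask):
    # Invert y ^= (y << shift) & mask, recovering bits bottom up
    result = y
    for _ in range(32 // shift):
        result = y ^ ((result << shift) & mask)
    return result & 0xffffffff

def untemper(y):
    # Undo the four steps of temper in reverse order
    y = invert_right(y, 18)
    y = invert_left(y, 15, 0xefc60000)
    y = invert_left(y,  7, 0x9d2c5680)
    y = invert_right(y, 11)
    return y

print(untemper(temper(0xdeadbeef)) == 0xdeadbeef)  # True
```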

Because the tempering step is reversible, the first output can be used to infer the first element of the internal state, the second output to infer the second element, and so on. After 624 calls one knows the entire internal state and can predict the rest of the generator’s output from then on.

Linear algebra

The bitwise operations above all correspond to linear operations over GF(2), the field with just two elements, 0 and 1. Addition in this field is XOR and multiplication is AND.

Each step corresponds to multiplying a vector of 32 bits on the left by a 32 × 32 matrix with entries that are 0’s and 1’s, with the understanding that the sum of two bits is their XOR and the product of two bits is their AND. Equivalently, arithmetic is carried out mod 2. So you can compute the matrix-vector product as ordinary integers if you then reduce every integer mod 2.

We will find the matrix M that corresponds to the temper operation and prove that it is invertible by finding its inverse. This proves that tempering is invertible, and one could undo tempering by multiplying by M^−1, though it would be more efficient to do so by bit twiddling.

One way to recover a matrix is to multiply it by the unit vectors ei, where the ith component of ei is 1 and the rest of the components are zero. Then

M ei

is the ith column of M.

So we can find the nth column of M by tempering 1 << n = 2^n.

import numpy as np

M = np.zeros((32, 32), dtype=int)
for i in range(32):
    t = temper(1 << (31-i))  # temper the unit vector with a 1 in bit position 31-i
    s = f"{t:032b}"
    for j in range(32):
        M[j, i] = int(s[j])

Let’s generate a random number and compute its tempered form two ways: directly and by matrix multiplication.

import random

x = random.getrandbits(32)
y = temper(x)
print(f"{y:032b}")

vx = np.array([int(b) for b in f"{x:032b}"]) # vector form of x
vy = M @ vx % 2                              # vector form of y
print("".join(str(b) for b in vy))

Both produce the same bits:

10100101100101101100110101000110
10100101100101101100110101000110

We can find the matrix representation of the untemper function by inverting the matrix M. However, we need to invert it over the field GF(2), not over the integers or reals.

import numpy as np
import galois

GF2 = galois.GF(2)
Minv = np.linalg.inv(GF2(M))

Here are visualizations of M and its inverse using a black square for a 1 and a white square for a 0.

M:

M^−1:

The next post will back up and look at the linear algebra of the components that comprise the Mersenne Twister.

Calculating curvature

Curvature is conceptually simple but usually difficult to calculate. For a level set curve f(x, y) = c, such as in the previous couple of posts, the equation for curvature is

\kappa = \frac{\left|f_y^2f_{xx}-2f_xf_yf_{xy}+f_x^2f_{yy}\right|}{\bigl(f_x^2+f_y^2\bigr)^{3/2}}

Even when f has a fairly simple expression, the expression for κ can be complicated.

If we define

f(x, y) = y^3 - 3 x^2 - 3 y^2 - 3 x^2 y

then the level set f(x, y) = c is an equilateral triangle when c = −4. The level sets are smoothed triangles for −4 < c < 0.

The curvature of these level sets at any point is given by

\frac{\left|2 (y+1) \left((y-2)^2-3 x^2\right) \left(x^2+y^2\right)\right|}{\left(x^4+2 x^2 (y (y+6)+2)+(y-2)^2 y^2\right)^{3/2}}

Simplification

But there is one instance in which curvature is easy to calculate. For the graph of a function y = g(x), the curvature is approximately the absolute value of the second derivative of g, provided the first derivative is small.

\kappa = \frac{|g^{\prime\prime}(x)|}{\bigl(1 + g^\prime(x)^2\bigr)^{3/2}} \approx |g^{\prime\prime}(x)|

At a local maximum or local minimum of g(x) the approximation is exact because the first derivative is zero.

Max and min curvature of smoothed triangles

This means that in the example above, we can calculate the maximum and minimum curvature of the level sets. The maximum curvature occurs in the corners and the minimum occurs in the middle of the sides. By symmetry there are three maxima and three minima, but we can take the ones corresponding to x = 0 for convenience. There we find the curvature is simply

|6 + 6y|

When x = 0, we have

f(x,y) = c = y^3 - 3y^2

and so the maximum and minimum curvature occur at the two roots of the cubic equation for y that lie in the interval [−1, 2]. (There’s another root greater than 2.)

For example, when c = −3, the roots are −0.8794, 1.3473, and 2.5321. The first root corresponds to the minimum curvature, the second to the maximum, and the third is outside our region of interest. It follows that the minimum curvature is 0.7237 and the maximum is 14.0838.
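Here is a quick numerical check of these values (my verification, using NumPy):

```python
import numpy as np

# Roots of y^3 - 3y^2 = -3, i.e. of y^3 - 3y^2 + 3 = 0
roots = np.sort(np.roots([1, -3, 0, 3]).real)
print(roots)  # approximately [-0.8794, 1.3473, 2.5321]

# Curvature |6 + 6y| at the two roots in [-1, 2]
print(abs(6 + 6*roots[0]))  # approximately 0.7237, the minimum curvature
print(abs(6 + 6*roots[1]))  # approximately 14.0838, the maximum curvature
```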

When c = −1 the minimum and maximum curvature are 2.80747 and 9.91622 respectively.

Since c = −4 corresponds to the triangle, the minimum curvature is 0 and the maximum is infinite. As c increases, the minimum and maximum curvature come together because the level set is becoming more round.

Smoothed polygons

The previous post constructed a triangular analog of the squircle, the unit circle in the p-norm where p is typically around 4. The case p = 2 is a Euclidean circle and the limit as p → ∞ is a Euclidean square.

The previous post introduced three functions Li(x, y) such that the level set of each function

\{(x,y) \mid L_i(x,y) = 1\}

forms a side of a triangle. Then it introduced a soft penalty for each Li being away from 1, and the level sets of that penalty function formed the rounded triangles we were looking for.

Another approach would be to change the L’s slightly so that the sides are the level sets Li(x, y) = 0. The advantage of this formulation is that a product of three numbers is 0 if and only if one of the numbers is zero. That means if we define

f(x, y) = L_1(x, y) L_2(x, y) L_3(x, y)

then the set of points

\{(x, y) \mid f(x,y) = c\}

corresponds to the three lines when c = 0, and the level sets for small c > 0 are nearly triangles. The level sets are smooth as long as the gradient of f is non-zero on them, which is the case when c is not zero.

This approach is not unique to triangles. You could create smooth approximations of any polygon by multiplying the linear functions that define its sides. Or you could do something similar with curved arcs.

If we define our L’s by

\begin{align*} L_1(x,y) &= y\\ L_2(x,y) &= y - 2x - 2 \\ L_3(x,y) &= 3y + x -3\\ \end{align*}

then our curves will be the level sets of

f(x, y) = y(y - 2 x - 2)  (3 y + x - 3)

A few level sets are shown below. The level set for c = 0 is the straight lines.

Note the level sets are not connected. If you’re interested in approximate triangles, you want the components that are inside the triangle.
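A quick numerical sanity check (mine, not from the original post): the product f vanishes exactly on the three lines and is positive at an interior point, so small positive level values trace out near-triangles.

```python
def f(x, y):
    # Product of the three linear functions defining the sides
    return y * (y - 2*x - 2) * (3*y + x - 3)

# One point on each of the three lines: f is zero there
print(f(0.5, 0), f(0, 2), f(0, 1))

# An interior point: f is positive, so level sets f = c for small c > 0
# wrap around the triangle's interior
print(f(0, 0.5))  # 1.125
```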


Triangular analog of the squircle

TimF left a comment on my guitar pick post saying the image was a “squircle-ish analog for an isosceles triangle.” That made me wonder what a more direct analog of the squircle might be for a triangle.

A squircle is not exactly a square with rounded corners. The sides are continuously curved, but curved most at the corners. See, for example, this post.

Suppose the sides of our triangle are given by Li(x, y) = 1 for i = 1, 2, 3. For example,

\begin{align*}
L_1(x, y) &= y \\
L_2(x, y) &= \phantom{-}\frac{\sqrt{3}}{2}x - \frac{1}{2}y \\
L_3(x, y) &= -\frac{\sqrt{3}}{2}x - \frac{1}{2}y \\
\end{align*}

We design a function f(x, y) as a soft penalty for points not being on one of the sides and look at the set of points where f(x, y) = 1.

f(x, y) = \left( \sum_{i=1}^3 \exp(p (L_i(x, y) - 1)) \right)^{1/p}

You might recognize this as a Lebesgue norm, analogous to the one used to define a squircle.

The larger p is, the heavier the penalty for being far from a side and the closer the level set f(x, y) = 1 comes to being a triangle.
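Here is a minimal numerical sketch of this penalty function (the choice p = 4 is mine, for illustration). At the midpoint of a side, one term of the sum is exactly 1 and the other two are tiny, so f is just above 1:

```python
import numpy as np

def f(x, y, p=4):
    # Lebesgue p-norm of exponential penalties, one per side
    L = [y,
          np.sqrt(3)/2 * x - y/2,
         -np.sqrt(3)/2 * x - y/2]
    return sum(np.exp(p * (Li - 1)) for Li in L)**(1/p)

# At the midpoint of the side y = 1 the first term is exp(0) = 1 and the
# other two are exp(-1.5 p), so f is close to 1
print(f(0, 1))  # approximately 1.0012
```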

Unified config files

I try to maintain a consistent work environment across computers that I use. The computers differ for important reasons, but I’d rather they not differ for unimportant reasons.

Unified keys

One thing I do is remap keys so that the same key does the same thing everywhere, to the extent that’s practical. In particular, I want the key functionality, not the key name, to be the same across operating systems. For example, the Command key on a Mac does what the Control key does on Windows and Linux. I have my machines set up so that the key in the lower left corner, whatever it’s called, does things like copy, paste, and cut.

Unified config files

I use bash everywhere even though zsh is the default shell on macOS. But Linux and macOS are sufficiently different that I have two .bashrc files, one for each OS. However, both .bashrc files source a common file, .bash_aliases, for aliases. You can set that up by putting the following code in your .bashrc file.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

Sometimes you need some kind of branching logic even for two machines running the same OS. For example, if you have two computers, named Castor and Pollux, you might have code like the following in your .bashrc.

HOSTNAME_SHORT=$(hostname -s)
if [ "$HOSTNAME_SHORT" = "Castor" ]; then
    ...
elif [ "$HOSTNAME_SHORT" = "Pollux" ]; then
    ...
fi

One problem with using hostname is that the OS can change the name on you. At least macOS will do this; I don’t know whether other operating systems will. If you give your machine an uncommon name then this is unlikely to happen.

I have one init.el file for Emacs. It contains some branching logic based on the operating system:

(cond
    ((string-equal system-type "windows-nt") ; Microsoft Windows
     ...)
    ((string-equal system-type "gnu/linux") ; linux
     ...)
    ((string-equal system-type "darwin") ; Mac
     ...)
)

You can also branch on hostname.

(if (string-equal (system-name) "Castor")
...)
(if (string-equal (system-name) "Pollux")
...)

This isn’t difficult, but in the short run it’s easier to just make one ad hoc edit to a config file at a time, letting the files drift apart. Along those lines, see bicycle skills.

The mythology of category theory

Yesterday a friend and I had a conversation about category theory, how it can be a useful pattern description language, but also about how people have unrealistic expectations for it, believing category theory can deliver something for nothing.

Later I ran across the following post from Qiaochu Yuan. It felt as if he had overheard my conversation and summarized it in a tweet:

category theory is just some straightforwardly useful stuff for some purposes in some fields! you can elegantly simplify and streamline some proofs. then there is the mythology of category theory, which is some other thing entirely, mostly wishful thinking and projection afaict

His phrase “the mythology of category theory” gives a name to this idea that category theory can deliver specific outputs without specific inputs. It helps to distinguish CT as a scrapbook of patterns from CT as sorcery.