Mersenne primes are unsafe

Posted on 24 September 2023 by John

In the previous post I mentioned that a particular Mersenne prime would be unsuitable for cryptography. In fact, all Mersenne primes are unsuitable for cryptography.

A prime number p is called “safe” if

p = 2q + 1

where q is also a prime. Safe primes are called safe because p − 1 does not have small factors (other than 2). The factors of p − 1 correspond to subgroups of the group used for encryption, and small groups can be exploited to attack encryption.

Mersenne numbers are numbers of the form

M_n = 2ⁿ − 1.

Mersenne primes are Mersenne numbers that are also prime. A necessary condition for M_n to be prime is for n to be prime. This condition is not sufficient. For example,

2¹¹ − 1 = 23 × 89.

But is necessary, for reasons we’ll get into shortly.

If M_n = 2q + 1, then q = M_n−1. But if n is a prime, then n − 1 is not a prime, with one exception: n = 3. So the only Mersenne prime that is a safe prime is M₃ = 7, which is not a particularly large prime. Public key cryptography uses numbers in the thousands of digits, not single digits.

Why does n have to be prime before M_n stands a chance of being prime?

If a > 1, then x^a − 1 can be factored:

x^a − 1 = (x − 1)(x^a−1 + x^a−2 + … + 1)

If n can be factored into ab, then set x = 2^b. This shows that 2^ab − 1 has a factor, namely 2^b − 1.

In the previous post we said that M₁₂₇ − 1 has a lot of small factors. We can find some of those factors easily:

M₁₂₇ − 1 = 2 M₁₂₆ = 2 (2¹²⁶ − 1)

and (2¹²⁶ − 1) is divisible by 2^k – 1 for every k that divides 126.

The nontrivial factors of 126 are 2, 3, 6, 7, 9, 14, 18, 21, 42, 63, and so 2^k – 1 is a factor of M₁₂₆ for k equal to each of these numbers. This is enough to fully factor 2¹²⁶ − 1 into

3³ × 7² × 19 × 43 × 73 × 127 × 337 × 5419 × 92737 × 649657 × 77158673929

given in the footnote from the previous post. You could easily come up with this factorization using pencil and paper, though it would not be easy to determine by hand that the last factor is a prime number.

Primes, weeds, and military precision

Posted on 22 September 2023 by John

Here’s a quote from Don Zagier that I found in Larry Rolen’s lecture notes on modular forms.

There are two facts about the distribution of prime numbers of which I hope to convince you so overwhelmingly that they will be permanently engraved in your hearts. The first is that, despite their simple definition and role as the building blocks of the natural numbers, the prime numbers grow like weeds among the natural numbers, seeming to obey no other law than that of chance, and nobody can predict where the next one will sprout. The second fact is even more astonishing, for it states just the opposite: that the prime numbers exhibit stunning regularity, that there are laws governing their behavior, and that they obey these laws with almost military precision.

Emphasis added.

Gauss map, Euclidean algorithm, and continued fractions

Posted on 18 September 2023 by John

The Gauss map [1] is the function

$f(x) = \frac{1}{x} - \left\lfloor \frac{1}{x}\right\rfloor$

where ⌊y⌋ is the floor of y, the greatest integer no larger than y.

I’ve written about this map a couple times before. First, I wrote about how this map is measure-preserving. Second, I wrote about the image at the top of the post, based on Michael Trott’s idea of extending the floor function to the complex plane and plotting it.

This post is a third take on the Gauss map, expanding on a comment by Giovanni Panti. His paper opens by saying

The fact that the euclidean algorithm eventually terminates is pervasive in mathematics. In the language of continued fractions, it can be stated by saying that the orbits of rational points under the Gauss map x ↦ x⁻¹ −⌊x⁻¹⌋ eventually reach zero.

What does the Gauss map have to do with continued fractions or the Euclidean algorithm? We’ll show this by working through an example.

A continued fraction has the form

$a_0 + \cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\ddots}}}$

Let’s start with 162/47 and see how we would write it as a continued fraction. An obvious place to start would be to write this as a proper fraction.

$\frac{162}{47} = 3 + \frac{21}{47}$

Next we turn 21/47 into 1 over something.

$\frac{162}{47} = 3 + \frac{21}{47} = 3 + \cfrac{1}{\cfrac{47}{21}}$

Now let’s do the same thing with 47/21: turn it into a proper fraction 2 + 5/21, then rewrite the fraction part 5/21 as the reciprocal of its reciprocal:

$\frac{162}{47} = 3 + \cfrac{1}{\cfrac{47}{21}} = 3 + \cfrac{1}{2 + \cfrac{5}{21}} = 3 + \cfrac{1}{2 + \cfrac{1}{\cfrac{21}{5}}$

Finally, we write 21/5 as 4 + 1/5, and we’re done:

$\frac{162}{47} = 3 + \cfrac{1}{2 + \cfrac{1}{4 + \cfrac{1}{5}}}$

Now go back and look at what happens to the fraction in the bottom left corner at each step:

$\frac{162}{47} = 3 + \frac{21}{47} = 3 + \cfrac{1}{2 + \cfrac{5}{21}} = 3 + \cfrac{1}{2 + \cfrac{1}{4 + \cfrac{1}{5}}}$

The sequence of bottom left fractions is 21/47, 5/21, 1/5. Each fraction is replaced by its Gauss map: f(21/47) = 5/21, and f(5/21) = 1/5. We applied the Gauss map above naturally in the process of creating a continued fraction.

Now suppose we wanted to find the greatest common divisor of 162 and 47 using the Euclidean algorithm.

Notice that these are the same numbers, produced by the same calculations as above.

[1] There are other things called the Gauss map, such as the map that takes a point on a surface to the unit normal at that point. That’s not the Gauss map we’re talking about here.

Elliptic curve addition formulas

Posted on 17 September 2023 by John

The geometric description of addition of points P and Q on an elliptic curve involves four logical branches:

If one of P or Q is the point at infinity …

Else if P = Q …

Else if P and Q lie on a vertical line …

Else …

It would seem that an algorithm for adding points would have to have the same structure, which is unfortunate for at least a couple reasons: it adds complexity, and it raises the possibility of a timing attack since some branches will execute faster than others.

However, an algebraic formula for addition need not mirror its geometric motivation.

Jacobian coordinates

Jacobian coordinates are a different way to describe elliptic curves. They have the advantage that addition formulas have fewer logic branches than the geometric description. They may have one test of whether a point is the point at infinity but they don’t have multiple branches.

Jacobian coordinates also eliminate the need to do division. The geometric description of addition on an elliptic curve involves calculating where the line through two points intersects the curve, and this naturally involves calculating slopes. But in Jacobian coordinates, addition is done using only addition and multiplication, no division. When you’re working over the field of integers modulo some gigantic prime, multiplication and addition are much faster than division.

You can find much more on Jacobian coordinates, including explicit formulas and operation counts, here.

Complete formulas

Sometimes it’s possible to completely eliminate logic branches as well as division operations. An elliptic curve addition formula is called complete if it is valid for all inputs. The first surprise is that complete addition formulas sometimes exist. The second surprise is that complete addition formulas often exist.

You can find much more on complete addition formulas in the paper “Complete addition formulas for prime order elliptic curves” available here.

McCabe versus Kolmogorov

Whether the Jacobian coordinate formulas or complete formulas are more or less complex than direct implementation of the geometric definition depends on how you define complexity. The formulas are definitely less complex in terms of McCabe complexity, also known as cyclomatic complexity. But they are more complex in terms of Kolmogorov complexity.

The McCabe complexity of a function is essentially the number of independent paths through the function. A complete addition formula, one that could be implemented in software with no branching logic, has the smallest possible McCabe complexity.

I’m using the term Kolmogorov complexity to mean simply the amount of code it takes to implement a function. It’s intuitively clear what this means. But if you make it precise, you end up with a measure of complexity that is only useful in theory and cannot be calculated in practice.

Counting operations

Instead of literally computing Kolmogorov complexity you’d count the number of multiplications and additions necessary to execute a formula. The links above do just that. If you know how many cycles it takes to execute an addition or multiplication (in the field you’re working over), and how many cycles it would take to carry out addition (on the elliptic curve) implemented directly from the definition, include the division required, then you could estimate whether these alternative formulas would save time.

It used to be common to count how many floating point operations it would take to execute an algorithm. I did that back when I took numerical analysis in college. But this became obsolete not long after I learned it. Counting floating point operations no longer tells you as much about an algorithm’s runtime as it used to due to changes in the speed of arithmetic relative to memory access and changes in computer architecture. However, in the context of this post, counting operations still matters because each operation—such as adding two 512-bit numbers modulo a 512-bit prime—involves a lot of more basic operations.

Rational height functions

Posted on 17 September 2023 by John

Mathematicians often speak informally about the relative simplicity of rational numbers. For example, musical intervals that correspond to simple fractions have less tension than intervals that correspond to more complicated fractions.

Such informal statements can be made more precise using height functions. There are a variety of height functions designed for different applications, but the most common height function defines the height of a fraction p/q in lowest terms to be the sum of the numerator and denominator:

height(p/q) = |p| + |q|.

This post will look at how this applies to musical intervals, to approximations for π, and the number of days in a year.

Musical intervals

Here are musical intervals, ranked by the height of their just tuning frequency ratios.

octave (2:1)
fifth (3:2)
forth (4:3)
major sixth (5:3)
major third (5:4)
minor third (6:5)
minor sixth (8:5)
minor seventh (9:5)
major second (10:9)
major seventh (15:8)
minor second (16:15)
augmented fourth (45:32)

The least tension is an interval of an octave. The next six intervals are considered consonant. A minor seventh is considered mildly dissonant, and the rest are considered more dissonant. The most dissonant interval is the augmented fourth, also known as a tritone because it is the same interval as three whole steps.

Incidentally, a telephone busy signal consists of two pitches, 620 Hz and 480 Hz. This is a ratio of 24:31, which has a height of 54. This is consistent with the signal being moderately dissonant.

Approximations to π

The first four continued fraction approximations to π are 3, 22/7, 333/106, and 335/113.

Continued fraction convergents give the best rational approximation to an irrational for a given denominator. But for a height value that is not the height of a convergent, the best approximation might not be a convergent.

For example, the best approximation to π with height less than or equal to 333 + 106 is 333/106. But the best approximation with height less than or equal to 400 is 289/92, which is not a convergent of the continued fraction for π.

Days in a year

The number of days in a year is 365.2424177. Obviously that’s close to 365 1/4, and 1/4 is the best approximation to 0.2424177 for its height.

The Gregorian calendar has 97 leap days every 400 years, which approximates 0.2424177 with 97/400. This approximation has practical advantages for humans, but 8 leap days every 33 years would be a more accurate approximation with a much smaller height.

Finite field Diffie Hellman primes

Posted on 15 September 2023 by John

Diffie-Hellman key exchange is conceptually simple. Alice and Bob want to generate a shared cryptographic key. They want to use asymmetric (public) cryptography to share a symmetric (private) key.

The starting point is a large prime p and a generator 1 < g < p.

Alice generates a large random number x, her private key, and sends Bob g^x mod p.

Similarly, Bob generates a large random number y, his private key, and sends Alice g^y mod p.

Alice takes g^y and raises it to her exponent x, and Bob takes g^x and raises it to the exponent y. They arrive at a common key k because

k = (g^y)^x = (g^x)^y mod p.

The security of the system rests on the assumption that the discrete logarithm problem is hard, i.e. given g and g^z it is computationally impractical to solve for z. This assumption appears to be true in general, but can fail when the group generated by g has exploitable structure.

You can read more about Diffie-Hellman here.

Recommended primes

The choice of prime p and generator g can matter is subtle ways and so there are lists of standard choices that are believed to be secure.

IETF RFC 7919 recommends five standard primes. These have the form

$p = 2^b - 2^{b-64} + 2^{64} \left( \lfloor 2^{b-130} e\rfloor + X \right) - 1$

where b is the size of p in bits, e is the base of natural logarithms, and X is the smallest such that p is a safe prime. In every case the generator is g = 2.

The values of b are 2048, 3072, 4096, 6144, and 8192. The values of X and p are given in RFC 7919, but they’re both determined by b.

I don’t imagine there’s anything special about the constant e above. I suspect it’s there to shake things up a bit in a way that doesn’t appear to be creating a back door. Another irrational number like π or φ would probably do as well, but I don’t know this for sure.

ffdhe2048

The recommended primes have names of the form “ffdhe” followed by b. For b = 2048, the corresponding value is X is 560316.

I wrote a little Python code to verify that this value of X does produce a safe prime and that smaller values of X do not.

    #!/usr/bin/env python3
    
    from sympy import isprime, E, N, floor
    
    b = 2048
    e = N(E, 1000)
    c = floor(2**(b-130) * e)
    d = 2**b - 2**(b-64) + 2**64*c - 1
    
    def candidate(b, x):
        p = d + 2**64*x
        return p
    
    for x in range(560316, 0, -1):
        p = candidate(b, x)
        if isprime(p) and isprime((p-1)//2):
            print(x)

This took about an hour to run. It only printed 560316, verifying the claim in RFC 7919.

Other groups

Finite field Diffie-Hellman is so called because the integers modulo a prime form a finite field. We don’t need a field per se; we’re working in the group formed by the orbit of g within that field. Such groups need to be very large in order to provide security.

It’s possible to use Diffie-Hellman over any group for which the discrete logarithm problem is intractable, and the discrete logarithm problem is harder over elliptic curves than over finite fields. The elliptic curve groups can be smaller and provide the same level of security. Smaller groups mean smaller keys to exchange. For this reason, elliptic curve Diffie-Hellman is more commonly used than finite field Diffie-Hellman.

Chinese Remainder Theorem synthesis algorithm

Posted on 14 September 2023 by John

Suppose m = pq where p and q are large, distinct primes. In the previous post we said that calculations mod m can often be carried out more efficiently by working mod p and mod q, then combining the results to get back to a result mod m.

The Chinese Remainder Theorem assures us that the system of congruences

$\begin{align*} x &\equiv a\pmod p \\ x &\equiv b\pmod q \end{align*}$

has a unique solution mod m, but the theorem doesn’t say how to compute x efficiently.

H. L. Garner developed an algorithm to directly compute x [1]:

$x = p \biggl(\big(\,(a-b)(q^{-1}\bmod p\,)\big)\bmod p\biggr) + b$

You compute the inverse of q mod p once and save it, then solving the system above for multiple values of a and b is very efficient.

Garner’s algorithm extends to more than two factors. We will present the general case of his algorithm below, but first we do a concrete example with RSA keys.

RSA key example

This is a continuation of the example at the bottom of this post.

This shows that the numbers in the key file besides those that are strictly necessary for the RSA algorithm are numbers needed for Garner’s algorithm.

What the key file calls “coefficient” is the inverse of q modulo p.

What the key file calls “exponent1” is the decryption exponent d reduced mod p − 1. Similarly, “exponent2” is d reduced mod q − 1 as explained here.

    from sympy import lcm

    prime1 = 0xf33514...d9
    prime2 = 0xfee496...51

    publicExponent = 65537
    privateExponent = 0x03896d...91

    coefficient = 0x4d5a4c...b7 # q^-1 mod p
    assert(coefficient*prime2 % prime1 == 1)

    exponent1 = 0x37cc69...a1 # e % phi(p)
    exponent2 = 0x2aa06f...01 # e % phi(q)
    assert(privateExponent % (prime1 - 1) == exponent1)
    assert(privateExponent % (prime2 - 1) == exponent2)

More factors

Garner’s algorithm can be used more generally when m is the product of more than two primes [2]. Suppose

$m = \prod_{i=1}^n m_i$

where the m_i are pairwise relatively prime (not necessarily prime). Then the system of congruences

$x \equiv x_i \pmod {m_i}$

for i = 1, 2, 3, …, n can be solved by looking for a solution of the form

$x = v_1 + v_2 m_1 + v_3 m_1m_2 + \cdots + v_n \prod_{i=1}^{n-1} m_i$

where

$v_k \equiv \bigg( x_k - \big(v_1 + v_2m_1 + \cdots + v_{k-1} \prod_{i=1}^{k-2}m_i \big) \bigg) \bigg( \prod_{i=1}^{k-1} m_i\bigg)^{-1} \pmod {m_k}$

Again, in practice the modular inverses of the products of the ms would be precomputed and cached.

[1] Ferguson, Schneier, Kohno. Cryptography Engineering. Wiley. 2010.

[2] Geddes, Czapor, and Labahn. Algorithms for Computer Algebra. Kluwer Academic Publishers. 1992.

Gaining efficiency by working modulo factors

Posted on 14 September 2023 by John

Suppose m is a large integer that you are able to factor. To keep things simple, suppose m = pq where p and q are distinct primes; everything in this post generalizes easily to the case of m having more than two factors.

You can carry out calculations mod m more efficiently by carrying out the same calculations mod p and mod q, then combining the results. We analyze m into its remainders by p and q, carry out our calculations, then synthesize the results to get back to a result mod m.

The Chinese Remainder Theorem (CRT) says that this synthesis is possible; Garner’s algorithm, the subject of the next post, shows how to compute the result promised by the CRT.

For example, if we want to multiply xy mod m, we can analyze x and y as follows.

$\begin{align*} x &\equiv x_p \pmod p \\ x &\equiv x_q \pmod q \\ y &\equiv y_p \pmod p \\ y &\equiv y_q \pmod q \\ \end{align*}$

Then

$\begin{align*} xy &\equiv x_py_p \pmod p \\ xy &\equiv x_qy_q \pmod q \\ \end{align*}$

and by repeatedly multiplying x by itself we have

$\begin{align*} x^e &\equiv x_p^e \pmod p \\ x^e &\equiv x_q^e \pmod q \\ \end{align*}$

Now suppose p and q are 1024-bit primes, as they might be in an implementation of RSA encryption. We can carry out exponentiation mod p and mod q, using 1024-bit numbers, rather than working mod n with 2048-bit numbers.

Furthermore, we can apply Euler’s theorem (or the special case Fermat’s little theorem) to reduce the size of the exponents.

$\begin{align*} x^e &\equiv x_p^e \equiv x_p^{e \bmod p-1}\pmod p \\ x^e &\equiv x_q^e \equiv x_q^{e \bmod q-1}\pmod q \end{align*}$

Assuming again that p and q are 1024-bit numbers, and assuming e is a 2048-bit number, by working mod p and mod q we can use exponents that are 1024-bit numbers.

We still have to put our pieces back together to get the value of x^e mod n, but that’s the subject of the next post.

The trick of working modulo factors is used to speed up RSA decryption. It cannot be used for encryption since p and q are secret.

The next post shows that is in fact used in implementing RSA, and that a key file contains the decryption exponent reduced mod p − 1 and mod q − 1.

Can the chi squared test detect fake primes?

Posted on 21 August 2023 by John

Jeweler examining gemstones

This morning I wrote about Dan Piponi’s fake prime function. This evening I thought about it again and wondered whether the chi-squared test could tell the difference between the distribution of digits in real primes and fake primes.

If the distributions were obviously different, this would be apparent from looking at histograms. When distributions are more similar, visual inspection is not as reliable as a test like chi-squared. For small samples, we can be overly critical of plausible variations. For large samples, statistical tests can detect differences our eyes cannot.

When data fall into a number of buckets, with a moderate number of items expected to fall in each bucket, the chi squared test attempts to measure how the actual counts in each bucket compare to the expected counts.

This is a two-sided test. For example, suppose you expect 12% of items to fall in bucket A and 88% to fall in bucket B. Now suppose you test 1,000 items. It would be suspicious if only 50 items landed in bucket A since we’d expect around 120. On the other hand, getting exactly 120 items would be suspicious too. Getting exactly the expected number is unexpected!

Let’s look a the primes, genuine and fake, less than N = 200. We’ll take the distribution of digits in the list of primes as the expected values and compare to the distribution of the digits in the fake primes.

When I ran this experiment, I got a chi-squared value of 7.77. This is an unremarkable value for a sample from a chi-squared distribution with 9 degrees of freedom. (There are ten digits, but only nine degrees of freedom because if you rule out nine possibilities then digit is determined with certainty.)

The p-value in this case, the probability of seeing a value as large as the one we saw or larger, is 0.557.

Next I increased N to 1,000 and ran the experiment again. Now I got a chi-squared value of 19.08, with a corresponding p-value of 0.024. When I set N to 10,000 I got a chi-squared value of 18.19, with a corresponding p-value of 0.033.

When I used N = 100,000 I got a chi-squared value of 130.26, corresponding to a p-value of 10⁻²³. Finally, when I used N = 1,000,000 I got a chi-squared value of 984.7 and a p-value of 3.4 × 10⁻²⁰⁶.

In short, the chi-squared test needs a fair amount of data to tell that fake primes are fake. The distribution of digits for samples of fake primes less than a thousand or so is plausibly the same as that of actual primes, as far as the test can distinguish. But the chi-squared values get implausibly large for fake primes up to 100,000.

Number theory

Mersenne primes are unsafe

Primes, weeds, and military precision

Gauss map, Euclidean algorithm, and continued fractions

Elliptic curve addition formulas

Jacobian coordinates

Complete formulas

McCabe versus Kolmogorov

Counting operations

Related posts

Rational height functions

Musical intervals

Approximations to π

Days in a year

Related posts

Finite field Diffie Hellman primes

Recommended primes

ffdhe2048

Other groups

Chinese Remainder Theorem synthesis algorithm

RSA key example

More factors

Related posts

Gaining efficiency by working modulo factors

Can the chi squared test detect fake primes?