The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product.
Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p and r, and q ≠ r. (This has happened [1].) Then you can easily find p. All three primes can be obtained in less than a millisecond as illustrated by the Python script below.
In a way, it would be better to share both your primes with someone else than to share just one. If both your primes are the same as someone else’s, you could read each other’s messages, but a third party attacker could not. If you’re lucky, the person you share primes with doesn’t know that you share primes or isn’t interested in reading your mail. But if that person’s private key is ever disclosed, so is yours.
Python code
Nearly all the time required to run the script is devoted to finding the primes, so we time just the factorization.
from secrets import randbits from sympy import nextprime, gcd from timeit import default_timer as timer numbits = 2048 p = nextprime(randbits(numbits)) q = nextprime(randbits(numbits)) r = nextprime(randbits(numbits)) m = p*q n = p*r t0 = timer() g = gcd(m, n) assert(p == g) assert(q == m//p) assert(r == n//p) t1 = timer() # Print time in milliseconds print(1000*(t1 - t0))
Python notes
There are a few noteworthy things about the Python code.
First, it uses a cryptographic random number generator, not the default generator, to create 2048-bit random numbers.
Second, it uses the portable default_timer
which chooses the appropriate timer on different operating systems. Note that it returns wall clock time, not CPU time.
Finally, it uses integer division //
to recover q and r. If you use the customary division operator Python will carry out integer division and attempt to cast the result to a float, resulting in the error message “OverflowError: integer division result too large for a float.”
GCD vs factoring
If you wanted to find the greatest common divisor of two small numbers by hand, you’d probably start by factoring the two numbers. But as Euclid knew, you don’t have to factor two numbers before you can find the greatest common factor of the numbers. For large numbers, the Euclidean algorithm is orders of magnitude faster than factoring.
Nobody has announced being able to factor a 1024 bit number yet. The numbers m and n above have four times as many bits. The largest of the RSA numbers factored so far has 768 bits. This was done in 2009 and took approximately two CPU millennia, i.e. around 2000 CPU years.
Fast Euclidean algorithm
The classic Euclidean algorithm takes O(n²) operations to find the greatest common divisor of two integers of length n. But there are modern sub-quadratic variations on the Euclidean algorithm that take
O(n log²(n) log(log(n)))
operations, which is much faster for very large numbers. I believe SymPy is using the classic Euclidean algorithm, but it could transparently switch to using the fast Euclidean algorithm for large inputs.
Related posts
- How long does it take to find large primes?
- Fermat’s factoring trick
- Most RSA implementations use the same public exponent
[1] As noted in the comments, this has happened due to faulty software implementation, but it shouldn’t happen by chance. By the prime number theorem, the number of n-bit primes is approximately 2n / (n log 2). If M people choose 2M primes each n bits long, the probability of two of these primes being the same is roughly
22−n M n log 2.
If M = 7.5 billion (the current world population) and n = 1024, the probability of two primes being the same is about 10−295. This is roughly the probability of winning the Powerball jackpot 40 times in a row.
In 2012, a few teams of researcher used this technique to assess the security of RSA keys used online, and managed to obtain a non-negligible number of private keys: see https://factorable.net/ for the first team, and https://eprint.iacr.org/2012/064.pdf for the second.
I found the following comment confusing / ambiguous:
> If you use the customary division operator Python will carry out integer division and attempt to cast the result to a float
In Python 3, the division `/` operator will try and return a fractional/floating point result when applied to two integer values. It tries to do this intelligently at full precision, but this will fail if the magnitude of the operands are too far apart.
Specifically, your usage is getting caught here:
https://github.com/python/cpython/blob/master/Objects/longobject.c#L3955
The description in the comments at the top of that function are kind of impressive — I didn’t realise Python goes to so much trouble to give a “sensible” answer!