Are guidance documents laws?

Are guidance documents laws? No, but they can have legal significance.

The people who generate regulatory guidance documents are not legislators. Legislators delegate to agencies to make rules, and agencies delegate to other organizations to make guidelines. For example [1],

Even HHS, which has express cybersecurity rulemaking authority under the Health Insurance Portability and Accountability Act (HIPAA), has put a lot of the details of what it considers adequate cybersecurity into non-binding guidelines.

I’m not a lawyer, so nothing I say should be considered legal advice. However, the authors of [1] are lawyers.

The legal status of guidance documents is contested. According to [2], Executive Order 13892 said that agencies

may not treat noncompliance with a standard of conduct announced solely in a guidance document as itself a violation of applicable statutes or regulations.

Makes sense to me, but EO 13992 revoked EO 13892.

Again according to [3],

Under the common law, it used to be that government advisories, guidelines, and other non-binding statements were non-binding hearsay [in private litigation]. However, in 1975, the Fifth Circuit held that advisory materials … are an exception to the hearsay rule … It’s not clear if this is now the majority rule.

In short, it’s fuzzy.

Update: On June 28, 2024, the US Supreme Court overruled the 1984 decision Chevron v. Natural Resources Defense Council. The Chevron doctrine held that courts should defer to regulatory agencies’ reasonable interpretations of ambiguous statutes. In a pair of decisions, the Supreme Court held that the Chevron doctrine violated the separation of powers by giving executive agencies judicial power.

 

[1] Jim Dempsey and John P. Carlin. Cybersecurity Law Fundamentals, Second Edition, page 245.

[2] Ibid., page 199.

[3] Ibid., page 200.

 

Randomize, then humanize

Yesterday I wrote about a way to memorize a random 256-bit encryption key. This isn’t trivial, but it’s doable using memory techniques.

There’s a much easier way to create a memorable encryption key: start with something memorable, then apply a hash function. Why not just do that?

There are two conflicting criteria to satisfy: cryptographic strength and ease of memorization. Any password is a compromise between these two goals.

You get better security if you generate a random password, then try to make it memorable through some technique for making it more palatable to a human mind. You get something easier to remember if you start with something human-friendly and apply some process to make it appear more random.

In a nutshell, you can either randomize then humanize, or humanize then randomize.

Humanize then randomize, or randomize then humanize

You get better ease of use if you humanize then randomize. You get better security if you randomize then humanize.

This morning I ran across a paper by Arnold Reinhold suggesting that people generate 10-letter passwords by first generating 10 random letters, then creating a mnemonic sentence with each word starting with one of the letters. Reinhold says that this leads to a greater variety of passwords than if you were to start with a mnemonic sentence and somehow reduce it to 10 letters. This is an example of the randomize-then-humanize pattern.

Why?

There are more possibilities for an attacker to have to explore if you start with random input.

For example, suppose a site requires an 8-character password and you choose an 8-letter word. There are only about 30,000 English words with eight letters [1], and people are far more likely to choose some of these words than others. If you randomly choose a 3-character password using digits and letters (upper case and lower case), there are 62³ = 238,328 possibilities. A 3-character random password is far better than an 8-character word.

In Reinhold’s example, there are 26¹⁰ possible passwords made of 10 lowercase letters. That’s over 100 trillion possibilities. There are certainly fewer than 100 trillion pass phrases that humans are likely to come up with. Say you want to use your favorite sentence from a famous book. Suppose there are 1,000 famous books and each has 10,000 sentences. That’s only 10 million possibilities.
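The arithmetic in this argument is easy to check directly (the 1,000-books and 10,000-sentences figures are the rough assumptions stated above, not real counts):

```python
# Space of 10 random lowercase letters vs. a "favorite sentence" scheme.
random_letters = 26**10              # 10 random lowercase letters
famous_sentences = 1_000 * 10_000    # assumed: 1,000 books x 10,000 sentences each

print(random_letters)    # 141167095653376, over 100 trillion
print(famous_sentences)  # 10000000, only 10 million
print(random_letters // famous_sentences)  # random wins by a factor of ~14 million
```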

Human-generated randomness

People are not that good at generating randomness. Here’s a passage from David Kahn’s book The Codebreakers about the results of asking typists to create pages of random numbers for use in one-time pads.

Interestingly, some pads seem to be produced by typists and not by machines. They show strike-overs and erasures — neither likely to be made by machines. More significant are statistical analyses of the digits. One such pad, for example, has seven times as many groups in which digits in the 1-to-5 group alternate with digits in the 6-to-0 group, like 18293, as a purely random arrangement would have. This suggests that the typist is striking alternately with her left hand (which would type the 1-to-5 group on a Continental machine) and her right (which would type the 6-to-0 group). Again, instead of just half the groups beginning with a low number, which would be expected in a random selection, three quarters of them do, possibly because the typist is spacing with her right hand, then starting a new group with her left. Fewer doubles and triples appear than chance expects.

How hacks work

Websites implicitly use the humanize-then-randomize approach. When you create a password, the site hashes what you type and stores the hashed value. (A naive site might store the actual password.) Then the next time you log in, the site hashes your password input and compares it to the stored value.

If the site is hacked, and the site’s hashing algorithm is known, then many of these passwords can be recovered. This happens routinely. If you apply a hash function to a list of 10,000 common passwords, there are only 10,000 hash values, and you simply search the hacked list for these values. And since people often reuse passwords, someone who knows your password on one site can try that password on another site.
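Here is a sketch of that dictionary attack. SHA-256 stands in for whatever hash the site uses (a real site should use a slow, salted hash like bcrypt), and the password lists are made up:

```python
import hashlib

def sha256(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

# Attacker's list of common passwords (tiny stand-in for a list like rockyou.txt)
common = ["123456", "password", "qwerty", "letmein"]

# Hashes leaked from a hacked site (unsalted, for illustration)
leaked = {sha256("qwerty"), sha256("x7$kQ9!rT2@v")}

# Precompute hashes of the common passwords and look them up in the leak
cracked = [pw for pw in common if sha256(pw) in leaked]
print(cracked)  # the common password is recovered; the random one is not
```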

If you use a randomly generated password for each site, it’s less likely any individual password will be exposed. And if a password is exposed, a hacker cannot use it on another site.

Related posts

[1] On my Macbook, grep '^........$' words | wc -l returned 30,001. You’d get different results from searching different word lists, but your results wouldn’t vary too much.

Why not reuse passwords?

Perhaps you’ve heard that you should not reuse passwords but don’t understand why. After all, you have a really good password, one that nobody would ever guess, so why not use it everywhere?

Your password is not as good as you think

First of all, people who think they have a really good password are usually wrong. They implicitly assume that an attacker would have to recreate the devious process they went through to create the password. This is not the case. A password needs to be hard for an attacker to crack, not hard for the owner to create, and these are not the same thing. The former requires passwords that are long and randomly generated.

Credential stuffing

Suppose you use the same password on sites X and Y. Then site X gets hacked, revealing your password. Now your username (most likely your email address) and password is part of a list posted online somewhere. Hackers then use scripts to see whether the same credentials will work somewhere else, and they often do. If they try site Y, then your account on Y has been compromised as well.

Now suppose you use a different username at the two sites. This is better, but the hack on X still means that your password has been added to a list of known passwords.

The worst case scenario is a site that stores passwords in the clear. This practice has long been discouraged, but the more times you reuse a password, the higher the chance that you’ll use your password at a site that does not hash passwords.

Hashing

Most sites these days don’t store your password per se but a cryptographic hash of your password. When you enter your password, the site hashes it and compares the result to the hashed value it has on file. If they match, you’re in.

If your hashed password for X is part of a breach, and the hashing algorithms for X and Y are known, an attacker can try hashing a list of known passwords, and maybe one of them will match the hash of your password on Y.

If sites X and Y use different hashing algorithms, then a hash of your password for X is not directly useful to hacking your account on site Y. But it is possible to “unhash” passwords, especially if a site uses a hashing algorithm that has been broken. This takes a lot of computation, but people do it.

Example

This is not hypothetical. It has happened many times. For example, it was part of what led to the recent 23andMe breach. And when a hacker obtains one person’s family tree data, they obtain data on many other people at the same time. If you used a unique, long, randomly generated password on 23andMe but your cousin used password123, then your data may have been stolen.

What to do?

What can you do? You almost have to use a password manager. A strong password is necessarily hard to remember, and memorizing hundreds of strong passwords would be quite a chore. Still, you might want to memorize one or two strong passwords if they protect something really valuable.

Related posts

Pratt Primality Certificates

The previous post implicitly asserted that J = 8675309 is a prime number. Suppose you wanted proof that this number is prime.

You could get some evidence that J is probably prime by demonstrating that

2^(J−1) = 1 (mod J).

You could do this in Python by running the following [1].

    >>> J = 8675309
    >>> assert( pow(2, J-1, J) == 1 )

This shows J is a probable prime to base 2.

If you want more evidence, you could also show J is a probable prime to base 3.

    >>> assert( pow(3, J-1, J) == 1 )

But no matter how many bases you try, you won’t have proof that J is prime, only evidence. There are pseudoprimes: (rare) composite numbers that satisfy the necessary-but-not-quite-sufficient conditions of Fermat’s primality test.
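For example, 341 = 11 × 31 is composite, but it is the smallest pseudoprime to base 2:

```python
n = 341                         # 11 * 31, composite
assert pow(2, n - 1, n) == 1    # passes Fermat's test to base 2 anyway
assert pow(3, n - 1, n) != 1    # but fails to base 3, revealing compositeness
```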

Primality certificates

A primality certificate is a proof that a number is prime. To be practical, a certificate must be persuasive and efficient to compute.

We could show that J is not divisible by any integer less than √J. That would actually be practical because J is not that large.

    >>> for n in range(2, int(J**0.5)+1):
    ...     assert(J % n > 0)

But we’d like to use J to illustrate a method that scales to much larger numbers than J.

Pratt certificates

Pratt primality certificates are based on a theorem by Lucas [2] that says a number n is prime if there exists a number a such that two conditions hold:

a^(n−1) = 1 (mod n),

and

a^((n−1)/p) ≠ 1 (mod n)

for all primes p that divide n − 1.

How do you find a? See this post.
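Given a candidate witness a and the prime factors of n − 1, both of Lucas’ conditions are a one-liner each to check:

```python
def lucas_check(n, a, prime_factors):
    """Verify Lucas' conditions: a^(n-1) = 1 (mod n) and
    a^((n-1)/p) != 1 (mod n) for each prime p dividing n - 1."""
    if pow(a, n - 1, n) != 1:
        return False
    return all(pow(a, (n - 1)//p, n) != 1 for p in prime_factors)

J = 8675309
print(lucas_check(J, 2, [2, 2168827]))  # True
```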

Example

To find a Pratt certificate for J, we have to factor J − 1. I assert that

J − 1 = 8675308 = 2² × 2168827

and that 2168827 is prime. Here’s verification that Lucas’ theorem holds:

    >>> assert( pow(2, (J-1)//2, J) != 1 )
    >>> assert( pow(2, (J-1)//2168827, J) != 1 )

What’s that you say? You’re not convinced that 2168827 is prime? Well then, we’ll come up with a Pratt certificate for 2168827.

Pratt certificates are generally recursive. To prove that p is prime, we have to factor p − 1, then prove all the claimed prime factors of p − 1 are prime, etc. The recursion ends when it gets down to some set of accepted primes.

Now I assert that

    2168827 − 1 = 2168826 = 2 × 3 × 11 × 17 × 1933

and that all these numbers are prime. I’ll assume you’re OK with that, except you’re skeptical that 1933 is prime.

The following code is proof that 2168827 is prime, assuming 1933 is prime.

    >>> m = 2168827
    >>> for p in [2, 3, 11, 17, 1933]:
    ...     assert( pow(3, (m-1)//p, m) != 1 )

Finally, we’ll prove that 1933 is prime.

You can verify that

    1933 − 1 = 1932 = 2² × 3 × 7 × 23

and I assume you’re convinced that each of these factors is prime.

    >>> m = 1933
    >>> for p in [2, 3, 7, 23]:
    ...     assert( pow(5, (m-1)//p, m) != 1 )

Pratt certificates can be written in a compact form that verification software can read. Here I’ve made the process more chatty just to illustrate the parts.

Update: Here’s a visualization of the process above, drawing arrows from each prime p to the prime factors of p − 1.

In this post we’ve agreed to recognize 2, 3, 7, 11, 17, and 23 as primes. In principle you only have to assume 2 is prime, certifying everything else recursively. That would make a software implementation more elegant, but it would make the example tedious for a human to follow.
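The chatty process above can be wrapped up in a recursive verifier. This is a sketch of how such software might look; the certificate format (a dict mapping each prime to its witness and the factorization of p − 1) is my own invention for illustration, not a standard format:

```python
def verify_pratt(p, cert, base_primes):
    """Recursively verify primality of p from a Pratt-style certificate.
    cert maps each prime p to (witness a, prime factors of p - 1 with multiplicity)."""
    if p in base_primes:
        return True
    a, factors = cert[p]
    # The listed factors must actually multiply back to p - 1.
    m = p - 1
    for q in factors:
        while m % q == 0:
            m //= q
    if m != 1:
        return False
    # Lucas' conditions, plus recursive verification of each distinct factor.
    if pow(a, p - 1, p) != 1:
        return False
    return all(pow(a, (p - 1)//q, p) != 1 and verify_pratt(q, cert, base_primes)
               for q in set(factors))

cert = {
    8675309: (2, [2, 2, 2168827]),
    2168827: (3, [2, 3, 11, 17, 1933]),
    1933:    (5, [2, 2, 3, 7, 23]),
}
print(verify_pratt(8675309, cert, base_primes={2, 3, 7, 11, 17, 23}))  # True
```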

Efficiency

A primality certificate does not have to be efficient to produce, though of course that would be nice. It has to be efficient to verify. You could imagine that the prime number salesman has more compute power available than the prime number customer. In the example above, I used a computer to generate the Pratt certificate, but it wouldn’t be unreasonable to verify the certificate by hand.

The brute force certificate above, trying all divisors up to √p, obviously takes √p calculations to verify. A Pratt certificate, however, takes about

4 log₂ p

calculations. So verifying a 10-digit prime requires on the order of 100 calculations rather than on the order of 100,000 calculations.
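Plugging J itself into the two estimates (the constant 4 comes from the statement above):

```python
import math

J = 8675309
brute = math.isqrt(J)       # trial division up to sqrt(J)
pratt = 4 * math.log2(J)    # estimated Pratt verification cost

print(brute)         # 2945
print(round(pratt))  # about 92
```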

Atkin-Goldwasser-Kilian-Morain certificates

Producing Pratt certificates for very large numbers is difficult. Other certificate methods, like Atkin-Goldwasser-Kilian-Morain certificates, scale up better. Atkin-Goldwasser-Kilian-Morain certificates are more complicated to describe because they involve elliptic curves.

Just as Pratt took a characterization of primes by Lucas and turned it into a practical certification method, Atkin and Morain took a characterization of primes by Goldwasser and Kilian, one involving elliptic curves, and turned it into an efficient certification method.

These certificates have the same recursive nature as Pratt certificates: proving that a number is prime requires proving that another (smaller) number is prime.

Update: More on elliptic curve primality proofs.

***

[1] This is a more efficient calculation than it seems. It can be done quickly using fast exponentiation. Note that it is not necessary to compute 2^(J−1) per se; we can carry out every intermediate calculation mod J.

[2] Lucas was French, and so the final s in his name is silent.

The Orange Book

I was spelunking around in Unicode and saw that there’s an emoji for orange book, U+1F4D9.

Orange Book emoji Linux

As is often the case, the emoji renders differently in different contexts. The image above is from my Linux desktop and the image below is from my Macbook. I tried creating an image on my Windows box, but it didn’t have a glyph for this character.

Orange book emoji Apple

When I saw “orange book” my first thought was the US Department of Defense publication Trusted Computer System Evaluation Criteria (TCSEC), aka DoDD 5200.28-STD, commonly known in security circles as the Orange Book. For a split second I thought “Is this book so famous that there’s an emoji for it?!” But of course that’s not the case.

There are emoji for several colors of books, and there are several books in the DOD “Rainbow Series” publications, such as the Green Book for password management. (There’s also an emoji for green book: U+1F4D7.)

TCSEC

So what is The Orange Book? Here’s a photo of the cover.

Department of Defense Trusted Computer System Evaluation Criteria

Here’s how Bruce Schneier describes the book in Applied Cryptography.

The NCSC publishes the infamous “Orange Book.” Its actual title is the Department of Defense Trusted Computer System Evaluation Criteria, but that’s a mouthful to say and the book has an orange cover. The Orange Book attempts to define security requirements, gives computer manufacturers an objective way to measure the security of their systems, and guides them as to what to build into their secure products. It focuses on computer security and doesn’t really say a lot about cryptography.

The Orange Book is now deprecated in favor of The Common Criteria for Information Technology Security Evaluation, ISO/IEC 15408.

Related posts

Repunits: primes and passwords

A repunit is a number whose base 10 representation consists entirely of 1s. The number consisting of n 1s is denoted Rₙ.

Repunit primes

A repunit prime is, unsurprisingly, a repunit number that is prime. The most obvious example is R₂ = 11. Until recently the repunit numbers confirmed to be prime were Rₙ for n = 2, 19, 23, 317, and 1031. Now the case n = 49081 has been confirmed.

R₄₉₀₈₁ = (10⁴⁹⁰⁸¹ − 1)/9, a number consisting of 49,081 ones.

Here is the announcement. The date posted at the top of the page is from March this year, but I believe the announcement is new. Maybe the author edited an old page and didn’t update the shown date.
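Since Rₙ = (10ⁿ − 1)/9, repunits are easy to generate and experiment with. As a quick sanity check, here’s a Fermat test on R₁₉, one of the confirmed repunit primes:

```python
def repunit(n):
    """Return the number consisting of n ones in base 10."""
    return (10**n - 1) // 9

assert repunit(2) == 11
r19 = repunit(19)
print(r19)  # 1111111111111111111
assert pow(2, r19 - 1, r19) == 1  # R_19 is at least a probable prime to base 2
```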

Repunit passwords

Incidentally, I noticed a lot of repunits when I wrote about bad passwords a few days ago. That post explored a list of commonly used but broken passwords. This is the list of passwords that password cracking software will try first. The numbers Rₙ are part of the list for the following values of n:

1–45, 47–49, 51, 53–54, 57–60, 62, 67, 70, 72, 77, 82, 84, 147

So 46 is the smallest value of n such that Rₙ is not on the list. I would not recommend using R₄₆ as a password, but surprisingly there are a lot of worse choices.

The bad password file is sorted in terms of popularity, and you might expect repunits to appear in the file in order, i.e. shorter sequences first. That is sorta true overall. But you can see streaks in the plot below showing multiple runs where longer passwords are more common than shorter passwords.

Exploring bad passwords

If your password is in the file rockyou.txt then it’s a bad password. Password cracking software will find it instantly. (Use long, randomly generated passwords; staying off the list of worst passwords is necessary but not sufficient for security.)

The rockyou.txt file currently contains 14,344,394 bad passwords. I poked around in the file and this post reports some things I found.

To make things more interesting, I made myself a rule that I could only use command line utilities.

Pure numeric passwords

I was curious how many of these passwords consisted only of digits so I ran the following.

    grep -P '^\d+$' rockyou.txt | wc -l

This says 2,346,744 of the passwords only contain digits, about 1 in 6.

Digit distribution

I made a file of digits appearing in the passwords

    grep -o -P '\d' rockyou.txt > digits

and looked at the frequency of digits.

    for i in 0 1 2 3 4 5 6 7 8 9
    do 
        grep -c $i digits
    done

This is what I got:

    5740291
    6734380
    5237479
    3767584
    3391342
    3355180
    3118364
    3100596
    3567258
    3855490

The digits are distributed more evenly than I would have expected. 1’s are more common than other digits, but only about twice as common as the least common digits.
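Breaking my command-line-only rule for a moment, the same tally is a one-liner in Python with a Counter. Here it runs on a made-up sample rather than the real rockyou.txt:

```python
from collections import Counter

# Made-up sample standing in for the real password file
passwords = ["123456", "password1", "abc123", "111111", "qwerty"]

digits = Counter(c for pw in passwords for c in pw if c.isdigit())
print(digits.most_common())  # 1s dominate, as in the real file
```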

Longest bad passwords

How long is the longest bad password? The command

    wc -L rockyou.txt

shows that one line in the file is 285 characters long. What is this password? The command

    grep -P '.{285}' rockyou.txt

shows that it’s some HTML code. Nice try whoever thought of that, but you’ve been pwned.

A similar search for all-digit passwords shows that the longest numeric passwords are 255 digits long. One of these is a string of 255 zeros.

Dictionary words

A common bit of advice is to not choose passwords that can be found in a database. That’s good advice as far as it goes, but it doesn’t go very far.

I used the comm utility to see how many bad passwords are not in the dictionary by running

    comm -23 sorted dict | wc -l

and the answer was 14,310,684. Nearly all the bad passwords are not in a dictionary!

(Here sorted is a sorted version of the rockyou.txt file; I believe the file is initially sorted by popularity, worst passwords first. The comm utility complained that my system dictionary isn’t sorted, which I found odd, but I sorted it to make comm happy and dict is the sorted file.)

Curiously, the command

    comm -13 sorted dict | wc -l

shows there are 70,624 words in the dictionary (specifically, the american-english file on my Linux box) that are not on the bad password list.

Smallest ‘good’ numeric password

What is the smallest number not in the list of pure numeric passwords? The following command strips leading zeros from purely numeric passwords, sorts the results as numbers, removes duplicates, and stores the results in a file called nums.

    grep -P '^\d+$' rockyou.txt | sed 's/^0\+//' | sort -n | uniq > nums

The file nums begins with a blank. I removed this with sed.

    sed -i 1d nums

Next I used awk to print instances where the line number does not match the line in the file nums.

    awk '{if (NR-$0 < 0) print $0 }' nums | less

The first number this prints is 61. This means that the first line is 1, the second line is 2, and so on, but the 60th line is 61. That means 60 is missing. The file rockyou.txt does not contain 60. You can verify this: the command

    grep '^60$' rockyou.txt

returns nothing. 60 is the smallest number not in the bad password file. There are passwords that contain ’60’ as a substring, but just 60 as a complete password is not in the file.
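The awk pipeline above can be mimicked in a few lines of Python (again on a made-up sample; the idea carries over to the real file):

```python
def smallest_missing(passwords):
    """Smallest positive integer not present as a pure-numeric password."""
    nums = {int(pw) for pw in passwords if pw.isdigit()}
    n = 1
    while n in nums:
        n += 1
    return n

sample = ["1", "2", "3", "password", "0004", "5"]
print(smallest_missing(sample))  # 6, since 1 through 5 are all present
```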

Related posts

More on attacking combination locks

A couple weeks ago I wrote about how De Bruijn sequences can be used to attack locks where there is no “enter” key, i.e. the lock will open once the right symbols have been entered.

Here’s a variation on this theme: what about locks that let you press more than one button at a time? [1]

Lock that lets you push more than one button at a time

You could just treat this as if it were a keypad with more buttons. For example, suppose you can push either one or two buttons at a time on the lock pictured above. Then you could treat this as a lock with 15 buttons: the five actual buttons and 10 virtual buttons corresponding to the ten ways you could choose 2 buttons out of 5 to press at the same time.

Maybe a lock would let you press even more buttons at once. I don’t think I’ve ever used a combination that required more than two buttons at once, but in principle you could push up to five buttons at once in the photo above, for a total of 31 possible button combinations. And since 31 is prime, you could use the algorithm here to generate the corresponding De Bruijn sequence.
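The count of 31 is just the number of ways to press 1 through 5 of the 5 buttons:

```python
from math import comb

# Ways to press exactly k of 5 buttons, summed over k = 1..5
combos = sum(comb(5, k) for k in range(1, 6))
print(combos)  # 31 = 5 + 10 + 10 + 5 + 1, equivalently 2^5 - 1
```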

If you know how long the password is, you can try the De Bruijn sequence for passwords of that length. But what if you don’t know the length of the password a priori? You could try the De Bruijn sequence for passwords of length 1, then length 2, then length 3, etc. But is there a more efficient way?

If there are k button combinations and the password has length n, then the optimal solution is to start entering a De Bruijn sequence B(k, n) until the lock opens. If you know the password has length no more than 4, then you could try a B(k, 4) sequence, and if the password is actually shorter than 4, say length 3, you’ll still open it.

But what if you’ve tried a B(k, 4) sequence and it turns out the password has length 5? You could do better than starting over with a B(k, 5) sequence, because some of the substrings in that B(k, 5) sequence will have already been tried. But how could you do this systematically? If you don’t know the length of the password, how could you do better than trying B(k, 1), then B(k, 2), then B(k, 3) etc.?

Related posts

[1] For this post, I’m assuming the lock will open as soon as you enter the right sequence of buttons. For example, if the pass code is 345 and you enter 12345 the lock will open. I don’t know whether these locks work that way. Maybe you have to turn the handle, which would effectively act as an enter key. But maybe there’s a way to listen to the lock so that you’ll know when the combination has been entered, before you twist the handle.

Update: According to the first comment below, locks of the kind featured in the photo only allow each button to be used once. That puts severe limits on the number of possible combinations, and the approach outlined here would be unnecessary. However, the post brings up more general questions that could be useful in another setting.

Cracking pass codes with De Bruijn sequences

keypad

Suppose you have a keypad that will unlock a door as soon as it sees a specified sequence of four digits. There’s no “enter” key to mark the end of a four-digit sequence, so the four digits could come at any time, though they have to be sequential. So, for example, if the pass code is 9235, if you started entering 1139235… the lock would open as soon as you enter the 5.

How long would it take to attack such a lock by brute force? There are 10⁴ possible 4-digit codes, so you could enter

000000010002…99989999

until the lock opens, but there’s a more efficient way. It’s still brute force, but not quite as brute.

The sequence above could be made more efficient. For example, it tests for codes 0000 and 0001 by entering both full codes. But entering 00001 would test for both. This takes advantage of the fact that the code could start anywhere, not just at every fourth position.

De Bruijn sequences

The optimal solution to this problem uses a De Bruijn sequence B(k, n), the shortest sequence of symbols from an alphabet of size k that contains every possible subsequence of length n. In the example above, n = 4 and k = 10. A sequence in B(k, n) has length kⁿ. Using a De Bruijn sequence, you could break the lock above by entering at most 10,000 digits rather than 40,000 as in the most naive approach. (That’s almost true. We’ve left out a detail that we’ll get to shortly.)

Update: See this post on generating De Bruijn sequences.

Binary example

Before we go any further, let’s look at a couple examples of De Bruijn sequences. First, let’s use k = 2 and n = 3. Then

00010111

is a B(2,3) sequence. You can see that this contains 000, 001, 010, and 011. But what about 100? Here’s the detail we left out: In order to contain every possible subsequence, you have to consider the De Bruijn sequence as a cycle. So if we use the last bit and then start over with the first two bits, now we have 100. We also have to wrap around to find 110.
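For what it’s worth, the classic Fredricksen-Kessler-Maiorana construction (a sketch here, not necessarily the method in the post linked above) reproduces exactly this B(2, 3) sequence:

```python
def de_bruijn(k, n):
    """Generate a De Bruijn sequence B(k, n) as a string of digits
    (assumes k <= 10), via the Fredricksen-Kessler-Maiorana algorithm."""
    a = [0] * (k * n)
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return "".join(str(d) for d in seq)

s = de_bruijn(2, 3)
print(s)  # 00010111

# Check the cyclic property: every 3-bit string appears once we wrap around.
wrapped = s + s[:2]
assert all(f"{i:03b}" in wrapped for i in range(8))
```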

DNA example

Next, here’s an example with k = 4 and n = 3. The following is an example of a De Bruijn sequence for all triples of DNA base pairs.

AAACAAGAATACCACGACTAGCAGGAGTATCATGATTCCCGCCTCGGCGTCTGCTTGGGTGTTT

You could specify any triple by giving its starting location in the sequence: AAA starts in position 1, AAC in position 2, AAG in position 4, etc. Note that TTA starts in position 63 and wraps around. TAA starts in position 64 and also wraps around.

De Bruijn sequences are as short as possible for the problem they solve. For example, the sequence above has 64 characters. Since each of the 64 possible triples corresponds to a starting location, the sequence could not have any fewer than 64 characters.

Brute force with and without an enter key

The De Bruijn sequence has kⁿ symbols, but to attack a pass code of length n on an alphabet of k symbols we might have to enter kⁿ + n − 1 symbols because of wrapping the sequence back to the beginning. Using De Bruijn sequences cuts the work of a brute force attack by a factor of about n; it would be exactly n if it weren’t for the need to possibly wrap around and reuse the first few symbols.

In our keypad example, an attack on 4-digit pass codes might need to enter as many as 10,003 digits. If someone added an “enter” key to the keypad and required the user to enter exactly four digits at a time, this would increase the effort of a brute force attack by a factor of approximately 4, from 10,003 keys to 40,000 keys.

But this requires the user to enter five keys rather than four, the last one being the enter key. If the designer increased the pass code length to five digits that could occur at any time, then a brute force attack using De Bruijn sequences would require up to 100,004 keys. Increasing the pass code length by one increases the difficulty of a brute force attack more than requiring a terminator key would.

This is true in general when the alphabet size k is larger than the pass code length n. Suppose you have an alphabet of size k and are using pass codes of length n with no terminator. Requiring a terminator multiplies the difficulty of a brute force attack by n, but requiring one more character in pass codes multiplies the difficulty by k [1].

Increasing the alphabet size also increases security. So instead of using #, for example, as an enter key, a designer could use it as a possible pass code symbol. It could still appear at the end of a pass code, but it could also appear anywhere else, increasing the number of possible pass codes.

Related posts

[1] To be precise, a brute force attack on keys of length n using De Bruijn sequences takes kⁿ + n − 1 symbols. Adding a terminator changes the brute force difficulty to n·kⁿ. Requiring one more symbol in a pass code instead changes it to kⁿ⁺¹ + n.

So it’s roughly n·kⁿ versus k·kⁿ. If k > n, multiplying by k makes a bigger increase than multiplying by n.

Making public keys factorable with Rowhammer

The security of RSA encryption depends on the fact that the product of two large primes is difficult to factor. So if p and q are large primes, say 2048 bits each, then you can publish n = pq with little fear that someone can factor n to recover p and q.

But if you can change n by a tiny amount, you may make it much easier to factor. The Rowhammer attack does this by causing DRAM memory to flip bits. Note that we’re not talking about breaking someone’s encryption in the usual sense. We’re talking about secretly changing their encryption to a system we can break.

To illustrate on a small scale what a difference changing one bit can make, let p = 251 and q = 643.  Then n = pq = 161393. If we flip the last bit of n we get m = 161392. Although n is hard to factor by hand because it has no small factors, m is easy to factor, and in fact

161392 = 2⁴ × 7 × 11 × 131.
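The small example is easy to reproduce with plain trial division; no computer algebra system is needed at this size (this factor function is a simple sketch, not how you’d factor large numbers):

```python
def factor(n):
    """Factor n by trial division; fine for small n."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

n = 251 * 643    # 161393, hard to factor by hand: no small factors
m = n ^ 1        # flip the least significant bit: 161392
print(factor(n))  # [251, 643]
print(factor(m))  # [2, 2, 2, 2, 7, 11, 131]
```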

For a larger example, I generated two 100-bit random primes in Mathematica

p = 1078376712338123201911958185123
q = 1126171711601272883728179081277

and was able to have it factor n = pq in about 100 seconds. But Mathematica was able to factor n − 1 in a third of a second.

So far we have looked at flipping the least significant bit. But Rowhammer isn’t that precise. It might flip some other bit.

If you flip any bit of a product of two large primes, you’re likely to get an easier factoring problem, though the difficulty depends on the number you start with and which bit you flip. To illustrate this, I flipped each of the bits one at a time and measured how long it took to factor the result.

The median time to factor n with one bit flipped was 0.4 seconds. Here’s a plot of the factoring times as a function of which bit was flipped.

factoring times for corrupted product of primes

The plot shows about 80% of the data. Twenty percent of the time the value was above 11 seconds, and the maximum value was 74 seconds. So in every case flipping one bit made the factorization easier, usually quite a lot easier, but only a little easier in the worst case.

To verify that the results above were typical, I did a second experiment. This time I generated a sequence of pairs of random 100-bit primes. I factored their product, then factored the product with a randomly chosen bit flipped. Here are the factoring times in seconds.

    |----------+---------|
    | Original | Flipped |
    |----------+---------|
    |  117.563 |   3.828 |
    |  108.672 |   4.875 |
    |   99.641 |   0.422 |
    |  103.031 |   0.000 |
    |   99.188 |   0.000 |
    |  102.453 |   0.234 |
    |   79.594 |   0.094 |
    |   91.031 |   0.875 |
    |   64.313 |   0.000 |
    |   95.719 |   6.500 |
    |  131.125 |   0.375 |
    |   97.219 |   0.000 |
    |   98.828 |   0.203 |
    |----------+---------|

By the way, we started this post by talking about maliciously flipping a bit. The same thing can happen without a planned attack if memory is hit by a cosmic ray.

More security posts