The web is all abuzz about how SHA-1 is “broken”, “a failure,” “obsolete”, etc.
It is supposed to be computationally impractical to create two documents that have the same secure hash code, and yet Google has demonstrated that they have done just that for the SHA-1 algorithm.
I’d like to make two simple observations to put this in perspective.
This is not a surprise. Cryptography experts have suspected since 2005 that SHA-1 was vulnerable and recommended using other algorithms. The security community has been gleeful about Google’s announcement. They feel vindicated for telling people for years not to use SHA-1.
This took a lot of work, both in terms of research and computing. Crypto researchers have been trying to break SHA-1 for 22 years. And according to their announcement, these are the resources Google had to use to break SHA-1:
- Nine quintillion (9,223,372,036,854,775,808) SHA-1 computations in total
- 6,500 years of CPU computation to complete the attack first phase
- 110 years of GPU computation to complete the second phase
While SHA-1 is no longer recommended, it’s hardly a failure. I don’t imagine it would take 22 years of research and millions of CPU hours to break into my car.
I’m not saying people should use SHA-1. I recently advised a client not to use SHA-1. Why not use a better algorithm (or at least what is currently widely believed to be a better algorithm) like SHA-256? But I am saying it’s easy to exaggerate what it means to say SHA-1 has failed.
Update: Attacks on SHA-1 have gotten orders of magnitude more efficient in the time since this post was first written. See this announcement of a chosen prefix collision.
While, probably, “there’s still time” (though not a lot), what are current ideas for migrating git to a different hash?
See Linus Torvald’s response.
It’s very misleading to emphasize the millions of CPU hours that it took to find the first SHA-1 collision, because an unsophisticated reader might think that the second SHA-1 collision will also take millions of CPU hours to find. In fact, second (and subsequent) collisions can be trivially found just by appending arbitrary identical suffixes to the first collision, since SHA-1 is an iterated hash function. The ease of finding multiple collisions makes SHA-1 considerably less secure than one would naively expect if one were a non-expert.
DJ: I agree that the second instance of hash collisions is easier to create than the first, but duplicating Google’s result doesn’t show a security weakness per se. Yes, you could take their artificial example and create a minor variation on it.
But the real equation is how hard would it be to take a document you care about, say a contract, and for someone to tamper with it while preserving the hash value.
Google chose PDFs for a reason. They’re easy to change without making it noticeable. See Linus Torvald’s comments on why the Google announcement isn’t necessarily disastrous news for git.
John, Linus is far from an expert on cryptography and his opinion shouldn’t be given any more weight than if a crypto expert like me were to try to opine on operating systems or some other area in which I’m not an expert. His comments right off the bat contain several obvious factual errors. For example he states that the length field in git should “make collision attacks much harder” which is simply complete nonsense since all known SHA-1 collisions have identical length so the length field would be of absolutely no use in preventing a collision attack.
> For example he states that the length field in git should
> “make collision attacks much harder” which is simply
> complete nonsense since all known SHA-1 collisions have
> identical length
The problem is trying to find a collision that 1) has the same length, 2) is compilable and 3) does what you want.
The easiest way to create a collision in C is inside of a comment:
/* put some garbage here */
if you add 100 characters of garbage you’ll have to remove 100 characters of useful code, with some limitations: you can’t just cut keywords in the middle. If you change a variable name you must change it everywhere.
And, then, you’ll have to inject your own malicious code.
All in all, I think Linus is right.