Yesterday, I took a critical look at the difficulty of interpreting progress in password cracking. Today I’ll make a broader argument that even if we had good data to evaluate cracking efficiency, recent progress isn’t a major threat to the vast majority of web passwords. Efficient and powerful cracking tools are useful in some targeted attack scenarios, but they just don’t change the economics of industrial-scale attacks against web accounts. The basic mechanics of web passwords mean highly-efficient cracking doesn’t offer much benefit in untargeted attacks.
First, although the majority of web sites actually don’t impose explicit rate-limiting measures on password guessing, in practice the number of online guesses which can be made without triggering attention from administrators or anti-DoS mechanisms is usually still only in the thousands or hundreds of thousands. As I argued yesterday and explored in detail in my thesis, it’s easy to perform a relatively efficient password guessing attack of a million guesses simply using lists of known passwords from previous leaks. Sophisticated cracking algorithms are mostly useless for online guessing.
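To illustrate why a list of known passwords is enough at this scale, here is a minimal sketch in Python. It ranks passwords from an earlier leak by frequency and measures what fraction of a separate set of accounts falls within a fixed guess budget. The function names and the toy data are hypothetical, chosen only for illustration; a real evaluation would use actual leaked lists of much larger size.

```python
from collections import Counter


def build_guess_list(leaked_passwords, budget):
    """Return the `budget` most common leaked passwords, most frequent first."""
    counts = Counter(leaked_passwords)
    return [pw for pw, _ in counts.most_common(budget)]


def success_rate(guesses, target_passwords):
    """Fraction of target accounts whose password appears in `guesses`."""
    guess_set = set(guesses)
    hits = sum(1 for pw in target_passwords if pw in guess_set)
    return hits / len(target_passwords)


# Toy data standing in for a previous leak and a separate target population.
leak = ["123456", "password", "123456", "qwerty", "123456", "password", "letmein"]
targets = ["123456", "hunter2", "password", "correct horse"]

guesses = build_guess_list(leak, budget=2)
print(guesses)
print(success_rate(guesses, targets))
```

No cracking algorithm is involved here: the guess order comes entirely from observed frequencies, which is the point of the argument above.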
Making hundreds of millions of guesses, at which point cracking becomes an interesting technical challenge, is generally only possible for web passwords after a database of hashes is compromised. Consider the three most common cases though: if passwords are stored unhashed (we estimated 29-40% of sites didn’t hash in 2010), cracking is unnecessary. If passwords are stored with a standard, unsalted hash like MD5 or SHA1 (which are still quite common), then large rainbow tables are freely available covering all passwords up to 9 or 10 characters long. This makes it possible to break the vast majority of passwords near-instantly, as witnessed in KoreLogic’s rapid success against the MilitarySingles leak. In effect, rainbow tables allow so much cracking power to be wielded that the importance of efficiency is minimised. Finally, if passwords are stored properly with a salted, iterated hash, then it probably makes little sense to expend trillions of costly guesses against each of millions of accounts due to the economics of untargeted attacks. Extra cracking power certainly helps the attacker here, but if only a few tens of millions of guesses are made per account then once again attackers can be successful just using known password lists.
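The difference between the second and third cases can be sketched in a few lines of Python using the standard `hashlib` module. This is an illustration under stated assumptions, not a description of any particular site's scheme: the lookup table below is a plain dictionary rather than a true rainbow table, the choice of PBKDF2 with SHA-256 and 10,000 iterations is arbitrary, and the account and guess counts in the cost calculation are made-up round numbers.

```python
import hashlib
import os


def unsalted_md5(password: str) -> str:
    """Unsalted hash: the same password always yields the same digest."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def salted_iterated(password: str, salt: bytes, iterations: int = 10_000) -> str:
    """Salted, iterated hash using PBKDF2-HMAC-SHA256 (parameters are illustrative)."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return digest.hex()


def total_hash_operations(accounts: int, guesses_per_account: int, iterations: int) -> int:
    """With per-account salts, every guess must be recomputed for every account."""
    return accounts * guesses_per_account * iterations


# Unsalted: one precomputed table (a simple dict here, not a real rainbow
# table) inverts the hash of every account that uses a listed password.
wordlist = ["123456", "password", "qwerty"]
lookup = {unsalted_md5(w): w for w in wordlist}
stolen_hash = unsalted_md5("password")
print(lookup.get(stolen_hash))

# Salted: two accounts with the same password get unrelated digests,
# so nothing can be precomputed or shared across accounts.
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = salted_iterated("password", salt_a)
hash_b = salted_iterated("password", salt_b)
print(hash_a != hash_b)

# Hypothetical scale: 1M accounts, 10M guesses each, 10k iterations per guess.
print(total_hash_operations(1_000_000, 10_000_000, 10_000))
```

The last figure shows how quickly the attacker's cost multiplies once salting and iteration are both in place, which is why a modest per-account guess budget drawn from known password lists is the more plausible strategy in an untargeted attack.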
Thus, for large-scale compromise of web accounts by cracking leaked hashes (possibly to use at another site where the password was re-used), improved cracking efficiency is of minimal effect and increased power likely means only a marginal difference in the proportion of compromised accounts. Furthermore, this threat model, despite an increasing number of leaks, still appears to be less common than malware (keyloggers), phishing, and perhaps sniffing passwords on open wireless networks. Regardless of improvements in password cracking, administrators of any large web site deploying passwords need protocols for dealing with password compromise due to all of these threat models. Improvements in cracking don’t materially change the game here.
Cracking efficiency matters primarily in targeted attack scenarios, such as taking over celebrity social media accounts, breaking system passwords in an “advanced persistent threat” scenario, attempting to decrypt an encrypted file system or stolen PGP private key, or perhaps cracking the access password for a wireless network. Yet much of the public discussion around cracking is in relation to large-scale password leaks, which are still the most common test used to measure cracking efficiency. I’d suggest the next challenge for the password cracking community is developing targeted cracking tools which incorporate information about the owner of the target password (such as the target’s email address or information which can be extracted from search engines). Perhaps this could be the goal of the next public cracking contest?
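As a rough idea of what such a targeted tool might do at its simplest, the Python sketch below derives tokens from a target's email address, expands them with a few common suffixes, and places them ahead of a general-purpose wordlist. Everything here is hypothetical: the function names, the splitting rules, and the suffix set are assumptions made for illustration, and a serious tool would need far richer sources of context and a principled way to rank candidates.

```python
import re


def context_tokens(email: str) -> list[str]:
    """Split an email address into lowercase name-like tokens, in order, without duplicates."""
    local, _, domain = email.partition("@")
    parts = re.split(r"[._\-+]", local) + [domain.split(".")[0]]
    tokens = []
    for part in parts:
        part = part.lower()
        if part and part not in tokens:
            tokens.append(part)
    return tokens


def targeted_candidates(email, wordlist, suffixes=("", "1", "123", "!")):
    """Context-derived guesses first, then the general wordlist, with no repeats."""
    seen = set()
    ordered = []

    def add(candidate):
        if candidate not in seen:
            seen.add(candidate)
            ordered.append(candidate)

    for token in context_tokens(email):
        for variant in (token, token.capitalize()):
            for suffix in suffixes:
                add(variant + suffix)
    for word in wordlist:
        add(word)
    return ordered


# Hypothetical target address and a tiny stand-in for a general wordlist.
candidates = targeted_candidates("jane.doe@example.com", ["123456", "password"])
print(candidates[:5])
print(len(candidates))
```

Putting the context-derived guesses first reflects the assumption that, against a single target, personal information is more predictive than global password frequency; whether that holds in practice is exactly what a contest of this kind could measure.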
This post is the second in a two-part series on password cracking. On a personal note, I’ve completed my PhD dissertation on guessing statistics and will soon be joining Google to take on new research challenges. This post reflects my opinions only and not those of my future employer.
Thanks to Moxie Marlinspike, Richard Clayton and Sören Preibusch for reviewing a draft of this post.
You argue that algorithms aren’t needed if you only have a limited number of attempts because you can just take a static list of passwords to try. However, in many real-world cases the attacker has not only the password hash but also some kind of contextual information such as the associated username/email address, the name of the site the hashes come from (or address/names/telephone numbers etc. in case of WPA2-PSK) or even a custom wordlist generated by crawling the company website. Using this contextual information can significantly improve cracking efficiency but requires some algorithmic support to combine the contextual information with a general-purpose wordlist.