If your password is in the file rockyou.txt then it’s a bad password. Password cracking software will find it instantly. (Use long, randomly generated passwords; staying off the list of worst passwords is necessary but not sufficient for security.)
The rockyou.txt file currently contains 14,344,394 bad passwords. I poked around in the file and this post reports some things I found.
To make things more interesting, I made myself a rule that I could only use command line utilities.
Pure numeric passwords
I was curious how many of these passwords consisted only of digits so I ran the following.
grep -P '^\d+$' rockyou.txt | wc -l
This says 2,346,744 of the passwords only contain digits, about 1 in 6.
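Here is a small sketch of the same count on a file short enough to check by eye, plus the arithmetic behind "about 1 in 6". The sample passwords are made up, and using grep -c with a bracket expression instead of the -P pattern piped to wc is my substitution, not the command from the post.

```shell
# Made-up sample standing in for rockyou.txt.
sample=$(mktemp)
printf '%s\n' 123456 password 12345 iloveyou 000 abc123 > "$sample"

# grep -c counts matching lines itself, so the pipe to wc is not needed.
numeric=$(grep -c -E '^[0-9]+$' "$sample")
echo "$numeric all-digit passwords in the sample"

# Fraction of all-digit passwords, using the two counts quoted in the post.
fraction=$(awk 'BEGIN { printf "%.3f", 2346744 / 14344394 }')
echo "$fraction"

rm -f "$sample"
```

The fraction comes out a little under 1/6, which is roughly 0.167.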
Digit distribution
I made a file of digits appearing in the passwords
grep -o -P '\d' rockyou.txt > digits
and looked at the frequency of digits.
for i in 0 1 2 3 4 5 6 7 8 9; do grep -c $i digits; done
This is what I got:
5740291
6734380
5237479
3767584
3391342
3355180
3118364
3100596
3567258
3855490
The digits are distributed more evenly than I would have expected. 1’s are more common than other digits, but only about twice as common as the least common digits.
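The same tally can also be produced in one pass with sort and uniq -c instead of a loop. This is a sketch on a made-up three-password sample rather than the command used in the post, and the awk step at the end is only there to tidy the output into digit:count form.

```shell
# Made-up sample standing in for rockyou.txt.
sample=$(mktemp)
printf '%s\n' abc123 111 pass9 > "$sample"

# grep -o prints every matched digit on its own line;
# sort groups equal digits together and uniq -c counts each group.
tally=$(grep -o '[0-9]' "$sample" | sort | uniq -c | awk '{ print $2 ":" $1 }')
echo "$tally"

rm -f "$sample"
```

One difference from the loop: a digit that never occurs is simply missing from this output, whereas grep -c in the loop would report it with a count of 0.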
Longest bad passwords
How long is the longest bad password? The command
wc -L rockyou.txt
shows that one line in the file is 285 characters long. What is this password? The command
grep -P '.{285}' rockyou.txt
shows that it’s some HTML code. Nice try whoever thought of that, but you’ve been pwned.
A similar search for all-digit passwords shows that the longest numeric passwords are 255 digits long. One of these is a string of 255 zeros.
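The post does not show the command for the all-digit search. One plausible form, sketched here on a made-up sample, is to filter for all-digit lines first and then track the longest one. I use awk for the length so the sketch does not depend on the GNU-only wc -L option; that choice is mine.

```shell
# Made-up sample standing in for rockyou.txt.
sample=$(mktemp)
printf '%s\n' 123456 hunter2 0000000000 42 '<b>hello</b>' > "$sample"

# Keep only all-digit lines, then remember the longest one seen so far.
longest=$(grep -E '^[0-9]+$' "$sample" |
    awk 'length($0) > max { max = length($0); line = $0 } END { print max, line }')
echo "$longest"

rm -f "$sample"
```

When several lines tie for the maximum length, this prints only the first of them.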
Dictionary words
A common bit of advice is to not choose passwords that can be found in a dictionary. That’s good advice as far as it goes, but it doesn’t go very far.
I used the comm utility to see how many bad passwords are not in the dictionary by running
comm -23 sorted dict | wc -l
and the answer was 14,310,684. Nearly all the bad passwords are not in a dictionary!
(Here sorted is a sorted version of the rockyou.txt file; I believe the file is initially sorted by popularity, worst passwords first. The comm utility complained that my system dictionary isn’t sorted, which I found odd, but I sorted it to make comm happy and dict is the sorted file.)
Curiously, the command
comm -13 sorted dict | wc -l
shows there are 70,624 words in the dictionary (specifically, the american-english file on my Linux box) that are not on the bad password list.
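To make the three comm columns concrete, here is a sketch on two tiny made-up word lists. comm expects both inputs to be sorted in the collation order it is using, so the sketch pins LC_ALL=C for every sort and comm call. Whether a locale mismatch is what caused the complaint about the system dictionary is only a guess on my part.

```shell
# Made-up stand-ins for the password list and the system dictionary.
work=$(mktemp -d)
printf '%s\n' password 123456 dragon qwerty > "$work/passwords"
printf '%s\n' zebra apple password dragon zoo > "$work/words"

# Sort both inputs under the same collation that comm will use.
LC_ALL=C sort "$work/passwords" > "$work/sorted"
LC_ALL=C sort "$work/words" > "$work/dict"

# -23 keeps lines only in the first file, -13 only in the second,
# -12 only the lines common to both.
only_passwords=$(LC_ALL=C comm -23 "$work/sorted" "$work/dict" | wc -l | tr -d ' ')
only_dict=$(LC_ALL=C comm -13 "$work/sorted" "$work/dict" | wc -l | tr -d ' ')
in_both=$(LC_ALL=C comm -12 "$work/sorted" "$work/dict" | wc -l | tr -d ' ')
echo "$only_passwords $only_dict $in_both"

rm -rf "$work"
```

In the sample, 123456 and qwerty are passwords that are not words, apple, zebra, and zoo are words that are not passwords, and dragon and password are both.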
Smallest ‘good’ numeric password
What is the smallest number not in the list of pure numeric passwords? The following command strips leading zeros from purely numeric passwords, sorts the results as numbers, removes duplicates, and stores the results in a file called nums.
grep -P '^\d+$' rockyou.txt | sed 's/^0\+//' | sort -n | uniq > nums
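The file nums begins with a blank line, which is what remains of passwords made up entirely of zeros once the leading zeros are stripped. I removed this with sed.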
sed -i 1d nums
Next I used awk
to print instances where the line number does not match the line in the file nums
.
awk '{if (NR-$0 < 0) print $0 }' nums | less
The first number this prints is 61. This means that the first line is 1, the second line is 2, and so on, but the 60th line is 61. That means 60 is missing. The file rockyou.txt does not contain 60. You can verify this: the command
grep '^60$' rockyou.txt
returns nothing. 60 is the smallest number not in the bad password file. There are passwords that contain ‘60’ as a substring, but just 60 as a complete password is not in the file.
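The steps above can be folded into a single pipeline that prints the missing number directly. This is a sketch on a made-up sample, not the commands from the post: it drops the empty line with grep -v instead of editing a file in place, and it uses the portable 0* in the sed pattern.

```shell
# Made-up sample standing in for rockyou.txt.
sample=$(mktemp)
printf '%s\n' 3 001 2 abc 5 1 02 4 7 > "$sample"

# Keep all-digit lines, strip leading zeros, drop the empty line left by
# all-zero passwords, sort numerically, and remove duplicates. awk then
# reports the first line number that differs from the value on that line.
missing=$(grep -E '^[0-9]+$' "$sample" | sed 's/^0*//' | grep -v '^$' |
    sort -n | uniq |
    awk '$0 != NR { print NR; found = 1; exit }
         END { if (!found) print NR + 1 }')
echo "$missing"

rm -f "$sample"
```

The sample contains 1 through 5 and 7 once the zeros are stripped, so the first gap is at 6. The END rule covers the case where the numbers have no gap at all, in which case the answer is one more than the largest number.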
Very interesting stuff! As an alternative to using “sort -n | uniq”, you can remove duplicates in one pass through the data, and preserve the original record order with the following nifty awk one-liner (from the gawk manual: https://www.gnu.org/software/gawk/manual/gawk.html#History-Sorting):
awk '!seen[$0]++'
Very clever!
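A short demonstration of that one-liner on made-up input: seen[$0]++ evaluates to 0 the first time a line appears, so the negation is true and awk prints the line; on later appearances the count is nonzero and the line is skipped.

```shell
# Duplicates are dropped, and the first occurrence of each line keeps its place.
deduped=$(printf '%s\n' b a b c a | awk '!seen[$0]++')
echo "$deduped"
```

Because this keeps the original order rather than sorting, the smallest-missing-number search would still need the numeric sort before comparing values to line numbers.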