I recently stumbled upon a large collection of account names and passwords that were harvested from various data breaches over the past 10 or so years. This data comes from Yahoo, Target, Facebook, Hotmail, Twitter, MySpace, hacked PHPBB instances, and many, many more places. Each account name is in the form of an email address, and all passwords have been cracked and are in plain text. There are over 1.4 billion of them in total.
The archive has been available for download via BitTorrent since at least early 2017. Someone has taken the time to break the entire archive into multiple flat text files, and has written scripts to search through the archive for specific account names by email address.
For those who want to download the entire 40GB+ archive, here’s the torrent magnet link.
How Is This Information Useful?
It should be noted that the passwords provided here are not necessarily the passwords of the email accounts themselves (although they could be). Rather, the email addresses were harvested as the usernames for the breached entity in question, and each password is the one that had been used with that username at the time of the breach.
This information can still be helpful for an attacker, especially if a user re-uses their passwords across many sites and/or never changes their passwords. Lack of password diversity and poor password maintenance are both huge no-nos, but we all know that most people are guilty of these things. This information could also help generate an attack wordlist if the user keeps following the same pattern in new passwords as they did in old ones, for example by simply incrementing a number (Summer2015, Summer2016, Summer2017, etc.).
Entries in this list that belong to users’ work accounts can give an attacker insight into how the organization formats its internal usernames, since Active Directory usernames typically follow the same format as the company’s email addresses. Users’ first and last names can be harvested from work-heavy sites like LinkedIn, and then those names can be formatted to make attacks like password spraying much easier and more accurate, greatly increasing the chance of success when attacking a web-facing Active Directory-integrated portal, like Outlook Web Access.
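To make the incrementing idea concrete, here is a minimal sketch of how such a wordlist could be generated. The function name increment_candidates and its arguments are my own, made up for illustration; nothing like it ships with the archive.

```shell
# Print one candidate per line by appending each year in a range to a base word.
# Hypothetical helper for illustration; not part of the archive's scripts.
increment_candidates() {
    base=$1
    year=$2
    last_year=$3
    while [ "$year" -le "$last_year" ]; do
        printf '%s%s\n' "$base" "$year"
        year=$((year + 1))
    done
}
```

Calling increment_candidates Summer 2015 2017 prints Summer2015, Summer2016, and Summer2017, one per line.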
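As a rough sketch of the name-formatting step, the function below turns a first and last name into a few candidate usernames. The three formats (first.last, flast, firstlast) are common conventions that I am assuming here; the real convention for any given organization has to be confirmed from actual data. The name username_candidates is hypothetical.

```shell
# Print a few common username formats for a first and last name.
# ASSUMPTION: these three formats are only examples of typical conventions.
username_candidates() {
    first=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    last=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    initial=$(printf '%s' "$first" | cut -c1)
    printf '%s.%s\n' "$first" "$last"    # first.last
    printf '%s%s\n' "$initial" "$last"   # flast
    printf '%s%s\n' "$first" "$last"     # firstlast
}
```

Calling username_candidates Jane Doe prints jane.doe, jdoe, and janedoe, one per line.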
Searching the Archive
Obviously you’re not going to want to manually open each and every text file to search for information; there are over 1,900 of them. Luckily, the entities who compiled this information have written a little Bash script that’s included with the download. This script will let you search for a specific email address to see if it’s been compromised, and will show you the password that was used with it at the time of the compromise.
You can execute this script in either Linux or macOS, or in Windows if you install Cygwin. Although the script limits your search to a single, specific email address, it gets the job done when all you need is a quick and dirty search.
If the email address provided matches, it will return a result in the form of the username and password separated by a colon. The result returns almost instantaneously due to the way the script is written in conjunction with the way the text files are laid out within alphabetically ordered directories.
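To illustrate why an alphabetically ordered layout makes lookups fast, here is a minimal sketch of the idea. This is not the script included with the download. It assumes a hypothetical layout in which every address is stored in a file named after its second character, inside a directory named after its first character, so only one small file ever has to be scanned. The name lookup_email is also made up.

```shell
# ASSUMPTION: hypothetical layout where "jdoe@example.com" is stored in the
# file "$root/j/d" (directory = first character, file = second character).
lookup_email() {
    root=$1
    email=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    c1=$(printf '%s' "$email" | cut -c1)
    c2=$(printf '%s' "$email" | cut -c2)
    shard="$root/$c1/$c2"
    # Scan only the one matching file and print lines that start with
    # "email:" (case-insensitive), i.e. the address followed by a colon.
    [ -f "$shard" ] && awk -v e="$email:" 'index(tolower($0), e) == 1' "$shard"
}
```

A call would look like lookup_email /path/to/archive someone@example.com, where the path is whatever directory holds the sharded files.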
The script limits you to a single address, which may appear in the output multiple times if it was involved in multiple breaches. But what if you wanted to perform a more powerful search?
Searching Better with grep
Although it takes much longer, you can use grep to search with a bit more power. grep is a standard command on macOS and Linux, and can be used on Windows through Cygwin.
For this we’ll use two flags, -a and -R. The -a flag will tell grep to treat binary files as text files. None of the files in the breach are actually binary files, but I found that some of them contain international and non-standard characters, which confuses grep and makes it think they are binaries. The second flag, -R, tells grep to keep digging recursively through all of the files in all of the directories below the one we specify.
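Put together, the search looks something like the sketch below. The wrapper name search_archive and the archive_dir argument are my own placeholders; point it at wherever you extracted the text files.

```shell
# Recursively search every file under a directory for a pattern.
# ASSUMPTION: archive_dir is a placeholder for your own extraction directory.
search_archive() {
    pattern=$1
    archive_dir=$2
    # -a: treat files as text even if grep thinks they look binary
    # -R: recurse through all directories below archive_dir
    grep -a -R "$pattern" "$archive_dir"
}
```

A call like search_archive '@noway.com' /path/to/archive prints each matching line prefixed with the name of the file it was found in. Keep in mind that grep treats the pattern as a regular expression, so the dot in a domain matches any character; adding the -F flag makes grep treat the pattern as a fixed string instead.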
Now, instead of searching for simply an email address / username in its entirety, we can search for any string we want. Last names, passwords, parts of passwords, entire domains, etc.
Here, I performed a quick search for accounts @noway.com, which is a domain that I usually put in when I know someone is just asking for an email address to spam me and I’m not really trying to establish a real account. I quickly killed the command after just a few seconds of searching in an effort to not expose any real data.
Note that you can put anything you want inside the single quotes. You can use this to search for passwords that you yourself use to see if anyone has been caught using them (in which case they’re probably included in one of the massive wordlists available online), and you can use something like '@domain.com' to search all of the accounts breached for a certain company / domain.
If you are performing a pen test on a company, this would be a good way to see if any of their users have been breached before. From the output you could learn the account naming convention for their AD Domain (account names are typically also their email addresses). You could also find users to target based on the weakness of their passwords, assuming they’re still at the organization. Someone who got breached using Summer2015 could possibly be using Winter2018 or Winter2019 now (it’s currently 1/30/2019 at the time of this writing). You get the idea.
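For the naming-convention angle, a small pipeline can reduce the grep output to just the account names for one domain. This is a sketch under the assumption that every line has the address and password separated by a colon, as described earlier; the function name domain_accounts and the archive_dir argument are placeholders of mine.

```shell
# List the unique account names (the part before the @) seen for a domain.
# ASSUMPTION: every line looks like "address:password".
domain_accounts() {
    domain=$1
    archive_dir=$2
    # -h drops the file name prefix, -i ignores case, -F matches a fixed string
    grep -a -R -h -i -F "@$domain:" "$archive_dir" |
        cut -d: -f1 |                  # keep the address, drop the password
        cut -d@ -f1 |                  # keep the part before the @
        tr '[:upper:]' '[:lower:]' |   # normalize case
        LC_ALL=C sort -u               # de-duplicate
}
```

Scanning the resulting list usually makes the convention obvious at a glance, for example a column of names that all look like first.last.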