The Practical Usage of CeWL
CeWL is a useful utility written in Ruby. It lets you create custom wordlists for password cracking from the content of a URL you specify. The utility will “crawl” the specified site, as well as any links on the site to the depth you specify, and return a list of words from that site. You can filter the returned wordlist to include only words of at least a minimum length of your choosing. It comes installed with Kali Linux by default.
The real-world use of an application like this is for targeted password list generation. When cracking passwords, you do have the option to throw a multi-gigabyte wordlist of known passwords at a hash. You also have the ability to modify each of those line items on the fly with something like rules in Hashcat, which we covered previously. Both of those approaches may work, but what if you want something more targeted to a specific person, organization, or occupation (law, healthcare, etc.) when attempting to crack their hashes? That’s where CeWL comes in.
Humans are creatures of habit, and tend to use real words as a basis for their passwords. Often, they tend to use words based on what they like, or words that are associated with hobbies, or their occupation. Mining this information when targeting a specific individual is typically pretty simple, since everyone tends to publicly “like” things on social media. Little do they know, these likes paint a very vivid picture of what their passwords may be to a potential attacker. Favorite shows or movies, favorite bands, hobbies, occupation, etc. can all be found on a social media profile, and there’s a good chance with most people that a word related to one of those things could be the basis for a password. (Bonus tip: Don’t forget to manually throw maiden names in the wordlist file as well!)
Similarly, people who work in a specific organization may use basis words that revolve around their job. If you aren’t targeting an individual person, but instead a list of hashes from multiple people who work at the same place (a hospital, for example), you may want to quickly generate a list of words that are common in the medical field. CeWL can help us out there as well.
CeWL Parameters and Planning
Now that we know what CeWL is, and what it’s used for, let’s see how it works. CeWL takes several parameters, all of which are listed with the command:
cewl --help
In its basic form, CeWL is a very powerful tool that can be used with a fairly small set of parameters. For this example we’ll set a minimum word size (5 letters is a good place to start), an output file that will contain all of our words, the depth of links that we want CeWL to crawl (if not specified, this defaults to 2 links deep), and a URL to crawl.
A couple of parameters to really think about here are the word size and the depth of links that we’re going to crawl. Personally, I like to start with a word size of 5 at a minimum. Typically, passwords are required to be 7 or 8 characters in length, so a user is fairly unlikely to pick a 3 or 4 letter word as the root word of their password. 5 characters is a good place to start, and to make the list smaller, you could start with 6 or 7 letters if you thought that’d be best. Experimentation is king here and everyone has their own technique.
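If you already have a wordlist and want to try a stricter minimum length without crawling again, you can filter the existing file. The following is a minimal sketch, assuming one word per line; filter_min_len is a made-up helper name and the sample words are stand-ins for real CeWL output.

```shell
#!/usr/bin/env bash
# filter_min_len is a hypothetical helper, not part of CeWL.
# It prints only the lines of a wordlist that are at least MIN_LEN characters long.
filter_min_len() {
    local min_len="$1" infile="$2"
    awk -v n="$min_len" 'length($0) >= n' "$infile"
}

# Small sample list standing in for real CeWL output (one word per line).
sample="$(mktemp)"
printf '%s\n' Stark Tyrion Winterfell Arya Lannister Snow > "$sample"

# Keep words of 6 characters or more: Tyrion, Winterfell, Lannister.
filter_min_len 6 "$sample"

rm -f "$sample"
```

Re-running CeWL with a larger -m value gets you the same kind of result, but filtering locally is faster when the crawl itself took a long time.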
The second big decision to make with CeWL is how many links deep we want to crawl. Say, for example, that you leave this at the default of two. Two links deep essentially tells CeWL “Crawl the page I specify… all of the links on that page… and then all of the links on each of those pages, and then stop.” As you can imagine, depending on the initial page that you specify, your list can get very large very quickly. Two links is a good place to start, but the operation may take a very long time. Also, if you go beyond two links deep, mind the file size and how off-topic the words in your wordlist may grow. If your initial page has a high number of links, you may want to start with 0 or at most 1 for depth.
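To get a feel for how quickly the crawl grows with depth, here is a rough back-of-the-envelope calculation. It assumes every page has the same number of unique on-site links, which real sites never do, so treat the result as an upper-bound illustration rather than a prediction. pages_at_depth is a made-up helper name.

```shell
#!/usr/bin/env bash
# pages_at_depth is a hypothetical helper, not part of CeWL.
# It estimates how many pages a crawl touches, assuming every page
# carries the same number of unique links (a simplifying assumption).
pages_at_depth() {
    local links="$1" depth="$2"
    local total=1 level=1 i=1
    while [ "$i" -le "$depth" ]; do
        level=$((level * links))    # pages reached at this level
        total=$((total + level))    # running total including the start page
        i=$((i + 1))
    done
    echo "$total"
}

# Compare depths 0 through 3 for a page with 50 links.
for d in 0 1 2 3; do
    echo "depth $d, 50 links per page: $(pages_at_depth 50 "$d") pages"
done
```

With 50 links per page, this works out to 1 page at depth 0, 51 at depth 1, 2,551 at depth 2, and 127,551 at depth 3, which is why a link-heavy starting page is better crawled at depth 0 or 1.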
Another parameter that you may include is the -o flag. This takes no value, and if set, simply tells CeWL that it’s ok to follow off-site links. If this isn’t specified, CeWL stays on the site of the URL you specify. So, if for example you give CeWL a Wikipedia.org URL, it won’t follow the many off-site reference links at the bottom of the page. This may or may not be what you want. Again, experimentation is key.
Running CeWL to Generate a Wordlist
For this example, let’s assume that we’re targeting an individual. We’ve obtained the password hash to their user account, WPA2 key, etc. and we know by browsing their open social media profiles that they are infatuated with the series Game of Thrones. We might use a command like this to generate a CeWL wordlist.
cewl -m 5 -w wordlist.txt -d 0 -v https://en.wikipedia.org/wiki/Game_of_Thrones
In this command, -m 5 specifies that we want our gathered words to be no less than 5 characters in length. -w followed by a text file name provides a place for our wordlist to be captured. If this isn’t specified, CeWL prints the words to the console only and no file is created.
-d 0 (default is 2) explicitly specifies that we want CeWL to crawl 0 links deep, meaning we only want to gather words from this specific page. The -v flag turns on verbose output so we can watch the progress, and the URL at the end of the command tells CeWL to crawl the Wikipedia Game of Thrones page in order to gather words.
When the command is executed, we’ll see activity on the console. Simply wait until it’s done and a wordlist.txt file will be created. Note that if you use CTRL+C to kill the process, only a 0 byte file will be created.
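Because an interrupted run can leave an empty file behind, it’s worth checking the output before feeding it to a cracker. A minimal sketch follows; wordlist_ready is a made-up helper name, and wordlist.txt is simply the file name used in the example command.

```shell
#!/usr/bin/env bash
# wordlist_ready is a hypothetical helper, not part of CeWL.
# It succeeds only if the file exists and is larger than 0 bytes.
wordlist_ready() {
    [ -s "$1" ]
}

if wordlist_ready wordlist.txt; then
    echo "wordlist.txt has content"
else
    echo "wordlist.txt is missing or empty; re-run the crawl" >&2
fi
```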
Since we didn’t crawl a bunch of links in this example, the scan ended almost immediately, and generated a 41K file with 4,735 possible passwords. Not bad for very little work. If you check out the file using the more command, you’ll see that it contains many important words from the series that a fan may use as their password: character names, location names, words describing plot points in the show, etc.
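Numbers like the word count and file size above can be pulled straight from the file. The sketch below assumes one word per line; wordlist_stats is a made-up helper name and the sample file stands in for a real wordlist.

```shell
#!/usr/bin/env bash
# wordlist_stats is a hypothetical helper, not part of CeWL.
# It reports the number of lines (one word per line) and the size in bytes.
wordlist_stats() {
    local file="$1"
    local words bytes
    words=$(wc -l < "$file" | tr -d ' ')   # tr strips padding some wc versions add
    bytes=$(wc -c < "$file" | tr -d ' ')
    echo "$words words, $bytes bytes"
}

# Sample file standing in for a real wordlist.
sample="$(mktemp)"
printf '%s\n' Stark Tyrion Winterfell > "$sample"

wordlist_stats "$sample"
head -n 5 "$sample"    # peek at the first few entries

rm -f "$sample"
```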
As you can see, this is a very fast way to gather custom wordlists based on the interests of a targeted user or group. When combined with rules in Hashcat, the result is far more effective than either technique on its own.
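To illustrate why rules multiply the value of a small targeted list, here is a toy stand-in written in awk. This is not Hashcat’s rule engine and doesn’t use its rule syntax; mangle is a made-up helper that applies a few hand-picked transformations of the kind rules typically perform.

```shell
#!/usr/bin/env bash
# mangle is a hypothetical helper that only imitates the idea of
# cracking rules; it is not Hashcat and does not read Hashcat rule files.
# For every input word it prints four candidate passwords.
mangle() {
    awk '{
        cap = toupper(substr($0, 1, 1)) substr($0, 2)
        print $0          # word as-is
        print cap         # first letter upper-cased
        print $0 "1"      # digit appended
        print cap "123"   # upper-cased plus a common suffix
    }'
}

# Two sample root words turn into eight candidates.
printf '%s\n' winter dragon | mangle
```

Every root word in the list becomes several candidates, so a few thousand well-chosen words can cover a lot of likely passwords once real rules are applied.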
Using Bash Commands to Clean Up Wordlists
We’ve seen how powerful the CeWL tool can be in generating wordlists, but what if you have multiple wordlists for each target? You’ve gone through and dumped pages into wordlists for all of their favorite bands, movies, shows, etc. Concatenating several text files into a single text file is as simple as one command:
cat file1.txt file2.txt file3.txt file4.txt > merged.txt
Once your wordlists are all merged together, you can clean the merged list up a bit by sorting the words in alphabetical order and removing all duplicates from the list.
sort -u merged.txt > cleaned_wordlist.txt
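The merge and clean-up steps can also be done in one pass. Here is a small self-contained sketch using throwaway sample files; merge_wordlists is a made-up helper name, and LC_ALL=C is only there to make the sort order predictable (uppercase before lowercase).

```shell
#!/usr/bin/env bash
# merge_wordlists is a hypothetical helper.
# It concatenates any number of wordlists, sorts them, and keeps
# exactly one copy of each word.
merge_wordlists() {
    cat "$@" | LC_ALL=C sort -u
}

# Two sample lists that share the words Stark and Tyrion.
workdir="$(mktemp -d)"
printf '%s\n' Stark Tyrion Winterfell > "$workdir/file1.txt"
printf '%s\n' Tyrion Dragon Stark     > "$workdir/file2.txt"

# Prints Dragon, Stark, Tyrion, Winterfell (one per line).
merge_wordlists "$workdir/file1.txt" "$workdir/file2.txt"

rm -rf "$workdir"
```

One thing to watch for: uniq -u is not the same as sort -u. It prints only the lines that occur exactly once, so on the sample above it would keep Dragon and Winterfell and drop Stark and Tyrion entirely, which is usually the opposite of what you want in a wordlist.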
Now that we’ve seen how powerful CeWL is and how useful it is to generate custom wordlists, you should experiment with it. Combining the power of CeWL with a little research about the likes and interests of the person or group that you’re targeting is a great way to create a wordlist. When combined with rules in Hashcat, the result is a very powerful targeted attack vector.