Information Gathering Part I: TheHarvester
Introduction to Information Gathering
During the earliest stages of either a penetration test or a hack, the very first thing we’ll want to do is gather as much information about an organization or a person as possible. This can be a laborious, manual process that oftentimes isn’t much fun. Fortunately, there are a couple of tools, namely theHarvester and Recon-ng, that can make the process much faster and easier for us. Both of these tools come installed with Kali Linux by default.
Disclaimer: None of the information gathering in this article is illegal, as we are only referencing public sources such as web registration data, public search engines, DNS zone data, and the WHOIS database. At no time during this process are we going to connect to, or attempt to connect to, any private server. By specifying the domain name(s) of remote hosts, we are simply querying WHOIS and other services for the public information they hold on those systems. That said, although this information is not illegal to obtain or hold, what you do with it can be illegal. Don’t be stupid, and don’t get yourself into trouble.
Setting Up theHarvester
theHarvester comes installed by default in Kali Linux, and the basic usage shown in this article requires no configuration files or API keys. It’s a very simple program that takes only a few parameters to work its magic.
theHarvester – Syntax and Usage
theHarvester is a simple tool that is very easy to use and can produce some great results. We can feed theHarvester a domain name, and then tell it what services we want it to use to return results. Fortunately, the writers did provide a way to tell theHarvester to use all of the search services it knows about, which is the command switch that I almost always use. For this example, we’ll pick on Pixar. Syntax is as follows:
theharvester -d pixar.com -b all
The -d switch specifies the domain, and the -b switch tells theHarvester which search engine to use. In this case, all of them. That’s it.
theHarvester will then gather data, organize it, and print it in a very readable format. Optionally, you can redirect the output to a file and parse it later.
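As a minimal sketch of saving a run for later parsing, the shell function below shows a command's output on screen and keeps a copy in a file at the same time. The function name run_and_save and the file name pixar.txt are my own choices for illustration and are not part of theHarvester.

```shell
# Hypothetical helper (not part of theHarvester): run any command,
# show its output on screen, and keep a copy in a file for later parsing.
run_and_save() {
  outfile="$1"   # first argument: file that receives the copy
  shift          # remaining arguments: the command to run
  "$@" 2>&1 | tee "$outfile"
}

# Example call, reusing the command from above:
#   run_and_save pixar.txt theharvester -d pixar.com -b all
```

Using tee rather than a plain > redirect means you can still watch the results scroll by while the file is being written.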
theHarvester starts by searching for information. Even if the -b all parameter is specified, the application runs very quickly.
One of the important items returned by theHarvester is a list of email addresses associated with the domain. Keep in mind that this isn’t every email address at the domain, only the ones that theHarvester was able to find during its search. For a high-profile company like Pixar, this is a very long list.
For a smaller company, you’re likely to get email addresses of only service accounts and (more importantly) network administrator or web administrator accounts. Since many organizations also use their email addresses as their login IDs, you’ve also just got a list of potential login names.
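If the output was saved to a text file, a small shell pipeline can pull the addresses back out and turn them into a list of candidate login names. This is only a sketch: the function names are hypothetical, the usage comment assumes a saved file called pixar.txt, and the regular expression is a rough approximation of an email address rather than a strict validator.

```shell
# Hypothetical helper: read saved tool output on stdin and print the unique,
# lowercased email addresses that belong to the given domain (or a subdomain).
extract_emails() {
  # Escape the dots so "example.com" is matched literally in the regex.
  escaped=$(printf '%s' "$1" | sed 's/[.]/\\./g')
  grep -Eio "[a-z0-9._%+-]+@([a-z0-9-]+\.)*${escaped}" \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}

# Hypothetical helper: keep only the part before the @ as a candidate login name.
emails_to_logins() {
  cut -d@ -f1 | sort -u
}

# Example call against a saved run:
#   extract_emails pixar.com < pixar.txt | emails_to_logins
```

The result is only a list of guesses. Whether an organization really uses the mailbox name as the login ID is something you would have to confirm separately.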
The second important item returned by theHarvester is IP and hostname information, which tells us which systems exist and where they live on the network. In this case, we’re shown some subdomain information. I assumed that www.pixar.com existed, but I didn’t know about systems.pixar.com or graphics.pixar.com in this instance.
It may also point you to other domains owned by the organization that owns the domain you specified. You have to be careful here though: if the website is on a server with a bunch of other hosted websites, all of those sites typically show up with the same IP address even though they have nothing to do with each other. This is because a web host will typically serve multiple sites from a single server and route incoming connections to individual sites based on the hostname specified in the request (the HTTP Host header) rather than on the destination IP.
This seems to be the case in this instance, since I checked out www.fightingeel.com from the list, and it appears to be a women’s clothing site.
As you can see, theHarvester is a very easy tool to use, and aside from the above, it provides command line switches that allow us to do things such as output to an XML or HTML file. Of course, you can always redirect the output from theHarvester to a text file, which is the simplest way to work with its output data, but it may not be presentable if you’re showing your findings to someone else.
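One quick way to spot this kind of shared hosting in a saved host list is to group the hostnames by IP address and print only the IPs that appear more than once. The sketch below is built on an assumption: that each host line has the form hostname:IP. If your version of theHarvester prints the fields differently, adjust the awk field numbers. The function name is hypothetical.

```shell
# Hypothetical helper: read "hostname:ip" lines on stdin and print every IP
# that is shared by more than one hostname, followed by those hostnames.
shared_ips() {
  awk -F: '
    NF == 2 {                      # skip lines that are not host:ip pairs
      hosts[$2] = hosts[$2] " " $1 # append the hostname to the list for this IP
      count[$2]++
    }
    END {
      for (ip in count)
        if (count[ip] > 1)
          print ip ":" hosts[ip]
    }
  ' | sort
}

# Example call against a file of host lines (hosts.txt is an assumed name):
#   shared_ips < hosts.txt
```

An IP that shows up here is a hint that you may be looking at a shared web host, so the other hostnames on it deserve a manual check before you treat them as belonging to your target.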
In the next part of this two-part series, we’ll be looking at Recon-ng, which is a much more complicated and powerful tool. Don’t worry though, we can manually import some of the data we found with theHarvester into Recon-ng to improve our searches there.
If you enjoyed this tutorial and would like to see more, please feel free to share this article on social media, comment below letting me know what else you’d like to see, and follow me on Twitter @JROlmstead.