Gathering Public Information

The amount of information you can gather about an organization’s business and information systems from the Internet is staggering. To see for yourself, use the techniques outlined in the following sections to gather information about your own organization.

Social media

Social media sites have become a primary way for businesses to interact online. Perusing them can turn up a wealth of detail about any business and its people.

As we’ve all witnessed, employees are often very forthcoming about what they do for work, details about their business, and even what they think about their bosses — especially after throwing back a few when their social filters have gone off track! I’ve also found interesting insights based on what people say about their former employers at Glassdoor (www.glassdoor.com/Reviews/index.htm).

Web search

Performing a web search or simply browsing your organization’s website can turn up the following information:

  • Employee names and contact information
  • Important company dates
  • Incorporation filings
  • Securities and Exchange Commission (SEC) filings (for public companies)
  • Press releases about physical moves, organizational changes, and new products
  • Mergers and acquisitions
  • Patents and trademarks
  • Presentations, articles, webcasts, or webinars (which often reveal sensitive information — often ironically labeled confidential)

With Google, you can search the Internet in several ways:

  • Typing keywords: This kind of search often reveals hundreds and sometimes millions of pages of information — such as files, phone numbers, and addresses — that you never guessed were available.
  • Performing advanced web searches: Google’s advanced search options can find sites that link back to your company’s website. This type of search often reveals a lot of information about partners, vendors, clients, and other affiliations.
  • Using switches to dig deeper into a website: If you want to find a certain word or file on your website, simply enter a line like one of the following into Google:

site:www.your_domain.com keyword

site:www.your_domain.com filename

You can even do a generic file-type search across the Internet to see what turns up:

filetype:swf company_name

Use the following search to hunt for PDF documents containing sensitive information that can be used against your business:

filetype:pdf company_name confidential
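If you run these searches regularly against your own organization, it can help to generate them rather than retype them. The following is a minimal Python sketch under stated assumptions, not a definitive tool: it only composes the query strings shown above and wraps them in a search URL you can open in a browser. The function names, the example domain, and the https://www.google.com/search?q= URL form are illustrative assumptions, and the code sends no requests itself.

```python
from urllib.parse import quote_plus


def build_queries(domain: str, company: str, keyword: str) -> list[str]:
    """Compose the Google query strings described in the text above.

    `domain`, `company`, and `keyword` are placeholders you supply; they
    stand in for www.your_domain.com, company_name, and keyword.
    """
    return [
        f"site:{domain} {keyword}",              # word or file on one site
        f"filetype:swf {company}",               # generic file-type search
        f"filetype:pdf {company} confidential",  # PDFs marked confidential
    ]


def search_url(query: str) -> str:
    """URL-encode a query so it can be pasted into a browser.

    The base URL is an assumption for illustration.
    """
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # Hypothetical values; substitute your own organization's details.
    for q in build_queries("www.example.com", "Example", "password"):
        print(search_url(q))
```

Keeping query construction separate from URL encoding makes it easy to add further switches later without touching the encoding step.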

Web crawling

Web-crawling utilities, such as HTTrack Website Copier (www.httrack.com), can mirror your website by downloading every publicly accessible file from it, similar to the way a web vulnerability scanner crawls the website it’s testing. Then you can inspect that copy of the website offline, digging into the following:

  • The website layout and configuration
  • Directories and files that may not otherwise be obvious or readily accessible
  • The HTML and script source code of web pages
  • Comment fields

Comment fields often contain useful information such as the names and email addresses of the developers and internal IT personnel, server names, software versions, internal IP addressing schemes, and general comments about how the code works. In case you’re interested, you can prevent some types of web crawling by creating Disallow entries in your web server’s robots.txt file. You can even enable web tarpitting in certain firewalls and intrusion prevention systems. Crawlers (and attackers) that are smart enough, however, can find ways around these controls.
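Once you have an offline copy of your site, pulling out the comment fields by hand gets tedious. Here is a rough Python sketch, using only the standard library, that collects HTML comments from a mirrored directory. The class and function names and the `*.htm*` file pattern are assumptions for illustration, and the sketch only catches HTML comments (`<!-- ... -->`), not comments inside scripts or stylesheets.

```python
from html.parser import HTMLParser
from pathlib import Path


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []

    def handle_comment(self, data: str) -> None:
        text = data.strip()
        if text:  # skip empty comments
            self.comments.append(text)


def extract_comments(html: str) -> list[str]:
    """Return the HTML comments found in one page's source."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments


def scan_mirror(root: str) -> dict[str, list[str]]:
    """Walk a mirrored site (hypothetical local path) and map each
    HTML file to the comments it contains."""
    results: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.htm*"):
        if not path.is_file():
            continue
        source = path.read_text(encoding="utf-8", errors="replace")
        comments = extract_comments(source)
        if comments:
            results[str(path)] = comments
    return results
```

You might call `scan_mirror` on the folder your crawler wrote to, then review the output for names, email addresses, server names, and IP addresses that shouldn't be public.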

Websites

The following websites may provide specific information about an organization and its employees:

Government and business websites:

The website for your state’s secretary of state or a similar organization can offer incorporation and corporate-officer information.

Background checks and other personal information: