An essential aspect of footprinting is identifying the level of risk associated with the organization’s publicly accessible information. Footprinting, the first step in ethical hacking, refers to the process of collecting information about a target network and its environment. Using footprinting, you can find a number of opportunities to penetrate and assess the target organization’s network. After you complete the footprinting process in a methodical manner, you will obtain the blueprint of the security profile of the target organization. Here, the term “blueprint” refers to the unique system profile of the target organization acquired by footprinting.
There is no single methodology for footprinting, as information can be traced in a number of ways. However, the activity is important, as you need to gather all the crucial information about the target organization before beginning the hacking phase. For this reason, footprinting needs to be carried out in an organized manner. The information gathered in this step helps in uncovering vulnerabilities existing in the target network and in identifying different ways of exploiting these vulnerabilities.
Footprinting can be categorized into passive footprinting and active footprinting.
Passive footprinting involves gathering information about the target without direct interaction. It is mainly useful when the information-gathering activities must not be detected by the target. Performing passive footprinting is technically challenging, as no active traffic is sent to the target organization from a host, or from anonymous hosts or services over the Internet; we can only collect archived and stored information about the target using search engines, social networking sites, and so on. Passive footprinting techniques include:
- Finding information through search engines
- Finding the Top-level Domains (TLDs) and sub-domains of a target through web services
- Collecting location information on the target through web services
- Performing people search using social networking sites and people search services
- Gathering financial information about the target through financial services
- Gathering infrastructure details of the target organization through job sites
- Collecting information through deep and dark web footprinting
- Determining the operating systems in use by the target organization
- Performing competitive intelligence
- Monitoring the target using alert services
- Gathering information using groups, forums, blogs, and NNTP Usenet newsgroups
- Collecting information through social engineering on social networking sites
- Extracting information about the target using Internet archives
- Gathering information using business profile sites
- Monitoring website traffic of the target
- Tracking the online reputation of the target
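To illustrate the passive approach, the sketch below queries a public certificate-transparency service (crt.sh) for a target’s sub-domains without ever contacting the target itself. The domain name is illustrative, and the crt.sh JSON endpoint is assumed to be available:

```python
# Passive sub-domain discovery via certificate-transparency logs (crt.sh).
# No packet ever reaches the target itself -- only the public crt.sh web
# service is queried. The domain below is illustrative.
import json
import urllib.parse
import urllib.request

def crtsh_url(domain):
    """Build the crt.sh JSON query URL for a domain's certificates."""
    return "https://crt.sh/?q=" + urllib.parse.quote("%." + domain) + "&output=json"

def parse_subdomains(raw_json, domain):
    """Extract unique sub-domain names from a crt.sh JSON response."""
    names = set()
    for entry in json.loads(raw_json):
        for name in entry.get("name_value", "").splitlines():
            name = name.lstrip("*.").lower()
            if name.endswith("." + domain):
                names.add(name)
    return sorted(names)

if __name__ == "__main__":
    # The network call is kept out of the pure helpers above.
    with urllib.request.urlopen(crtsh_url("certifiedhacker.com"), timeout=15) as r:
        print(parse_subdomains(r.read().decode(), "certifiedhacker.com"))
```

Because the query goes to a third-party archive rather than to the target, this is a genuinely passive technique.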
Active footprinting involves gathering information about the target with direct interaction. In active footprinting, the target may recognize the ongoing information gathering process, as we overtly interact with the target network. Active footprinting requires more preparation than passive footprinting, as it may leave traces that may alert the target organization. Active footprinting techniques include:
- Querying published name servers of the target
- Searching for digital files
- Extracting website links and gathering wordlists from the target website
- Extracting metadata of published documents and files
- Gathering website information using web spidering and mirroring tools
- Gathering information through email tracking
- Harvesting email lists
- Performing Whois lookup
- Extracting DNS information
- Performing traceroute analysis
- Performing social engineering
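As a minimal illustration of the active approach, the following sketch performs a raw Whois lookup over TCP port 43 using only the Python standard library. The traffic goes directly to a Whois server and may be logged, which is the hallmark of an active technique; the server and domain names are illustrative:

```python
# Active footprinting sketch: a raw Whois lookup over TCP port 43.
# The Whois wire protocol is simply the domain name followed by CRLF.
import socket

def build_query(domain):
    """The Whois wire format: the domain followed by CRLF."""
    return (domain + "\r\n").encode()

def whois_query(domain, server="whois.verisign-grs.com", port=43):
    """Send a Whois query and return the raw text response."""
    with socket.create_connection((server, port), timeout=10) as s:
        s.sendall(build_query(domain))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

if __name__ == "__main__":
    print(whois_query("certifiedhacker.com"))
```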
The major objectives of footprinting include collecting the network information, system information, and organizational information of the target. By conducting footprinting across different network levels, you can gain information such as network blocks, specific IP addresses, employee details, and so on. Such information can help attackers in gaining access to sensitive data or performing various attacks on the target network.
Such information about an organization is available from its website. In addition, you can query the target’s domain name against the Whois database and obtain valuable information. The information collected includes:
- Employee details (employee names, contact addresses, designations, and work experience)
- Addresses and mobile/telephone numbers
- Branch and location details
- Partners of the organization
- Weblinks to other company-related sites
- Background of the organization
- Web technologies
- News articles, press releases, and related documents
- Legal documents related to the organization
- Patents and trademarks related to the organization
Attackers can access organizational information and use such information to identify key personnel and launch social engineering attacks to extract sensitive data about the entity.
You can gather network information by performing Whois database analysis, trace routing, and so on. The information collected includes:
- Domain and sub-domains
- Network blocks
- Network topology, trusted routers, and firewalls
- IP addresses of the reachable systems
- Whois records
- DNS records and related information
You can gather system information by performing network footprinting, DNS footprinting, website footprinting, email footprinting, and so on. The information collected includes:
- Web server OS
- Location of web servers
- Publicly available email addresses
- Usernames, passwords, and so on.
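Some of this system information can be gathered with very little code. The sketch below, a minimal example using only the standard library, reads the Server header from an HTTP response, which often reveals the web server software and version; the host name is illustrative:

```python
# System-information sketch: the HTTP Server header often reveals the web
# server software (and sometimes the OS). Standard library only.
import http.client

def server_banner(host, port=80, timeout=10):
    """Issue a HEAD request and return the Server header, if any."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Server", "")
    finally:
        conn.close()

def parse_banner(banner):
    """Split a banner like 'Apache/2.4.41 (Ubuntu)' into product and version."""
    product, _, version = banner.partition("/")
    return product, version.split()[0] if version else ""

if __name__ == "__main__":
    print(parse_banner(server_banner("www.certifiedhacker.com")))
```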
To build a hacking strategy, attackers need to gather information about the target organization’s network. They then use such information to locate the easiest way to break through the organization’s security perimeter. As mentioned previously, the footprinting methodology makes it easy to gather information about the target organization; this plays a vital role in the hacking process. Footprinting helps in the following ways:
Performing footprinting on the target organization gives a complete profile of the organization’s security posture. Hackers can then analyse this profile to identify loopholes in the security posture of the target organization and build a hacking plan accordingly.
By using a combination of tools and techniques, attackers can take an unknown entity (for example, XYZ Organization) and reduce it to a specific range of domain names, network blocks, and individual IP addresses of systems directly connected to the Internet, as well as many other details pertaining to its security posture.
A detailed footprint provides maximum information about the target organization. It allows the attacker to identify vulnerabilities in the target systems to select appropriate exploits. Attackers can build their own information database about the security weaknesses of the target organization. Such a database can then help in identifying the weakest link in the organization’s security perimeter.
Combining footprinting techniques with tools such as Tracert allows the attacker to create diagrammatic representations of the target organization’s network presence. Specifically, it allows attackers to draw a map or outline of the target organization’s network infrastructure to know about the actual environment that they are going to break into. A network map will depict the attacker’s understanding of the target’s Internet footprint. These network diagrams can guide the attacker in performing an attack.
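The hop data that feeds such a network map can be collected by wrapping the system traceroute utility, as in the hedged sketch below. It assumes a Unix-like host with traceroute installed; the parsing helper itself is self-contained:

```python
# Network-mapping sketch: run the system traceroute and collect the hop
# addresses -- the raw data for a network diagram. Assumes a Unix-like host
# with traceroute on the PATH; the target domain is illustrative.
import re
import subprocess

IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def parse_hops(output):
    """Extract the first IPv4 address on each hop line, in order."""
    hops = []
    for line in output.splitlines():
        if line.startswith("traceroute"):   # skip the header line
            continue
        m = IPV4_RE.search(line)
        if m:
            hops.append(m.group(1))
    return hops

if __name__ == "__main__":
    result = subprocess.run(
        ["traceroute", "-n", "-q", "1", "certifiedhacker.com"],
        capture_output=True, text=True, timeout=120)
    print(parse_hops(result.stdout))
```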
Attackers perform footprinting as the first step of any attacks on information systems. In this phase, attackers attempt to collect valuable system-level information such as account details, operating system and other software versions, server names, database schema details, and so on, which will be useful in the hacking process. The following are assorted threats made possible through footprinting:
Without using any intrusion methods, hackers directly and indirectly collect information through persuasion and other means. Hackers gather crucial information from willing employees who are unaware of the hackers’ intent.
Footprinting enables an attacker to perform system and network attacks. Thus, attackers can gather information related to the target organization’s system configuration, the operating system running on the machine, and so on. Using this information, attackers can find vulnerabilities in the target system and then exploit such vulnerabilities. They can then take control of a target system or the entire network.
Information leakage poses a threat to any organization. If sensitive information of an entity falls into the hands of attackers, they can mount an attack based on the information or alternatively use it for monetary benefit.
Through footprinting, hackers can access the systems and networks of the organization and even escalate the privileges up to admin levels, resulting in the loss of privacy for the organization as a whole and for its individual personnel.
Corporate espionage is a central threat to organizations, as competitors often attempt to secure sensitive data through footprinting. Through this approach, competitors can launch similar products in the market, alter prices, and generally undermine the market position of a target organization.
Footprinting can have a major effect on organizations such as online businesses and other e-commerce websites as well as banking and finance-related businesses. Billions of dollars are lost every year due to malicious attacks by hackers.
The footprinting methodology is a procedure for collecting information about a target organization from all available sources. It involves gathering information about a target organization, such as URLs, locations, establishment details, number of employees, specific range of domain names, contact information, and other related information. Attackers collect this information from publicly accessible sources such as search engines, social networking sites, Whois databases, and so on. Footprinting techniques include:
- Footprinting through search engines
- Footprinting through web services
- Footprinting through social networking sites
- Website footprinting
- Email footprinting
- Whois footprinting
- DNS footprinting
- Network footprinting
- Footprinting through social engineering
Search engines are the main sources of key information about a target organization. They play a major role in extracting critical details about a target from the Internet. Search engines use automated software, i.e., crawlers, to continuously scan active websites and add the retrieved results to a search engine index, which is stored in a massive database. When a user queries the search engine index, it returns a list of Search Engine Results Pages (SERPs). These results include web pages, videos, images, and many different file types, ranked and displayed according to their relevance.

Many search engines can extract target organization information such as technology platforms, employee details, login pages, intranet portals, and contact information. This information helps the attacker in performing social engineering and other types of advanced system attacks. For example, a Google search could reveal submissions to forums by security personnel, disclosing the brands of firewalls or antivirus software used by the target. This information helps the attacker in identifying vulnerabilities in such security controls.

Attackers can use the advanced search operators available with these search engines to create complex queries that find, filter, and sort specific information regarding the target. Search engines are also used to find other sources of publicly accessible information; for example, you can type “top job portals” to find major job portals that provide critical information about the target organization. As an ethical hacker, if you find any deleted pages or information about your company in SERPs or the search engine cache, you can request the search engine to remove them from its indexed cache.
Google hacking refers to the use of advanced Google search operators for creating complex search queries to extract sensitive or hidden information, which attackers then use to find vulnerable targets. Footprinting using advanced Google hacking techniques involves locating specific strings of text within search results using advanced operators in the Google search engine. Such queries can retrieve valuable data about a target company from Google search results, and through Google hacking, an attacker tries to find websites that are vulnerable to exploitation. Additionally, attackers can use the Google Hacking Database (GHDB), a database of queries, to identify sensitive data. Google operators help in finding the required text and avoiding irrelevant data. Using advanced Google operators, attackers can locate specific strings of text, such as particular versions of vulnerable web applications. When a query is specified without advanced search operators, Google traces the search terms in any part of the webpage, including the title, text, URL, digital files, and so on. To confine a search, Google offers advanced search operators, which help to narrow down the search query and obtain the most relevant and accurate output.
The syntax for using an advanced search operator is as follows: operator: search_term
Some popular Google advanced search operators include:
The site: operator restricts search results to the specified site or domain. For example, the query
games site:www.certifiedhacker.com
gives information on games from the certifiedhacker.com site.
The allinurl: operator restricts results to only the pages containing all the query terms specified in the URL. For example, the query
allinurl: google career
returns only pages containing the words “google” and “career” in the URL.
The inurl: operator restricts the results to only the pages containing the specified word in the URL. For example, the query
inurl: copy site:www.google.com
returns only Google pages in which the URL has the word “copy.”
The allintitle: operator restricts results to only the pages containing all the query terms specified in the title. For example, the query
allintitle: detect malware
returns only pages containing the words “detect” and “malware” in the title.
The intitle: operator restricts results to only the pages containing the specified term in the title. For example, the query
malware detection intitle:help
returns only pages that have the term “help” in the title, and the terms “malware” and “detection” anywhere within the page.
The inanchor: operator restricts results to only the pages containing the query terms specified in the anchor text on links to the page. For example, the query
Anti-virus inanchor:Norton
returns only pages in which the anchor text on links to the pages contains the word “Norton” and the page itself contains the word “Anti-virus.”
The allinanchor: operator restricts results to only the pages containing all the query terms specified in the anchor text on links to the pages. For example, the query
allinanchor: best cloud service provider
returns only pages for which the anchor text on links to the pages contains the words “best,” “cloud,” “service,” and “provider.”
The cache: operator displays Google’s cached version of a web page instead of the current version of the web page. For example, the query
cache:www.eff.org
will show Google’s cached version of the Electronic Frontier Foundation home page.
The link: operator searches for websites or pages that contain links to the specified website or page. For example, the query
link:www.googleguide.com
finds pages that point to Google Guide’s home page. Note: According to Google’s documentation, “you cannot combine a link: search with a regular keyword search.” Also note that when you combine link: with another advanced operator, Google may not return all the pages that match.
The related: operator displays websites that are similar or related to the URL specified. For example, the query
related:www.microsoft.com
provides the Google search engine results page with websites similar to microsoft.com.
The info: operator finds information for the specified web page. For example, the query
info:gothotel.com
provides information about the national hotel directory GotHotel.com home page.
The location: operator finds information for a specific location. For example, the query
location: 4 seasons restaurant
will give you results based on the term “4 seasons restaurant.”
The filetype: operator allows you to search for results based on a file extension. For example, the query
jasmine filetype:jpg
will provide jpg files related to jasmine.
An attacker can create complex search engine queries to filter large amounts of search results to obtain information related to computer security. The attacker uses Google operators that help locate specific strings of text within the search results. Thus, the attacker can not only detect websites and web servers that are vulnerable to exploitation but also locate private, sensitive information about others, such as credit card numbers, social security numbers, passwords, and so on. Once a vulnerable site is identified, attackers try to launch various possible attacks, such as buffer overflow and SQL injection, which compromise information security. Examples of sensitive information on public servers that an attacker can extract with the help of Google Hacking Database (GHDB) queries include:
- Error messages that contain sensitive information
- Files containing passwords
- Sensitive directories
- Pages containing logon portals
- Pages containing network or vulnerability data, such as IDS, firewall logs, and configurations
- Advisories and server vulnerabilities
- Software version information
- Web application source code
- Connected IoT devices and their control panels, if unprotected
- Hidden web pages such as intranet and VPN services
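As a sketch of how such complex queries can be composed programmatically, the following example builds a dork from free-text terms and operator:value pairs and URL-encodes it into a Google search URL. The site and search terms are illustrative:

```python
# Sketch: programmatically composing an advanced search query (a "dork") and
# the corresponding Google search URL. The operator names are documented
# Google operators; the target site is illustrative.
import urllib.parse

def build_dork(terms, **operators):
    """Combine free-text terms with operator:value pairs (e.g. site, filetype)."""
    parts = list(terms)
    for op, value in operators.items():
        parts.append(f"{op}:{value}")
    return " ".join(parts)

def search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

if __name__ == "__main__":
    q = build_dork(["password"], filetype="log", site="certifiedhacker.com")
    print(q)            # password filetype:log site:certifiedhacker.com
    print(search_url(q))
```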
The Google Hacking Database (GHDB) is an authoritative source for querying the ever-widening scope of the Google search engine. In the GHDB, you will find search terms for files containing usernames, vulnerable servers, and even files containing passwords. The Exploit Database is a Common Vulnerabilities and Exposures (CVE) compliant archive of public exploits and corresponding vulnerable software, developed for use by penetration testers and vulnerability researchers. Using GHDB dorks, attackers can rapidly identify all the publicly available exploits and vulnerabilities of the target organization’s IT infrastructure. Attackers use Google dorks with advanced search operators to extract sensitive information about the target, such as vulnerable servers, error messages, sensitive files, login pages, and websites.
Google Hacking Database Categories include Footholds; Files Containing Usernames; Sensitive Directories; Web Server Detection; Vulnerable Files; Vulnerable Servers; Error Messages; Files Containing Juicy Info; Files Containing Passwords; Sensitive Online Shopping Info; Network or Vulnerability Data; Pages Containing Login Portals; Various Online Devices; Advisories and Vulnerabilities.
Google hacking involves the implementation of advanced operators in the Google search engine to match specific strings of text within the search result. These advanced operators help refine searches to expose sensitive information, vulnerabilities, and passwords. You can use these Google hacking operators or Google dorks for footprinting VoIP and VPN networks. Thus, you can extract information such as pages containing login portals, VoIP login portals, directories with keys of VPN servers, and so on.
The following tables summarize some of the Google hacking operators or Google dorks to obtain specific information related to VoIP and VPN footprinting, respectively.
Google search queries for VPN footprinting
Video search engines are Internet-based search engines that crawl the web for video content. These video search engines either provide the functionality of uploading and hosting video content on their own web servers or parse video content that is hosted externally. The video content obtained from video search engines is of high value, as it can be used for gathering information about the target. Video search engines such as YouTube, Google videos, Yahoo videos, and Bing videos allow attackers to search for video content based on the format type and duration. After searching for videos related to the target using video search engines, an attacker can further analyse the video content to gather hidden information such as the time/date and thumbnail of the video. Using video analysis tools such as YouTube DataViewer, EZGif, and VideoReverser.com, an attacker can reverse a video or convert a video into text and other formats to extract critical information about the target.
Meta search engines are a different type of search engine; they use other search engines (Google, Bing, Ask.com, etc.) to produce their own results from the Internet in a very short time span. These search engines do not have their own search indexes; instead, they take inputs from the users and simultaneously send out queries to the third-party search engines to obtain the results. Once sufficient results are gathered, they are ranked according to their relevance and presented to the user through the web interface. Meta search engines also filter out identical search results, so that if the user searches the same query again, the same results are not displayed twice. A meta search engine is advantageous compared to a simple search engine, as it can retrieve more results with the same amount of effort. Using meta search engines such as Startpage, MetaGer, and eTools.ch, attackers can send multiple search queries to several search engines simultaneously and gather substantially detailed information, such as information from shopping sites (Amazon, eBay, etc.), images, videos, blogs, news, and articles from different sources. Further, meta search engines also provide privacy to the search engine user by hiding the user’s IP address.
FTP search engines are used to search for files located on FTP servers that contain valuable information about the target organization. Many industries, institutions, companies, and universities use FTP servers to store large file archives and other software that are shared among their employees. A special client such as FileZilla (https://filezilla-project.org) can be used to access the FTP accounts; it also supports functionalities such as uploading, downloading and renaming files. Although FTP servers are usually protected with passwords, many servers are left unsecured and can be accessed through web browsers directly. Using FTP search engines such as NAPALM FTP Indexer, Global FTP Search Engine, and FreewareWeb FTP File Search, attackers can search for critical files and directories containing valuable information such as business strategies, tax documents, employee’s personal records, financial records, licensed software, and other confidential information.
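The sketch below illustrates the kind of probe an attacker might run against hosts discovered through FTP search engines: an anonymous-login check using Python’s standard ftplib. The host name is illustrative:

```python
# Sketch: check whether an FTP server permits anonymous access, using only
# the standard library. The host below is illustrative.
from ftplib import FTP, error_perm

def anonymous_listing(host, timeout=10):
    """Return the root directory listing if anonymous login succeeds, else None."""
    try:
        with FTP(host, timeout=timeout) as ftp:
            ftp.login()               # ftplib defaults to anonymous login
            return ftp.nlst()
    except (OSError, error_perm):
        return None

if __name__ == "__main__":
    files = anonymous_listing("ftp.certifiedhacker.com")
    print("open to anonymous users" if files is not None else "login refused")
```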
Listed below are some of the important advanced Google search queries to find FTP servers:
Internet of Things (IoT) search engines crawl the Internet for IoT devices that are publicly accessible. Through a basic search on these search engines, an attacker can gain control of Supervisory Control and Data Acquisition (SCADA) systems, traffic control systems, Internet-connected household appliances, industrial appliances, CCTV cameras, etc. Many of these IoT devices are unsecured, i.e., they are without passwords or they use the default credentials, which can be exploited easily by attackers. With the help of IoT search engines such as Shodan, Censys, and Thingful, attackers can obtain information such as the manufacturer details, geographical location, IP address, hostname, and open ports of the target IoT device. Using this information, the attacker can establish a back door to the IoT devices and gain access to them to launch further attacks.
As shown in the screenshot, attackers can use Shodan to find all the IoT devices of the target organization that have open ports and services.
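A minimal sketch of querying Shodan programmatically is shown below. It assumes the official shodan Python package and a valid API key (the key and query shown are placeholders); the helper that summarizes results is plain Python:

```python
# Sketch of querying Shodan's API for a target organization's exposed devices.
# Assumes the third-party "shodan" package (pip install shodan) and a valid
# API key -- both the key and the query below are placeholders.
def summarize_matches(matches):
    """Reduce Shodan 'matches' entries to (ip, port, product) tuples."""
    return [(m.get("ip_str"), m.get("port"), m.get("product", ""))
            for m in matches]

if __name__ == "__main__":
    import shodan                         # third-party library
    api = shodan.Shodan("YOUR_API_KEY")   # placeholder -- substitute your own key
    results = api.search('org:"Certified Hacker" port:21,80,443')
    for ip, port, product in summarize_matches(results["matches"]):
        print(ip, port, product)
```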
Web services such as people search services can provide sensitive information about the target. Internet archives may also provide sensitive information that has been removed from the World Wide Web (WWW). Social networking sites, people search services, alerting services, financial services, and job sites provide information about a target such as infrastructure details, physical location, and employee details. Moreover, groups, forums, and blogs can help attackers in gathering sensitive information about a target, such as public network information, system information, and personal information. Using this information, an attacker may build a hacking strategy to break into the target organization’s network and carry out other types of advanced system attacks.
A company’s top-level domains (TLDs) and sub-domains can provide a large amount of useful information to an attacker. A public website is designed to show the presence of an organization on the Internet. It is available for free public access. It is designed to attract customers and partners. It may contain information such as organizational history, services and products, and contact information. The target organization’s external URL can be located with the help of search engines such as Google and Bing.
The sub-domain is available to only a few people. These persons may be employees of an organization or members of a department. In many organizations, website administrators create sub-domains to test new technologies before deploying them on the main website. Generally, these sub-domains are in the testing stage and are insecure; hence, they are more vulnerable to various exploitations. Sub-domains provide insights into the different departments and business units in an organization. Identifying such sub-domains may reveal critical information regarding the target, such as the source code of the website and documents on the webserver. Access restrictions can be applied based on the IP address, domain or subnet, username, and password. The sub-domain helps to access the private functions of an organization. Most organizations use common formats for sub-domains. Therefore, a hacker who knows the external URL of a company can often discover the sub-domain through trial and error, or by using a service such as Netcraft. You can also use the advanced Google search operator shown below to identify all the sub-domains of the target:
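The trial-and-error discovery of sub-domains described above can be sketched in a few lines of standard-library Python: resolve a wordlist of common prefixes and keep the names that answer in DNS. The prefix list and domain are illustrative:

```python
# Sketch of trial-and-error sub-domain discovery: resolve a wordlist of
# common prefixes and keep the names that answer in DNS. The wordlist and
# the target domain below are illustrative.
import socket

COMMON_PREFIXES = ["www", "mail", "ftp", "vpn", "dev", "test", "intranet"]

def candidate_names(domain, prefixes=COMMON_PREFIXES):
    """Build fully-qualified candidate sub-domain names."""
    return [f"{p}.{domain}" for p in prefixes]

def resolve_existing(names, timeout=3):
    """Return the (name, ip) pairs that actually resolve in DNS."""
    socket.setdefaulttimeout(timeout)
    found = []
    for name in names:
        try:
            found.append((name, socket.gethostbyname(name)))
        except OSError:
            pass  # name does not resolve
    return found

if __name__ == "__main__":
    print(resolve_existing(candidate_names("certifiedhacker.com")))
```

Note that each DNS lookup is visible to the target’s name servers, so this is an active technique, unlike querying a third-party service such as Netcraft.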
Netcraft provides Internet security services, including anti-fraud and anti-phishing services, application testing, and PCI scanning. They also analyse the market share of web servers, operating systems, hosting providers and SSL certificate authorities, and other parameters of the Internet. As shown in the screenshot below, attackers can use Netcraft to obtain all the sub-domains related to the target domain.
Sublist3r is a Python script designed to enumerate the subdomains of websites using OSINT. It enables you to enumerate subdomains across multiple sources at once. Further, it helps penetration testers and bug hunters in collecting and gathering subdomains for the domain they are targeting. It enumerates subdomains using many search engines such as Google, Yahoo, Bing, Baidu, and Ask. It also enumerates subdomains using Netcraft, VirusTotal, ThreatCrowd, DNSdumpster, and ReverseDNS.
sublist3r -d DOMAIN [-b] [-p PORTS] [-v] [-t THREADS] [-e ENGINES] [-o OUTPUT]
As shown in the screenshot, Sublist3r helps attackers in enumerating the subdomains of a target company from multiple sources at the same time.
Sublist3r also helps attackers in enumerating the subdomains of a target company with a specific port open. As shown in the screenshot, attackers search for subdomains of google.com (-d google.com) using the Bing search engine (-e Bing) with port 80 (-p 80) open.
Pentest-Tools Find Subdomains is an online tool used for discovering subdomains and their IP addresses, including network information and their HTTP servers. As shown in the screenshot, attackers search for sub-domains related to microsoft.com to obtain critical information about the target company domain, such as sub-domains, IP addresses, operating systems, servers used, technology used, web platform, and page titles.
Information such as the physical location of an organization plays a vital role in the hacking process. Attackers can obtain this information using footprinting. In addition to the physical location, a hacker can also acquire information such as surrounding public Wi-Fi hotspots that may offer a way to break into the target organization’s network. Attackers with the knowledge of a target organization’s location may attempt dumpster diving, surveillance, social engineering, and other non-technical attacks to gather more information. Once the attackers discern the location of the target, they can obtain detailed satellite images of the location using various sources available on the Internet such as Google Earth and Google Maps. The attackers can use this information to gain unauthorized access to buildings, wired and wireless networks, and systems.
The tools for finding the geographical location allow you to find and explore most locations on the earth. They provide information such as images of buildings, as well as their surroundings, including Wi-Fi networks. Tools such as Google Maps even locate entrances of the building, security cameras, and gates. These tools provide interactive maps, outline maps, satellite imagery, and information on how to interact with and create one’s own maps. Google Maps, Yahoo Maps, and other tools provide driving directions, traffic conditions, landmarks, and detailed address and contact information. Attackers may use tools such as Google Earth, Google Maps, and Wikimapia, to find or locate entrances to buildings, security cameras, gates, places to hide, weak spots in perimeter fences, and utility resources such as electricity connections, to measure the distance between different objects, and so on.
Searching for a particular person on a social networking website is fairly easy. Social networking services are online services, platforms, or sites that focus on facilitating the building of social networks or social relations among people. These websites contain information that users provide in their profiles. They help to directly or indirectly relate people to each other through various fields such as common interests, work location, and education.
Social networking sites allow people to share information quickly, as they can update their personal details in real time. Such sites allow users to update facts about upcoming or current events, recent announcements and invitations, and so on. Social networking sites are a great platform for finding people and their related information. Many social networking sites allow visitors to search for people without registering on the site; this makes people searching on social networking sites an easy and anonymous task. A user can search for a person using the name, email, or address. Some sites allow users to check whether an account is active, which then provides information on the status of the person being searched. Social networking sites such as Facebook, Twitter, LinkedIn, and Instagram allow you to find people by name, keyword, company, school, friends, colleagues, and the people living around them. Searching for people on these sites returns personal information such as name, position, organization name, current location, and educational qualifications. In addition, you can also find professional information such as company or business, current location, phone number, email ID, photos, videos and so on. Social networking sites such as Twitter are used to share advice, news, concerns, opinions, rumours, and facts. Through people searching on social networking services, an attacker can gather critical information that will help them in performing social engineering or other kinds of attacks.
Several online services and resources are available to gather valuable information about a target from one or more social media sites. These services allow attackers to discover most shared content across social media sites by using hashtags or keywords, track accounts and URLs on various social media sites, obtain a target’s email address, etc. This information helps attackers to perform phishing, social engineering, and other types of attacks. Attackers use tools such as BuzzSumo, Google Trends, Hashatit, and Ubersuggest to locate information on social media sites.
Conducting location search on social media sites such as Twitter, Instagram, and Facebook helps attackers to detect the geolocation of the target. This information further helps attackers to perform various social engineering and non-technical attacks. Many online tools such as Followerwonk, Hootsuite, and Sysomos are available to search for both geotagged and non-geotagged information on social media sites. Attackers search social media sites with these online tools using keywords, usernames, dates, times, and so on.
Attackers use various tools such as Sherlock, Social Searcher, and UserRecon to footprint social networking sites such as Twitter, Instagram, Facebook, and Pinterest and gather sensitive information about the target, such as date of birth, educational qualification, employment status, names of relatives, and information about the organization that they work for, including its business strategy, potential clients, and upcoming project plans.
As shown in the screenshot, attackers use Sherlock to search a vast number of social networking sites for a target username. This tool helps the attacker to locate the target user on various social networking sites along with the complete URL.
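Under the hood, tools of this kind substitute the target username into known profile-URL templates and probe each resulting URL. A minimal sketch of that idea follows; the site list is illustrative, not Sherlock's actual data file:

```python
# Illustrative profile-URL templates; Sherlock's real data file covers
# hundreds of sites and also defines what a "found" response looks like.
SITE_TEMPLATES = {
    "Twitter": "https://twitter.com/{}",
    "Instagram": "https://www.instagram.com/{}/",
    "GitHub": "https://github.com/{}",
}

def candidate_profiles(username):
    """Build the {site: profile URL} candidates to probe for a username."""
    return {site: tpl.format(username) for site, tpl in SITE_TEMPLATES.items()}
```

A full tool would then request each URL and treat a normal profile response (e.g., HTTP 200) as a hit, reporting the complete URL for every site where the username exists.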
- Social Searcher
Social Searcher allows attackers to search for content in social networks in real time and provides deep analytics data. Attackers use this tool to track a target user on various social networking sites and obtain information such as complete URLs to their profiles, their postings, and other personal information.
You can use public record websites to find information about email addresses, phone numbers, house addresses, and other information. Many individuals use online people search services to find information about other people. Generally, online people search services such as Pipl, Intelius, BeenVerified, Whitepages, and PeekYou provide people's names, addresses, contact details, dates of birth, photographs, videos, professions, details about their family and friends, social networking profiles, property information, and optional criminal background checks. Further, online people search services may often reveal the profession of an individual, businesses owned by a person, upcoming projects and operating environment, websites and blogs, contact numbers, important dates, company email addresses, cell phone numbers, fax numbers, and personal email addresses. Using this information, an attacker can try to obtain bank details, credit card details, past history, and so on. This information proves to be highly beneficial for attackers to launch attacks. Examples of such online people search services include Intelius, Pipl, and AnyWho.
Attackers can use the Intelius people search online service to search for people belonging to the target organization. Using this service, attackers obtain information such as phone numbers, address history, age, date of birth, relatives, previous work history, educational background, and so on.
LinkedIn is a social networking website for professionals. It connects the world’s human resources to aid productivity and success. The site contains personal information such as name, position, organization name, current location, educational qualifications, and so on. Information gathered from LinkedIn helps an attacker in performing social engineering or other kinds of attacks. Attackers can use theHarvester tool to gather information from LinkedIn based on the target organization name.
theHarvester is a tool designed to be used in the early stages of a penetration test. It is used for open-source intelligence gathering and helps to determine a company's external threat landscape on the Internet. Attackers use this tool to perform enumeration on the LinkedIn social networking site to find employees of the target company along with their job titles. As shown in the screenshot, the attacker uses the following command to enumerate users on LinkedIn:
theHarvester -d microsoft -l 200 -b linkedin
In the above command, -d specifies the domain or company name to search, -l specifies the number of results to be retrieved, and -b specifies the data source as LinkedIn.
Gathering email addresses related to the target organization acts as an important attack vector during the later phases of hacking. Attackers can use automated tools such as theHarvester and Email Spider to collect publicly available email addresses of the employees of the target organization. These tools harvest email lists related to a specified domain using search engines such as Google, Bing, and Baidu. Attackers use these email lists and usernames to perform social engineering and brute force attacks on the target organization.
Attackers use theHarvester tool to extract email addresses related to the target domain. For example, attackers use the following command to extract email addresses of microsoft.com using the Baidu search engine:
theHarvester -d microsoft.com -l 200 -b baidu
In the above command, -d specifies the domain used for harvesting the emails, -l will limit the results to 200, and -b tells theHarvester to extract the results from the Baidu search engine; alternatively, you can use Google, Bing, etc.
Attackers who seek access to personal information or financial information often target financial data such as stock quotes and charts, financial news, and portfolios. Financial services such as Google Finance, MSN Money, Yahoo Finance, and Investing.com can provide a large amount of useful information such as the market value of a company’s shares, company profile, competitor details, stock exchange rates, corporate press releases, financial reports along with news, and blog search articles about corporations. The information provided varies from one service to the other. Financial firms rely on web services to perform transactions and grant users access to their accounts. Attackers can obtain sensitive and private information regarding these firms by using malware, exploiting software design flaws, breaking authentication mechanisms, service flooding, and performing brute force attacks and phishing attacks.
The Google Finance service features business and enterprise headlines for many corporations, including their financial decisions and major news events. Stock information is also available, as are stock price charts that contain marks for major news events and corporate actions. The site also aggregates Google news and Google blog search articles about each corporation.
Attackers can gather valuable information about the operating system, software versions, company's infrastructure details, and database schema of an organization through footprinting job sites using different techniques. Many organizations' websites provide recruiting information on a job posting page that, in turn, reveals hardware and software information, network-related information, and technologies used by the company (e.g., firewall, internal server type, OS used, and network appliances). In addition, the website may have a key employee list with email addresses. Such information may prove to be beneficial for an attacker. For example, if an organization advertises a Network Administrator job, it posts the requirements related to that position. Further, attackers can go through employee resumes posted on job sites and extract information such as an individual's expertise, educational qualifications, and job history. The job history of an employee can reveal technical information about the target organization. Attackers can use the technical information obtained through job sites such as Dice, LinkedIn, and Simply Hired to detect underlying vulnerabilities in the target IT infrastructure.
The surface web is the outer layer of the online cyberspace that allows the user to find web pages and content using regular web browsers. Search engines use crawlers that are programmed bots to access and download web pages. The surface web can be accessed by browsers such as Google Chrome, Mozilla Firefox, and Opera.
The deep web is the layer of the online cyberspace that consists of web pages and content that are hidden and unindexed. Such content cannot be located using traditional web browsers and search engines. The size of the deep web is incalculable; it is believed to make up the vast majority of the World Wide Web. The deep web cannot be crawled by basic search engines. It consists of official government or federal databases and other information linked to various organizations.
The dark web or Darknet is a deeper layer of the online cyberspace, and it is the subset of the deep web that enables anyone to navigate anonymously without being traced. The dark web can be accessed only through specialized tools or darknet browsers. Attackers primarily use the dark web to perform footprinting on the target organization and launch attacks.
Attackers can use deep and dark web searching tools such as Tor Browser, ExoneraTor, and OnionLand Search engine to gather confidential information about the target, such as credit card details, passport information, identification card details, medical records, social media accounts, and Social Security Numbers (SSNs). With the help of this information, they can launch further attacks on the targets.
Competitive intelligence gathering is the process of identifying, gathering, analysing, verifying, and using information about your competitors from resources such as the Internet. Competitive intelligence means understanding and learning about other businesses to become as competitive as possible. It is non-interfering and subtle in nature compared to direct intellectual property theft carried out via hacking or industrial espionage. It focuses on the external business environment. In this method, professionals gather information ethically and legally instead of gathering it secretly. Competitive intelligence helps in determining:
- What the competitors are doing
- How competitors are positioning their products and services
- What customers are saying about competitors' strengths and weaknesses
Companies carry out competitive intelligence either by employing people to search for information or by utilizing a commercial database service, which involves lower costs. The information that is gathered can help the managers and executives of a company make strategic decisions.
Competitive Intelligence gathering can be performed using a direct or indirect approach.
The direct approach serves as the primary source for competitive intelligence gathering. Direct approach techniques include gathering information from trade shows, social engineering of employees and customers, and so on.
Through an indirect approach, information about competitors is gathered using online resources. Indirect approach techniques include:
- Company websites and employment ads
- Support threads and reviews
- Search engines, the Internet, and online databases
- Social media postings
- Press releases and annual reports
- Trade journals, conferences, and newspapers
- Patents and trademarks
- Product catalogs and retail outlets
- Analyst and regulatory reports
- Customer and vendor interviews
- Agents, distributors, and suppliers
- Industry-specific blogs and publications
- Legal databases, e.g., LexisNexis
- Business information databases, e.g., Hoover’s
- Online job postings
Gathering competitor documents and records helps to improve productivity and profitability, which in turn stimulates the growth of the company. It helps in determining answers to the following:
When Did It Begin? How Did It Develop?
Through competitive intelligence, companies can collect the history of a particular company, such as its establishment date. Sometimes, they gather crucial information that is not readily available to others.
What Are the Various Strategies the Company Uses?
Development intelligence can include advertisement strategies, customer relationship management, and so on.
Who Leads the Company?
This information helps a company learn about the competitor's decision-makers.
Where Is It Located?
Competitive intelligence also includes the location of the company and information related to its various branches and their operations.
Attackers can use the information gathered through competitive intelligence to build a hacking strategy. Information resource sites that help to gain competitive intelligence include:
- EDGAR Database
The Electronic Data Gathering, Analysis, and Retrieval system (EDGAR) performs automated collection, validation, indexing, acceptance, and forwarding of submissions by companies and others who are required by law to file with the U.S. Securities and Exchange Commission (SEC). Its primary purpose is to increase the efficiency and fairness of the securities market for the benefit of investors, corporations, and the economy by accelerating the receipt, acceptance, dissemination, and analysis of time-sensitive corporate information filed with the agency.
- D&B Hoovers
D&B Hoovers leverages a commercial database of 120 million business records and analytics to deliver a sales intelligence solution that enables sales and marketing professionals to focus on the right prospects so that they can generate immediate growth for their business.
LexisNexis provides content-enabled workflow solutions designed specifically for professionals in the legal, risk management, corporate, government, law enforcement, accounting, and academic markets. It maintains an electronic database of information related to legal and public records. It enables customers to access documents and records of legal, news, and business sources. It is beneficial for companies and government agencies seeking data analytics supporting compliance, customer acquisition, fraud detection, health outcomes, identity solutions, investigation, receivables management, risk decisioning, and workflow optimization.
- Business Wire
Business Wire focuses on press release distribution and regulatory disclosure. This company distributes full-text news releases, photos, and other multimedia content from various organizations across the globe to journalists, news media, financial markets, investors, information websites, databases, and general audiences. It has its own patented electronic network through which it releases news.
Factiva is a global news database and licensed content provider. It is a business information and research tool that gets information from licensed and free sources and provides capabilities such as searching, alerting, dissemination, and business information management. Factiva products provide access to more than 33,000 sources such as licensed publications, influential websites, blogs, images, and videos. Its resources are made available from nearly every country worldwide in 28 languages, including more than 600 continuously updated newswires.
What Are the Company's Plans?
Information resource sites that help attackers gain a company’s business plans include:
MarketWatch tracks the pulse of markets for engaged investors. The site is an innovator in business news, personal finance information, real-time commentary, and investment tools and data, with journalists generating headlines, stories, videos, and market briefs.
- The Wall Street Transcript
The Wall Street Transcript is a website as well as a paid subscription-based publication that publishes industry reports. It expresses the views of money managers and equity analysts of different industry sectors. The site also publishes interviews with CEOs of companies.
Alexa is a great tool to dig deep into the analytics of other companies. It allows users to discover influencer outreach opportunities by uncovering sites that link to their competitors using Competitor Backlink Checker; and benchmark and track their company’s performance relative to their competitors using Competitive Intelligence Tools.
Euromonitor provides strategy research capabilities for consumer markets. It publishes reports on industries, consumers, and demographics. It provides market research and surveys focused on the organization's needs.
Experian provides insights into competitors' search, affiliate, display, and social marketing strategies and metrics to improve marketing campaign results. It allows the user to: benchmark the effectiveness of existing customer acquisition strategies, determine what is driving competitors' success, use historical consumer data to forecast future trends and quickly respond to changing behaviours, and measure a website's performance against industry or specific sites.
- SEC Info
SEC Info offers the U.S. Securities and Exchange Commission (SEC) EDGAR database service on the web, with many links added to SEC documents. It allows searches by name, industry, business, SIC code, area code, accession number, file number, CIK, topic, ZIP code, and so on.
- The Search Monitor
The Search Monitor provides competitive intelligence to monitor brand and trademark use, affiliate compliance, and competitive advertisers on paid search, organic search, local search, social media, mobile, and shopping engines worldwide. It helps interactive agencies, search marketers, and affiliate marketers to track ad rank, ad copy, keyword reach, click rates and CPCs, monthly ad spending, market share, trademark use, and affiliate activity.
The United States Patent and Trademark Office (USPTO) provides information related to patent and trademark registration. It provides general information concerning patents and search options for patents and trademark databases.
What Do Expert Opinions Say About the Company?
Information resource sites that help the attacker to obtain expert opinions about the target company include:
SEMRush is a competitive keyword research tool. It can provide a list of Google keywords and AdWords for any site, as well as a competitor list in the organic and paid Google search results. It enables an approach for gaining in-depth knowledge about what competitors are advertising and their budget allocation to specific Internet marketing tactics.
AttentionMeter is a tool for comparing websites (traffic) by using Alexa, Compete, and Technorati. It gives a snapshot of traffic data as well as graphs from Alexa, Compete, and Technorati for the specified websites.
- ABI/INFORM Global
ABI/INFORM Global is a business database. ABI/INFORM Global offers the latest business and financial information for researchers. With ABI/INFORM Global, users can determine business conditions, management techniques, business trends, management practice and theory, corporate strategy and tactics, and the competitive landscape.
SimilarWeb aggregates data from multiple sources to estimate traffic, geography, and referral data for a company’s websites and mobile apps. It also provides a panel through a browser extension that allows refining other data sources by anonymously tracking browser activity across millions of browsers worldwide.
Finding useful information from corporate websites is a necessary step in the information gathering phase. These business profile sites contain business information of companies located in a particular region with their contact information, which can be viewed by anyone. Attackers use business profile sites such as opencorporates, Crunchbase, and corporationwiki to gather important information about the target organizations, such as their location, addresses, contact information (such as phone numbers, email addresses), employee database, department names, type of service provided, and type of industry.
Alerts are content monitoring services that provide automated, up-to-date information based on user preference, usually via email or SMS. To receive alerts, a user must register on the website and provide either an email address or a phone number. Online alert services automatically notify users when new content from news, blogs, and discussion groups matches a set of search terms selected by the user. These services provide up-to-date information about competitors and the industry. Alerts are sent via email or SMS notifications. Tools such as Google Alerts, Twitter Alerts, and Giga Alerts help attackers to track mentions of the organization’s name, member names, website, or any people or projects that are important. Attackers can gather updated information about the target periodically from the alert services and use it for further attacks.
Online Reputation Management (ORM) is the process of monitoring what is displayed when someone searches for your company on the Internet, and then taking measures to minimize negative search results or reviews. The process helps to improve brand reputation. Companies often track the public feedback given to them using ORM tracking tools and then take measures to improve their credibility and retain their customers' trust. For positive online reputation management, organizations will often try to be more transparent over the Internet. This transparency may help the attacker to collect genuine information about the target organization.
Online reputation tracking tools help us to discover what people are saying online about the company’s brand in real time across the web, social media, and news. They help in monitoring, measuring, and managing one’s reputation online. An attacker may use ORM tracking tools to track a company’s online reputation; collect a company’s search engine ranking information; obtain email notifications when a company is mentioned online; track conversations; and obtain social news about the target organization.
Mention is an online reputation tracking tool that helps attackers in monitoring the web, social media, forums, and blogs to learn more about the target brand and industry. As shown in the screenshot, this tool helps attackers in tracking online conversations as they happen, wherever they happen. Using Mention, attackers can have live, up-to-date reports delivered to any email address in real time.
Many Internet users use blogs, groups, and forums for knowledge sharing purposes. For this reason, attackers often focus on groups, forums, and blogs to find information about a target organization and its people. Organizations generally fail to monitor the exchange of information that employees reveal to other users in forums, blogs, and group discussions. Attackers see this as an advantage and collect sensitive information about the target, such as public network information, system information, and employee personal information. Attackers can register with fake profiles in Google groups, Yahoo groups, and so on. They try to join the target organization’s employee groups, where they can obtain personal and company information. Attackers can also search for information in groups, forums, and blogs by Fully Qualified Domain Names (FQDNs), IP addresses, and usernames. Employee information that an attacker can gather from groups, forums, and blogs may include full name of the employee; place of work and residence; home telephone, cell number, or office number; personal and organizational email address; pictures of the employee residence or work location that include identifiable information; and pictures of employee awards and rewards or upcoming goals.
Usenet newsgroup is a repository containing a collection of notes or messages on various subjects and topics that are submitted by the users over the Internet. Network News Transfer Protocol (NNTP) is used to relay Usenet news articles from the discussions over the newsgroup. Usenet newsgroups can be a useful source of valuable information about the target. People seek help by posting questions and asking for a solution on Usenet newsgroups. Many professionals use the newsgroups to resolve their technical issues by posting questions on Usenet. To obtain solutions for these issues, sometimes they post more detailed information about the target than needed. Attackers can search Usenet newsgroups or mailing lists such as Newshosting, Eweka, and Supernews to find valuable information about the operating systems, software, web servers, etc., used by the target organization.
Website footprinting refers to monitoring and analysing a target organization's website for information. An attacker can build a detailed map of a website's structure and architecture without triggering the IDS or arousing the suspicion of any system administrator. Attackers use sophisticated footprinting tools or the basic tools that come with the operating system, such as Telnet or a browser. The Netcraft tool can gather website information such as IP address, registered name and address of the domain owner, domain name, host of the site, and OS details. However, the tool may not give all these details for every site. In such cases, the attacker can browse the target website. Browsing the target website will typically provide the following information:
An attacker can easily find the software and version in use on a website built with off-the-shelf software.
Usually, the operating system in use can also be determined.
Sub-directories and parameters can be revealed by making a note of the URLs while browsing the target website.
The attacker will often carefully analyse anything after a query that looks like a filename, path, database field name, or query to check whether it offers opportunities for SQL injection.
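Enumerating those candidate parameters from a URL is straightforward with Python's standard library; the product-page URL below is a hypothetical example:

```python
from urllib.parse import urlparse, parse_qs

def interesting_params(url):
    """Return a URL's query parameters, each mapped to its list of values."""
    return parse_qs(urlparse(url).query)

# Hypothetical product-page URL: 'id' and 'cat' are the fields an attacker
# would note down as candidates for later injection testing.
params = interesting_params("http://example.com/view.php?id=42&cat=books")
```

Each parameter name and value recorded this way becomes a candidate input for the manual analysis described above.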
With the help of script filename extensions such as .php, .asp, or .jsp, one can easily determine the scripting platform that the target website is using.
By inspecting the URLs of the target website, one can easily determine the technologies (.NET, J2EE, PHP, etc.) used to build that website.
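The extension-to-platform inference described above can be sketched as a simple lookup; the mapping reflects the examples in the text and is not exhaustive:

```python
import os
from urllib.parse import urlparse

# Illustrative mapping from script filename extensions to platforms.
PLATFORM_HINTS = {
    ".php": "PHP",
    ".asp": "Classic ASP",
    ".aspx": "ASP.NET",
    ".jsp": "Java (JSP/J2EE)",
}

def platform_hint(url):
    """Guess the scripting platform from a URL's script filename extension."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return PLATFORM_HINTS.get(ext, "unknown")
```

Note that this heuristic only works when the site exposes raw script extensions; URL rewriting hides them.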
The contact pages usually offer details such as names, phone numbers, email addresses, and locations of admin or support personnel. An attacker can use these details to perform a social engineering attack. Note that CMS software may use URL rewriting to disguise script filename extensions, in which case the attacker must devote additional effort toward determining the scripting platform.
Attackers use Burp Suite, Zaproxy, WhatWeb, BuiltWith, Wappalyzer, and Website Informer to view headers that provide:
- Connection status and content type
- Accept-Ranges and Last-Modified information
- X-Powered-By information
- Web server in use and its version
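A small sketch of filtering a response down to these fingerprinting headers follows; the sample headers dict stands in for what a real HTTP client (e.g., `urllib.request.urlopen(url).headers`) would return:

```python
# Headers that reveal server software and behaviour, per the list above.
FINGERPRINT_HEADERS = ("Server", "X-Powered-By", "Content-Type",
                       "Last-Modified", "Accept-Ranges", "Connection")

def header_fingerprint(headers):
    """Keep only the response headers useful for fingerprinting."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered[name.lower()]
            for name in FINGERPRINT_HEADERS if name.lower() in lowered}

# Sample response headers for illustration
sample = {"Server": "Apache/2.4.41 (Ubuntu)", "X-Powered-By": "PHP/7.4.3",
          "Content-Type": "text/html", "Date": "Mon, 01 Jan 2024 00:00:00 GMT"}
```

Here `header_fingerprint(sample)` would report the web server (Apache 2.4.41) and the scripting platform (PHP 7.4.3) while discarding uninformative headers such as Date.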
- Burp Suite
Burp Suite is an integrated platform for performing security testing of web applications. Its various tools work together to support the entire testing process, from initial mapping and analysis of an application’s attack surface to finding and exploiting security vulnerabilities. Burp Proxy allows attackers to intercept all requests and responses between the browser and the target web application and obtain information such as web server used, its version, and web-application-related vulnerabilities.
Website footprinting can be performed by examining HTML source code and cookies.
Attackers can gather sensitive information by examining the HTML source code and following the comments that are inserted manually or those that the CMS system creates. The comments may provide clues as to what is running in the background. They may even provide contact details of the web developer or administrator. Observe all the links and image tags to map the file system structure. This will reveal the existence of hidden directories and files. Enter fake data to determine how the script works. It is sometimes possible to edit the source code.
To determine the software running and its behaviour, one can examine cookies set by the server. Identify the scripting platforms by observing sessions and other supporting cookies. The information about cookie name, value, and domain size can also be extracted.
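Session-cookie names set by the server often betray the platform directly. A minimal sketch of this inference, using a few well-known default cookie names (the mapping is illustrative, not exhaustive):

```python
# Well-known default session-cookie names and the platforms they imply.
COOKIE_PLATFORMS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java (JSP/Servlet)",
    "ASP.NET_SessionId": "ASP.NET",
    "CFID": "ColdFusion",
}

def platforms_from_cookies(cookie_names):
    """Infer likely server-side platforms from observed cookie names."""
    return [COOKIE_PLATFORMS[c] for c in cookie_names if c in COOKIE_PLATFORMS]
```

For example, a response that sets `PHPSESSID` alongside an unrelated tracking cookie points to a PHP backend.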
A web spider (also known as web crawler or web robot) is a program or automated script that browses websites in a methodical manner to collect specific information such as employee names and email addresses. Attackers then use the collected information to perform footprinting and social engineering attacks. Attackers can uncover all the files and web pages on the target website by simply feeding the web spider with a URL. Then, the web spider sends hundreds of requests to the target website and analyses the HTML code of all the received responses for identifying additional links. If any new links are found, then the spider adds them to the target list and starts spidering and analysing the newly discovered links. This method helps attackers to not only detect exploitable web-attack surfaces but also to find all the directories, web pages, and files that make up the target website. Web spidering may fail if the target website's root directory contains a robots.txt file listing the directories to be excluded from crawling.
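The crawl loop described above can be sketched with Python's standard library. The snippet below handles one fetched page; the page content and URLs are placeholders:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect same-site links from anchor tags in one fetched HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                # Keep only links on the target host, as a scoped spider would
                if urlparse(absolute).netloc == urlparse(self.base_url).netloc:
                    self.links.add(absolute)

# Parse one page; a full spider would fetch each discovered link, parse it
# the same way, and repeat until no new links appear.
page = '<a href="/about.html">About</a> <a href="http://other.example.net/">x</a>'
collector = LinkCollector("http://www.example.com/")
collector.feed(page)
```

Repeating this fetch-parse-enqueue cycle over every newly discovered link yields the site map the text describes.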
Web spidering tools such as Web Data Extractor, ParseHub, and SpiderFoot can collect sensitive information from the target website.
- Web Data Extractor
Web Data Extractor automatically extracts specific information from web pages. It extracts targeted contact data (email, phone, and fax) from the website, extracts the URL and meta tags (title, description, keyword) for website promotion, searches directory creation, performs web research, and so on. As shown in the screenshot, attackers use Web Data Extractor to automatically gather critical information such as lists of meta tags, e-mail addresses, and phone and fax numbers from the target website.
Attackers, in some cases, use a more sophisticated technique for spidering the target website instead of using automated tools. They use standard web browsers to walk through the target website in an attempt to navigate through all the functionalities provided by the web application. While performing this task, the resulting incoming and outgoing traffic of the website is monitored and analysed by the tools that include features of both a web spider and an intercepting proxy. Further, these tools create a map of the web application consisting of all the URLs visited by the browser. It also analyses the responses of the application and updates the map with the discovered content and its functionalities. Attackers use tools such as Burp Suite and WebScarab to perform user-directed spidering.
Website mirroring is the process of creating a replica or clone of the original website. Users can duplicate websites using mirroring tools such as HTTrack Web Site Copier and NCollector Studio. These tools download a website to a local directory and recursively build all the directories including HTML, images, flash, videos, and other files from the webserver on another computer. Website mirroring has the following benefits:
- It is helpful for offline site browsing
- It enables an attacker to spend more time viewing and analysing the website for vulnerabilities and loopholes
- It helps in finding the directory structure and other valuable information from the mirrored copy without multiple requests to the webserver
Attackers can use this information to perform various web application attacks on the target organization’s website.
- HTTrack Web Site Copier
HTTrack is an offline browser utility. It downloads a website from the Internet to a local directory and recursively builds all the directories including HTML, images, and other files from the webserver on another computer. As shown in the screenshot, attackers use HTTrack to mirror the entire website of the target organization, store it in the local system drive, and browse the local website to identify possible exploits and vulnerabilities.
The Internet Archive's Wayback Machine explores archived versions of websites. Such exploration allows an attacker to gather information on an organization's web pages since their creation. As the website https://archive.org keeps track of web pages from the time of their creation, an attacker can retrieve even information removed from the target website, such as web pages, audio files, video files, images, text, and software programs. Attackers use this information to perform phishing and other types of web application attacks on the target organization.
Octoparse offers automatic data extraction: it quickly scrapes web data without coding and turns web pages into structured data. As shown in the screenshot, attackers use Octoparse to capture information from web pages, such as text, links, image URLs, or HTML code.
The words available on the target website may reveal critical information that helps attackers to perform further exploitation. Attackers gather a list of email addresses related to the target organization using various search engines, social networking sites, web spidering tools, etc. After obtaining these email addresses, an attacker can gather a list of words available on the target website. This information helps the attacker to perform brute-force attacks on the target organization. An attacker uses the CeWL tool to gather a list of words from the target website and perform a brute-force attack on the email addresses gathered earlier.
To run the CeWL tool, issue the following commands:
ruby cewl.rb --help
This command displays various options that a user can use to obtain a list of words from the target website.
cewl www.certifiedhacker.com
This command returns a list of unique words present in the target URL.
cewl --email www.certifiedhacker.com
In this case, the target website is www.certifiedhacker.com, and the '--email' option is used to fetch a list of words and email addresses from the target website.
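CeWL itself is a Ruby tool, but the idea of building a custom word list for later brute-forcing can be sketched in a few lines of Python. The functions below are illustrative stand-ins, not CeWL's actual implementation: one collects the unique visible words of a page, the other harvests email addresses with a naive regular expression.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Accumulate only the visible text of an HTML page."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def wordlist(html, min_length=5):
    """Unique words of at least min_length letters, in order of first appearance."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    seen, words = set(), []
    for word in re.findall(r"[A-Za-z]+", text):
        if len(word) >= min_length and word.lower() not in seen:
            seen.add(word.lower())
            words.append(word)
    return words

def emails(text):
    """Naive email harvesting; real tools handle obfuscated addresses too."""
    return sorted(set(re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)))
```

The resulting word list would then be fed to a password-guessing tool against the harvested addresses, as described above.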
Useful information may reside on the target organization’s website in the form of PDF documents, Microsoft Word files, and other files in various formats. Attackers extract valuable data, including metadata and hidden information, from such documents. This data can be analysed to extract information such as the title of the page, description, keywords, creation/modification date and time of the content, and the usernames and email addresses of employees of the target organization.
An attacker can misuse this information to perform malicious activities against the target organization, for example, by brute-forcing authentication using the usernames and email addresses of employees or by performing social engineering to send malware that can infect the target system. Metadata extraction tools such as Metagoofil, Exiftool, and Web Data Extractor automatically extract critical information that includes the usernames of clients, operating systems (exploits are OS-specific), email addresses (possibly for social engineering), the list of software (version and type) used, the list of servers, document creation/modification dates, and the authors of the website.
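As a hedged illustration of the kind of metadata these tools read, the following Python sketch pulls author and title fields out of a .docx file's core-properties XML (a .docx is a ZIP archive containing docProps/core.xml). Real extractors such as Metagoofil use proper document parsers; the regular expressions here are only for demonstration.

```python
import io
import re
import zipfile

CORE = "docProps/core.xml"  # standard location of Office core properties

def extract_docx_metadata(data: bytes) -> dict:
    """Pull author/title fields from a .docx file's core-properties XML."""
    meta = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if CORE not in zf.namelist():
            return meta
        xml = zf.read(CORE).decode("utf-8", "replace")
        for tag, key in [("dc:creator", "author"),
                         ("dc:title", "title"),
                         ("cp:lastModifiedBy", "last_modified_by")]:
            m = re.search(r"<%s[^>]*>(.*?)</%s>" % (tag, tag), xml, re.S)
            if m:
                meta[key] = m.group(1)
    return meta
```

A harvested author name such as "jsmith" is exactly the kind of username an attacker would later try in a brute-force or social engineering attack.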
Metagoofil extracts metadata of public documents (pdf, doc, xls, ppt, docx, pptx, and xlsx) belonging to a target company. It performs a Google search to identify and download the documents to the local disk and then extracts the metadata with different libraries such as Hachoir, PdfMiner, and others. As shown in the screenshot, Metagoofil generates a report with usernames, software versions, and servers or machine names, which helps attackers in the information gathering phase.
Attackers monitor the target website to detect web updates and changes. Monitoring the target website helps attackers to access and identify changes in the login pages, extract password-protected pages, track changes in the software version and driver updates, extract and store images on the modified web pages, and so on. Attackers analyse the gathered information to detect underlying vulnerabilities in the target website, and based on these vulnerabilities, they perform exploitation of the target web application. Web updates monitoring tools are capable of detecting any changes or updates on a particular website, and they can send notifications or alerts to interested users through email or SMS.
WebSite-Watcher helps to track websites for updates and automatic changes. When an update or change occurs, WebSite-Watcher automatically detects and saves the last two versions onto your disk. As shown in the screenshot, attackers use WebSite-Watcher to extract the older and newer versions of web pages related to the target website.
Attackers can search the target company’s website to gather crucial information about the company. Generally, organizations use websites to inform the public about what they do, what type of services or products they provide, how to contact them, etc. Attackers can exploit this information to launch further attacks on the target company. For example, attackers can search for the following information on the company’s website:
- Company contact names, phone numbers, and email addresses
- Company locations and branches
- Partner Information
- Links to other sites
- Product, project, or service data
Copyright is a protecting mechanism provided by the law of a country, which grants the creator of an original work exclusive rights for its use and distribution. To restrict third parties from accessing their data freely, most organizations ensure that there is a copyright notice on every single piece of their published work. A typical copyright notice contains the following information:
- The Copyright Symbol
- The Year of Creation
- The Name of the Author
- A Rights Statement
An attacker can search for copyright notices on the web and use these details to perform a deep analysis of the target organization. Further, attackers can search and note down the revision number of a product. The revision number is a unique string that acts as an identifier for the revision of a given document, and it can be found within the documents of the company. Attackers can also search for the document numbers that are assigned to the documents after revision, which can be searched from the Internet and recorded to launch further attacks on the target.
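A simple, illustrative way to harvest copyright notices at scale is a regular expression over downloaded pages. The pattern below is a rough sketch matching the notice components listed above (symbol, year(s), holder name) and will miss many real-world notice formats:

```python
import re

# Matches notices like "© 2021 Example Corp. All rights reserved."
# or "Copyright (c) 1998-2021 Example Corp."
COPYRIGHT_RE = re.compile(
    r"(?:©|\(c\)|copyright)\s*(?:\(c\)\s*)?"   # the copyright symbol/word
    r"(\d{4}(?:\s*[-–]\s*\d{4})?)"             # year or year range
    r"\s+([^.<\n]+)",                          # the rights holder's name
    re.IGNORECASE)

def find_copyrights(text):
    """Return (year(s), holder) pairs for each copyright notice found."""
    return [(years, holder.strip()) for years, holder in COPYRIGHT_RE.findall(text)]
```

The year of creation and the holder's exact legal name recovered this way can seed further searches against registries and document repositories.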
Attackers can monitor a target company’s website traffic using tools such as Web-Stat, Alexa, and Monitis to collect valuable information. These tools help to collect information about the target’s customer base, which helps attackers to disguise themselves as customers and launch social engineering attacks on the target. The information collected includes:
- Total visitors:
Tools such as Clicky find the total number of visitors browsing the target website.
- Page views:
Tools such as Opentracker monitor the total number of pages viewed by the users along with the timestamps and the status of the user on a particular web page (whether the webpage is still active or closed).
- Bounce rate:
Tools such as Google Analytics measure the bounce rate of the target company’s website. Bounce rate is an Internet marketing term used in web traffic analysis, which represents the percentage of visitors who enter the site and then leave (‘bounce’) rather than continuing on to view other pages within the same site.
- Live visitors map:
Tools such as Web-Stat track the geographical location of the users visiting the company’s website.
- Site ranking:
Tools such as Alexa track a company’s rank on the web.
- Audience geography:
Tools such as Alexa track a company’s customer locations on the globe.
- Track Visitors and monitor sales:
Tools such as goingup! track visitors, monitor sales, and show conversion rates for the company’s website.
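The bounce rate mentioned above is straightforward to compute from session data. The following illustrative Python function assumes each visit has been summarized as a count of pages viewed:

```python
def bounce_rate(sessions):
    """Percentage of sessions that viewed exactly one page before leaving.

    `sessions` is a list of page-view counts, one entry per visit.
    """
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions if pages == 1)
    return round(100.0 * bounces / len(sessions), 1)
```

For example, four visits in which two viewers left after a single page gives a bounce rate of 50.0%.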
Email tracking monitors the email messages of a particular user. This kind of tracking is possible through digitally time-stamped records that reveal the time and date when the target receives and opens a specific email. Email tracking tools allow an attacker to collect information such as IP addresses, mail servers, and service providers involved in sending the email. Attackers can use this information to build a hacking strategy and to perform social engineering and other attacks. Examples of email tracking tools include eMailTrackerPro, Infoga, and Mailtrack. Information about the victim gathered using email tracking tools includes:
- Recipient’s System IP address:
Allows tracking of the recipient’s IP address
- Geolocation:
Estimates and displays the location of the recipient on the map and may even calculate the distance from the attacker’s location
- Email Received and Read:
Notifies the attacker when the email is received and read by the recipient
- Read Duration:
The time spent by the recipient in reading the email sent by the sender
- Proxy Detection:
Provides information about the type of server used by the recipient
- Links:
Checks whether the links sent to the recipient through email have been visited
- Operating System and Browser information:
Reveals information about the operating system and the browser used by the recipient. The attacker can use this information to find loopholes in that version of the operating system and browser to launch further attacks
- Forward Email:
Determines whether the email sent to the user is forwarded to another person
- Device Type:
Provides information about the type of device used to open and read the email, e.g., desktop computer, mobile device, or laptop
- Path Travelled:
Tracks the path through which the email travelled via email transfer agents from source to destination system
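Much of this email tracking relies on an invisible "tracking pixel": the sender embeds a unique image URL in the HTML body, and when the recipient opens the message, the mail client fetches the image, handing the tracker's server the recipient's IP address, user agent, and a timestamp. The sketch below (all names, including tracker.example.com, are hypothetical) shows how a per-recipient token links a pixel request back to an individual:

```python
import hashlib
from urllib.parse import urlencode, urlparse, parse_qs

SECRET = b"tracker-only-secret"  # illustrative value known only to the tracker

def pixel_url(recipient_email, base="https://tracker.example.com/open.gif"):
    """Build a per-recipient tracking-pixel URL embedded in the HTML email body."""
    token = hashlib.sha256(SECRET + recipient_email.encode()).hexdigest()[:16]
    return base + "?" + urlencode({"r": token})

def identify(request_url, recipients):
    """Server side: map the token on an incoming pixel request to a recipient.

    In practice the server would also log the request's source IP address,
    User-Agent string, and timestamp, which yield the fields listed above.
    """
    token = parse_qs(urlparse(request_url).query)["r"][0]
    for email in recipients:
        if pixel_url(email).endswith(token):
            return email
    return None
```

This is why many mail clients block remote images by default: not fetching the pixel denies the tracker its open notification.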
An email header contains the details of the sender, routing information, addressing scheme, date, subject, and recipient. Email headers also help attackers to trace the routing path taken by an email before it is delivered to the recipient. Each email header is a useful source of information for an attacker to launch attacks against the target. The process of viewing the email header varies with different email programs. Commonly used email programs include eM Client, Mailbird Lite, Hiri, Mozilla Thunderbird, Spike, Claws Mail, SmarterMail Webmail, Outlook, and so on. The email header contains the following information:
- Sender’s mail server
- Date and time of receipt by the originator’s email servers
- Authentication system used by the sender’s mail server
- Date and time of sending the message
- A unique number assigned by mx.google.com to identify the message
- Sender’s full name
- Sender’s IP address and address from which the message was sent
The attacker can trace and collect all this information by performing a detailed analysis of the complete email header.
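Python's standard email package can perform this kind of header analysis directly. The raw message below is a fabricated example used only to show which fields a tracer would read; note how each relay prepends its own Received header, so reading them top to bottom traces the route backwards from recipient to sender.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Fabricated message for illustration only
RAW = """\
Received: from mail.sender.example (mail.sender.example [203.0.113.7])
    by mx.google.com with ESMTPS id abc123
From: Jane Doe <jane@sender.example>
Date: Mon, 01 Mar 2021 10:15:00 +0000
Message-ID: <abc123@mail.sender.example>
Subject: Quarterly report

body text
"""

def analyse_header(raw):
    """Extract the header fields discussed above from a raw RFC 5322 message."""
    msg = message_from_string(raw)
    name, addr = parseaddr(msg["From"])
    return {
        "sender_name": name,
        "sender_addr": addr,
        "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "message_id": msg["Message-ID"],
        "route": msg.get_all("Received"),  # one entry per relay hop
    }
```

The bracketed IP address in the earliest Received line is typically the sender's mail server, which is the starting point for Whois and geolocation lookups.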
Infoga is a tool used for gathering email account information (IP, hostname, country, etc.) from different public sources (search engines, PGP key servers, and Shodan), and it checks whether an email address was leaked using the haveibeenpwned.com API. For example, the command
python infoga.py --domain microsoft.com --source all --breach -v 2 --report ../microsoft.txt
will retrieve all the publicly available email addresses related to the domain microsoft.com along with email account information.
python infoga.py --info email@example.com --breach -v 3 --report ../m4ll0k.txt
The above command will retrieve email account information for a specified email address.
As shown in the screenshot, attackers use eMailTrackerPro to analyse email headers and extract information such as the sender’s geographical location, IP address, and so on. It allows an attacker to review the traces later by saving past traces.
Whois is a query and response protocol used for querying databases that store the registered users or assignees of an Internet resource, such as a domain name, an IP address block, or an autonomous system. The protocol listens for requests on TCP port 43. Regional Internet Registries (RIRs) maintain Whois databases, which contain the personal information of domain owners. For each resource, the Whois database provides text records with information about the resource itself and relevant information regarding assignees, registrants, and administrative details (creation and expiration dates). Two types of data models exist to store and look up Whois information:
- Thick Whois - Stores the complete Whois information from all the registrars for a particular set of data.
- Thin Whois - Stores only the name of the Whois server of the registrar of a domain, which in turn holds complete details on the data being looked up.
A Whois query returns the following information:
- Domain name details
- Contact details of the domain owner
- Domain name servers
- Net Range
- When a domain has been created
- Expiry records
- Records last updated
An attacker queries a Whois database server to obtain information about the target domain name, contact details of its owner, expiry date, creation date, and so on, and the Whois server responds to the query with the requested information. Using this information, an attacker can create a map of the organization’s network, mislead domain owners with social engineering, and then obtain internal details of the network.
Regional Internet Registries (RIRs) include:
- ARIN (American Registry for Internet Numbers) (https://www.arin.net)
- AFRINIC (African Network Information Center) (https://www.afrinic.net)
- APNIC (Asia Pacific Network Information Center) (https://www.apnic.net)
- RIPE NCC (Réseaux IP Européens Network Coordination Centre) (https://www.ripe.net)
- LACNIC (Latin American and Caribbean Network Information Center) (https://www.lacnic.net)
Whois services such as DomainTools (http://whois.domaintools.com) and SmartWhois (https://www.tamos.com) can help to perform Whois lookups. The screenshot shows the result of a Whois lookup performed with the two above-mentioned services, which accept the target’s domain or IP address as the query. The domaintools.com service provides Whois information such as registrant information, email, administrative contact information, creation and expiry dates, and a list of domain servers. SmartWhois gives information about an IP address, hostname, or domain, including the country, state or province, city, phone number, fax number, name of the network provider, administrator, and technical support contact information. It also helps in finding the owner of the domain, the owner’s contact information, the owner of the IP address block, the registration date of the domain, and so on. It supports Internationalized Domain Names (IDNs), which means one can query domain names that use non-English characters, and it also supports IPv6 addresses.
Attackers use Whois lookup tools such as Batch IP Converter, WhoIs Analyzer Pro, and ActiveWhois to extract information such as IP addresses, hostnames or domain names, registrant information, and DNS records, including the country, city, state, phone and fax numbers, network service providers, administrators, and technical support information, for any IP address or domain name.
IP geolocation helps to obtain information regarding a target such as its country, region/state, city, latitude and longitude of its city, ZIP/postal code, time zone, connection speed, ISP (hosting company), domain name, IDD country code, area code, weather station code and name, mobile carrier, and elevation. Using the information obtained from IP geolocation, an attacker may attempt to gather more information about a target with the help of social engineering, surveillance, and non-technical attacks such as dumpster diving, hoaxing, or acting as a technical expert. An attacker can also set up a compromised web server near the victim’s location; if the exact location of the victim is detected, the attacker can infect the victim with malware designed for that specific area, gain unauthorized access to the target device, or attempt to launch an attack using the target device. IP geolocation lookup tools such as IP2Location, IP Location Finder, and IP Address Geographical Location Finder help to collect IP geolocation information about the target, which enables attackers to launch social engineering attacks such as spamming and phishing.
As shown in the screenshot, attackers use the IP2Location tool to identify a visitor’s geographical location, i.e., country, region, city, latitude and longitude of the city, ZIP code, time zone, connection speed, ISP, domain name, IDD country code, area code, weather station code and name, mobile carrier, elevation, and usage type, using a proprietary IP address lookup database and technology.
DNS footprinting reveals information about DNS zone data. DNS zone data include DNS domain names, computer names, IP addresses, and much more information about a network. An attacker uses DNS information to determine key hosts in the network and then performs social engineering attacks to gather even more information. DNS footprinting helps in determining the following records about the target DNS:
- A: Points to a host’s IP address
- MX: Points to the domain’s mail server
- NS: Points to the host’s name server
- CNAME: Canonical name, allowing aliases to a host
- SOA: Indicates the authority for a domain
- SRV: Service records
- PTR: Maps an IP address to a hostname
- RP: Responsible person
- HINFO: Host information record, including CPU type and OS
- TXT: Unstructured text records
DNS interrogation tools such as Professional Toolset (https://tools.dnsstuff.com) and DNS Records (https://network-tools.com) enable the user to perform DNS footprinting. DNSstuff (Professional Toolset) extracts DNS information about IP addresses, mail server extensions, DNS lookups, Whois lookups, and so on. It can extract a range of IP addresses using an IP routing lookup. If the target network allows unknown, unauthorized users to transfer DNS zone data, then it is easy for an attacker to obtain the information about DNS with the help of the DNS interrogation tool. When the attacker queries the DNS server using the DNS interrogation tool, the server responds with a record structure that contains information about the target DNS. DNS records provide important information about the location and types of servers.
Attackers also use DNS lookup tools such as DNSdumpster.com, Bluto, and Domain Dossier to retrieve DNS records for a specified domain or hostname. These tools retrieve information such as domains and IP addresses, domain Whois records, DNS records, and network Whois records.
DNS lookup is used for finding the IP addresses for a given domain name, and the reverse DNS operation is performed to obtain the domain name of a given IP address. When you type a domain name in the browser, DNS converts that domain name into an IP address and forwards the request for further processing. This conversion of a domain name into an IP address is performed using an A record. Attackers perform a reverse DNS lookup on an IP range to locate the DNS PTR records for those IP addresses. Attackers use various tools such as DNSRecon and Reverse IP Domain Check for performing the reverse DNS lookup on the target host. Given an IP address or a range of IP addresses, these tools return the corresponding domain names.
As shown in the screenshot, attackers use the following command to perform a reverse DNS lookup on the target host:
dnsrecon -r 162.241.216.0-162.241.216.255
In the above command, the -r option specifies the range of IP addresses (first-last) for a reverse lookup by brute force.
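The PTR mechanics behind a reverse lookup can be shown with Python's standard library: the IP address is rewritten as an in-addr.arpa name, and the PTR record at that name, if one exists, gives the hostname. In this sketch, `ptr_name()` is pure string manipulation, while `reverse_lookup()` requires network access to a resolver.

```python
import ipaddress
import socket

def ptr_name(ip):
    """The in-addr.arpa name whose PTR record maps an IP back to a hostname."""
    return ipaddress.ip_address(ip).reverse_pointer

def reverse_lookup(ip):
    """Resolve an IP address to its PTR hostname (network access required)."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except socket.herror:
        return None   # no PTR record published for this address
```

Tools such as DNSRecon simply iterate this lookup over every address in the supplied range.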
Attackers also find the other domains that share the same web server using tools such as Reverse IP Domain Check. These tools list the possible domains that are hosted on the same web server.
Reverse IP Domain Check
As shown in the screenshot, a reverse IP domain check takes a domain name or IP address pointing to a web server and searches for other sites known to be hosted on the same web server.
One needs to gather basic and important information about the target organization, such as what the organization does, who works there, and what type of work they do, to perform network footprinting. The answers to these questions provide information about the internal structure of the target network. After gathering the information, an attacker can proceed to find the network range of a target system. Detailed information is available from the appropriate regional registry database regarding IP allocation and the nature of the allocation. An attacker can also determine the subnet mask of the domain and trace the route between the system and the target system. Widely used traceroute tools include Path Analyzer Pro and VisualRoute.
Obtaining private IP addresses can be useful to attackers. The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets: 10.0.0.0–10.255.255.255 (10/8 prefix), 172.16.0.0–172.31.255.255 (172.16/12 prefix), and 192.168.0.0–192.168.255.255 (192.168/16 prefix).
Using the network range, the attacker can learn how the network is structured and which machines in the network are alive. The network range also helps to identify the network topology, access control devices, and OS used in the target network. To find the network range of the target network, one needs to enter the server IP address (gathered in Whois footprinting) in the ARIN Whois database search tool. A user can also visit the ARIN website and enter the server IP in the SEARCH Whois text box. This gives the network range of the target network.
Improperly set up DNS servers offer attackers a good chance of obtaining a list of internal machines on the server. In addition, if an attacker traces a route to a machine, it is sometimes possible to obtain the internal IP address of the gateway, which can be useful.
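The three IANA-reserved private blocks listed above can be checked programmatically with Python's ipaddress module, which is a quick way to tell whether an address uncovered during footprinting is internal:

```python
import ipaddress

# The three IANA-reserved private blocks described above
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(ip):
    """True if the address falls inside any reserved private block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in PRIVATE_BLOCKS)
```

Note that 172.31.255.1 is private (inside 172.16/12) while 172.32.0.1 is not, a boundary that is easy to misjudge by eye.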
Attackers typically use more than one tool to obtain network information, as a single tool cannot provide all the required information.
Finding the route of the target host on the network is necessary to test against man-in-the-middle attacks and other related attacks. Most operating systems come with a Traceroute utility to perform this task. It traces the path or route through which packets travel to the target host in the network. Traceroute uses the ICMP protocol and the Time to Live (TTL) field of the IP header to find the path to the target host in the network. The Traceroute utility can detail the path through which IP packets travel between two systems. The utility can trace the number of routers the packets travel through, the round-trip time (duration in transiting between two routers), and, if the routers have DNS entries, the names of the routers and their network affiliation. It can also trace geographic locations. Traceroute works by exploiting the TTL feature of the Internet Protocol. The TTL field indicates the maximum number of routers a packet may traverse. Each router that handles a packet decrements the TTL field in the IP header by one; when the count reaches zero, the router discards the packet and transmits an ICMP Time Exceeded message to the originator of the packet.
Traceroute first sends a packet with a TTL of one; the first router in the path decrements the TTL to zero, discards the packet, and returns an ICMP Time Exceeded error to the sender. The utility records the IP address and DNS name of that router and sends out another packet with a TTL value of two. This packet makes it through the first router and then times out at the next router in the path. This second router also sends an error message back to the originating host. Traceroute continues in this way, recording the IP address and name of each router, until a packet finally reaches the target host or until it decides that the host is unreachable. In the process, it records the time taken for each packet to make a round trip to each router. Finally, when the packet reaches the destination, the normal ICMP ping response is sent back to the sender. The utility thus reveals the IP addresses of the intermediate hops in the route from the source to the target host.
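The TTL mechanics just described can be modelled in a few lines. This is a simulation of the logic only, not a working traceroute (a real implementation crafts raw packets and listens for ICMP Time Exceeded replies, which requires elevated privileges):

```python
def simulate_traceroute(path, max_ttl=30):
    """Model traceroute's TTL mechanics over a known router path.

    `path` is the ordered list of router addresses between source and target
    (destination last). The probe with TTL = n expires at hop n, whose ICMP
    Time Exceeded reply reveals that router's address; the destination itself
    answers normally instead of timing out.
    """
    discovered = []
    for ttl in range(1, max_ttl + 1):
        hop = ttl - 1                      # index of the hop where TTL hits zero
        if hop >= len(path):
            break
        discovered.append((ttl, path[hop]))
        if hop == len(path) - 1:           # destination reached: stop probing
            break
    return discovered
```

Running it over a three-hop path yields the familiar numbered hop listing that the tracert and traceroute commands below print.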
The Windows operating system by default uses ICMP traceroute. Go to the command prompt and type the tracert command along with the destination IP address or domain name, for example:
tracert www.certifiedhacker.com
Many devices in a network are configured to block ICMP traceroute messages. In this scenario, an attacker uses TCP or UDP traceroute, also known as Layer 4 traceroute. Go to the terminal in a Linux operating system and type the tcptraceroute command along with the destination IP address or domain name, for example:
tcptraceroute www.certifiedhacker.com
Like Windows, Linux also has a built-in traceroute utility, but it uses the UDP protocol for tracing the route to the destination. Go to the terminal in the Linux operating system and type the traceroute command along with the destination IP address or domain name, for example:
traceroute www.certifiedhacker.com
We have seen how the Traceroute utility helps to find the IP addresses of intermediate devices such as routers and firewalls present between a source and its destination. After running several traceroutes, an attacker will be able to find the location of a hop in the target network. Consider the following traceroute results:
Traceroute tools such as Path Analyzer Pro, VisualRoute, Traceroute NG, and PingPlotter are useful for extracting information about the geographical location of routers, servers, and IP devices in a network. Such tools help us to trace, identify, and monitor the network activity on a world map. Some of the features of these tools are Hop-by-hop traceroutes, Reverse tracing, Historical analysis, Packet loss reporting, Reverse DNS, Ping plotting, Port probing, Detect network problems, Performance metrics analysis, and Network performance monitoring.
Path Analyzer Pro
Path Analyzer Pro performs network route tracing with performance tests, DNS, Whois, and network resolution to investigate network issues. Attackers use Path Analyzer Pro to identify the route from the source to the target system graphically. As shown in the screenshot, this tool helps attackers to gather information such as the hop number, IP address, hostname, ASN, network name, percentage loss, latency, average latency, and standard deviation for each hop in the path.
VisualRoute is a traceroute and network diagnostic tool. Attackers use VisualRoute to identify the geographical locations of routers, servers, and other IP devices in the target network. This tool helps attackers track the path between the source and destination systems and presents the results in a graphical format. As shown in the screenshot, the VisualRoute tool enables attackers to gather information such as the hop number, IP address, node name, and geographical location of each hop in the route.
So far, we have discussed the different techniques for gathering information using online resources or tools. Now, we will discuss footprinting through social engineering, i.e., the art of obtaining information from people by exploiting their weaknesses. Social engineering is a non-technical process in which an attacker misleads a person into inadvertently providing confidential information. In other words, the target is unaware of the fact that someone is stealing confidential information. The attacker takes advantage of the gullible nature of people and their willingness to provide confidential information. To perform social engineering, an attacker first needs to gain the confidence of an authorized user and then mislead that user into revealing confidential information. The goal of social engineering is to obtain the required confidential information and then use that information for malicious purposes such as gaining unauthorized access to the system, identity theft, industrial espionage, network intrusion, fraud, and so on. The information obtained through social engineering may include credit card details, social security numbers, usernames and passwords, other personal information, security products in use, OS and software versions, IP addresses, names of servers, network layout information, and so on. Social engineering can be performed in many ways, such as eavesdropping, shoulder surfing, dumpster diving, impersonation, tailgating, third-party authorization, piggybacking, reverse social engineering, and so on.
Eavesdropping is the act of secretly listening to the conversations of people over a phone or video conference without their consent. It also includes reading confidential messages from communication media such as instant messaging or fax transmissions. In short, it is the interception of communication in any form, such as audio, video, or text, without the consent of the communicating parties; the attacker gains information by tapping phone conversations or intercepting audio, video, or written communication.
Shoulder surfing is a technique whereby attackers secretly observe the target to gain critical information. In the shoulder surfing technique, an attacker stands behind the victim and secretly observes the victim’s activities on the computer, such as keystrokes while entering usernames, passwords, and so on. The technique is effective in gaining passwords, personal identification numbers, security codes, account numbers, credit card information, and similar data. Attackers can easily perform shoulder surfing in a crowded place, as it is relatively easy to stand behind and watch the victim without his or her knowledge.
Dumpster diving, also known as trashing, involves the attacker rummaging for information in garbage bins. The attacker may gain vital information such as phone bills, contact information, financial information, operations-related information, printouts of source code, printouts of sensitive information, and so on from the target company’s trash bins, printer waste bins, sticky notes at users’ desks, and so on. The attacker may also gather account information from ATM trash bins. This information can help the attacker to commit attacks.
Impersonation is a technique whereby an attacker pretends to be a legitimate or authorized person. Attackers perform impersonation attacks in person or use phones or other communication media to mislead targets and trick them into revealing information. The attacker might impersonate a courier or delivery person, janitor, businessman, client, or technician, or may pretend to be a visitor. Using this technique, an attacker gathers sensitive information by scanning terminals for passwords, searching important documents on desks, rummaging through bins, and so on. The attacker may even try to overhear confidential conversations and “shoulder surf” to obtain sensitive information.
Various tools help attackers in footprinting. Many organizations offer tools that make information gathering an easy task. This section describes tools intended for obtaining information from various sources. Footprinting tools are used to collect basic information about target systems to exploit them. Information collected by the footprinting tools includes the target’s IP location information, routing information, business information, address, phone number and social security number, details about a source of an email and a file, DNS information, domain information, and so on.
Maltego is a program that can be used to determine the relationships and real-world links between people, groups of people, organizations, websites, Internet infrastructure, documents, etc. Attackers can use different entities available in the tool to obtain information such as email addresses, a list of phone numbers, and a target’s Internet infrastructure (domains, DNS names, Netblocks, IP addresses information). As shown in the screenshot, attackers add a Person entity, rename it with the target’s name, and obtain the email addresses associated with the target.
Recon-ng is a web reconnaissance framework with independent modules for database interaction that provides an environment in which open-source web-based reconnaissance can be conducted. As shown in the screenshot, attackers use the module ‘recon/domains-hosts/hackertarget’ to extract a list of subdomains and IP addresses associated with the target URL.
Fingerprinting Organizations with Collected Archives (FOCA) is a tool used mainly to find metadata and hidden information in the documents that it scans. FOCA is capable of scanning and analysing a wide variety of documents, with the most common ones being Microsoft Office, Open Office, or PDF files. The features of FOCA include:
- Web Search
Searches for hosts and domain names through URLs associated with the main domain. Each link is analysed to extract information from its new host and domain names.
- DNS Search
Checks each domain to ascertain the host names configured in NS, MX, and SPF servers and discover new host and domain names.
- IP Resolution
Resolves each host name via DNS to obtain the IP address associated with that server name. To perform this task accurately, the tool performs its analysis against the organization’s internal DNS.
- PTR Scanning
To find more servers in the same network segment as a discovered IP address, FOCA executes a PTR record scan.
- Bing IP
For each IP address discovered, FOCA launches a search process for new domain names associated with that IP address.
- Common Names
Performs dictionary attacks against the DNS.
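The IP-resolution and PTR-scanning steps above boil down to forward and reverse DNS lookups, which Python’s standard library exposes directly. A minimal sketch of both directions (the host names used are illustrative):

```python
import socket

def resolve(hostname):
    """Forward DNS: host name -> IPv4 address (the 'IP Resolution' step)."""
    return socket.gethostbyname(hostname)

def ptr_lookup(ip):
    """Reverse DNS: IP address -> host name via its PTR record
    (the 'PTR Scanning' step); returns None if no PTR record exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return None
```

Iterating `ptr_lookup` over every address in a discovered subnet reproduces FOCA’s PTR scan of a network segment.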
As shown in the screenshot, attackers search the target domain and obtain the file information stored in it. The extracted files can be viewed on the web browser. Further, the attackers can view additional information such as network domains, roles, vulnerabilities, and metadata of the target domain.
OSRFramework includes applications related to username checking, DNS lookups, information leak research, deep web search, and regular expression extraction. The tools included in the OSRFramework package that attackers can use to gather information on the target are listed below:
- usufy.py
Checks for a user profile on up to 290 different platforms
- mailfy.py
Checks for the existence of a given email address
- searchfy.py
Performs a query on the platforms in OSRFramework
- domainfy.py
Checks for the existence of domains
- phonefy.py
Checks for the existence of a given series of phone numbers
- entify.py
Uses regular expressions to extract entities
As shown in the screenshot, attackers use the following command to search for a target user on social media platforms (the multi-word name must be quoted so that it is parsed as a single argument):
usufy.py -n "Mark Zuckerberg" -p twitter facebook youtube
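usufy works by requesting each platform’s profile URL for the given alias and checking whether the page exists. A simplified sketch of the same idea (the URL patterns below are illustrative assumptions, not usufy’s actual platform definitions):

```python
# Hypothetical profile-URL patterns; usufy ships its own platform definitions.
PLATFORMS = {
    "twitter": "https://twitter.com/{alias}",
    "facebook": "https://www.facebook.com/{alias}",
    "youtube": "https://www.youtube.com/@{alias}",
}

def candidate_urls(alias, platforms):
    """Build the profile URLs to probe for a given alias; an HTTP 200
    response for a URL suggests the alias exists on that platform."""
    return [PLATFORMS[p].format(alias=alias) for p in platforms if p in PLATFORMS]
```

Each candidate URL would then be fetched, with the HTTP status (or an error page) indicating whether the profile exists.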
OSINT Framework is an open-source intelligence gathering framework that helps security professionals perform automated footprinting and reconnaissance, OSINT research, and intelligence gathering. It is focused on gathering information from free tools and resources. The framework includes a simple web interface that lists various OSINT tools arranged by category, displayed as an OSINT tree structure. As shown in the screenshot, the listed tools are annotated with the following indicators:
- (T)
Indicates a link to a tool that must be installed and run locally
- (R)
Requires registration
- (M)
Indicates a URL that contains the search term, and the URL itself must be edited manually
Recon-Dog is an all-in-one tool for all basic information gathering needs. It uses APIs to collect information about the target system. Its features include:
- Censys
Uses censys.io to gather a massive amount of information about an IP address
- NS lookup
Performs a name server lookup
- Port scan
Scans the most common TCP ports
- Detect CMS
Can detect 400+ content management systems
- Whois lookup
Performs a Whois lookup
- Detect honeypot
Uses shodan.io to check if the target is a honeypot
- Find subdomains
Uses findsubdomains.com to find subdomains
- Reverse IP lookup
Performs a reverse IP lookup to find domains associated with an IP address
- Detect technologies
Uses wappalyzer.com to detect 1000+ technologies
- All
Runs all utilities against the target
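The Whois-lookup feature relies on the plain-text WHOIS protocol (RFC 3912): open TCP port 43 on a WHOIS server, send the query followed by CRLF, and read until the server closes the connection. A minimal sketch (whois.iana.org is the standard starting point; the response-parsing helper assumes IANA’s documented `refer:` line format):

```python
import socket

def whois_query(domain, server="whois.iana.org", timeout=10):
    """RFC 3912 WHOIS: send 'domain\\r\\n' on TCP/43 and read to EOF."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def referral_server(response):
    """IANA responses include a 'refer:' line naming the registry's
    WHOIS server; query that server next for full registration details."""
    for line in response.splitlines():
        if line.lower().startswith("refer:"):
            return line.split(":", 1)[1].strip()
    return None
```

A full lookup first queries IANA, then follows the referral, e.g. `whois_query(domain, referral_server(whois_query(domain)))`.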
BillCipher is an information gathering tool for a website or IP address. It can work on any operating system that supports Python 2, Python 3, and Ruby. This tool includes various options such as DNS lookup, Whois lookup, port scanning, zone transfer, host finder, and reverse IP lookup, which help to gather critical information.
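Port scanning of the kind these tools perform reduces to attempting a TCP connection to each port; `connect_ex` returns 0 when the port accepts connections. A minimal sketch, demonstrated against a throwaway listener on the loopback interface so the result is deterministic:

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` accepting TCP connections on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports

# Demo: open a local listener, then scan it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # OS picks a free port
listener.listen(1)
open_port = listener.getsockname()[1]
found = scan_ports("127.0.0.1", [open_port])
listener.close()
```

Real scanners iterate a list of well-known ports (21, 22, 25, 80, 443, …) and run the probes concurrently; the connection logic is the same.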
Some additional footprinting tools are listed below:
So far, we have discussed the importance of footprinting, various ways to perform the task, and the tools that help in its execution. Now, we will discuss footprinting countermeasures, i.e., the measures or actions taken to prevent or offset information disclosure. Some of the footprinting countermeasures are as follows:
- Restrict the employees’ access to social networking sites from the organization’s network
- Configure web servers to avoid information leakage
- Educate employees to use pseudonyms on blogs, groups, and forums
- Do not reveal critical information in press releases, annual reports, product catalogs, and so on
- Limit the amount of information that you are publishing on the website/Internet
- Use footprinting techniques to discover and remove any sensitive information publicly available
- Prevent search engines from caching a web page and use anonymous registration services
- Develop and enforce security policies such as information security policy, password policy, and so on, to regulate the information that employees can reveal to third parties
- Set apart internal and external DNS or use split DNS, and restrict zone transfer to authorized servers
- Disable directory listings in the web servers
- Conduct security awareness training periodically to educate employees about various social engineering tricks and risks
- Opt for privacy services on Whois lookup database
- Avoid domain-level cross-linking for critical assets
- Encrypt and password-protect sensitive information
- Do not enable protocols that are not required
- Always use TCP/IP and IPsec filters for defense-in-depth
- Configure IIS to avoid information disclosure through banner grabbing
- Hide the IP address and the related information by implementing VPN or keeping the server behind a secure proxy
- Request archive.org to delete the history of the website from the archive database
- Keep the domain name profile private
- Place critical documents such as business plans and proprietary documents offline to prevent exploitation
- Train employees to thwart social engineering techniques and attacks
- Sanitize the details provided to Internet registrars to hide the direct contact details of the organization
- Disable the geo-tagging functionality on cameras to prevent geolocation tracking
- Avoid revealing one’s location or travel plans on social networking sites
- Turn off geolocation access on all mobile devices when not required
- Ensure that no critical information such as strategic plans, product information, and sales projections is displayed on notice boards or walls
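Several of the countermeasures above translate directly into web-server configuration. A hedged sketch for Apache httpd (the directive names are real Apache directives from the core, mod_autoindex, and mod_headers modules, which must be enabled; the values shown are illustrative choices, not a complete hardening policy):

```apacheconf
# Reduce banner information disclosed in response headers and error pages
ServerTokens Prod
ServerSignature Off

# Disable directory listings
<Directory "/var/www/html">
    Options -Indexes
</Directory>

# Ask search engines not to index or cache served pages
Header set X-Robots-Tag "noindex, noarchive"
```

Equivalent settings exist for IIS (e.g., removing the Server header and disabling directory browsing) and nginx (`autoindex off;`).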