An Empirical Study of Real-world Polymorphic Code Injection Attacks
Michalis Polychronakis
FORTH-ICS, Greece, email: mikepo@ics.forth.gr
Kostas G. Anagnostakis
I2R, Singapore, email: kostas@i2r.a-star.edu.sg
Evangelos P. Markatos
FORTH-ICS, Greece, email: markatos@ics.forth.gr
Abstract:
1 Introduction
Despite considerable advances in host-level security hardening and network-level defenses, remote code injection attacks against network services persist as one of the most common methods for system compromise. Along with the more recently popularized client-side attacks that exploit vulnerabilities in users' software such as browsers and media players [15], remote code execution vulnerabilities continue to plague even the latest versions of popular OSes and server applications [2] and are effectively being exploited by malware, resulting in millions of infected hosts [3].
Motivated by the illicit financial gain against their victims, cyber-criminals constantly try to improve the effectiveness and evasiveness of their attacks, with the aim to compromise as many systems as possible and keep them under control for as long as possible. Code obfuscation and polymorphism [20] are among the most widely used evasion techniques employed by attackers to circumvent virus scanners and intrusion detection systems.
When polymorphism is applied to remote code injection attacks, the initial attack code is mutated so that every attack instance acquires a unique pattern, thereby making fingerprinting of the whole breed a challenge. The injected code--often dubbed shellcode--is the first piece of code that is executed after the instruction pointer of the vulnerable process has been hijacked, and carries out the first stage of the attack, which usually involves the download and execution of a malware binary on the compromised host. Polymorphic shellcode engines [10,7,21,16,4,1] create different mutations of the same initial shellcode by encrypting it with a different random key, and prepending to it a decryption routine that makes the code self-decrypting. Since the decryption code itself cannot be encrypted, advanced polymorphic encoders also mutate the exposed part of the shellcode using metamorphism [20].
Although the design and implementation of polymorphic shellcode has been covered extensively in the literature [8,18,7,16,6,13,14], and several research works have focused on the detection of polymorphic attacks [11,23,13,14], the actual prevalence and characteristics of real-world polymorphic attacks have not been studied to the same extent [12]. In this work, we present an analysis of more than 1.2 million polymorphic attacks against real Internet hosts--not honeypots--detected over the course of more than 20 months. The attacks were captured by monitoring the traffic of thousands of production systems in research and education networks using network-level emulation [13,14]. Nemu, our prototype implementation, uses a CPU emulator to dynamically analyze every potential instruction sequence in the inspected traffic and identify the execution behavior of self-decrypting shellcode.
Our study focuses on the attack activity in relation to the targeted network services, the structure of the polymorphic shellcode used, and the different operations performed by its actual payload. Besides common exploits against popular OS services associated with well known vulnerabilities, we witnessed sporadic attacks against a large number of less widely used services and third-party applications. At the same time, although the bulk of the attacks use naive encryption or polymorphism, and extensive sharing of code components is prevalent among different shellcode types, we observed a few attacks employing more sophisticated obfuscation schemes.
2 Network-level Emulation
We briefly describe the design and operation of nemu, the detector used for capturing the attacks. The interested reader is referred to our previous work [13,14] for a more thorough description and implementation details.
The principle behind network-level emulation is that the machine code interpretation of arbitrary data results to random code, which, when it is attempted to run on an actual CPU, usually crashes soon, e.g., due to an illegal instruction. In contrast, if a network request actually contains polymorphic shellcode, then the shellcode runs normally, exhibiting a certain detectable behavior.
Nemu inspects the client-initiated data of each network flow, which may contain malicious requests towards vulnerable services. Each input is mapped to a random memory location in the virtual address space of an IA-32 emulator, as shown in Fig. 1. The execution of self-decrypting shellcode is identified by two key runtime behavioral characteristics: the execution of some form of GetPC code, and the occurrence of several self references, i.e., read operations from the memory addresses of the input stream itself, as illustrated in Fig 1. The GetPC code is used by the shellcode for finding the absolute address of the injected code, which is mandatory for subsequently decrypting the encrypted payload, and involves the execution of an instruction from the call or fstenv instruction groups [13].
We should note that for all captured attacks, nemu was able to successfully decrypt the original shellcode, while so far has resulted to zero false positives.
3 Data Set