This is a non-comprehensive list of papers and tools dealing with automated unpacking. Please let me know if I’ve missed another technique or if I misunderstood any of the techniques below.
1. Ring0/Ring3 components, using manual unpacking and heuristics
OllyBonE
OllyBonE (Break on Execution) uses a Windows driver to prevent memory pages from being executed by overloading the user/supervisor bit and exploiting the separation of data TLB and instruction TLB in the X86 architecture, and an OllyDbg plug-in communicating with the driver. As such it is not an automatic unpacker and requires manual tagging of the pages in which the unpacked code is expected to be found.
Technology used: Windows driver to track write and execution behaviors
Handles unknown packers: no
Drawbacks: requires a priori knowledge of the memory location of the unpacked code, vulnerable to anti-debugging techniques, modification of the integrity of the host operating system due to the driver.
Code Available: yes, http://www.joestewart.org/ollybone/
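To make the break-on-execution idea concrete, here is a minimal sketch in plain Python. It only simulates the page bookkeeping; the real tool does this in a driver, and the names `BreakOnExecute` and `first_break` are hypothetical, not part of OllyBonE.

```python
PAGE_SIZE = 0x1000  # 4 KB x86 pages


class BreakOnExecute:
    """Toy model: the analyst tags pages by hand, execution inside one breaks."""

    def __init__(self, tagged_addresses):
        # Any address inside a page tags the whole page.
        self.tagged_pages = {addr // PAGE_SIZE for addr in tagged_addresses}

    def on_execute(self, eip):
        """Return True if executing at eip should stop the debugger."""
        return eip // PAGE_SIZE in self.tagged_pages


def first_break(trace, monitor):
    """Return the first executed address that lands in a tagged page."""
    for eip in trace:
        if monitor.on_execute(eip):
            return eip
    return None
```

For example, tagging `0x401000` and replaying the trace `[0x405000, 0x405003, 0x401234]` stops at `0x401234`, the first instruction fetched from the tagged page.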
OmniUnpack
OmniUnpack also relies on OllyBonE for identifying executed pages and invokes AV scanning before every “dangerous” system call. In addition, it incorporates two additional optimizations to reduce the total number of AV scans. First, it invokes an AV scan only when there is a control transfer to a dynamically modified page between the previous and current dangerous system calls. Second, whenever an AV scanner is invoked, it only scans those pages that are modified since the last dangerous system call.
Technology used: Windows driver to track write and execution behaviors
Handles unknown packers: yes
Drawbacks: the fact that it requires whole-binary scanning is incompatible with almost all existing commercial AV scanners; page-level tracking decreases the granularity of monitoring, often resulting in unpacking stages being detected incorrectly.
Code Available: no
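The two scan-reduction optimizations can be sketched as a small event loop. This is my own simplified model, not OmniUnpack's code: the event format, the `DANGEROUS` list and the choice to keep written pages pending until they have actually been scanned are all assumptions.

```python
PAGE_SIZE = 0x1000
# Hypothetical list of "dangerous" system calls, for illustration only.
DANGEROUS = {"NtCreateFile", "NtCreateProcess"}


def plan_scans(events):
    """Return, for each triggered AV scan, the list of pages to scan.

    events is a sequence of ("write", addr), ("exec", addr) or
    ("syscall", name) tuples.
    """
    modified = set()            # every page written so far
    pending = set()             # pages written and not yet scanned
    jumped_to_modified = False  # control reached a modified page?
    scans = []
    for kind, value in events:
        if kind == "write":
            page = value // PAGE_SIZE
            modified.add(page)
            pending.add(page)
        elif kind == "exec":
            if value // PAGE_SIZE in modified:
                jumped_to_modified = True
        elif kind == "syscall" and value in DANGEROUS:
            # Optimization 1: scan only after a jump into modified code.
            if jumped_to_modified:
                # Optimization 2: scan only the pages changed since then.
                scans.append(sorted(pending))
                pending.clear()
            jumped_to_modified = False
    return scans
```

A dangerous call that is not preceded by a jump into modified memory triggers no scan at all, which is where most of the savings come from.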
Justin (Just-in-Time AV Scanning)
The key idea of Justin is to detect the end of unpacking during the execution of a packed binary and invoke AV scanning at that instant. For accurate end-of-unpacking detection, Justin incorporates the following heuristics: Dirty Page Execution, Unpacker Memory Avoidance, Stack Pointer Check and Command-Line Argument Access. Justin also includes several countermeasures against the evasion techniques that existing packers use.
Technology used: Justin leverages NX support in modern Intel X86 processors and Windows OS to detect pages that are executed at run time.
Handles unknown packers: yes
Drawbacks: requires a processor with NX bit support, which not every architecture provides (Athlon 64, Opteron and Itanium/IA-64 do); some heuristics generate false negatives.
Code Available: no
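As a rough illustration of how two of these heuristics could be combined, here is a tiny sketch. It assumes that the Stack Pointer Check means "the stack pointer is back at the value it had when the process started"; that is my reading of the heuristic's name, and `looks_like_oep` is a hypothetical helper, not Justin's code.

```python
PAGE_SIZE = 0x1000


def looks_like_oep(eip, esp, dirty_pages, initial_esp):
    """Combine Dirty Page Execution with an assumed Stack Pointer Check."""
    # Dirty Page Execution: control reaches a page that was written earlier.
    dirty_page_execution = eip // PAGE_SIZE in dirty_pages
    # Assumption: the unpacker restores the stack before jumping to the OEP.
    stack_pointer_restored = esp == initial_esp
    return dirty_page_execution and stack_pointer_restored
```

Requiring both conditions trades false positives for false negatives, which matches the drawback noted above.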
Dream of Every Reverser/Generic Unpacker:
It is a Windows driver used to hook ring 3 memory accesses. It is used in a project called Generic Unpacker by the same author to find the original entry point. The tool then tries to find all import references, dumps the file and fixes the imports. It is reported to work against UPX, FSG and AsPack, but not against more complex packers.
Technology used: Windows driver to hook user mode memory access
Handles unknown packers: no
Drawbacks: requires a priori knowledge of the memory location of the unpacked code, modification of the integrity of the host operating system due to the driver.
Code Available: yes, http://deroko.phearless.org/GenericUnpacker.rar
RL!Depacker
Technology used: No description for this one, however it looks similar to Dream of Every Reverser / Generic Unpacker. RL!Depacker is tested with 101+ packers.
Drawbacks: can unpack ONLY packers that do not use IAT redirection, that do not steal APIs and that fill in the IAT in the correct order. This unpacker does NOT work alongside AV/FW software (e.g. Kaspersky) that hooks LoadLibrary and GetProcAddress in ring 3.
Code Available: yes, http://ap0x.jezgra.net/RL!dePacker.rar
QuickUnpack
Technology used: Again, no real description, but it looks similar to RL!Depacker and DOER / Generic Unpacker. It is a scriptable engine using a debugging API. It is reported to work against 60+ simple packers.
Drawbacks: systems running Kaspersky not supported
Code Available: yes, http://rapidshare.com/files/104264619/qunpack21.zip
Original Site (in Russian): http://qunpack.ahteam.org/
Universal PE Unpacker
This is an IDA Pro plug-in, using the IDA Pro Debugger interface. It waits for the packer to call GetProcAddress and then activates single-stepping mode until EIP is in a predefined range (an estimate for the OEP). It only works well against UPX, Morphine, Aspack, FSG and MEW (according to the authors of Renovo).
Technology used: Debugging and heuristics.
Handles unknown packers: no, needs an approximation of the OEP and assumes that the unpacker will call GetProcAddress before calling the original code.
Drawbacks: not fully automatic, very vulnerable to debugger detection, does not necessarily work against all packers or self-modifying code.
Code Available: yes, since IDA Pro 4.9
Original Site: http://www.hex-rays.com/idapro/unpack_pe/unpacking.pdf
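The plug-in's heuristic boils down to a two-state loop, sketched below over a recorded event list. The event format and the function name `find_oep` are made up for illustration; the real plug-in drives the IDA debugger instead of replaying a list.

```python
def find_oep(events, oep_start, oep_end):
    """Return the first address executed inside [oep_start, oep_end)
    after the packer has called GetProcAddress, or None.

    events is a sequence of ("api", name) or ("exec", address) tuples.
    """
    armed = False  # becomes True once GetProcAddress has been seen
    for kind, value in events:
        if kind == "api" and value == "GetProcAddress":
            armed = True  # from here on we "single-step"
        elif kind == "exec" and armed and oep_start <= value < oep_end:
            return value
    return None
```

The sketch also shows why the approach is fragile: a packer that never calls GetProcAddress, or an OEP outside the estimated range, makes it return nothing.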
2. Instruction-level analysis, comparison between written addresses and executed addresses
Renovo
Built on TEMU (BitBlaze), it uses full system emulation to record memory writes (and mark those memory locations as dirty). Each time a new basic block is executed, if it contains a dirty memory location, a hidden layer has been found. Cost: 8 times slower than normal execution. It seems to unpack everything correctly except Armadillo and Obsidium (likely because the executables are not compatible with Renovo's emulation engine). It seems to obtain only partial results against Themida with the VM option on (due to VM virtualization).
Technology used: Full system emulation.
Handles unknown packers: yes.
Drawbacks: order of magnitude slowdown, easily evaded by anti-emulation and anti-memory dumping techniques.
Code Available: no
Original Site: http://www.andrew.cmu.edu/user/ppoosank/papers/renovo.pdf
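The write-then-execute check can be sketched at byte granularity as follows. The event format is hypothetical, and clearing the dirty set after each detection (so that nested layers are reported separately) is an assumption on my part.

```python
def find_hidden_layers(events):
    """Return the start address of each basic block that runs dirty memory.

    events is a sequence of (kind, address, size) tuples where kind is
    "write" (memory write) or "block" (a basic block about to execute).
    """
    dirty = set()  # individual byte addresses written so far
    layers = []
    for kind, addr, size in events:
        span = range(addr, addr + size)
        if kind == "write":
            dirty.update(span)
        elif kind == "block":
            if any(a in dirty for a in span):
                layers.append(addr)  # newly generated code is executing
                dirty.clear()        # assumption: start tracking the next layer
    return layers
```

Tracking individual bytes rather than pages is what separates this family of tools from the driver-based ones in section 1, at the price of the slowdown mentioned above.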
Azure
Paul Royal’s solution, named after BluePill because it is based on KVM, a Linux-based hypervisor. It uses Intel’s VT extension to trace the target process (at the instruction-level), by setting the trap flag and intercepting the resulting exception. The memory writes are then recorded and compared to the address of the current instruction. According to the paper, it handles every packer correctly (including Armadillo, Obsidium and Themida VM).
Technology used: Hardware assisted virtualization and virtual machine introspection.
Handles unknown packers: yes.
Drawbacks: the hypervisor can be detected; slowdown.
Code Available: yes, http://blackhat.com/presentations/bh-usa-08/Royal/Royal_Extras.zip
Original Site:
Saffron
Developed by Danny Quist and Valsmith, a first version uses Intel PIN to dynamically instrument the analyzed code. It actually inserts instructions in the code flow, allowing lightweight fine-grained control (no need for emulation or virtualization). If execution jumps to previously written memory, the target memory becomes a candidate original entry point and the memory is dumped to a file. However, the inserted instructions alter the packed code and therefore break its integrity. A second version modifies the page fault handler of Windows and traps when a written memory page is executed. It has mixed results with Molebox, Themida and Obsidium, and doesn’t handle Armadillo correctly (according to Paul Royal).
Technology used: Dynamic instrumentation; Pagefault handling (with a kernel component in the host operating system).
Handles unknown packers: yes.
Drawbacks: modifies the integrity of the code (with DI) and of the host operating system. It presumably does not work in a virtual machine. The dynamic instrumentation is very slow. The memory monitoring of the page fault handler is coarse-grained (pages are aligned on a 4 KB boundary), and therefore some memory accesses can go unnoticed.
Code Available: the dynamic instrumentation version is available; I don't know about the driver.
Original Site: http://www.offensivecomputing.net/?q=node/492
Pandora's Bochs
Pandora's Bochs modifies the Bochs PC emulator (a pure software virtual machine) so that it can unobtrusively monitor the execution of runtime-packed malware samples within the emulated environment, determine when unpacking is complete, and store a memory dump of the monitored process, together with additional information gathered during unpacking, for further analysis. Interestingly, the assumptions about the program are stated explicitly (which is a GOOD thing): the unpacking does not involve multiple processes; it does not happen in kernel mode; the unpacked code is reached through a branch instruction (not a fall-through edge), etc… Another interesting point in this approach is that it uses no component in the guest OS (as opposed to Renovo for example); all the information is retrieved from outside the matrix (as with Azure).
Technology used: Full system emulation based on Bochs.
Handles unknown packers: yes.
Drawbacks: As stated in the paper the limitations are speed, compatibility (not all packed samples seemed to run under Bochs), detection of OEP and reconstruction of imports sometimes failed.
Code Available: http://damogran.de/blog/archives/21-To-release,-or-not-to-release-....html
Original Site: http://pi1.informatik.uni-mannheim.de/filepool/theses/diplomarbeit-2008-boehne.pdf
3. Comparison with static disassembly or disk image
PolyUnpack
The idea behind PolyUnpack is to address the fundamental nature of unpacking, which is runtime code generation. To identify code that has been generated at runtime, PolyUnpack uses a conceptually elegant technique: it first statically analyses the program to build a map of statically accessible code, and then traces the execution of the program. The dynamically intercepted instructions are compared with the static disassembly; if they do not appear in the static disassembly then they have been generated at runtime.
Technology used: comparison between static disassembly and dynamic tracing. The dynamic trace is extracted with single-step debugging APIs.
Handles unknown packers: yes
Drawbacks: vulnerable to debugger detection. Note that this is a limitation of the implementation, not of the concept.
Code Available: http://polyunpack.cc.gt.atl.ga.us/polyunpack.zip
Original Site: http://www.acsac.org/2006/papers/122.pdf
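The comparison step can be sketched like this. The data shapes are assumptions made for the example: the static view is a dict from address to instruction bytes, and the trace is a list of (address, bytes) pairs as a single-stepping debugger might report them. `first_generated_instruction` is a hypothetical name.

```python
def first_generated_instruction(static_code, trace):
    """Return the address of the first traced instruction that the static
    disassembly does not contain, or None if the whole trace is static.

    static_code maps address -> raw instruction bytes (static analysis).
    trace is a sequence of (address, raw bytes) seen at runtime.
    """
    for address, raw in trace:
        # Unknown address, or same address with different bytes:
        # either way the instruction was produced at runtime.
        if static_code.get(address) != raw:
            return address
    return None
```

Note that code overwritten in place is caught as well, because the bytes at a known address no longer match the static view.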
Secure and Advanced Unpacking
The idea developed by Sebastien Josse is to use full system emulation (based on QEMU) and to compare the basic blocks that are going to be executed by the virtual CPU with the equivalent address in the file image of the executable. If the memory and the disk version differ, it means that the code has been generated on the fly and therefore a hidden layer has been found. Josse then proposes techniques to rebuild a fully functional executable based on the memory dump. This technique seems to work well (but sometimes requires human intervention) against several packers, including Armadillo, ASProtect, PEtite, UPX, yC…
Technology used: Full system emulation, comparison between memory images and disk images.
Handles unknown packers: yes, manual intervention might be required in some cases.
Drawbacks: slowdown due to the full system emulation, full reconstruction of the unpacked program is not always possible.
Code Available: I couldn’t find it
Original Site: http://www.springerlink.com/content/5135489032458wm2/
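The memory-versus-disk comparison can be sketched as below. It assumes a very simplified section table of `(virtual_start, raw_offset, raw_size)` tuples and a flat `memory` buffer indexed by relative virtual address; real PE parsing and the QEMU hook are left out, and the function name is hypothetical.

```python
def block_differs_from_disk(memory, disk_image, sections, rva, size):
    """Return True if the basic block at rva differs from the file on disk.

    memory     -- process image as bytes, indexed by relative virtual address
    disk_image -- raw bytes of the executable file
    sections   -- list of (virtual_start, raw_offset, raw_size) tuples
    """
    for virtual_start, raw_offset, raw_size in sections:
        if virtual_start <= rva and rva + size <= virtual_start + raw_size:
            offset = raw_offset + (rva - virtual_start)
            return memory[rva:rva + size] != disk_image[offset:offset + size]
    # No bytes on disk back this address, so the code must be generated.
    return True
```

A block that compares equal is original code; anything else marks a hidden layer and a candidate point for dumping.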
4. Using dynamic binary instrumentation for analyzing packer binary code
Saffron
Saffron also uses dynamic binary instrumentation (with Pin as the instrumentation framework) to monitor program execution together with memory writes.
MmmBop
MmmBop uses dynamic binary instrumentation to find the original entry point (OEP) and stop execution there. It is able to bypass protection layers that use anti-reversing tricks, obfuscation and self-modifying code. MmmBop is a pure user-land application and does not interfere with the stability of the operating system.
Technology used: Dynamic Binary Instrumentation; Hook KiUserExceptionDispatcher and NtContinue to make sure MmmBop will retain control before executing new basic block.
Drawbacks: like most dynamic binary instrumentation solutions, it unavoidably has to modify the address space of the target process. It also doesn’t work in a virtual machine.
Code Available: I couldn’t find it
Original Site: http://piotrbania.com/all/articles/pbania-dbi-unpacking2009.pdf
Paradyn Project
The Paradyn Project takes an approach very similar to MmmBop: it uses dynamic binary instrumentation (based on Dyninst) to observe control transfers that may lead to new code.
Technology used: Dynamic Binary Instrumentation
Drawbacks: it appears to target UNIX operating systems, and it currently cannot handle self-modifying code.
Code Available: I couldn’t find it
Original Site: http://pages.cs.wisc.edu/~paradyn/
5. Other techniques
Malware Normalization
Unpacking by malware normalization consists of two basic steps:
First, execute the program in a controlled environment (QEMU) to identify the control-flow instruction that transfers control into the generated-code area.
Second, with the information (memory writes and memory execution flow) captured in the previous step, construct a normalized program that contains the generated code. Using the captured data, an equivalent program can be constructed that does not contain the code generator.
Technology used: No implementation details about the modifications that were made to QEMU are given and the software has not been made available in binary or source code form.
Drawbacks: The unpacked executable is not ready-to-run (the import table may not be recovered). The OEP of packed executables is not always correctly determined.
Code Available: I couldn’t find it
Original Site: http://www.cs.wisc.edu/wisa/papers/tr1539/tr1539.pdf
Dynamic Translation (DT)
Recent packers make extensive use of virtual machine protection technology. Packers like Themida and VMProtect implement their own virtual machine, which transforms parts of the original code. As a result, the original host code no longer exists anywhere, making it hard to analyze and essentially impossible to reverse. The DT method disassembles the analyzed code dynamically and performs just-in-time compilation targeted at the host CPU, with little loss of execution speed compared to the original code. This provides the same flexibility as emulation with dramatically better speed.
Technology used: No implementation details and the software has not been made available in binary or source code form.
Drawbacks: I haven’t found any unpacker using Dynamic Translation. Maybe this is due to the difficulty of implementing it.
Code Available: no
Original Site: Defeating polymorphism: Beyond emulation. In Virus Bulletin Conference, Oct 2005.
Hump-and-dump
It proposes a rather interesting and effective new method of unpacking. The method relies solely on an execution trace of the instruction pointer (EIP on x86, collected with an IDA trace plug-in). It creates a histogram of the addresses of executed instructions, ordered by the last time each address was executed. The algorithm is based on the dual observation that
(a) Even in a packed program, the OEP bytes are almost always only executed once, and
(b) Most packers unpack the original program to an area of memory which has not been previously executed.
This kind of histogram clearly shows the difference/boundary between the unpacking activity and the original executable file. Decryption, decompression and copying appear as large spikes at the start of the histogram, followed by a flat section of height one, which is usually the OEP area.
[Figure: unpacking trace address histogram]
Technology used: EIP execution tracing and an ordered address execution histogram
Handles unknown packers: yes
Drawbacks: the OEP is not located accurately enough; two thresholds, for the “hump” and the “flat section”, need to be tuned.
Code Available: no
Original Site: http://www.datasecurity-event.com/uploads/hump_dump.pdf
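Here is a small sketch of the histogram and of one possible way to read an OEP candidate off it. The two thresholds (`hump_height`, `flat_length`) and the rule "first flat run after a hump" are my own illustrative choices, not values taken from the paper.

```python
from collections import Counter


def ordered_histogram(trace):
    """Return (address, execution count) pairs ordered by last execution."""
    counts = Counter(trace)
    last_seen = {addr: i for i, addr in enumerate(trace)}
    order = sorted(counts, key=last_seen.__getitem__)
    return [(addr, counts[addr]) for addr in order]


def guess_oep(trace, hump_height=10, flat_length=4):
    """Return the first address of a flat run (count == 1) after a hump."""
    seen_hump = False
    run_start = None
    run_len = 0
    for addr, count in ordered_histogram(trace):
        if count >= hump_height:
            seen_hump = True  # we are inside or past the unpacking loops
        if seen_hump and count == 1:
            if run_len == 0:
                run_start = addr
            run_len += 1
            if run_len >= flat_length:
                return run_start
        else:
            run_len = 0
    return None
```

A jump that is executed once just before the real OEP would be swallowed into the flat run, which illustrates why the reported OEP is only approximate.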
Eureka
Eureka distinguishes itself from existing unpacking systems in several important ways. First, it introduces a new methodology for automated malware unpacking, using coarse-grained NTDLL system call monitoring. The system provides support for both statistical and heuristic-based unpacking triggers and allows child process monitoring. In particular, Eureka’s statistical bigram-based triggering algorithm offers a highly successful methodology for providing an informed basis from which to execute process image dumping. Eureka also includes an API resolution system that is capable of overcoming several contemporary malware API address obfuscation strategies.
Technology used: SSDT hooking; mining of statistical patterns in x86 code; API resolution
Handles unknown packers: yes
Drawbacks: vulnerable to VM detection and easily evaded by attacking Eureka’s heuristics
Code Available: no
Original Site: http://eureka.cyber-ta.org/
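A bigram-based trigger could look roughly like the sketch below. Which bigram to count and how large an increase should fire the dump are assumptions of mine: I use the byte pair `FF 15` (an indirect `call` on 32-bit x86) and an arbitrary threshold, and the function names are hypothetical.

```python
CALL_INDIRECT = b"\xff\x15"  # x86 bytes of `call dword ptr [address]`


def count_bigram(image, bigram):
    """Count (possibly overlapping) occurrences of a 2-byte pattern."""
    return sum(1 for i in range(len(image) - 1) if image[i:i + 2] == bigram)


def should_dump(snapshots, bigram=CALL_INDIRECT, min_increase=50):
    """Return the index of the first memory snapshot whose bigram count
    exceeds the first snapshot's count by min_increase, or None."""
    baseline = count_bigram(snapshots[0], bigram)
    for index, image in enumerate(snapshots[1:], start=1):
        if count_bigram(image, bigram) - baseline >= min_increase:
            return index
    return None
```

The intuition is that packed data looks statistically unlike code, so a sudden rise in code-like byte pairs suggests that the original program has appeared in memory.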