The Art Of ELF: Analysis and Exploitations

New systems make attackers life hard and common exploitation techniques get harder to reproduce. The purpose of this article is to be very general on mitigation techniques and to cover attacks on x32 as a reference to x64 architectures to stick with the new constraints of today.

Here, you will find the first step which is an ELF format file analysis. After that we will speak about the protections and ways to bypass them. To finish, we will introduce the x86_64 that makes things more difficult for nowadays exploitations.

Pre-requisites:

Basics in Linux, asm x86,
a good understanding of buffer overflows, format string exploitations, heap overflows,
0x00900dc0ff33,
Ubuntu 11.04 on x86_64.
a default song…: Zeads dead – Paradise Circus (Massive attack Remix)

Here is the contents:

The ELF format
- A standard
- Where is it used?
- ELF Layout
Dissecting the ELF
- The “magic” field
- Reversing ELF’s header
- Sections
- Relocations
- Program Headers
Exploitations
- Old is always better (for attackers)
- Nonexecutable stack
- Address Space Layout Randomization
- Brute-force
- Return-to-registers
- Stack Canary
- RELRO
The x86_64 fact and current systems hardening
References & Acknowledgements

The ELF format

A standard

Replacing the COFF and “a.out” formats that Linux previously used, ELF (Executable and Linking Format) increased flexibility. Indeed, when shared libraries are difficult to create and dynamically loading a shared library is cumbersome with “a.out” format, the ELF format has come with these two benefits[1]:

It is much simpler to make shared libraries,
It make dynamic loading and has comes with other suggestions for dynamic loading have included super-fast MUDs (Multi-User Domains also known as “Multi-User Dungeon”), where extra code could be compiled and linked into the running executable without having to stop and restart the program.

This format has been selected by the Tool Interface Standards committee (TIS) as a standard for portable object files for a variety of (Unix-Like) operating systems.

Where is it used?

Actually ELFs cover object files (.o), shared libraries (.so) and is also used for loadable kernel modules. As follows in listing 1, you can see also which systems[4] have adopted the ELF format:

Listing 1. Applications of ELF format

ELF Layout

An ELF as at least two headers: the ELF header (Elf32_Ehdr/Elf64_Ehdr struct) and the program header (Elf32_Phdr/struct Elf64_Phdr struct)[5]. But there is also a header which is called the “section header” (Elf32_Shdr/struct Elf64_Shdr struct) and which describes section like: .text, .data, .bss and so on (we will describe them later).

Figure 1. ELF Layout – execution view linking view
(source: ELF Format specifications[2])

As you can see in figure 1, there is two views. Indeed, the linking view is partitioned by sections and is used when program or library is linked. The sections contain some object files informations like: datas, instructions, relocation informations, symbols, debugging informations, and so on.
From the other part, the execution view, which is partitioned by segments, is used during a program execution. The program header as shown in the left, contains informations for the kernel on how to start the program, will walk through segments and load them into memory (mmap).

Dissecting the ELF

The “magic” field

In Linux forensic, it is common to use the “file” command to the type of a particular file, as follows:

    fluxiux@nyannyan:~$ file /bin/ls  
   
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped

Now lets focus on the “ELF” string. As you had probably noticed using “hexdump” on any ELF file (like /bin/ls for example), the file starts with 0x7f then there are three next bytes for the encoded string “ELF”:

    fluxiux@nyannyan:~$ hd -n 16 /bin/ls  
   
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|

The first 16 bytes represent the elf “magic” field, which is a way to identify an ELF file. But if bytes 1, 2 and 3 represent the encoded string “ELF”, what represent bytes 4, 5, 6, 7, 8, 9?
Just have a look at “elf.h” source code:

    #define EI_CLASS        4               /* File class byte index */ 
   
#define ELFCLASSNONE    0               /* Invalid class */ 
   
#define ELFCLASS32      1               /* 32-bit objects */ 
   
#define ELFCLASS64      2               /* 64-bit objects */ 
   
#define ELFCLASSNUM     3 
   
#define EI_DATA         5               /* Data encoding byte index */ 
   
#define ELFDATANONE     0               /* Invalid data encoding */ 
   
#define ELFDATA2LSB     1               /* 2's complement, little endian */ 
   
#define ELFDATA2MSB     2               /* 2's complement, big endian */ 
   
#define ELFDATANUM      3 
   
#define EI_VERSION      6               /* File version byte index */ 
   
                                        /* Value must be EV_CURRENT */ 
   
#define EI_OSABI        7               /* OS ABI identification */ 
   
#define ELFOSABI_NONE           0       /* UNIX System V ABI */ 
   
#define ELFOSABI_SYSV           0       /* Alias.  */ 
   
#define ELFOSABI_HPUX           1       /* HP-UX */ 
   
#define ELFOSABI_NETBSD         2       /* NetBSD.  */ 
   
#define ELFOSABI_LINUX          3       /* Linux.  */ 
   
#define ELFOSABI_SOLARIS        6       /* Sun Solaris.  */ 
   
#define ELFOSABI_AIX            7       /* IBM AIX.  */ 
   
#define ELFOSABI_IRIX           8       /* SGI Irix.  */ 
   
#define ELFOSABI_FREEBSD        9       /* FreeBSD.  */ 
   
#define ELFOSABI_TRU64          10      /* Compaq TRU64 UNIX.  */ 
   
#define ELFOSABI_MODESTO        11      /* Novell Modesto.  */ 
   
#define ELFOSABI_OPENBSD        12      /* OpenBSD.  */ 
   
#define ELFOSABI_ARM_AEABI      64      /* ARM EABI */ 
   
#define ELFOSABI_ARM            97      /* ARM */ 
   
#define ELFOSABI_STANDALONE     255     /* Standalone (embedded) application */ 
   
#define EI_ABIVERSION   8               /* ABI version */ 
   
#define EI_PAD          9               /* Byte index of padding bytes */

We can affirmatively say, that our file is an ELF of class 64, encoded in little endian with a UNIX System V ABI standard and has 0 padding bytes. By the way, if you did not expected yet, we have compared to the structure we have observed here the “e_ident” of “Elf64_Ehdr” structure.

Reversing ELF’s header

To begin the complete dissection, let’s just start making a simple binary file as follows:

 
   #include <stdio.h>  
   
main 
   ( 
   )  
   
   {  
   
   printf 
   ( 
   "huhu la charrue" 
   ) 
   ;  
   
   }

And produce an ELF before linking it:

    gcc toto.c -c 
  

We will use now one of the most used tool as “objdump” to analysis ELF files which is readelf from binutils to display every fields. That will simplify our analysis but if you are interested for dissecting ELF files yourself, you can look for libelf and we will also talk about some interesting libraries in Python to do it much more quickly.

Now, we observe the ELF header:

    fluxiux@nyannyan:~$ readelf -h toto  
   
ELF Header:  
   
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  
   
  Class:                             ELF64  
   
  Data:                              2's complement, little endian  
   
  Version:                           1 (current)  
   
  OS/ABI:                            UNIX - System V  
   
  ABI Version:                       0  
   
  Type:                              REL (Relocatable file)  
   
  Machine:                           Advanced Micro Devices X86-64  
   
  Version:                           0x1  
   
  Entry point address:               0x0  
   
  Start of program headers:          0 (bytes into file)  
   
  Start of section headers:          312 (bytes into file)  
   
  Flags:                             0x0  
   
  Size of this header:               64 (bytes)  
   
  Size of program headers:           0 (bytes)  
   
  Number of program headers:         0  
   
  Size of section headers:           64 (bytes)  
   
  Number of section headers:         13  
   
  Section header string table index: 10

The result seems to be very implicit, but now just let’s try to identify these field using our lovely hexdump tool (in “warrior forensic style!” or not):

    fluxiux@nyannyan:~$ hd -n 64 toto  
   
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|  
   
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|  
   
00000020  00 00 00 00 00 00 00 00  38 01 00 00 00 00 00 00  |........8.......|  
   
00000030  00 00 00 00 40 00 00 00  00 00 40 00 0d 00 0a 00  |....@.....@.....|  
   
00000040

We already know the first line, but what can say about the three others? As you can see, in the second line, the first two bytes represent the “e_type”. Indeed, if you look at “elf.h” file, you could observe that “01 00” in little-Indian, means: “Relocatable file”.

Now look at the two next bytes. We have “3e 00” that is equivalent to 62 in decimal (3*16¹ + c = 62), which defines the AMD x86-64 architecture:

    #define EM_X86_64       62              /* AMD x86-64 architecture */ 
  

After we have the “e_version” field with “01 00” as a value for “Current version”:

    /* Legal values for e_version (version).  */ 
   
#define EV_NONE         0               /* Invalid ELF version */ 
   
#define EV_CURRENT      1               /* Current version */ 
   
#define EV_NUM          2

Bytes 24 to 26 indicate the entry point address (which is 0x0 while it is not linked) . And we finish with two more most important think that we will talk about in this article :

Program Headers with 6 headers, starting at byte 64 (byte 32 and 33 in hexdump),
section headers with 29 headers, starting at byte (byte 40 – 43 in hexdump).

For the rest, we will use readelf and I will let you finish the header part by yourself.

Sections

Let’s just see “toto.o” sections with the following command:

    fluxiux@nyanyan:~$ readelf -S toto.o  
   
There are 13 section headers, starting at offset 0x138:  
   
Section Headers:  
   
  [Nr] Name              Type             Address           Offset  
   
       Size              EntSize          Flags  Link  Info  Align  
   
  [ 0]                   NULL             0000000000000000  00000000  
   
       0000000000000000  0000000000000000           0     0     0  
   
  [ 1] .text             PROGBITS         0000000000000000  00000040  
   
       0000000000000018  0000000000000000  AX       0     0     4  
   
  [ 2] .rela.text        RELA             0000000000000000  00000598  
   
       0000000000000030  0000000000000018          11     1     8  
   
  [ 3] .data             PROGBITS         0000000000000000  00000058  
   
       0000000000000000  0000000000000000  WA       0     0     4  
   
  [ 4] .bss              NOBITS           0000000000000000  00000058  
   
       0000000000000000  0000000000000000  WA       0     0     4  
   
  [ 5] .rodata           PROGBITS         0000000000000000  00000058  
   
       0000000000000010  0000000000000000   A       0     0     1  
   
  [ 6] .comment          PROGBITS         0000000000000000  00000068  
   
       000000000000002b  0000000000000001  MS       0     0     1  
   
  [ 7] .note.GNU-stack   PROGBITS         0000000000000000  00000093  
   
       0000000000000000  0000000000000000           0     0     1  
   
  [ 8] .eh_frame         PROGBITS         0000000000000000  00000098  
   
       0000000000000038  0000000000000000   A       0     0     8  
   
  [ 9] .rela.eh_frame    RELA             0000000000000000  000005c8  
   
       0000000000000018  0000000000000018          11     8     8  
   
  [10] .shstrtab         STRTAB           0000000000000000  000000d0  
   
       0000000000000061  0000000000000000           0     0     1  
   
  [11] .symtab           SYMTAB           0000000000000000  00000478  
   
       0000000000000108  0000000000000018          12     9     8  
   
  [12] .strtab           STRTAB           0000000000000000  00000580  
   
       0000000000000014  0000000000000000           0     0     1  
   
Key to Flags:  
   
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)  
   
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)  
   
  O (extra OS processing required) o (OS specific), p (processor specific)

As you can see, there is a lot of sections which are part of the ELF64_Shdr:

Code sections (.text),
data section (.data, .bss, .rodata),
the .comment which is used to store extra informations,
relocation tables (.rela.*),
symbol tables (.symtab),
section String Tables (.shstrtab) which stores the name of each section,
string tables (.strtab).

The address column normally shows where sections should be loaded into virtual memory, but this was not filled in for each section. The reason is that we did not linked it yet, so we will do that:

    fluxiux@nyannyan:~$ gcc toto.o -o toto  
   
fluxiux@nyannyan:~$ readelf -S toto  
   
There are 30 section headers, starting at offset 0x1178:  
   
Section Headers:  
   
  [Nr] Name              Type             Address           Offset  
   
       Size              EntSize          Flags  Link  Info  Align  
   
  [ 0]                   NULL             0000000000000000  00000000  
   
       0000000000000000  0000000000000000           0     0     0  
   
  [ 1] .interp           PROGBITS         0000000000400238  00000238  
   
       000000000000001c  0000000000000000   A       0     0     1  
   
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254  
   
       0000000000000020  0000000000000000   A       0     0     4  
   
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274  
   
       0000000000000024  0000000000000000   A       0     0     4  
   
  [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298  
   
       000000000000001c  0000000000000000   A       5     0     8  
   
  [ 5] .dynsym           DYNSYM           00000000004002b8  000002b8  
   
       0000000000000060  0000000000000018   A       6     1     8  
   
  [ 6] .dynstr           STRTAB           0000000000400318  00000318  
   
       000000000000003f  0000000000000000   A       0     0     1  
   
  [ 7] .gnu.version      VERSYM           0000000000400358  00000358  
   
       0000000000000008  0000000000000002   A       5     0     2  
   
  [ 8] .gnu.version_r    VERNEED          0000000000400360  00000360  
   
       0000000000000020  0000000000000000   A       6     1     8  
   
  [ 9] .rela.dyn         RELA             0000000000400380  00000380  
   
       0000000000000018  0000000000000018   A       5     0     8  
   
  [10] .rela.plt         RELA             0000000000400398  00000398  
   
       0000000000000030  0000000000000018   A       5    12     8  
   
  [11] .init             PROGBITS         00000000004003c8  000003c8  
   
       0000000000000018  0000000000000000  AX       0     0     4  
   
  [12] .plt              PROGBITS         00000000004003e0  000003e0  
   
       0000000000000030  0000000000000010  AX       0     0     4  
   
  [13] .text             PROGBITS         0000000000400410  00000410  
   
       00000000000001d8  0000000000000000  AX       0     0     16  
   
  [14] .fini             PROGBITS         00000000004005e8  000005e8  
   
       000000000000000e  0000000000000000  AX       0     0     4  
   
  [15] .rodata           PROGBITS         00000000004005f8  000005f8  
   
       0000000000000014  0000000000000000   A       0     0     4  
   
  [16] .eh_frame_hdr     PROGBITS         000000000040060c  0000060c  
   
       0000000000000024  0000000000000000   A       0     0     4  
   
  [17] .eh_frame         PROGBITS         0000000000400630  00000630  
   
       000000000000007c  0000000000000000   A       0     0     8  
   
  [18] .ctors            PROGBITS         0000000000600e28  00000e28  
   
       0000000000000010  0000000000000000  WA       0     0     8  
   
  [19] .dtors            PROGBITS         0000000000600e38  00000e38  
   
       0000000000000010  0000000000000000  WA       0     0     8  
   
  [20] .jcr              PROGBITS         0000000000600e48  00000e48  
   
       0000000000000008  0000000000000000  WA       0     0     8  
   
  [21] .dynamic          DYNAMIC          0000000000600e50  00000e50  
   
       0000000000000190  0000000000000010  WA       6     0     8  
   
  [22] .got              PROGBITS         0000000000600fe0  00000fe0  
   
       0000000000000008  0000000000000008  WA       0     0     8  
   
  [23] .got.plt          PROGBITS         0000000000600fe8  00000fe8  
   
       0000000000000028  0000000000000008  WA       0     0     8  
   
  [24] .data             PROGBITS         0000000000601010  00001010  
   
       0000000000000010  0000000000000000  WA       0     0     8  
   
  [25] .bss              NOBITS           0000000000601020  00001020  
   
       0000000000000010  0000000000000000  WA       0     0     8  
   
  [26] .comment          PROGBITS         0000000000000000  00001020  
   
       0000000000000054  0000000000000001  MS       0     0     1  
   
  [27] .shstrtab         STRTAB           0000000000000000  00001074  
   
       00000000000000fe  0000000000000000           0     0     1  
   
  [28] .symtab           SYMTAB           0000000000000000  000018f8  
   
       0000000000000600  0000000000000018          29    46     8  
   
  [29] .strtab           STRTAB           0000000000000000  00001ef8  
   
       00000000000001f2  0000000000000000           0     0     1

Wow! Some new sections appeared:

.interp which holds pathname of the program interpreter,
code sections (.plt, .init, .fini),
table of imported/exported symbols (.dynsym),
dynamic names table (.dynstr),
dynamic hash table (.hash),
new relocation tables (.rela.*),
constructor and Destructor tables (.ctors, .dtors),
section reserved for dynamic binaries (.got, .dynamic, .plt).

After the address column, you have the offset within the file of the section, then you have the size in byte of each section, the section header size in byte, the required alignment, the Flags (Read, Write, Execute), and so on.

In this article, we will discover some important sections to target for any attack.

Relocations

The relocation is made to modify the memory image of mapped segments to make them executable. As you saw before, there are some “.rela.*” sections which are used to show where to patch the memory and how. Let’s look the different relocations using our favorite tool “readelf”:

    fluxiux@nyannyan:~$ readelf -r toto  
   
Relocation section '.rela.dyn' at offset 0x380 contains 1 entries:  
   
  Offset          Info           Type           Sym. Value    Sym. Name + Addend  
   
000000600fe0  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0  
   
Relocation section '.rela.plt' at offset 0x398 contains 2 entries:  
   
  Offset          Info           Type           Sym. Value    Sym. Name + Addend  
   
000000601000  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0  
   
000000601008  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0

For example, it means for “printf” we need to patch the offset 0x000000600fe0 from the beginning of the .plt section.

For more informations, you have also a description of relocation types in “elf.h”:

     /* x86-64 relocation types, taken from asm-x86_64/elf.h */ 
   
#define R_X86_64_NONE           0       /* No reloc */ 
   
[...] 
   
#define R_X86_64_GLOB_DAT       6       /* Create GOT entry */ 
   
#define R_X86_64_JUMP_SLOT      7       /* Create PLT entry */ 
   
[...]

Program Headers

The section header table is not loaded into memory, because the kernel nor the dynamic loader will be able to use that table. To load a file into memory, program headers are used to provide informatios that are required:

    fluxiux@nyanyan:~$ readelf -W -l toto  
   
Elf file type is EXEC (Executable file)  
   
Entry point 0x400410  
   
There are 9 program headers, starting at offset 64  
   
Program Headers:  
   
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align  
   
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8  
   
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1  
   
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]  
   
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006ac 0x0006ac R E 0x200000  
   
  LOAD           0x000e28 0x0000000000600e28 0x0000000000600e28 0x0001f8 0x000208 RW  0x200000  
   
  DYNAMIC        0x000e50 0x0000000000600e50 0x0000000000600e50 0x000190 0x000190 RW  0x8  
   
  NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R   0x4  
   
  GNU_EH_FRAME   0x00060c 0x000000000040060c 0x000000000040060c 0x000024 0x000024 R   0x4  
   
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8  
   
  GNU_RELRO      0x000e28 0x0000000000600e28 0x0000000000600e28 0x0001d8 0x0001d8 R   0x1  
   
 Section to Segment mapping:  
   
  Segment Sections...  
   
   00      
   
   01     .interp  
   
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame  
   
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss  
   
   04     .dynamic  
   
   05     .note.ABI-tag .note.gnu.build-id  
   
   06     .eh_frame_hdr  
   
   07      
   
   08     .ctors .dtors .jcr .dynamic .got

As you can see, each program header corresponds to one segment where you can find sections into it. But how does it work?

In the beginning, when the kernel sees the INTERP segment, it loads first the LOAD segments to the specified virtual addresses, then load segments from program interpreter (/lib64/ld-linux-x86-64.so.2) and jumps to interpreter’s entry point. After that, the loader gets the control and loads libraries specified in LD_PRELOAD and also DYNAMIC segments of the executable that are needed:

    fluxiux@nyannyan:~$ readelf -d toto  
   
Dynamic section at offset 0xe50 contains 20 entries:  
   
  Tag        Type                         Name/Value  
   
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

After relocations, the loader invokes all libraries INIT function and then jumps to executable’s entry point.

In static, there is less thinks to say because the kernel only loads LOAD segments to the virtual addresses and then jumps to the entry points (easy eh?).

For some more details (I think), you can see an old but very good article published in Linux Journal #13 about ELF dissection by Eric Youngdale[6].

Exploitation

Old is always better (for attackers)

Once upon a the time, you where at home and waiting for the rain to stop. As always you “googled” for some interesting informations (of course!) and you found a kind of bible: Smashing the stack for fun and Profit[7].
Identifying the stack address, putting your shellcode at the beginning, adding some padding and rewriting the EIP, you could see that we can execute anything we want while exploiting a stack overflow. But times have changed, and you’re now confronted to canaris, ASLR (Address Space Layout Randomization), no executable stack, RELRO (read-only relocations), PIE support, binary-to-text encoding, and so on.

Nonexecutable stack

To make the stack nonexecutable, we use the bit NX (No eXecute for AMD) or bit XD (eXecute Disable for Intel). In figure 2, you could see that it matches with the most significant bit of a 64-bit Page Table Entry:

Figure 2 – 64-bit Page Table Entry
(Source : A hardware-enforced BOF protection )

So trying to exploit a stack based overflow, you should be surprised by the fact your shellcode doesn’t produce what you expected, and that’s the power of the bit NX (NX = 0 → Execute, NX = 1 → No eXecute).

Using “readlef -l ” you can see if the stack is executable or not :

      GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000  
   
                 0x0000000000000000 0x0000000000000000  RW     8

As you can see, the only flags we got is the Read and Write ones. You can disable the eXecute flag using “execstack -s [binaryfile]” and see the difference (RWE).

To bypass it, we can use a method called “Return-into-libc”. Endeed, we know that any program that includes libc will have access to its shared functions (such as printf, exit, and so on), and we can execute “system(“/bin/sh”)” to get a shell.

First, we fill the vulnerable buffer with some junk data up to EIP (“AAAAAAAAAHH…”! is great). After that, we have to find “system()” function, but if we want to exit the program properly, the “exit()” will be also needed (using gdb):

    (gdb) r main  
   
Starting program: /home/fluxius/toto main  
   
huhu la charrue  
   
Program exited with code 017.  
   
(gdb) p system  
   
$1 = {<text variable, no debug info>} 0x7ffff6b8a134 <system>  
   
(gdb) p exit  
   
$2 = {<text variable, no debug info>} 0x7ffff6b81890 <exit>

Then, we overwrite the return address with system() function’s address and follow it with the exit() function’s address. To finish, we put the address of “/bin/sh” (that you can retrieve from a memcmp() or an environment variable).

Inject = [junk][system()][exit()][”/bin/sh”]

Note: NX bit is only available in Physical Address Extension (PAE), but can be emulated by PaX or ExecShield.

Moreover, we will see after on x86_64 platforms that “return-into-libc” doesn’t work because of the ABI specifications[8], and that’s probably a problem you’ve already encountered.

Address Space Layout Randomization

To avoid attackers to execute a dangerous shellcode, people has created a concept named “ASLR” (Address Space Layout Randomization). Indeed, it is a technique to arrange the position of the stack, heap, text, vdso, shared libraries and the base address of the executable (when builded with Position-independent executable support). So if you try to execute any shellcode at a saved position, you’ll observe a little fail, because the shellcode isn’t executed (or you are very lucky) and you get the classic error for segmentation faults as we did not ended properly.

When performing a stack overflow for example, you could disable ASLR changing the current level to “0”:

    # echo 0 > /proc/sys/kernel/randomize_va_space 
  

Or there is another trick (proposed by “perror), that does not require “root” privileges, using “setarch” to change reported architecture in a new program environnement and setting personality flags:

    $ setarch `uname -m` -R /bin/bash 
  

But it’s not quite fun, is it? So, attackers have found some ways to bypass this kind of technique. Indeed, in older kernels, they saw the ESP points to the stack, and of course, the buffer is on the stack too. A technique using linux-gate’s instructions, that were static before the kernel 2.6.18, was used to retrieve the address of any interesting pattern “\xff\xe4” (“jump esp” on x86) in memory. Other techniques to bypass ASLR exist like Brute-force.

Brute-force

Thinking about exec() family functions, we can use “execl” to replace the current process image with a new process image. Let’s make a simple code to observe the randomization:

    main 
   ( 
   )  
   
 
   {  
   
         
   char buffer 
   [ 
   100 
   ] 
   ;  
   
         
   printf 
   ( 
   "Buffer address: %p\n" 
   ,  
   &buffer 
   ) 
   ;  
   
 
   } 
  

If ASLR is enabled, you should see something like this:

    fluxiux@handgrep 
   :~ 
   /aslr$ . 
   /buffer_addr  
   
Buffer address 
   :  
   0x7fff5e149710  
   
fluxiux@handgrep 
   :~ 
   /aslr$ . 
   /buffer_addr  
   
Buffer address 
   :  
   0x7fff71f6f0b0  
   
fluxiux@handgrep 
   :~ 
   /aslr$ . 
   /buffer_addr  
   
Buffer address 
   :  
   0x7fff763299c0 
  

We see that 4 bytes change for each execution, and we have to be very lucky to point in our shellcode, if we try the brute-force way. So we will use “execl” now to see any weakness when the memory layout is randomized for the process:

    main 
   ( 
   )  
   
 
   {  
   
         
   int stack 
   ;  
   
         
   printf 
   ( 
   "Stack address: %p\n" 
   ,  
   &stack 
   ) 
   ;  
   
        execl 
   ( 
   "./buffer_addr" 
   ,  
   "buffer_addr" 
   , NULL 
   ) 
   ;  
   
 
   } 
  

Compare the memory layouts with different runs of “buffer_addr”:

    fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7fffc5cfa180  
   
fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7fff1964d1f0  
   
fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7fffba20bd30  
   
fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7fffc8505ed0  
   
fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7ffff39cbc10  
   
fluxiux@handgrep:~/aslr$ ./buffer_addr  
   
Buffer address: 0x7fff6eb3aa90 
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffc5cfa180 - 0x7fff1964d1f0"  
   
$1 = 2892681104  
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffc8505ed0 - 0x7fffba20bd30"  
   
$1 = 238002592  
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7ffff39cbc10 - 0x7fff6eb3aa90"  
   
$1 = 2229866880

And now with “execl” function:

    fluxiux@handgrep:~/aslr$ ./weakaslr  
   
Stack address: 0x7fff526d959c  
   
Buffer address: 0x7fff2e95efd0  
   
fluxiux@handgrep:~/aslr$ gdb -q --batch -ex "p 0x7fffaffcde50 - 0x7fff54800abc"  
   
$1 = 1534907284 
   
fluxiux@handgrep:~/aslr$ ./weakaslr  
   
Stack address: 0x7fffed12acfc  
   
Buffer address: 0x7fffa3a4f8f0  
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffdaf7d5fc - 0x7fff08361da0"  
   
$1 = 3535911004 If we dig a little bit more, we can reduce the domain of probabilistic addresses using “/proc/self/maps” files (local bypass), as shown below: 
   
fluxiux@handgrep:~/aslr$ ./weakaslr  
   
Stack address: 0x7ffffbe8326c  
   
Buffer address: 0x7fff792120c0  
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7ffffbe8326c - 0x7fff792120c0"  
   
$1 = 2194084268 
   
fluxiux@handgrep:~/aslr$ ./weakaslr  
   
Stack address: 0x7fffed12acfc  
   
Buffer address: 0x7fffa3a4f8f0  
   
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffed12acfc - 0x7fffa3a4f8f0"  
   
$1 = 1231926284

Using this method, we could fill the buffer with return address, add a large NOP sled after the return address + the shellcode and guess any correct offset, to point to it. As you can see, the degree of randomization is not the same, but you can play with that. Of course, this attack is more effective on 32-bits and on older kernel versions[9].

If we dig a little bit more, we can reduce the domain of probabilistic addresses using “/proc/self/maps” files (local bypass), as shown below:

    ... 
   
00fa8000-00fc9000 rw-p 00000000 00:00 0                                  [heap]  
   
7ffd77890000-7ffd77a1a000 r-xp 00000000 08:05 396967                     /lib/x86_64-linux-gnu/libc-2.13.so  
   
7ffd77a1a000-7ffd77c19000 ---p 0018a000 08:05 396967                     /lib/x86_64-linux-gnu/libc-2.13.so  
   
7ffd77c19000-7ffd77c1d000 r--p 00189000 08:05 396967                     /lib/x86_64-linux-gnu/libc-2.13.so  
   
7ffd77c1d000-7ffd77c1e000 rw-p 0018d000 08:05 396967                     /lib/x86_64-linux-gnu/libc-2.13.so  
   
7ffd77c1e000-7ffd77c24000 rw-p 00000000 00:00 0  
   
7ffd77c24000-7ffd77c26000 r-xp 00000000 08:05 397045                     /lib/x86_64-linux-gnu/libutil-2.13.so  
   
7ffd77c26000-7ffd77e25000 ---p 00002000 08:05 397045                     /lib/x86_64-linux-gnu/libutil-2.13.so  
   
7ffd77e25000-7ffd77e26000 r--p 00001000 08:05 397045                     /lib/x86_64-linux-gnu/libutil-2.13.so  
   
7ffd77e26000-7ffd77e27000 rw-p 00002000 08:05 397045                     /lib/x86_64-linux-gnu/libutil-2.13.so  
   
7ffd77e27000-7ffd77e48000 r-xp 00000000 08:05 396954                     /lib/x86_64-linux-gnu/ld-2.13.so  
   
7ffd7801d000-7ffd78020000 rw-p 00000000 00:00 0  
   
7ffd78043000-7ffd78044000 rw-p 00000000 00:00 0  
   
7ffd78045000-7ffd78047000 rw-p 00000000 00:00 0  
   
7ffd78047000-7ffd78048000 r--p 00020000 08:05 396954                     /lib/x86_64-linux-gnu/ld-2.13.so  
   
7ffd78048000-7ffd7804a000 rw-p 00021000 08:05 396954                     /lib/x86_64-linux-gnu/ld-2.13.so  
   
7fff7d479000-7fff7d49a000 rw-p 00000000 00:00 0                          [stack]  
   
7fff7d589000-7fff7d58a000 r-xp 00000000 00:00 0                          [vdso]  
   
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Unfortunately, this leak is partially patched since 2.6.27 according to Julien Tinnes and Tavis Ormandy[10], and these files seem to be protected if you cannot ptrace a pid. Anyway, there was any other way using “/proc/self/stat” and “/proc/self/wchan” that leak informations such as stack pointer and instruction pointer (=>ps -eo pid,eip,esp,wchan), and by sampling “kstkeip”, we could reconstruct the maps (see fuzzyaslr by Tavis Ormandy[11]).

Brute-forcing is always a very offensive way to get what you want, it takes time, and you should know that every tries recorded in logs. The solution is maybe in registers.

Return-to-registers

Using a debugger like GDB, can help you to find other ways to bypass some protections like DEP as shown previously and ASLR of course. To study this case, we will work with a better example:

 
   #include <stdio.h>  
   
 
   #include <string.h>  
   
vuln 
   ( 
   char 
   * string 
   )  
   
 
   {  
   
         
   char buffer 
   [ 
   50 
   ] 
   ;  
   
         
   strcpy 
   (buffer 
   , string 
   ) 
   ;  
   // Guys! It's vulnerable! 
   
 
   }  
   
 
   
main 
   ( 
   int argc 
   ,  
   char 
   ** argv 
   )  
   
 
   {  
   
         
   if  
   (argc  
   >  
   1 
   )  
   
                vuln 
   (argv 
   [ 
   1 
   ] 
   ) 
   ;  
   
 
   } 
  

By the way, don’t forget to disable the stack protector (compile as follows: gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=4 vuln2.c -o vuln2). Will see after what a canary is, but now, just let’s focus on ASLR for the moment.

With few tries, we see that we can rewrite the instruction pointer:

    (gdb) run `python -c 'print "A"*78'`  
   
The program being debugged has been started already.  
   
Start it from the beginning? (y or n) y  
   
Starting program: /home/fluxiux/aslr/vuln2 `python -c 'print "A"*78'`  
   
Program received signal SIGSEGV, Segmentation fault.  
   
0x0000414141414141 in ?? ()

Put now a break to the “vuln()” function’s call and the return address:

    (gdb) disas main  
   
Dump of assembler code for function main:  
   
   0x0000000000400515 <+0>: push   %rbp  
   
   0x0000000000400516 <+1>: mov    %rsp,%rbp  
   
   0x0000000000400519 <+4>: sub    $0x10,%rsp  
   
   0x000000000040051d <+8>: mov    %edi,-0x4(%rbp)  
   
   0x0000000000400520 <+11>:    mov    %rsi,-0x10(%rbp)  
   
   0x0000000000400524 <+15>:    cmpl   $0x1,-0x4(%rbp)  
   
   0x0000000000400528 <+19>:    jle    0x40053d <main+40>  
   
   0x000000000040052a <+21>:    mov    -0x10(%rbp),%rax  
   
   0x000000000040052e <+25>:    add    $0x8,%rax  
   
   0x0000000000400532 <+29>:    mov    (%rax),%rax  
   
   0x0000000000400535 <+32>:    mov    %rax,%rdi  
   
   0x0000000000400538 <+35>:    callq  0x4004f4 <vuln>  
   
   0x000000000040053d <+40>:    leaveq  
   
   0x000000000040053e <+41>:    retq   (gdb) break *0x000000000040053e  
   
Breakpoint 2 at 0x40053e 
   
End of assembler dump.  
   
(gdb) break *0x400538  
   
Breakpoint 1 at 0x400538 
   
(gdb) break *0x000000000040053e  
   
Breakpoint 2 at 0x40053e

After that, put a break on the return address of the “vuln()” function:

    (gdb) disas vuln  
   
Dump of assembler code for function vuln:  
   
   0x00000000004004f4 <+0>: push   %rbp  
   
   0x00000000004004f5 <+1>: mov    %rsp,%rbp  
   
   0x00000000004004f8 <+4>: sub    $0x50,%rsp  
   
   0x00000000004004fc <+8>: mov    %rdi,-0x48(%rbp)  
   
   0x0000000000400500 <+12>:    mov    -0x48(%rbp),%rdx  
   
   0x0000000000400504 <+16>:    lea    -0x40(%rbp),%rax  
   
   0x0000000000400508 <+20>:    mov    %rdx,%rsi  
   
   0x000000000040050b <+23>:    mov    %rax,%rdi  
   
   0x000000000040050e <+26>:    callq  0x400400 <strcpy@plt>  
   
   0x0000000000400513 <+31>:    leaveq  
   
   0x0000000000400514 <+32>:    retq    
   
End of assembler dump.  
   
(gdb) break *0x0000000000400514  
   
Breakpoint 3 at 0x400514

As we can see, the RSP contains the return address:

    (gdb) info reg rsp  
   
rsp            0x7fffffffe148   0x7fffffffe148  
   
(gdb) x/20x $rsp - 40  
   
[...] 
   
0x7fffffffe140: 0x00000000  0x00000000  0x0040053d  0x00000000  
   
[…]

The return address as been overwritten (we also noticed that previously):

    (gdb) info reg rsp  
   
rsp            0x7fffffffe148   0x7fffffffe148  
   
(gdb) x/20x $rsp - 40  
   
0x7fffffffe120: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe130: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe140: 0x41414141  0x41414141  0x41414141  0x00004141  
   
0x7fffffffe150: 0xffffe248  0x00007fff  0x00000000  0x00000002  
   
0x7fffffffe160: 0x00000000  0x00000000  0xf7a66eff  0x00007fff

And running at the last breakpoint, we can observe that register RAX points to the beginning of our buffer:

    (gdb) stepi  
   
0x00000000004004f8 in vuln ()  
   
(gdb) info reg rax  
   
rax            0x7fffffffe520   140737488348448  
   
(gdb) x/20x $rax - 40  
   
0x7fffffffe4f8: 0x36387816  0x0034365f  0x00000000  0x2f000000  
   
0x7fffffffe508: 0x656d6f68  0x756c662f  0x78756978  0x6c73612f  
   
0x7fffffffe518: 0x75762f72  0x00326e6c  0x41414141  0x41414141  
   
0x7fffffffe528: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe538: 0x41414141  0x41414141  0x41414141  0x41414141

(Note that if you’re not sure, try with this payload: `python -c ‘print “A”*70+”B”*8’`).

After that, we look for a valid “jmp/callq rax”:

    fluxiux@handgrep:~/aslr$ objdump -d ./vuln2 | grep "callq"  
   
  4003cc:   e8 6b 00 00 00          callq  40043c <call_gmon_start>  
   
  [...] 
   
  400604:   ff d0                   callq  *%rax 
   
..

At “0x400604” could be great, we just have to replace the junk data (“A”) by NOP sled and a precious shellcode that fits on the buffer and we replace the instruction pointer by the address “0x400604”. On 32-bits, “Sickness” has written a good article about that if you are interested[12].

But as you know, by default on Linux (especially the user friendly one: Ubuntu), programs are compiled with the bit NX support, so be lucky to use this technique on nowadays systems. Indeed, we use also an option to disable the stack protector, but what is it exactly?

Stack Canary

Named for their analogy to a canary in a coal mine, stack canary are used to protect against stack overflow attacks. Compiling with the stack protector option (which is used by default), each dangerous function is changed in his prologue and epilogue.

If we compile the previous code letting stack protector to be used, we get something like that:

    fluxiux@handgrep:~/ssp$ gcc -z execstack -mpreferred-stack-boundary=4 vuln2.c -o vuln3  
   
fluxiux@handgrep:~/spp$ ./vuln3  
   
fluxiux@handgrep:~/spp$ ./vuln3 `python -c 'print "A"*76'`  
   
*** stack smashing detected ***: ./vuln3 terminated

Disassembling the “vuln()” function, we can see in the epilogue that a comparison is done:

    (gdb) disas vuln  
   
Dump of assembler code for function vuln:  
   
   [...] 
   
   0x000000000040058d <+41>:    callq  0x400470 <strcpy@plt>  
   
   0x0000000000400592 <+46>:    mov    -0x8(%rbp),%rdx  
   
   0x0000000000400596 <+50>:    xor    %fs:0x28,%rdx  
   
   0x000000000040059f <+59>:    je     0x4005a6 <vuln+66>  
   
   0x00000000004005a1 <+61>:    callq  0x400460 <__stack_chk_fail@plt>  
   
   0x00000000004005a6 <+66>:    leaveq  
   
   [...]

If the value in “fs:0x28” is the same as in ”%rdx”, the “vuln()” function will end properly. In other case, the function “__stack_chk_fail()” will be called and an error message shows up (“*** stack smashing detected ***: ./vuln3 terminated ”).

Putting a break on “__stack_chk_fail()” function, we can observe the values on $RSP:

    (gdb) run `python -c 'print "A"*57'`  
   
Starting program: /home/fluxiux/aslr/vuln3 `python -c 'print "A"*57'`  
   
Breakpoint 1, 0x00000000004005cb in main ()  
   
(gdb) c  
   
Continuing.  
   
Breakpoint 2, 0x00000000004005a1 in vuln ()  
   
(gdb) x/30x $rsp  
   
0x7fffffffe100: 0x00000000  0x00000000  0xffffe535  0x00007fff  
   
0x7fffffffe110: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe120: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe130: 0x41414141  0x41414141  0x41414141  0x41414141  
   
0x7fffffffe140: 0x41414141  0x41414141  0xbf630041  0xe3b6079a  
   
0x7fffffffe150: 0xffffe170  0x00007fff  0x004005d0  0x00000000  
   
0x7fffffffe160: 0xffffe258  0x00007fff  0x00000000  0x00000002  
   
0x7fffffffe170: 0x00000000  0x00000000

At “0x7fffffffe148”, we have rewrote 1 byte of the stack cookie value saved on RSP (that’s why the breakpoint 2 stopped __stack_chk_fail()). At “0x7fffffffe158” , we see the return address of main. So the structure of this canary should be like in figure 3:

There are 3 kinds of canaries:

Null (0x0),
terminator (letting the first bytes to be “\a0\xff”),
random.

The first 2 kinds are easy to bypass[14], because you just have to fill the buffer with your shellcode, giving a desired value to be at the right position and rewrite the instruction pointer. But for the random one, it is a little more fun, because you have to guess its value at each execution (Ow! A kind like ASLR?).

For random canaries, the “__gard__setup()” fills a global variable with random bytes generated by “/dev/urandom”, if possible. Latter in the program, only 4|8 bytes are used to be the cookie. But, if we cannot use the entropy of “/dev/urandom”, by default we will get a terminator or a null cookie.

Brute-force is a way, but you will use to much time. By overwriting further than the return address, we can hook the execution flow using GOT entries. The canary will of course detect the compromising, but too late. A very good article covering the StackGuard and StackShield explain four ways to bypass these protections[15].

However, on new kernels you also have to noticed that the random cookie is set with a null-byte at the end, and trying to recover the value from forking or brute-forcing will not work with functions like “strcpy”. So the better way to do that, is to have the control of the initialized cookie.

Format string vulnerabilities or heap overflow for example, are more easy to exploit with this protection, but this article is not finished yet and we will see another memory corruption mitigation technique.

RELRO

In recent Linux distributions, a memory corruption mitigation technique has been introduced to harden the data sections for binaries/processes. This protection can be viewable reading the program headers (with readelf for example):

    fluxiux@handgrep:~/ssp$ readelf -l vuln3  
   
[...] 
   
Program Headers:  
   
[...] 
   
  GNU_RELRO      0x0000000000000e28 0x0000000000600e28 0x0000000000600e28  
   
                 0x00000000000001d8 0x00000000000001d8  R      1

On current Linux, your binaries are often compiled with RELRO. So that mean that following sections are mapped as read-only:

       08     .ctors .dtors .jcr .dynamic .got 
  

Optionally, you can compare dissecting a non-RELRO binary, as follows:

    fluxiux@handgrep:~/ssp$ gcc -Wl,-z,norelro vuln2.c -o vuln4  
   
[…] 
   
  LOAD           0x0000000000000768 0x0000000000600768 0x0000000000600768  
   
                 0x0000000000000200 0x0000000000000210  RW     200000  
   
[…] 
   
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss  
   
[...]

The exploitation of a format string bug for example, using the format parameter “%n” to write to any arbitrary address like GOTs is suppose to fail. But as we noticed previously, PLT GOTs have “write” permissions and then we are face to a partial-RELRO only.

With the example in trapkit’s article about RELRO[16], we could see that it is very easy to rewrite a PLT entry. But in some cases (mostly in dist-packages), binaries are compiled with a full-RELRO:

    fluxiux@handgrep:~/relro$ gcc -Wl,-z,relro,-z,now -o fullrelro fmstr.c  
   
[..] 
   
fluxiux@handgrep:~/relro$ readelf -l ./fullrelro | grep "RELRO"  
   
  GNU_RELRO      0x0000000000000df8 0x0000000000600df8 0x0000000000600df8  
   
fluxiux@handgrep:~/relro$ readelf -d ./fullrelro | grep "BIND"  
   
 0x0000000000000018 (BIND_NOW)

Note: BIND_NOW indicates that the binary is using full-RELRO.

The entire GOT is remapped as read-only, but there are other sections to write on. GOTs are use mostly for flexibility. Detour with “.dtors” can be perform as Sebastian Krahmer described in his article about RELRO[17].

We have seen common Linux protection used by default, but the evolution of kernels and architectures have made things more difficult.

The x86_64 fact and current systems hardening

With time, the new versions of Linux distribution become well hardened by default. In my studies, the Ubuntu one surprised me a lot, because in addition to these protections implanted by default, this system turns to take some openBSD solutions to be as user friendly and secure as possible. Moreover, we have seen few protections and ways to bypass it, but the 64-bits give us other difficulties.

As you notices, addresses have changed and it more difficult to exploit some memory corruption because of the byte “\x00”, considered as a EOF for some functions like “strcpy()”. We saw that NX is enabled and the compilation in gcc with its support are made by default. But the worst is coming. Indeed, we now that the randomization space is larger but what interest us, is the System V ABI for x86_64[8].

Things have changed for parameters in functions. Indeed, instead of copying parameters in the stack, the first 6 integer and 8 float/double/vector arguments are passed in registers, rest on stack. See an example:

 
   // Passing arguments example 
   
 
   typedef  
   struct  
   { 
   
 
   int a 
   , b 
   ; 
   
 
   double d 
   ; 
   
 
   } structparm 
   ; 
   
structparm s 
   ; 
   
 
   int e 
   , f 
   , g 
   , h 
   , i 
   , j 
   , k 
   ; 
   
 
   long  
   double ld 
   ; 
   
 
   double m 
   , n 
   ; 
   
__m256 y 
   ; 
   
 
   extern  
   void func  
   ( 
   int e 
   ,  
   int f 
   , 
   
structparm s 
   ,  
   int g 
   ,  
   int h 
   , 
   
 
   long  
   double ld 
   ,  
   double m 
   , 
   
__m256 y 
   , 
   
 
   double n 
   ,  
   int i 
   ,  
   int j 
   ,  
   int k 
   ) 
   ; 
   
func  
   (e 
   , f 
   , s 
   , g 
   , h 
   , ld 
   , m 
   , y 
   , n 
   , i 
   , j 
   , k 
   ) 
   ; 
  

The given register allocation looks like this (in figure 4):

Figure 4 – Register allocation example
(source: System V ABI x86_64)

I suggest you to read the slides Jon Larimer about “Intro to x64 Reversing”[18].

We could use the knowledge of borrowed code chunks’ article[19] that can help us to understand problems of NX, System V ABI x86_64 differences with x32, and ways to bypass them using instructions to write a value on one register, and call the function “system()”, for example, that will use this register as a parameter.
Other sophisticated attacks like Return-oriented Programing are use to bypass these protection that make life difficult in an exploit process.

As you could see, protections didn’t make things impossible, but just harder and harder. So be aware of new applied protections and conventions to not waste too much time.